Cloud-Computing

This repository contains the source code for all of my cloud computing projects.

Hadoop MapReduce
Spark

Hadoop - Map Reduce projects

1. DocWordCount

DocWordCount.java outputs the count of each distinct word in each file. Output is in the form 'word#####filename count', where '#####' is the delimiter.

Execution :

argument 1 : input directory where files are stored.
argument 2 : output directory.
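As an illustration of the output format only, here is a minimal plain-Python sketch of the per-file word count. It is not the repository's Java MapReduce code. The tokenization rule (lowercased runs of word characters) and the names `doc_word_count` and `docs` are assumptions made for the example.

```python
import re
from collections import Counter

DELIMITER = "#####"

def doc_word_count(docs):
    """docs maps filename -> text; returns {'word#####filename': count}."""
    counts = Counter()
    for filename, text in docs.items():
        # "Map": emit one key per token. Tokenization here is an assumption.
        for word in re.findall(r"\w+", text.lower()):
            # "Reduce": sum the occurrences that share a key.
            counts[f"{word}{DELIMITER}{filename}"] += 1
    return dict(counts)

# Hypothetical two-file corpus.
docs = {"a.txt": "Hadoop is fast. Hadoop scales.", "b.txt": "Spark is fast."}
result = doc_word_count(docs)
for key in sorted(result):
    print(key, result[key])
```

Each key combines the word and the filename, so the same word in two files is counted separately.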

2. TermFrequency

TermFrequency.java outputs the term frequency (TF) of each word in the corpus in the format 'word#####filename TF', where '#####' is the delimiter.

TF(t,d) = No. of times term t appears in document d

TF=1 + log₁₀ (TF(t,d))

Execution :

argument 1 : input directory where files are stored.
argument 2 : output directory.
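The logarithmic weighting above can be sketched in plain Python as follows. This is an illustration rather than the repository's Java code. The function name `weighted_tf` and the sample counts are hypothetical, and returning 0 for a term that does not occur is an assumption, since the formula is only defined for counts of at least 1.

```python
import math

def weighted_tf(raw_count):
    """Return 1 + log10(raw_count) for a raw term count."""
    if raw_count <= 0:
        return 0.0  # assumption: absent terms get weight 0
    return 1 + math.log10(raw_count)

# Hypothetical raw counts keyed as 'word#####filename'.
raw_counts = {"hadoop#####a.txt": 10, "spark#####b.txt": 1}
tf = {key: weighted_tf(n) for key, n in raw_counts.items()}
for key in sorted(tf):
    print(key, tf[key])
```

A term that appears once gets weight 1, and each tenfold increase in the raw count adds 1 to the weight.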

3. TFIDF

TFIDF.java calculates the term frequency (TF) and the inverse document frequency (IDF) of each word in the corpus, then outputs their product, TF-IDF, in the format 'word#####filename TFIDF', where '#####' is the delimiter.

TF=1 + log₁₀(TF(t,d))

IDF= log₁₀ (Total no. of documents / No. of documents containing term t)

Execution :

argument 1 : input directory.
argument 2 : output directory.
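A minimal plain-Python sketch of the two formulas, assuming the usual definition TF-IDF = TF × IDF. It is not the repository's Java code. The names `tf_idf`, `tf_scores`, and `total_docs` are hypothetical, and the input is assumed to be TF values keyed as 'word#####filename'.

```python
import math
from collections import defaultdict

DELIMITER = "#####"

def tf_idf(tf_scores, total_docs):
    """tf_scores maps 'word#####filename' -> TF; returns the same keys -> TF-IDF."""
    # Collect the set of documents that contain each word.
    docs_with_term = defaultdict(set)
    for key in tf_scores:
        word, filename = key.split(DELIMITER)
        docs_with_term[word].add(filename)

    result = {}
    for key, tf in tf_scores.items():
        word, _ = key.split(DELIMITER)
        idf = math.log10(total_docs / len(docs_with_term[word]))
        result[key] = tf * idf
    return result

# Hypothetical TF values for a two-document corpus.
tf_scores = {"hadoop#####a.txt": 2.0, "fast#####a.txt": 1.0, "fast#####b.txt": 1.0}
scores = tf_idf(tf_scores, total_docs=2)
```

With this definition, a word that appears in every document gets an IDF of 0 and therefore a TF-IDF of 0.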

4. BasicSearchEngine

A basic search engine that takes a user query and outputs the list of documents matching the query in the format 'filename TFIDFWeightSum'. The input to the mapper is the output of TFIDF.java.

Execution :

argument 1 : input directory.
Note: Use the output directory of TFIDF.java as the input directory.
argument 2 : output directory.
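The matching step can be sketched in plain Python as below: for each document, sum the TF-IDF weights of the words that occur in the query. This is an illustration, not the repository's Java code. The function name `search`, the whitespace-based query splitting, and the sample weights are assumptions.

```python
from collections import defaultdict

DELIMITER = "#####"

def search(query, tfidf_scores):
    """Return {filename: sum of TF-IDF weights of the query terms it contains}."""
    terms = set(query.lower().split())  # assumption: whitespace-separated query
    totals = defaultdict(float)
    for key, weight in tfidf_scores.items():
        word, filename = key.split(DELIMITER)
        if word in terms:
            totals[filename] += weight
    return dict(totals)

# Hypothetical TF-IDF weights keyed as 'word#####filename'.
tfidf_scores = {
    "hadoop#####a.txt": 0.5,
    "fast#####a.txt": 0.25,
    "fast#####b.txt": 0.25,
    "spark#####b.txt": 0.5,
}
hits = search("fast hadoop", tfidf_scores)
for filename in sorted(hits):
    print(filename, hits[filename])
```

Documents that contain none of the query terms do not appear in the result.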

5. PageRank

Given a graph of hyperlinks, where each web page has out-links to other pages, this program calculates the PageRank of each page and outputs the pages in descending order of rank.

Execution :

hadoop jar PageRank.jar edu.cloud.prateek.Driver argument_1 argument_2

argument_1 : input directory.
argument_2 : output directory.
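For readers unfamiliar with the computation, here is a minimal plain-Python sketch of iterative PageRank. It is not the repository's Java implementation. The damping factor of 0.85, the fixed iteration count, the normalization by the number of pages, and the names `page_rank` and `out_links` are all assumptions; the README does not state which values or variant the project uses.

```python
def page_rank(out_links, damping=0.85, iterations=50):
    """out_links maps page -> list of pages it links to.

    Returns (page, rank) pairs sorted by rank, highest first.
    """
    pages = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts from the baseline share, then collects
        # an equal slice of the rank of each page that links to it.
        new_ranks = {page: (1 - damping) / n for page in pages}
        for page, targets in out_links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)

# Hypothetical three-page graph with no dangling pages.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranking = page_rank(graph)
for page, rank in ranking:
    print(page, round(rank, 4))
```

This sketch does not redistribute the rank of pages without out-links, so on graphs with such pages the ranks will sum to less than 1.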

6. Linear Regression

A Python Spark program that finds the beta coefficients of a linear regression by computing the summation form of the closed-form expression: β̂ = (XᵀX)⁻¹XᵀY

Execution :

spark-submit linreg.py argument_1

argument_1 : input file name
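The summation form rests on the identities XᵀX = Σ xᵢxᵢᵀ and XᵀY = Σ xᵢyᵢ, so both can be accumulated one row at a time. The plain-Python sketch below illustrates this. It is not linreg.py and uses no Spark. The row layout (y first, then the features), the added intercept column, and the names `fit` and `solve` are assumptions.

```python
def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows):
    """rows is a list of (y, [x1, x2, ...]); returns [intercept, beta1, ...]."""
    k = len(rows[0][1]) + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for y, features in rows:
        x = [1.0] + list(features)  # assumption: prepend 1 for the intercept
        for i in range(k):
            xty[i] += x[i] * y              # running sum of x_i * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]    # running sum of x_i * x_i^T
    # Solving (X^T X) beta = X^T Y avoids forming the inverse explicitly.
    return solve(xtx, xty)

# Hypothetical data lying exactly on y = 1 + 2x.
rows = [(1.0, [0.0]), (3.0, [1.0]), (5.0, [2.0]), (7.0, [3.0])]
beta = fit(rows)
```

Because each row contributes an independent term to both sums, the accumulation is the kind of work that can be split across partitions and combined with a sum.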
