GitHub - krishnan-mani/mr

(This was first used at a meetup conducted on 27th July 2013. See http://www.meetup.com/expertalks/events/128657552/)

(Also, see install.README.txt)

1) Simulating MapReduce locally with a Unix pipeline (sort stands in for the shuffle phase):
cat words.txt | ruby map.rb | sort | ruby reduce.rb
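
To make the three stages of the pipeline above concrete, here is a minimal sketch in plain shell. It assumes map.rb and reduce.rb implement a word count, which this README does not state. The functions map_words and reduce_counts are hypothetical stand-ins, not the contents of the Ruby scripts.

```shell
#!/bin/sh
# Stand-in for map.rb (assumed behaviour): emit "word<TAB>1" for every word.
map_words() {
  tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t" 1 }'
}

# Stand-in for reduce.rb (assumed behaviour): sum the counts of each key.
# It relies on equal keys being adjacent, which is what "sort" guarantees.
reduce_counts() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" total; total = 0 }
    { key = $1; total += $2 }
    END { if (NR > 0) print key "\t" total }
  '
}

# Same shape as: cat words.txt | ruby map.rb | sort | ruby reduce.rb
printf 'the cat\nthe hat\n' | map_words | sort | reduce_counts
```

For the two sample lines this prints "cat 1", "hat 1" and "the 2", with a tab between word and count. The reducer never holds more than one key in memory, which is the property that lets Hadoop stream sorted records through it.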

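2) Create HDFS locations and copy input files:
hadoop fs -mkdir /user/cloudera/data/splitInput
# Hadoop refuses to write into an existing output directory, so remove any previous output first
# (on newer Hadoop releases, use "hadoop fs -rm -r", since -rmr is deprecated)
hadoop fs -rmr /user/cloudera/data/output
hadoop fs -put words/* /user/cloudera/data/splitInput/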

3) Using MapReduce on Hadoop:
./mapReduce.sh
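
The contents of mapReduce.sh are not reproduced in this README. As a rough sketch only, a Hadoop Streaming job that runs the Ruby scripts over the directories from step 2 might look like the following. The jar location (STREAMING_JAR), the DRY_RUN switch, and the function name run_streaming are assumptions made for illustration; the real script may differ.

```shell
#!/bin/sh
# Assumption: path to the Hadoop Streaming jar; it varies between distributions.
STREAMING_JAR="${STREAMING_JAR:-/path/to/hadoop-streaming.jar}"
INPUT_DIR=/user/cloudera/data/splitInput   # created in step 2
OUTPUT_DIR=/user/cloudera/data/output      # must not exist when the job starts

# Hypothetical wrapper: prints the command when DRY_RUN=1, runs it otherwise.
run_streaming() {
  if [ "${DRY_RUN:-0}" = 1 ]; then runner=echo; else runner=; fi
  $runner hadoop jar "$STREAMING_JAR" \
    -input "$INPUT_DIR" \
    -output "$OUTPUT_DIR" \
    -mapper "ruby map.rb" \
    -reducer "ruby reduce.rb" \
    -file map.rb \
    -file reduce.rb
}

# Dry run by default, so nothing is submitted by accident.
DRY_RUN="${DRY_RUN:-1}"
run_streaming
```

Run it with DRY_RUN=0 to actually submit the job. The -file options ship map.rb and reduce.rb to the cluster nodes, which also need a Ruby interpreter installed.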

(Instructions from the part 2 meetup. See http://www.meetup.com/expertalks/events/132508222/)

See the data, Pig scripts, and awk scripts under m2

# To run Pig in local mode, ensure the input file is available on the local filesystem at the path the script loads it from, and use:
$ pig -x local pigScript.pig

# To run Pig on Hadoop (MapReduce mode, the default), ensure the file to load is in HDFS, and use:
$ hadoop fs -put dataInputFile
$ pig pigScript.pig
