Hadoop Examples

[Sample template](https://github.com/t-ivanov/HadoopExamples/blob/master/Document_Template.md) that can be used when creating a workload description.

List all available example programs: `yarn jar hadoop-mapreduce-examples.jar`

- aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
- aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
- bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
- dbcount: An example job that counts the pageviews stored in a database.
- distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
- grep: A map/reduce program that counts the matches of a regex in the input.
- join: A job that effects a join over sorted, equally partitioned datasets
- multifilewc: A job that counts words from several files.
- pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
- pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
- randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
- randomwriter: A map/reduce program that writes 10GB of random data per node.
- secondarysort: An example defining a secondary sort to the reduce.
- sort: A map/reduce program that sorts the data written by the random writer.
- [sudoku](https://github.com/danielOrbegoso/HadoopExamples/blob/master/Sudoku.md): A sudoku solver.
- teragen: Generates the input data for terasort.
- terasort: Runs the terasort benchmark.
- teravalidate: Checks the results of terasort.
- [wordcount](https://github.com/t-ivanov/HadoopExamples/blob/master/Wordcount.md): A map/reduce program that counts the words in the input files.
- wordmean: A map/reduce program that computes the average length of the words in the input files.
- wordmedian: A map/reduce program that computes the median length of the words in the input files.
- wordstandarddeviation: A map/reduce program that computes the standard deviation of the length of the words in the input files.
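
A minimal sketch of how a few of the programs above are commonly invoked. The jar location, the HDFS paths under `/tmp/tera`, and the sizes are assumptions chosen for illustration, and argument conventions may differ between Hadoop versions. The hypothetical `run` helper only prints each command unless `RUN=1` is set, so the script works as a dry run on a machine without Hadoop.

```shell
#!/bin/sh
# Assumed location of the examples jar; adjust for your installation.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-mapreduce-examples.jar}"

# Hypothetical helper: print the command, execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# pi: <number of maps> <samples per map>
run yarn jar "$EXAMPLES_JAR" pi 10 1000

# teragen -> terasort -> teravalidate on hypothetical HDFS paths.
# teragen takes a row count (each row is 100 bytes) and an output directory.
run yarn jar "$EXAMPLES_JAR" teragen 1000000 /tmp/tera/in
run yarn jar "$EXAMPLES_JAR" terasort /tmp/tera/in /tmp/tera/out
run yarn jar "$EXAMPLES_JAR" teravalidate /tmp/tera/out /tmp/tera/report
```

Running a program without arguments usually prints its usage message, which is the safest way to confirm the expected arguments for the installed version.
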

List all available test programs: `yarn jar hadoop-mapreduce-client-jobclient-tests.jar`

- TestDFSIO: Distributed i/o benchmark.
- DFSCIOTest: Distributed i/o benchmark of libhdfs.
- DistributedFSCheck: Distributed checkup of the file system consistency.
- JHLogAnalyzer: Job History Log analyzer.
- MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.
- NNdataGenerator: Generate the data to be used by NNloadGenerator
- NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
- NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
- NNstructureGenerator: Generate the structure to be used by NNdataGenerator
- SliveTest: HDFS Stress Test and Live Data Verification.
- fail: a job that always fails
- filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed).
- largesorter: Large-Sort tester
- loadgen: Generic map/reduce load generator
- mapredtest: A map/reduce test check.
- minicluster: Single process HDFS and MR cluster.
- mrbench: A map/reduce benchmark that can create many small jobs
- nnbench: A benchmark that stresses the namenode.
- sleep: A job that sleeps at each map and reduce task.
- testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
- testfilesystem: A test for FileSystem read/write.
- testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
- testsequencefile: A test for flat files of binary key value pairs.
- testsequencefileinputformat: A test for sequence file input format.
- testtextinputformat: A test for text input format.
- threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
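
As a hedged sketch, a TestDFSIO write/read/clean cycle might look like the following. The jar location, file count, and file size are assumptions, and the `-nrFiles` and `-fileSize` flags follow the tutorial linked in this README; newer releases may name the size option differently, so check the usage output first. The hypothetical `run` helper prints each command and executes it only when `RUN=1` is set.

```shell
#!/bin/sh
# Assumed location of the tests jar; adjust for your installation.
TESTS_JAR="${TESTS_JAR:-hadoop-mapreduce-client-jobclient-tests.jar}"

# Hypothetical helper: print the command, execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Write 10 files, read them back, then remove the benchmark data.
# File count and size are illustrative values, not recommendations.
run yarn jar "$TESTS_JAR" TestDFSIO -write -nrFiles 10 -fileSize 1000
run yarn jar "$TESTS_JAR" TestDFSIO -read -nrFiles 10 -fileSize 1000
run yarn jar "$TESTS_JAR" TestDFSIO -clean
```

The read pass depends on the files produced by the write pass, so the order of the first two commands matters.
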

[Hadoop examples source code](https://github.com/apache/hadoop-common/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs)

[Good tutorial on how to run the examples](http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/)
