Hadoop Examples

  • [Sample template](https://github.com/t-ivanov/HadoopExamples/blob/master/Document_Template.md) that can be used when creating a workload description.

  • List all available example programs: $ yarn jar hadoop-mapreduce-examples.jar

    • aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

    • aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

    • bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

    • dbcount: An example job that counts pageviews from a database.

    • distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

    • grep: A map/reduce program that counts the matches of a regex in the input.

    • join: A job that effects a join over sorted, equally partitioned datasets.

    • multifilewc: A job that counts words from several files.

    • pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

    • pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

    • randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

    • randomwriter: A map/reduce program that writes 10GB of random data per node.

    • secondarysort: An example defining a secondary sort to the reduce.

    • sort: A map/reduce program that sorts the data written by the random writer.

    • [sudoku](https://github.com/danielOrbegoso/HadoopExamples/blob/master/Sudoku.md): A sudoku solver.

    • teragen: Generates the input data for terasort.

    • terasort: Runs the terasort.

    • teravalidate: Checks the results of terasort.

    • [wordcount](https://github.com/t-ivanov/HadoopExamples/blob/master/Wordcount.md): A map/reduce program that counts the words in the input files.

    • wordmean: A map/reduce program that computes the average length of the words in the input files.

    • wordmedian: A map/reduce program that computes the median length of the words in the input files.

    • wordstandarddeviation: A map/reduce program that computes the standard deviation of the length of the words in the input files.

  • List all available test programs: $ yarn jar hadoop-mapreduce-client-jobclient-tests.jar

    • TestDFSIO: Distributed I/O benchmark.

    • DFSCIOTest: Distributed I/O benchmark of libhdfs.

    • DistributedFSCheck: Distributed checkup of the file system consistency.

    • JHLogAnalyzer: Job History Log analyzer.

    • MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.

    • NNdataGenerator: Generates the data to be used by NNloadGenerator.

    • NNloadGenerator: Generates load on the NameNode using the NN load generator, run without MapReduce.

    • NNloadGeneratorMR: Generates load on the NameNode using the NN load generator, run as a MapReduce job.

    • NNstructureGenerator: Generates the structure to be used by NNdataGenerator.

    • SliveTest: HDFS Stress Test and Live Data Verification.

    • fail: A job that always fails.

    • filebench: Benchmark for SequenceFile(Input|Output)Format (block compressed, record compressed, and uncompressed) and Text(Input|Output)Format (compressed and uncompressed).

    • largesorter: Large-sort tester.

    • loadgen: Generic map/reduce load generator.

    • mapredtest: A map/reduce test check.

    • minicluster: Single process HDFS and MR cluster.

    • mrbench: A map/reduce benchmark that can create many small jobs.

    • nnbench: A benchmark that stresses the namenode.

    • sleep: A job that sleeps at each map and reduce task.

    • testbigmapoutput: A map/reduce program that works on a very big non-splittable file and performs an identity map/reduce.

    • testfilesystem: A test for FileSystem read/write.

    • testmapredsort: A map/reduce program that validates the map-reduce framework's sort.

    • testsequencefile: A test for flat files of binary key value pairs.

    • testsequencefileinputformat: A test for sequence file input format.

    • testtextinputformat: A test for text input format.

    • threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills against maps with one spill.

  • [Hadoop examples source code](https://github.com/apache/hadoop-common/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs)

  • [Good tutorial on how to run the examples](http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/)
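
As a rough sketch of how a few of the programs listed above are typically invoked, the script below wraps `yarn jar` in a small helper. The helper name `run_job`, the `DRY_RUN` switch, the jar locations, the `/tmp/...` HDFS directories, and all argument values are assumptions made for illustration; adjust them to your installation and Hadoop version. With `DRY_RUN=1` (the default here) the commands are only printed, so the script can be tried without a cluster.

```shell
#!/bin/sh
# Assumed jar locations; the real paths depend on the Hadoop distribution and version.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-mapreduce-examples.jar}"
TESTS_JAR="${TESTS_JAR:-hadoop-mapreduce-client-jobclient-tests.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = print and run them

# run_job JAR PROGRAM [ARGS...]
# Hypothetical helper: prints the yarn command and runs it unless DRY_RUN=1.
run_job() {
    echo "yarn jar $*"
    if [ "$DRY_RUN" != "1" ]; then
        yarn jar "$@"
    fi
}

# pi: 16 map tasks with 1000 samples each (illustrative values).
run_job "$EXAMPLES_JAR" pi 16 1000

# TeraSort pipeline: generate 1,000,000 rows, sort them, then validate the output.
# The HDFS directories are placeholders.
run_job "$EXAMPLES_JAR" teragen 1000000 /tmp/tera-in
run_job "$EXAMPLES_JAR" terasort /tmp/tera-in /tmp/tera-out
run_job "$EXAMPLES_JAR" teravalidate /tmp/tera-out /tmp/tera-report

# TestDFSIO write test with 10 files of 1000 MB each, using the flags shown in
# the tutorial linked above; newer Hadoop versions may prefer -size over -fileSize.
run_job "$TESTS_JAR" TestDFSIO -write -nrFiles 10 -fileSize 1000
```

Running the script with `DRY_RUN=0` on a machine where `yarn` is on the `PATH` submits the jobs instead of only printing them. Running either jar without a program name, as in the list above, prints the available programs.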
