This library provides syntactic sugar on top of Hadoop to make it more usable from Scala. See Examples.scala for more details.
License: Apache License, Version 2.0
A basic mapper looks like this:
    object TokenizerMap extends TypedMapper[LongWritable, Text, Text, LongWritable] {
      override def map(k: LongWritable, v: Text, context: ContextType) : Unit =
        v split " |\t" foreach ((word) => context.write(word, 1L))
    }
or, more concisely, it can be written as:
    object TokenizerMap1 extends TypedMapper[LongWritable, Text, Text, LongWritable] {
      override def doMap : Unit = v split " |\t" foreach ((word) => context.write(word, 1L))
    }
and a reducer looks like this:
    object SumReducer1 extends TypedReducer[Text, LongWritable, Text, LongWritable] {
      override def doReduce : Unit = context.write(k, (0L /: v) ((total, next) => total + next))
    }
Note that implicit conversions are used to convert between LongWritable and Long, as well as between Text and String. The types of the input and output parameters only need to be stated once, as the type parameters of the class being extended.
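The conversions themselves are supplied by the library; as a rough sketch (not the actual definitions, which live in the ScalaHadoop source), they behave as if implicit views along these lines were in scope:

    // Illustrative only: implicit views of roughly this shape are what let
    // context.write(word, 1L) satisfy a (Text, LongWritable) output signature.
    import org.apache.hadoop.io.{LongWritable, Text}

    object WritableConversions {
      implicit def stringToText(s: String) : Text = new Text(s)
      implicit def textToString(t: Text) : String = t.toString
      implicit def longToWritable(l: Long) : LongWritable = new LongWritable(l)
      implicit def writableToLong(w: LongWritable) : Long = w.get
    }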
These mappers and reducers can be chained together with the --> operator:
    object WordCount extends ScalaHadoopTool {
      def run(args: Array[String]) : Int = {
        (MapReduceTaskChain.init() -->
         IO.Text[LongWritable, Text](args(0)).input -->
         MapReduceTask.MapReduceTask(TokenizerMap1, SumReducer1) -->
         IO.Text[Text, LongWritable](args(1)).output) execute;
        return 0;
      }
    }
Multiple map/reduce runs can be chained together:
    object WordsWithSameCount extends ScalaHadoopTool {
      def run(args: Array[String]) : Int = {
        (MapReduceTaskChain.init() -->
         IO.Text[LongWritable, Text](args(0)).input -->
         MapReduceTask.MapReduceTask(TokenizerMap1, SumReducer1) -->
         MapReduceTask.MapReduceTask(FlipKeyValueMap, WordListReducer) -->
         IO.Text[LongWritable, Text](args(1)).output) execute;
        return 0;
      }
    }
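FlipKeyValueMap and WordListReducer are defined in Examples.scala; they are not reproduced here exactly, but sketches along these lines show what the second stage does:

    // Hypothetical sketches of the second map/reduce stage; see Examples.scala for the real code.
    object FlipKeyValueMap extends TypedMapper[Text, LongWritable, LongWritable, Text] {
      // Swap each (word, count) pair into (count, word) so words are grouped by their count.
      override def doMap : Unit = context.write(v, k)
    }

    object WordListReducer extends TypedReducer[LongWritable, Text, LongWritable, Text] {
      // Join all words that share the same count into a single comma-separated value.
      override def doReduce : Unit = context.write(k, v.mkString(", "))
    }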
Alex Simma: developer of the original version of ScalaHadoop.