Simple Haskell MapReduce Framework

Objective

To provide a library that is dirt-easy to use for processing fairly large amounts of data. It is meant for datasets that are too large for one laptop to handle through traditional means, but it is not necessarily meant to scale to super-massive levels.

The main benefits are:

Works directly with flat files, so we can avoid loading data into DBs.
Haskell-only, so we can easily automate input feeding from several sources/files.
Keeps it simple, so the average Haskell hacker can ramp up operations fairly easily.

This package is for the practical-minded Haskell data hacker.

v0.1 Concept & Design Choices

Input data is flat-file (CSV-like)
Reduces are idempotent; the finalizer does the final data transformation
Unified feeder + data-mapper (split input into multiple files for multiple mappers)
Unified reducer and finalizer
Simple locking to enable multiple reducers
Base everything on Redis for now; worry about multiple backends later
Keep everything simple and tightly coupled to test out some of the ideas before blowing into polymorphic hell
Redis backend
  - mr-jobs db to store job info
  - each job gets 3 dbs:
    - mapped data
    - locked data (atomic move during reduce)
    - finalized data
Finalized data extracted into a CSV file
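
The flow described above (flat-file input, mapper, idempotent reduce, finalizer, CSV output) can be sketched in plain Haskell. This is only an illustration under stated assumptions, not this package's API: the names `mapper`, `reducer`, `runReduce`, `finalizer`, `toCsv`, and `sampleInput` are hypothetical, Redis is replaced by an in-memory `Data.Map` (from the `containers` package), and "idempotent" is read here as "reducing an already-reduced value again leaves it unchanged".

```haskell
module Main where

import qualified Data.Map.Strict as M

-- Hypothetical types for illustration; not the library's API.
type Key     = String
type Partial = (Double, Int)   -- running (sum, count) per key

-- Split a CSV-like line on a separator (no quoting support).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Mapper: turn one input line into zero or more key/value pairs.
mapper :: String -> [(Key, Partial)]
mapper line = case splitOn ',' line of
  [city, temp] | [(t, "")] <- reads temp -> [(city, (t, 1))]
  _                                      -> []   -- skip malformed lines

-- Reducer: combine partial results for one key. Its output has the
-- same shape as its input, so reducer [reducer ps] == reducer ps.
reducer :: [Partial] -> Partial
reducer ps = (sum (map fst ps), sum (map snd ps))

-- Group mapped pairs by key and reduce each group.
runReduce :: [(Key, Partial)] -> M.Map Key Partial
runReduce = M.map reducer . M.fromListWith (++) . map (\(k, v) -> (k, [v]))

-- Finalizer: the one-off final transformation (here, sum/count -> mean).
finalizer :: Partial -> Double
finalizer (s, n) = s / fromIntegral n

-- Render finalized data as CSV lines, ordered by key.
toCsv :: M.Map Key Double -> [String]
toCsv m = [k ++ "," ++ show v | (k, v) <- M.toList m]

sampleInput :: [String]
sampleInput = ["oslo,10", "lima,20", "oslo,14", "bad line"]

main :: IO ()
main = mapM_ putStrLn (toCsv (M.map finalizer (runReduce (concatMap mapper sampleInput))))
-- prints:
-- lima,20.0
-- oslo,12.0
```

Keeping the mean out of the reducer is the point of the split: sums and counts can be merged repeatedly and in any order by several reducers, while the division happens exactly once in the finalizer.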

ozataman/simple-mapreduce
