CollectInject - a tiny MapReduce implementation in Ruby
Do not use this in production. This project is purely for recreational and educational purposes. It was not written with performance, stability, or general usefulness in mind.
Here's a shot at implementing a simple MapReduce library in Ruby. I did it for practice, and thought it could be useful for introducing people to the whole concept of distributed processing.
CollectInject is not really distributed; it simulates distribution using threads, so each worker is started in a separate thread.
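To make the thread-based simulation concrete, here is a minimal sketch of the general idea using only Ruby's standard `Thread` class. It is not CollectInject's actual code: the method name `threaded_word_count` and its parameters are made up for illustration.

```ruby
# Illustrative only: a hypothetical helper, not part of CollectInject.
# Counts words by giving each "worker" thread its own slice of the data.
def threaded_word_count(words, worker_count)
  return {} if words.empty?

  # Split the dataset into roughly one slice per worker.
  slice_size = (words.size / worker_count.to_f).ceil
  slices = words.each_slice(slice_size).to_a

  # Map phase: each thread builds a partial count for its own slice.
  threads = slices.map do |slice|
    Thread.new(slice) do |chunk|
      chunk.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
    end
  end

  # Reduce phase: Thread#value waits for the thread and returns its result;
  # the partial counts are merged by summing the values for each word.
  threads.map(&:value).inject({}) do |total, partial|
    total.merge(partial) { |_word, a, b| a + b }
  end
end

counts = threaded_word_count(%w[apple banana apple cherry banana apple], 2)
puts counts.inspect
```

Each thread works only on its own slice, so no locking is needed until the partial results are merged in the calling thread.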
Also, CollectInject works with data in memory: there is no persistence, and it is not intended for large datasets. It offers a simple class with `map` and `reduce` methods, which you should implement in your own worker class. There's also a `Manager` class for actually running CollectInject tasks.
- create your own class by inheriting from the provided class and implementing `map` and `reduce` (take a look at the examples)
- create a `Manager` instance with your class and the number of workers
- run the manager with your dataset
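
The steps above can be sketched roughly as follows. The classes `SketchWorker` and `SketchManager` are hypothetical stand-ins written for this example so that it runs on its own; CollectInject's real class names, constructor arguments, and method signatures may differ, so check the examples directory for the actual API.

```ruby
# Hypothetical stand-in for the worker base class (not CollectInject's own).
class SketchWorker
  # Turn one slice of the dataset into a partial result.
  def map(_slice)
    raise NotImplementedError, "implement map in your worker class"
  end

  # Combine two partial results into one.
  def reduce(_left, _right)
    raise NotImplementedError, "implement reduce in your worker class"
  end
end

# Hypothetical stand-in for the manager: one thread per slice of data.
class SketchManager
  def initialize(worker_class, worker_count)
    @worker_class = worker_class
    @worker_count = worker_count
  end

  def run(dataset)
    return nil if dataset.empty?

    slice_size = (dataset.size / @worker_count.to_f).ceil
    threads = dataset.each_slice(slice_size).map do |slice|
      Thread.new(slice) { |chunk| @worker_class.new.map(chunk) }
    end
    partials = threads.map(&:value)
    partials.inject { |left, right| @worker_class.new.reduce(left, right) }
  end
end

# Step 1: inherit from the base class and implement map and reduce.
class SumOfSquares < SketchWorker
  def map(slice)
    slice.sum { |n| n * n }
  end

  def reduce(left, right)
    left + right
  end
end

# Step 2: create a manager with your class and the number of workers.
manager = SketchManager.new(SumOfSquares, 3)

# Step 3: run the manager with your dataset.
puts manager.run((1..10).to_a)
```

In this sketch the manager creates a fresh worker for every slice, which keeps workers free of shared state; a real implementation could just as well reuse a fixed pool of workers.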
Generally, check out the `examples` directory; it should be fairly simple.
You are encouraged to fork and play with CollectInject, fix and improve the code, add your own examples, send me pull requests, questions and ideas etc. Have fun coding as much as I am! :)