Map Reduce implemented in Lua
lua-mapreduce is a fast and easy MapReduce implementation for Lua, inspired by other MapReduce implementations and particularly by octopy in Python.
It doesn't aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop whose iterations are independent of each other, there's a good chance that you can make it distributed with just a few small changes.
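As a rough illustration of the for-loop claim, the sketch below takes a plain loop and splits it into independent tasks, a map step, and a reduce step. It runs in a single process and does not use the lua-mapreduce server or client; the names `split_tasks`, `mapfn`, `reducefn`, and `run` are hypothetical and chosen only for this example.

```lua
-- Sequential version: one for-loop does all the work.
local function sum_of_squares(n)
  local total = 0
  for i = 1, n do
    total = total + i * i
  end
  return total
end

-- Map/reduce version of the same loop.
-- All names below are hypothetical, not part of lua-mapreduce.

-- Cut the range 1..n into independent chunks; each chunk is one task.
local function split_tasks(n, chunk_size)
  local tasks = {}
  for first = 1, n, chunk_size do
    tasks[#tasks + 1] = {
      first = first,
      last = math.min(first + chunk_size - 1, n),
    }
  end
  return tasks
end

-- Map: the body of the original loop, applied to a single chunk.
local function mapfn(task)
  local partial = 0
  for i = task.first, task.last do
    partial = partial + i * i
  end
  return partial
end

-- Reduce: combine the partial results into the final answer.
local function reducefn(partials)
  local total = 0
  for _, p in ipairs(partials) do
    total = total + p
  end
  return total
end

-- Local driver: in a distributed setting each mapfn call could run
-- on a different client, because the chunks share no state.
local function run(n, chunk_size)
  local partials = {}
  for _, task in ipairs(split_tasks(n, chunk_size)) do
    partials[#partials + 1] = mapfn(task)
  end
  return reducefn(partials)
end

print(run(100, 30), sum_of_squares(100))
```

The only real change to the original loop is that its body moved into `mapfn` and the accumulation moved into `reducefn`; this works because no iteration depends on the result of another.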
It uses the following Lua modules:
- luasocket: tcp client-server connectivity
- copas: Coroutine Oriented Portable Asynchronous Services for Lua
- serialize (included in this project): table serialization
- lanes: multithreading library for Lua
- luafilesystem: Used only in the task-file example to list files from the directory. lua-mapreduce client/server doesn't depend on this module
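To show why a serialization module appears in the list above, here is a minimal sketch of turning a Lua table into a string that can be sent over a TCP connection and rebuilt on the other side. It is not the code in utils/serialize.lua; `serialize_value` and `deserialize` are hypothetical names, and the sketch assumes tables without cycles that contain only numbers, booleans, strings, and nested tables.

```lua
-- Hypothetical helpers for illustration; not the API of utils/serialize.lua.

-- Convert a value into Lua source text that evaluates back to it.
-- Assumes no cyclic tables and no functions, userdata, or threads.
local function serialize_value(v)
  local t = type(v)
  if t == "number" or t == "boolean" then
    return tostring(v)
  elseif t == "string" then
    -- %q quotes the string so the Lua parser can read it back safely.
    return string.format("%q", v)
  elseif t == "table" then
    local parts = {}
    for k, val in pairs(v) do
      parts[#parts + 1] = "[" .. serialize_value(k) .. "]=" .. serialize_value(val)
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
  error("cannot serialize a value of type " .. t)
end

-- Rebuild the value by compiling and running the generated source.
-- This executes the received text, so it is only suitable between trusted peers.
local function deserialize(s)
  local loader = loadstring or load -- loadstring in Lua 5.1, load in 5.2+
  local chunk, err = loader("return " .. s)
  if not chunk then
    error(err)
  end
  return chunk()
end

local wire = serialize_value({ word = "lua", count = 3, nested = { 1, 2, "x" } })
local copy = deserialize(wire)
print(copy.word, copy.count, copy.nested[3])
```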
For Windows, you can install luaforwindows, which includes the modules listed above.
For Linux/Unix/macOS and Windows, you can use LuaDist.
Under a Debian GNU/Linux system you can install the dependencies with: apt-get install lua-logging lua-copas lua-socket lua-filesystem
lanes is not yet packaged for Debian; you can apt-get install luarocks and then run luarocks install lanes (as root or with sudo).
lua-mapreduce-server.lua : The map-reduce server. It accepts connections from clients, sends them the task-file, and then sends them tasks to perform the map/reduce functionality.
lua-mapreduce-client.lua : Connects to the server, receives tasks, and executes the map/reduce functions defined in the task-file.
utils/utils.lua : Provides utility functionality
utils/serialize.lua : Provides table serialization functionality
example/word-count-taskfile.lua : Example task-file for counting words from all .txt files in a given directory (it uses the current directory if none is specified). More details on how to create a task-file are given on the word-count example page of the wiki.
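The idea behind a word-count task can be sketched as follows. This is not the contents of example/word-count-taskfile.lua and does not follow the real task-file interface, which is documented on the wiki; `taskfn`, `mapfn`, `reducefn`, and `run` are hypothetical names, and in-memory strings stand in for the .txt files so that the example needs no filesystem access.

```lua
-- Hypothetical task shape for illustration; see the wiki for the real interface.

-- Stand-in for the .txt files that the real example reads from a directory.
local documents = {
  ["a.txt"] = "the quick brown fox",
  ["b.txt"] = "The lazy dog and the fox",
}

-- Produce the tasks: one (name, text) pair per document.
local function taskfn()
  return documents
end

-- Map: emit a (word, 1) pair for every word in one document.
local function mapfn(name, text)
  local emitted = {}
  for word in text:gmatch("%a+") do
    emitted[#emitted + 1] = { word:lower(), 1 }
  end
  return emitted
end

-- Reduce: sum all the values emitted for one word.
local function reducefn(word, values)
  local total = 0
  for _, v in ipairs(values) do
    total = total + v
  end
  return total
end

-- Local driver standing in for the server and its clients:
-- run every map task, group the emitted values by key, then reduce each key.
local function run()
  local grouped = {}
  for name, text in pairs(taskfn()) do
    for _, kv in ipairs(mapfn(name, text)) do
      local key, value = kv[1], kv[2]
      grouped[key] = grouped[key] or {}
      table.insert(grouped[key], value)
    end
  end
  local result = {}
  for word, values in pairs(grouped) do
    result[word] = reducefn(word, values)
  end
  return result
end

local counts = run()
print(counts.the, counts.fox, counts.dog)
```

Grouping by key between the two phases is what lets each reduce call work on a single word independently of all the others.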
Start Server: lua-mapreduce-server.lua -t task-file.lua [-s server-ip -p port -l loglevel -a command-line argument to task]
Start Client: lua-mapreduce-client.lua [-s server-ip -p port -l loglevel]
- Add support for handling failed tasks. Currently, if a client disconnects, the task being handled by that client is lost
- Support multiple client connections based on the number of cores available on the computer. Use copas for async
- Ability to send multiple task-files to the server.
- Add more examples of task-files
- Add support for filter after reduce is performed
- Possibly integrate with apache-mesos