run a mapreduce on a filesystem hierarchy
jabrcx/fsmr
fsmr is for running a mapreduce algorithm where the inputs are every entity in a filesystem hierarchy. It is a combination of two excellent pieces of software -- libcircle/libdftw[1-3] (for the map) and libmrmpi[4] (for the reduce). It is a distributed algorithm using MPI and intended to run on clusters with storage that scales well under such applications.

How do dftw and mrmpi complement each other in fsmr?

* dftw has no built-in way to combine and process output; mrmpi provides the data structures and reduce algorithm for doing so
* mrmpi's built-in file tree walking is a serial step performed before the map and is not designed to scale to processing hundreds of millions of files like dftw is
* dftw's map more easily accommodates callbacks on all filesystem entities, including directories -- it has the same interface as ftw(3)

To build libfsmr.so, install libdftw and mrmpi and run `make' here.

See examples/fsmr.du_by_owner/example.c for a simple example of how to write a map() and reduce() and call fsmr; export TEST_ARGS=/some/path and run `make test' in that directory.

[1] http://dl.acm.org/citation.cfm?id=2389114
[2] https://github.com/hpc/libcircle
[3] https://github.com/hpc/libdftw
[4] http://mapreduce.sandia.gov/
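To give a feel for the callback shape mentioned above (dftw has the same interface as ftw(3)), here is a minimal serial sketch of a "disk usage by owner" walk. It does not use the fsmr, libdftw or mrmpi APIs. It uses only the standard POSIX nftw(3) call, and the names add_bytes, visit, du_by_owner, totals and MAX_OWNERS are hypothetical, made up for this illustration. The in-memory table stands in for the reduce step that mrmpi would perform across MPI ranks, and sizes are apparent sizes (st_size), not allocated blocks. The real example is examples/fsmr.du_by_owner/example.c.

```c
#define _XOPEN_SOURCE 700   /* needed for nftw() on glibc */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_OWNERS 1024     /* hypothetical fixed capacity, for brevity */

struct owner_total {
    uid_t uid;
    long long bytes;
};

static struct owner_total totals[MAX_OWNERS];
static size_t n_totals = 0;

/* "Reduce" stand-in: fold one (uid, bytes) pair into the per-owner total. */
static void add_bytes(uid_t uid, long long bytes)
{
    for (size_t i = 0; i < n_totals; i++) {
        if (totals[i].uid == uid) {
            totals[i].bytes += bytes;
            return;
        }
    }
    assert(n_totals < MAX_OWNERS);
    totals[n_totals].uid = uid;
    totals[n_totals].bytes = bytes;
    n_totals++;
}

/* "Map" stand-in: an nftw(3) callback, called once per filesystem entity,
 * directories included. Emits (owner uid, apparent size). */
static int visit(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if (typeflag == FTW_NS)   /* stat failed, sb is not usable */
        return 0;
    add_bytes(sb->st_uid, (long long)sb->st_size);
    return 0;                 /* 0 tells nftw to keep walking */
}

/* Walk the tree under root serially; returns nftw's result (0 on success). */
static int du_by_owner(const char *root)
{
    n_totals = 0;
    return nftw(root, visit, 64, FTW_PHYS);   /* FTW_PHYS: do not follow symlinks */
}
```

After calling du_by_owner("/some/path"), a caller would loop over totals[0..n_totals) and print each uid with its byte count. In fsmr the walk is distributed by dftw and the per-owner combining is done by mrmpi, so this sketch only mirrors the structure of the computation, not its scaling behaviour.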