jabrcx/fsmr

fsmr runs a mapreduce algorithm whose inputs are every entity in a 
filesystem hierarchy.  It combines two excellent pieces of software: 
libcircle/libdftw[1-3] (for the map) and libmrmpi[4] (for the reduce).  It 
is a distributed algorithm that uses MPI and is intended to run on clusters 
whose storage scales well under such workloads.

How do dftw and mrmpi complement each other in fsmr?

    * dftw has no built-in way to combine and process output; mrmpi provides 
      the data structures and reduce algorithm for doing so

    * mrmpi's built-in file tree walking is a serial step performed before the 
      map and, unlike dftw, is not designed to scale to hundreds of millions 
      of files

    * dftw's map more easily accommodates callbacks on all filesystem entities, 
      including directories -- it has the same interface as ftw(3)
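
To make the division of labor concrete, here is a minimal serial sketch of a
"disk usage by owner" computation.  It is not the fsmr API: plain ftw(3)
stands in for the distributed walk, and a sort-and-sum pass stands in for the
reduce that mrmpi would perform.  The names kv, emit, map_cb, reduce_by_owner
and du_by_owner are hypothetical and exist only for this illustration.  Only
the callback signature is taken from ftw(3).

```c
/* Serial sketch only: ftw(3) stands in for the distributed walk and a
 * sort-and-sum pass stands in for the reduce.  This is NOT the fsmr API. */
#define _XOPEN_SOURCE 700  /* expose ftw(3) and friends under strict -std= */
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One key/value pair emitted by the map step (hypothetical type). */
struct kv {
    uid_t owner;               /* key: owning user id */
    unsigned long long bytes;  /* value: apparent size of one entity */
};

static struct kv *pairs = NULL;  /* pairs emitted so far */
static size_t npairs = 0, cap = 0;

/* Append one pair; returns 0 on success, -1 if out of memory. */
static int emit(uid_t owner, unsigned long long bytes)
{
    if (npairs == cap) {
        size_t newcap = cap ? cap * 2 : 1024;
        struct kv *p = realloc(pairs, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        pairs = p;
        cap = newcap;
    }
    pairs[npairs].owner = owner;
    pairs[npairs].bytes = bytes;
    npairs++;
    return 0;
}

/* "Map": an ftw(3)-style callback, invoked once per filesystem entity,
 * directories included.  A nonzero return stops the walk. */
static int map_cb(const char *fpath, const struct stat *sb, int typeflag)
{
    (void)fpath;
    if (typeflag == FTW_NS)  /* stat() failed, so *sb is not usable */
        return 0;
    return emit(sb->st_uid, (unsigned long long)sb->st_size);
}

/* Order pairs by key so that equal owners become adjacent. */
static int by_owner(const void *a, const void *b)
{
    uid_t x = ((const struct kv *)a)->owner;
    uid_t y = ((const struct kv *)b)->owner;
    return (x > y) - (x < y);
}

/* "Reduce": sort by key, then collapse each run of equal keys into one
 * total, in place.  Returns the number of distinct owners. */
static size_t reduce_by_owner(struct kv *kv, size_t n)
{
    size_t out = 0;
    if (n == 0)
        return 0;
    qsort(kv, n, sizeof *kv, by_owner);
    for (size_t i = 1; i < n; i++) {
        if (kv[i].owner == kv[out].owner)
            kv[out].bytes += kv[i].bytes;
        else
            kv[++out] = kv[i];
    }
    return out + 1;
}

/* Walk `root`, then print one "uid bytes" line per owner.
 * Returns 0 on success, -1 if the walk failed or was stopped. */
int du_by_owner(const char *root)
{
    if (ftw(root, map_cb, 16) != 0)  /* 16: max directories held open */
        return -1;
    size_t nowners = reduce_by_owner(pairs, npairs);
    for (size_t i = 0; i < nowners; i++)
        printf("%lu %llu\n", (unsigned long)pairs[i].owner, pairs[i].bytes);
    return 0;
}
```

Calling du_by_owner("/some/path") from a main() walks the tree on a single
process and prints the totals.  The sketch sums st_size, so it reports
apparent size rather than allocated blocks.  In fsmr the walk and the
reduction are distributed over MPI ranks instead of running in one loop.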

To build libfsmr.so, install libdftw and mrmpi, then run `make' here.  See 
examples/fsmr.du_by_owner/example.c for a simple example of how to write a 
map() and reduce() and call fsmr.  To run it, export TEST_ARGS=/some/path 
and run `make test' in that directory.

[1] http://dl.acm.org/citation.cfm?id=2389114
[2] https://github.com/hpc/libcircle
[3] https://github.com/hpc/libdftw
[4] http://mapreduce.sandia.gov/
