Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
src
 
 
 
 
 
 
 
 
 
 
 
 

elements-data

June 14, 2014

Project applying big data processing to page correspondence data for versions of Diderot's Éléments de Physiologie.

The project is incomplete, but represents a draft of processing methods under the map reduce paradigm for the data compiled by Jean Mayer in his "La composition fragmentaire des Éléments de Physiologie", Studies on Voltaire and the 18th Century, (1988).

Under the map-reduce paradigm, the reducers in this code do two things : create a minimal part-of-grammar / POS for a page correspondence data set, and mark up the digital text of the versions in question with XML or HTML range tags, based upon the page correspondence data set. They both utilize the same mapper, which performs the initial distillation of the dataset from a file in HFDS.

This code is being checked in as is (with minimal cleanup) as a record of its initial Hadoop 1.x.x implementation, and short of full integration testing of the mark up functionality. I envision that its map-reduce code will be converted to run upon YARN and be one of many processing techniques in the project, and having initiated "elements-data" as a public project, its development will visit, then, full integration testing of the markup functionality.

Scan of Mayer's article provided under fair use. All rights reserved, Studies on Voltaire and the 18th Century.

Gregory Bringman

About

Project applying big data processing to page correspondence data for versions of Diderot's Éléments de Physiologie

Resources

License

Releases

No releases published

Packages

No packages published

Languages