Project applying big data processing to page correspondence data for versions of Diderot's Éléments de Physiologie
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src
.classpath
.gitignore
LICENSE
README.md
mayerFragmentarydesElements.pdf
pom.xml

README.md

elements-data

June 14, 2014

Project applying big data processing to page correspondence data for versions of Diderot's Éléments de Physiologie.

The project is incomplete, but represents a draft of processing methods under the map reduce paradigm for the data compiled by Jean Mayer in his "La composition fragmentaire des Éléments de Physiologie", Studies on Voltaire and the 18th Century, (1988).

Under the map-reduce paradigm, the reducers in this code do two things : create a minimal part-of-grammar / POS for a page correspondence data set, and mark up the digital text of the versions in question with XML or HTML range tags, based upon the page correspondence data set. They both utilize the same mapper, which performs the initial distillation of the dataset from a file in HFDS.

This code is being checked in as is (with minimal cleanup) as a record of its initial Hadoop 1.x.x implementation, and short of full integration testing of the mark up functionality. I envision that its map-reduce code will be converted to run upon YARN and be one of many processing techniques in the project, and having initiated "elements-data" as a public project, its development will visit, then, full integration testing of the markup functionality.

Scan of Mayer's article provided under fair use. All rights reserved, Studies on Voltaire and the 18th Century.

Gregory Bringman