Junius Henderson was the first curator of the University of Colorado Museum of Natural History. Between 1905 and 1931, he kept 13 notebooks (1,672 pages in total) detailing his travels across the Southern Rocky Mountains of North America and elsewhere. These notebooks were scanned by the National Snow and Ice Data Center (NSIDC).
WWW::Wikisourcemodule (from the
wikisource2xml.pl 'Index:Name of Index on Wikisource.djvu' > download.xmlto download an XML version of the Wikisource document identified by the provided Index.
wikisource2xml.plshould have been installed to your path
perl concat.pl download.xml > download_concat.txt; this will create a "concat" file which combines multiple pages so that entries are divided by
perl results.pl download.xmlto calculate the per-page statistics for annotations on this page. Remember to use the
--skipcommand line option to skip front matter.
perl results_concat.pl < download_concat.txtwill generate per-entry statistics for annotations. Remember to use the
--skipcommand line option to skip entries which cover front matter.
perl concat2stuff.pl dwc < download_concat.txt > download_dwc.csvto write out a CSV file using DarwinCore headers.
You can use
list_concat.plto generate a list of all annotations detected in XML and "concat" files respectively.
For more details, please read the following blog posts:
Field Note Challenge Part 2: Veni, Vidi, Wiki
Field Notes Challenge Part 3: New Year's Digital Resolutions
Field Notes Challenge Part 4: Help, 'Cause We Need Somebod(y/ies)
Field Notes Challenge Part 4.5: JHFNP, Post 4.5