
The Junius Henderson Field Note Project

Who was Junius Henderson?

Junius Henderson was the first curator of the University of Colorado Museum of Natural History. Between 1905 and 1931, he kept 13 notebooks (1,672 pages in total) detailing his travels across the Southern Rocky Mountains of North America and elsewhere. These notebooks were scanned by the National Snow and Ice Data Center (NSIDC).

You can read more about him on Wikipedia; we have uploaded all his notebooks (and some of his photographs) to the Wikimedia Commons.


  1. Install the WWW::Wikisource module (from the WWW-Wikisource directory).

  2. Run 'Index:Name of Index on Wikisource.djvu' > download.xml to download an XML version of the Wikisource document identified by the given Index page. The script should already be installed on your path.

  3. In the scripts directory:

    1. Run perl download.xml > download_concat.txt. This creates a "concat" file that combines the individual pages so that entries are separated by {{new-entry}} tags.

    2. Run perl download.xml to calculate per-page annotation statistics for the document. Use the --skip command-line option to skip front matter.

    3. Similarly, perl < download_concat.txt generates per-entry annotation statistics. Use the --skip command-line option to skip entries that cover front matter.

    4. Finally, run perl dwc < download_concat.txt > download_dwc.csv to write out a CSV file using DarwinCore headers.

    5. You can use and to generate a list of all annotations detected in XML and "concat" files respectively.
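The splitting and per-entry counting described in steps 3.1 to 3.3 can be sketched in Python. This is a minimal illustration, not the repository's Perl scripts. It assumes that an "annotation" is any Wikisource template of the form {{name|...}} and that --skip means "ignore the first N entries". Both of these are assumptions. The function names `split_entries` and `per_entry_stats` are hypothetical.

```python
import re
from collections import Counter

# Separator the README says divides entries in the "concat" file.
ENTRY_MARKER = "{{new-entry}}"

# ASSUMPTION: an annotation is any template such as {{taxon|Pinus}}.
# The repository's Perl scripts may define annotations differently.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def split_entries(concat_text):
    """Split concat text on {{new-entry}} and drop empty fragments."""
    parts = concat_text.split(ENTRY_MARKER)
    return [p.strip() for p in parts if p.strip()]


def per_entry_stats(concat_text, skip=0):
    """Count annotation templates by name in each entry.

    `skip` drops the first N entries. It is a hypothetical stand-in for
    the scripts' --skip option, which is used to skip front matter.
    """
    stats = []
    for entry in split_entries(concat_text)[skip:]:
        stats.append(Counter(TEMPLATE_RE.findall(entry)))
    return stats


# Hypothetical sample: one front-matter entry followed by two notebook entries.
sample = (
    "Title page{{new-entry}}"
    "June 1905. Saw {{taxon|Pinus ponderosa}} near {{place|Boulder}}."
    "{{new-entry}}"
    "July 1905. {{taxon|Picea}} and {{taxon|Abies}}."
)

for number, counts in enumerate(per_entry_stats(sample, skip=1), start=1):
    print(number, dict(counts))
```

Splitting on the literal marker removes the {{new-entry}} tags before counting, so the separators are never counted as annotations.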

External links

For more details, please read the following blog posts:

