Vagrant-based development environment for HTRC
Extracts features (token counts, POS tags, etc.) from a list of HT volumes, to aid in non-consumptive research.
Scala client for retrieving volumes from the HTRC DataAPI.
Counts tokens and generates a tag cloud for a given list of HT volume IDs
Tools for performing named entity recognition on a set of HTRC volumes.
Sample showing how to use the running-header identification functionality provided by the Scala module
Utility library that can be used for performing header/body/footer identification over a set of pages from a volume.
Scala object models representing volumes, pages, IDs, and Pairtree items
Code that uses Spark to ingest the Extracted Feature JSON files (bundled as SequenceFiles) and stream the necessary files to a Solr cloud installation
Bash scripts to get packages and set environment variables suitable for running HTRC-Solr-EF-Ingester and HTRC-Solr-EF-Cloud
Front-end web interface for searching the Solr-ingested Extracted Feature (EF) dataset
Set of utility functions and routines that reduce the boilerplate needed to accomplish some common tasks in Scala.
Python SDK for Data API and Solr API access
Library that adds useful error handling and non-serializable object management capabilities to Apache Spark applications.
Secure environment for text analysis at scale of sensitive digitized content
The setup and configuration files necessary to spin up a cloud-based Solr installation that HTRC-Solr-EF-Ingester can stream its output to for indexing
Mirror of Apache Guacamole Server
Provides various options for managing HT IDs and the Pairtree structure.
Extracts full text from an HT volume stored in Pairtree by concatenating the pages in the correct order, with optional post-processing to remove hyphenation, empty lines, headers/footers, etc.
HTRC job submission and job management module.
Maintains a central repository of Dockerfiles for various software services required by HTRC
Virtuoso Docker image
Workset, file and job persistence API on top of WSO2 Governance Registry
Tool to back up and restore users, roles, files, and worksets from/to a WSO2 server instance.
Space for drafting and discussing a code of conduct for the HTRC UnCamp and other related public events.
HathiTrust Research Center
Various demos used in the Data Capsule and elsewhere