HathiTrust Research Center

Code and artifact repository for the HTRC platform

  • Vagrant-based development environment for HTRC

    HTML 1 Updated Jul 18, 2018
  • Extracts features (token counts, POS tags, etc.) from a list of HT volumes, to aid in non-consumptive research.

    Scala Updated Jul 6, 2018
  • Scala client for retrieving volumes from the HTRC DataAPI.

    Scala Updated Jul 5, 2018
  • Counts tokens and generates a tag cloud for a given list of HT volume IDs.

    Scala Updated Jul 5, 2018
  • Tools for performing named entity recognition on a set of HTRC volumes.

    Scala Updated Jul 5, 2018
  • Sample showing how to use the running-header identification functionality defined in the Scala module

    Java Updated Jul 5, 2018
  • Utility library that can be used for performing header/body/footer identification over a set of pages from a volume.

    Scala Updated Jul 5, 2018
  • Contains object models representing volumes, pages, ids and pairtree items in Scala

    Scala Updated Jul 5, 2018
  • Code that uses Spark to ingest the Extracted Feature JSON files (bundled as SequenceFiles) and stream the necessary files to a Solr cloud installation

    Java Updated Jul 3, 2018
  • Bash scripts to get packages and set environment variables suitable for running HTRC-Solr-EF-Ingester and HTRC-Solr-EF-Cloud

    Shell Updated Jul 2, 2018
  • 1 Updated Jun 26, 2018
  • Front-end web interface for searching the Solr-ingested Extracted Feature (EF) dataset

    JavaScript 2 GPL-3.0 Updated Jun 21, 2018
  • Set of utility functions and routines that reduce the boilerplate needed to accomplish some common tasks in Scala.

    Scala Updated Jun 19, 2018
  • Python SDK for Data API and Solr API access

    Python 4 3 Updated Jun 18, 2018
  • Library that adds useful error handling and non-serializable object management capabilities to Apache Spark applications.

    Scala Updated Jun 9, 2018
  • Secure environment for at-scale text analysis of sensitive digitized content

    Java 2 Updated Jun 7, 2018
  • The setup and configuration files necessary to spin up a cloud-based Solr installation that HTRC-Solr-EF-Ingester can stream its output to for indexing

    Shell Updated May 30, 2018
  • Java Updated May 21, 2018
  • Mirror of Apache Guacamole Server

    C 142 Apache-2.0 Updated May 9, 2018
  • Provides various options for managing HT IDs and the Pairtree structure.

    Scala Updated May 5, 2018
  • Extracts full text from a HT volume stored in Pairtree by concatenating the pages in the correct order, performing optional post-processing to remove hyphenation, empty lines, headers/footers, etc.

    Scala Updated May 5, 2018
  • HTRC job submission and job management module.

    Scala Updated May 4, 2018
  • Central repository of Dockerfiles for the various software services required by HTRC

    Java Updated Apr 27, 2018
  • Virtuoso Docker image

    28 MIT Updated Apr 26, 2018
  • Workset, file and job persistence API on top of WSO2 Governance Registry

    Java 1 Updated Apr 21, 2018
  • Tool to back up and restore users, roles, files, and worksets from/to a WSO2 server instance.

    Java Updated Apr 16, 2018
  • Space for drafting and discussing a code of conduct for the HTRC UnCamp and other related public events.

    Unlicense Updated Mar 14, 2018
  • HathiTrust Research Center

    HTML Updated Feb 14, 2018
  • Java Updated Dec 5, 2017
  • Various demos used in Data Capsule or other places

    R Updated Oct 5, 2017
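
Several of the tools listed above work with HT volume IDs and the Pairtree structure. As a rough illustration of what that mapping involves, the following is a minimal Python sketch, not the API of any repository listed here. The function names (`clean_id`, `split_htid`, `pairtree_path`) are hypothetical, and the sketch assumes the common layout of `<prefix>/pairtree_root/<two-character segments>/<cleaned id>`. It applies only the single-character substitutions from the Pairtree specification and omits the hex-encoding step for special characters.

```python
# Hypothetical helper names; this is an illustrative sketch, not HTRC code.

# Single-character substitutions defined by the Pairtree specification.
# (The spec's hex-encoding step for other special characters is omitted.)
_PAIRTREE_SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(raw: str) -> str:
    """Return the identifier with Pairtree single-character substitutions applied."""
    return raw.translate(_PAIRTREE_SUBSTITUTIONS)


def split_htid(htid: str) -> tuple[str, str]:
    """Split an HT volume ID into (library prefix, local identifier) at the first dot."""
    prefix, sep, local = htid.partition(".")
    if not sep or not prefix or not local:
        raise ValueError(f"not a valid HT volume ID: {htid!r}")
    return prefix, local


def pairtree_path(htid: str) -> str:
    """Build a relative Pairtree path for an HT volume ID (assumed layout)."""
    prefix, local = split_htid(htid)
    cleaned = clean_id(local)
    # Break the cleaned identifier into two-character directory segments.
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([prefix, "pairtree_root", *segments, cleaned])


if __name__ == "__main__":
    print(pairtree_path("uc2.ark:/13960/t4mk66f1d"))
```

Splitting at the first dot only matters because the local part of an identifier can itself contain dots, which the cleaning step then maps to commas.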