Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Build Status

Chronicling America OCR debatcher

This program takes paths to .tar.bz2 batches of OCR files from the Chronicling America bulk data downloads. It converts each batch into a CSV file, which you can load into a database or do whatever you like with. It will process the batches concurrently.

Usage:

./chronam-ocr-debatcher [--processes=8] <path/to/a/batch.tar.bz2 ...>

You can download binaries from the releases page.

About

Turn a batch of OCR files from Chronicling America into a CSV that can be imported into a database

Resources

License

Packages

No packages published