Parse OCR result files for page numbers, tables of contents, etc.
Python PHP
fonts
.gitignore
README
analyze_ocr.php
analyze_ocr.py
color.py
diff_match_patch.py
extract_sorted.py
find_header_footer.py
find_pagenos.py
font.py
iabook.py
interval.py
make_toc.py
rnums.py
tuples.py
visualize.py
windowed_iterator.py

README

Some code for analyzing OCR'd documents.  It's currently pretty
specific to Internet Archive OCR'd books, but it may be generalizable.

Entry point: analyze_ocr.py - run this against an Internet Archive scanned book.

Functionality: find headers/footers, page numbers, tables of contents.
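
As a rough illustration of the page-number step, here is a minimal sketch of one
possible approach. It is not the code in find_pagenos.py. It assumes the numeric
tokens on each scanned leaf have already been extracted, and that printed page
numbers mostly follow "leaf index + constant offset". The function names
guess_page_offset and assign_page_numbers are hypothetical.

```python
from collections import Counter


def guess_page_offset(candidates_by_leaf):
    """Vote for the offset such that printed_number == leaf_index + offset.

    candidates_by_leaf: list indexed by leaf (scan) number; each entry is an
    iterable of integers that OCR found on that leaf (assumed input format).
    Returns (offset, votes), or None if no leaf has any candidate number.
    """
    votes = Counter()
    for leaf, numbers in enumerate(candidates_by_leaf):
        # Count each distinct offset at most once per leaf.
        for offset in {n - leaf for n in numbers}:
            votes[offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


def assign_page_numbers(candidates_by_leaf):
    """Return a guessed printed page number per leaf (None before page 1)."""
    guess = guess_page_offset(candidates_by_leaf)
    if guess is None:
        return [None] * len(candidates_by_leaf)
    offset, _ = guess
    return [
        leaf + offset if leaf + offset >= 1 else None
        for leaf in range(len(candidates_by_leaf))
    ]


if __name__ == "__main__":
    # Stray numbers (a year, a figure number) are outvoted by the real sequence.
    leaves = [[], [1899], [1], [2, 14], [], [4], [1899, 5]]
    print(guess_page_offset(leaves))
    print(assign_page_numbers(leaves))
```

Voting on the offset lets a few misread or unrelated numbers be ignored, and
leaves with no detected number still get a guess from their neighbors. Real
books with roman-numeral front matter or unnumbered plates would need more than
a single offset.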