OCRD Project

Description

This is a demo project for "Optical Character Recognition Digitization" of full text pages. It is designed for use as a Hugging Face Gradio app.

The underlying processing pipeline includes:

Image binarization
Text line segmentation
Text line extraction, filtering, and deskewing
OCR on text lines
Printing recognized text on generated image for visualization
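
The way these stages chain together can be sketched in plain Python. This is only an illustration under loud assumptions: the project itself relies on eynollah and TrOCR models (see Acknowledgements), whereas the hypothetical functions below (binarize, segment_lines, extract_lines, recognize, run_pipeline) swap in a fixed threshold, a row-projection line finder, and a placeholder OCR call. Deskewing and the final rendering step are left out.

```python
from typing import List, Tuple

# Assumption: an image is a list of rows of grayscale values, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) or 0 (background) using a fixed threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Return (top, bottom) ranges of consecutive rows that contain ink."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the text line ended on the previous row
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(binary)))
    return lines


def extract_lines(binary: Image, lines: List[Tuple[int, int]], min_height: int = 2) -> List[Image]:
    """Crop each line and filter out crops shorter than min_height rows."""
    return [binary[top:bottom] for top, bottom in lines if bottom - top >= min_height]


def recognize(line_crop: Image) -> str:
    """Placeholder for the OCR model call; it only reports the crop size."""
    return f"<line {len(line_crop)}x{len(line_crop[0])}>"


def run_pipeline(image: Image) -> List[str]:
    """Binarize, segment, extract and filter lines, then run the OCR placeholder."""
    binary = binarize(image)
    crops = extract_lines(binary, segment_lines(binary))
    return [recognize(crop) for crop in crops]
```

Each stage takes the previous stage's output, so a real model can replace any stand-in without touching the rest of the chain.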

Please note:

The app is optimized for English; other languages (e.g., German) may require OCR model fine-tuning.
When running on CPUs, a pipeline run can take over 10 minutes depending on the input image.
If the wait is too long or the online app is down, see the pre-computed examples: https://github.com/pluniak/ocrd/tree/main/data/demo_data
The demo is only a first prototype. OCR performance and computation speed still need to be optimized.
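
Since a CPU run can take many minutes, it may help to measure where the time goes before optimizing anything. The following is a minimal sketch using only the standard library; the timed helper, the timings dictionary, and the stage names are hypothetical and not part of the repository, and the two workloads are stand-ins for real pipeline stages.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name (hypothetical helper state).
timings = {}


@contextmanager
def timed(stage: str):
    """Add the wall-clock time spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Stand-in workloads; in practice each block would wrap one pipeline stage.
with timed("binarization"):
    sum(range(100_000))
with timed("ocr"):
    time.sleep(0.01)

# Print the slowest stage first.
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>15}: {seconds:.3f} s")
```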

Usage:

Test the demo online at https://huggingface.co/spaces/pluniak/ocrd
or follow the steps below to install and run the app on your local computer.

Installation

Install Anaconda if you haven't done so yet: https://docs.anaconda.com/free/anaconda/install
Clone the repository, then set up and activate the virtual environment:

git clone https://github.com/pluniak/ocrd.git
cd ocrd
./create_conda_env_linux.sh # Linux
create_conda_env_windows.bat # Windows (using Conda terminal)
conda activate ocrd

Run app locally

After activating the virtual environment, you can run the app locally as a Web Server or inside a Jupyter Notebook.

Web Server

Run this script from the command line:

python ./src/app.py

Then click on the generated local URL (usually: http://127.0.0.1:7860).
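
Model loading can delay startup, so the URL may not respond immediately. A small standard-library check like the one below can confirm when the server is reachable. It is a sketch, not part of the repository: wait_for_app is a hypothetical helper, and the default URL is simply the usual address mentioned above.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://127.0.0.1:7860",
                 timeout: float = 60.0,
                 interval: float = 1.0) -> bool:
    """Poll url until the server answers or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even though it returned an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and try again.
            time.sleep(interval)
    return False
```

Calling wait_for_app() from a second terminal after starting the app returns True once the page is reachable, and False if nothing answers within the timeout.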

Jupyter Notebook

Open and run this notebook:

./notebooks/app.ipynb

OCRD Pipeline Example

Input and Output Image Generated from Recognized Text

For more examples visit: https://github.com/pluniak/ocrd/tree/main/data/demo_data

Screenshot of App User Interface

Acknowledgements and Attributions

This project makes use of significant components from the following open-source projects:

eynollah: An automated layout analysis tool for historical documents, developed as part of the QURATOR project. This project uses eynollah to preprocess document images. For more details on eynollah, visit their GitHub repository: qurator-spk/eynollah. The tool is used under the Apache License 2.0.
Microsoft TrOCR: I use Microsoft's TrOCR models for optical character recognition. For more information on TrOCR and its usage, please see Microsoft's TrOCR repository, which is available under the MIT License.

I appreciate the efforts of the developers and the community in providing these high-quality open-source resources.
