Skip to content
/ OTR Public
forked from ulikoehler/OTR

Optical table recognition - recognize tables in scan images using OpenCV

License

Notifications You must be signed in to change notification settings

fendaq/OTR

 
 

Repository files navigation

OTR

Optical table recognition - recognize tables in scan images using OpenCV.

OTR uses a raster-based method to recognize tables, even if they have e.g. dashed lines or the page is slightly skewed (such as when scanning a book). OTR can not be used for tables without a visible raster!

Install

Install OpenCV for Python3 e.g. sudo apt-get install python3-opencv in Ubuntu 18.04.

I recommend to install numpy & scipy from apt if you use a deb-based linux system to speed up the dependency install: sudo apt-get install python3-scipy python3-numpy

cv_algorithms is one of the dependencies. See there for some of the algorithms used in OTR in a reusable form.

sudo pip3 install -r requirements.txt

Run

Get a test image, e.g. google for images like "Old naval log table" and select one with a table. Can't share one here due to copyright but if you know a public domain one, please add it via a pull request.

python3 test-otr.py <image filename>

It's currently only a proof of concept. See Algorithm.pdf for details on how it works.

About

Optical table recognition - recognize tables in scan images using OpenCV

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 78.1%
  • Jupyter Notebook 21.9%