- Scrape PDFs from the web
- Extract coordinate and description information from the PDFs
- Convert PDFs to image objects (PNG)
- Build an OCR dataset in the style of a PyTorch Dataset
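
As a rough illustration of what "a dataset in the style of a PyTorch Dataset" can mean, here is a minimal sketch of a map-style dataset, i.e. an object that implements `__len__` and `__getitem__`. The names `OcrSample` and `OcrDataset`, the bounding-box layout, and the file names are assumptions made for this example and are not taken from ocra's code. The sketch avoids importing `torch` so that it runs on its own; in a real project the class would typically subclass `torch.utils.data.Dataset` and load the image inside `__getitem__`.

```python
from typing import List, NamedTuple, Tuple


class OcrSample(NamedTuple):
    """One annotated text region (hypothetical structure, not ocra's own)."""
    image_path: str                           # PNG rendered from a PDF
    bbox: Tuple[float, float, float, float]   # assumed layout: (left, top, width, height)
    text: str                                 # description extracted for that region


class OcrDataset:
    """Map-style dataset: indexable, with a known length."""

    def __init__(self, samples: List[OcrSample]):
        self.samples = list(samples)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        sample = self.samples[index]
        # Return an (input, target) pair; a real dataset would load
        # and crop the image here instead of returning its path.
        return (sample.image_path, sample.bbox), sample.text


# Made-up example data, for illustration only.
dataset = OcrDataset([
    OcrSample("page1.png", (10.0, 20.0, 100.0, 12.0), "Hello"),
    OcrSample("page1.png", (10.0, 40.0, 80.0, 12.0), "World"),
])
```

Indexing the dataset, for example `dataset[0]`, yields the image reference with its box as the input and the extracted text as the target.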
git clone https://github.com/mzntaka0/ocra.git
cd ocra
python setup.py install
- poppler-utils (provides `pdftohtml`)
- Python >= 3.6.2
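
The two requirements above can be checked before installing with a short script. This is a sketch, not part of ocra: the function name `check_requirements` and its parameters are hypothetical, and it only assumes that `pdftohtml` should be discoverable on `PATH`.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6, 2), tool="pdftohtml"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Compare (major, minor, micro) of the running interpreter.
    if tuple(sys.version_info[:3]) < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH (install poppler-utils)")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when both are satisfied.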