fabriziomiano/pdf2txt-azure-ocr

PDF2TXT using Azure cognitive OCR API

This script converts the PDF files in a given directory to TXT using the Microsoft Azure cognitive OCR API. It requires an active Azure subscription, because a subscription key is needed to call the API.
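The README does not say which endpoint or API version the script calls. As a rough sketch only, assuming the service returns the Computer Vision OCR JSON layout (a `regions` list containing `lines`, each containing `words` with a `text` field), the recognized words could be flattened into plain text as follows. The function name `ocr_response_to_text` is hypothetical and is not taken from this repository.

```python
def ocr_response_to_text(response: dict) -> str:
    """Flatten an OCR JSON response into plain text.

    Assumption: the response follows the regions -> lines -> words layout.
    Each OCR line becomes one line of output text.
    """
    text_lines = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            # Join the words of one OCR line with single spaces.
            words = [word["text"] for word in line.get("words", [])]
            text_lines.append(" ".join(words))
    return "\n".join(text_lines)


# Hand-written sample in the assumed layout (not real API output).
sample_response = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Second"}, {"text": "line"}]},
            ]
        }
    ]
}

print(ocr_response_to_text(sample_response))
```

Keeping the JSON-to-text step separate from the HTTP call means it can be checked without a subscription key.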

Installation

On Ubuntu, create a new Python 3 virtual environment and install the packages listed in requirements.txt.

Usage

Within the virtualenv, run: python main.py --dirpath /path/to/dir
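For readers curious how a `--dirpath` option like this is typically handled, here is a minimal sketch using the standard library's `argparse` and `pathlib`. It is not the repository's actual main.py: the helper names `parse_args`, `find_pdfs`, and `txt_path_for` are hypothetical, and writing each TXT file next to its PDF is an assumption.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; --dirpath is the only option in this sketch."""
    parser = argparse.ArgumentParser(
        description="Convert the PDF files in a directory to TXT."
    )
    parser.add_argument(
        "--dirpath",
        required=True,
        type=Path,
        help="directory containing the PDF files",
    )
    return parser.parse_args(argv)


def find_pdfs(dirpath):
    """Return the PDF files directly inside dirpath, sorted by path."""
    return sorted(
        p for p in Path(dirpath).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def txt_path_for(pdf_path):
    """Assumed output location: same directory and name, .txt extension."""
    return pdf_path.with_suffix(".txt")


def main(argv=None):
    """List each PDF with its planned TXT output path (no OCR call here)."""
    args = parse_args(argv)
    for pdf in find_pdfs(args.dirpath):
        print(f"{pdf} -> {txt_path_for(pdf)}")
```

Calling `main(["--dirpath", "/path/to/dir"])` would list each PDF alongside its planned output path. The OCR call itself is deliberately left out of this sketch.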