OCR stands for Optical Character Recognition. It is a technology used to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data. OCR has a wide range of applications and is used to automate data extraction and to improve the efficiency of data processing in numerous industries.
In this repository, we present a TrOCR model fine-tuned on the text-line dataset from the IAM handwriting database. The IAM database is publicly accessible and freely available. The dataset covers general handwritten documents, so you can use the fine-tuned model together with our implementation to convert handwritten documents into machine-readable format.
The purpose of this repository is to suggest one possible way of fine-tuning general OCR models.
The TrOCR directory contains several .py files and a configuration file. To run the model:
- Download the TrOCR directory.
- Install the dependencies listed in the requirements.txt file.
- If desired, change the training settings in the config.json file.
- Run 'train.py'. The model will be saved to a file called 'saved_model' in the directory to which you downloaded the TrOCR directory.
- Run 'predict.py', either from the terminal or by calling the 'predict' function, and enter the file path of the image you would like to convert to machine-readable format.
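The prediction step above can be sketched roughly as follows. This is not the code in 'predict.py'; it is a minimal illustration using the Hugging Face `transformers` TrOCR classes. The function name `transcribe_line` is hypothetical, and the sketch assumes that 'saved_model' is a directory written with `save_pretrained` and that the processor of the public `microsoft/trocr-base-handwritten` checkpoint matches the fine-tuned model. Adjust both if your setup differs.

```python
from pathlib import Path


def transcribe_line(image_path, model_dir="saved_model",
                    processor_name="microsoft/trocr-base-handwritten"):
    """Return the text recognized in one handwritten text-line image.

    Hypothetical helper: `model_dir` and `processor_name` are assumptions,
    not settings taken from this repository.
    """
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"No image found at {path}")

    # Heavy imports are deferred so that a bad path fails fast,
    # before any model weights are loaded.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumption: the fine-tuned weights were saved with save_pretrained().
    processor = TrOCRProcessor.from_pretrained(processor_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_dir)

    # TrOCR expects RGB input; the processor resizes and normalizes it.
    image = Image.open(path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate token ids and decode them back into a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `transcribe_line("path/to/line.png")` would then return the recognized text as a string. Like the IAM text-line data, the sketch works on one line of handwriting per image, so a full page would first need to be split into lines.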
Tzaji Minuchin
