Python, OpenCV, Tesseract
sudo apt-get install unzip tesseract-ocr
unzip Soroco-task.zip
cd Soroco-task
pip install -r requirements.txt
python3 text_recognize.py <image_path>
For example, I have used the image ocr_input.png, which is already present in the directory.
Input Image:
python3 text_recognize.py ocr_input.png
This will generate input_image_text.json and input_image_redacted.png in the current directory.
- input_image_text.json : JSON file containing the words and their bounding boxes.
- input_image_redacted.png : Redacted image with the text removed.
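
The script's internals are not shown here, so the following is only a minimal sketch of how word boxes could be extracted and redacted with pytesseract and OpenCV. The function names `words_with_boxes` and `redact`, the JSON field names, and the choice to paint boxes white (which assumes a light background) are all assumptions for illustration, not necessarily what text_recognize.py does.

```python
import json


def words_with_boxes(data):
    """Convert a pytesseract image_to_data dict into a list of word records.

    The record layout (text/left/top/width/height) is a hypothetical schema.
    """
    words = []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # Tesseract emits empty entries for blocks, lines, etc.
        words.append({
            "text": text,
            "left": int(data["left"][i]),
            "top": int(data["top"][i]),
            "width": int(data["width"][i]),
            "height": int(data["height"][i]),
        })
    return words


def redact(image_path, json_out, image_out):
    """Run OCR, save the word boxes as JSON, and paint over each word."""
    # Imported here so words_with_boxes can be used without OpenCV/Tesseract.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # OpenCV loads images as BGR; convert to RGB before handing to Tesseract.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    data = pytesseract.image_to_data(rgb, output_type=pytesseract.Output.DICT)
    words = words_with_boxes(data)

    for w in words:
        top_left = (w["left"], w["top"])
        bottom_right = (w["left"] + w["width"], w["top"] + w["height"])
        # thickness=-1 fills the rectangle; white assumes a light background.
        cv2.rectangle(image, top_left, bottom_right, (255, 255, 255), thickness=-1)

    with open(json_out, "w") as f:
        json.dump(words, f, indent=2)
    cv2.imwrite(image_out, image)
    return words
```

Under these assumptions, calling `redact("ocr_input.png", "input_image_text.json", "input_image_redacted.png")` would write both output files to the current directory.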
Redacted Image: