Download the entire folder.
Run the script from the command line:
python (or python3) path/to/folder/highlighterfinal.py -f path/to/image/or/pdf/file.jpg -l [words to find] -d -o path/to/output/directory/
-f  path to the input image or PDF file
-l  list of words (field names) to find, separated by spaces; can be left empty (e.g. -l date pin state marks these three fields)
-d  optional debug mode; if present, shows all intermediate steps
-o  output directory where the JSON file with coordinates is saved; if not given, it is saved in the highlighter code directory
Example:
python highnew/highlighterfinal.py -f /home/entrophy/Pictures/crm.jpeg -l date pin state -d
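For readers who want to see how these options fit together, here is a minimal sketch of a parser that accepts the same flags. It is a hypothetical reconstruction using Python's standard `argparse`, not the actual code of highlighterfinal.py; in particular, making `-f` required and defaulting `-l` to an empty list are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the documented options;
    # the real highlighterfinal.py may define them differently.
    parser = argparse.ArgumentParser(description="Locate form fields in an image or PDF.")
    parser.add_argument("-f", required=True, help="path to the input image or PDF file")
    # nargs="*" lets -l take zero or more space-separated field names
    parser.add_argument("-l", nargs="*", default=[], help="field names to find")
    # store_true: the flag's presence alone turns debug mode on
    parser.add_argument("-d", action="store_true", help="show all intermediate steps")
    # None means "use the default location" (the code directory, per this README)
    parser.add_argument("-o", default=None, help="output directory for the JSON file")
    return parser


# Parse the example command from above.
args = build_parser().parse_args(
    ["-f", "/home/entrophy/Pictures/crm.jpeg", "-l", "date", "pin", "state", "-d"]
)
print(args.f, args.l, args.d, args.o)
```

Because `-l` uses `nargs="*"`, it can be given with no words at all, which matches the note that the list can be left empty.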
To see the available field names, look in the crfform XML files included in the folder.
If a field name or file name contains spaces (e.g. 'customer name'), escape each space with a backslash on the terminal: customer\ name
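The reason the backslash matters is that the shell splits the command line on spaces before the script ever sees it. Python's standard `shlex.split`, which follows POSIX shell word-splitting rules, shows the difference (quoting the name is an equivalent alternative in a POSIX shell such as bash):

```python
import shlex

# Without escaping, "customer name" arrives as two separate words.
unescaped = shlex.split("-l customer name date")
# A backslash before the space keeps it inside a single argument.
escaped = shlex.split(r"-l customer\ name date")
# Quoting the whole name has the same effect.
quoted = shlex.split("-l 'customer name' date")

print(unescaped)  # ['-l', 'customer', 'name', 'date']
print(escaped)    # ['-l', 'customer name', 'date']
print(quoted)     # ['-l', 'customer name', 'date']
```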
Packages needed:
- Tesseract-OCR
- pytesseract
- OpenCV
- pdf2image
- PIL (Pillow)
- scikit-image (skimage)
If any of these packages is missing, the script will tell you when you run it.
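To check for all of the packages before running the script, a small stand-alone check can help. This is a minimal sketch and not part of the project; the helper name `missing_requirements` is hypothetical. It maps each Python import name to the pip package that provides it, and separately checks that the Tesseract-OCR program (installed through your system, not pip) is on your PATH.

```python
import importlib.util
import shutil

# Import name -> pip package name for the Python packages listed above.
REQUIRED_MODULES = {
    "pytesseract": "pytesseract",
    "cv2": "opencv-python",
    "pdf2image": "pdf2image",
    "PIL": "Pillow",
    "skimage": "scikit-image",
}


def missing_requirements():
    """Return a list of human-readable hints for anything that cannot be found."""
    # find_spec returns None for a top-level module that is not installed
    missing = [pip_name for module, pip_name in REQUIRED_MODULES.items()
               if importlib.util.find_spec(module) is None]
    hints = []
    if missing:
        hints.append("pip install " + " ".join(missing))
    # Tesseract-OCR is a separate program, so look for its executable instead
    if shutil.which("tesseract") is None:
        hints.append("Tesseract-OCR executable not found on PATH")
    return hints


if __name__ == "__main__":
    for hint in missing_requirements():
        print(hint)
```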
After execution, a JSON file containing the coordinates of the detected fields is created in the specified output folder. The output file has the same name as the input file, plus an index.
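To pick up the results in another Python program, one option is to collect every JSON file whose name starts with the input file's base name. This is a minimal sketch under stated assumptions: the outputs are assumed to be named like `<input name><index>.json` (e.g. `crm0.json`), the exact index format is not documented here, and the JSON structure is returned untouched rather than interpreted. `load_field_jsons` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_field_jsons(input_file, output_dir):
    """Load every JSON file in output_dir whose name starts with the input's base name.

    Assumption: outputs are named <input stem><index>.json (e.g. crm0.json).
    The JSON contents are returned as-is, keyed by file name.
    """
    stem = Path(input_file).stem  # "crm" for ".../crm.jpeg"
    results = {}
    for path in sorted(Path(output_dir).glob(f"{stem}*.json")):
        with open(path, encoding="utf-8") as fh:
            results[path.name] = json.load(fh)
    return results
```

Note that the prefix match is loose: a file such as `crm_old.json` in the same folder would also be picked up.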
You may also need to change the Tesseract path in the code to match its location on your computer. At present it is set to /usr/bin/tesseract on line 249 of the code.
To find where Tesseract is installed, run 'which tesseract' or 'locate tesseract' in a terminal.
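Instead of editing the hard-coded path by hand, the same lookup that 'which tesseract' performs can be done from Python with `shutil.which`. This is a minimal sketch, assuming the script points pytesseract at the binary through `pytesseract.pytesseract.tesseract_cmd` (the usual pytesseract setting; this has not been confirmed against line 249). `find_tesseract` and `configure_pytesseract` are hypothetical helper names.

```python
import shutil

DEFAULT_TESSERACT = "/usr/bin/tesseract"  # the path currently hard-coded in the script


def find_tesseract(default=DEFAULT_TESSERACT):
    """Return the tesseract executable found on PATH, falling back to `default`."""
    return shutil.which("tesseract") or default


def configure_pytesseract():
    """Point pytesseract at the located binary; call this before running any OCR."""
    # Imported here so find_tesseract() works even without pytesseract installed.
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = find_tesseract()
    return pytesseract.pytesseract.tesseract_cmd
```

Falling back to the old default keeps the current behaviour on systems where Tesseract is installed at /usr/bin/tesseract but is not on PATH.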