Running results in OCR-D #21
OCR-D bindings are next on the agenda (well, first resolving some issues with tensorflow, pixel density and reading order, and THEN OCR-D bindings), so I hope to implement this next week. Until then, you need to manually align your paths with what OCR-D expects. The code I posted was just illustrative, so let me fill in the blanks with respect to path handling. For simplicity's sake, let's just use one image file:
# Create the workspace
ocrd workspace -d ws1 init
# chdir to the workspace
cd ws1
# create folders for the image and the segmentation
mkdir IMG SEG
# copy the image file from wherever it is stored to IMG (not sure about the path syntax in Windows)
cp C:\Path\to\page1.jpg IMG
# Run eynollah and use a relative filename to refer to the image
eynollah -i IMG/page1.jpg -o SEG
# Now register the new files with the workspace (i.e. add them to mets.xml)
ocrd workspace add -G IMG -i IMG_1 -g page1 IMG/page1.jpg
ocrd workspace add -G SEG -i SEG_1 -g page1 SEG/page1.xml
# Now you can run ocrd-tesserocr-recognize on the results. Specify the right input group SEG
ocrd-tesserocr-recognize -P segmentation_level none -P textequiv_level line -I SEG -O OCR
# Now the output PAGE-XML with text recognition will be in the OCR subdirectory
But it would really be best to wait for proper OCR-D bindings, because doing it like this is a lot of effort, error-prone and brittle.
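The single-image steps above could be repeated for several pages with a small loop. What follows is only a dry-run sketch under stated assumptions, not part of the original instructions: it prints the commands rather than running them, `page_commands` and `MODELS` are hypothetical names, the file IDs (`IMG_page1`, `SEG_page1`) are an arbitrary naming choice derived from the file name, and the `-m` models directory is taken from a later reply in this thread. It also assumes file names without spaces.

```shell
#!/bin/sh
# Dry-run sketch: print the per-page commands instead of executing them.
# MODELS is a hypothetical placeholder for the eynollah models directory.
MODELS=${MODELS:-/path/to/models}

# Print the commands for one image stored under IMG/ (e.g. IMG/page1.jpg).
page_commands() {
    img=$1
    base=$(basename "$img")   # e.g. page1.jpg
    page=${base%.*}           # e.g. page1
    echo "eynollah -i IMG/$base -o SEG -m $MODELS"
    echo "ocrd workspace add -G IMG -i IMG_$page -g $page IMG/$base"
    echo "ocrd workspace add -G SEG -i SEG_$page -g $page SEG/$page.xml"
}

# Run from inside the workspace directory (ws1 in the example above).
for img in IMG/*.jpg; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    page_commands "$img"
done
```

After checking the printed commands, the output can be piped to `sh` to actually run them. All paths stay relative to the workspace directory, as in the single-image example.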
Thank you for both the instructions and the advice!
Hi, @kba. I imagine you have plenty going on. I just wanted to make a friendly check-in and see if perhaps the OCR-D bindings are in place.
Not yet. I was flummoxed by an issue with the reading order in my refactoring; that was solved yesterday and I am now back on the bindings. I'll update this issue once there is something testable. Thanks for checking in.
I decided to go ahead and take a stab at your illustration above, with the path "blanks filled in" for my setup. I got this:
Do you happen to understand what is going wrong here?
With eynollah, you need to provide the models directory via the -m option.
Got it. Enthused by what I'm seeing with my first results! Thank you.
Happy to hear that :)
I think this issue can be closed, now that OCR-D support is in.
Indeed!
Wonderful! Thank you!
(How do I update OCR-D in order to use eynollah there now? Is it part of ocrd_all?)
It is part of the just-released v2021-04-25 of ocrd_all. Docker images are still building but should be deployed in a few hours. If you want to update your standalone eynollah installation:
Wonderful. Thank you. I decided I would get rid of my standalone instance and just use OCR-D. In my ocrd environment (in Ubuntu 18.04), I ran:
(I don't know if this was overkill or not. But it didn't take long.) Then once I tested by running
Thanks again for everyone's assistance.
Hello again. :)
In this closed issue, @kba kindly recommended the following workflow to use eynollah results in an OCR-D workflow:
I'm having some challenges implementing this. It may just have to do with folders and paths, or maybe some "blanks" I failed to fill in...
Everything goes smoothly until the last line. (I believe it wants an input parameter?) The output is:
If I try adding -I SEG, the output includes the following:
FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg
and
FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: /mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg
and
Exception: Already tried prepending baseurl '/mnt/c/users/scott/desktop/python2/k/eyn_test2'. Cannot retrieve '/mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg'
If I try adding -I SEG_1, the output is:
Any suggestions welcome and appreciated!
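The FileNotFoundError messages above suggest that the image was registered under a Windows-style absolute path (C:/Users/...), which cannot be resolved from a workspace located under /mnt/c/.... This is an inference from the error text, not a confirmed diagnosis. The earlier instructions use paths relative to the workspace (such as IMG/page1.jpg), which avoids the problem. If an absolute path is needed anyway, a minimal sketch of converting it to the /mnt/<drive>/ form might look like this; `to_wsl_path` is a hypothetical helper, and it assumes the default WSL drive mounts under /mnt:

```shell
# Convert a Windows path such as C:/Users/... or C:\Users\... to the
# /mnt/<drive>/... form. to_wsl_path is a hypothetical helper name.
to_wsl_path() {
    p=$(printf '%s' "$1" | tr '\\' '/')   # backslashes -> forward slashes
    case $p in
        [A-Za-z]:/*)
            # lower-case the drive letter, then drop the leading "X:/"
            drive=$(printf '%s' "${p%%:*}" | tr 'A-Z' 'a-z')
            printf '/mnt/%s/%s\n' "$drive" "${p#?:/}"
            ;;
        *)
            printf '%s\n' "$p"            # not a drive path: leave unchanged
            ;;
    esac
}
```

For example, `to_wsl_path 'C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg'` prints `/mnt/c/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg`. Inside WSL itself, the built-in `wslpath` utility performs the same kind of conversion and is probably the better choice when available.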