
Running results in OCR-D #21

Closed
SB2020-eye opened this issue Feb 25, 2021 · 14 comments

@SB2020-eye

Hello again. :)

In this closed issue, @kba kindly recommended the following workflow to use eynollah results in an OCR-D workflow:

ocrd workspace init
ocrd workspace add -G IMG -i IMG_1 -g page1 image1.png
ocrd workspace add -G SEG -i SEG_1 -g page1 image1.xml
ocrd-tesserocr-recognize -P segmentation_level none -P textequiv_level line

I'm having some challenges implementing this. It may just have to do with folders and paths, or maybe some "blanks" I failed to fill in...

Everything goes smoothly until the last line. (I believe it wants an input parameter?) The output is:

        Input fileGrp[@USE='INPUT'] not in METS!

If I try adding -I SEG, output includes the following:

Traceback (most recent call last):
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/ocrd/workspace.py", line 111, in download_file
    raise Exception("Not already downloaded, moving on")
Exception: Not already downloaded, moving on

and
FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg
and
FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: /mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg
and
Exception: Already tried prepending baseurl '/mnt/c/users/scott/desktop/python2/k/eyn_test2'. Cannot retrieve '/mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg'
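The doubled path in the last two errors hints at the cause: under a POSIX system (like WSL here), a Windows-style path such as C:/Users/... does not start with "/", so it is not considered absolute and gets joined onto the workspace base directory. A minimal standard-library sketch reproduces the behavior:

```python
import posixpath

# A Windows-style path is NOT absolute by POSIX rules,
# because it does not start with "/".
win_path = "C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg"
base = "/mnt/c/users/scott/desktop/python2/k/eyn_test2"

print(posixpath.isabs(win_path))  # False

# Joining therefore keeps the Windows path as a relative suffix,
# producing exactly the doubled path seen in the error above.
print(posixpath.join(base, win_path))
```

This is why the workflow needs image files referenced by paths that resolve inside the workspace, rather than Windows-style absolute paths.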

If I try adding -I SEG_1, the output is:

        Input fileGrp[@USE='SEG_1'] not in METS!

Any suggestions welcome and appreciated!
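Both fileGrp errors mean the same thing: the -I value must match the USE attribute of a fileGrp that actually exists in mets.xml (here, SEG is the group created by -G SEG, while SEG_1 is only a file ID from -i, and INPUT is the default when -I is omitted). A hypothetical sketch of that lookup, against a minimal METS fragment:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# A minimal stand-in for the mets.xml produced by "ocrd workspace add":
mets = ET.fromstring(f"""
<mets xmlns="{METS_NS}">
  <fileSec>
    <fileGrp USE="IMG"/>
    <fileGrp USE="SEG"/>
  </fileSec>
</mets>
""")

def has_filegrp(tree, use):
    # Hypothetical stand-in for the check behind
    # "Input fileGrp[@USE='...'] not in METS!"
    return tree.find(f".//{{{METS_NS}}}fileGrp[@USE='{use}']") is not None

print(has_filegrp(mets, "SEG"))    # True: group created by -G SEG
print(has_filegrp(mets, "SEG_1"))  # False: SEG_1 is a file ID (-i), not a group
print(has_filegrp(mets, "INPUT"))  # False: the default when -I is omitted
```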

@kba
Contributor

kba commented Feb 26, 2021

OCR-D bindings are next on the agenda (well, first, resolving some issues with tensorflow, pixel density and reading order and THEN OCR-D bindings), so I hope to implement this next week.

Until then, you need to manually arrange the paths the way OCR-D expects them. The code I posted was just illustrative, so let me fill in the blanks wrt path handling. For simplicity's sake, let's just use one image file, page1.jpg.

# Create the workspace
ocrd workspace -d ws1 init
# chdir to the workspace
cd ws1
# create folders for the image and the segmentation
mkdir IMG SEG
# copy the image file from wherever it is stored to IMG (not sure about the path syntax in Windows)
cp C:\Path\to\page1.jpg IMG
# Run eynollah and use a relative filename to refer to the image
eynollah -i IMG/page1.jpg -o SEG
# Now register the new files with the workspace (i.e. add them to mets.xml)
ocrd workspace add -G IMG -i IMG_1 -g page1 IMG/page1.jpg
ocrd workspace add -G SEG -i SEG_1 -g page1 SEG/page1.xml
# Now you can run ocrd-tesserocr-recognize on the results. Specify the right input group SEG
ocrd-tesserocr-recognize -P segmentation_level none -P textequiv_level line -I SEG -O OCR
# Now the output PAGE-XML with text recognition will be in the OCR subdirectory

But it would really be best to wait for proper OCR-D bindings, because doing it like this is a lot of effort, error-prone, and brittle.

@SB2020-eye
Author

Thank you, for both the instructions and the advice!

@SB2020-eye
Author

Hi, @kba. I imagine you have plenty going on. I just wanted to make a friendly check-in and see if perhaps the OCR-D bindings are in place.

@kba
Contributor

kba commented Mar 10, 2021

> Hi, @kba. I imagine you have plenty going on. I just wanted to make a friendly check-in and see if perhaps the OCR-D bindings are in place.

Not yet. I was flummoxed by an issue with the reading order in my refactoring, solved it yesterday, and am now back on the bindings. I'll update this issue once there is something testable. Thanks for checking in.

@SB2020-eye
Author

I decided to go ahead and take a stab at your illustration above, with the path "blanks" filled in for my setup.

I got this:

(venv) scott@Yogi:~/ws1$ eynollah -i IMG/F073r.jpg -o SEG
Traceback (most recent call last):
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/bin/eynollah", line 8, in <module>
    sys.exit(main())
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/qurator/eynollah/cli.py", line 133, in main
    headers_off,
  File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/qurator/eynollah/eynollah.py", line 122, in __init__
    self.model_dir_of_enhancement = dir_models + "/model_enhancement.h5"
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Do you happen to understand what is going wrong here?

@vahidrezanezhad
Member

With eynollah you need to provide the directory of the models via the -m option.
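For reference, the TypeError in the traceback above is simply Python refusing to concatenate None with a string: when -m is omitted, the model directory stays None. A hypothetical reconstruction of the failing line (variable names borrowed from the traceback, not eynollah's actual code):

```python
# "dir_models" stands in for the value of the -m option,
# which is None when the option is omitted.
dir_models = None

try:
    model_path = dir_models + "/model_enhancement.h5"
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'NoneType' and 'str'

# A defensive variant would fail with a readable message instead:
if dir_models is None:
    print("Model directory missing; pass it with -m")
```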

@SB2020-eye
Author

Got it. Enthused by what I'm seeing with my first results! Thank you.

@vahidrezanezhad
Member

> Got it. Enthused by what I'm seeing with my first results! Thank you.

Happy to hear that :)

@mikegerber
Member

I think this issue can be closed, now that OCR-D support is in.

@kba
Contributor

kba commented Apr 23, 2021

Indeed!

@kba kba closed this as completed Apr 23, 2021
@SB2020-eye
Author

Wonderful! Thank you!

@SB2020-eye
Author

(How do I update OCR-D in order to use eynollah there now? Is it part of ocrd_all?)

@kba
Contributor

kba commented Apr 25, 2021

> (How do I update OCR-D in order to use eynollah there now? Is it part of ocrd_all?)

It is part of the just-released v2021-04-25 of ocrd_all. Docker images are still building but should be deployed in a few hours.

If you want to update your standalone eynollah installation:

git checkout master
git pull origin master
pip install .

This will give you ocrd-eynollah-segment.

@SB2020-eye
Author

Wonderful. Thank you.

I decided I would get rid of my standalone instance and just use OCR-D.

In my ocrd environment (in Ubuntu 18.04), I ran:

git checkout master
git pull origin master
make all

(I don't know if this was overkill or not. But it didn't take long.)

Then, once I had tested by running ocrd-eynollah-segment --help, I ran the following to download the models:

ocrd resmgr download -n ocrd-eynollah-segment https://qurator-data.de/eynollah/models_eynollah.tar.gz

Thanks again for everyone's assistance.
