regression in OCR-D processor #106

bertsky · 2023-06-10T09:18:44Z

Since the last update I am getting

09:12:50.405 INFO eynollah - INPUT FILE PHYS_0001 (1/3) 
09:12:50.972 INFO eynollah - Resizing and enhancing image...
09:12:50.972 INFO eynollah - Detected 300 DPI
09:13:13.284 INFO eynollah - Found 1 columns ([[1. 0. 0. 0. 0. 0.]])
09:13:13.292 INFO eynollah - Image was not enhanced.
09:13:13.304 INFO eynollah - Enhancing took 22.3s 
09:13:33.396 INFO eynollah - Textregion detection took 20.1s 
09:13:33.651 INFO eynollah - Graphics detection took 0.3s 
09:13:44.518 INFO eynollah - textline detection took 10.9s
09:13:47.943 INFO eynollah - slope_deskew: -0.12°
09:13:47.944 INFO eynollah - deskewing took 3.4s
09:13:48.005 INFO eynollah - detection of marginals took 0.1s
09:14:25.182 INFO eynollah - Job done in 94.2s
09:14:25.182 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-eynollah-segment'
Traceback (most recent call last):
  File "/build/core/ocrd/ocrd/processor/helpers.py", line 128, in run_processor
    processor.process()
  File "/build/eynollah/qurator/eynollah/processor.py", line 59, in process
    Eynollah(**eynollah_kwargs).run()
  File "/build/eynollah/qurator/eynollah/eynollah.py", line 3107, in run
    self.writer.write_pagexml(pcgts)
  File "/build/eynollah/qurator/eynollah/writer.py", line 138, in write_pagexml
    out_fname = os.path.join(self.dir_out, self.image_filename_stem) + ".xml"
  File "/usr/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Have not looked further yet.

The text was updated successfully, but these errors were encountered:

bertsky · 2023-06-10T16:15:43Z

Clearly, this is related to c606391 (dir_in mode), but it's a huge changeset, and I don't understand it (also, the diff is suboptimal).

BTW, what was the reason for dir_in (and why not also put the loaded models into class members in the single-image mode, so most of the dir_in conditionals would have been unnecessary)? IMHO for efficiency it would be necessary to separate CPU-bound from GPU-bound computation and make them run quasi parallel (for example via queueing). The dir_in change does avoid reloading models, but still requires many transfers to/from GPU memory and waiting for pre- and postprocessing, which is CPU-bound. Also, it's not possible to write a OCR-D wrapper for this kind of API.

mikegerber · 2023-08-15T17:38:48Z

I have the same problem with 0.3.0:

TypeError: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/usr/local/share/pyenv/versions/3.7.17/bin/ocrd-eynollah-segment", line 8, in <module>
    sys.exit(main())
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
    return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 126, in ocrd_cli_wrap_processor
    run_processor(processorClass, mets_url=mets, workspace=workspace, **kwargs)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 131, in run_processor
    raise err
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 128, in run_processor
    processor.process()
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/processor.py", line 59, in process
    Eynollah(**eynollah_kwargs).run()
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/eynollah.py", line 3092, in run
    self.writer.write_pagexml(pcgts)
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/writer.py", line 138, in write_pagexml
    out_fname = os.path.join(self.dir_out, self.image_filename_stem) + ".xml"
  File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Reproducer:

#!/bin/sh
set -ex

test_id=`basename $0`
cd `mktemp -d /tmp/$test_id-XXXXX`

# Prepare processors
ocrd resmgr download ocrd-eynollah-segment default

# Prepare test workspace
wget https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip
unzip actevedef_718448162.first-page+binarization+segmentation.zip
cd actevedef_718448162.first-page+binarization+segmentation

# Run tests
ocrd-eynollah-segment -P models default -I OCR-D-IMG-BIN -O TEST-EYNOLLAH-SEG

bertsky · 2023-08-15T17:42:51Z

@mikegerber for a recent version usable in the OCR-D ecosystem, you need https://github.com/qurator-spk/eynollah/tree/706433c5049c63c6e16fee5f71d81a7e507abe06 which is part of #108

bertsky · 2023-08-15T17:53:28Z

Thanks but - I think - I don't currently have this problem - no conflicting qurator packages here. (And my recommendation to the team was to move away from the - useless - common qurator.* namespace.)

The most thorough test regarding the namespace inconsistency problem I know of is to install both sbb_binarize and eynollah in devel mode and then run --help on the OCR-D CLIs.

bertsky · 2023-08-15T18:41:23Z

@bertsky Is this particular problem even connected to the namespace problem? It seems unrelated.

@mikegerber no, but as as argued in #108 and above: you need the namespace problem fixed in the current ecosystem, while due to the present issue the master branch is dysfunctional since #86.

cneud · 2023-08-16T09:07:25Z

Hi, we are aware of the issue(s) with the OCR-D CLI. Please note that these all relate to the OCR-D CLI only, while the tool is otherwise perfectly usable - "completely unusable in certain circumstances" is a bit too harsh/inprecise, this is only the case when the "circumstances" are OCR-D.

I believe we had this discussion already in the past (even multiple times), but please take into account that the OCR-D CLI is not the top priority for our development here, as this is not an OCR-D project. In fact, we are currently hiring another position to work on this, so please be a bit patient.

We try to balance the project work plan, issues reported here and the availability of staff when planning the work and priorities, so please refrain from introducing new labels for urgency! But if it helps we can introduce an "OCR-D" label for all issues relating to that.

bertsky · 2023-08-16T09:29:57Z

I also don't see the point of urging this right now. We had a bit of a hustle when all this surfaced during the last ocrd_all release/repair sprint, but as explained above we did find a workable interim solution (ie. reverting #86 and #97 while cherry-picking #100).

cneud · 2024-02-28T17:06:53Z

I finally found some time to dive a little deeper into this issue, and now I must agree with @bertsky that c606391 probably should not have been merged in the first place...alas, we now have it in main, so we will proceed to dissect the problematic changes from the larger changeset.

This was referenced Jun 10, 2023

update tool json resource path_in_archive #107

Closed

eynollah: contour data type = 17 is not supported OCR-D/ocrd_all#368

Closed

mikegerber added the bug Something isn't working label Aug 15, 2023

mikegerber mentioned this issue Aug 15, 2023

Please don't use the qurator namespace #112

Open

mikegerber mentioned this issue Aug 16, 2023

Workaround for/Removal of eynollah qurator-spk/ocrd-galley#78

Open

2 tasks

cneud added ocr-d OCR-D related issue and removed bug Something isn't working labels Aug 17, 2023

mikegerber mentioned this issue Oct 19, 2023

Memory usage explosion with very narrow images (e.g. book spine) #67

Open

cneud mentioned this issue Mar 1, 2024

Expected Ptr<cv::UMat> for argument 'contour' #110

Open

cneud mentioned this issue Mar 19, 2024

eynollah command doesnt do anything nor log errors inside docker container #120

Closed

cneud mentioned this issue Aug 9, 2024

fix OCR-D regression #129

Merged

vahidrezanezhad added a commit that referenced this issue Aug 14, 2024

resolving #106 for pyproject.toml test

53fd5fb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regression in OCR-D processor #106

regression in OCR-D processor #106

bertsky commented Jun 10, 2023

bertsky commented Jun 10, 2023

mikegerber commented Aug 15, 2023

bertsky commented Aug 15, 2023

bertsky commented Aug 15, 2023

bertsky commented Aug 15, 2023

cneud commented Aug 16, 2023

bertsky commented Aug 16, 2023

cneud commented Feb 28, 2024

regression in OCR-D processor #106

regression in OCR-D processor #106

Comments

bertsky commented Jun 10, 2023

bertsky commented Jun 10, 2023

mikegerber commented Aug 15, 2023

bertsky commented Aug 15, 2023

bertsky commented Aug 15, 2023

bertsky commented Aug 15, 2023

cneud commented Aug 16, 2023

bertsky commented Aug 16, 2023

cneud commented Feb 28, 2024