ValueError thrown due to shape inconsistence (empty page) #40

Closed
cneud opened this issue Aug 26, 2020 · 8 comments
Labels: bug (Something isn't working)

Comments

@cneud
Member

cneud commented Aug 26, 2020

For this empty page, I get a ValueError when applying sbb-textline-detector via my_ocrd_workflow.

14:38:43.639 INFO processor.OcrdSbbTextlineDetectorRecognize - INPUT FILE 80 / <OcrdFile fileGrp=OCR-D-IMG-BIN, ID=FILE_0081_OCR-D-IMG-BIN, mimetype=application/vnd.prima.page+xml, url=OCR-D-IMG-BIN/FILE_0081_OCR-D-IMG-BIN.xml, local_filename=OCR-D-IMG-BIN/FILE_0081_OCR-D-IMG-BIN.xml]/>
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-sbb-textline-detector", line 8, in <module>
    sys.exit(ocrd_sbb_textline_detector())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 25, in ocrd_sbb_textline_detector
    return ocrd_cli_wrap_processor(OcrdSbbTextlineDetectorRecognize, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 71, in process
    x.run()
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 2091, in run
    textline_mask_tot=self.textline_contours(image_page)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 492, in textline_contours
    prediction_textline=self.do_prediction(patches,img,model_textline)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 284, in do_prediction
    img_patch.reshape(1, img_patch.shape[0], img_patch.shape[1], img_patch.shape[2]))
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 142, 3)

Ideally, the output for an empty page should be a PAGE-XML file with no regions or text lines.
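The error message says the model expects patches of shape (448, 896, 3) but received one only 142 pixels wide. One generic way to avoid feeding an undersized patch to a fixed-input model is to pad the image up to the model's input size before tiling. The following is a minimal sketch of that idea, not the fix that was later committed to this repository; the helper name `pad_to_min_size` and the white fill value are assumptions chosen for illustration.

```python
import numpy as np

def pad_to_min_size(img, min_height, min_width, fill=255):
    """Pad an H x W x C image on the bottom/right so it is at least
    min_height x min_width. Hypothetical helper, not part of the tool."""
    pad_h = max(0, min_height - img.shape[0])
    pad_w = max(0, min_width - img.shape[1])
    if pad_h == 0 and pad_w == 0:
        return img  # already large enough, return unchanged
    # Pad only height and width; leave the channel axis untouched.
    return np.pad(
        img,
        ((0, pad_h), (0, pad_w), (0, 0)),
        mode="constant",
        constant_values=fill,  # assumption: white background
    )

# An array with the shape reported in the traceback above.
narrow = np.zeros((448, 142, 3), dtype=np.uint8)
padded = pad_to_min_size(narrow, 448, 896)
print(padded.shape)  # (448, 896, 3)
```

After prediction, the result would need to be cropped back to the original size so that coordinates in the PAGE-XML output still refer to the unpadded image.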

@vahidrezanezhad
Member

Dear Clemens, please check it out.

@mikegerber
Member

@vahidrezanezhad Can this be closed?

0f09f4a says it's resolved.

(If you had included "Closes GH-40" in the commit message, the pushed commit would have closed the issue automatically.)

@mikegerber added the bug label Oct 2, 2020
@cneud
Member Author

cneud commented Oct 2, 2020

Please don't close just yet, @mikegerber. I opened this and still need to verify that the fix works!
(Yes, I know one can link issues and PRs, but this was intentional so I can actually test it.)

@mikegerber
Member

That's why I asked :)

@cneud
Member Author

cneud commented Oct 2, 2020

I have now tested the fix against three different images that threw the error before and can confirm that all of them were processed successfully, so I will close this.

@cneud closed this as completed Oct 2, 2020
@cneud
Member Author

cneud commented Oct 2, 2020

(@mikegerber, in fact I did miss that 0f09f4a also closed #30. These should have been two separate commits, @vahidrezanezhad.)

@cneud
Member Author

cneud commented Oct 5, 2020

Maybe I did indeed close this too early. While most images that threw the above error before are now processed without problems, I still get this exception for some, e.g. this image: https://digital.staatsbibliothek-berlin.de/werkansicht?PPN=PPN687049385&PHYSID=PHYS_0024&DMDID=

16:42:40.931 INFO processor.OcrdSbbTextlineDetectorRecognize - INPUT FILE 23 / <OcrdFile fileGrp=OCR-D-IMG-BIN, ID=FILE_0024_OCR-D-IMG-BIN, mimetype=application/vnd.prima.page+xml, url=OCR-D-IMG-BIN/FILE_0024_OCR-D-IMG-BIN.xml, local_filename=OCR-D-IMG-BIN/FILE_0024_OCR-D-IMG-BIN.xml]/>
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-sbb-textline-detector", line 8, in <module>
    sys.exit(ocrd_sbb_textline_detector())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 31, in ocrd_sbb_textline_detector
    return ocrd_cli_wrap_processor(OcrdSbbTextlineDetectorRecognize, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/processor/helpers.py", line 68, in run_processor
    processor.process()
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 73, in process
    x.run()
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 2102, in run
    textline_mask_tot=self.textline_contours(image_page)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 496, in textline_contours
    prediction_textline=self.do_prediction(patches,img,model_textline)
  File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 288, in do_prediction
    img_patch.reshape(1, img_patch.shape[0], img_patch.shape[1], img_patch.shape[2]))
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 110, 3)
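A short patch like the 110-pixel one above can arise in sliding-window tiling when the last window is shifted back to end at the image border and the image is narrower than the window: the start index becomes negative, and NumPy interprets it as counting from the end. The sketch below only demonstrates that mechanism. Whether it is the actual cause here is an assumption, and the function names and the image width of 786 are hypothetical values chosen so the numbers match the message.

```python
import numpy as np

MODEL_H, MODEL_W = 448, 896  # input shape reported in the error message

def last_patch(img, model_w=MODEL_W):
    """Hypothetical 'shift the last window back to the border' step."""
    img_w = img.shape[1]
    x_start = img_w - model_w  # negative when the image is narrower than the model
    return img[:, x_start:img_w, :]

def last_patch_guarded(img, model_w=MODEL_W):
    """Same step, but fail early with a clear message for narrow images."""
    img_w = img.shape[1]
    if img_w < model_w:
        raise ValueError(
            f"image width {img_w} is smaller than model width {model_w}"
        )
    return img[:, img_w - model_w:img_w, :]

# Assumed width of 786: 786 - 896 = -110, so the slice keeps the last 110 columns.
img = np.zeros((MODEL_H, 786, 3), dtype=np.uint8)
patch = last_patch(img)
print(patch.shape)  # (448, 110, 3)
```

A guard such as `last_patch_guarded` turns the silent short slice into an explicit error at the tiling step, which is easier to diagnose than a shape check failing deep inside Keras.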

@cneud
Member Author

cneud commented Jan 25, 2021

Can confirm this has been fixed for the images here with 4c498fc.

@cneud closed this as completed Jan 25, 2021