trying to get running... #17

Closed
SB2020-eye opened this issue Feb 3, 2021 · 53 comments

@SB2020-eye

Hi. I am trying to get this running on Windows 10 using Visual Studio Code.

If I cd into the repo and run a command like:
eynollah -i C:/Users/Scott/Desktop/Python2/Kpages/Pages/076v.jpg -o C:/Users/Scott/Desktop/Python2/Kpages -m C:/Users/Scott/Desktop/Python2/eynollah/models_eynollah -si C:/Users/Scott/Desktop/Python2/Kpages
it doesn't appear to run. A new command prompt comes up after a couple of seconds -- but no output and no error message.

Any guidance would be appreciated.

@cneud
Member

cneud commented Feb 3, 2021

Hi, I'm sorry, but we are currently in the middle of a major refactoring and seem to have unintentionally broken main in doing so. I hope we can conclude the overhaul within the next couple of weeks. Soon after that, this tool will also be included in our ocrd-galley, which would allow usage via "stable" Docker images.

I believe https://github.com/qurator-spk/eynollah/tree/778a4197a5ee99e8bbcfc86e8ae75cec96a3435e was still working for me, but unfortunately I have no experience trying any of this on Windows.

@kba
Contributor

kba commented Feb 4, 2021

main should be working after #12. Smoke test: does eynollah --help work? Can you share the image you're trying this on?
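
A minimal Python sketch of that smoke test, assuming eynollah is installed as a console script on the PATH of the active environment. It runs the command, captures stdout/stderr, and reports the exit code and whether anything was printed; the silent return to the prompt described above would show up as no output or a non-zero exit code. The function name smoke_test is hypothetical, not part of eynollah.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def smoke_test(cmd: Sequence[str], timeout: float = 120.0) -> Optional[Tuple[int, bool]]:
    """Run cmd (e.g. ["eynollah", "--help"]) and report what happened.

    Returns (exit_code, printed_anything), or None if the executable
    could not be found at all.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,  # collect stdout/stderr instead of echoing them
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return None  # nothing called cmd[0] was found on PATH
    printed_anything = bool(result.stdout.strip() or result.stderr.strip())
    return result.returncode, printed_anything


if __name__ == "__main__":
    # Pass another command on the command line, or default to eynollah --help.
    print(smoke_test(sys.argv[1:] or ["eynollah", "--help"]))
```

Run it from the same activated conda environment as eynollah. None means the entry point is not on PATH there; (0, False) would reproduce the "no output at all" symptom.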

@SB2020-eye
Author

SB2020-eye commented Feb 4, 2021

Thank you, @cneud and @kba.

@cneud , are you suggesting I download the version found at the link you gave?

@kba , I think you're asking me to run eynollah --help from terminal after cd-ing into the repo root folder. If so, the same behavior occurs: no output, and after 3 or so seconds, a new command line appears ready to go.

Just in case, I should probably ask something crucial towards my goal with eynollah, to make sure I don't waste everyone's time. I am assuming the -si argument results in image files for all the different segmented sections of the original image. Is that correct? And if so, are they lossless (i.e., is anything lost in the process)?

Here's an example image.
[Image: page181r-downsized]

@cneud
Member

cneud commented Feb 4, 2021

> @cneud , are you suggesting I download the version found at the link you gave?

@kba has kindly applied a fix to main, so (theoretically) the main branch should now build again.

> I am assuming the -si argument results in image files for all the different segmented sections of the original image. Is that correct?

No, I believe the -si flag only extracts regions with image content, i.e. illustrations, pictures, photos or similar that were identified by the layout analysis as "graphical elements".

You can however cut out any regions from the image after layout analysis based on their pixel coordinates in the PAGE-XML output, which will give you the segment images in the same resolution as the source image.
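
A minimal sketch of that idea using only the Python standard library, assuming a PAGE-XML file such as eynollah's output: it collects the Coords/@points polygon of every element of a given type (TextLine by default) and turns it into a (left, top, right, bottom) box. Matching on local tag names is a simplification so that any PAGE namespace version works; region_bboxes and parse_points are hypothetical helpers, not an eynollah or OCR-D API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), as Pillow's crop() expects


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE points string 'x1,y1 x2,y2 ...' into (x, y) integer pairs."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        pairs.append((int(x), int(y)))
    return pairs


def region_bboxes(page_xml: Union[str, bytes], element: str = "TextLine") -> Dict[str, BBox]:
    """Map the id of every <element> to the bounding box of its own Coords polygon."""
    root = ET.fromstring(page_xml)
    boxes: Dict[str, BBox] = {}
    for el in root.iter():
        if _local(el.tag) != element:
            continue
        # Only the element's direct Coords child, not those of nested lines/words.
        coords = next((c for c in el if _local(c.tag) == "Coords"), None)
        if coords is None:
            continue
        pts = parse_points(coords.get("points", ""))
        if not pts:
            continue
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        # +1 so the right/bottom edge pixels are kept (crop boxes are exclusive there).
        boxes[el.get("id")] = (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
    return boxes
```

Each box can then be handed to Pillow, e.g. Image.open("page.tif").crop(box), which keeps the source resolution. Reading the XML as bytes (open(path, "rb").read()) lets the parser deal with any encoding declaration.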

@kba
Contributor

kba commented Feb 4, 2021

> @cneud , are you suggesting I download the version found at the link you gave?

The main branch works for me, so no, just make sure you are at the latest commit in main.

> @kba , I think you're asking me to run eynollah --help from terminal after cd-ing into the repo root folder. If so, the same behavior occurs: no output, and after 3 or so seconds, a new command line appears ready to go.

From the other issue, I infer you're using conda. If the conda env is active, you do not need to be in the repo folder. Are you sure you have installed eynollah including its dependencies, i.e. conda activate yourenv; pip install .? Or can you try with a fresh environment to make sure this is not the issue?

> I am assuming the -si argument results in image files for all the different segmented sections of the original image. Is that correct? And if so, are they lossless (ie, is anything lost in the process)?

Yes, with the -si option, cropped images of all the contours found by eynollah are written to that directory. GIGO, so this should not reduce image quality IIUC.

However, I see this as merely a debug function (@vahidrezanezhad correct me if I'm wrong); the important result is the PAGE-XML. From that (or any other) PAGE-XML you can use ocrd_segment, specifically ocrd-segment-extract-regions and *-lines, to extract the cropped images afterwards. Even better, if you use this within a Python project, would be to use the polygons in the PAGE-XML directly, so you don't lose that information when serializing to image files, which are necessarily bounding boxes.
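
Cropping by bounding box can keep ink from neighbouring lines that falls inside the rectangle. A sketch of using the polygon directly instead, assuming the page is a grayscale image held as a list of rows and the polygon is a list of (x, y) pairs (e.g. parsed from Coords/@points): pixels whose centre lies outside the polygon are painted white using the standard even-odd (ray casting) point-in-polygon test. The helper names are hypothetical; on real pages you would do the same with NumPy/OpenCV rather than pure Python loops.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count how many polygon edges a ray towards +x crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_outside_polygon(
    image: List[List[int]], polygon: Sequence[Point], fill: int = 255
) -> List[List[int]]:
    """Copy of a grayscale image with every pixel outside the polygon set to fill.

    Pixel (col, row) is tested at its centre (col + 0.5, row + 0.5).
    """
    return [
        [px if point_in_polygon(col + 0.5, row + 0.5, polygon) else fill
         for col, px in enumerate(line)]
        for row, line in enumerate(image)
    ]
```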

> Here's an example image

And here's how eynollah would segment that page:

[Image: eynollah segmentation of the example page]

And without the image for clearer visuals:

[Image: the same segmentation without the page image]

The ruler confused the detection, so the reading order is shoddy; I should have cropped the print space more vertically. But the regions and esp. lines (which are essential for OCR) are tight and accurate AFAICS.

@kba
Contributor

kba commented Feb 4, 2021

> Yes, with the -si option, cropped images of all the contours found by eynollah are written to that directory. GIGO, so this should not reduce image quality IIUC.

I was wrong, @cneud had it right:

> No, I believe the -si flag only extracts regions with image content, i.e. illustrations, pictures, photos or similar that were identified by the layout analysis as "graphical elements".

@SB2020-eye
Author

SB2020-eye commented Feb 4, 2021

Regarding -si, does this mean that I would need to work with PAGE-XML in order to get the cut-out images of text lines? I have some doubts about my abilities in that realm (never worked with XML, never heard of XSLT, can't even locate the dependencies needed for that repo, etc.). Lol.

I actually don't need OCR per se -- just images of text lines (or, even better, words, if possible). This is toward a subsequent goal of cutting out images of just glyphs (with no background). eynollah is obviously constructed for purposes more sophisticated than just what I'm describing.

I actually already have something slicing out images of text lines for me -- docExtractor. But having found your sbb_binarization and getting such positive results, I came to eynollah since sbb_binarization doesn't seem to run in python 3.8.6, which the rest of my program (including docExtractor) is currently running in. And I just don't know how to get them to "talk" to each other. So I figured maybe I could replace docExtractor with eynollah and have everything run in python 3.7.0 environment. (Yes, @kba , it is indeed a conda environment.)

If this sounds like I'm making things overly complicated, I probably am! And I'd appreciate you saying so (plus any suggestions you might have). Or if eynollah seems to you like it's a rabbit trail for my particular purposes, please don't hesitate to say so. You are obviously doing good work here!

@vahidrezanezhad
Member

> Regarding -si, does this mean that I would need to work with PAGE-XML in order to get the cut-out images of text lines? I have some doubts about my abilities in that realm (never worked with XML, never heard of XSLT, can't even locate the dependencies needed for that repo, etc). Lol.

Hi there, the -si option gives you the capability to crop and save the images inside the document. This can also be done using the output XML data, but to make it easier we have provided this option too (to crop and save them while you run eynollah).

@vahidrezanezhad
Member

> I am assuming the -si argument results in image files for all the different segmented sections of the original image. Is that correct?

> No, I believe the -si flag only extracts regions with image content, i.e. illustrations, pictures, photos or similar that were identified by the layout analysis as "graphical elements".

> You can however cut out any regions from the image after layout analysis based on their pixel coordinate

Correct. Thank you

@kba
Contributor

kba commented Feb 4, 2021

> If this sounds like I'm making things overly complicated, I probably am!

IIUC you want to create some sort of glyph repository, so you're not interested in the text detection but in getting lines and glyphs from the lines in a bitonal format.
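
"Bitonal" here means binarized. sbb_binarization, mentioned earlier in the thread, is model-based; as a much simpler stand-in, here is a sketch of Otsu's global thresholding in plain Python, assuming 8-bit grayscale values (0-255) held as a list of rows. It picks the grey level that maximises the between-class variance, then maps pixels at or below it to ink (0) and the rest to background (255). This is a swapped-in classical technique, not what eynollah or sbb_binarization do, and a single global threshold struggles with uneven lighting or stains.

```python
from typing import Iterable, List


def otsu_threshold(pixels: Iterable[int]) -> int:
    """Otsu's method: the grey level maximising between-class variance.

    Pixels <= threshold form one class (ink), pixels > threshold the other (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1  # assumes integer values in 0..255
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # number of pixels in the <= t class
    sum_b = 0  # intensity sum of the <= t class
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Bitonal copy of a grayscale image: 0 for ink (dark), 255 for background."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p <= t else 255 for p in row] for row in image]
```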

You want to preprocess your page to crop it to the print space (which should get rid of opposing pages, rulers, etc.), deskew/dewarp it (if lines aren't perfectly orthogonal to the image, or the page has water damage or a deep joint) and then segment the page into lines. We have multiple tools for that in OCR-D, see https://ocr-d.de/en/workflows. Then you can use an OCR engine like Tesseract or Calamari to do the recognition down to glyph level, disregard the detected text, and just use the bounding boxes of the glyphs to cut them out of the original image.

Yes, this would involve working with PAGE-XML. We do have a Pythonic API for that in OCR-D/core, though, which can make this a bit easier. At the end of the day, it's a hierarchical data structure like any other: Page -> TextRegion -> TextLine -> Word -> Glyph -> Coords -> points.
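
To make the hierarchy concrete: in PAGE-XML, TextLines sit inside TextRegions, and every level carries its own Coords element whose points attribute holds the polygon. A minimal standard-library sketch that walks Page -> TextRegion -> TextLine -> Word -> Glyph and yields each glyph's ids and polygon, assuming glyph-level output (e.g. from an OCR engine run down to glyph level). It only follows TextRegions directly under Page (PAGE also allows nested regions), and iter_glyphs is a hypothetical helper, not the OCR-D/core API.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple, Union

GlyphInfo = Tuple[Optional[str], Optional[str], Optional[str], List[Tuple[int, int]]]


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def _children(el: ET.Element, name: str) -> List[ET.Element]:
    """Direct children of el with the given local tag name."""
    return [c for c in el if _local(c.tag) == name]


def _polygon(el: ET.Element) -> List[Tuple[int, int]]:
    """Polygon from el's own <Coords points="x1,y1 x2,y2 ..."/>, or [] if absent."""
    coords = _children(el, "Coords")
    if not coords:
        return []
    polygon = []
    for pair in coords[0].get("points", "").split():
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def iter_glyphs(page_xml: Union[str, bytes]) -> Iterator[GlyphInfo]:
    """Yield (line_id, word_id, glyph_id, polygon) for every Glyph.

    Walks Page -> TextRegion -> TextLine -> Word -> Glyph; regions nested
    inside other regions are not visited in this simplified sketch.
    """
    root = ET.fromstring(page_xml)
    for page in (e for e in root.iter() if _local(e.tag) == "Page"):
        for region in _children(page, "TextRegion"):
            for line in _children(region, "TextLine"):
                for word in _children(line, "Word"):
                    for glyph in _children(word, "Glyph"):
                        yield line.get("id"), word.get("id"), glyph.get("id"), _polygon(glyph)
```

Each polygon can be reduced to a crop box via the min/max of its x and y values, or used as a mask to drop the background around the glyph.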

But I suggest you drop by our chat at https://gitter.im/OCR-D/Lobby, say hi and describe your use case, it's easier to discuss there than in an issue.

@kba
Contributor

kba commented Feb 4, 2021

> -si option gives you this capability to crop and save images inside the document

@vahidrezanezhad just to make sure: with "save images" you mean "save graphic regions", correct?

@vahidrezanezhad
Member

> -si option gives you this capability to crop and save images inside the document

> @vahidrezanezhad just to make sure: with "save images" you mean "save graphic regions", correct?

Yes :)

@vahidrezanezhad
Member

> The ruler confused the detection so the reading order is shoddy, should have cropped the printspace more vertically. But the regions and esp. lines (which are essential for OCR) are tight and accurate AFAICS.

As you mentioned, the reason for the bad reading order is the page detector (this simply happens because our GT did not contain such documents). But this is a general problem for reading order detection, which can occur for documents with multiple columns and footnotes even when the print space has been extracted correctly.
[Image: main-qimg-b6abcab9f04b2c29fe571802d348a973]

And have a look at the reading order:

[Image: footnotes_disorder]

As you can see, the reading order is still not correct :)

@cneud
Member

cneud commented Feb 4, 2021

> main should be working after #12.

I still had to do the following to get main working:

  • update pip (otherwise tensorflow-gpu 1.15 won't be found)
  • install tqdm and seaborn via pip
  • downgrade keras: pip install keras==2.3.1

With these changes, I can successfully run the tool (on Ubuntu, not Windows though).

@SB2020-eye
Author

> -si option gives you this capability to crop and save images inside the document

> @vahidrezanezhad just to make sure: with "save images" you mean "save graphic regions", correct?

> Yes :)

And does that mean "save graphic regions...as image files", or something else? Thanks.

@SB2020-eye
Author

> But I suggest you drop by our chat at https://gitter.im/OCR-D/Lobby, say hi and describe your use case, it's easier to discuss there than in an issue.

Thanks. I just posted something.

@kba
Contributor

kba commented Feb 5, 2021

> install tqdm and seaborn via pip

I wonder why you need those. Are you sure you're up-to-date? These have been removed in 9596a44 and 801ccac, respectively.

> downgrade keras pip install keras==2.3.1

Oh, yes, that's fixed in the refactoring, but it should be in main too: ef1e32e

> And does that mean "save graphic regions...as image files", or something else? Thanks.

Yes, the graphic regions are saved as JPEG image files.

@cneud
Member

cneud commented Feb 5, 2021

Yes, this was on a clean clone of c7d509b; I still had to install both packages manually or eynollah would not run.

I also could not get any images extracted using -si. Does this only work in combination with -fl=true? @vahidrezanezhad

Also, I am getting an OOM exception due to Tensor shape... every time I try to run eynollah with the -fl=true parameter on my GeForce RTX 2070S with 8 GB :(

@vahidrezanezhad
Member

> Yes this was on a clean clone of c7d509b - I still had to install both packages manually or eynollah would not run.

> I also could not get any images extracted using -si. Does this only work in combination with -fl=true? @vahidrezanezhad

> Also I am getting OOM exception due to Tensor shape... every time I try to run eynollah with the -fl=true parameter on my Geforce RTX2070S with 8 GB :(

No. -si has nothing to do with the -fl option. The -si option takes a directory as its argument.

@cneud
Member

cneud commented Feb 5, 2021

Hmm, when I tried using e.g. eynollah -i 00000015.tif -o . -si . I did not get any images extracted to that directory? I was using this image https://content.staatsbibliothek-berlin.de/dms/PPN626696453/1200/0/00000015.tif?original=true.

@kba
Contributor

kba commented Feb 5, 2021

> Hmm, when I tried using e.g. eynollah -i 00000015.tif -o . -si . I did not get any images extracted to that directory? I was using this image https://content.staatsbibliothek-berlin.de/dms/PPN626696453/1200/0/00000015.tif?original=true.

That might well be a regression on my part, investigating.

> seaborn and tqdm

I am still confused about this. Can you try pip uninstall tqdm seaborn and provide the stacktrace this causes please?

pipdeptree shows this dependency tree for me:

pipdeptree -p eynollah
eynollah==0.0.1
  - imutils [required: >=0.5.3, installed: 0.5.3]
  - keras [required: >=2.3.1, installed: 2.3.1]
    - h5py [required: Any, installed: 2.10.0]
      - numpy [required: >=1.7, installed: 1.18.5]
      - six [required: Any, installed: 1.15.0]
    - keras-applications [required: >=1.0.6, installed: 1.0.8]
      - h5py [required: Any, installed: 2.10.0]
        - numpy [required: >=1.7, installed: 1.18.5]
        - six [required: Any, installed: 1.15.0]
      - numpy [required: >=1.9.1, installed: 1.18.5]
    - keras-preprocessing [required: >=1.0.5, installed: 1.1.0]
      - numpy [required: >=1.9.1, installed: 1.18.5]
      - six [required: >=1.9.0, installed: 1.15.0]
    - numpy [required: >=1.9.1, installed: 1.18.5]
    - pyyaml [required: Any, installed: 5.3.1]
    - scipy [required: >=0.14, installed: 1.4.1]
      - numpy [required: >=1.13.3, installed: 1.18.5]
    - six [required: >=1.9.0, installed: 1.15.0]
  - matplotlib [required: Any, installed: 3.3.1]
    - certifi [required: >=2020.06.20, installed: 2020.6.20]
    - cycler [required: >=0.10, installed: 0.10.0]
      - six [required: Any, installed: 1.15.0]
    - kiwisolver [required: >=1.0.1, installed: 1.2.0]
    - numpy [required: >=1.15, installed: 1.18.5]
    - pillow [required: >=6.2.0, installed: 7.2.0]
    - pyparsing [required: >=2.0.3,!=2.1.6,!=2.1.2,!=2.0.4, installed: 2.4.7]
    - python-dateutil [required: >=2.1, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.15.0]
  - ocrd [required: >=2.20.1, installed: 2.22.3]
    - bagit [required: >=1.7.0, installed: 1.7.0]
    - bagit-profile [required: >=1.3.0, installed: 1.3.1]
      - bagit [required: Any, installed: 1.7.0]
      - requests [required: Any, installed: 2.24.0]
        - certifi [required: >=2017.4.17, installed: 2020.6.20]
        - chardet [required: >=3.0.2,<4, installed: 3.0.4]
        - idna [required: >=2.5,<3, installed: 2.10]
        - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.10]
    - click [required: >=7, installed: 7.1.2]
    - Deprecated [required: ==1.2.0, installed: 1.2.0]
      - wrapt [required: >=1,<2, installed: 1.12.1]
    - Flask [required: Any, installed: 1.1.2]
      - click [required: >=5.1, installed: 7.1.2]
      - itsdangerous [required: >=0.24, installed: 1.1.0]
      - Jinja2 [required: >=2.10.1, installed: 2.11.2]
        - MarkupSafe [required: >=0.23, installed: 1.1.1]
      - Werkzeug [required: >=0.15, installed: 1.0.1]
    - jsonschema [required: Any, installed: 3.2.0]
      - attrs [required: >=17.4.0, installed: 20.2.0]
      - importlib-metadata [required: Any, installed: 2.0.0]
        - zipp [required: >=0.5, installed: 3.2.0]
      - pyrsistent [required: >=0.14.0, installed: 0.17.3]
      - setuptools [required: Any, installed: 50.3.0]
      - six [required: >=1.11.0, installed: 1.15.0]
    - lxml [required: Any, installed: 4.5.2]
    - ocrd-modelfactory [required: ==2.22.3, installed: 2.22.3]
      - lxml [required: Any, installed: 4.5.2]
      - ocrd-models [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.5.2]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 7.2.0]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 7.2.0]
    - ocrd-models [required: ==2.22.3, installed: 2.22.3]
      - lxml [required: Any, installed: 4.5.2]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 7.2.0]
    - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
      - atomicwrites [required: >=1.3.0, installed: 1.4.0]
      - numpy [required: Any, installed: 1.18.5]
      - Pillow [required: >=7.2.0, installed: 7.2.0]
    - ocrd-validators [required: ==2.22.3, installed: 2.22.3]
      - bagit [required: >=1.7.0, installed: 1.7.0]
      - bagit-profile [required: >=1.3.0, installed: 1.3.1]
        - bagit [required: Any, installed: 1.7.0]
        - requests [required: Any, installed: 2.24.0]
          - certifi [required: >=2017.4.17, installed: 2020.6.20]
          - chardet [required: >=3.0.2,<4, installed: 3.0.4]
          - idna [required: >=2.5,<3, installed: 2.10]
          - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.10]
      - click [required: >=7, installed: 7.1.2]
      - jsonschema [required: Any, installed: 3.2.0]
        - attrs [required: >=17.4.0, installed: 20.2.0]
        - importlib-metadata [required: Any, installed: 2.0.0]
          - zipp [required: >=0.5, installed: 3.2.0]
        - pyrsistent [required: >=0.14.0, installed: 0.17.3]
        - setuptools [required: Any, installed: 50.3.0]
        - six [required: >=1.11.0, installed: 1.15.0]
      - ocrd-modelfactory [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.5.2]
        - ocrd-models [required: ==2.22.3, installed: 2.22.3]
          - lxml [required: Any, installed: 4.5.2]
          - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
            - atomicwrites [required: >=1.3.0, installed: 1.4.0]
            - numpy [required: Any, installed: 1.18.5]
            - Pillow [required: >=7.2.0, installed: 7.2.0]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 7.2.0]
      - ocrd-models [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.5.2]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 7.2.0]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 7.2.0]
      - pyyaml [required: Any, installed: 5.3.1]
      - shapely [required: Any, installed: 1.7.1]
    - opencv-python-headless [required: Any, installed: 4.4.0.44]
      - numpy [required: >=1.13.3, installed: 1.18.5]
    - pyyaml [required: Any, installed: 5.3.1]
    - requests [required: Any, installed: 2.24.0]
      - certifi [required: >=2017.4.17, installed: 2020.6.20]
      - chardet [required: >=3.0.2,<4, installed: 3.0.4]
      - idna [required: >=2.5,<3, installed: 2.10]
      - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.10]
  - scikit-learn [required: >=0.23.2, installed: 0.23.2]
    - joblib [required: >=0.11, installed: 0.17.0]
    - numpy [required: >=1.13.3, installed: 1.18.5]
    - scipy [required: >=0.19.1, installed: 1.4.1]
      - numpy [required: >=1.13.3, installed: 1.18.5]
    - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
  - tensorflow-gpu [required: >=1.15,<2, installed: 1.15.3]
    - absl-py [required: >=0.7.0, installed: 0.10.0]
      - six [required: Any, installed: 1.15.0]
    - astor [required: >=0.6.0, installed: 0.8.1]
    - gast [required: ==0.2.2, installed: 0.2.2]
    - google-pasta [required: >=0.1.6, installed: 0.2.0]
      - six [required: Any, installed: 1.15.0]
    - grpcio [required: >=1.8.6, installed: 1.31.0]
      - six [required: >=1.5.2, installed: 1.15.0]
    - keras-applications [required: >=1.0.8, installed: 1.0.8]
      - h5py [required: Any, installed: 2.10.0]
        - numpy [required: >=1.7, installed: 1.18.5]
        - six [required: Any, installed: 1.15.0]
      - numpy [required: >=1.9.1, installed: 1.18.5]
    - keras-preprocessing [required: >=1.0.5, installed: 1.1.0]
      - numpy [required: >=1.9.1, installed: 1.18.5]
      - six [required: >=1.9.0, installed: 1.15.0]
    - numpy [required: >=1.16.0,<2.0, installed: 1.18.5]
    - opt-einsum [required: >=2.3.2, installed: 3.3.0]
      - numpy [required: >=1.7, installed: 1.18.5]
    - protobuf [required: >=3.6.1, installed: 3.13.0]
      - setuptools [required: Any, installed: 50.3.0]
      - six [required: >=1.9, installed: 1.15.0]
    - six [required: >=1.10.0, installed: 1.15.0]
    - tensorboard [required: >=1.15.0,<1.16.0, installed: 1.15.0]
      - absl-py [required: >=0.4, installed: 0.10.0]
        - six [required: Any, installed: 1.15.0]
      - grpcio [required: >=1.6.3, installed: 1.31.0]
        - six [required: >=1.5.2, installed: 1.15.0]
      - markdown [required: >=2.6.8, installed: 3.2.2]
        - importlib-metadata [required: Any, installed: 2.0.0]
          - zipp [required: >=0.5, installed: 3.2.0]
      - numpy [required: >=1.12.0, installed: 1.18.5]
      - protobuf [required: >=3.6.0, installed: 3.13.0]
        - setuptools [required: Any, installed: 50.3.0]
        - six [required: >=1.9, installed: 1.15.0]
      - setuptools [required: >=41.0.0, installed: 50.3.0]
      - six [required: >=1.10.0, installed: 1.15.0]
      - werkzeug [required: >=0.11.15, installed: 1.0.1]
      - wheel [required: >=0.26, installed: 0.36.2]
    - tensorflow-estimator [required: ==1.15.1, installed: 1.15.1]
    - termcolor [required: >=1.1.0, installed: 1.1.0]
    - wheel [required: >=0.26, installed: 0.36.2]
    - wrapt [required: >=1.11.1, installed: 1.12.1]
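
Whether seaborn and tqdm are declared dependencies can also be checked from Python rather than pipdeptree. A sketch assuming Python 3.8+ for importlib.metadata (on 3.6/3.7 the importlib_metadata backport from PyPI provides the same functions): it lists the distribution names a package declares in its installed metadata. This shows what the package declares, not what its code actually imports, which is exactly the kind of mismatch that produces a ModuleNotFoundError at runtime. The helper names are hypothetical.

```python
import re
from typing import List, Optional

try:  # Python 3.8+
    from importlib import metadata
except ImportError:  # Python 3.6/3.7: pip install importlib_metadata
    import importlib_metadata as metadata  # type: ignore[no-redef]

# A PEP 508 requirement string starts with the distribution name.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_name(requirement: str) -> str:
    """Distribution name of a requirement string such as 'keras>=2.3.1'."""
    match = _NAME.match(requirement.strip())
    if match is None:
        raise ValueError(f"not a requirement string: {requirement!r}")
    return match.group(0)


def declared_requirements(dist: str) -> Optional[List[str]]:
    """Sorted names of the requirements dist declares, or None if not installed.

    Requirements guarded by extras or environment markers are included too.
    """
    try:
        requirements = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return None
    return sorted({requirement_name(r) for r in requirements})


if __name__ == "__main__":
    declared = declared_requirements("eynollah")
    print("eynollah declares:", declared)
    if declared is not None:
        print("seaborn declared:", "seaborn" in declared)
        print("tqdm declared:", "tqdm" in declared)
```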

@cneud
Member

cneud commented Feb 5, 2021

So I do the following:

  1. create a fresh venv and activate it
  2. update pip
  3. git clone https://github.com/qurator-spk/eynollah
  4. pip install .

Now when I try to run eynollah, it complains about missing seaborn:

eynollah -i PPN798786388_00000005.tif -o . -m ~/tmp/dev/qurator/models/eynollah
Traceback (most recent call last):
  File "/usr/local/bin/eynollah", line 11, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'

So I install seaborn with pip and run again:

eynollah -i PPN798786388_00000005.tif -o . -m ~/tmp/dev/qurator/models/eynollah
Traceback (most recent call last):
  File "/usr/local/bin/eynollah", line 14, in <module>
    from tqdm import tqdm
ModuleNotFoundError: No module named 'tqdm'

After installation of tqdm, it runs fine.

pip uninstall tqdm seaborn gives me:

pip3 uninstall tqdm seaborn
Found existing installation: tqdm 4.56.0
Uninstalling tqdm-4.56.0:
  Would remove:
    /home/cnd/tmp/dev/qurator/tools/venv-qurator/bin/tqdm
    /home/cnd/tmp/dev/qurator/tools/venv-qurator/lib/python3.6/site-packages/tqdm-4.56.0.dist-info/*
    /home/cnd/tmp/dev/qurator/tools/venv-qurator/lib/python3.6/site-packages/tqdm/*
Proceed (y/n)? n
Found existing installation: seaborn 0.11.1
Uninstalling seaborn-0.11.1:
  Would remove:
    /home/cnd/tmp/dev/qurator/tools/venv-qurator/lib/python3.6/site-packages/seaborn-0.11.1.dist-info/*
    /home/cnd/tmp/dev/qurator/tools/venv-qurator/lib/python3.6/site-packages/seaborn/*
Proceed (y/n)? 
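
A hypothesis worth ruling out, not a diagnosis: the tracebacks above come from /usr/local/bin/eynollah, while the venv lives under /home/cnd/tmp/dev/qurator/tools/venv-qurator, so the eynollah being executed may not be the copy just installed into the venv. A small standard-library sketch that shows which executable PATH lookup resolves and whether it sits inside the running interpreter's environment (sys.prefix); run it with the venv's python. The helper names are hypothetical.

```python
import os
import shutil
import sys
from typing import Optional


def resolve_command(name: str) -> Optional[str]:
    """Absolute path of the executable that PATH lookup finds for name, or None."""
    path = shutil.which(name)
    return os.path.abspath(path) if path else None


def inside_active_env(executable: str, prefix: str = sys.prefix) -> bool:
    """True if executable lives under prefix (by default the running venv/conda env)."""
    prefix = os.path.abspath(prefix)
    try:
        common = os.path.commonpath([prefix, os.path.abspath(executable)])
    except ValueError:  # e.g. paths on different drives on Windows
        return False
    return common == prefix


if __name__ == "__main__":
    found = resolve_command("eynollah")
    print("eynollah resolves to:", found)
    if found:
        print("inside this environment:", inside_active_env(found))
```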

Output of pipdeptree -p eynollah:

eynollah==0.0.1
  - imutils [required: >=0.5.3, installed: 0.5.4]
  - keras [required: >=2.3.1, installed: 2.4.3]
    - h5py [required: Any, installed: 2.10.0]
      - numpy [required: >=1.7, installed: 1.18.5]
      - six [required: Any, installed: 1.15.0]
    - numpy [required: >=1.9.1, installed: 1.18.5]
    - pyyaml [required: Any, installed: 5.4.1]
    - scipy [required: >=0.14, installed: 1.5.4]
      - numpy [required: >=1.14.5, installed: 1.18.5]
  - matplotlib [required: Any, installed: 3.3.4]
    - cycler [required: >=0.10, installed: 0.10.0]
      - six [required: Any, installed: 1.15.0]
    - kiwisolver [required: >=1.0.1, installed: 1.3.1]
    - numpy [required: >=1.15, installed: 1.18.5]
    - pillow [required: >=6.2.0, installed: 8.1.0]
    - pyparsing [required: >=2.0.3,!=2.1.6,!=2.1.2,!=2.0.4, installed: 2.4.7]
    - python-dateutil [required: >=2.1, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.15.0]
  - ocrd [required: >=2.20.1, installed: 2.22.3]
    - bagit [required: >=1.7.0, installed: 1.8.0]
    - bagit-profile [required: >=1.3.0, installed: 1.3.1]
      - bagit [required: Any, installed: 1.8.0]
      - requests [required: Any, installed: 2.25.1]
        - certifi [required: >=2017.4.17, installed: 2020.12.5]
        - chardet [required: >=3.0.2,<5, installed: 4.0.0]
        - idna [required: >=2.5,<3, installed: 2.10]
        - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
    - click [required: >=7, installed: 7.1.2]
    - Deprecated [required: ==1.2.0, installed: 1.2.0]
      - wrapt [required: >=1,<2, installed: 1.12.1]
    - Flask [required: Any, installed: 1.1.2]
      - click [required: >=5.1, installed: 7.1.2]
      - itsdangerous [required: >=0.24, installed: 1.1.0]
      - Jinja2 [required: >=2.10.1, installed: 2.11.3]
        - MarkupSafe [required: >=0.23, installed: 1.1.1]
      - Werkzeug [required: >=0.15, installed: 1.0.1]
    - jsonschema [required: Any, installed: 3.2.0]
      - attrs [required: >=17.4.0, installed: 20.3.0]
      - importlib-metadata [required: Any, installed: 3.4.0]
        - typing-extensions [required: >=3.6.4, installed: 3.7.4.3]
        - zipp [required: >=0.5, installed: 3.4.0]
      - pyrsistent [required: >=0.14.0, installed: 0.17.3]
      - setuptools [required: Any, installed: 53.0.0]
      - six [required: >=1.11.0, installed: 1.15.0]
    - lxml [required: Any, installed: 4.6.2]
    - ocrd-modelfactory [required: ==2.22.3, installed: 2.22.3]
      - lxml [required: Any, installed: 4.6.2]
      - ocrd-models [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.6.2]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 8.1.0]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 8.1.0]
    - ocrd-models [required: ==2.22.3, installed: 2.22.3]
      - lxml [required: Any, installed: 4.6.2]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 8.1.0]
    - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
      - atomicwrites [required: >=1.3.0, installed: 1.4.0]
      - numpy [required: Any, installed: 1.18.5]
      - Pillow [required: >=7.2.0, installed: 8.1.0]
    - ocrd-validators [required: ==2.22.3, installed: 2.22.3]
      - bagit [required: >=1.7.0, installed: 1.8.0]
      - bagit-profile [required: >=1.3.0, installed: 1.3.1]
        - bagit [required: Any, installed: 1.8.0]
        - requests [required: Any, installed: 2.25.1]
          - certifi [required: >=2017.4.17, installed: 2020.12.5]
          - chardet [required: >=3.0.2,<5, installed: 4.0.0]
          - idna [required: >=2.5,<3, installed: 2.10]
          - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
      - click [required: >=7, installed: 7.1.2]
      - jsonschema [required: Any, installed: 3.2.0]
        - attrs [required: >=17.4.0, installed: 20.3.0]
        - importlib-metadata [required: Any, installed: 3.4.0]
          - typing-extensions [required: >=3.6.4, installed: 3.7.4.3]
          - zipp [required: >=0.5, installed: 3.4.0]
        - pyrsistent [required: >=0.14.0, installed: 0.17.3]
        - setuptools [required: Any, installed: 53.0.0]
        - six [required: >=1.11.0, installed: 1.15.0]
      - ocrd-modelfactory [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.6.2]
        - ocrd-models [required: ==2.22.3, installed: 2.22.3]
          - lxml [required: Any, installed: 4.6.2]
          - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
            - atomicwrites [required: >=1.3.0, installed: 1.4.0]
            - numpy [required: Any, installed: 1.18.5]
            - Pillow [required: >=7.2.0, installed: 8.1.0]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 8.1.0]
      - ocrd-models [required: ==2.22.3, installed: 2.22.3]
        - lxml [required: Any, installed: 4.6.2]
        - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
          - atomicwrites [required: >=1.3.0, installed: 1.4.0]
          - numpy [required: Any, installed: 1.18.5]
          - Pillow [required: >=7.2.0, installed: 8.1.0]
      - ocrd-utils [required: ==2.22.3, installed: 2.22.3]
        - atomicwrites [required: >=1.3.0, installed: 1.4.0]
        - numpy [required: Any, installed: 1.18.5]
        - Pillow [required: >=7.2.0, installed: 8.1.0]
      - pyyaml [required: Any, installed: 5.4.1]
      - shapely [required: Any, installed: 1.7.1]
    - opencv-python-headless [required: Any, installed: 4.5.1.48]
      - numpy [required: >=1.13.3, installed: 1.18.5]
    - pyyaml [required: Any, installed: 5.4.1]
    - requests [required: Any, installed: 2.25.1]
      - certifi [required: >=2017.4.17, installed: 2020.12.5]
      - chardet [required: >=3.0.2,<5, installed: 4.0.0]
      - idna [required: >=2.5,<3, installed: 2.10]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
  - scikit-learn [required: >=0.23.2, installed: 0.24.1]
    - joblib [required: >=0.11, installed: 1.0.0]
    - numpy [required: >=1.13.3, installed: 1.18.5]
    - scipy [required: >=0.19.1, installed: 1.5.4]
      - numpy [required: >=1.14.5, installed: 1.18.5]
    - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
  - tensorflow-gpu [required: >=1.15,<2, installed: 1.15.5]
    - absl-py [required: >=0.7.0, installed: 0.11.0]
      - six [required: Any, installed: 1.15.0]
    - astor [required: >=0.6.0, installed: 0.8.1]
    - gast [required: ==0.2.2, installed: 0.2.2]
    - google-pasta [required: >=0.1.6, installed: 0.2.0]
      - six [required: Any, installed: 1.15.0]
    - grpcio [required: >=1.8.6, installed: 1.35.0]
      - six [required: >=1.5.2, installed: 1.15.0]
    - h5py [required: <=2.10.0, installed: 2.10.0]
      - numpy [required: >=1.7, installed: 1.18.5]
      - six [required: Any, installed: 1.15.0]
    - keras-applications [required: >=1.0.8, installed: 1.0.8]
      - h5py [required: Any, installed: 2.10.0]
        - numpy [required: >=1.7, installed: 1.18.5]
        - six [required: Any, installed: 1.15.0]
      - numpy [required: >=1.9.1, installed: 1.18.5]
    - keras-preprocessing [required: >=1.0.5, installed: 1.1.2]
      - numpy [required: >=1.9.1, installed: 1.18.5]
      - six [required: >=1.9.0, installed: 1.15.0]
    - numpy [required: >=1.16.0,<1.19.0, installed: 1.18.5]
    - opt-einsum [required: >=2.3.2, installed: 3.3.0]
      - numpy [required: >=1.7, installed: 1.18.5]
    - protobuf [required: >=3.6.1, installed: 3.14.0]
      - six [required: >=1.9, installed: 1.15.0]
    - six [required: >=1.10.0, installed: 1.15.0]
    - tensorboard [required: >=1.15.0,<1.16.0, installed: 1.15.0]
      - absl-py [required: >=0.4, installed: 0.11.0]
        - six [required: Any, installed: 1.15.0]
      - grpcio [required: >=1.6.3, installed: 1.35.0]
        - six [required: >=1.5.2, installed: 1.15.0]
      - markdown [required: >=2.6.8, installed: 3.3.3]
        - importlib-metadata [required: Any, installed: 3.4.0]
          - typing-extensions [required: >=3.6.4, installed: 3.7.4.3]
          - zipp [required: >=0.5, installed: 3.4.0]
      - numpy [required: >=1.12.0, installed: 1.18.5]
      - protobuf [required: >=3.6.0, installed: 3.14.0]
        - six [required: >=1.9, installed: 1.15.0]
      - setuptools [required: >=41.0.0, installed: 53.0.0]
      - six [required: >=1.10.0, installed: 1.15.0]
      - werkzeug [required: >=0.11.15, installed: 1.0.1]
      - wheel [required: >=0.26, installed: 0.36.2]
    - tensorflow-estimator [required: ==1.15.1, installed: 1.15.1]
    - termcolor [required: >=1.1.0, installed: 1.1.0]
    - wheel [required: >=0.26, installed: 0.36.2]
    - wrapt [required: >=1.11.1, installed: 1.12.1]

@kba
Contributor

kba commented Feb 5, 2021

File "/usr/local/bin/eynollah", line 11, in

Wait, your venv is not /usr/local, is it? It looks like you previously installed eynollah without a virtualenv, to /usr/local/bin/eynollah. Can you move or remove that file? The output of which eynollah should be $VIRTUAL_ENV/bin/eynollah.

@cneud
Member

cneud commented Feb 5, 2021

Argh, you are right!

I deactivated the venv and uninstalled eynollah.

Then I activated the venv again and reinstalled via pip. Now which eynollah returns the correct path inside the venv, /home/cnd/tmp/dev/qurator/tools/venv-qurator/bin/eynollah, but I am not getting any output anymore (it exits immediately with no message).

@cneud
Member

cneud commented Feb 5, 2021

Apparently I had an older version installed to /usr/local/bin/eynollah. Thanks to @kba's amazing debugging skills we eventually tracked this down, and now #18 works for me (without any need to install seaborn or tqdm, and with a working -si parameter!).

@Jim-Salmons

Here's an update on my recent experience installing eynollah natively on Windows 10:

  • Created a clean conda environment w/ Python 3.7+.
  • Cloned the latest eynollah GitHub source and installed locally with no problem.
  • Tried a CLI test run similar to what has been reported/suggested here and got the same result of no visible activity or file-written output before getting the standard blank PowerShell command prompt.
  • To take a closer look at what was going on, I installed the eynollah source at the root level of a PyCharm project.
  • I ran both pytest scripts, smoke and xml, in debug mode and observed them run line by line with no problems.
  • I then created a debug config in PyCharm to execute a command line statement.
  • When I did this, I found the cause: an exception, invisible to the eynollah user, raised by the keras module 2.3.1 (and 2.4.3, too), which said that keras required TensorFlow 2+ when trying to import tensorflow.keras.layers.experimental.preprocessing.
  • Unfortunately, eynollah's requirements.txt specifies tensorflow-gpu 1.15+ but less than 2.
  • When I quickly checked whether downgrading keras would help, I found that even at keras 2.2.5, TF 2+ was required (at least under Windows).

If I have a chance today, I will try a clean install with TF 2+, keras 2.4.3, and a relaxed requirement so that eynollah accepts this configuration. I suspect it will not work due to the major refactoring in TF 2+. If anyone has a better idea, please don't hesitate to let me know.

ITMT, I have updated my Windows dev box to the latest Docker using WSL2. I'm in the process of learning how to configure PyCharm Pro for remote, debuggable coding from my Windows IDE against a live Docker image. I want to get this going because it will let me work more easily with OCR-D and similar research projects while still using PyCharm under Windows, which includes Kite Pro coding assistance. Kite is super helpful given my severely limited keyboarding abilities following a spinal cord injury in July.

@cneud
Member

cneud commented Feb 8, 2021

Hi @Jim-Salmons, thanks for sharing! I am in a bit of a hurry, but the chances of getting eynollah working will be much improved once we have completed the refactoring, which should hopefully only take a few more weeks. Meanwhile, you must use a version of keras <2.4 (cf. #18), as newer versions pull in TensorFlow 2, whereas the tool only works with TensorFlow 1.15.x. For TF2, the code would need to be adapted and the models retrained.
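A minimal sketch, not part of eynollah, of checking an environment against the two constraints mentioned here (TensorFlow 1.15.x and keras <2.4). The names `parse_version` and `compatibility_problems` are hypothetical, and the version parsing is deliberately simple rather than full PEP 440.

```python
def parse_version(version):
    """Turn '1.15.5' or '2.4.0rc1' into a 3-tuple of ints like (1, 15, 5)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '2' to (2, 0, 0) so tuple comparisons behave
    return tuple(parts)


def compatibility_problems(tf_version, keras_version=None):
    """Return human-readable problems for the given versions (empty if none)."""
    problems = []
    tf = parse_version(tf_version)[:2]
    if not ((1, 15) <= tf < (2, 0)):
        problems.append("TensorFlow {} found, but 1.15.x is required".format(tf_version))
    if keras_version is not None and parse_version(keras_version)[:2] >= (2, 4):
        problems.append("keras {} found, but <2.4 is required".format(keras_version))
    return problems
```

In an activated environment you might call it as compatibility_problems(tensorflow.__version__, keras.__version__). Note that keras 2.4+ may refuse to import at all alongside TensorFlow 1.x, which is itself the symptom described above.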

@Jim-Salmons

Hi Clemens @cneud - Thanks for the quick reply. Sounds like the best strategy is to wait for the next release. In the meantime I can get my sea legs using PyCharm Pro under Windows on a Docker/WSL2 image. 🤪

@mikegerber
Member

mikegerber commented Feb 9, 2021

File "/usr/local/bin/eynollah", line 11, in

Wait, your venv is not /usr/local, is it? Looks like you installed eynollah before without a virtualenv to /usr/local/bin/eynollah - can you move/remove that file? which eynollah should point to $VIRTUAL_ENV/bin/eynollah.

Careful: which eynollah can absolutely return $VIRTUAL_ENV/bin/eynollah while you're still calling /usr/local/bin/eynollah, because your shell might still have cached that eynollah is /usr/local/bin/eynollah. In that case you need to run rehash (zsh/tcsh) or hash -r (bash), or open a fresh terminal. I have been bitten by this more than once...

(There are a few subtleties: if this was @cneud's problem, he must have called the /usr/local eynollah before activating the new venv/installing eynollah for this whole confusion to be possible... 🔍)
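To see whether an older eynollah is shadowing the one in the venv, a rough sketch like the following lists every matching executable on PATH in lookup order. `find_all_on_path` is a hypothetical helper, and it only inspects PATH: it cannot see a location your shell has already cached, which still needs hash -r, rehash, or a fresh terminal.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order.

    The first entry is what a fresh shell would run. An already-open shell
    may still use a cached location instead.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    suffixes = [""]
    if os.name == "nt":
        # On Windows the launcher is e.g. eynollah.exe, so try PATHEXT suffixes
        suffixes += os.environ.get("PATHEXT", ".COM;.EXE;.BAT;.CMD").split(";")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                    and candidate not in hits):
                hits.append(candidate)
    return hits
```

Running find_all_on_path("eynollah") inside the activated venv should ideally return a single entry under $VIRTUAL_ENV; a second entry such as /usr/local/bin/eynollah is a candidate for removal.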

@SB2020-eye
Author

SB2020-eye commented Feb 15, 2021

Thanks to everyone weighing in with input on this.

I made a fresh conda environment (Windows 10 OS) and took another go at it. I needed to install pip in order to install eynollah. After some failures, I figured out that installing pip had also installed Python 3.9, so I installed Python 3.6 instead. pip install . worked after that, but not until I manually made the changes referenced above (#18).

I used msys64 + mingw64 to install the models successfully.

Running
eynollah -i C:/Users/Scott/Desktop/Python2/K/eyn_test/F073r.jpg -o C:/Users/Scott/Desktop/Python2/K/eyn_test/results -m C:/users/scott/desktop/python2/eynollah/models_eynollah -si C:/users/scott/desktop/python2/K/eyn_test/results
I got the following:

File "C:\ProgramData\Miniconda3\envs\eenv\Scripts\eynollah.exe\__main__.py", line 4, in <module>
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\cli.py", line 2, in <module>
    from sbb_newspapers_org_image.eynollah import eynollah
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\eynollah.py", line 31, in <module>
    from shapely import geometry
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
    from shapely.coords import CoordinateSequence
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\shapely\geos.py", line 154, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "c:\programdata\miniconda3\envs\eenv\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

After some dead ends, I googled the error line plus "shapely," and found a suggestion at another repo to simply
conda install -c conda-forge shapely
I no longer got that specific error. And eynollah --help works. (Yay!)
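The traceback above shows shapely's geos.py trying to load geos_c.dll from sys.prefix\Library\bin, which is where conda packages place DLLs. It appears the pip-installed shapely did not provide a DLL at that location, which would explain why the conda-forge package fixed it. A quick check, as a sketch (the helper name `conda_geos_dll` is hypothetical):

```python
import os
import sys


def conda_geos_dll(prefix=sys.prefix):
    """Return (path, exists) for the GEOS DLL location used in the traceback."""
    path = os.path.join(prefix, "Library", "bin", "geos_c.dll")
    return path, os.path.isfile(path)
```

Running print(conda_geos_dll()) inside the conda environment shows whether shapely will find the DLL where that code path expects it.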

However, running the same command above, now I get:

The system cannot find the path specified.
'identify' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "c:\programdata\miniconda3\envs\eenv\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\miniconda3\envs\eenv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Miniconda3\envs\eenv\Scripts\eynollah.exe\__main__.py", line 7, in <module>
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\cli.py", line 102, in main
    headers_off,
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\eynollah.py", line 2978, in run
    is_image_enhanced, img_org, img_res, num_col_classifier, num_column_is_classified = self.resize_and_enhance_image_with_column_classifier(is_image_enhanced)
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\eynollah.py", line 419, in resize_and_enhance_image_with_column_classifier
    dpi = self.check_dpi()
  File "c:\programdata\miniconda3\envs\eenv\lib\site-packages\sbb_newspapers_org_image\eynollah.py", line 298, in check_dpi
    return int(float(dpi))
ValueError: could not convert string to float:

I've triple-checked my paths, and they're fine. I've also poked around to try to understand where the ValueError is coming from, but so far to no avail. Any suggestions?

@vahidrezanezhad
Member

I think this is caused by reading the image's DPI on Windows. I can temporarily add an exception handler to work around your problem (this may affect the result).
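The failing expression in the traceback is return int(float(dpi)) with an empty dpi string, consistent with the identify command not being found. A sketch of the kind of temporary exception handling described here might look like this; `parse_dpi` and the fallback of 300 are assumptions, not eynollah's actual code, and a guessed DPI can indeed affect the result.

```python
def parse_dpi(raw, fallback=300):
    """Convert a DPI string to int, falling back when it is empty or malformed.

    Mirrors int(float(dpi)) from the traceback; the fallback value is a
    hypothetical default, not eynollah's.
    """
    try:
        return int(float(raw))
    except (TypeError, ValueError):
        # e.g. raw == "" when the `identify` command could not be run
        return fallback
```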

@vahidrezanezhad
Member

I just updated eynollah. Please check whether it works.

@vahidrezanezhad
Member

Note that this can also happen on Linux if the image path is wrong, so make sure the given image path is correct.

@vahidrezanezhad
Member

[Screenshot: dpi_value_error_with_false_image_directory]

@SB2020-eye
Author

Thanks, @vahidrezanezhad. I checked the image path carefully again (fearful I had gotten it wrong and missed it over and over again), but it is correct. I might as well make sure -- are .jpg files okay? Are there any other restrictions on input images?

@vahidrezanezhad
Member

Of course JPG files are OK (all kinds of images are valid). Did you pull the latest eynollah? This error should be caused by reading the DPI on Windows. Please check with the latest version and give me feedback.

@SB2020-eye
Author

...working on a new install in a new conda environment now...

@vahidrezanezhad
Member

Just keep in mind that this is a temporary solution and it will degrade the performance of the code.

@SB2020-eye
Author

SB2020-eye commented Feb 15, 2021

Understood. :) When I run a command (even eynollah --help), it just gives me a new command line after 4-5 seconds (no output). EDIT: Never mind! Sorry, I haven't done make models yet.

2ND EDIT: I spoke too soon (twice now). The models are in, but that doesn't make a difference (and I now realize the models shouldn't matter for eynollah --help anyway). There is still no output, just a new command line.

@SB2020-eye
Author

(tl;dr the new version isn't working for me)

@kba
Contributor

kba commented Feb 17, 2021

The DPI is checked using the identify CLI from ImageMagick. That seems unnecessary; it could be done with Pillow or OpenCV. But 37431d4 should have provided a workaround. I don't understand why eynollah --help stopped working for you. What did you change?
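Reading the DPI in-process instead of shelling out to identify is easy to illustrate. With Pillow, Image.open(path).info.get("dpi") is the usual route. As a dependency-free sketch, the hypothetical `jfif_dpi` below reads the density fields of a JPEG's JFIF APP0 header. It ignores EXIF-only JPEGs and other formats, and it is not what eynollah does (the later fix in 8c603ae uses OcrdExif).

```python
import struct


def jfif_dpi(data):
    """Return (x_dpi, y_dpi) from a JPEG's JFIF APP0 header, or None.

    Only covers JFIF density fields; EXIF-only JPEGs and other formats
    return None and would need a different reader (e.g. Pillow).
    """
    if data[:2] != b"\xff\xd8":
        return None  # no SOI marker: not a JPEG
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:
            return None  # start of scan: header segments are over
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes itself
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xE0 and segment[:5] == b"JFIF\x00" and len(segment) >= 12:
            units = segment[7]  # 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
            x_density, y_density = struct.unpack(">HH", segment[8:12])
            if units == 1:
                return x_density, y_density
            if units == 2:
                return round(x_density * 2.54), round(y_density * 2.54)
            return None
        pos += 2 + length
    return None
```

Usage would be along the lines of opening the image in binary mode and passing the first few kilobytes, e.g. jfif_dpi(f.read(65536)), since the JFIF APP0 segment normally follows the SOI marker directly.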

@SB2020-eye
Author

I started from scratch with a new Anaconda environment. One difference this time was that instead of having to go back and install python 3.6 to replace 3.9, I did this from the outset:
conda create --name env e2 python=3.6
Then activate e2
git clone https://github.com/qurator-spk/eynollah.git
cd eynollah
(I can't recall if I had to conda install pip at this point or not; but I think I did)
pip install .
At this point, it didn't work. (No output; fresh command line in 4-5 seconds. This includes running eynollah --help.)
I used MSYS64/Mingw64 to cd into the eynollah folder and run make models. Same results.

I am glad to try it again, but before I do, let me know if you see any red flags in what I did above.

Thanks!

@kba
Contributor

kba commented Feb 17, 2021

No, that setup looks reasonable. Can you check out https://github.com/qurator-spk/eynollah/tree/refactor-cntd and install that? Among other things, this adds an overridable log-level switch, --log-level.

Then try running eynollah on some image with --log-level DEBUG and post the output here.

Feel free to send me a DM on Gitter to debug this further.

@SB2020-eye
Author

SB2020-eye commented Feb 17, 2021

@kba , thanks for responding. Writing out my previous post, I thought it well worth going ahead and setting things up exactly as I had before, without the "shortcut" of conda create --name env e2 python=3.6. It works now!

Here are the installation steps that evidently work for me (at least as of 2/17/2021), in case they help anyone else:

(Windows 10 OS, Anaconda environment)
(For my example, I will call my conda environment "my_env".)

conda create --name my_env
conda activate my_env
git clone https://github.com/qurator-spk/eynollah.git
cd eynollah
conda install pip
pip install .
conda install python=3.6
conda install shapely
This gets you to the point of having eynollah running... eynollah --help should work.
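Since the recurring symptom in this thread was a silent exit, a small check like the following (a hypothetical helper, not part of eynollah) can confirm the install: it runs eynollah --help and treats empty output as a failure even when the exit code is 0. It avoids newer subprocess arguments so it also runs on Python 3.6.

```python
import shutil
import subprocess


def smoke_test(command="eynollah"):
    """Run `<command> --help` and report whether it printed usage text.

    Returns (ok, detail). A silent exit (no stdout) counts as a failure
    even if the exit code is 0.
    """
    exe = shutil.which(command)
    if exe is None:
        return False, "{} not found on PATH".format(command)
    proc = subprocess.run(
        [exe, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
    if proc.returncode != 0 or not proc.stdout.strip():
        detail = proc.stderr.strip() or "exited silently with code {}".format(proc.returncode)
        return False, detail
    return True, exe
```

Calling smoke_test() in the activated environment returns (True, path) when the usage text appears, and otherwise the captured error output or a note about the silent exit.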

(Nothing has changed for me regarding adding the models needed to run eynollah. From the outset, I needed a way to run a make file in Windows 10. I set up msys64 and used the mingw64 terminal to 1. cd to the eynollah repository folder, and 2. run make models. I am not well versed in this non-Python side of things, or else I would say more; I just did lots of googling, plodded through, and eventually it worked.)

@kba
Contributor

kba commented Feb 17, 2021

🎉 good to hear it's working for you now and thanks for documenting the steps you needed. Good to close an issue with an actual solution.

About make models: you don't need to go through make just for that. All that target does is

wget 'https://qurator-data.de/eynollah/models_eynollah.tar.gz'
tar xf models_eynollah.tar.gz

i.e. download the tarball and extract it, nothing fancy.
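For Windows users without make or wget, the same two steps can be done with the Python standard library. A minimal sketch (the function names are hypothetical; the URL is the one from the make target above):

```python
import tarfile
import urllib.request

MODELS_URL = "https://qurator-data.de/eynollah/models_eynollah.tar.gz"


def download(url=MODELS_URL, dest="models_eynollah.tar.gz"):
    """Fetch the tarball, like `wget` in the make target."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive, target_dir="."):
    """Unpack the tarball, like `tar xf`; returns the member names."""
    with tarfile.open(archive, "r:*") as tar:  # auto-detects gzip compression
        # Only extract archives from a source you trust: on older Pythons,
        # extractall does not guard against absolute or '..' member paths.
        tar.extractall(target_dir)
        return tar.getnames()
```

extract(download()) run from the repository folder should leave a models_eynollah directory next to the tarball, which is what the -m option then points to.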

Once the OCR-D bindings are in place and OCR-D/core#668 is merged, you will be able to download the models with ocrd resmgr download ocrd-eynollah '*'.

@kba kba closed this as completed Feb 17, 2021
@SB2020-eye
Author

SB2020-eye commented Feb 18, 2021

"About make models, you don't need to go through make just for that"
Lol. Shows you what I know! (But now I know that it's worth looking at the file -- even in a "foreign language" to me -- before downloading heavier-lifting stuff.)

Great to hear about the anticipated bindings!

Lastly -- just fyi (and the issue can definitely stay closed):

  1. eynollah definitely works (and works well!)
  2. when running, I still get
The system cannot find the path specified.
'identify' is not recognized as an internal or external command,
operable program or batch file.

(In case it's helpful, my most recent run from terminal -- command and full output -- can be found here. You'll see this message come up twice.)

@kba
Contributor

kba commented Feb 18, 2021

"About make models, you don't need to go through make just for that"
Lol. Shows you what I know! (But now I know that it's worth looking at the file -- even in a "foreign language" to me -- before downloading heavier-lifting stuff.)

Always a good idea. Be bold :)

  2. when running, I still get
The system cannot find the path specified.
'identify' is not recognized as an internal or external command,
operable program or batch file.

(In case it's helpful, my most recent run from terminal -- command and full output -- can be found here. You'll see this message come up twice.)

I've replaced the identify call with OcrdExif in 8c603ae, so there's no need to have ImageMagick installed anymore once that's been merged.

@Jim-Salmons

Just a quick placeholder note, to be supplemented with a detailed recipe to follow. I tried to follow @SB2020-eye's recipe for getting eynollah running under Windows 10. I got close but no cigar. I incrementally addressed issues and eventually got this incredible library running on a page image from my project's effort to ground-truth the 48 issues of Softalk magazine.

To update this issue, I wanted to ensure that I have a working/repeatable Windows installation process. With this goal of providing a repeatable install recipe, I also wanted to use @kba's latest refactor-cntd branch of the eynollah module to overcome the non-critical identify error.

The good news is that I do have a process with only one non-critical hiccup related to the 'si' cli parameter for saving extracted images from the target page. I'll detail the working recipe soon, but ITMT here's a quick question:

Q bkgnd: My current conda environment does not have tesseract OCR installed. I know I can easily get this into my current conda eynollah environment. The cli I have run is -

eynollah -i <path-to-target-img>\softalkv1n01sep1980_0007.jpg -o <path-to-target-img> -m <path-to-models_eynollah> -si <path-to-target-img>\imgs\

This runs and produces a PAGE-XML result (without saving the extracted images). The PAGE-XML file has the detailed coordinates for the text and image regions of the page; however, as expected, there is no OCR text in the TextLine/TextEquiv/Unicode elements that complement the Coords of those regions.

Here is a screenshot of the PAGE-XML result in PRImA's PAGE Viewer:

[Screenshot: softalk_eynollah_page]

Q: With tesseract installed (which version BTW, or other recommended OCR engine), what cli statement will generate this current result with those TextLine/TextEquiv/Unicode elements' data included in the PAGExml file?

Thanks @kba, @bertsky, @SB2020-eye for your ongoing interest and assistance. :-)

@kba
Contributor

kba commented Feb 19, 2021

one non-critical hiccup related to the 'si' cli parameter

There is a new switch --enable-plotting/-ep that must be set for all the intermediary images to be written out. This needs some more work and then an update to the README.
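Because Windows paths with spaces or mixed slashes are easy to get wrong on the command line, one option is to build the argument list in Python and pass it to subprocess.run, which avoids shell quoting altogether. The hypothetical `eynollah_command` below uses only flags that appear in this thread (-i, -o, -m, -si, and the new -ep); whether -ep is available depends on the branch you installed.

```python
from pathlib import Path


def eynollah_command(image, out_dir, model_dir, save_images_dir=None, enable_plotting=False):
    """Build an argv list for eynollah from the flags shown in this thread.

    Path() normalizes separators, e.g. C:/Users/... becomes C:\\Users\\... on Windows.
    """
    cmd = ["eynollah", "-i", str(Path(image)), "-o", str(Path(out_dir)),
           "-m", str(Path(model_dir))]
    if save_images_dir is not None:
        cmd += ["-si", str(Path(save_images_dir))]
    if enable_plotting:
        cmd.append("-ep")  # needed for intermediary images to be written out
    return cmd
```

The result can be run with subprocess.run(cmd, check=True), which raises an error on a non-zero exit code instead of failing silently.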

what cli statement will generate this current result with those TextLine/TextEquiv/Unicode elements' data included in the PAGExml file?

Once the OCR-D interface (ocrd-eynollah) is in place, you can use any of the OCR engines we wrapped to do the actual text recognition, though I am not sure how well they work on Windows. I don't think that any OCR engines support segmentation input in PAGE-XML natively at the moment.

If you don't want to wait for the OCR-D interface, you can take the output of eynollah and add it to an OCR-D workspace and then run any of the above-mentioned engines on it:

eynollah ... -o . -i image1.png # will write to image1.xml
ocrd workspace init
ocrd workspace add -G IMG -i IMG_1 -g page1 image1.png
ocrd workspace add -G SEG -i SEG_1 -g page1 image1.xml
ocrd-tesserocr-recognize -P segmentation_level none -P textequiv_level line
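The recipe above covers a single page. A sketch that generates the same commands for several pages could look like this; the helper name and the IMG_n/SEG_n/pageN numbering are assumptions that simply extend the example, and it assumes eynollah wrote <stem>.xml next to each image, as the first command's comment suggests. The final recognize call is copied as-is; OCR-D processors also accept -I/-O to name input and output file groups, which you may need to set explicitly.

```python
from pathlib import Path


def workspace_commands(images):
    """Return argv lists that mirror the single-page recipe for many pages."""
    cmds = [["ocrd", "workspace", "init"]]
    for n, image in enumerate(images, start=1):
        image = Path(image)
        page = "page{}".format(n)
        # register the image and eynollah's PAGE-XML for the same page ID
        cmds.append(["ocrd", "workspace", "add", "-G", "IMG", "-i", "IMG_{}".format(n),
                     "-g", page, str(image)])
        cmds.append(["ocrd", "workspace", "add", "-G", "SEG", "-i", "SEG_{}".format(n),
                     "-g", page, str(image.with_suffix(".xml"))])
    cmds.append(["ocrd-tesserocr-recognize", "-P", "segmentation_level", "none",
                 "-P", "textequiv_level", "line"])
    return cmds
```

Each argv list can then be run in order with subprocess.run(argv, check=True) from inside the workspace directory.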

@Jim-Salmons

Great, thanks @kba, I'll try your recommendation. I'll also post the working Windows install recipe soon. Although I am super impressed with Calamari via my exposure to it through DATeCH, I'll stick to Tesseract ATM because it is easier and I have used it successfully on Windows before. BTW, I assume that the 5.0 alpha build is OCR-D compatible?

@kba
Contributor

kba commented Feb 19, 2021

Great, thanks @kba, I'll try your recommendation. I'll also post the working Widows install recipe soon. Although I am super impressed with Calamari via my exposure to it through DATeCH, I'll stick to Tesseract ATM due to easier and previous successful use natively on Windows.

It is possible to run Calamari on Windows, though, cf. https://github.com/Calamari-OCR/calamari/search?q=windows&type=issues (not sure how much effort it would be). A very nice feature of Calamari that sets it apart from Tesseract is its built-in multi-model/voting support.

BTW, I assume that the 5.0 alpha build is OCR-D compatible?

We rely on the https://github.com/sirfz/tesserocr python bindings and yes, they are compatible with 5.0 thanks to sirfz/tesserocr#242

@Jim-Salmons

Jim-Salmons commented Feb 19, 2021

Hi @kba, I'll check out Calamari soon. ITMT, quick question... Does Tesseract only need to be installed as a native executable for CLI invocation? Or do I need a Python module wrapper for script-wise calls? If the latter, I will likely try pytesseract. But my assumption is that OCR-D directly wraps and accesses the installed binary.

@kba
Contributor

kba commented Feb 20, 2021

Or do I need a Python module wrapper for script-wise calls?

You need to install https://github.com/OCR-D/ocrd_tesserocr which uses the tesserocr python bindings. As such, we can use the full power of the low-level tesseract APIs, not just what the CLI exposes.


6 participants