Unrecognized Arguments Help Text vs `--help` Help Text, different output #402

Closed
jbolda opened this issue Jul 10, 2019 · 9 comments

@jbolda
commented Jul 10, 2019

Describe the issue
I am trying to use the --redo-ocr argument, but ocrmypdf reports it as unrecognized (it looks like this was also noted in #397). That is odd, since I am on version 8.3.1. I ran --help to check whether I was using the argument incorrectly, and noticed that the argument list shown by --help differs from the one in the error output; the --help list matches the docs on the website.

To Reproduce
What command line were you trying to run?

running:

ocrmypdf  --redo-ocr input.pdf output.pdf

returns:

usage: ocrmypdf [-h] [-l LANGUAGE] [--image-dpi DPI]
                [--output-type {pdfa,pdf,pdfa-1,pdfa-2}] [--sidecar [FILE]]
                [--version] [-j N] [-q] [-v [VERBOSE]] [--title TITLE]
                [--author AUTHOR] [--subject SUBJECT] [--keywords KEYWORDS]
                [-r] [--remove-background] [-d] [-c] [-i] [--oversample DPI]
                [-f] [-s] [--skip-big MPixels] [--max-image-mpixels MPixels]
                [--tesseract-config CFG] [--tesseract-pagesegmode PSM]
                [--tesseract-oem MODE]
                [--pdf-renderer {auto,tesseract,hocr,sandwich}]
                [--tesseract-timeout SECONDS]
                [--rotate-pages-threshold CONFIDENCE]
                [--pdfa-image-compression {auto,jpeg,lossless}]
                [--user-words FILE] [--user-patterns FILE] [--skip-repair]
                [-k] [-g] [--flowchart FLOWCHART]
                input_pdf_or_image output_pdf
ocrmypdf: error: unrecognized arguments: --redo-ocr

and running:

ocrmypdf --help

returns:

usage: ocrmypdf [-h] [-l LANGUAGE] [--image-dpi DPI]
                [--output-type {pdfa,pdf,pdfa-1,pdfa-2,pdfa-3}]
                [--sidecar [FILE]] [--version] [-j N] [-q] [-v [VERBOSE]]
                [--title TITLE] [--author AUTHOR] [--subject SUBJECT]
                [--keywords KEYWORDS] [-r] [--remove-background] [-d] [-c]
                [-i] [--unpaper-args UNPAPER_ARGS] [--oversample DPI]
                [--remove-vectors] [--mask-barcodes] [--threshold] [-f] [-s]
                [--redo-ocr] [--skip-big MPixels] [-O {0,1,2,3}]
                [--jpeg-quality Q] [--png-quality Q] [--jbig2-lossy]
                [--max-image-mpixels MPixels] [--tesseract-config CFG]
                [--tesseract-pagesegmode PSM] [--tesseract-oem MODE]
                [--pdf-renderer {auto,hocr,sandwich}]
                [--tesseract-timeout SECONDS]
                [--rotate-pages-threshold CONFIDENCE]
                [--pdfa-image-compression {auto,jpeg,lossless}]
                [--user-words FILE] [--user-patterns FILE] [-k]
                [--flowchart FLOWCHART]
                input_pdf_or_image output_pdf

[rest of output removed for clarity]

Expected behavior

  1. The --redo-ocr argument should be recognized.
  2. The usage text printed with the unrecognized-argument error should match the --help text.

System:

  • OS: Ubuntu v18.04 (via WSL on Windows)
  • OCRmyPDF Version: v8.3.1

@jbarlow83

Owner

commented Jul 10, 2019

I suspect you ran that in two shell contexts that had different PATH settings, one picking up /usr/bin/ocrmypdf from Ubuntu 18.04, and one picking up a locally installed ocrmypdf.

@jbolda

Author

commented Jul 10, 2019

These are run back to back on the same shell. I installed it for the first time yesterday, so it would also be surprising that I somehow have two different versions installed.

@jbarlow83

Owner

commented Jul 10, 2019

My install procedure recommends installing the system package (v6.1.2 on Ubuntu 18.04) and then installing the most recent version, because the former ensures you get the non-Python dependencies.

The first copy of the help text is clearly v6.1.2. Note one difference in the output.

- [--output-type {pdfa,pdf,pdfa-1,pdfa-2}]
+ [--output-type {pdfa,pdf,pdfa-1,pdfa-2,pdfa-3}]

pdfa-3 support was added in v6.1.5.

I cannot say how this situation came about, but I can tell you that is v6.1.2.

@jbarlow83

Owner

commented Jul 10, 2019

You could compare the output of /usr/bin/ocrmypdf --version with that of ocrmypdf --version, and also run find / -type f -name ocrmypdf to see where any installations are.
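
As a lighter-weight alternative to searching the whole filesystem, the comparison above can be sketched as a small shell function that walks PATH in lookup order and prints each matching executable with the version it reports. This is only a sketch: the function name list_installs is hypothetical, and it assumes a POSIX-style shell (sh, dash, bash) and that the program accepts --version, as ocrmypdf does.

```shell
# list_installs NAME: print every executable called NAME found on PATH,
# in lookup order, followed by the first line of its --version output.
# The first line printed corresponds to the copy the shell would run.
# (Hypothetical helper, not part of OCRmyPDF.)
list_installs() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\t%s\n' "$dir/$name" \
                "$("$dir/$name" --version 2>&1 | head -n 1)"
        fi
    done
    IFS=$old_ifs
}

# Example (not run here): list_installs ocrmypdf
```

If this prints more than one line, more than one copy is installed, and which one runs depends on the PATH in effect for that particular invocation.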

@jbolda

Author

commented Jul 11, 2019

That makes a lot of sense. I completely forgot about the old version in the install instructions, and I now realize what you meant by context. Turns out WSL doesn't set up the user it creates in the sudoers group, so when running the command with sudo, it uses the root user's PATH instead of the user PATH configured in the install instructions: export PATH=$HOME/.local/bin:$PATH. So this came about by running the command that errored with sudo and getting the old version, then immediately running the --help command without sudo and getting the new version.
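
The effect described here can be illustrated with a minimal sketch that resolves the same command name against two different PATH values. The function name first_match is hypothetical, and the example PATH values in the comments are assumptions: on many Ubuntu systems sudo replaces PATH with the secure_path value from /etc/sudoers, which does not include $HOME/.local/bin, but the exact value depends on the local configuration.

```shell
# first_match NAME PATHLIST: print the executable that a shell would run
# for NAME if PATH were set to PATHLIST. The lookup happens in a subshell,
# so the caller's PATH is left untouched.
# (Hypothetical helper for illustration only.)
first_match() {
    ( PATH=$2; command -v "$1" )
}

# Example (not run here; both PATH values are assumptions):
#   first_match ocrmypdf "$HOME/.local/bin:/usr/bin"   # user shell
#   first_match ocrmypdf "/usr/local/bin:/usr/bin"     # typical under sudo
# To see what sudo itself resolves:
#   sudo sh -c 'command -v ocrmypdf'
```

When the two lookups print different paths, a command run with sudo and the same command run without it are different programs. Running without sudo where possible avoids the mismatch; passing the user PATH through with sudo env "PATH=$PATH" ocrmypdf ... is another commonly used option, though whether a per-user Python install then works under sudo is not something this thread confirms.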

Would you be interested in a quick doc PR to the Windows section clarifying this?

@jbolda jbolda closed this Jul 11, 2019

@jbarlow83

Owner

commented Jul 11, 2019

If that's a documented difference for WSL compared to vanilla Ubuntu 18.04 then yes.

@jbolda

Author

commented Jul 22, 2019

I don't know if "difference" is really the appropriate term, but WSL requires you to set up a new user the first time you run it. This is a user separate from root, but still in the sudoers group. When you run these commands on the mounted Windows filesystem, you need higher privileges. I have mostly used *nix on VPSs and the like, starting from root. Due to my lack of familiarity with the inner workings of sudo, I didn't realize it actually changes the user the command runs as (and in turn the PATH).

So my intent was just a quick note for Windows users in my situation to avoid the sudo gotcha. It would have been quite a bit more obvious had the command failed, but having an older version on the system installation obfuscates the issue.

Up to you if you want it or not. Regardless, thanks again for your help.

@jbarlow83

Owner

commented Jul 22, 2019

I think a procedure along those lines would be helpful. I tried out WSL myself and added some procedure; I found it tricky to get consistent behavior. Feel free to expand it.

@jbolda

Author

commented Aug 13, 2019

Your updated procedure makes a lot of sense to me 👍
