Unrecognized Arguments Help Text vs --help Help Text, different output #402

Closed
jbolda opened this issue Jul 10, 2019 · 9 comments
Labels
user config: Seems to be a problem specific to user configuration

Comments

@jbolda

jbolda commented Jul 10, 2019

Describe the issue
I am trying to use the --redo-ocr argument, but ocrmypdf reports it as unrecognized (looks like this was noted in #397). That is odd, since I am on version 8.3.1. So I ran --help to check whether I was using the argument incorrectly, and I noticed that the argument list printed by --help differs from the one in the error message and matches what the docs on the website show.

To Reproduce
What command line were you trying to run?

running:

ocrmypdf --redo-ocr input.pdf output.pdf

returns:

usage: ocrmypdf [-h] [-l LANGUAGE] [--image-dpi DPI]
                [--output-type {pdfa,pdf,pdfa-1,pdfa-2}] [--sidecar [FILE]]
                [--version] [-j N] [-q] [-v [VERBOSE]] [--title TITLE]
                [--author AUTHOR] [--subject SUBJECT] [--keywords KEYWORDS]
                [-r] [--remove-background] [-d] [-c] [-i] [--oversample DPI]
                [-f] [-s] [--skip-big MPixels] [--max-image-mpixels MPixels]
                [--tesseract-config CFG] [--tesseract-pagesegmode PSM]
                [--tesseract-oem MODE]
                [--pdf-renderer {auto,tesseract,hocr,sandwich}]
                [--tesseract-timeout SECONDS]
                [--rotate-pages-threshold CONFIDENCE]
                [--pdfa-image-compression {auto,jpeg,lossless}]
                [--user-words FILE] [--user-patterns FILE] [--skip-repair]
                [-k] [-g] [--flowchart FLOWCHART]
                input_pdf_or_image output_pdf
ocrmypdf: error: unrecognized arguments: --redo-ocr

and running:

ocrmypdf --help

returns:

usage: ocrmypdf [-h] [-l LANGUAGE] [--image-dpi DPI]
                [--output-type {pdfa,pdf,pdfa-1,pdfa-2,pdfa-3}]
                [--sidecar [FILE]] [--version] [-j N] [-q] [-v [VERBOSE]]
                [--title TITLE] [--author AUTHOR] [--subject SUBJECT]
                [--keywords KEYWORDS] [-r] [--remove-background] [-d] [-c]
                [-i] [--unpaper-args UNPAPER_ARGS] [--oversample DPI]
                [--remove-vectors] [--mask-barcodes] [--threshold] [-f] [-s]
                [--redo-ocr] [--skip-big MPixels] [-O {0,1,2,3}]
                [--jpeg-quality Q] [--png-quality Q] [--jbig2-lossy]
                [--max-image-mpixels MPixels] [--tesseract-config CFG]
                [--tesseract-pagesegmode PSM] [--tesseract-oem MODE]
                [--pdf-renderer {auto,hocr,sandwich}]
                [--tesseract-timeout SECONDS]
                [--rotate-pages-threshold CONFIDENCE]
                [--pdfa-image-compression {auto,jpeg,lossless}]
                [--user-words FILE] [--user-patterns FILE] [-k]
                [--flowchart FLOWCHART]
                input_pdf_or_image output_pdf

[rest of output removed for clarity]

Expected behavior

  1. The --redo-ocr argument should not be reported as unrecognized.
  2. The usage text printed with the unrecognized-argument error should match the --help text.

System:

  • OS: Ubuntu v18.04 (via WSL on Windows)
  • OCRmyPDF Version: v8.3.1
@jbarlow83
Collaborator

I suspect you ran that in two shell contexts that had different PATH settings, one picking up /usr/bin/ocrmypdf from Ubuntu 18.04, and one picking up a locally installed ocrmypdf.

@jbarlow83 added the "user config" label (Seems to be a problem specific to user configuration) Jul 10, 2019
@jbolda
Author

jbolda commented Jul 10, 2019

These were run back to back in the same shell. I installed it for the first time yesterday, so it would also be surprising if I somehow had two different versions installed.

@jbarlow83
Collaborator

My install procedure recommends installing the system package (v6.1.2 on Ubuntu 18.04) and then installing the most recent version, because the former ensures you get the non-Python dependencies.

The first copy of the help text is clearly v6.1.2. Note one difference in the output.

- [--output-type {pdfa,pdf,pdfa-1,pdfa-2}]
+ [--output-type {pdfa,pdf,pdfa-1,pdfa-2,pdfa-3}]

pdfa-3 support was added in v6.1.5.

I cannot say how this situation came about, but I can tell you that is v6.1.2.
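
The comparison above can also be done mechanically. The following is only a sketch: new_flags is a hypothetical helper name, not part of ocrmypdf, and it assumes a grep that supports -o (GNU and BSD grep do). It pulls the long options out of two usage texts and prints the ones that appear only in the second.

```shell
# Hypothetical helper: print long options found in usage text $2 but not in $1.
new_flags() {
  old_list=$(mktemp) && new_list=$(mktemp) || return 1
  # Extract every --long-option token, one per line, sorted and de-duplicated.
  printf '%s\n' "$1" | grep -oE -- '--[a-z][a-z0-9-]*' | LC_ALL=C sort -u > "$old_list"
  printf '%s\n' "$2" | grep -oE -- '--[a-z][a-z0-9-]*' | LC_ALL=C sort -u > "$new_list"
  # comm -13 keeps only the lines unique to the second file.
  LC_ALL=C comm -13 "$old_list" "$new_list"
  rm -f "$old_list" "$new_list"
}

# Possible use against the two binaries discussed in this thread:
#   new_flags "$(/usr/bin/ocrmypdf --help)" "$(ocrmypdf --help)"
```

An option such as --redo-ocr showing up in the output would suggest that the first usage text comes from an older release than the second.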

@jbarlow83
Collaborator

You could compare /usr/bin/ocrmypdf --version to ocrmypdf --version and also check find / -type f -name ocrmypdf to see where any installations are.
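
As a lighter alternative to searching the whole filesystem, a minimal sketch like the one below lists every executable with a given name on a PATH-style string, in lookup order. find_on_path is a hypothetical helper written for illustration, not an ocrmypdf or system command.

```shell
# Hypothetical helper: list each executable named $1 in the directories of a
# colon-separated list ($2, defaulting to the current PATH), in lookup order.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
find_on_path() (
  name=$1
  IFS=:
  set -f   # do not expand glob characters while splitting the list
  for dir in ${2:-$PATH}; do
    [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)

# The first line printed is the copy the shell would run; later lines are shadowed.
find_on_path ocrmypdf
```

If this prints more than one line, running each listed path with --version shows which release it is.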

@jbolda
Author

jbolda commented Jul 11, 2019

That makes a lot of sense. I completely forgot about the old version in the install instructions, and I now realize what you meant by context. Turns out WSL doesn't set up the user it creates in the sudoers group, so when running the command with sudo, it uses the root user's path instead of the user path that is configured in the install instructions: export PATH=$HOME/.local/bin:$PATH. So this came about by running the command that errored with sudo and getting the old version, and then immediately running the --help command without sudo and getting the new version.

Would you be interested in a quick doc PR to the Windows section clarifying this?
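
One way to see which directories drop out under sudo is to compare the two search paths directly. This is a sketch under assumptions: path_missing is a hypothetical helper, and the explanation that sudo substitutes its own search path (for example through a secure_path setting in /etc/sudoers) is a likely mechanism that was not confirmed in this thread.

```shell
# Hypothetical helper: print the entries of PATH-style string $1 that do not
# appear in PATH-style string $2. Runs in a subshell so IFS changes stay local.
path_missing() (
  IFS=:
  set -f   # do not expand glob characters while splitting the list
  for dir in $1; do
    case ":$2:" in
      *":$dir:"*) ;;               # present in both lists
      *) printf '%s\n' "$dir" ;;   # only in the first list
    esac
  done
)

# Possible use (needs sudo, so it is left commented out here):
#   path_missing "$PATH" "$(sudo printenv PATH)"
```

If $HOME/.local/bin is among the lines printed, a command run through sudo would not see the locally installed ocrmypdf and would fall back to the system copy.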

@jbolda closed this as completed Jul 11, 2019
@jbarlow83
Collaborator

If that's a documented difference for WSL compared to vanilla Ubuntu 18.04 then yes.

@jbolda
Author

jbolda commented Jul 22, 2019

I don't know if "difference" is really the appropriate term, but WSL requires you to set up a new user the first time you run it. This is a user separate from root, but still in the sudoers group. Running these commands on the mounted Windows filesystem requires higher privileges. I have mostly used *nix on VPSs and such, starting as root. Due to my lack of familiarity with the inner workings of sudo, I didn't realize it actually changed the user the command runs as (and in turn the path).

So my intent was just a quick note for Windows users in my situation to avoid the sudo gotcha. It would have been quite a bit more obvious had the command not been found at all, but having an older version in the system installation obscured the issue.

Up to you if you want it or not. Regardless, thanks again for your help.

@jbarlow83
Collaborator

I think a procedure along those lines would be helpful. I tried out WSL myself and added some procedure; I found it tricky to get consistent behavior. Feel free to expand it.

@jbolda
Author

jbolda commented Aug 13, 2019

Your updated procedure makes a lot of sense to me 👍

2 participants