KeyError: '/BitsPerComponent' #100

Closed
AEgit opened this issue Oct 13, 2016 · 4 comments

AEgit commented Oct 13, 2016

The following file produces the error "KeyError: '/BitsPerComponent'" when I try to OCR it:

https://app.box.com/s/o1cjmt6y6mn2aqiusmi8g2kjbgw8z5jo

My current workaround is to print the file to a new PDF, which ocrmypdf can then OCR without error.

ocrmypdf -v -l eng Cosgriff1969.pdf Cosgriff1969_ocr.pdf
  DEBUG - os.symlink(Cosgriff1969.pdf, /tmp/com.github.ocrmypdf.il6nhpa6/origin)

________________________________________
Tasks which will be run:


Task enters queue = 'ocrmypdf.triage'
  DEBUG - os.symlink(/tmp/com.github.ocrmypdf.il6nhpa6/origin, /tmp/com.github.ocrmypdf.il6nhpa6/origin.pdf)
Completed Task = 'ocrmypdf.triage'
Task enters queue = 'ocrmypdf.repair_pdf'
  DEBUG -



Original exception:

    Exception #1
      'builtins.KeyError('/BitsPerComponent')' raised in ...
       Task = def ocrmypdf.repair_pdf(...):
       Job  = [.../com.github.ocrmypdf.il6nhpa6/origin.pdf -> .../com.github.ocrmypdf.il6nhpa6/origin.repaired.pdf, <ocrmypdf.WrappedLogger>, [], <_thread.lock>]

    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/dist-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/usr/local/lib/python3.4/dist-packages/ruffus/task.py", line 567, in job_wrapper_io_files
        ret_val = user_defined_work_func(*params)
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/__main__.py", line 568, in repair_pdf
        pdfinfo.extend(pdf_get_all_pageinfo(output_file))
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 378, in pdf_get_all_pageinfo
        return [_pdf_get_pageinfo(infile, n) for n in range(pdf.numPages)]
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 378, in <listcomp>
        return [_pdf_get_pageinfo(infile, n) for n in range(pdf.numPages)]
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 361, in _pdf_get_pageinfo
        page, pageinfo, contentsinfo)]
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 360, in <listcomp>
        pageinfo['images'] = [im for im in _find_page_images(
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 314, in _find_page_images
        yield from _find_page_regular_images(page, pageinfo, contentsinfo)
      File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/pageinfo.py", line 247, in _find_page_regular_images
        image['bpc'] = pdfimage['/BitsPerComponent']
      File "/usr/local/lib/python3.4/dist-packages/PyPDF2/generic.py", line 516, in __getitem__
        return dict.__getitem__(self, key).getObject()
    KeyError: '/BitsPerComponent'
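
For reference, the last ocrmypdf frame in the traceback indexes the image dictionary directly with `pdfimage['/BitsPerComponent']`, which raises when the key is absent. The PDF specification makes that key optional for image masks (where the depth is always 1) and for JPXDecode images. The following is only a sketch of a defensive lookup under those assumptions, not the actual change made in ocrmypdf. The helper name `image_bits_per_component` is hypothetical, and plain dicts stand in for PyPDF2 dictionary objects, where indirect references may need to be resolved separately.

```python
def image_bits_per_component(pdfimage):
    """Return the bit depth of a PDF image dictionary, or None if unknown.

    Hypothetical helper for illustration; not part of ocrmypdf or PyPDF2.
    """
    # Image masks may omit /BitsPerComponent; their depth is always 1.
    if pdfimage.get('/ImageMask', False):
        return 1
    # Other images (e.g. JPXDecode) may also omit the key, so return None
    # instead of raising KeyError.
    return pdfimage.get('/BitsPerComponent')


# Plain dicts standing in for image dictionaries (made-up sample data).
regular = {'/Subtype': '/Image', '/BitsPerComponent': 8}
mask = {'/Subtype': '/Image', '/ImageMask': True}
jpx = {'/Subtype': '/Image', '/Filter': '/JPXDecode'}

for name, img in [('regular', regular), ('mask', mask), ('jpx', jpx)]:
    print(name, image_bits_per_component(img))
```

Returning `None` leaves it to the caller to decide how to treat an image of unknown depth, rather than aborting the whole page scan.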

jbarlow83 (Collaborator) commented

Fixed in v4.2.5.

I still got some unusual Ghostscript (9.19) errors of this form:

 ERROR -   ./base/gsicc_manage.c:1126: gsicc_open_search(): Could not find srgb.icc 
+ ./base/gsicc_manage.c:1024: gsicc_get_profile_handle_file(): Creation of ICC profile failed

Despite the errors, the output file seems to be okay. I don't know what to do about them, since the file Ghostscript is looking for exists.

Please let me know if you run into anything similar.

AEgit commented Oct 13, 2016

Thanks a lot! Once version 4.2.5 is available via pip3 I'll install it and report if I encounter similar issues.

AEgit commented Oct 15, 2016

I just tried the new version 4.2.5. I get the same Ghostscript errors as you but, as you mentioned, the output file seems to be fine, so I guess this issue can be considered solved.

AEgit closed this as completed Oct 15, 2016

AEgit commented Oct 15, 2016

Thanks again for this very helpful tool!
