DCT Decoding Error #101

Closed
admercs opened this issue Sep 16, 2022 · 8 comments
Labels: enhancement (New feature or request), needs example (needs PDF fail to prove the issue)

Comments

@admercs

admercs commented Sep 16, 2022

I'm getting the following error:

$ document.pdf
ERROR:root:Partially decoded. Filters applied: []
Traceback (most recent call last):
  File "/HOME/quicksand/lib/python3.6/site-packages/pdfreader/types/native.py", line 55, in apply_filter_multi
    binary = apply_filter(fname, binary, params)
  File "/HOME/quicksand/lib/python3.6/site-packages/pdfreader/filters/__init__.py", line 14, in apply_filter
    return decoder.decode(binary, params or {})
  File "/HOME/quicksand/lib/python3.6/site-packages/pdfreader/filters/dct.py", line 5, in decode
    raise NotImplementedError('DCTDecode')
NotImplementedError: DCTDecode

Any idea how to resolve it?

@maxpmaxp
Owner

maxpmaxp commented Nov 5, 2022

DCT decoder is not supported at this point. Feel free to contribute.

@maxpmaxp maxpmaxp added the enhancement New feature or request label Nov 5, 2022
@maxpmaxp
Owner

maxpmaxp commented Nov 5, 2022

@admercs can you share the file please? I can try to add the decoder.

@admercs
Author

admercs commented Nov 5, 2022 via email

@canbolukbas

What do you think about returning the same bytes to the caller instead of raising an error?

The pypdf library does this: https://github.com/py-pdf/pypdf/blob/e92b20e0b35e4feb5a2a7f347de7a4c3f713011a/pypdf/filters.py#L510

LMK if you want me to create the MR, I'd be happy to contribute.
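
A minimal sketch of the pass-through idea, assuming a filter module exposes a `decode(binary, params)` function as the traceback above suggests. The body is illustrative only and is not the actual pdfreader or pypdf code:

```python
def decode(binary: bytes, params: dict) -> bytes:
    """Hypothetical pass-through DCTDecode filter.

    DCT-encoded stream data is JPEG data, so the bytes are handed back
    unchanged and the caller can pass them to an image library.
    """
    return binary


# Usage with a fake JPEG-like payload (start marker, filler, end marker).
payload = b"\xff\xd8" + b"\x00" * 4 + b"\xff\xd9"
result = decode(payload, {})
```

The trade-off is that the caller gets compressed JPEG bytes rather than decoded pixel data, so the meaning of "decoded" differs from other filters.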

@maxpmaxp
Owner

maxpmaxp commented May 3, 2024

@canbolukbas

Raw stream data can be accessed directly on any Stream object: use obj.stream instead of obj.filtered. See

self.dictionary = info_dict
self.stream = binary_stream

This should work for any Image object, as technically it's a descendant of Stream.

As for the suggestion to return raw data with unimplemented filters - I see pros and cons. Ideally we need to have this decoder implemented. Feel free to create a PR and contribute.
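
One way a caller could combine the two attributes is sketched below. It assumes only that the object exposes `.stream` (raw bytes) and `.filtered` (decoded bytes) as described above; the helper name `image_bytes` and the stand-in objects are hypothetical, and the JPEG check relies on the fact that JPEG data starts with the bytes `FF D8`:

```python
from types import SimpleNamespace

JPEG_SOI = b"\xff\xd8"  # JPEG data starts with this Start Of Image marker


def image_bytes(obj) -> bytes:
    """Return raw bytes for JPEG-looking streams, decoded bytes otherwise.

    `obj` is assumed to expose `.stream` (raw) and `.filtered` (decoded).
    """
    raw = obj.stream
    if raw.startswith(JPEG_SOI):
        return raw  # already JPEG data; no filter needs to be applied
    return obj.filtered


# Stand-ins for real Stream objects, for illustration only.
jpeg_like = SimpleNamespace(stream=b"\xff\xd8\xff\xe0data", filtered=None)
other = SimpleNamespace(stream=b"x\x9c...", filtered=b"decoded")
```

With these stand-ins, `image_bytes(jpeg_like)` returns the raw stream and `image_bytes(other)` falls through to the decoded data.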

@maxpmaxp maxpmaxp added the needs example needs PDF fail to prove the issue label May 3, 2024
@maxpmaxp
Owner

maxpmaxp commented May 3, 2024

@canbolukbas can you also attach your file please? I don't have PDFs with DCT streams. Thanks!

@maxpmaxp
Owner

maxpmaxp commented May 3, 2024

Just realized that it's a very trivial patch. It's on master. Support was added in #132.

@maxpmaxp maxpmaxp closed this as completed May 3, 2024
@mara004

mara004 commented May 3, 2024

> can you also attach your file please? I don't have PDFs with DCT streams.

For what it's worth, DCT corresponds to JPEG, so it should be trivial to create a sample. Just run img2pdf on an arbitrary JPEG image from the web, or drag one into LibreOffice and export to PDF.
If you have some PDFs on your disk, it's quite likely that at least one contains DCT streams, as DCT is basically the most common PDF image encoding.
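
To find such a file among existing PDFs, a rough standard-library sketch is to search each file's bytes for the `/DCTDecode` filter name. This is a heuristic, not a PDF parser (it can miss names written with escapes and can match unrelated text), and the function name `pdfs_with_dct` is made up for illustration:

```python
import tempfile
from pathlib import Path


def pdfs_with_dct(folder):
    """Return PDF files under `folder` whose bytes mention /DCTDecode."""
    hits = []
    for path in sorted(Path(folder).rglob("*.pdf")):
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file; skip it
        if b"/DCTDecode" in data:
            hits.append(path)
    return hits


# Demo on throwaway files (fake content, not valid PDFs).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "with_jpeg.pdf").write_bytes(b"%PDF-1.4 << /Filter /DCTDecode >>")
    Path(tmp, "text_only.pdf").write_bytes(b"%PDF-1.4 << /Filter /FlateDecode >>")
    found = [p.name for p in pdfs_with_dct(tmp)]
```

Pointing `pdfs_with_dct` at a real documents folder should list candidate files to attach as a test case.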
