Enhancement: Support for reading PDFs with partial DRM (AES) - include PyCryptodome dependency #28

Open
montge opened this issue Feb 16, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@montge

montge commented Feb 16, 2024

Description
Reading local PDF files that carry partial DRM restrictions (for example, files that allow Printing, Content Copying, and Content Copying for Accessibility) fails with the following error message: "Failed to load file <filename.pdf> with error: PyCryptodome is required for AES algorithm. Skipping..." The failure occurs because the PyCryptodome library, which is needed to handle the AES encryption used by these DRM features, is not installed.

Expected Behavior
The project should be able to read PDF files, including those with partial DRM capabilities, without raising errors about missing cryptographic support. Users should be able to process such PDFs for legitimate use cases, such as reading text for accessibility purposes, where the use complies with what the DRM allows. If a restriction does prevent reading the file, an error should still be raised stating that the document's DRM permissions do not allow it to be read.
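
The permission check described above could look roughly like the following. This is a minimal sketch, not the project's code: the bit positions follow the user access permission table of the PDF 1.7 standard security handler, while the names `PERMISSION_BITS`, `decode_permissions`, and `ensure_text_extraction_allowed` are hypothetical, and how the project would obtain the `/P` integer from the encryption dictionary is left as an assumption.

```python
# Hypothetical helper names; only the bit positions come from the PDF 1.7
# specification (bits are 1-indexed there, so bit n maps to 1 << (n - 1)).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Map a /P integer (often negative, as a signed 32-bit value) to named flags."""
    return {
        name: bool(p_value & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def ensure_text_extraction_allowed(p_value: int) -> None:
    """Raise if neither content copying nor accessibility extraction is permitted."""
    perms = decode_permissions(p_value)
    if not (perms["copy"] or perms["extract_for_accessibility"]):
        raise PermissionError(
            "The document's DRM permissions do not allow reading its content."
        )


# The case from this issue: printing, copying, and accessibility copying allowed.
example_p = (1 << 2) | (1 << 4) | (1 << 9)
ensure_text_extraction_allowed(example_p)  # does not raise
```

With a check like this, a missing cryptographic backend and a genuine DRM restriction would surface as two distinct errors rather than one generic load failure.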

Actual Behavior
When attempting to read a PDF with partial DRM capabilities, the process is aborted because the PyCryptodome dependency is missing, and the file cannot be read or processed further.

Steps to Reproduce
1. Attempt to read a PDF file with partial DRM capabilities using the project.
2. Observe the error message indicating the absence of PyCryptodome for AES algorithm support.

Suggested Enhancement
To resolve this issue and support a wider range of PDF files, I suggest including PyCryptodome as a dependency/requirement within the project's Python implementation.
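
Until the dependency is bundled, a preflight check along these lines could make the failure easier to diagnose. It is a sketch under assumptions: the function names are hypothetical, and it relies only on the fact that PyCryptodome installs under the import name `Crypto` (the legacy PyCrypto package uses the same name, so a positive result is a hint rather than a guarantee).

```python
import importlib.util
from typing import Optional


def aes_backend_available() -> bool:
    """Return True if a top-level `Crypto` package (PyCryptodome) is importable."""
    return importlib.util.find_spec("Crypto") is not None


def missing_dependency_hint() -> Optional[str]:
    """Return an actionable message when the AES backend is missing, else None."""
    if aes_backend_available():
        return None
    return (
        "AES-encrypted PDFs cannot be read because PyCryptodome is not "
        "installed. Try: pip install pycryptodome"
    )


hint = missing_dependency_hint()
if hint is not None:
    print(hint)
```

Bundling the library would amount to adding `pycryptodome` to whatever requirements list the project installs from; the exact file is not stated in this issue.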

Additional Context
The ability to read PDFs with partial DRM is crucial for legitimate use cases such as accessibility and content analysis. In these cases the user is not infringing on copyright or circumventing DRM protections, but accessing the content in a manner the DRM allows (e.g., reading for visually impaired users), or using it where the legally required references are provided in their own document.

@Jason-XII

I'm facing this problem too!

@dayjobtitus

Same situation and I agree with the statement in the "Additional Context" section above.

@kedarpotdar-nv
Collaborator

@montge thanks, we will consider this feature request.

kedarpotdar-nv added the enhancement label on Feb 26, 2024
@sarsharoid

@montge any ETA for this requirement?
