deleted letters from PDF even if those letters were present in the source document #27

Open
jakubsiast opened this issue Mar 29, 2021 · 0 comments

Comments

@jakubsiast

I used pdf-redactor to change some text in a PDF file, but in one part of the PDF I lost all the 'n' characters.
The affected text was not the text I intended to change. In "class TextToken", the "str(self)" function treated it as unchanged text, i.e., it satisfied the condition "if self.value == self.original_value:". Nevertheless, the output was altered. I tracked the problem down to the call "PdfString.from_bytes(...)" on line 379 of pdf_redactor.py:
# If unchanged, return the raw original value without decoding/encoding.
return PdfString.from_bytes(self.raw_original_value)
Forcing the encoding of the unchanged TextToken to 'hex' fixed the issue for me:
return PdfString.from_bytes(self.raw_original_value, bytes_encoding='hex')
This simple change helped in my case, but I do not know whether it holds in general. Could you try it and, if it works, push the fix to your code?
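
For context, here is a minimal standard-library sketch of why a hex-encoded PDF string is byte-safe. It assumes that the 'hex' option produces the PDF hex string form (two hex digits per byte between angle brackets), where no escaping is involved. It is not pdfrw's or pdf-redactor's implementation, and the helper names to_pdf_hex_string and from_pdf_hex_string are hypothetical.

```python
import binascii


def to_pdf_hex_string(raw: bytes) -> str:
    """Encode raw bytes as a PDF hex string: two hex digits per byte in <...>."""
    return "<" + binascii.hexlify(raw).decode("ascii") + ">"


def from_pdf_hex_string(token: str) -> bytes:
    """Decode a PDF hex string back into the raw bytes it represents."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a PDF hex string")
    # Whitespace between the digits is ignored.
    digits = "".join(token[1:-1].split())
    # An odd number of digits means the last digit is followed by an implied 0.
    if len(digits) % 2:
        digits += "0"
    return binascii.unhexlify(digits)


# Bytes that would need escaping in a literal (...) string survive unchanged.
raw = b"banana (\\n) \x00\xff"
assert from_pdf_hex_string(to_pdf_hex_string(raw)) == raw
```

Because every byte maps to exactly two hex digits, parentheses, backslashes, and non-ASCII bytes cannot be misread on the way back. This may be why forcing 'hex' avoided the lost characters here, though I have not confirmed the root cause.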
