deleted letters from PDF even if those letters were present in the source document #27

jakubsiast · 2021-03-29T12:47:56Z

I used pdf-redactor to change some text in a PDF file, but in part of the PDF all the 'n' characters were lost.
The affected text was not the text I intended to change. In "class TextToken", the "str(self)" function handled it as unchanged text, i.e., it passed the condition "if self.value == self.original_value:". Nevertheless, it came out changed. I traced the problem to the "PdfString.from_bytes(...)" call on line 379 of pdf_redactor.py:
# If unchanged, return the raw original value without decoding/encoding.
return PdfString.from_bytes(self.raw_original_value)
By forcing the encoding of the unchanged TextToken to 'hex' I managed to fix the issue:
return PdfString.from_bytes(self.raw_original_value, bytes_encoding = 'hex')
This simple change helped in my case, but I do not know whether it works in general. Could you try it and, if it holds up, push the fix to your code?
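
To illustrate why hex encoding might sidestep the problem, here is a minimal, self-contained sketch. It is not the actual pdfrw or pdf-redactor code, and the helper names `to_pdf_hex_string` and `from_pdf_hex_string` are hypothetical. The assumption is that the lost characters come from backslash-escape handling in literal strings. A PDF hexadecimal string writes every byte as two hex digits between angle brackets, so no escape sequences are involved at all:

```python
def to_pdf_hex_string(raw: bytes) -> str:
    """Encode raw bytes as a PDF hexadecimal string, e.g. b"no" -> "<6e6f>"."""
    return "<" + raw.hex() + ">"


def from_pdf_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string back into the raw bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a PDF hex string")
    # Whitespace between the angle brackets is ignored.
    digits = "".join(token[1:-1].split())
    # If the final digit is missing, it is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


# Round trip: a backslash followed by 'n' survives byte for byte,
# because nothing in the hex form can be read as an escape sequence.
raw = b"unchanged \\n text with n"
encoded = to_pdf_hex_string(raw)
assert from_pdf_hex_string(encoded) == raw
```

If the round trip through a literal string does not preserve the bytes in the same way, that would point to the escape handling as the cause.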
