[#3546] Fix issue applying censor rule to binary data #5474
@garethrees this is green, so it should fix the exception, but I'm not 100% sure this is the right change to make. For instance, why is the regexp being converted to ASCII-8BIT in the first place? If we didn't convert it, we could probably do:

```ruby
# Match the rule in the data's own encoding, then replace each character
# matched by single_char_regexp with 'x'
binary_to_censor.gsub(to_replace(binary_to_censor.encoding)) do |match|
  match.gsub(single_char_regexp, 'x')
end
```
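To illustrate why the encodings matter, here is a minimal Ruby sketch of how `String#gsub` behaves when the regexp and the data disagree about encoding. The strings are made up (not real attachment data), `outcome` is a hypothetical helper, and the file is assumed to be UTF-8, which is Ruby's default source encoding.

```ruby
# Illustrative strings only; not real attachment data. Assumes a UTF-8 source file.

# Hypothetical helper: run the block and report either :ok or the error class.
def outcome
  yield
  :ok
rescue Encoding::CompatibilityError, ArgumentError => e
  e.class
end

utf8_rule = /José/                # non-ASCII literal, so the regexp is fixed to UTF-8
binary    = 'Name: José Bloggs'.b # the same bytes, labelled ASCII-8BIT

# 1. A UTF-8 regexp cannot be matched against binary data containing high bytes.
r1 = outcome { binary.gsub(utf8_rule, 'x') }

# 2. Data labelled UTF-8 but containing invalid UTF-8 bytes cannot be matched at all.
broken = ('Name: José Bloggs '.b + "\xFF".b).force_encoding(Encoding::UTF_8)
r2 = outcome { broken.gsub(/Bloggs/, 'x') }

# 3. Converting the rule to ASCII-8BIT lets it match the binary copy byte for byte.
r3 = outcome { binary.gsub(Regexp.new('José'.b), 'x') }

p r1 # => Encoding::CompatibilityError
p r2 # => ArgumentError
p r3 # => :ok
```

Case 2 is one plausible reason for converting to ASCII-8BIT: binary data is never "invalid" in that encoding, so the match cannot fail on stray bytes.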
Looks great! Minor comments but happy to merge as-is or you can fix and then merge – no need for re-review.
I've found another case that I think we may want to address, but have created a new issue for that to avoid blocking this getting out.
Also fixes #4064 (updated PR description)
The original implementation assumed the binary would be in ASCII-8BIT encoding, but this is not always the case. I'm unsure at this stage whether something has changed, such as how uncompressed data is extracted using the pdftk tool, or whether this case has never been handled. This change forces the data into ASCII-8BIT and back into its original encoding, regardless of what that encoding initially was.
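A minimal Ruby sketch of the approach described in this commit message, assuming the rule is plain text and the replacement is a single-byte character. `censor_via_binary` is a hypothetical helper, not the project's actual censor rule code: it relabels the data's bytes as ASCII-8BIT, applies the rule, then restores the original encoding label.

```ruby
# Hypothetical helper: censor `text` inside `data` whatever encoding `data` is labelled with.
# Assumes `replacement` is a single-byte character such as 'x'.
def censor_via_binary(data, text, replacement = 'x')
  original_encoding = data.encoding
  # Reinterpret the same bytes as ASCII-8BIT; nothing is transcoded.
  bytes = data.dup.force_encoding(Encoding::BINARY)
  # Build the rule in ASCII-8BIT too, so the regexp and data encodings agree.
  pattern = Regexp.new(Regexp.escape(text).b)
  # One replacement byte per matched byte keeps the overall length unchanged.
  censored = bytes.gsub(pattern) { |match| replacement.b * match.bytesize }
  # Put the original encoding label back on the censored bytes.
  censored.force_encoding(original_encoding)
end

# Made-up input: labelled UTF-8 but ending in bytes that are not valid UTF-8.
data = ('Name: José Bloggs '.b + "\xFF\xFE".b).force_encoding(Encoding::UTF_8)
result = censor_via_binary(data, 'José')
```

Because `force_encoding` only changes the label and never transcodes, the censored bytes come back exactly as long as the input, even when the input is not valid in its declared encoding.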
Before merging, as this will expose more attachments (instead of erroring), we want to check that existing censor rules apply correctly in these new cases.
#5821 unblocks this.
Relevant issue(s)
Fixes #3546
Fixes #4064
What does this do?
The original implementation assumed the binary would be in ASCII-8BIT
encoding, but this is not always the case. I'm unsure at this stage
whether something has changed, such as how uncompressed data is extracted
using the pdftk tool, or whether this case has never been handled.
This change forces the censor rule into the same encoding as the data and
ensures the correct number of replacement bytes is used.
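The two points above can be sketched in Ruby as follows. `censor_in_data_encoding` is a hypothetical helper, not the code in this PR: it relabels the rule text with the data's encoding before building the regexp, and replaces each match with one 'x' per byte rather than per character.

```ruby
# Hypothetical helper: build the rule's regexp in the data's own encoding
# and replace every matched byte (not character) with `replacement`.
# Assumes `replacement` is a single-byte character such as 'x'.
def censor_in_data_encoding(data, text, replacement = 'x')
  # Relabel the escaped rule text with the data's encoding; the bytes are unchanged.
  source = Regexp.escape(text).dup.force_encoding(data.encoding)
  pattern = Regexp.new(source)
  data.gsub(pattern) do |match|
    # match.length counts characters; match.bytesize counts bytes.
    # Counting bytes keeps the censored data the same size as the original.
    replacement * match.bytesize
  end
end

utf8   = 'Name: José Bloggs'
binary = utf8.b
censored_utf8   = censor_in_data_encoding(utf8, 'José')
censored_binary = censor_in_data_encoding(binary, 'José')
```

Replacing per character (`'x' * match.length`) would turn the five-byte `José` into four x's and shift every following byte by one. This sketch also assumes the data is valid in its declared encoding: UTF-8-labelled data containing invalid bytes would still raise `ArgumentError` when matched.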
Why was this needed?
An increasing number of exception emails.