[#3546] Fix issue applying censor rule to binary data #5474

gbp · 2019-11-21T12:52:44Z

Relevant issue(s)

What does this do?

The original implementation assumed the binary would be in ASCII-8BIT
encoding but this is not always the case. I'm unsure at this stage if
something has changed, such as how uncompressed data is extracted by
the pdftk tool, or if this case has never been handled.

This change forces the censor rule into the same encoding as the data and
ensures the correct number of replacement bytes is used.
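As a rough illustration of that idea (not the code in this PR), the sketch below tags the rule text with the data's encoding before compiling the regexp, then replaces each match with one `x` per byte so the overall byte length is preserved. The `censor_bytes` helper and the sample strings are hypothetical.

```ruby
# Hypothetical helper for illustration; not Alaveteli's CensorRule code.
def censor_bytes(data, text)
  # Tag a copy of the rule text with the data's encoding so the regexp
  # and the string it is matched against are compatible.
  rule = text.dup.force_encoding(data.encoding)
  regexp = Regexp.new(Regexp.escape(rule))

  # Replace each match with one 'x' per byte, so the byte length of the
  # data (and any offsets inside the binary) stays the same.
  data.gsub(regexp) { |match| 'x' * match.bytesize }
end

utf8_data = "name: Zo\u00EB\n"                        # tagged UTF-8
binary_data = utf8_data.dup.force_encoding(Encoding::BINARY)

censored_utf8 = censor_bytes(utf8_data, "Zo\u00EB")
censored_binary = censor_bytes(binary_data, "Zo\u00EB")
```

Both calls return a string in the same encoding as their input, with the same number of bytes as the original.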

Why was this needed?

An increasing number of exception emails.

spec/models/censor_rule_spec.rb

gbp · 2019-11-22T09:20:49Z

@garethrees this is green so should fix the exception, but I'm not 100% sure this is the right change to make. For instance, why is the regexp being converted to ASCII-8BIT in the first place?

If we didn't convert then we probably could do:

    binary_to_censor.gsub(to_replace(binary_to_censor.encoding)) do |match|
      match.gsub(single_char_regexp, 'x')
    end
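One thing worth checking with that approach: replacing per character rather than per byte gives different lengths once the data is tagged as a multibyte encoding. The snippet below is only a sketch; the definition of `single_char_regexp` here is an assumption, not the one in `CensorRule`.

```ruby
match = "Zo\u00EB"                # 3 characters, 4 bytes in UTF-8

# Assumed definition of single_char_regexp: any single character.
single_char_regexp = /./m

# In UTF-8 the two-byte character collapses to a single 'x'.
per_character = match.gsub(single_char_regexp, 'x')

# One 'x' per byte keeps the byte length unchanged.
per_byte = 'x' * match.bytesize

# Tagged as binary, every byte is its own character, so a per-character
# replacement is already a per-byte replacement.
binary_match = match.b
per_binary_character = binary_match.gsub(/./mn, 'x')
```

So if the regexp is left in the data's own encoding, the replacement would need to count bytes rather than characters to keep the binary the same size.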

app/models/censor_rule.rb

lib/alaveteli_text_masker.rb

garethrees

Looks great! Minor comments but happy to merge as-is or you can fix and then merge – no need for re-review.

I've found another case that I think we may want to address, but have created a new issue for that to avoid blocking this getting out.

app/models/censor_rule.rb

spec/lib/alaveteli_text_masker_spec.rb

garethrees · 2019-12-11T12:21:10Z

Also fixes #4064 (updated PR description)

The original implementation assumed the binary would be in ASCII-8BIT encoding but this is not always the case. I'm unsure at this stage if something has changed, such as how uncompressed data is extracted by the pdftk tool, or if this case has never been handled. This change forces the data into ASCII-8BIT and back into the original encoding regardless of what it initially was.
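A minimal sketch of that round trip, assuming a hypothetical `with_binary_encoding` helper (the name and the sample data are illustrative, not taken from the PR):

```ruby
# Hypothetical helper: run a block against a binary-tagged copy of the
# data, then restore the original encoding on the result.
def with_binary_encoding(data)
  original_encoding = data.encoding
  binary = data.dup.force_encoding(Encoding::BINARY)
  result = yield(binary)
  result.force_encoding(original_encoding)
end

# A rule compiled from binary-tagged text with non-ASCII bytes.
rule = Regexp.new(Regexp.escape("Zo\u00EB".b))
utf8_data = "name: Zo\u00EB\n"

# Matching the binary regexp directly against UTF-8 data that contains
# non-ASCII characters raises Encoding::CompatibilityError.
direct_match_failed = false
begin
  utf8_data =~ rule
rescue Encoding::CompatibilityError
  direct_match_failed = true
end

# With the round trip, the match runs on binary data and the result is
# returned in the encoding the caller started with.
censored = with_binary_encoding(utf8_data) do |binary|
  binary.gsub(rule) { |match| 'x' * match.bytesize }
end
```

Here `force_encoding` only changes the encoding tag and leaves the bytes untouched, which is why the result can be safely re-tagged afterwards as long as the replacement bytes are themselves valid in the original encoding.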

gbp · 2020-01-29T10:14:28Z

Before merging, as this will expose more attachments (instead of erroring), we want to check that existing censor rules apply correctly in these new cases.

gbp · 2020-07-20T11:03:40Z

#5821 unblocks this.

gbp added the 3 - current sprint label Nov 21, 2019

gbp self-assigned this Nov 21, 2019

gbp added this to In progress in transparency-current-sprint via automation Nov 21, 2019

houndci-bot reviewed Nov 21, 2019

spec/models/censor_rule_spec.rb (outdated)

mysociety-pusher force-pushed the 3546-encoding-compatibility-error branch from 37557fd to c713a21 Compare November 21, 2019 14:55

gbp changed the title ~~[WiP] [#3546] Fix issue applying censor rule to binary data~~ [#3546] Fix issue applying censor rule to binary data Nov 22, 2019

gbp marked this pull request as ready for review November 22, 2019 09:16

gbp requested a review from garethrees November 22, 2019 09:16

garethrees added awaiting-review and removed 3 - current sprint labels Nov 27, 2019

garethrees assigned garethrees and unassigned gbp Nov 27, 2019

garethrees reviewed Nov 28, 2019

app/models/censor_rule.rb (outdated)

garethrees assigned gbp and unassigned garethrees Nov 28, 2019

garethrees added 3 - current sprint and removed awaiting-review labels Nov 28, 2019

houndci-bot reviewed Nov 29, 2019

lib/alaveteli_text_masker.rb (outdated)

gbp added awaiting-review and removed 3 - current sprint labels Dec 2, 2019

gbp moved this from In progress to Awaiting review in transparency-current-sprint Dec 2, 2019

gbp assigned garethrees and unassigned gbp Dec 2, 2019

garethrees mentioned this pull request Dec 11, 2019

Apply masks to UTF-8 binary data #5497

Open

garethrees reviewed Dec 11, 2019

app/models/censor_rule.rb (outdated)

spec/lib/alaveteli_text_masker_spec.rb

garethrees assigned gbp and unassigned garethrees Dec 11, 2019

garethrees added awaiting-deploy and removed awaiting-review labels Dec 11, 2019

garethrees moved this from Awaiting review to Reviewer approved in transparency-current-sprint Dec 11, 2019

garethrees approved these changes Dec 11, 2019

gbp force-pushed the 3546-encoding-compatibility-error branch from 1ccfcab to cfe9033 Compare December 16, 2019 09:24

gbp force-pushed the 3546-encoding-compatibility-error branch from cfe9033 to e7f0b30 Compare December 16, 2019 13:58

gbp added the has-blockers label Jan 29, 2020

garethrees mentioned this pull request Jul 14, 2020

[#3546] Identify broken binary censor rules #5821

Merged

gbp removed the has-blockers label Jul 20, 2020

gbp merged commit 5bf0d01 into develop Jul 20, 2020

transparency-current-sprint automation moved this from Reviewer approved to Done Jul 20, 2020
