Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: move scrubbing logger to logx #1319

Merged
merged 1 commit into from
Sep 27, 2023
Merged

refactor: move scrubbing logger to logx #1319

merged 1 commit into from
Sep 27, 2023

Conversation

bassosimone
Copy link
Contributor

@bassosimone bassosimone commented Sep 27, 2023

This diff is yak shaving for a subsequent diff that attempts to mitigate potential IP addresses leaks caused by using happy eyeballs more aggressively in the codebase. The general idea is that we previously had a netxlite implementation that gave IPv4 preference over IPv6, meaning that we would end up using IPv4 in most cases and very rarely IPv6. As part of my work to make beacons possible, I have introduced happy eyeballs, which means we may sometimes use IPv4 instead of IPv6 if we adopt happy eyeballs more widely. This fact makes it more likely that we include the IPv6 address of a probe when we know its IPv4 address or the other way around into measurements. To mitigate this possible issue before it actually is possible (note that I have not yet changed how we discover the probe IP), I am going to proactively modify the serialization of HTTP bodies and headers to scrub IP endpoints unconditionally. However, to do this, I need to decouple the scrubber package from model such that internal/model/archival.go can use the scrubber package.

Part of ooni/probe#2531

This diff is yak shaving for a subsequent diff that attempts to
mitigate potential IP addresses leaks caused by using happy eyeballs
more aggressively in the codebase. The general idea is that we
previously had a netxlite implementation that gave IPv4 preference
over IPv6, meaning that we would end up using IPv4 in most cases
and very rarely IPv6. As part of my work to make beacons possible,
I have introduced happy eyeballs, which means we may sometimes
use IPv4 instead of IPv6. This fact makes it more likely that we
include the IPv6 address of a probe when we know its IPv4 address
or the other way around into measurements. To mitigate this possible
issue before it actually is possible (note that I have not yet
changed how we discover the probe IP), I am going to proactively
modify the serialization of HTTP bodies and headers to scrub. However,
to do this, I need to decouple the scrubber package from model such
that internal/model/archival.go can use the scrubber package.

Part of ooni/probe#2531
@bassosimone bassosimone merged commit 4a86c54 into master Sep 27, 2023
6 of 8 checks passed
@bassosimone bassosimone deleted the issue/2531 branch September 27, 2023 16:43
bassosimone added a commit that referenced this pull request Sep 28, 2023
There are cases where we know we have binary data in input and we
want the output to be binary data using the dictionary encoding like
`{"format":"base64","data":"..."}`.

Such cases are, for example, DNS messages bodies.

There is no need for us to pass through the MaybeArchivalBinaryData
in such cases. Additionally, this makes MaybeArchivalBinaryData a bit
more complex than it probably needs to be.

What's more, ArchivalBinaryData is for when you do not require to
scrub the data. I want to introduce new data types that automatically
perform scrubbing when they're used.

But this puts even more pressure on MaybeArchivalBinaryData, and
hence this commit, to start differentiating between:

1. always binary data vs maybe binary data

2. no scrubbing requires vs scrubbing required

The rationale for doing this set of changes has been explained in
#1319.

The reference issue is ooni/probe#2531.

For now, this commit just adds the new type and tests for it
without using the type, which we'll do later once we have added
better marshal/unmarshal testing for the interested types.
bassosimone added a commit that referenced this pull request Sep 28, 2023
There are cases where we know we have binary data in input and we want
the output to be binary data using the dictionary encoding like
`{"format":"base64","data":"..."}`.

Such cases are, for example, DNS messages bodies.

There is no need for us to pass through the MaybeArchivalBinaryData in
such cases. Additionally, this makes MaybeArchivalBinaryData a bit more
complex than it probably needs to be.

What's more, ArchivalBinaryData is for when you do not require to scrub
the data. I want to introduce new data types that automatically perform
scrubbing when they're used.

But this puts even more pressure on MaybeArchivalBinaryData, and hence
this commit, to start differentiating between:

1. always binary data vs maybe binary data

2. no scrubbing required vs scrubbing required

The rationale for doing this set of changes has been explained in
#1319.

The reference issue is ooni/probe#2531.

For now, this commit just adds the new type and tests for it without
using the type, which we'll do later once we have added better
marshal/unmarshal testing for the interested types.
bassosimone added a commit that referenced this pull request Sep 28, 2023
I am about to modify this structure to unconditionally use ArchivalBinaryData
rather than ArchivalMaybeBinaryData, following the plan that I explained in
#1319.

Before doing that, I want MORE test coverage for ArchivalTLSOrQUICHandshakeResult.

Part of ooni/probe#2531
bassosimone added a commit that referenced this pull request Sep 28, 2023
I am about to modify this structure to unconditionally use
ArchivalBinaryData rather than ArchivalMaybeBinaryData, following the
plan that I explained in #1319.

Before doing that, I want MORE test coverage for
ArchivalTLSOrQUICHandshakeResult.

Part of ooni/probe#2531
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
This diff is yak shaving for a subsequent diff that attempts to mitigate
potential IP addresses leaks caused by using happy eyeballs more
aggressively in the codebase. The general idea is that we previously had
a netxlite implementation that gave IPv4 preference over IPv6, meaning
that we would end up using IPv4 in most cases and very rarely IPv6. As
part of my work to make beacons possible, I have introduced happy
eyeballs, which means we may sometimes use IPv4 instead of IPv6 if we
adopt happy eyeballs more widely. This fact makes it more likely that we
include the IPv6 address of a probe when we know its IPv4 address or the
other way around into measurements. To mitigate this possible issue
before it actually is possible (note that I have not yet changed how we
discover the probe IP), I am going to proactively modify the
serialization of HTTP bodies and headers to scrub IP endpoints
unconditionally. However, to do this, I need to decouple the scrubber
package from model such that internal/model/archival.go can use the
scrubber package.

Part of ooni/probe#2531
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
There are cases where we know we have binary data in input and we want
the output to be binary data using the dictionary encoding like
`{"format":"base64","data":"..."}`.

Such cases are, for example, DNS messages bodies.

There is no need for us to pass through the MaybeArchivalBinaryData in
such cases. Additionally, this makes MaybeArchivalBinaryData a bit more
complex than it probably needs to be.

What's more, ArchivalBinaryData is for when you do not require to scrub
the data. I want to introduce new data types that automatically perform
scrubbing when they're used.

But this puts even more pressure on MaybeArchivalBinaryData, and hence
this commit, to start differentiating between:

1. always binary data vs maybe binary data

2. no scrubbing required vs scrubbing required

The rationale for doing this set of changes has been explained in
ooni#1319.

The reference issue is ooni/probe#2531.

For now, this commit just adds the new type and tests for it without
using the type, which we'll do later once we have added better
marshal/unmarshal testing for the interested types.
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
I am about to modify this structure to unconditionally use
ArchivalBinaryData rather than ArchivalMaybeBinaryData, following the
plan that I explained in ooni#1319.

Before doing that, I want MORE test coverage for
ArchivalTLSOrQUICHandshakeResult.

Part of ooni/probe#2531
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant