Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RiseupVPN tests improvements #1125

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

cyBerta
Copy link
Contributor

@cyBerta cyBerta commented Mar 20, 2023

Checklist

Description

This MR aims to improve the data quality of RiseupVPN tests by reduce the amount of false positives due to server side issues.
Major changes are:

  • API failures are reported, but not necessarily a reason to stop the test early
  • invalid CA's are indicated but not considered a failure and the tests will continue without TLS verification
  • a blocked API will be indicated as well, but tried to be circumvented with a Tor+Snowflake tunnel.
  • the API is only considered blocked if the main configuration end point is unreachable, RiseupVPN has multiple ways to get around other other API blockages or server side misbehavior (targets riseupvpn: https://api.black.riseup.net:9001/json: connection_refused probe#1842 (comment))
  • overloaded gateways are not tested anymore
  • only a subset of gateways are tested

@cyBerta cyBerta changed the title Riseup tests improvements RiseupVPN tests improvements Mar 20, 2023
@cyBerta
Copy link
Contributor Author

cyBerta commented Mar 20, 2023

Hm... the coverage CI action wasn't failing before, when I played with the code on my personal github repo and filed a MR to myself.

From the above failing report - RiseupVPN's coverage:
github.com/ooni/probe-cli/v3/internal/experiment/riseupvpn 0.684s coverage: 99.4% of statements

@cyBerta cyBerta marked this pull request as draft March 21, 2023 08:55
@cyBerta cyBerta marked this pull request as ready for review March 21, 2023 17:31
@cyBerta
Copy link
Contributor Author

cyBerta commented Apr 4, 2023

Hi @bassosimone @hellais, is there anything I could help with to support the review process?

@cyBerta
Copy link
Contributor Author

cyBerta commented Jun 29, 2023

@bassosimone I've rebased the MR to that latest master branch. I'm sure you know that: to test the MR, you'll need to build mini-ooni. proble-cli tests will fail without an redirection in go.mod or until the changes landed on master. We would love to see this MR getting in soon, it will help us a lot with improving measurements for LEAP VPN / RiseupVPN.

@bassosimone
Copy link
Member

@cyBerta I will take care of doing a review of this MR! Thank you for updating riseupvpn!

@bassosimone
Copy link
Member

Thank you for your patience @cyBerta! I am currently finishing the review, running tests, and generally think about this experiment and how we can move it forward! I estimate I'll publish the review next week. 🙏

Copy link
Member

@bassosimone bassosimone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for improving the riseupvpn experiment! I started a detailed code review and tested the old and the new implementations side by side for a few hours. I did not finish my review because I realized that there was a more fundamental issue that we needed to discuss.

While testing I noticed that fetching eip-service.json and the geo service JSON has been very unreliable across my testing. It happened to me several times that the connection timed out when trying to connect to api.black.riseup.net. Your diff calls these APIs at lines 249 and 255 respectively. You process the results by calling the UpdateProviderAPITestKeys method on line 264. This method sets the APIStatus as blocked on line 114 if and only if the URL was that of the eip-service.json. In turn, this causes the whole measurement to be flagged as anomaly on line 493. If I was a user and I had run and submitted those measurements, I would have believed that riseupvpn was blocked, while the underlying cause was the flaky service serving eip-service.json. If this event occurred so frequently that I could notice it, I am afraid it could have also happened for several other users. And we previously disabled riseupvpn because of too frequent false positives.

By reading the diff, I see how you did great work to try to reduce the frequency of false positives, which incidentally complicated the implementation and added extra complexity that we would need to maintain. At this point, I am convinced that we’re fighting an uphill battle and I want to instead propose a different approach that should make both my life (as the probe-cli maintainer) and your life (as the experiment maintainer) significantly easier. What’s more, my proposal would help expedite the review and reintegration of the riseupvpn experiment into ooniprobe.

Before explaining my proposal, I would like to explain what, in my opinion, the goal of this experiment should be, both now and in the next two years. We would like to collect data useful to understand whether (1) riseupvpn endpoints are accessible and (2) the riseupvpn app could bootstrap by fetching the data it needs to do so. We are much less concerned with providing immediate feedback to users than we are about collecting data that can be useful to you folks to plan and reason about the global VPN service you provide.

For this reason, here’s what I propose to do in the short term:

  1. We drop all the top-level keys from the experiment and we only keep the results generated by urlgetter. At the same time, we simplify the code that generates the summary so that it always says that there was no anomaly. This change removes the possibility that individual users would be triggered by flaky behavior of api.black.riseup.net and shifts the focus on the experiment purely on collecting data that should only be analyzed at scale (like we do for dnscheck).

  2. We change the logic of the experiment such that, if any of the APIs required to bootstrap fail, we stop and submit a smaller JSON to the backend. This change removes the extra complexity that was already there and the additional one you introduced in this commit. Arguably, this would also make the experiment less complete than the one you implemented in this pull request, but I also have a plan to address that. However, this plan is not for the short term, but luckily it intersects with my other work commitments. I expand on this plan in detail below.

  3. We modify the code to avoid submitting the JSON returned by the geo service, given that it includes geolocation data that it would be better not to submit. Granted, there is a mechanism in place with ooniprobe that strips the user’s IP address from the measurement. But I can see a bunch of cases where different geolocation mechanisms (the one used by OONI and the one used by riseup) may not agree because the IP address is different for some reasons and I would not like for us to increase the likelihood of a leak.

With these three changes in place, the riseupvpn experiment would be smaller and easier to maintain. So, we can add it back to ooniprobe in the experimental suite. By running it again, we can collect more data at scale that is certainly useful to you folks. At the same time, we do not need to worry anymore about false positives because now we focus on collecting data.

In the medium term, I am going to propose the following:

  1. We are developing a mechanism by which the OONI backend can serve to ooniprobe a DSL describing what a specific network experiment needs to do. We are planning on using this mechanism for selected experiments (surely dnscheck and the IM nettests). I am suggesting we apply this mechanism also to riseupvpn. You can see a complete prototype of how this would work implemented in ooni/2023-05-richer-input#12 and documented at probe#2502.

  2. When riseupvpn is rewritten to use the DSL, the backend will periodically invoke the relevant riseupvpn APIs to fetch gateways to test. The backend will then serve to ooniprobes a DSL that contains instructions to measure the services required to bootstrap and the gateways. Because the backend generates the DSL, we have now removed the functional coupling between the first part of the riseupvpn experiment (bootstrap) and the second part (gateways). These two parts can now run independently of each other with moderate parallelism. Yay for simplicity!

  3. The DSL will include a feature-flag mechanism allowing the selection of the best operation to perform depending on the probe capabilities. Once @ainghazal has landed minivpn support in ooniprobe, we will then be able to ask the ooniprobes that can do that, to perform the openvpn handshake. More generally, the DSL will decouple the definition of an experiment to the probe version out there, thus requiring fewer emergency releases.

In the long term (with all the usual caveats borrowed from J.M. Keynes), I suggest that we evaluate again the decision to score measurements locally based on the collected data. By then, incidentally, I suspect we would have migrated to a model where the OONI collector provides the data analysis and scoring for a measurement, which would further decouple releasing a new version of ooniprobe and tweaking data analysis.

What do you think?

Longitude float64 `json:"lon"`
Gateways []string `json:"gateways"`
SortedGateways []GatewayLoad `json:"sortedGateways"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not include this body into the measurement as explained in the whole review.

@@ -29,13 +29,17 @@ type EipService struct {
Gateways []GatewayV3
}

// Capabilities is a list of transports a gateway supports
type Capabilities struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that all the data structures belonging to EipServiceV3, so maybe add V3 everywhere?

@cyBerta
Copy link
Contributor Author

cyBerta commented Jul 19, 2023

Wow, Thanks for this extensive review! I appreciate a lot the time you took for it. I can definitely change the given MR according to your short term proposals, they sound reasonable to me. Also awesome that you apparently already started to work on the midterm proposals as I understand ooni/probe#2502.
I haven't spent time yet to dig into the DSL approach, but will do so soon, so that I can hopefully help to maintain the shiny new solution.

@bassosimone
Copy link
Member

Thank you, @cyBerta !

@cyBerta cyBerta force-pushed the riseup_tests_improvements branch 2 times, most recently from f0b8ca8 to 7aec078 Compare August 4, 2023 20:16
…r the ca was invalid. Instead of early returning the test, we can still gather the necessary configs to run the gateway tests. This change implies that there might occur multiple failures while fetching data from the API, which is why TestKeys.APIFailures becomes a string array.
… longer skipped, API config data is fetched even if CA cert is invalid or missing
…4 bridges, avoid hard-coding locations in the test
* never return IsAnomaly=true
* early return in case the communication to the API fails
* remove (most) top level TestKeys, including custom SummaryKeys and keep urlgetter keys
* however keeps invalid CA cert handling, since unattended expired certs was a root cause
  for false positives in the past
* removes snowflake + tor handling as part of the complexity reduction of the test
* remove API response bodys from testkeys, addresses also removal of geolocation API reponse
* adapts unit tests
@cyBerta
Copy link
Contributor Author

cyBerta commented Aug 5, 2023

ok, I think I've addressed your proposals.

  1. We drop all the top-level keys from the experiment and we only keep the results generated by urlgetter.

I've left 2 TestKeys that helps within the test to use tls-no-verify in case fetching the CA cert fails or the CA is expired. Next to the slow response time of Riseup's API server it was a common case for false positives. Also we're using these keys in a fetch-and-parsing pipeline for prometheus, so keeping these keys would help us to change less code at other places as well.

At the same time, we simplify the code that generates the summary so that it always says that there was no anomaly.

I've removed the summary keys and the tests always return false for IsAnomaly

This change removes the possibility that individual users would be triggered by flaky behavior of api.black.riseup.net and shifts the focus on the experiment purely on collecting data that should only be analyzed at scale (like we do for dnscheck).

yey, should be the case now

  1. We change the logic of the experiment such that, if any of the APIs required to bootstrap fail, we stop and submit a smaller JSON to the backend. This change removes the extra complexity that was already there and the additional one you introduced in this commit. Arguably, this would also make the experiment less complete than the one you implemented in this pull request, but I also have a plan to address that. However, this plan is not for the short term, but luckily it intersects with my other work commitments. I expand on this plan in detail below.

I've changed the test so that it fails and returns early.

  1. We modify the code to avoid submitting the JSON returned by the geo service, given that it includes geolocation data that it would be better not to submit. Granted, there is a mechanism in place with ooniprobe that strips the user’s IP address from the measurement. But I can see a bunch of cases where different geolocation mechanisms (the one used by OONI and the one used by riseup) may not agree because the IP address is different for some reasons and I would not like for us to increase the likelihood of a leak.

while I was at it I've removed all response bodies, since they aren't interesting for us and they unnecessarily leak data

bassosimone added a commit that referenced this pull request Oct 11, 2023
Removing the summary keys was discussed in #1125
and I'm extracting this diff form such a pull request.

Co-authored-by: cyBerta <cyberta@riseup.net>
bassosimone added a commit that referenced this pull request Oct 11, 2023
Removing the summary keys was discussed in
#1125 and I'm extracting this diff
form such a pull request.

While there, bump the version number for this and other upcoming
changes.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
bassosimone added a commit that referenced this pull request Oct 11, 2023
This diff renames some structs, changes the code emitting the progress,
and bumps the experiment version. In my previous commit, I said I did
bump the version number but that was not actually the case.

This diff has been extracted from #1125.

This diff is part of ooni/probe#1432
bassosimone added a commit that referenced this pull request Oct 11, 2023
…1361)

This diff renames some structs, changes the code emitting the progress,
and bumps the experiment version. In my previous commit, I said I did
bump the version number but that was not actually the case.

This diff has been extracted from
#1125.

This diff is part of ooni/probe#1432

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
@bassosimone
Copy link
Member

bassosimone commented Oct 11, 2023

I mistakenly pushed some of my commits in this branch, but I reverted it now.

bassosimone added a commit that referenced this pull request Oct 11, 2023
This diff incorporates part of what has been implemented by @cyBerta in
#1125 in response to my review.

I have omitted the work to use the geo service to figure out which are
the correct gateways to test for now, which allows me to simplify the
diff that I am committing to the master branch.

The spirit of the changes is the following:

1. reduce the test keys to the minimum required by riseup to
process the experiment results;

2. skip TLS verification if we cannot fetch the CA certificate
and unconditionally fetch information about gateways.

The major difference between this diff and the above-mentioned pull
request is that I did not feel good to keep the `api_failure` name
with a different type, which seems like a very breaking change,
and instead I introduced a new type called `api_failures`.

This work is part of ooni/probe#1432.
bassosimone added a commit that referenced this pull request Oct 11, 2023
This diff incorporates part of what has been implemented by @cyBerta in
#1125 in response to my review as
well as additional changes based on my own feelings about what is
correct to do here.

Compared to the original diff, these are the changes that I implemented:

1. I have omitted the work to fetch from riseup geo service and figure
out the correct gateways to test. The main reason for not including this
body of work has been to reduce the size of the diff and the amount of
code to deal with.

2. I modified the logic related to failures in fetching the CA and
communicating with riseup services. The test fails immediately if we
cannot fetch the proper CA or we cannot contact riseup services. I did
not feel comfortable disabling the CA to access riseup services and
connecting to the TCP endpoints discovered w/o CA verification.

3. In the test keys, I renamed `api_failure` to `api_failures` because I
do not think it's optimal to keep the same name while the type has
changed from `*string` to `[]string`.

The spirit of the changes is not directly compatible with what we
discussed with @cyBerta. The main difference is in my decision to fail
early in case we miss the preconditions. As I wrote in
#1125 (review),
I think we should be using richer input (and start with its simplest
form) to provision to probes the data they need to perform this
experiment. By provisioning the data ourselves, we remove the coupling
between getting the CA, accessing riseup services to get information on
what gateways we should measure, and measure the gateways, which makes
the experiment several orders of magnitude more robust.

Unfortunately, I do not have time, in this cycle, to perform all this
richer input work. We'll try again for 3.20.

This work is part of ooni/probe#1432.

While there, I forced null callbacks when performing the CA fetch and
contacting riseup services, otherwise we end up printing a non-monotonic
progress status. Admittedly, also omitting to provide progress about
these two operations is bad, but I think we won't be able to provide
monotonic progress until we know what we should fetch in advance.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
@bassosimone
Copy link
Member

@cyBerta I started merging code from this PR into the master branch. The related PRs are #1360, #1363, and #1364.

#1363 (comment) in particular explains my rationale for pulling changes and what I chose not to include for the 3.19 release. It also explains how I think we can use some form of richer input, such as the DSL described above, or perhaps something simpler (and easier to implement), to decouple testing services useful to bootstrap riseupvpn and getting the endpoints we should be testing.

I think this PR should probably remain open because it contains the work about finding out which subset of the gateways to test, which I think in the future should be an algorithm that runs in the backend to find out which gateways probes should test.

If you have no objections, I'd like to go ahead and merge your changes with master, so it's more clear what has not been added into master and the algorithms about parsing the geo service results stand out more clearly.

Thank you!

bassosimone added a commit that referenced this pull request Oct 11, 2023
This diff backports #1360 to the release/3.19 branch.

Removing the summary keys was discussed in
#1125 and I'm extracting this diff
form such a pull request.

While there, bump the version number for this and other upcoming
changes.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
bassosimone added a commit that referenced this pull request Oct 11, 2023
… version

This diff backports #1361 to the release/3.19 branch.

This diff renames some structs, changes the code emitting the progress,
and bumps the experiment version. In my previous commit, I said I did
bump the version number but that was not actually the case.

This diff has been extracted from
#1125.

This diff is part of ooni/probe#1432

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
bassosimone added a commit that referenced this pull request Oct 11, 2023
…keys

This diff backports #1363 to the release/3.19 branch.

This diff incorporates part of what has been implemented by @cyBerta in
#1125 in response to my review as
well as additional changes based on my own feelings about what is
correct to do here.

Compared to the original diff, these are the changes that I implemented:

1. I have omitted the work to fetch from riseup geo service and figure
out the correct gateways to test. The main reason for not including this
body of work has been to reduce the size of the diff and the amount of
code to deal with.

2. I modified the logic related to failures in fetching the CA and
communicating with riseup services. The test fails immediately if we
cannot fetch the proper CA or we cannot contact riseup services. I did
not feel comfortable disabling the CA to access riseup services and
connecting to the TCP endpoints discovered w/o CA verification.

3. In the test keys, I renamed `api_failure` to `api_failures` because I
do not think it's optimal to keep the same name while the type has
changed from `*string` to `[]string`.

The spirit of the changes is not directly compatible with what we
discussed with @cyBerta. The main difference is in my decision to fail
early in case we miss the preconditions. As I wrote in
#1125 (review),
I think we should be using richer input (and start with its simplest
form) to provision to probes the data they need to perform this
experiment. By provisioning the data ourselves, we remove the coupling
between getting the CA, accessing riseup services to get information on
what gateways we should measure, and measure the gateways, which makes
the experiment several orders of magnitude more robust.

Unfortunately, I do not have time, in this cycle, to perform all this
richer input work. We'll try again for 3.20.

This work is part of ooni/probe#1432.

While there, I forced null callbacks when performing the CA fetch and
contacting riseup services, otherwise we end up printing a non-monotonic
progress status. Admittedly, also omitting to provide progress about
these two operations is bad, but I think we won't be able to provide
monotonic progress until we know what we should fetch in advance.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
}
testkeys.Requests = scrubbedRequests
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding scrubbing, I should mention here that since 7a6fd91 we should be unconditionally scrubbing HTTP headers and bodies by removing anything that looks like an IP address or endpoint.

bassosimone added a commit to ooni/backend that referenced this pull request Oct 17, 2023
This diff relaxes the riseupvpn scoring logic because we have
determined in ooni/probe-cli#1125 (comment)
that we want this experiment to just collect data useful to riseup.

Closes #745.
bassosimone added a commit to ooni/backend that referenced this pull request Oct 17, 2023
This diff relaxes the riseupvpn scoring logic because we have
determined in ooni/probe-cli#1125 (comment)
that we want this experiment to just collect data useful to riseup.

Note that I needed to change test_aggregation.py as a result of these
changes because the number of ok and anomaly results changed.

Closes #745.
bassosimone added a commit to ooni/backend that referenced this pull request Oct 18, 2023
This diff relaxes the riseupvpn scoring logic because we have
determined in ooni/probe-cli#1125 (comment)
that we want this experiment to just collect data useful to riseup.

Note that I needed to change test_aggregation.py as a result of these
changes because the number of ok and anomaly results changed.

Closes #745.
@cyBerta
Copy link
Contributor Author

cyBerta commented Oct 25, 2023

@cyBerta I started merging code from this PR into the master branch. The related PRs are #1360, #1363, and #1364.

#1363 (comment) in particular explains my rationale for pulling changes and what I chose not to include for the 3.19 release. It also explains how I think we can use some form of richer input, such as the DSL described above, or perhaps something simpler (and easier to implement), to decouple testing services useful to bootstrap riseupvpn and getting the endpoints we should be testing.

I think this PR should probably remain open because it contains the work about finding out which subset of the gateways to test, which I think in the future should be an algorithm that runs in the backend to find out which gateways probes should test.

If you have no objections, I'd like to go ahead and merge your changes with master, so it's more clear what has not been added into master and the algorithms about parsing the geo service results stand out more clearly.

Thank you!

hi @bassosimone
Thank you so much for your thoughts, effort and work you've put into this review and the subsequent PRs! I'm curious to see how this test suite evolves. please let me know how I can support the development in any way.

Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
Removing the summary keys was discussed in
ooni#1125 and I'm extracting this diff
form such a pull request.

While there, bump the version number for this and other upcoming
changes.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
…oni#1361)

This diff renames some structs, changes the code emitting the progress,
and bumps the experiment version. In my previous commit, I said I did
bump the version number but that was not actually the case.

This diff has been extracted from
ooni#1125.

This diff is part of ooni/probe#1432

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
Murphy-OrangeMud pushed a commit to Murphy-OrangeMud/probe-cli that referenced this pull request Feb 13, 2024
…1363)

This diff incorporates part of what has been implemented by @cyBerta in
ooni#1125 in response to my review as
well as additional changes based on my own feelings about what is
correct to do here.

Compared to the original diff, these are the changes that I implemented:

1. I have omitted the work to fetch from riseup geo service and figure
out the correct gateways to test. The main reason for not including this
body of work has been to reduce the size of the diff and the amount of
code to deal with.

2. I modified the logic related to failures in fetching the CA and
communicating with riseup services. The test fails immediately if we
cannot fetch the proper CA or we cannot contact riseup services. I did
not feel comfortable disabling the CA to access riseup services and
connecting to the TCP endpoints discovered w/o CA verification.

3. In the test keys, I renamed `api_failure` to `api_failures` because I
do not think it's optimal to keep the same name while the type has
changed from `*string` to `[]string`.

The spirit of the changes is not directly compatible with what we
discussed with @cyBerta. The main difference is in my decision to fail
early in case we miss the preconditions. As I wrote in
ooni#1125 (review),
I think we should be using richer input (and start with its simplest
form) to provision to probes the data they need to perform this
experiment. By provisioning the data ourselves, we remove the coupling
between getting the CA, accessing riseup services to get information on
what gateways we should measure, and measure the gateways, which makes
the experiment several orders of magnitude more robust.

Unfortunately, I do not have time, in this cycle, to perform all this
richer input work. We'll try again for 3.20.

This work is part of ooni/probe#1432.

While there, I forced null callbacks when performing the CA fetch and
contacting riseup services, otherwise we end up printing a non-monotonic
progress status. Admittedly, also omitting to provide progress about
these two operations is bad, but I think we won't be able to provide
monotonic progress until we know what we should fetch in advance.

---------

Co-authored-by: cyBerta <cyberta@riseup.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants