
wcth: compare old and new test helpers #1707

Closed · bassosimone opened this issue Jul 6, 2021 · 21 comments

bassosimone commented Jul 6, 2021

This issue is about fetching all the URLs in all the test lists with both helpers and comparing the results. We want to understand in which cases the new test helper differs from the old test helper and why.

You may want to jump straight to the conclusions: #1707 (comment).

@bassosimone bassosimone self-assigned this Jul 6, 2021
@bassosimone bassosimone changed the title wcth: compare old and new test helpers wcth: compare old and new test helpers (1/n) Jul 6, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 4, 2021
When a probe gets a local DNS failure, it will continue and nonetheless
query the test helper without any IP address, just an empty list.

This diff fixes the behavior of cmd/oohelper to do the same.

Work part of ooni/probe#1707.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 5, 2021

bassosimone commented Sep 13, 2021

Here's a status update. In late July I ran a first-pass comparison of the whole content of the test lists between the old and the new test helper.

To this end, I used the code at https://github.com/ooni/wcthcheck.

The codename of this first comparison is config-0. Running the ./compare tool of the above-mentioned repository using config-0, we get:

{
  "dns/match/exactly": 19518,
  "dns/match/same_asn": 3362,
  "dns/match/same_org": 122,
  "dns/mismatch/cannot_map_to_asn": 6,
  "dns/mismatch/new_data_none_addrs": 47,
  "dns/mismatch/other": 431,
  "dns/total": 23486,

  "http_body_length/diff": 9333,
  "http_body_length/diff/new_th_larger": 6368,
  "http_body_length/diff/new_th_smaller": 2965,
  "http_body_length/total": 23480,

  "http_failure/match": 22363,
  "http_failure/mismatch": 1117,
  "http_failure/mismatch/new_is_none": 45,
  "http_failure/mismatch/old_is_none": 292,
  "http_failure/mismatch/other": 780,
  "http_failure/total": 23480,

  "http_headers/match/same_set": 19516,
  "http_headers/mismatch/new_is_none": 1037,
  "http_headers/mismatch/set_diff": 2927,
  "http_headers/total": 23480,

  "http_status_code/match": 21916,
  "http_status_code/mismatch": 1564,
  "http_status_code/total": 23480,

  "http_title/match/both_empty": 2502,
  "http_title/match/equal": 11180,
  "http_title/mismatch/new_empty": 749,
  "http_title/mismatch/old_empty": 935,
  "http_title/mismatch/old_th_smaller": 7683,
  "http_title/mismatch/other": 431,
  "http_title/mismatch/total": 8114,
  "http_title/total": 23480,

  "processing/match/has_both_measurements": 23486,
  "processing/mismatch/missing_newth_measurement": 7,
  "processing/mismatch/missing_oldth_measurement": 4478,
  "processing/succeeded": 27971,
  "processing/total": 27971,

  "tcp_connect/match": 23251,
  "tcp_connect/mismatch": 235,
  "tcp_connect/mismatch/added_ivp4": 60,
  "tcp_connect/mismatch/different_ivp4": 185,
  "tcp_connect/mismatch/removed_ivp4": 73,
  "tcp_connect/total": 23486
}
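
For context, here is a minimal sketch of the kind of bucketing the compare tool applies to the DNS portion of the two TH responses. This is illustrative Go, not the actual wcthcheck code; asnOf is a hypothetical ASN-lookup helper and only a few of the buckets above are covered:

package main

import "fmt"

// asnOf is a hypothetical helper mapping an IP address to its ASN via a
// GeoIP/ASN database; it returns 0 when the address cannot be mapped.
func asnOf(ip string) uint { return 0 }

// classifyDNS assigns a pair of DNS answers (old TH, new TH) to one of
// the buckets counted above.
func classifyDNS(oldAddrs, newAddrs []string) string {
	if len(newAddrs) == 0 && len(oldAddrs) > 0 {
		return "dns/mismatch/new_data_none_addrs"
	}
	if sameSet(oldAddrs, newAddrs) {
		return "dns/match/exactly"
	}
	oldASNs, newASNs := map[uint]bool{}, map[uint]bool{}
	for _, ip := range oldAddrs {
		oldASNs[asnOf(ip)] = true
	}
	for _, ip := range newAddrs {
		newASNs[asnOf(ip)] = true
	}
	if oldASNs[0] || newASNs[0] {
		return "dns/mismatch/cannot_map_to_asn"
	}
	for asn := range newASNs {
		if oldASNs[asn] {
			return "dns/match/same_asn" // at least one ASN in common
		}
	}
	return "dns/mismatch/other"
}

// sameSet tells whether the two address lists contain the same addresses
// (assuming, for the sake of the sketch, that there are no duplicates).
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	seen := map[string]bool{}
	for _, ip := range a {
		seen[ip] = true
	}
	for _, ip := range b {
		if !seen[ip] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(classifyDNS([]string{"1.2.3.4"}, []string{"1.2.3.4"}))
}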

After this initial scan, I started trying to figure out the reasons for such differences. Generally speaking, I looked into a subset of measurements to identify bugs.

The first issue I figured out is that the internal/cmd/oohelper command in ooni/probe-cli was stopping when the input URL's domain name was NXDOMAIN. This was fixed in ooni/probe-cli#450.

After that, I started looking into the subset of URLs that present tcp_connect mismatches. This subset of URLs was used for the config-1, config-2, config-3 and config-4 runs.

For this new comparison, I deployed the legacy backend on my local machine inside an ubuntu:16.04 Docker container. To do that smoothly I needed ooni/backend#533 to pin working deps.

While investigating, there was some serendipity: I assumed the legacy backend was using 8.8.8.8:53/udp as a resolver, but instead it was using the system resolver. This led me to #1780.
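
As a side note, the difference between the two configurations is easy to reproduce; a minimal Go sketch (unrelated to the TH codebase) that contrasts the system resolver with a fixed 8.8.8.8:53/udp resolver:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// System resolver: whatever /etc/resolv.conf (or the platform) says.
	sysAddrs, _ := net.DefaultResolver.LookupHost(context.Background(), "example.com")

	// Fixed resolver: force every DNS query to go to 8.8.8.8:53/udp.
	google := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, "udp", "8.8.8.8:53")
		},
	}
	googleAddrs, _ := google.LookupHost(context.Background(), "example.com")

	// The two answers may differ, e.g. when the local resolver is poisoned.
	fmt.Println(sysAddrs, googleAddrs)
}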

Subsequent changes to the new test helper landed at: ooni/probe-cli#504.

This brings us to today. Here's the next-two-weeks agenda:

  • we explain all the differences and minimise them through code changes
  • we compare the error strings emitted for specific error conditions

The latter will be done by censoring the network on the computer I am using via Jafar.


bassosimone commented Sep 13, 2021

Written status update. We can now move this issue to Sprint 48.


bassosimone commented Oct 15, 2021

Further investigating DNS differences (5/5)

I solved more DNS-related issues and branched off an issue I could not solve easily: #1823.

This is the current DNS situation for ~80% of the URLs in the test list:

{
  "dns/match/exactly": 25642,
  "dns/match/same_asn": 158,
  "dns/match/same_failure": 552,
  "dns/mismatch/different_failure": 1,
  "dns/mismatch/old_failure": 19,
  "dns/mismatch/other": 5,
[snip]

The different_failure case is the one for which I branched off the new issue. I still need to chase the old_failure and other cases to figure out their meaning.

So, most of them were transient; here is what we are now looking at:

  "dns/known_bug/old_cannot_handle_ip": 17,
  "dns/match/exactly": 25824,
  "dns/match/same_asn": 160,
  "dns/match/same_failure": 552,
  "dns/mismatch/different_failure": 3,
  "dns/mismatch/new_failure": 4,
  "dns/mismatch/old_failure": 1,

There are 17 entries where the old TH cannot handle IPs in the URL, which I think is behaviour that the new TH does not actually want to mirror. Then there are 8 failures that maybe we need to investigate further. Overall, though, the DNS part seems to be converging. (I am continuing to scan the test lists, but I am now near cases where the domains are probably censored where I live, because every measurement is taking a long time.)
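
For reference, a behaviour along these lines is what the new TH wants: skip DNS resolution entirely when the URL host is already an IP literal and report an empty address list. A minimal sketch, not the TH's actual code:

package main

import (
	"fmt"
	"net"
	"net/url"
)

// dnsAddrsForURL returns the addresses the TH should report for the input
// URL: an empty list when the host is already an IP literal (there is
// nothing to resolve), otherwise the resolved addresses.
func dnsAddrsForURL(rawURL string) ([]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	host := u.Hostname()
	if net.ParseIP(host) != nil {
		return []string{}, nil // IP literal in the URL: nothing to resolve
	}
	return net.LookupHost(host)
}

func main() {
	addrs, _ := dnsAddrsForURL("http://1.1.1.1/")
	fmt.Println(addrs) // []
}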


bassosimone commented Oct 15, 2021

HTTP headers

I also started looking into HTTP headers. I found more cases in which the set is disjoint, so I added more headers as exceptions, but then I also measured the cases in which both sets ended up being empty. This is not looking good:

  "http_headers/failure/new": 883,
  "http_headers/failure/old": 5141,
  "http_headers/match/no_header_left": 9378,
  "http_headers/match/same_set": 9060,
  "http_headers/mismatch/set_diff": 925,
  "http_headers/total": 19363,

There are two issues here: (1) we have many HTTP failures for which we probably need to retry, and (2) the more exception headers I add, the more I increase no_header_left. That is, set_diff and no_header_left cannot both be optimised at the same time. So perhaps the approach I am using for headers here is wrong. I think we should just eliminate the headers that the probe would not consider, without adding further exceptions. Doing that now.
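
A minimal sketch of this filtering approach (the ignore list below is illustrative, not the probe's actual list): drop the headers the probe would not consider, then compare what is left as sets.

package main

import (
	"fmt"
	"net/http"
)

// ignoredHeaders is an illustrative subset of the "common" headers the
// probe skips when comparing control and measurement headers.
var ignoredHeaders = map[string]bool{
	"Date":           true,
	"Content-Length": true,
	"Server":         true,
	"Set-Cookie":     true,
}

// interestingHeaderNames returns the set of header names surviving the
// ignore list.
func interestingHeaderNames(h http.Header) map[string]bool {
	out := map[string]bool{}
	for name := range h {
		canon := http.CanonicalHeaderKey(name)
		if !ignoredHeaders[canon] {
			out[canon] = true
		}
	}
	return out
}

// sameHeaderSet tells whether the old and the new TH return the same set
// of interesting header names.
func sameHeaderSet(oldH, newH http.Header) bool {
	a, b := interestingHeaderNames(oldH), interestingHeaderNames(newH)
	if len(a) != len(b) {
		return false
	}
	for name := range a {
		if !b[name] {
			return false
		}
	}
	return true
}

func main() {
	oldH := http.Header{"Server": {"nginx"}, "Vary": {"Accept-Encoding"}}
	newH := http.Header{"Vary": {"Accept-Encoding"}}
	fmt.Println(sameHeaderSet(oldH, newH)) // true: only ignored headers differ
}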

Okay, so this is what we have:

  "http_headers/failure/new": 885,
  "http_headers/failure/old": 5142,
  "http_headers/match/no_header_left": 4064,
  "http_headers/match/same_set": 4975,
  "http_headers/mismatch/set_diff": 10331,
  "http_headers/total": 19370,

This is, in itself, a data quality issue. In many cases the two implementations return different headers that matter according to Web Connectivity's heuristics. This confuses new probes a bit (the extent is unclear). If we switch to the new TH, we will confuse new probes less and old probes more (which is less of a concern).

So, okay, I think there is no need to dig into this headers topic further. It's clear we cannot do much about this: the two implementations are too different and too far apart in time for further action to be possible.


bassosimone commented Oct 15, 2021

HTTP status code (1/2)

Regarding the status code, after correcting the comparison to avoid noise caused by failures, here's what I get:

  "http_status_code/failure/new": 885,
  "http_status_code/failure/old": 5142,
  "http_status_code/match": 19030,
  "http_status_code/mismatch": 343,
  "http_status_code/total": 25400,

It may be worth taking a look at the mismatch cases to see whether there's anything fundamental here.

So, the first thing I noticed is that there are URLs that return 403 with the old TH and 200 with the new TH. An example is http://www.geocities.ws/marchrecce/, for which we have:

// old TH

  "http_request": {
    "body_length": 11784,
    "failure": null,
    "title": "Please Wait... | Cloudflare",
    "headers": {
        // snip
    },
    "status_code": 403

// new TH

  "http_request": {
    "body_length": 6573,
    "failure": null,
    "title": "Rendezvous en Lower Myanmar - An Inner Odyssey",
    "headers": {
        // snip
    },
    "status_code": 200

I created rules both for the 403-that-becomes-200 case and for the Cloudflare sub-case, and now I have:

  "http_status_code/403_becomes_200": 213,
  "http_status_code/cf": 6,
  "http_status_code/failure/new": 924,
  "http_status_code/failure/old": 5349,
  "http_status_code/match": 19420,
  "http_status_code/mismatch": 132,
  "http_status_code/total": 26044,

So, clearly the Cloudflare check was not very selective and the 403 => 200 was much more selective.

Okay, this is a data quality issue we're fixing by using the new TH ✔️
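
For reference, these rules are just keyed buckets over the (old, new) status code pair; a minimal sketch, assumed rather than taken from the actual compare code:

package main

import "fmt"

// classifyStatusCode returns the bucket name for a pair of status codes
// returned by the old and the new TH for the same URL.
func classifyStatusCode(oldCode, newCode int64, oldFailed, newFailed bool) string {
	switch {
	case newFailed:
		return "http_status_code/failure/new"
	case oldFailed:
		return "http_status_code/failure/old"
	case oldCode == newCode:
		return "http_status_code/match"
	case oldCode == 403 && newCode == 200:
		return "http_status_code/403_becomes_200"
	default:
		return "http_status_code/mismatch"
	}
}

func main() {
	fmt.Println(classifyStatusCode(403, 200, false, false))
}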

Now, what's next? We need to look into the remaining cases for which we have a mismatch.

After more refinements, here's what I have:

  "http_status_code/200_becomes_503": 13,
  "http_status_code/308_becomes_200": 46,
  "http_status_code/403_becomes_200/cf": 6,
  "http_status_code/403_becomes_200/other": 217,
  "http_status_code/403_becomes_503/cf": 1,
  "http_status_code/403_becomes_503/other": 16,
  "http_status_code/503_becomes_200": 9,
  "http_status_code/failure/new": 935,
  "http_status_code/failure/old": 5474,
  "http_status_code/match": 19699,
  "http_status_code/mismatch": 49,
  "http_status_code/total": 26465,

Some insights based on the above results:

  1. we should retry all the cases in which 200 becomes 503 or 503 becomes 200, because this looks like a temporary failure that will go away if we retry and maybe our testing was a bit too aggressive

  2. in addition to 403 becoming 200 we also have 403 becoming 503, which are both data quality issues

  3. we still have 49 cases to dig into

One of the remaining cases (http://www.irna.ir/en/) is quite fascinating (edit: is a fundamental issue; see #1824):

// old TH

{
  "tcp_connect": {
    "178.216.249.78:80": {
      "status": true,
      "failure": null
    },
    "217.25.48.64:80": {
      "status": true,
      "failure": null
    }
  },
  "http_request": {
    "body_length": 35805,
    "failure": null,
    "title": "IRNA English",
    "headers": {
      "Cache-Control": "max-age=60",
      "Content-Type": "text/html;charset=UTF-8",
      "Date": "Fri, 15 Oct 2021 03:11:46 GMT",
      "Expires": "Fri, 15 Oct 2021 03:12:06 GMT",
      "Server": "nginx",
      "Vary": "Accept-Encoding",
      "X-Cache-Status": "HIT"
    },
    "status_code": 200
  },
  "dns": {
    "failure": null,
    "addrs": [
      "217.25.48.64",
      "178.216.249.78"
    ]
  }
}

// New TH

{
  "tcp_connect": {
    "178.216.249.78:80": {
      "status": true,
      "failure": null
    },
    "217.25.48.64:80": {
      "status": true,
      "failure": null
    }
  },
  "http_request": {
    "body_length": 1918,
    "failure": null,
    "title": "\u0635\u0641\u062d\u0647\u0654 \u062f\u0631\u062e\u0648\u0627\u0633\u062a\u06cc \u0634\u0645\u0627 \u06cc\u0627\u0641\u062a
 \u0646\u0634\u062f.",
    "headers": {
      "Content-Language": "en",
      "Content-Length": "1918",
      "Content-Type": "text/html;charset=UTF-8",
      "Date": "Fri, 15 Oct 2021 03:11:34 GMT",
      "Server": "nginx"
    },
    "status_code": 404
  },
  "dns": {
    "failure": null,
    "addrs": [
      "217.25.48.64",
      "178.216.249.78"
    ]
  }
}

I'm going to try and see whether there are more cases like this. (I didn't really see this coming!)

Okay, after some refactoring, this is the final analysis for status code. I'll add some comments inline:

  // These are cases where apparently using the new TH causes failures
  "http_status_code/200_becomes_403": 8,
  "http_status_code/200_becomes_404": 2,
  "http_status_code/200_becomes_500": 1,
  "http_status_code/200_becomes_502": 1,
  "http_status_code/200_becomes_503": 13,
  "http_status_code/200_becomes_504": 1,

  // This 203 is interesting
  "http_status_code/203_becomes_504": 1,

  // These are cases where the old TH does not handle 308
  "http_status_code/308_becomes_200": 46,
  "http_status_code/308_becomes_203": 1,
  "http_status_code/308_becomes_404": 2,
  "http_status_code/308_becomes_451": 1,

  // These seem to be cases where the old TH causes data quality issues for new
  // probes and is probably somehow classified as a "very old client"
  "http_status_code/400_becomes_200": 5,
  "http_status_code/400_becomes_404": 1,
  "http_status_code/403_becomes_200/cf": 6,
  "http_status_code/403_becomes_200/other": 217,
  "http_status_code/403_becomes_402": 1,
  "http_status_code/403_becomes_404": 8,
  "http_status_code/403_becomes_503/cf": 1,
  "http_status_code/403_becomes_503/other": 16,
  "http_status_code/404_becomes_200": 3,
  "http_status_code/404_becomes_503": 2,
  "http_status_code/429_becomes_200": 1,
  "http_status_code/500_becomes_200": 4,
  "http_status_code/502_becomes_200": 4,
  "http_status_code/503_becomes_200": 9,
  "http_status_code/523_becomes_522": 1,

  // This is a case where the old code sets the status code after reading the body,
  // while the new code distinguishes between the round trip and reading the body.
  // Maybe I can fix this case. Maybe re-run? This is clearly a case in which
  // there is some error after the round trip...
  "http_status_code/error_reading_body": 1,

  // This tells me I need to re-run for a bunch of URLs
  "http_status_code/failure/new": 935,
  "http_status_code/failure/old": 5474,

  // These are the cases in which we get the same status code
  "http_status_code/match": 19699,
  "http_status_code/total": 26465,

All these cases are interesting and are, additionally, a source of confusion for new probes. That is, looking into what is happening here was mostly out of curiosity and for the sake of data quality.


bassosimone commented Oct 18, 2021

HTTP status code (2/2)

One of the remaining cases (http://www.irna.ir/en/) is quite fascinating:

Well, actually, no, it isn't fascinating. It's just a data quality issue with the way in which webconnectivity works. I've explained why this is a systematic issue of webconnectivity in #1824.


bassosimone commented Oct 18, 2021

Review of the results and conclusions

The objective of this comparison was to verify that replacing the old webconnectivity test helper (TH) with the new one would not reduce the quality of the data produced by new OONI Probes (i.e., OONI Probes using the Go engine).

The rest of this report is organised as follows. We analyse the Go client to determine what matters. We explain how we ran the comparison. We document actions taken to reduce the diff in the JSONs produced by the two THs under varying conditions. We document some oddities we observed. We show the results and comment on each aspect. We draw the conclusions.

Analysis of the Go client

In this section, we read through the source code of the Go client. The objective here is to figure out what specific fields emitted by the TH matter the most to determine the blocking status of a measurement.

Because we've already discontinued the Python implementation of OONI Probe, we only focus on the Go codebase.

As of v3.10.1, the webconnectivity implementation:

  1. uses the string result from the control in dnsanalysis.go to determine DNS consistency

  2. performs HTTPAnalysis in httpanalysis.go

  3. aggregates a summary in summary.go

  4. only directly compares with the DNS flag inside the control in dnsanalysis.go and otherwise does not directly string-match the strings returned by the control

  5. incorrectly assumes that a status code equal to zero means failure, whereas the original TH returns -1 (see webconnectivity: Go implementation wrongly handles the TH's HTTP status code #1825)

  6. skips the body length check if the response body is truncated

  7. skips the status code check if the response is 500

  8. skips many "common" headers when comparing headers

  9. uses a 128 byte limit for extracting the <title> (see the sketch after this list)

  10. skips the <title> check if the body is truncated

  11. still, does not set accessible and blocking if the control HTTP request failed, which somehow protects from the above-mentioned issues regarding the truncated body and the incorrect handling of the status code

  12. does not seem to investigate whether a connect failure could also have happened in the control, but this issue is again mitigated by the fact that it does not set accessible and blocking if the control request fails; still, this means there are a bunch of cases in which we could say there's consistency with respect to the control and we don't

  13. when there is no DNS, TCP/IP or TLS blocking, falls back to complicated heuristics that could maybe be made simpler by not processing the full results ex post but rather by using a model where we determine blocking right after each measurement step (this is also a good design hint for websteps, if we're able to do that)
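
To make items 9 and 10 concrete, here is a sketch of the title extraction; the exact regexp used by the probe may differ, but the 128-character cap and the truncated-body skip are the documented behaviour:

package main

import (
	"fmt"
	"regexp"
)

// titleRegexp sketches the probe's title extraction: it only captures up
// to 128 characters of the <title> element (the exact regexp in the probe
// may differ, but the 128 limit is what matters here).
var titleRegexp = regexp.MustCompile(`(?i)<title>([^<]{1,128})</title>`)

// webTitle returns the extracted title, or "" when the body is truncated
// (in which case the probe skips the title check) or when no title shorter
// than the limit can be found.
func webTitle(body string, truncated bool) string {
	if truncated {
		return ""
	}
	m := titleRegexp.FindStringSubmatch(body)
	if len(m) != 2 {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(webTitle("<html><title>IRNA English</title></html>", false))
}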

Because of the above facts, it is really important for us to make sure there are no discrepancies between what the old TH and the new TH do DNS-wise. Other aspects are comparatively less important to equalise.

Methodology

I've repeatedly run the old and the new TH against a nearly full version of the test list. To this end, I used the code at https://github.com/ooni/wcthcheck. Many commits in that repository mention this issue. There are also comments in this issue explaining how I ran the whole experiment. In a nutshell: I ran the two THs side by side on a Linux box and accessed them using oohelper, the TH client included in github.com/ooni/probe-cli.
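
In practice each run boiled down to posting the same control request to both THs and archiving the raw replies for offline comparison. A minimal sketch, not the actual wcthcheck code: the struct mirrors the web_connectivity control request and the endpoints below are placeholders.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ctrlRequest is the control request sent to a web_connectivity TH.
type ctrlRequest struct {
	HTTPRequest        string              `json:"http_request"`
	HTTPRequestHeaders map[string][]string `json:"http_request_headers"`
	TCPConnect         []string            `json:"tcp_connect"`
}

// queryTH posts the control request to a TH and returns the raw JSON
// reply, so that the old and the new replies can be diffed offline.
func queryTH(thURL string, req ctrlRequest) (json.RawMessage, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(thURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	req := ctrlRequest{
		HTTPRequest: "http://www.example.com/",
		TCPConnect:  []string{"93.184.216.34:80"},
	}
	// Placeholder endpoints: in the real runs the two THs listened side
	// by side on a local Linux box.
	oldReply, _ := queryTH("http://127.0.0.1:8080/", req)
	newReply, _ := queryTH("http://127.0.0.1:8081/", req)
	fmt.Println(len(oldReply), len(newReply))
}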

I've also tried to use Jafar to provoke specific errors and see how the two test helpers reacted to them: #1707 (comment).

Actions to reduce the diff between the new and the old TH

The experiments I've run led to the following PRs, which attempted to reduce the diff between the old and the new TH:

  1. fix(oohelperd): reduce differences with legacy helper probe-cli#504

  2. fix(oohelperd): return HTTP headers as empty map on error probe-cli#541

  3. fix(webconnectivity): gather longer HTML titles probe-cli#542

  4. fix(oohelperd): reduce errors to what the old TH would emit probe-cli#543

  5. fix(wcth): emit empty Addrs when input URL contains addr probe-cli#545

  6. fix(wcth): match legacy TH w/ empty DNS reply probe-cli#546

Oddities

Performing this analysis, I came across these interesting oddities that may have a data quality impact:

  1. DNS resolution poisoning: the case of 97dounai.top #1780

  2. testlists: semi-automatically detect and remove domains that are now for sale (aka parked domains) #1826

  3. wcth: new wcth cannot resolve www.stratcom.mil, old wcth can #1823

  4. legacyth: timeout with www.aiadmk.website wcthcheck#1

  5. webconnectivity: inaccurate www.irna.ir measurement #1824

Results

Here we classify by "phase" of the experiment: DNS, TCP/IP, and HTTP.

category            mismatches    total    percentage
dns                         49    27873         0.17%
http_body_length         10961    26405           41%
http_failure              6608    26405           25%
http_headers               ...      ...           ...
http_status               6766    26405           25%
http_title               13862    26405           52%
tcp_connect                ...      ...           ...

The total number of URLs was 27971. We're missing 16 URLs with both THs, 1 URL is missing only with the new TH, and 81 are missing only with the old TH. Because the results encompass DNS, TCP/IP, and HTTP, re-running a measurement could in some cases cause failures in one aspect (e.g., DNS) while fixing others (e.g., HTTP): as we already know, the test lists contain many not-so-reliable websites.

DNS

So, regarding the DNS we are in a very good position and we only have very few differences. The full classification of the 49 DNS mismatches between the new and the old TH is the following:

  1. in one case the comparison failed because we could not figure out the IP's ASN, making the comparison unfeasible. It turns out it's an issue with the GeoIP database we're using: the IPs in this result actually belong to Amazon's CloudFront.

  2. in 19 cases the new TH failed and in 9 cases the old TH failed, making comparison unfeasible. It's difficult to say whether the new TH is more likely to have DNS failures. However, wcth: new wcth cannot resolve www.stratcom.mil, old wcth can #1823 suggests that we should probably increase the timeout of the new codebase a bit to give the upstream server we're using time to reply with SERVFAIL.

  3. in 9 cases the failure is different between the old and the new TH, and in 11 cases the failure is flagged as other. I have not fully investigated these cases yet; some of them could just be limitations of the classification, where you end up flagging something as other after having excluded all the other known possibilities.

It's also worth noting that we have 17 cases in which the old TH could not handle a URL containing an IP address. They are not flagged as failures; rather, this is a known bug, so they go into a specific, distinct bucket.

HTTP body length

The body length is complicated because we would need to see the body, but the THs generally do not return it. We have seen that the old TH sometimes encodes HTML tags, which may be a data quality issue because the new probe doesn't do that. See #1707 (comment) for more details on how I learned about this encoding issue.
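
For reference, the probe does not compare bodies byte by byte anyway: it compares lengths using a proportion, roughly as sketched below (the 0.7 threshold is from memory and may differ from the actual code):

package main

import "fmt"

// bodyLengthMatches applies a proportional comparison between the body
// length measured by the probe and the one returned by the control (TH).
// The 0.7 threshold here is an assumption for illustration only.
func bodyLengthMatches(measurement, control int64) bool {
	if measurement <= 0 || control <= 0 {
		return false // no meaningful comparison possible
	}
	var proportion float64
	if measurement >= control {
		proportion = float64(control) / float64(measurement)
	} else {
		proportion = float64(measurement) / float64(control)
	}
	return proportion > 0.7
}

func main() {
	// Using the two geocities body lengths quoted above as sample numbers.
	fmt.Println(bodyLengthMatches(11784, 6573)) // false
}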

HTTP failure

5574 of the 6608 failure mismatches occur because the new TH failure is null while the old TH failure is not null. The converse happens in 935 out of 6608 cases. This seems to suggest that the new TH is more reliable. (Empirically, I also observed that, while the measurements were progressing, there were more failures with the old TH, since the number of missing measurements with the old TH was generally larger than with the new TH.)

(Ah, and I also showed in #1707 (comment) that the old TH does not validate TLS certificates, which is a potential data quality issue and source of differences: the Go probe will fail in this case.)

HTTP headers

I basically gave up comparing headers. We need to exclude the common set of headers that the probes ignore. The remaining headers depend heavily on the two implementations being very different (the old TH uses Twisted 13.2.0, software released in November 2013). I tried to add more headers to the ignore list, which often led to an empty intersection between the header sets. I took this as a signal that investigating header differences was a waste of my time. That said, the Go probe uses the same codebase as the new TH, so using the new TH seems better here.

HTTP status code

The status code matched in 19699 cases out of 26465. This leaves us with 6766 mismatches. Of these, 6409 are caused by the old TH failing for HTTP (5474 cases) or the new TH failing (935 cases). The remaining cases are those in which the status code changes depending on the TH. I also showed a case, http://www.irna.ir/en/, in which the result purely depends on the IP address chosen by the specific TH to fetch the webpage: not following all endpoints has a data quality cost (#1825).
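
Relatedly, a minimal sketch of the fix implied by #1825 (assumed, the actual patch may differ): treat the legacy TH's -1 status code, and not just zero, as a failed control request, and keep the documented 500 exception:

package main

import "fmt"

// controlHTTPFailed reports whether the control (TH) HTTP request failed.
// The legacy TH reports -1 in that case, while the Go client used to only
// check for 0; checking any non-positive value covers both encodings.
func controlHTTPFailed(controlStatusCode int64) bool {
	return controlStatusCode <= 0
}

// statusCodeMatches implements the comparison with the documented
// exception: a 500 from the control is ignored because it is too noisy.
func statusCodeMatches(measurement, control int64) (match, skipped bool) {
	if controlHTTPFailed(control) || control == 500 {
		return false, true // cannot or should not compare
	}
	return measurement == control, false
}

func main() {
	fmt.Println(statusCodeMatches(200, -1)) // false true: control failed, comparison skipped
}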

HTTP title

When discussing the title, it's important to remember that the Go probe and the new TH use a similar codebase, so they should produce more consistent results in this respect. I learned that the old TH strips non-Latin characters from the title, which leads to it producing a smaller (6451 cases) or empty (6309 cases) title. Summing these two cases we obtain 12760, which is most of the 13862 mismatching cases. There are also a few cases where the new TH title is smaller or empty, which likely depend on the regexp used to extract the title (which is limited to a maximum number of characters in the Go codebase) or on the page size. Either way, doing something similar to what the probe does will be better.

TCP connect

I didn't spend much time looking into this topic. We don't directly string-match strings from the TH in the probe, and I had already gained confidence that we can handle the common cases using Jafar.

Conclusions

I think it's fine to switch to the new TH. We will have to fix some small bugs, chiefly that our DNS timeout seems too small in one specific case, so we cannot properly get a SERVFAIL (#1823). We have spent a lot of time reducing the differences in the JSON structure, and the DNS results now look similar. We should fix new Go clients to always perform the body length comparison (now both bodies should be truncated), as documented in #1827. We should fix the new Go clients to properly handle the status code value, as documented in #1825. We should have issues for each possible data quality issue anyway and, on top of this, maybe we need a periodic process for monitoring the TH quality (#1829).

@bassosimone bassosimone changed the title wcth: compare old and new test helpers (1/n) wcth: compare old and new test helpers Oct 18, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
Reducing the errors is not done in a perfect way.

We have documented the most striking differences inside
ooni/probe#1707 (comment) and
some attempts to improve the situation further inside
ooni/probe#1707 (comment).

A better strategy for the future would be to introduce more
specific timeout errors, such as dns_timeout_error, etc.

More testing may be needed to further validate and compare the
old and the new TH, but this requires Jafar improvements to
more precisely simulate more complex censorship.

Backport from master; see ooni/probe#1839
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
Matches the behavior that the legacy TH implements in this
situation and reduces slightly the differences.

See ooni/probe#1707 (comment)

Backport from master; see ooni/probe#1839
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
@bassosimone

We concluded with @hellais that it's okay to replace the old TH with the new TH. The most immediate next step was to create a release of oohelperd that was suitable for doing that, which I did in #1841.

It remains to figure out what the remaining issues are (including addressing issues discovered as part of this one). But we can also safely close this issue.
