Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add proposal for external vpn data format #274

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ainghazal
Copy link
Contributor

@ainghazal ainghazal commented Mar 29, 2023

Checklist

  • I have read the contribution guidelines
  • reference issue for this pull request:
  • related ooni/probe-cli pull request:
  • If I changed a spec, I also bumped its version number and/or date

Description

This proposal contains a data format that would enable receiving data from external sources with compatible semantics to what we will be obtaining via the vpn experiments.

I'm hesitant whether the right form would not be to declare a new experiment directly - but I suspect the experiment itself should just extend the data format and deal with the mechanics of receiving, validating (and probably, verifying) submission, either direct or via aggregation from trusted sources. separating it so probably enables easier changes on the submission mechanism themselves.

Do note that an analog discussion needs to happen for the openvpn & wireguard experiments. The semantics is probably clearer there since we can probably refactor to a generalized vpn data format (for which openvpn or wireguard are just special cases), making the experiment then more naturally tied to provider-specific logic and implementations.

@ainghazal ainghazal changed the title add proposal for external data format add proposal for external vpn data format Mar 29, 2023
@fortuna
Copy link

fortuna commented May 5, 2023

I've been playing with Outline connectivity tests at
https://github.com/Jigsaw-Code/outline-internal-sdk/tree/main/x/outline-connectivity

The test consists in resolving a domain name with a DNS resolver over a proxy using Shadowsocks as the transport.
DNS resolution is nice because we can use the same test for TCP and UDP. Much simpler than fetching a page. This test works for any transport that I can use to connect to a TCP or UDP endpoint.

I get reports like:

{
"time":"2023-05-05T23:11:25Z",
"duration_ms":24,
"proxy":"[IP]:443",
"resolver":"8.8.8.8:53",
"proto":"tcp",
"prefix":"HTTP/1.1 ",
"error":{"op":"dial","msg":"connection refused"}
}

I believe that encapsulates all the info I need.
It has to be sanitized though. For the proxy we should keep the country and AS instead of server IP.

We don't really need the resolver. It only tells me whether the server supports IPv4 and IPv6 destinations.

For the errors, the operation that failed and the error seems to be good. The operation is "dial", "write" or "read". Ideally we would canonicalize the error somehow, but that's protocol and platform-dependent.

Prefix is an Outline-specific feature.

@fortuna
Copy link

fortuna commented May 15, 2023

Perhaps a better option is to break down the errors calling out each action explicitly:

{
  "error": {
    "connect" : "ok|timeout|connection refused|host unreachable|network unreachable", 
    "send": "ok|timeout|..." 
    "receive": "ok|timeout|..."
  }
}

Perhaps it makes sense to stick to the POSIX errors when possible:

In that case we could use the names of the errors:

"connect": "ECONNREFUSED" 

For UDP, the errors we see on TCP connect may show up on receive instead.

If an action was not done, the error should be null or the entry missing, to differentiate absence of test from successes.

@amircybersec
Copy link

@ainghazal & @fortuna Reviving this discussion.

@ainghazal Can you please share some thoughts on what metrics or specific errors your openvpn & wireguard experiment would capture?

I wrote a document to discuss some high-level requirements of designing an end-to-end network error logging system, which includes some discussions on error format.

I believe the decision on the data format should be left to the client but a higher-level meta format can be defined to have fields such as report type and version which are read first during consumption such as the example provided here.

@amircybersec
Copy link

amircybersec commented Apr 2, 2024

@ainghazal I just saw the other PR #293. I am going to review the information there to get a better understanding of the openvpn format.

@ainghazal
Copy link
Contributor Author

@ainghazal I just saw the other PR #293. I am going to review the information there to get a better understanding of the openvpn format.

while it can be interesting to gather some of the semantics for the internal OpenVPN experiment, #293 is perhaps too tied to gather network traces to help diagnose the blocking of OpenVPN. The spec proposal in this issue is conceived to be used by almost any tunneling protocol, and to be injected externally (i.e., perhaps bypassing other abstractions assumed in use by the official OONI probes).

@ainghazal
Copy link
Contributor Author

I believe the decision on the data format should be left to the client

my concern is that, in order to process data and store it in the database, we'd need to agree on a common structure, or at least to be able to version and track data submitted by a few clients we're interested in understanding.

but a higher-level meta format can be defined to have fields such as report type and version which are read first during consumption such as the example provided here.

my original proposal did suggest a "vpn-network-error", mostly to differentiate from http-like reports that we might also receive if clients are able to send also reports about web APIs being blocked. But perhaps we can only focus on the former for now:

https://github.com/ooni/spec/blob/2222f6fa5ad902c0b570f29d562a289ec9493c57/data-formats/df-010-vpnext.md#vpn-network-errors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants