Integration with circumvention tool operated test target services #396

Closed
hellais opened this issue Apr 14, 2020 · 18 comments
Labels: enhancement (improving existing code or new feature), ooni/backend (issues related to https://github.com/ooni/backend), ooni/orchestra (issue related to https://github.com/ooni/orchestra), priority/medium

Comments

@hellais
Member

hellais commented Apr 14, 2020

We discussed with @NullHypothesis that it might be desirable for Tor to dynamically give us testing targets for probes, including private Tor bridges.

This issue tracks the work needed to make that happen.

@hellais hellais added the enhancement improving existing code or new feature label Apr 14, 2020
@hellais hellais added ooni/backend Issues related to https://github.com/ooni/backend ooni/orchestra Issue related to https://github.com/ooni/orchestra labels Apr 14, 2020
@hellais hellais changed the title Consider integrating with circumvention tool operated test target services Integration with circumvention tool operated test target services Apr 14, 2020
@NullHypothesis

FYI, we're using #32740 on Tor's side to keep track of this project.

Also, I revised the specification of wolpertinger, the new service that will hand out bridges to OONI, after taking into account @bassosimone's feedback. Here's the latest spec:
https://gitlab.torproject.org/torproject/anti-censorship/wolpertinger

@NullHypothesis

Two questions:

  • How would you like us to fetch the test results? Using OONI's public API?
  • Is there a unique ID for an OONI probe? Ideally, this ID should be hard to change (e.g., derived from a phone #).

@hellais
Member Author

hellais commented Apr 28, 2020

How would you like us to fetch the test results? Using OONI's public API?

My recommendation would be to set up a copy of the OONI MetaDB on which you will run SQL queries to dump the data that you need.

I wrote down some information on how this could be done for the old vanilla-tor test: https://trac.torproject.org/projects/tor/ticket/32126#comment:4.

We currently don't do any feature extraction for the tor test. However, we are planning to do that soon, and once that is done the extracted features will be present in the OONI MetaDB.

The feature extraction work is tracked in this issue: #210.

Is there a unique ID for an OONI probe? Ideally, this ID should be hard to change (e.g., derived from a phone #).

No, we don't record any unique ID of OONI Probe clients in the public data, due to privacy concerns.

We have been discussing how to achieve something like this for a long time, but it has stalled for lack of time to work out how to do it in a way that is most respectful to users. See: ooni/probe#438

@NullHypothesis

Thanks, hellais!

I filed a ticket to expose our new API on https://bridges.torproject.org/wolpertinger/. Here's the input that it expects from OONI's side. Since there are no unique IDs, use an empty string for the "id" field. The "type" field should say "ooni", "country_code" is the country code of the OONI probe that will test the returned bridge(s), and "auth_token" is a confidential authentication token that I will send you separately.

Is there anything else that we can do from our side to move this forward?

@hellais
Member Author

hellais commented May 4, 2020

Thanks for sharing that @NullHypothesis! This seems pretty easy to integrate into our backend.

What AUTH_TOKEN should we be using?

Should we always be using the addresses returned by this API for running OONI Probe tests (perhaps using the Tor Browser bridges as a fallback), or do we want to test these addresses in addition to the defaults in the file: https://github.com/ooni/sysadmin/blob/1df5bee4d3794770b55d6367b43ba8ae2d811d05/ansible/roles/probe-services/templates/tor_targets.json?

In terms of integration, we would have to write some code to go into the orchestra backend, here:
https://github.com/ooni/orchestra/blob/master/orchestrate/orchestrate/handler/test_lists.go#L209.

@NullHypothesis

What AUTH_TOKEN should we be using?

I will send you an authentication token over email.

Should we always be using the addresses returned by this API for running OONI Probe tests (perhaps using the Tor Browser bridges as a fallback), or do we want to test these addresses in addition to the defaults in the file: https://github.com/ooni/sysadmin/blob/1df5bee4d3794770b55d6367b43ba8ae2d811d05/ansible/roles/probe-services/templates/tor_targets.json?

I suggest testing wolpertinger's bridges in addition to our default bridges because these two bridge sets are disjoint.

In terms of integration, we would have to write some code to go into the orchestra backend, here:
https://github.com/ooni/orchestra/blob/master/orchestrate/orchestrate/handler/test_lists.go#L209.

Gotcha. Please let me know if you would like to see any changes to wolpertinger's API. After all, OONI is the primary consumer.

@NullHypothesis

How would you like us to fetch the test results? Using OONI's public API?

My recommendation would be to set up a copy of the OONI MetaDB on which you will run SQL queries to dump the data that you need.

I wrote down some information on how this could be done for the old vanilla-tor test: https://trac.torproject.org/projects/tor/ticket/32126#comment:4.

We currently don't do any feature extraction for the tor test. However, we are planning to do that soon, and once that is done the extracted features will be present in the OONI MetaDB.

The feature extraction work is tracked in this issue: #210.

Hmm, setting up a MetaDB on polyanthum (the host that runs BridgeDB) seems overkill. The host currently has ~30 GB of free disk space and 2 GB of RAM.

Let me give you a better idea of the data we expect: We have ~1,000 bridges and would like to test each one occasionally, in different countries. If we test each bridge in each country every, say, two weeks, we end up with approximately 1,000 * 250 * 2 = 500,000 measurement records per month.
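
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, and two test rounds per month is an approximation of "every two weeks".

```python
# Back-of-the-envelope estimate of the monthly measurement volume.
bridges = 1_000        # approximate number of bridges
countries = 250        # approximate number of countries
rounds_per_month = 2   # one test roughly every two weeks

records_per_month = bridges * countries * rounds_per_month
print(f"{records_per_month:,} measurement records per month")
```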

Given this (likely exaggerated) estimate, can you think of a more lightweight solution to this problem?

@hellais
Member Author

hellais commented May 11, 2020

Given this (likely exaggerated) estimate, can you think of a more lightweight solution to this problem?

Do you mind getting the data with a bit of lag? If you don't mind polling for data every 24h and getting it with that sort of lag, I would suggest ingesting that data directly from the raw data cans.

This will allow you to download just the raw measurements for the Tor tests.

The docs for the can data format can be found here: https://ooni.org/post/mining-ooni-data/.

I will see if I can put together a gist that demonstrates how to do this (in the meantime, here are some notes: https://gist.github.com/hellais/d60c3ca9d4d456c01ecf793a5763d510).

To get historical data, you will have to go over every past bucket_date and retrieve the data for each one.

It should not take long, depending on the connectivity of the host you are using.
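
Going over every past bucket_date could look roughly like the sketch below. It only enumerates the dates; how each day's cans are listed and downloaded is covered by the linked post and notes and is deliberately left out. The YYYY-MM-DD format and the function name iter_bucket_dates are assumptions made for illustration.

```python
from datetime import date, timedelta


def iter_bucket_dates(start, end):
    """Yield one bucket_date string per day from start to end, inclusive.

    Assumption: bucket dates are formatted as YYYY-MM-DD.
    """
    current = start
    while current <= end:
        yield current.isoformat()
        current += timedelta(days=1)


# Example: the days to process when backfilling the first week of May 2020.
for bucket_date in iter_bucket_dates(date(2020, 5, 1), date(2020, 5, 7)):
    print(bucket_date)  # fetch and filter the Tor measurements for this day here
```

A daily poll would then only need to process the most recent date that has not been handled yet.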

@NullHypothesis

Given this (likely exaggerated) estimate, can you think of a more lightweight solution to this problem?

Do you mind getting the data with a bit of lag? If you don't mind polling for data every 24h and getting it with that sort of lag, I would suggest ingesting that data directly from the raw data cans.

Yes, a 24h lag is something we can work with. Thanks for that!

This will allow you to download just the raw measurements for the Tor tests.

The docs for the can data format can be found here: https://ooni.org/post/mining-ooni-data/.

I will see if I can put together a gist that demonstrates how to do this (in the meantime, here are some notes: https://gist.github.com/hellais/d60c3ca9d4d456c01ecf793a5763d510).

To get historical data, you will have to go over every past bucket_date and retrieve the data for each one.

It should not take long, depending on the connectivity of the host you are using.

Great, I'll get started with these links!

@hellais
Member Author

hellais commented May 12, 2020

I tried speaking to the bridges API backend using:

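import requests

# Request body for wolpertinger; there are no unique probe IDs, so "id" is empty.
payload = {
    "id": "",
    "type": "ooni",
    "country_code": "IT",
    "auth_token": "[REDACTED]",
}

resp = requests.post("https://bridges.torproject.org/wolpertinger/", data=payload)
print(resp.status_code)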

But I get a 404 error. Is this because this service is not yet deployed?

@NullHypothesis

But I get a 404 error. Is this because this service is not yet deployed?

Yes, it wasn't deployed at the time you were testing it but it should work now. The API endpoint is https://bridges.torproject.org/wolpertinger/fetch

@NullHypothesis

I would like to make a small change to wolpertinger's API and rename the API endpoint from

POST https://bridges.torproject.org/wolpertinger/fetch

to

GET https://bridges.torproject.org/wolpertinger/bridges

The change is reflected in the documentation. Is this ok with you?

@hellais
Member Author

hellais commented May 19, 2020

Thanks for the heads up! Yeah it's not a problem for us. When is this change going to be deployed?

@hellais
Member Author

hellais commented May 19, 2020

@NullHypothesis I started adding support for this inside of the backend here: https://github.com/ooni/orchestra/pull/88/files.

There is a problem though with the data format returned by the bridges.tpo API backend. The parameters to be passed to obfs4proxy are in the form of key: value pairs in the arguments key, while OONI Probe expects them to be key: [value] inside of the params key (see: https://github.com/ooni/sysadmin/blob/1df5bee4d3794770b55d6367b43ba8ae2d811d05/ansible/roles/probe-services/templates/tor_targets.json#L151).

Would it be possible to change the response format of the API so that we can get these values in a way that is compatible with our clients?

If that is too much of a hassle I suppose we can also do the transformation in the backend, but if we are the only consumers of the data maybe it's easier to make the change on your end.
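
If the transformation were done on the OONI side, it could be a small function like the one below. This is a sketch, not the code in the PR: only the arguments and params key names come from the description above, while the function name and the sample bridge entry (including the cert and iat-mode values) are made up for illustration.

```python
def arguments_to_params(bridge):
    """Return a copy of a bridge entry with 'arguments' ({key: value})
    rewritten as 'params' ({key: [value]})."""
    # Copy every field except 'arguments' unchanged.
    converted = {k: v for k, v in bridge.items() if k != "arguments"}
    # Wrap each argument value in a single-element list.
    converted["params"] = {
        key: [value] for key, value in bridge.get("arguments", {}).items()
    }
    return converted


# Hypothetical entry in the shape described above.
entry = {"arguments": {"cert": "EXAMPLE", "iat-mode": "0"}}
print(arguments_to_params(entry))
```

The input entry is left unmodified, so the same response can still be logged or inspected in its original form.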

I don't recall exactly why we opted for a list and that format (@bassosimone maybe remembers).

@hellais
Member Author

hellais commented May 26, 2020

I have finished implementing this PR: ooni/orchestra#88. This is good to go.

@NullHypothesis

Is the targetId field from here going to be the same ID that wolpertinger provides you in its output here? I'm asking because we need to match OONI's measurements with our data.

@hellais
Member Author

hellais commented May 29, 2020

Is the targetId field from here going to be the same ID that wolpertinger provides you in its output here? I'm asking because we need to match OONI's measurements with our data.

Yes that is correct.

<targetId> in our spec will match the BRIDGE_ID in the wolpertinger#output.

As a further update, we have finished writing the backend code, but we are holding off on deploying it until the following are in place:

  1. Client-side code that gives the OONI Backend the country_code of the probe (orchestra/tor: include country code in query string probe-engine#629)
  2. A strategy for sanitising the output data so that it does not include the IP addresses of the private bridges (Strip private bridge addresses probe-engine#643)

We are also going to be adding a source key to the output data format, so that we are able to distinguish default tor browser bridges from the private ones.

This will also allow us to present the results of the measurements in a more useful way to end users. An obfs4 TBB bridge being blocked (the censor might have discovered it by scraping the Tor Browser config files) is a different story from a private obfs4 bridge being blocked.
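
As a rough illustration of the two ideas above (a source key and stripping private bridge addresses), the sketch below scrubs the address of every target that does not come from the default Tor Browser set. The field names (source, address), the source values and the [scrubbed] placeholder are all assumptions made for this example; the actual design is tracked in the probe-engine issues listed above.

```python
def sanitise_targets(targets):
    """Return a copy of targets with the addresses of non-default bridges scrubbed.

    Assumption: each target is a dict with hypothetical 'source' and
    'address' keys, and default Tor Browser bridges have source "tbb".
    """
    sanitised = {}
    for target_id, target in targets.items():
        clean = dict(target)  # shallow copy so the input is not modified
        if clean.get("source") != "tbb":
            clean["address"] = "[scrubbed]"
        sanitised[target_id] = clean
    return sanitised


# Hypothetical targets; the addresses come from documentation-only IP ranges.
targets = {
    "default-1": {"source": "tbb", "address": "192.0.2.10:443"},
    "private-1": {"source": "bridgedb", "address": "198.51.100.7:9001"},
}
for target_id, target in sanitise_targets(targets).items():
    print(target_id, target["source"], target["address"])
```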

@hellais
Member Author

hellais commented Jun 5, 2020

From the perspective of the orchestra code, we are done. I documented the next steps in a follow-up issue: #426.

I am going to close this so we can record it in this sprint.

@hellais hellais closed this as completed Jun 5, 2020