Integration with circumvention tool operated test target services #396
Comments
FYI, we're using #32740 on Tor's side to keep track of this project. Also, I revised the specification of wolpertinger, the new service that will hand out bridges to OONI, after taking into account @bassosimone's feedback. Here's the latest spec: |
Two questions:
|
My recommendation would be to set up a copy of the OONI MetaDB on which you will run SQL queries to dump the data that you need. I wrote down some information on how this could be done for the old vanilla-tor test: https://trac.torproject.org/projects/tor/ticket/32126#comment:4. We currently don't do any feature extraction for the tor test, but we are planning to do that soon, and once that is done the results will be present in the OONI MetaDB. The feature extraction work is tracked in this issue: #210.
No, we don't record any unique ID of OONI Probe clients in the public data, due to privacy concerns. We have been discussing how to achieve something related to this for a long time, but it's been stalling a bit due to lack of time available to think of how we can do it in a way that is most respectful to users. See: ooni/probe#438 |
Thanks, hellais! I filed a ticket to expose our new API on https://bridges.torproject.org/wolpertinger/. Here's the input that it expects from OONI's side. Since there are no unique IDs, use an empty string for the "id" field. The "type" field should say "ooni", "country_code" is the country code of the OONI probe that will test the returned bridge(s), and "auth_token" is a confidential authentication token that I will send you separately. Is there anything else that we can do from our side to move this forward? |
Thanks for sharing that @NullHypothesis! This seems pretty easy to integrate into our backend. Should we always be using the addresses returned by this API for running OONI Probe tests (perhaps using the Tor Browser bridges as a fallback), or do we want to test these addresses in addition to the defaults in the file: https://github.com/ooni/sysadmin/blob/1df5bee4d3794770b55d6367b43ba8ae2d811d05/ansible/roles/probe-services/templates/tor_targets.json? In terms of integration, we would have to write some code to go into the orchestra backend, here: |
I will send you an authentication token over email.
I suggest testing wolpertinger's bridges in addition to our default bridges because these two bridge sets are disjoint.
Gotcha. Please let me know if you would like to see any changes to wolpertinger's API. After all, OONI is the primary consumer. |
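A minimal sketch of testing the wolpertinger bridges in addition to the defaults, assuming both sets are dictionaries keyed by a target ID. The function name and the field names in the usage example are hypothetical and are not taken from the orchestra code or from tor_targets.json.

```python
def merge_targets(default_targets, extra_targets):
    """Return the default targets plus any extra targets not already present.

    Both arguments are assumed (hypothetically) to map a target ID to a
    description of that target. The two sets are expected to be disjoint,
    so a collision should not happen; if one does, the default entry wins.
    """
    merged = dict(default_targets)  # copy so the caller's dict is not mutated
    for target_id, target in extra_targets.items():
        merged.setdefault(target_id, target)
    return merged


# Hypothetical example data using documentation-only IP addresses.
defaults = {"default-1": {"address": "192.0.2.1:443", "protocol": "obfs4"}}
extras = {"extra-1": {"address": "198.51.100.7:9001", "protocol": "obfs4"}}
all_targets = merge_targets(defaults, extras)
```

Copying the defaults first keeps the existing behaviour intact if the extra set happens to be empty.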
Hmm, setting up a MetaDB on polyanthum (the host that runs BridgeDB) seems overkill. The host currently has ~30 GB of free disk space and 2 GB of RAM. Let me give you a better idea of the data we expect: We have ~1,000 bridges and would like to test each one occasionally, in different countries. If we test each bridge in each country every, say, two weeks, we end up with approximately 1,000 * 250 * 2 = 500,000 measurement records per month. Given this (likely exaggerated) estimate, can you think of a more lightweight solution to this problem? |
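The estimate above can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the figures are the rough ones quoted in the comment (two tests per month stands in for "every two weeks").

```python
# Back-of-the-envelope estimate using the figures quoted above.
bridges = 1_000          # approximate number of bridges
countries = 250          # approximate number of countries to test from
tests_per_month = 2      # one test per bridge per country every two weeks

records_per_month = bridges * countries * tests_per_month
print(f"{records_per_month:,} measurement records per month")
```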
Do you mind getting the data with a bit of lag? If you don't mind polling for data every 24h and getting it with that sort of lag, I would suggest ingesting that data directly from the raw data cans. This will allow you to download just the raw measurements for the Tor tests. The docs for the can data format can be found here: https://ooni.org/post/mining-ooni-data/. I will see if I can put together a gist that demonstrates how to do this (in the meantime, here are some notes: https://gist.github.com/hellais/d60c3ca9d4d456c01ecf793a5763d510). In order to get historical data you will have to go over every past bucket_date and retrieve all the data for each one. It should not take long, depending on the connectivity of the host you are using. |
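Going over every past bucket_date could start from something like the sketch below. It only enumerates the dates; it assumes bucket_date values are ISO-formatted days (YYYY-MM-DD), and the function name is hypothetical. Downloading and parsing the cans themselves is left out, since that depends on the layout described in the linked docs.

```python
from datetime import date, timedelta


def bucket_dates(start, end):
    """Yield one bucket_date string per day from start to end, inclusive.

    Assumes bucket_date values are ISO-formatted days (YYYY-MM-DD).
    """
    current = start
    while current <= end:
        yield current.isoformat()
        current += timedelta(days=1)


# Example: enumerate a few days to backfill.
backfill = list(bucket_dates(date(2020, 2, 27), date(2020, 3, 1)))
```

A daily poll would then only need to request the single most recent bucket_date that has not been ingested yet.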
Yes, a 24h lag is something we can work with. Thanks for that!
Great, I'll get started with these links! |
I tried speaking to the bridges API backend using:

    import requests
    payload = {
        "id": "",
        "type": "ooni",
        "country_code": "IT",
        "auth_token": "[REDACTED]"
    }
    requests.post('https://bridges.torproject.org/wolpertinger/', data=payload)

But I get a 404 error. Is this because this service is not yet deployed? |
Yes, it wasn't deployed at the time you were testing it but it should work now. The API endpoint is https://bridges.torproject.org/wolpertinger/fetch |
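A hedged sketch of the same request against the endpoint quoted above, with the payload construction split out so it can be checked without network access. It uses Python's standard library instead of requests, and, like the earlier attempt, sends the fields form-encoded; whether the service expects form-encoded or JSON input is an assumption here, as are the function names and the timeout value.

```python
import urllib.parse
import urllib.request

# Endpoint as quoted in the comment above.
WOLPERTINGER_URL = "https://bridges.torproject.org/wolpertinger/fetch"


def build_payload(country_code, auth_token):
    """Build the request fields described earlier in the thread."""
    return {
        "id": "",                      # no unique probe IDs, so left empty
        "type": "ooni",
        "country_code": country_code,  # country of the probe that will test
        "auth_token": auth_token,      # confidential, shared out of band
    }


def fetch_bridges(country_code, auth_token, timeout=30):
    """POST the payload and return the raw response body as text.

    urlopen raises urllib.error.HTTPError for responses such as 404,
    so a missing deployment surfaces as an exception.
    """
    body = urllib.parse.urlencode(build_payload(country_code, auth_token))
    request = urllib.request.Request(
        WOLPERTINGER_URL, data=body.encode("ascii"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The response body is returned unparsed because the response format is not spelled out in this thread.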
I would like to make a small change to wolpertinger's API and rename the API endpoint from
to
The change is reflected in the documentation. Is this ok with you? |
Thanks for the heads up! Yeah it's not a problem for us. When is this change going to be deployed? |
@NullHypothesis I started adding support for this inside of the backend here: https://github.com/ooni/orchestra/pull/88/files. There is a problem though with the data format returned by the bridges.tpo API backend. The parameters to be passed to obfs4proxy are in the form of Would it be possible to change the response format of the API so that we can get these values in a way that is compatible with our clients? If that is too much of a hassle I suppose we can also do the transformation in the backend, but if we are the only consumers of the data maybe it's easier to make the change on your end. I don't recall exactly why we opted for a list and that format (@bassosimone maybe remembers). |
I have finished implementing this PR: ooni/orchestra#88. This is good to go. |
Yes that is correct.
As a further update on this, we have finished writing the backend code, however we are waiting on deploying it until we have in place:
We are also going to be adding a This will also allow us to present the results of the measurements in a more useful way to end users. An obfs4 TBB bridge being blocked (which the censor might have discovered by scraping the Tor Browser config files) is a different story from a private obfs4 bridge being blocked. |
From the perspective of orchestra code, we are done. I documented the next steps as a follow up issue which is here: #426. I am going to close this so we can record it in this sprint. |
It was discussed with @NullHypothesis that it might be desirable for Tor to dynamically give us testing targets for probes, including private Tor bridges.
This issue is for tracking the work that needs to be done there.