#16559, bwauth code needs to be smarter about failed circuits #160

juga0 · 2018-05-14T13:56:08Z

@pastly this is for working on https://trac.torproject.org/projects/tor/ticket/16559.
Since we changed to use http and not helper relay (or now is optional, i dunno), could you update us on the status of this and/or point to the code that checks when a circuit fail or a download fail?.
If there're not such checks i'd create them, and if can't know so far the reason for failure, i'd try to include it so we can debug better

pastly · 2018-05-14T13:59:28Z

https://github.com/pastly/simple-bw-scanner/blob/2727e20ed54df9d37401035b97a6a67fd4ec60a6/sbws/core/scanner.py#L131 is measure_relay, which does all the circuit building and measuring. See where it calls measure_rtt_to_server and measure_bandwidth_to_server and see where it returns ResultError types early if something goes wrong.

pastly · 2018-05-14T14:01:08Z

From trac#16559 ...

it will still be counted as a circuit that went through all of the nodes [...] the code will consider failures from unstable nodes to be relevant for nodes that are perfectly stable

sbws has the same limitation.

juga0 · 2018-05-15T09:31:12Z

hmm, not sure what "relevant" means here.
Maybe we could check whether the relays have the stable flag?
After looking the code, i found the possible errors:

unable to build circuit between relay and exit: ResultErrorCircuit
- PathLengthException
- is_controller_okay: have to look what this means
- InvalidRequest, CircuitExtensionFailed, ProtocolError: this errors can tell us more about what failed?
- is c.new_circuit able to tell us more?
web server usable: ResultErrorStream: have to look what this means
measure bw and rtt: ResultErrorStream
- requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout: this errors can tell us more about what failed?
- other possible errors with requests?

pastly · 2018-05-15T12:35:16Z

In this comment I'm saying that if one relay in a circuit causes or experiences a problem, all relays in the circuit will get a ResultError* given to it.

A ResutlErrorCircuit is created if the circuit can't be built. As far as I know, there is no easy way to learn from stem where the circuit failed. For example, you can't easily tell if you failed to connect to the first relay or if the first relay failed to connect to the second. I think you could technically figure this out by watching all circuit events while building a circuit and seeing where you get EXTENDED events; however, I don't think this is useful. If relayA can't extend to relayB, is it really relayA's fault? Or is it relayB's fault? How can you know?

A ResultErrorStreamis created when something went wrong with the TCP stream between the scanner and the destination web server. For example, the destination suddenly stopped being "usable" (one reason would be that the file suddenly became too small), or the stream was closed during the RTT or bandwidth measurements.

is_controller_okay makes sure our control connection to Tor is still good.

Regarding c.new_circuit and the exceptions it throws, you can double check me here to see if we can learn more information from the exceptions. I don't think we really can.

Regarding the Requests exceptions, it doesn't know anything about Tor so it only says it failed to connect to the web server or timed out.

In short, I don't think there's much productive work to be done for this ticket.

juga0 · 2018-05-15T21:03:21Z

Thanks for all the useful comments.
Do you or @teor2345 still think that maybe we should collect/analyze all those different errors (even the lack of reasons for them) in logs or even put them in the bandwidth files so hopefully we can figure out something about them?.

teor2345 · 2018-05-17T02:44:24Z

Do you or @teor2345 still think that maybe we should collect/analyze all those different errors (even the lack of reasons for them) in logs or even put them in the bandwidth files so hopefully we can figure out something about them?.

We can count failures in the bandwidth file. But there's no room for detailed information. We can log failures. But we should let operators disabled failure logs, because they can be large.

juga0 · 2018-05-21T11:59:54Z

We can count failures in the bandwidth file.

ok, so far the intermediate files contain success, error-X, X = {misc, circ, stream, auth}, i'm not sure if we'd like to have a key_value for each type of error with the total number of them, or just one key_value called something like errors_measuring.

But there's no room for detailed information.

Currently intermediate files contain a msg with the reason for failure, though that message could be improved.

We can log failures.

most of them are being logged already

But we should let operators disabled failure logs, because they can be large

So far that could be managed with logging config. All those errors are currently shown as warning, so logging should be configure to error.

teor2345 · 2018-05-21T12:18:43Z

i'm not sure if we'd like to have a key_value for each type of error with the total number of them, or just one key_value called something like errors_measuring.

Maybe we should include success counts as well?

8 key_values per relay would take up ~150 bytes per relay. That's well below the 512 byte line limit. And that's only an extra megabyte for 6000 relays.

So let's have the detailed information.

And let's patch the spec to document the new fields.

juga0 · 2018-05-21T12:40:25Z

ok, though if it's not a problem, i'd start 1st with sbws, we can leave it in a branch until spec merged

teor2345 · 2018-05-21T12:50:33Z

I'm not sure what you mean.

The bandwidth file spec was merged in https://bugs.torproject.org/25869 to https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt

You can do a sbws and torspec branch with these new fields, and get them both reviewed at the same time.

juga0 · 2018-05-21T13:01:58Z

What i mean is that i need to add new key_value pairs to the bw lines to:

sbws
bandwidth file spec

And i would do it in that other, but leaving the changes in a branch, so that the spec branch can be merged before the sbws branch, in case it's needed.

I'm not sure we want to patch Torflow, but i think we can decide later.

teor2345 · 2018-05-21T13:07:57Z

No, torflow isn't getting any new features

juga0 · 2018-05-25T10:41:19Z

Ticket for changing the spec: https://trac.torproject.org/projects/tor/ticket/26200

juga0 added the bug Something isn't working label May 14, 2018

juga0 added this to the SoP milestone May 14, 2018

juga0 self-assigned this May 14, 2018

juga0 mentioned this issue May 28, 2018

Issue160 #176

Merged

pastly closed this as completed in #176 Jun 7, 2018

juga0 mentioned this issue Jun 7, 2018

Add function to obtain relay ed25519 #174

Merged

juga0 modified the milestones: SoP, 1.0 (MVP must) Sep 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#16559, bwauth code needs to be smarter about failed circuits #160

#16559, bwauth code needs to be smarter about failed circuits #160

juga0 commented May 14, 2018 •

edited

pastly commented May 14, 2018

pastly commented May 14, 2018

juga0 commented May 15, 2018

pastly commented May 15, 2018

juga0 commented May 15, 2018

teor2345 commented May 17, 2018 via email

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 25, 2018

#16559, bwauth code needs to be smarter about failed circuits #160

#16559, bwauth code needs to be smarter about failed circuits #160

Comments

juga0 commented May 14, 2018 • edited

pastly commented May 14, 2018

pastly commented May 14, 2018

juga0 commented May 15, 2018

pastly commented May 15, 2018

juga0 commented May 15, 2018

teor2345 commented May 17, 2018 via email

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 21, 2018

teor2345 commented May 21, 2018

juga0 commented May 25, 2018

juga0 commented May 14, 2018 •

edited