Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sessionresolver: refactor and cleanup #2135

Closed
8 of 10 tasks
bassosimone opened this issue Jun 8, 2022 · 1 comment · Fixed by ooni/probe-cli#1068
Closed
8 of 10 tasks

sessionresolver: refactor and cleanup #2135

bassosimone opened this issue Jun 8, 2022 · 1 comment · Fixed by ooni/probe-cli#1068
Assignees
Labels
cleanup There's need to cleanup stuff a bit ooni/probe-engine priority/low refactoring techdebt This issue describes technical debt

Comments

@bassosimone
Copy link
Contributor

bassosimone commented Jun 8, 2022

While working on #2121, I noticed that the sessionresolver package needs refactoring and cleanup. I will document in this issue what changes are required while I implement them.

  • the childResolver type is useless and we can use model.Resolver directly;
  • we should use model/mocks instead of custom fakes;
  • we should not use log.Log rather we should use model.DiscardLogger;
  • make timeLimitedLookup easier to test with a -short tests;
  • ensure timeLimitedLookup returns as soon as its context expires regardless of the child resolver;
  • hotfix the data race mentioned at sessionresolver: refactor and cleanup #2135 (comment);
  • create separate issue for investigating the data race more broadly;
  • minor changes in naming of files and types;
  • we can replace dnsclientmaker with a private function;
  • should not use netx rather we should use netxlite;

This issue is related to #2121 and #2115.

@bassosimone bassosimone added priority/low refactoring ooni/probe-engine techdebt This issue describes technical debt cleanup There's need to cleanup stuff a bit labels Jun 8, 2022
@bassosimone bassosimone self-assigned this Jun 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
This diff addresses the following points of ooni/probe#2135:

- [ ] the `childResolver` type is useless and we can use `model.Resolver` directly;
- [ ] we should use `model/mocks` instead of custom fakes;
- [ ] we should not use `log.Log` rather we should use `model.DiscardLogger`;
- [ ] make `timeLimitedLookup` easier to test with a `-short` tests;
- [ ] ensure `timeLimitedLookup` returns as soon as its context expires regardless of the child resolver;

Subsequent diffs will address more points mentioned in there.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
This diff addresses the following points of ooni/probe#2135:

- [x] the `childResolver` type is useless and we can use `model.Resolver` directly;
- [x] we should use `model/mocks` instead of custom fakes;
- [x] we should not use `log.Log` rather we should use `model.DiscardLogger`;
- [x] make `timeLimitedLookup` easier to test with a `-short` tests;
- [x] ensure `timeLimitedLookup` returns as soon as its context expires regardless of the child resolver;

Subsequent diffs will address more points mentioned in there.
@bassosimone
Copy link
Contributor Author

bassosimone commented Jun 8, 2022

It seems my recent PR, ooni/probe-cli#807, introduced this data race:

==================
WARNING: DATA RACE
Read at 0x00c00010a360 by goroutine 29:
  github.com/lucas-clemente/quic-go/http3.(*client).Close()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/client.go:191 +0x3b
  github.com/lucas-clemente/quic-go/http3.(*RoundTripper).Close()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/roundtrip.go:162 +0x156
  github.com/ooni/probe-cli/v3/internal/netxlite.(*http3Transport).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http3.go:42 +0x42
  github.com/ooni/probe-cli/v3/internal/netxlite.(*httpTransportErrWrapper).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http.go:34 +0x42
  github.com/ooni/probe-cli/v3/internal/netxlite.(*httpTransportLogger).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http.go:76 +0x42
  github.com/ooni/probe-cli/v3/internal/bytecounter.(*httpTransport).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/bytecounter/http.go:38 +0x42
  net/http.(*Client).CloseIdleConnections()
      /opt/hostedtoolcache/go/1.18.2/x64/src/net/http/client.go:949 +0x8d
  github.com/ooni/probe-cli/v3/internal/netxlite.(*DNSOverHTTPSTransport).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/dnsoverhttps.go:113 +0x42
  github.com/ooni/probe-cli/v3/internal/netxlite.(*SerialResolver).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/resolverserial.go:63 +0x47
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.(*Resolver).closeall()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/resolvermaker.go:116 +0x174
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.(*Resolver).closeall-fm()
      <autogenerated>:1 +0x39
  sync.(*Once).doSlow()
      /opt/hostedtoolcache/go/1.18.2/x64/src/sync/once.go:68 +0x101
  sync.(*Once).Do()
      /opt/hostedtoolcache/go/1.18.2/x64/src/sync/once.go:59 +0x46
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.(*Resolver).CloseIdleConnections()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/sessionresolver.go:104 +0x64
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.TestTypicalUsageWithFailure()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/sessionresolver_test.go:77 +0x4a4
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1486 +0x47

Previous write at 0x00c00010a360 by goroutine 32:
  github.com/lucas-clemente/quic-go/http3.(*client).dial()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/client.go:104 +0x15e
  github.com/lucas-clemente/quic-go/http3.(*client).RoundTrip.func1()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/client.go:211 +0xa4
  sync.(*Once).doSlow()
      /opt/hostedtoolcache/go/1.18.2/x64/src/sync/once.go:68 +0x101
  sync.(*Once).Do()
      /opt/hostedtoolcache/go/1.18.2/x64/src/sync/once.go:59 +0x46
  github.com/lucas-clemente/quic-go/http3.(*client).RoundTrip()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/client.go:210 +0x268
  github.com/lucas-clemente/quic-go/http3.(*RoundTripper).RoundTripOpt()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/roundtrip.go:116 +0x928
  github.com/lucas-clemente/quic-go/http3.(*RoundTripper).RoundTrip()
      /home/runner/go/pkg/mod/github.com/lucas-clemente/quic-go@v0.27.0/http3/roundtrip.go:121 +0x38
  github.com/ooni/probe-cli/v3/internal/netxlite.(*http3Transport).RoundTrip()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http3.go:37 +0x4a
  github.com/ooni/probe-cli/v3/internal/netxlite.(*httpTransportErrWrapper).RoundTrip()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http.go:26 +0x4e
  github.com/ooni/probe-cli/v3/internal/netxlite.(*httpTransportLogger).RoundTrip()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/http.go:60 +0x2e1
  github.com/ooni/probe-cli/v3/internal/bytecounter.(*httpTransport).RoundTrip()
      /home/runner/work/probe-cli/probe-cli/internal/bytecounter/http.go:50 +0x210
  net/http.send()
      /opt/hostedtoolcache/go/1.18.2/x64/src/net/http/client.go:252 +0x93e
  net/http.(*Client).send()
      /opt/hostedtoolcache/go/1.18.2/x64/src/net/http/client.go:176 +0x157
  net/http.(*Client).do()
      /opt/hostedtoolcache/go/1.18.2/x64/src/net/http/client.go:725 +0x1064
  net/http.(*Client).Do()
      /opt/hostedtoolcache/go/1.18.2/x64/src/net/http/client.go:593 +0x36
  github.com/ooni/probe-cli/v3/internal/netxlite.(*DNSOverHTTPSTransport).RoundTrip()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/dnsoverhttps.go:74 +0x827
  github.com/ooni/probe-cli/v3/internal/netxlite.(*SerialResolver).lookupHostWithoutRetry()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/resolverserial.go:131 +0x23c
  github.com/ooni/probe-cli/v3/internal/netxlite.(*SerialResolver).lookupHostWithRetry()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/resolverserial.go:103 +0x108
  github.com/ooni/probe-cli/v3/internal/netxlite.(*SerialResolver).LookupHost()
      /home/runner/work/probe-cli/probe-cli/internal/netxlite/resolverserial.go:69 +0x8d
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.timeLimitedLookupWithTimeout.func1()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/childresolver.go:39 +0xf7

Goroutine 29 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1839 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1439 +0x213
  testing.runTests()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1837 +0x7e4
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1719 +0xa71
  main.main()
      _testmain.go:181 +0x3a9

Goroutine 32 (running) created at:
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.timeLimitedLookupWithTimeout()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/childresolver.go:37 +0x256
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.timeLimitedLookup()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/childresolver.go:22 +0x91
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.(*Resolver).lookupHost()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/sessionresolver.go:172 +0x293
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.(*Resolver).LookupHost()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/sessionresolver.go:142 +0x4f9
  github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver.TestTypicalUsageWithFailure()
      /home/runner/work/probe-cli/probe-cli/internal/engine/internal/sessionresolver/sessionresolver_test.go:36 +0x170
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.2/x64/src/testing/testing.go:1486 +0x47

==================
--- FAIL: TestTypicalUsageWithFailure (0.00s)
    testing.go:1312: race detected during execution of test
FAIL
coverage: 100.0% of statements
FAIL	github.com/ooni/probe-cli/v3/internal/engine/internal/sessionresolver	0.087s

See https://github.com/ooni/probe-cli/runs/6793211653?check_suite_focus=true. I am going to re-run the tests but obviously we need to understand and fix the data race as part of this PR.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
These factories will soon be useful to finish with
ooni/probe#2135.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jun 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Feb 1, 2023
This diff closes ooni/probe#2121 because it
removes the last unnecessary netx usage. All the other packages that
currently use netx are network experiments. We will eventually convert
them to not use netx as part of other tracked issues.

This diff closes ooni/probe#2135 because
now the sessionresolver does not depend on netx anymore.

We refactor the way in which we conditionally configure byte counting
for HTTP such that we use a free function rather than a method (we can
use methods with nil receiver in Go, but the free function seems to
be a better choice in this case).

We introduce and use a new bytecounter specifically for the system
resolver. This byte counter is very imprecise but seems still better
than using the system resolver doesn't use any network I/O.

We stop printing the sessionresolver stats when we close a session, since
this component has been in production for years now.

We improve the model.UnderlyingNetwork model to add support for changing
the root cert pool, which helps with writing some integration tests.

We modify the protocol to use and modify the netxlite tproxy (a singleton
instance of UnderlyingNetwork) to make it goroutine safe.

We introduce a new file inside sessionresolver, factory.go, which creates
and properly wraps the resolvers. This code replaces code for which we
previously used netx, and is the core change introduced here.

While there, we refactor how we log in the session resolver to use
the operation logger as we do in some experiments.

We write some additional tests that take advantage of the new netxlite
tproxy mocking functionality to ensure the sessionresolver continues
to work in two extreme use cases: no resolver is available and just the
system resolver is available. I introduced these new tests because I
originally broke the system resolver when I introduced factory.go and
I felt like it was useful to have more robustness here.
bassosimone added a commit to ooni/probe-cli that referenced this issue Feb 1, 2023
This diff tweaks #1068
to make sure overriding the default cert pool works.

In #1068 we introduced
code to add this functionality but we never tested it was working
as intended. It turns out it was not!

Because this diff amends the previous diff, we'll consider it
part of ooni/probe#2135.
bassosimone added a commit to ooni/probe-cli that referenced this issue Feb 1, 2023
This diff tweaks #1068
to make sure overriding the default cert pool works.

In #1068 we introduced
code to add this functionality but we never tested it was working
as intended. It turns out it was not!

Because this diff amends the previous diff, we'll consider it
part of ooni/probe#2135.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup There's need to cleanup stuff a bit ooni/probe-engine priority/low refactoring techdebt This issue describes technical debt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant