Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding checklist for candidate validation #42

Merged
merged 2 commits into from
Jun 1, 2019
Merged

Conversation

hugoArregui
Copy link
Member

@hugoArregui hugoArregui commented May 14, 2019

Fixes #35

This implements a basic validation schema using a checklist. We try every pair at least maxTries (7 by default see #35), and mark it as failed if we don't get a success response after that many requests. Once we get a success response, we check if it belongs to the best candidate available so far, if it does we nominate it, otherwise we continue.

Also, after a given timeout, if no candidate has been nominated, we simply choose the best valid candidate we got so far (if no candidate is valid, we mark the connection as failed).

Finally, the nomination request also has a maximum of maxTries, we mark the connection as failed if after that many attempt we fail to get a success response.

@hugoArregui hugoArregui force-pushed the topic-checklist branch 2 times, most recently from 75516e9 to 9c4ffd6 Compare May 16, 2019 18:42
@enobufs
Copy link
Member

enobufs commented May 16, 2019

From a quick review, it looks good to me. I guess this PR does not address adding a new prflx "local" candidate. Correct? (I guess that would be the next step after this)
One test (TestTimeout) is failing, btw.

@hugoArregui
Copy link
Member Author

hugoArregui commented May 16, 2019

@enobufs 🤔 not sure what you mean by prflx local candidate

@enobufs
Copy link
Member

enobufs commented May 16, 2019

It's related to things we talked about earlier around "we should check (xor-)mapped address in the binding-response message".. so that we can have a complete list of local candidates that would include prflx candidate also.

Say if you have this srflx candidate as your local candidate.

Candidates [
{
Type: srflx
Local transport addr: 1.2.3.4:6000
Related Addr: 192.1.1.2:5000
}]

Later, you may find a new prflx candidate from the binding-response:

Candidates [
{
Type: srflx
Local transport addr: 1.2.3.4:6000
Related Addr: 192.1.1.2:5000
}, {
Type: prflx
Local transport addr: 1.2.3.4:6001
Related Addr: 192.1.1.2:5000
}]

I believe doing this does not affect "connectivity" (nor checklist because these share the same related addr...), but it may help later in selection logic (I think prflx is preferred more than srflx, etc) and diagnostic purpose (you can tell which candidate(mapped-addr) is actually being used(valid/nominated), etc)

@hugoArregui
Copy link
Member Author

@enobufs oh yes, you are right: https://github.com/pion/ice/blob/master/candidatetype.go#L35 didn't realise that, I thought it was just a useful debugging property.

@hugoArregui hugoArregui marked this pull request as ready for review May 17, 2019 14:16
@hugoArregui hugoArregui changed the title WIP: adding checklist for candidate validation Adding checklist for candidate validation May 17, 2019
@enobufs
Copy link
Member

enobufs commented May 21, 2019

TestTimeout is failing on my mac...(rev @ae901)

=== RUN   TestTimeout
ice ERROR: 2019/05/21 15:47:47 error processing checkCandidatesTimeout handler the agent is closed
ice ERROR: 2019/05/21 15:47:47 error processing checkCandidatesTimeout handler the agent is closed
--- FAIL: TestTimeout (30.00s)
    transport_test.go:43: Connection timed out early. (after 29900 ms)

@hugoArregui
Copy link
Member Author

@enobufs yes, that's a known issue, also happens with master. Fails on mac and not on linux, not sure why (I never actually checked)

@enobufs
Copy link
Member

enobufs commented May 22, 2019

I see. I don't know exactly what the testing is testing, but as it says, Connection timed out early. (after 29900 ms) - I'd assume it is trying validate that the timeout shouldn't occur too early. The error says, it at least did not timeout in 29.9s - pretty close! Maybe the timeout threshold in the test code is the problem? (just guessing...)

Copy link
Member

@enobufs enobufs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the timeout test failure (appears to be a separate issue), the changes look good to me.

@enobufs
Copy link
Member

enobufs commented May 25, 2019

I see. I don't know exactly what the testing is testing, but as it says, Connection timed out early. (after 29900 ms) - I'd assume it is trying validate that the timeout shouldn't occur too early. The error says, it at least did not timeout in 29.9s - pretty close! Maybe the timeout threshold in the test code is the problem? (just guessing...)

This has been addressed in #49 !

@masterada masterada mentioned this pull request May 26, 2019
@hugoArregui hugoArregui force-pushed the topic-checklist branch 3 times, most recently from 9a85f5c to b056104 Compare May 31, 2019 18:49
@codecov-io
Copy link

codecov-io commented May 31, 2019

Codecov Report

Merging #42 into master will decrease coverage by 2%.
The diff coverage is 58.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
- Coverage   70.89%   68.89%   -2.01%     
==========================================
  Files          20       20              
  Lines        1357     1469     +112     
==========================================
+ Hits          962     1012      +50     
- Misses        327      370      +43     
- Partials       68       87      +19
Impacted Files Coverage Δ
candidatepair.go 74.19% <10%> (-12.35%) ⬇️
gather.go 34.71% <33.33%> (+0.03%) ⬆️
selection.go 64.59% <53.62%> (-10.66%) ⬇️
agent.go 75.35% <69.13%> (-3.49%) ⬇️
candidatetype.go 95% <0%> (-5%) ⬇️
candidate_base.go 88.18% <0%> (-1.82%) ⬇️
candidaterelatedaddress.go 100% <0%> (+30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a58a281...0509572. Read the comment docs.

agent.go Outdated
@@ -215,6 +228,41 @@ func NewAgent(config *AgentConfig) (*Agent, error) {
}
a.haveStarted.Store(false)

a.maxBindingRequests = config.MaxBindingRequests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work!

Would you mind moving the defaultValues into const values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@Sean-Der
Copy link
Member

Amazing work @hugoArregui !

This is an approve from me, would you mind just moving the 'default values' into constants. Just easier for people to audit/twiddle that way.

Beyond that I am gonna go home and play with this a little bit, but it is a approve from me :)

This implements a basic validation schema using a checklist. We try
every pair at least maxTries, and mark it as failed if we don't get a
success response after that many requests. Once we get a success
response, we check if it belongs to the best candidate available so far,
if it does we nominate it, otherwise we continue.

Also, after a given timeout, if no candidate has been nominated, we
simply choose the best valid candidate we got so far (if no candidate is
valid, we mark the connection as failed).

Finally, the nomination request also has a maximum of maxTries, we mark
the connection as failed if after that many attempt we fail to get a
success response.
Add helper function to add localCandidates. Brings down the duplication
and make sure we have a properly formed checklist when we are doing
trickle.

When comparing candidates in findPair do by value, and not address.
Before some candidates were failing to be found because of this.
@Sean-Der Sean-Der merged commit cfa4fc7 into master Jun 1, 2019
@Sean-Der Sean-Der deleted the topic-checklist branch June 1, 2019 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

ICE connection state stuck at 'checking' forever
4 participants