New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[releaser-4.7] Bug 1950131: fix deadlock in EgressFirewall DNS code #513
[releaser-4.7] Bug 1950131: fix deadlock in EgressFirewall DNS code #513
Conversation
Currently when adding EgressFirewalls with 5 or so dnsNames (observed with 5 dnsNames) it is probable that a deadlock will occur. thread openshift#1 calls (e *EgressDNS) Add() grabs lock e.lock and calls signalAdded(dnsName) and continues looping to the next dnsName calling (e *EgressDNS) Add() grabbing e.lock and trying to write to the channel thread openshift#2 recieves the signalAdded(dnsName) and is waiting to grab e.lock in updateEntry for name thread openshift#1 cannot write to the channel because the channel blocks waiting for the write and holds e.lock thread openshift#2 cannot continue with updateEntryForName() because it is wating for e.lock Solve this problem by taking the add goroutine out of the long running update goroutine. By spawning a goroutine for every DNS name the goroutines can wait until it is there turn to update. includes a small e2e test that should exercise the deadlocking codepaths Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
@JacobTanenbaum: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@JacobTanenbaum: This pull request references Bugzilla bug 1950131, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@JacobTanenbaum: This pull request references Bugzilla bug 1950131, which is valid. 6 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dcbw, JacobTanenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
@JacobTanenbaum: All pull requests linked via external trackers have merged: Bugzilla bug 1950131 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This is not an automated cherry pick there where some minor merge conflicts in egressfirewall_dns.go and had to disregard everything in /test/e2e/e2e.go
Currently when adding EgressFirewalls with 5 or so dnsNames (observed
with 5 dnsNames) it is probable that a deadlock will occur.
thread #1 calls (e *EgressDNS) Add() grabs lock e.lock and calls
signalAdded(dnsName) and continues looping to the next dnsName calling
(e *EgressDNS) Add() grabbing e.lock and trying to write to the channel
thread #2 recieves the signalAdded(dnsName) and is waiting to grab
e.lock in updateEntry for name
thread #1 cannot write to the channel because the channel blocks waiting
for the write and holds e.lock
thread #2 cannot continue with updateEntryForName() because it is wating
for e.lock
Solve this problem by taking the add goroutine out of the long running
update goroutine. By spawning a goroutine for every DNS name the
goroutines can wait until it is there turn to update.
includes a small e2e test that should exercise the deadlocking codepaths
Signed-off-by: Jacob Tanenbaum jtanenba@redhat.com
- What this PR does and why is it needed
- Special notes for reviewers
- How to verify it
- Description for the changelog