ovn: fix reserve joinSwitch LRP IPs #2331
Added test correction. |
Can someone re-trigger the CI jobs? |
/retest |
/retest |
Does anyone have any idea why my retests are being cancelled? |
/retest |
/retest |
No clue; I restarted them. |
/retest |
I ran the tests in my fork and they were not cancelled. |
/retest |
A review would be very nice. |
In general, I think this looks good. We don't currently have any e2e tests that perform disruptive actions like the one you mention (deleting ovnkube-master while tests are running), so I think this is fine.
@trozet: could you please have a look at this? I had noticed the same issues locally on my computer, and I suspect this might have upgrade impacts.
I rebased my changes to the current master. |
@Reamer with this change we now do:
Later, in syncNodes(), we do the same thing again for existing nodes.
Since the IP has already been reserved, the second attempt will error out here. |
Hi @girishmg,
What do you think? |
@girishmg |
/lgtm
/retest |
/retest |
/retest |
/retest |
@alexanderConstantinescu Do you see why the tests fail? |
I suspect this means you need to |
/retest |
I managed to work out what the issue is and filed #2428. Sorry for that! |
/retest |
/retest |
master.go: Use ensureJoinLRPIPs, which also checks the running DB.
logical_switch_manager: ensureJoinLRPIPs now also looks into the running DB and fills the cache on a hit.
Move getJoinLRPAddresses from gateway to logical_switch_manager.
Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
logical_switch_manager: During startup, getJoinLRPAddresses validates the active joinLRPAddress against the node's subnets. Because of the early state, the node's subnets are still empty. Instead, we should validate against the join switch's subnets, which are already initialised.
Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
I have done a git rebase. |
Great! I merged #2434, so I am closing this. Thanks for resolving this! |
I would like to see this fixed in the OpenShift 4.7 and 4.8 branches as soon as possible. The bug fixed here is seriously hindering me in setting up my production environment. |
It's coming. I am opening up a PR as we write. |
- What this PR does and why is it needed
This PR fixes the reservation of joinSwitch LRP IPs. It is needed so that egressIPs keep working even when the active ovn changes due to a node failure or an ovn update.
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1973215
- Special notes for reviewers
namespace.go: When using the HostNetworkNamespace feature, the synchronisation
code for namespaces triggers the ensureJoinLRPIPs method, which returns
a valid IP from the join subnet without considering an IP address that may
already be active. As a result, the gwLRPIP changes every time ovn is
restarted, which breaks things like egressIPs.
gateway: During startup, getJoinLRPAddresses validates the
active joinLRPAddress against the node's subnets. Because of
the early state, the node's subnets are still empty. Instead, we should
validate against the join switch's subnets, which are already initialised.
- How to verify it
I have tested it manually in my environment.
I know that these changes should be tested with a unit test or e2e test. Unfortunately, my Go expertise is low and I don't know how to test such a complex behaviour as restarting the application. I hope an experienced maintainer can add tests.
- Description for the changelog
Fix the reservation of JoinLRPAddresses