Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipsets restore sometimes loses an update #1347

Closed
fasaxc opened this issue Feb 17, 2017 · 4 comments

Comments

Projects
None yet
1 participant
@fasaxc
Copy link
Member

commented Feb 17, 2017

Seen in a simulated scale run with lots of churn. From time to time, we hit an error where ipset claims that an IP can't be deleted from the set because it's not present. I created a script that replayed the ipset interactions and it fails only some of the time, indicating that ipset or the kernel is at fault.

@fasaxc fasaxc self-assigned this Feb 17, 2017

@fasaxc fasaxc added this to the Calico v2.1.0 milestone Feb 17, 2017

@fasaxc

This comment has been minimized.

Copy link
Member Author

commented Feb 17, 2017

Started a thread on the netfilter list: http://marc.info/?l=netfilter&m=148734147004186&w=2

@fasaxc

This comment has been minimized.

Copy link
Member Author

commented Feb 20, 2017

confirmed by the netfilter team as a kernel bug: https://bugzilla.netfilter.org/show_bug.cgi?id=1119

@fasaxc

This comment has been minimized.

Copy link
Member Author

commented Feb 20, 2017

Since this is a kernel bug we need to implement a workaround.

@fasaxc fasaxc referenced this issue Feb 27, 2017

Merged

Implement bulk updates to IP sets #1357

1 of 1 task complete

@fasaxc fasaxc closed this in #1357 Mar 1, 2017

fasaxc added a commit that referenced this issue Mar 1, 2017

Merge pull request #1357 from fasaxc/ipsets-perf
Implement bulk updates to IP sets

-  Improve IP sets update performance by batching multiple set updates into single ipset restore calls.  
- Avoid buffering large amounts of input/output by doing streaming reads/writes from pipes to the child process.
- Replace `AttemptCleanup()` start-of-day cleanup routine with a resync mechanism that reads back the contents of the dataplane.  Works around / Fixes #1347. 

Design:

- Move write function to `Registry`, renamed as `IPSets`.
- Reduce IPSet struct to a data struct rather than a fully fledged object to avoid having an intricate coupling between the two objects.
- To support comparison during resync, convert IPs/CIDRs to canonical form and store them as ip.Addr/ip.CIDR objects instead.
- Remove IP set existence cache since logic is now rolled into resync.
@fasaxc

This comment has been minimized.

Copy link
Member Author

commented Mar 7, 2017

The kernel bug has been there since at least v4.2. This commit: torvalds/linux@18f84d4#diff-004327fbdfc8bec3558628e163fd8f6b may have introduced it (the offending line was present afterwards and not before but I don't understand that commit well enough to know if it copied the bug from another location).

The bug was fixed in the kernel in this commit: torvalds/linux@50054a9 which should be in kernel version v4.11.

song-jiang pushed a commit to song-jiang/felix that referenced this issue Aug 30, 2017

Merge pull request projectcalico#1357 from fasaxc/ipsets-perf
Implement bulk updates to IP sets

-  Improve IP sets update performance by batching multiple set updates into single ipset restore calls.  
- Avoid buffering large amounts of input/output by doing streaming reads/writes from pipes to the child process.
- Replace `AttemptCleanup()` start-of-day cleanup routine with a resync mechanism that reads back the contents of the dataplane.  Works around / Fixes projectcalico#1347. 

Design:

- Move write function to `Registry`, renamed as `IPSets`.
- Reduce IPSet struct to a data struct rather than a fully fledged object to avoid having an intricate coupling between the two objects.
- To support comparison during resync, convert IPs/CIDRs to canonical form and store them as ip.Addr/ip.CIDR objects instead.
- Remove IP set existence cache since logic is now rolled into resync.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.