Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only delete conntrack entries for orig-src and reply-src. #1500

Merged
merged 2 commits into from
Jul 11, 2017

Conversation

fasaxc
Copy link
Member

@fasaxc fasaxc commented Jul 11, 2017

This reduces the number of (expensive) scans of the conntrack table we need to do.

Description

This was one of the ideas that @doublek / @matthewdupre / I came up with to reduce the time spent removing conntrack flows. After closer inspection of the conntrack tuple, I'm convinced that we don't need to look in the other two fields when looking for flows to delete.

Todos

  • Unit tests (full coverage)

Improves https://github.com/projectcalico/felix/issues/1501

Release Note

Improve performance of dataplane driver by reducing number of conntrack deletions.

This reduces the number of scans of the conntrack table we need to do.
// deleting entries for local workload endpoints, either the endpoint originated the traffic, or it
// received the traffic and replied to it. In the originating case, the "original source" will be
// set to the endpoint's IP; in the other case, the "reply source". Hence, it's sufficient to only
// look in those two fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is very helpful, but I think it may need even a bit more in order to explain why only looking at orig-src and reply-src is watertight for our purposes.

The concerns are, I think that:

  • There could be a (partial) flow with orig-dst==<endpoint IP> but no reply-src, if the endpoint chooses not to respond at all. At least with UDP that would be quite easy to arrange; probably more tricky with TCP.
  • Is there any way to get a flow with reply-dst==<endpoint IP>!=orig-src ?

I think the possible answers are that:

  • It isn't possible to get conntrack state at all for such cases, because they fall outside the definition of what conntrack considers to be a flow; and/or
  • There can be conntrack state for such cases, but it naturally expires so quickly that we don't care about Felix deleting it here. (Hypothesis:) Conntrack state is only long-lived for complete flows with reply-src==orig-dst and reply-dst==orig-src, and it's only that long-lived state that we are concerned about deleting here.

Feel free to cut and paste if any of this is right or useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the conntrack tuples are produced as a result of the NAT calculation; it doesn't matter that there hasn't yet been a reply; the reply direction would still exist and have the expected reply source IP in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now, I had completely misunderstood. (So this exchange was very useful!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double-check my new understanding, then:

  • If orig-dst is different from reply-src, that would only be because of a DNAT affecting connection to the local workload, with orig-dst being the service or floating IP and reply-src being the endpoint IP. Since we are deleting by endpoint IP, reply-src is the only useful lookup.

  • If reply-dst is different from orig-src, that would only be because of an SNAT affecting outgoing connections from the local workload, with orig-src being the endpoint IP - and again, because we are deleting by endpoint IP, orig-src is the only useful lookup.

That all seems watertight to me, so I'll approve now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding. I can't think of any other cases.

// deleting entries for local workload endpoints, either the endpoint originated the traffic, or it
// received the traffic and replied to it. In the originating case, the "original source" will be
// set to the endpoint's IP; in the other case, the "reply source". Hence, it's sufficient to only
// look in those two fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double-check my new understanding, then:

  • If orig-dst is different from reply-src, that would only be because of a DNAT affecting connection to the local workload, with orig-dst being the service or floating IP and reply-src being the endpoint IP. Since we are deleting by endpoint IP, reply-src is the only useful lookup.

  • If reply-dst is different from orig-src, that would only be because of an SNAT affecting outgoing connections from the local workload, with orig-src being the endpoint IP - and again, because we are deleting by endpoint IP, orig-src is the only useful lookup.

That all seems watertight to me, so I'll approve now.

@fasaxc fasaxc merged commit fead0b9 into projectcalico:master Jul 11, 2017
@fasaxc fasaxc deleted the reduce-ct-passes branch July 11, 2017 15:45
@caseydavenport caseydavenport added this to the Calico v2.4.0 milestone Jul 13, 2017
song-jiang pushed a commit to song-jiang/felix that referenced this pull request Aug 30, 2017
Only delete conntrack entries for orig-src and reply-src.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants