Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VXLAN calculation and data plane #1989

Merged
merged 1 commit into from Apr 5, 2019

Conversation

Projects
None yet
3 participants
@caseydavenport
Copy link
Member

commented Mar 30, 2019

Description

Adds VXLAN calculator to the calc graph, as well as the Linux dataplane.

Todos

  • Unit tests (full coverage)
  • Integration tests (delete as appropriate) In plan/Not needed/Done
  • Documentation
  • Backport
  • Release note

Release Note

Felix programs VXLAN routes

@caseydavenport caseydavenport added this to the Calico v3.7.0 milestone Mar 30, 2019

@caseydavenport caseydavenport requested a review from projectcalico/core-maintainers as a code owner Mar 30, 2019

@caseydavenport

This comment has been minimized.

Copy link
Member Author

commented Mar 30, 2019

The dataplane is still pretty crude - a few things still need doing, including cleaning up old L2 entries (right now they stay around forever) and also just some general code cleanup.

I think the calc graph piece is getting there though. One thing it doesn't yet handle is nodes with duplicate IP / VXLAN tunnel addresses.

I think we can probably live with the latter (they are assigned through our IPAM, so should be unique).

For the former, if two nodes both have an IP address, we'll potentially send two VTEP updates to the dataplane, meaning we'll try to set up two static ARP entries for the same IP. Need to think about whether this is acceptable or not - it's clearly a misconfiguration, the question is how should we handle it and will the current code fix itself once the misconfiguration is resolved.

@caseydavenport caseydavenport force-pushed the caseydavenport:casey-vxlan branch from 07acc0d to 327be9e Mar 30, 2019

Show resolved Hide resolved calc/event_sequencer.go Outdated
Show resolved Hide resolved calc/event_sequencer.go Outdated
@fasaxc
Copy link
Member

left a comment

Looking good for a WiP, few thoughts inline

Show resolved Hide resolved dataplane/driver.go
Show resolved Hide resolved dataplane/linux/vxlan_mgr.go Outdated
Show resolved Hide resolved dataplane/linux/vxlan_mgr.go Outdated
Show resolved Hide resolved dataplane/linux/vxlan_mgr.go Outdated
Show resolved Hide resolved dataplane/linux/vxlan_mgr.go Outdated
Show resolved Hide resolved calc/vxlan_resolver.go Outdated
Show resolved Hide resolved calc/vxlan_resolver.go Outdated
Show resolved Hide resolved calc/vxlan_resolver.go
Show resolved Hide resolved calc/vxlan_resolver.go Outdated
Show resolved Hide resolved calc/vxlan_resolver.go Outdated

@caseydavenport caseydavenport changed the title [WIP] VXLAN calculation and data plane VXLAN calculation and data plane Apr 2, 2019

Show resolved Hide resolved calc/vxlan_resolver.go
Show resolved Hide resolved calc/vxlan_resolver.go
Show resolved Hide resolved calc/vxlan_resolver.go Outdated

// vtepMACForHost calculates a deterministic MAC address based on the provided host.
// The returned address matches the address assigned to the VXLAN device on that node.
func (c *VXLANResolver) vtepMACForHost(nodename string) string {

This comment has been minimized.

Copy link
@fasaxc

fasaxc Apr 3, 2019

Member

Does this match flannel's algorithm? It'd probably need to if we want to support migration at some point.

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Apr 3, 2019

Author Member

Bah, it looks like flannel doesn't use an algorithmic approach for this - it relies on the kernel to generate a MAC for the device when creating it, rather than doing deterministic calculation of it.

https://github.com/coreos/flannel/blob/master/backend/vxlan/vxlan.go#L137

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Apr 4, 2019

Author Member

@fasaxc wondering if we can merge as-is and then do a follow up to adjust this.

I'm not really sure what we can do about this - we need to teach flannel about our local VXLAN tunnel endpoints, and also learn about Flannel's in order to make this work, which means either augmenting Calico to write to flannel's API or vice-versa, I think.

This comment has been minimized.

Copy link
@fasaxc

fasaxc Apr 5, 2019

Member

Maybe we just start storing the MAC in the datamodel? We already store the IP.

Show resolved Hide resolved dataplane/linux/vxlan_mgr.go Outdated
Show resolved Hide resolved routetable/route_table.go Outdated
Show resolved Hide resolved calc/vxlan_resolver.go
@fasaxc

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

I can think of some contrived ways that we could get clashing tunnel addresses (resync while two nodes get recreated). What would actually happen in that case, what's the blast radius and would it fix itself (by triggering a panic if nothing else)?

@caseydavenport

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2019

I can think of some contrived ways that we could get clashing tunnel addresses (resync while two nodes get recreated). What would actually happen in that case, what's the blast radius and would it fix itself (by triggering a panic if nothing else)?

Since everything is keyed by node, rather than by address, they won't end up hiding each other in the calc graph.

We'll end up with two VTEP entries which share a tunnel address, which means we'll get two ARP entries for the same IP but with different MAC addresses. One of them will be programmed second, and will win out.

Once the duplicate IPs are fixed, it should resolve itself by sending updates with the new IP down to the data plane, at which point the ARP entries won't conflict and both will be programmed.

So, I think it should just work.

@fasaxc fasaxc self-assigned this Apr 4, 2019

Show resolved Hide resolved config/config_params.go Outdated

@caseydavenport caseydavenport force-pushed the caseydavenport:casey-vxlan branch from 57bd2c8 to 3f10fca Apr 4, 2019

@CLAassistant

This comment has been minimized.

Copy link

commented Apr 4, 2019

CLA assistant check
All committers have signed the CLA.

@caseydavenport caseydavenport force-pushed the caseydavenport:casey-vxlan branch from 3f10fca to ccae2ea Apr 4, 2019

@fasaxc

fasaxc approved these changes Apr 5, 2019

Copy link
Member

left a comment

Just a couple of log-level nits. Would be nice to have a more efficient dataplane API rather than doing a resync every time but let's scale test first to see if it's needed.

Don't think I need to re-review.

Show resolved Hide resolved routetable/route_table.go Outdated
Show resolved Hide resolved routetable/route_table.go Outdated

@caseydavenport caseydavenport force-pushed the caseydavenport:casey-vxlan branch from 56be364 to 8053857 Apr 5, 2019

@caseydavenport caseydavenport force-pushed the caseydavenport:casey-vxlan branch from 8053857 to bbc860e Apr 5, 2019

@caseydavenport caseydavenport merged commit 5e8df2e into projectcalico:master Apr 5, 2019

2 checks passed

license/cla Contributor License Agreement is signed.
Details
semaphoreci The build passed on Semaphore.
Details

@caseydavenport caseydavenport deleted the caseydavenport:casey-vxlan branch Apr 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.