This repository has been archived by the owner on Dec 3, 2021. It is now read-only.

Potential bug with supporting connections between non-devices #113

Closed
blinklet opened this issue May 18, 2019 · 13 comments

Comments

@blinklet

I tried to abuse antidote-selfmedicate and define a lab of three linux nodes connected together in a ring topology so I could test open-source routing software like FRR.

It did not work because:

  1. I re-used the utility container as a devices endpoint. But Antidote wants to configure devices using NAPALM. I tried to fake it out using blank configuration files but that just caused an error.
  2. I created three utilities endpoints and tried to make connections between them. This also failed because Antidote will not add new interfaces to, and create connections between, utilities endpoints (even though the lesson validated OK).

I don't even know if the antidotelabs/utility image is suitable but I wanted to try it. I suspect I'll see the same issue if I try to create a custom image with Linux/FRR.

Request:
I suggest you allow Antidote to create connections between utilities endpoints, or set Antidote to not try to run NAPALM on devices endpoints if the configuration files are blank.

Longer term, I suggest you enable Ansible in addition to NAPALM so Antidote can also configure labs that run open-source systems like Linux/FRR.

@Mierdin
Member

Mierdin commented May 18, 2019

> I created three utilities endpoints and tried to make connections between them. This also failed because Antidote will not add new interfaces to, and create connections between, utilities endpoints (even though the lesson validated OK).

Can you share your lesson definition and relevant shell output that shows this? We have lessons with inter-networked entities that are not network devices, so this is surprising to me.

@Mierdin
Member

Mierdin commented May 18, 2019

Also - this touches on an area I've been meaning to fix for a while. The current configuration abstraction is lacking. Not only because it doesn't support anything other than NAPALM currently but also because of how messy it is, conflated with presentation. I elaborated on this in #112, and I think most of what you mention above will be taken care of in that effort.

I'll leave this issue open because there's one thing you mentioned that won't be covered there, which is the potential of a bug with respect to how endpoints are connected. AFAIK there shouldn't be a problem connecting any endpoint at all, and if you are running into problems with this, we should fix that. So please share the details I asked for when you get the chance. I'll also move this to Syringe repo, as this isn't really a selfmedicate issue.

@Mierdin Mierdin closed this as completed May 18, 2019
@Mierdin Mierdin reopened this May 18, 2019
@Mierdin Mierdin changed the title Support open-source routers Potential bug with supporting connections between non-devices May 18, 2019
@Mierdin Mierdin transferred this issue from nre-learning/antidote-selfmedicate May 18, 2019
@Mierdin Mierdin added this to To do in v0.4.0 via automation May 18, 2019
@blinklet
Author

blinklet commented May 19, 2019

The lesson definition file I used was:

---
lessonName: Lab connections
lessonId: 101
category: fundamentals
tier: local
description: Connect three Utility containers together.
slug: Networking

utilities:
- name: server1
  image: antidotelabs/utility
- name: server2
  image: antidotelabs/utility
- name: server3
  image: antidotelabs/utility


connections:
- a: server1
  b: server2
- a: server2
  b: server3
- a: server3
  b: server1

stages:
  - id: 1
    description: Configure IP interfaces on each node

When I ran the ip link show command in each utility shell, I saw an interface like eth0@if24, but none of the additional interfaces I was expecting.

@blinklet
Author

blinklet commented May 20, 2019

Here is a capture of the shell from server1

antidote@server1:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
29: eth0@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP mode DEFAULT group default
    link/ether 82:d0:29:a1:a0:98 brd ff:ff:ff:ff:ff:ff link-netnsid 0

I was expecting to see two more interfaces...
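
As a rough check of that expectation, here is a minimal Go sketch that counts how many connections reference each endpoint in the ring defined above. It assumes each connection should show up as one extra interface per endpoint, in addition to eth0. The `Connection` type and `ExpectedLinks` function are hypothetical names used only for illustration; they are not part of Antidote or Syringe.

```go
package main

import "fmt"

// Connection mirrors one entry of the lesson's "connections" list.
// Hypothetical type, for illustration only.
type Connection struct {
	A string
	B string
}

// ExpectedLinks returns, for each endpoint name, how many connections
// reference it. Under the assumption stated above, that is the number of
// extra interfaces the endpoint should have besides eth0.
func ExpectedLinks(conns []Connection) map[string]int {
	counts := make(map[string]int)
	for _, c := range conns {
		counts[c.A]++
		counts[c.B]++
	}
	return counts
}

func main() {
	// The ring from the lesson definition above.
	ring := []Connection{
		{A: "server1", B: "server2"},
		{A: "server2", B: "server3"},
		{A: "server3", B: "server1"},
	}
	counts := ExpectedLinks(ring)
	for _, name := range []string{"server1", "server2", "server3"} {
		fmt.Printf("%s: %d extra interfaces expected\n", name, counts[name])
	}
}
```

In a ring, every node takes part in two connections, which matches the two missing interfaces on server1.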

@Mierdin
Member

Mierdin commented May 20, 2019

It's possible that there's a problem with multus, or between Syringe and multus. Can you post the following? Both will produce a lot of output potentially, so feel free to put each one into a Github gist and just link to them here.

  1. output of kubectl describe pod --all-namespaces
  2. Full logs of Syringe (use kubectl logs)

@blinklet
Author

Hi, I ran the commands. The output is stored at:
https://gist.github.com/blinklet/eb00c87af2580dfbbdc87db2127d72b5

@Mierdin
Member

Mierdin commented May 26, 2019

Hey @blinklet - apologies for the delay, I was traveling a lot last week and only now had a chance to really dive into this.

This is actually a byproduct of the problems I am tackling from #112, where the existing "devices", "utilities", and "blackboxes" terminology (and underlying implementation) is just showing how problematic it has become. In particular, in the current implementation, the logic that creates networks to support the connections you reference above only runs if there are devices in the lesson. If there are, this conditional passes, and connections are made for devices and utilities alike. So it's not quite that connections aren't supported for non-devices: they are, but only if at least one device is present in the lesson definition.

This can be seen in the logs:

time="2019-05-26T19:15:24Z" level=debug msg="Scheduler received new request. Sending to handle function." Operation=1 Stage=1 Uuid=101-wfmang0quq89518p
time="2019-05-26T19:15:24Z" level=info msg="Creating namespace: 101-wfmang0quq89518p-ns"
10.32.0.3 - - [26/May/2019:19:15:24 +0000] "POST /exp/livelesson HTTP/1.1" 200 29
time="2019-05-26T19:15:24Z" level=info msg="Created namespace: 101-wfmang0quq89518p-ns"
time="2019-05-26T19:15:24Z" level=error msg="Unable to sync secret into this namespace. Ingress-based resources (like iframes) may not work."
time="2019-05-26T19:15:24Z" level=debug msg="Not creating devices and connections"
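
The conditional described above can be sketched in Go roughly as follows. This is an illustrative model only, not the actual Syringe code: the `Lesson` type, the function names, and the `a-b-net` naming pattern are all assumptions made for the example.

```go
package main

import "fmt"

// Lesson is a hypothetical, simplified stand-in for a lesson definition.
type Lesson struct {
	Devices     []string
	Utilities   []string
	Connections [][2]string
}

// networkNames derives one network name per connection.
// The "a-b-net" pattern is an assumption for illustration.
func networkNames(l Lesson) []string {
	names := make([]string, 0, len(l.Connections))
	for _, c := range l.Connections {
		names = append(names, fmt.Sprintf("%s-%s-net", c[0], c[1]))
	}
	return names
}

// networksGated models the behavior described above: connection networks
// are created only when the lesson contains at least one device.
func networksGated(l Lesson) []string {
	if len(l.Devices) == 0 {
		// Corresponds to the "Not creating devices and connections" log line.
		return nil
	}
	return networkNames(l)
}

// networksAlways models the intended behavior: connection networks are
// created regardless of which kinds of endpoints the lesson contains.
func networksAlways(l Lesson) []string {
	return networkNames(l)
}

func main() {
	ring := Lesson{
		Utilities: []string{"server1", "server2", "server3"},
		Connections: [][2]string{
			{"server1", "server2"},
			{"server2", "server3"},
			{"server3", "server1"},
		},
	}
	fmt.Println("gated: ", networksGated(ring))
	fmt.Println("always:", networksAlways(ring))
}
```

With a utilities-only lesson like the one in this issue, the gated variant returns no networks at all, which is consistent with the missing interfaces.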

In #114 I am getting rid of all of this "utilities vs devices" stuff, so that you just have endpoints, and the intention is that they're all equally "networkable". The logic like what I linked to above, where the type of endpoints in a lesson is checked, doesn't exist in my current branch. Connections are created at the beginning of a lesson stand-up no matter what.

I've added this issue to #114, so that when that gets merged, this issue will close, because it should be addressed at that point. However, another test and additional feedback from you is certainly welcome past that point.

@blinklet
Author

Thanks for the update. I am happy to see the direction you are taking. This will give lesson developers more flexibility.

@blinklet
Author

Here's an additional observation about how Syringe (or Antidote) assigns IP addresses to interfaces on Utility endpoints connected to devices. In a topology where two utility endpoints are connected to a vqfx device, Antidote assigned the same IP address to the connected interfaces on the two Utility nodes.

I imagine the changes you are making to address this issue will also impact the behavior I observed in this comment.

This could be resolved by allowing users to manually define the configuration of Utility nodes using tools like Ansible.

@Mierdin
Member

Mierdin commented Jun 11, 2019

FYI Antidote doesn't assign addresses to any interfaces. eth0 is managed by Weave, and all of the multus-provided interfaces shouldn't get an address automatically. I would like more info about this, but you're probably right, it's not worth getting into until after we merge #114 at a minimum. I'll focus on getting that merged and then we can circle back on this to make sure we're straight.

@Mierdin
Member

Mierdin commented Jun 21, 2019

@blinklet In adding docs for Connections, I noticed the two additional networks for this pod when I run kubectl describe have the same IP address in the pod annotation:

kubectl -n=12-abcdefghijkl-ns describe pod vqfx1
Name:               vqfx1
Namespace:          12-abcdefghijkl-ns
Priority:           0
PriorityClassName:  <none>
Node:               antidote-worker-6l0v/10.138.0.7
Start Time:         Sun, 16 Jun 2019 22:16:16 -0700
Labels:             lessonId=12
                    podName=vqfx1
                    syringeManaged=yes
Annotations:        k8s.v1.cni.cncf.io/networks: [{"name":"vqfx1-vqfx2-net"},{"name":"vqfx3-vqfx1-net"}]
                    k8s.v1.cni.cncf.io/networks-status:
                    [{
                        "name": "",
                        "ips": [
                            "10.32.29.138"
                        ],
                        "default": true,
                        "dns": {}
                    },{
                        "name": "12-abcdefghijkl-ns-vqfx1-vqfx2-net",
                        "ips": [
                            "10.10.0.2"
                        ],
                        "dns": {}
                    },{
                        "name": "12-abcdefghijkl-ns-vqfx3-vqfx1-net",
                        "ips": [
                            "10.10.0.2"
                        ],
                        "dns": {}

Is this what you're talking about? If so, it's something we should probably put thought into, but those IP addresses are not enforced. They can be overridden - in fact, we do this with regularity with the vQFX images. I believe @cloudtoad also is setting his IP addresses manually in his container. Also, with the new configuration mechanism, this becomes much easier to do for any Endpoint.

So, please confirm the above for me, and let me know if you're able to do what I described above. Regardless, we should probably set a configuration that doesn't assign any address, to make it clear that connections provide pure L2 connectivity. Beyond that, I am particularly interested in making sure that the assertions I made in the previous paragraph are still true, because that's the intention. If they aren't, we'll address it.
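
To spot this kind of overlap without reading the annotation by eye, a small Go sketch like the one below could parse the networks-status annotation and report addresses shared by more than one non-default network. It is only a sketch: `NetworkStatus` and `DuplicateIPs` are hypothetical names, and the sample JSON is abridged from the output above and closed by hand, since that output is cut off.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetworkStatus holds the fields of interest from one entry of the
// k8s.v1.cni.cncf.io/networks-status annotation. Other fields are ignored.
type NetworkStatus struct {
	Name    string   `json:"name"`
	IPs     []string `json:"ips"`
	Default bool     `json:"default"`
}

// DuplicateIPs maps each IP address that appears on more than one
// non-default network to the names of the networks that carry it.
func DuplicateIPs(annotation string) (map[string][]string, error) {
	var statuses []NetworkStatus
	if err := json.Unmarshal([]byte(annotation), &statuses); err != nil {
		return nil, err
	}
	byIP := make(map[string][]string)
	for _, s := range statuses {
		if s.Default {
			continue // skip the default (eth0) network
		}
		for _, ip := range s.IPs {
			byIP[ip] = append(byIP[ip], s.Name)
		}
	}
	// Keep only addresses that are shared by at least two networks.
	for ip, names := range byIP {
		if len(names) < 2 {
			delete(byIP, ip)
		}
	}
	return byIP, nil
}

func main() {
	// Abridged sample; the closing brackets are an assumption.
	annotation := `[
		{"name": "", "ips": ["10.32.29.138"], "default": true, "dns": {}},
		{"name": "12-abcdefghijkl-ns-vqfx1-vqfx2-net", "ips": ["10.10.0.2"], "dns": {}},
		{"name": "12-abcdefghijkl-ns-vqfx3-vqfx1-net", "ips": ["10.10.0.2"], "dns": {}}
	]`
	dups, err := DuplicateIPs(annotation)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for ip, names := range dups {
		fmt.Printf("%s is shared by %v\n", ip, names)
	}
}
```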

@Mierdin
Member

Mierdin commented Jul 13, 2019

@blinklet Checking in - have you had a chance to confirm the above? I'd like to make sure I have my facts straight before I dive into a fix.

@Mierdin Mierdin added this to To do in Antidote v0.4.1 via automation Jul 16, 2019
@Mierdin Mierdin removed this from To do in v0.4.0 Jul 16, 2019
@Mierdin
Member

Mierdin commented Sep 6, 2019

Had a bit of a chance to look into this myself. First, in terms of being able to set the IP address you want, there's nothing at a Syringe or Kubernetes level that interferes with this. Obviously our vQFX images have been doing this for a long time, but in addition, I made temporary local modifications to the utility image to allow me to make network interface modifications and it works there too:

Screenshot from 2019-09-06 16-44-23

As I alluded to, it all comes down to whether or not you have permissions to do this inside the image itself. The point of the utility image was to be a lightweight container-only solution to showing simple scripts. In the event that an image needs to do lower-level stuff, like reconfiguring network devices, the current approach requires a VM-in-docker, like we're doing with the vQFX. @cloudtoad is working on a long-term plan to help make this suck a little less.

As long as there are permissions for it in the image, the work in v0.4.0 will allow you to perform any needed configurations at runtime for any image type, as it introduced the ability to use Python or Ansible playbooks to perform inter-stage configurations. We've gotten this working on an FRR and Cumulus image that we hope to share soon.

Regarding the CNI configuration shown in the previous post, as I show in the above screenshot, you can use whatever address you want, and there's no enforcement at the outer bridge layer. The plugin needs to allocate something, so it allocated 10.10.0.2 for both interfaces, but this doesn't have to be used. I tried to modify Syringe to omit this, just for aesthetic purposes, and to help keep confusion about how this works down, but it doesn't appear there is a good way to not allocate any address. This is what I used in Syringe:

	networkArgs := fmt.Sprintf(`{
		"name": "%s",
		"type": "antibridge",
		"plugin": "antibridge",
		"bridge": "%s",
		"forceAddress": false,
		"hairpinMode": false,
		"delegate": {
			"hairpinMode": false
		}
	}`, networkName, bridgeName)

And I got this message back:

  Warning  FailedCreatePodSandBox  2s                kubelet, minikube  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "080a0665a257a3512feb673d0e7cee3648bcf5b87d231ea9e489238954c0518f" network for pod "r1": NetworkPlugin cni failed to set up pod "r1_2-7ih8roro92pe44g8-ns" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "antibridge": cannot convert: no valid IP addresses

I suppose we could further modify the antibridge plugin but I was hoping to get away from that, and return to a simpler model that didn't require us to maintain a separate plugin.

In any case, it appears to me that the original purpose of this issue is no more, so I'll close this. If you have any other suggestions or comments, you know where to reach us. Thanks!

@Mierdin Mierdin closed this as completed Sep 6, 2019
Antidote v0.4.1 automation moved this from To do to Done Sep 6, 2019