Calico/VPP Cannot run on servers without AVX512 instruction set. #663

Open
Huxianying opened this issue Dec 1, 2023 · 3 comments

Comments

@Huxianying

Environment

  • Calico/VPP version: 3.23
  • Kubernetes version: 1.24.4
  • Deployment type: bare-metal
  • Network configuration: Calico

Issue description
Calico/VPP cannot run on servers without the AVX512 instruction set.

Calico/VPP logs

kubectl logs -n calico-vpp-dataplane   calico-vpp-node-2gk5k 
Defaulted container "vpp" out of: vpp, agent
time="2023-12-01T02:10:51Z" level=info msg="-- Environment --"
time="2023-12-01T02:10:51Z" level=info msg="CorePattern:         /var/lib/vpp/vppcore.%e.%p"
time="2023-12-01T02:10:51Z" level=info msg="ExtraAddrCount:      0"
time="2023-12-01T02:10:51Z" level=info msg="RxMode:              adaptive"
time="2023-12-01T02:10:51Z" level=info msg="TapRxMode:           adaptive"
time="2023-12-01T02:10:51Z" level=info msg="Tap MTU override:    0"
time="2023-12-01T02:10:51Z" level=info msg="Service CIDRs:       [10.96.0.0/12]"
time="2023-12-01T02:10:51Z" level=info msg="Tap Queue Size:      rx:1024 tx:1024"
time="2023-12-01T02:10:51Z" level=info msg="PHY Queue Size:      rx:1024 tx:1024"
time="2023-12-01T02:10:51Z" level=info msg="Hugepages            16"
time="2023-12-01T02:10:51Z" level=info msg="KernelVersion        5.15.0-73"
time="2023-12-01T02:10:51Z" level=info msg="Drivers              map[uio_pci_generic:%!s(bool=false) vfio-pci:%!s(bool=true)]"
time="2023-12-01T02:10:51Z" level=info msg="vfio iommu:          false"
time="2023-12-01T02:10:51Z" level=info msg="-- Interface Spec --"
time="2023-12-01T02:10:51Z" level=info msg="Interface Name:      ens8"
time="2023-12-01T02:10:51Z" level=info msg="Native Driver:       dpdk"
time="2023-12-01T02:10:51Z" level=info msg="vppIpConfSource:     linux"
time="2023-12-01T02:10:51Z" level=info msg="New Drive Name:      "
time="2023-12-01T02:10:51Z" level=info msg="PHY target #Queues   rx:1 tx:1"
time="2023-12-01T02:10:51Z" level=info msg="-- Interface config --"
time="2023-12-01T02:10:51Z" level=info msg="Node IP4:            192.168.3.7/24"
time="2023-12-01T02:10:51Z" level=info msg="Node IP6:            "
time="2023-12-01T02:10:51Z" level=info msg="PciId:               0000:49:00.0"
time="2023-12-01T02:10:51Z" level=info msg="Driver:              ice"
time="2023-12-01T02:10:51Z" level=info msg="Linux IF was up ?    true"
time="2023-12-01T02:10:51Z" level=info msg="Promisc was on ?     false"
time="2023-12-01T02:10:51Z" level=info msg="DoSwapDriver:        false"
time="2023-12-01T02:10:51Z" level=info msg="Mac:                 40:a6:b7:9e:e1:90"
time="2023-12-01T02:10:51Z" level=info msg="Addresses:           [192.168.3.7/24 ens8,fe80::42a6:b7ff:fe9e:e190/64]"
time="2023-12-01T02:10:51Z" level=info msg="Routes:              [{Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}, {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}]"
time="2023-12-01T02:10:51Z" level=info msg="PHY original #Queues rx:288 tx:288"
time="2023-12-01T02:10:51Z" level=info msg="MTU                  1500"
time="2023-12-01T02:10:51Z" level=info msg="isTunTap             false"
time="2023-12-01T02:10:51Z" level=info msg="isVeth               false"
time="2023-12-01T02:10:51Z" level=info msg="Running with uplink dpdk"
time="2023-12-01T02:10:51Z" level=info msg="deleting Route {Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:10:51Z" level=info msg="deleting Route {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:10:51Z" level=info msg="VPP started [PID 3681601]"
time="2023-12-01T02:10:51Z" level=info msg="Waiting for VPP... [0/10]"
/usr/bin/vpp[3681601]: tls_init_ca_chain:609: Could not initialize TLS CA certificates
/usr/bin/vpp[3681601]: tls_mbedtls_init:644: failed to initialize TLS CA chain
/usr/bin/vpp[3681601]: tls_init_ca_chain:976: Could not initialize TLS CA certificates
/usr/bin/vpp[3681601]: tls_openssl_init:1050: failed to initialize TLS CA chain
time="2023-12-01T02:10:53Z" level=info msg="Waiting for VPP... [1/10]"
time="2023-12-01T02:10:55Z" level=info msg="Waiting for VPP... [2/10]"
time="2023-12-01T02:10:57Z" level=info msg="Waiting for VPP... [3/10]"
time="2023-12-01T02:10:59Z" level=info msg="Waiting for VPP... [4/10]"
time="2023-12-01T02:11:01Z" level=warning msg="Waiting for VPP... [5/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:02Z" level=info msg="Received signal child exited, vpp index 1"
time="2023-12-01T02:11:02Z" level=info msg="VPP exited:true status:0 signaled:false"
time="2023-12-01T02:11:02Z" level=info msg="Done with signal child exited"
time="2023-12-01T02:11:03Z" level=warning msg="Waiting for VPP... [6/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:05Z" level=warning msg="Waiting for VPP... [7/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:07Z" level=warning msg="Waiting for VPP... [8/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:09Z" level=warning msg="Waiting for VPP... [9/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:11Z" level=error msg="Error connecting to VPP (SIGINT -1): Cannot connect to VPP after 10 tries"
time="2023-12-01T02:11:11Z" level=info msg="Terminating Vpp 1 (SIGINT)"
time="2023-12-01T02:11:11Z" level=info msg="Restoring configuration"
time="2023-12-01T02:11:11Z" level=info msg="Received signal interrupt, vpp index 1"
time="2023-12-01T02:11:11Z" level=info msg="Signaled vpp (PID -1) interrupt"
time="2023-12-01T02:11:11Z" level=info msg="Done with signal interrupt"
Using systemctl
Using systemd-networkd
time="2023-12-01T02:11:13Z" level=info msg="restoring address 192.168.3.7/24 ens8"
time="2023-12-01T02:11:13Z" level=info msg="restoring address fe80::42a6:b7ff:fe9e:e190/64"
time="2023-12-01T02:11:13Z" level=info msg="restoring route {Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:11:13Z" level=info msg="restoring routes : {Ifindex: 17 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} already exists"
time="2023-12-01T02:11:13Z" level=info msg="restoring route {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:11:13Z" level=info msg="restoring routes : {Ifindex: 17 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254} already exists"
time="2023-12-01T02:11:13Z" level=info msg="calico-vpp-pid file doesn't exist. Agent probably not started"
time="2023-12-01T02:11:13Z" level=info msg="Timeout : SIGKILL vpp 1"
time="2023-12-01T02:11:13Z" level=info msg="Received signal killed, vpp index 1"
time="2023-12-01T02:11:13Z" level=info msg="Signaled vpp (PID -1) killed"
time="2023-12-01T02:11:13Z" level=info msg="Done with signal killed"
time="2023-12-01T02:11:14Z" level=error msg="VPP run failed with Error running VPP: cannot connect to VPP after 10 tries"

kubectl logs -n calico-vpp-dataplane   calico-vpp-node-j6l7s -c agent
2023/12/01 02:53:44 File Content:

2023/12/01 02:53:44 Error reading file:%!(EXTRA *fs.PathError=open /var/run/vpp/vppmanagerlinuxmtu: no such file or directory)
time="2023-12-01T02:54:04Z" level=fatal msg="Error loading configuration: Vpp-host mtu not ready after 20 tries"

I dug into the Calico/VPP code and found that the failure happens because VPP itself does not start, so my guess is that VPP cannot start on a server without the AVX512 instruction set.
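
One way to check the premise directly is to list the AVX512 feature flags that the kernel reports for the CPU. This is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` contains a `flags` line; the helper name `avx512_flags` is hypothetical and not part of Calico/VPP.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX512 feature flags found in /proc/cpuinfo text.

    Hypothetical helper for illustration; not part of Calico/VPP or VPP.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "flags" line; merge them all.
        if sep and key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = avx512_flags(fh.read())
    except OSError as err:
        print(f"could not read /proc/cpuinfo: {err}")
    else:
        print("AVX512 flags:", ", ".join(flags) if flags else "none reported")
```

An empty result on the failing node and a non-empty one on the working node would confirm that the two servers really differ in AVX512 support, although it would not by itself show that this difference is what stops VPP.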

@onong
Collaborator

onong commented Dec 6, 2023

Hi @Huxianying,

How did you conclude that VPP failed to start due to the absence of the AVX512 instruction set?
Could you share details of the bare metal server that you are using?
Lastly, you seem to be using an older version of Calico/VPP.

@Huxianying
Author

I deployed VPP (version 22.02, the same VPP version that Calico/VPP 3.23 uses) on a bare-metal server without AVX512 instruction set support, and I noticed three differences in the logs compared to a bare-metal server with AVX512 support.
With AVX512 support:
image
Without AVX512 support:
image
Comparing the two screenshots above, the differences are:

  1. intel_uncore_init: no uncore units found
  2. topdown-level2: not supported
  3. memory-stalls: not supported
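
A comparison like this can also be done on the text of the two logs instead of on screenshots. This is a rough sketch, assuming both logs are saved as plain text and that only timestamps and the `[PID]` suffix vary between runs; `normalize` and `only_in_second` are hypothetical helper names.

```python
import re

# Assumption: timestamps look like time="..." and PIDs like "[3681601]",
# as in the manager log above. Other variable fields are not handled.
_TIME = re.compile(r'time="[^"]*"\s*')
_PID = re.compile(r"\[\d+\]")


def normalize(line):
    """Drop the timestamp and mask the PID so equal messages compare equal."""
    return _PID.sub("[PID]", _TIME.sub("", line)).strip()


def only_in_second(first_log, second_log):
    """Return normalized lines of second_log that never occur in first_log."""
    seen = {normalize(line) for line in first_log.splitlines()}
    extra = []
    for line in second_log.splitlines():
        norm = normalize(line)
        # Keep first-seen order and skip blanks and repeats.
        if norm and norm not in seen and norm not in extra:
            extra.append(norm)
    return extra
```

Calling `only_in_second(log_a, log_b)` and then `only_in_second(log_b, log_a)` lists the messages unique to each server, which makes it easier to see whether anything beyond the three lines above differs.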

I also noticed the following in the VPP code:
image
image

So my guess is that topdown and memory-stalls may use the AVX512 instruction set.

@onong
Collaborator

onong commented Dec 7, 2023

So my guess is that topdown and memory-stalls may use the AVX512 instruction set.

Yes, going by the logs it certainly seems that way, but I am not sure that is reason enough for VPP to fail to start. Could you share the entire VPP logs? Are you able to reproduce it consistently on that server? Also, could you share the bare-metal server details so I can try to reproduce it?
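
The manager log posted earlier contains the line `VPP exited:true status:0 signaled:false`. On Linux, a process that executes an instruction the CPU does not support is normally terminated by SIGILL, which a wait status reports as a signal rather than a normal exit, so that line is worth reading closely. This is a minimal sketch for pulling it out of a saved log, assuming the message format stays as shown above; `vpp_exit_info` is a hypothetical helper, and how the manager maps the wait status onto these fields is an assumption here.

```python
import re

# Matches the manager message seen in the log above, e.g.
#   msg="VPP exited:true status:0 signaled:false"
_EXIT = re.compile(r"VPP exited:(true|false) status:(-?\d+) signaled:(true|false)")


def vpp_exit_info(log_text):
    """Return (exited, status, signaled) from the first match, or None."""
    match = _EXIT.search(log_text)
    if match is None:
        return None
    exited, status, signaled = match.groups()
    return (exited == "true", int(status), signaled == "true")
```

If `signaled` comes back `False` with `status` 0, as in the log above, that would point to VPP exiting on its own rather than crashing on an illegal instruction, which is why the full VPP log is needed.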
