Spark Error (lxd) #9

Closed
bodleytunes opened this issue Nov 27, 2017 · 6 comments

Comments

@bodleytunes

I'm getting an error:
cannot find interface for ifIndex: 5
This is running in an LXD container on Ubuntu 17.04.

Environment

  • latest
  • ubuntu-17.04
    ...

This is a privileged LXD container. The error does not appear on my Debian 8 bare-metal machine.

I originally thought it might be down to the container not being privileged, but it turns out I had already set it up as privileged.

What's the actual result?

Nov 27 22:05:38 vrf-zerotier openr[21313]: E1127 22:05:38.952653 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:41 vrf-zerotier openr[21313]: E1127 22:05:41.591500 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:44 vrf-zerotier openr[21313]: E1127 22:05:44.714031 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:45 vrf-zerotier openr[21313]: I1127 22:05:45.090761 21313 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 27 22:05:45 vrf-zerotier openr[21313]: I1127 22:05:45.091567 21313 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 27 22:05:45 vrf-zerotier openr[21313]: I1127 22:05:45.983922 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:05:47 vrf-zerotier openr[21313]: E1127 22:05:47.765833 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:50 vrf-zerotier openr[21313]: E1127 22:05:50.468478 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:53 vrf-zerotier openr[21313]: E1127 22:05:53.111532 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:55 vrf-zerotier openr[21313]: I1127 22:05:55.988886 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:05:56 vrf-zerotier openr[21313]: E1127 22:05:56.203078 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:05:59 vrf-zerotier openr[21313]: E1127 22:05:59.222036 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:02 vrf-zerotier openr[21313]: E1127 22:06:02.518582 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:05 vrf-zerotier openr[21313]: I1127 22:06:05.097883 21313 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 27 22:06:05 vrf-zerotier openr[21313]: I1127 22:06:05.098803 21313 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 27 22:06:05 vrf-zerotier openr[21313]: E1127 22:06:05.650622 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:05 vrf-zerotier openr[21313]: I1127 22:06:05.991883 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:09 vrf-zerotier openr[21313]: E1127 22:06:09.238982 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:12 vrf-zerotier openr[21313]: E1127 22:06:12.597512 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:15 vrf-zerotier openr[21313]: E1127 22:06:15.663316 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:15 vrf-zerotier openr[21313]: I1127 22:06:15.996842 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:18 vrf-zerotier openr[21313]: E1127 22:06:18.240803 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:20 vrf-zerotier openr[21313]: E1127 22:06:20.769973 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:23 vrf-zerotier openr[21313]: E1127 22:06:23.798859 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:25 vrf-zerotier openr[21313]: I1127 22:06:25.052012 21343 ThreadManager.tcc:374] ThreadManager::add called with numa == true, but not a NumaThreadManager
Nov 27 22:06:25 vrf-zerotier openr[21313]: I1127 22:06:25.106479 21313 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 27 22:06:25 vrf-zerotier openr[21313]: I1127 22:06:25.107245 21313 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 27 22:06:25 vrf-zerotier openr[21313]: I1127 22:06:25.999750 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:26 vrf-zerotier openr[21313]: E1127 22:06:26.494338 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:29 vrf-zerotier openr[21313]: E1127 22:06:29.581921 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:32 vrf-zerotier openr[21313]: E1127 22:06:32.383736 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:35 vrf-zerotier openr[21313]: E1127 22:06:35.601126 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:36 vrf-zerotier openr[21313]: I1127 22:06:36.003907 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:38 vrf-zerotier openr[21313]: E1127 22:06:38.084923 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:41 vrf-zerotier openr[21313]: E1127 22:06:41.501282 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:44 vrf-zerotier openr[21313]: E1127 22:06:44.955479 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:45 vrf-zerotier openr[21313]: I1127 22:06:45.110173 21313 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 27 22:06:45 vrf-zerotier openr[21313]: I1127 22:06:45.111088 21313 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 27 22:06:46 vrf-zerotier openr[21313]: I1127 22:06:46.007829 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:47 vrf-zerotier openr[21313]: E1127 22:06:47.754106 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:50 vrf-zerotier openr[21313]: E1127 22:06:50.328872 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:53 vrf-zerotier openr[21313]: E1127 22:06:53.622691 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:56 vrf-zerotier openr[21313]: I1127 22:06:56.012415 21348 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 27 22:06:56 vrf-zerotier openr[21313]: E1127 22:06:56.818434 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5
Nov 27 22:06:59 vrf-zerotier openr[21313]: E1127 22:06:59.702836 21347 Spark.cpp:791] Cannot find interface for ifIndex: 5

@saifhhasan
Contributor

Can you give the output of the following commands?

ip addr
breeze lm links --all

Also, please update the issue title to describe the Spark error.

One more side question, for my education: is this error hindering your testing in any way?
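
The ifIndex in the error message can also be mapped to an interface name without parsing `ip addr` by hand. The following is a minimal sketch using only the Python standard library; the helper name `ifindex_to_name` is made up for illustration and is not part of Open/R or breeze. It has to be run inside the same container (network namespace) as openr to be meaningful.

```python
import socket
from typing import Optional


def ifindex_to_name(ifindex: int) -> Optional[str]:
    """Return the interface name for a kernel ifIndex, or None if unknown."""
    try:
        return socket.if_indextoname(ifindex)
    except (OSError, OverflowError):
        # No interface with that index exists in this network namespace.
        return None


if __name__ == "__main__":
    # List every interface visible in the current network namespace.
    for index, name in socket.if_nameindex():
        print(f"{index}: {name}")
    # 5 is the index reported in the Spark error message.
    print("ifIndex 5 ->", ifindex_to_name(5))
```

If the index resolves to an interface that is missing from `breeze lm links --all`, openr is probably not tracking that interface.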

@bodleytunes changed the title from "running in container (lxd)" to "Spark Error (lxd)" on Nov 28, 2017
@bodleytunes
Author

`1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: gre0@NONE: mtu 1476 qdisc noop state DOWN group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
3: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
4: ip_vti0@NONE: mtu 1332 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
5: zt0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 1e:97:fc:54:40:fd brd ff:ff:ff:ff:ff:ff
    inet 10.55.0.10/24 brd 10.55.0.255 scope global zt0
       valid_lft forever preferred_lft forever
    inet6 fc7b:5e01:5e9f:7538:3f4::1/40 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1c97:fcff:fe54:40fd/64 scope link
       valid_lft forever preferred_lft forever
45: eth0@if46: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:77:22:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.99.164/24 brd 10.10.99.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe77:22ad/64 scope link
       valid_lft forever preferred_lft forever
root@vrf-zerotier:~# breeze lm links --all

Interface    Status    Overloaded    Metric Override    ifIndex    Addresses

eth0         Up                                         45         10.10.99.164
                                                                   fe80::216:3eff:fe77:22ad
`

@bodleytunes
Author

Also:

`#!/bin/bash
#
# Copyright (c) 2014-present, Facebook, Inc.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

# Ports 60000 - 60100 need to be reserved in the system so that they are always
# free for openr to use. On Linux you can do the following to reserve them:
# > sysctl -w net.ipv4.ip_local_reserved_ports=60000-60100

# Default OpenR configuration
# Override the ones you need in /etc/sysconfig/openr for custom configuration
# on the node

# OpenR binary path or command name present on bin paths
OPENR=openr

# Domain. Nodes will not form adjacencies to other nodes with different domain.
# One node can be part of only one domain. Useful when want to run OpenR on
# adjacent nodes but don't want to form adjacencies with them.
DOMAIN=wizznet.co.uk

# List of comma separated list of prefixes to announce
# e.g. "face:cafe::1/128,face:b00c::/64"
PREFIXES="10.10.0.0/16"

# Used to assign elected address if prefix allocator is enabled
LOOPBACK_IFACE="lo,zt0,eth0"

# Announce all global addresses of interfaces into the network
REDISTRIBUTE_IFACES="zt0,eth0"

# dryrun => Do not program routes in dryrun mode
DRYRUN=false

# Enable RTT metric on links. RTTs are computed dynamically and then used as
# cost for links. If disabled then hop count will be used as a cost of path
ENABLE_RTT_METRIC=true

# Enable v4
ENABLE_V4=true

# Enable health-checker
ENABLE_HEALTH_CHECKER=yes
HEALTH_CHECKER_PING_INTERVAL_S=3

# Interface prefixes to perform neighbor discovery on. All interfaces whose
# names start with these are used for neighbor discovery
IFACE_PREFIXES="eth0"

# Logging verbosity
VERBOSITY=1

# PrefixAllocator parameter
ENABLE_PREFIX_ALLOC=false # Enable automatic election of prefixes for nodes
SEED_PREFIX="" # Master prefix to allocate subprefixes of nodes
ALLOC_PREFIX_LEN=128 # Length of allocated prefix
SET_LOOPBACK_ADDR=false # Assign chosen prefix for node on loopback
# with prefix length as /128 (and address being first
# in elected network block)
OVERRIDE_LOOPBACK_ADDR=false # Overrides other existing loopback addresses on
# loopback interface

# Spark Configuration

# How long to keep adjacency without hearing hello from neighbor
SPARK_HOLD_TIME_S=30

# How often to send hello packets to neighbors
SPARK_KEEPALIVE_TIME_S=3

# How fast to perform initial neighbor discovery
SPARK_FASTINIT_KEEPALIVE_TIME_MS=100

# Enable in-built Fib service handler
ENABLE_NETLINK_FIB_HANDLER=yes
FIB_HANDLER_PORT=60100

# Enable in-built System service handler
ENABLE_NETLINK_SYSTEM_HANDLER=true

# set decision debounce time
DECISION_DEBOUNCE_MIN_MS=10
DECISION_DEBOUNCE_MAX_MS=250

# enable performance measurements
ENABLE_PERF_MEASUREMENT=true
`

@bodleytunes
Author

Ah, I get new messages if I put zt0 and eth0 in:

# Announce all global addresses of interfaces into the network
REDISTRIBUTE_IFACES="zt0,eth0"

`Interface    Status    Overloaded    Metric Override    ifIndex    Addresses

eth0         Up                                         45         10.10.99.164
                                                                   fe80::216:3eff:fe77:22ad
zt0          Up                                         5          10.55.0.10
                                                                   fe80::1c97:fcff:fe54:40fd`

Nov 28 08:35:55 vrf-zerotier openr[21997]: I1128 08:35:55.385100 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:35:58 vrf-zerotier openr[21997]: I1128 08:35:58.141927 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:00 vrf-zerotier openr[21997]: I1128 08:36:00.941509 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:03 vrf-zerotier openr[21997]: I1128 08:36:03.548934 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:05 vrf-zerotier openr[21997]: I1128 08:36:05.378876 21997 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 28 08:36:05 vrf-zerotier openr[21997]: I1128 08:36:05.379459 21997 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 28 08:36:05 vrf-zerotier openr[21997]: I1128 08:36:05.387539 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:06 vrf-zerotier openr[21997]: I1128 08:36:06.400626 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:09 vrf-zerotier openr[21997]: I1128 08:36:09.269106 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:12 vrf-zerotier openr[21997]: I1128 08:36:12.756955 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:15 vrf-zerotier openr[21997]: I1128 08:36:15.389602 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:15 vrf-zerotier openr[21997]: I1128 08:36:15.735888 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:18 vrf-zerotier openr[21997]: I1128 08:36:18.724366 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:20 vrf-zerotier openr[21997]: I1128 08:36:20.114837 22032 LinkMonitor.cpp:446] LinkMonitor: processing LinkMonitor command
Nov 28 08:36:20 vrf-zerotier openr[21997]: I1128 08:36:20.115170 22032 LinkMonitor.cpp:1234] Dump Links requested, replying with 2 links
Nov 28 08:36:22 vrf-zerotier openr[21997]: I1128 08:36:22.190623 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:24 vrf-zerotier openr[21997]: I1128 08:36:24.861799 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:25 vrf-zerotier openr[21997]: I1128 08:36:25.386286 21997 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 28 08:36:25 vrf-zerotier openr[21997]: I1128 08:36:25.387398 21997 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 28 08:36:25 vrf-zerotier openr[21997]: I1128 08:36:25.392123 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:27 vrf-zerotier openr[21997]: I1128 08:36:27.490969 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:30 vrf-zerotier openr[21997]: I1128 08:36:30.279563 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:33 vrf-zerotier openr[21997]: I1128 08:36:33.341143 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:35 vrf-zerotier openr[21997]: I1128 08:36:35.397189 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:36 vrf-zerotier openr[21997]: I1128 08:36:36.492441 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:39 vrf-zerotier openr[21997]: I1128 08:36:39.911485 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:42 vrf-zerotier openr[21997]: I1128 08:36:42.569042 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:45 vrf-zerotier openr[21997]: I1128 08:36:45.334821 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:45 vrf-zerotier openr[21997]: I1128 08:36:45.391568 21997 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 28 08:36:45 vrf-zerotier openr[21997]: I1128 08:36:45.392102 21997 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 28 08:36:45 vrf-zerotier openr[21997]: I1128 08:36:45.399906 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:47 vrf-zerotier openr[21997]: I1128 08:36:47.998008 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:50 vrf-zerotier openr[21997]: I1128 08:36:50.925370 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:53 vrf-zerotier openr[21997]: I1128 08:36:53.450906 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:55 vrf-zerotier openr[21997]: I1128 08:36:55.404263 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:36:56 vrf-zerotier openr[21997]: I1128 08:36:56.798099 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:36:59 vrf-zerotier openr[21997]: I1128 08:36:59.701334 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:02 vrf-zerotier openr[21997]: I1128 08:37:02.531883 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:05 vrf-zerotier openr[21997]: I1128 08:37:05.314360 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:05 vrf-zerotier openr[21997]: I1128 08:37:05.392982 21997 NetlinkSystemHandler.cpp:178] Re-syncing Netlink DB
Nov 28 08:37:05 vrf-zerotier openr[21997]: I1128 08:37:05.393541 21997 NetlinkSystemHandler.cpp:180] Completed re-syncing Netlink DB from Netlink Subscriber
Nov 28 08:37:05 vrf-zerotier openr[21997]: I1128 08:37:05.407763 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful
Nov 28 08:37:08 vrf-zerotier openr[21997]: I1128 08:37:08.071864 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:11 vrf-zerotier openr[21997]: I1128 08:37:11.441593 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:13 vrf-zerotier openr[21997]: I1128 08:37:13.403115 22030 ThreadManager.tcc:374] ThreadManager::add called with numa == true, but not a NumaThreadManager
Nov 28 08:37:15 vrf-zerotier openr[21997]: I1128 08:37:15.025465 22020 Spark.cpp:990] Neighbor hub2-proxhub2-prox.i.wizznet.co.uk on iface zt0 from iface zt0 has not heard from us yet
Nov 28 08:37:15 vrf-zerotier openr[21997]: I1128 08:37:15.412070 22032 LinkMonitor.cpp:472] InterfaceDb Sync is successful

Could this be related to the local firewall on the other side?

I do have iptables running, but I thought I had opened the correct ports: 60000:60100 TCP.

@bodleytunes
Author

Yes, it was the firewall. It works when I turn it off.

Can you confirm whether these are the correct ports?

image

@saifhhasan
Contributor

You also need to open the firewall for UDP port 6666, on which the Spark module performs neighbor discovery. Can you give it a try?

You can ignore the Spark error. It was happening because:

  • Spark receives all multicast packets on port 6666.
  • Spark doesn't know about interface zt0 (with ifIndex=5). Once you set IFACE_PREFIXES=eth,zt, Spark will learn about zt0.

Note that REDISTRIBUTE_IFACES is for announcing interface addresses into the network.
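
To illustrate why `IFACE_PREFIXES="eth0"` leaves zt0 out of neighbor discovery, here is a minimal sketch of the prefix rule as the config comments describe it ("all interfaces whose names start with these"). The function name `discovery_interfaces` is hypothetical, and this is not Open/R's actual matching code, only a model of the documented behaviour.

```python
from typing import Iterable, List


def discovery_interfaces(names: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Return the interface names that start with any configured prefix."""
    wanted = tuple(p for p in prefixes if p)  # drop empty entries
    return [name for name in names if name.startswith(wanted)]


# Interface names taken from the `ip addr` output in this thread.
interfaces = ["lo", "gre0", "gretap0", "ip_vti0", "zt0", "eth0"]

print(discovery_interfaces(interfaces, "eth0".split(",")))    # ['eth0']
print(discovery_interfaces(interfaces, "eth,zt".split(",")))  # ['zt0', 'eth0']
```

With the original setting only eth0 matches, so packets arriving on zt0 (ifIndex 5) come from an interface Spark was never told about.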

Closing the task as the error is resolved. If you have more comments or questions about using OpenR (rather than code-related issues), you can discuss them in the Facebook Group: https://www.facebook.com/groups/openr/

facebook-github-bot pushed a commit that referenced this issue Feb 28, 2020
Summary:
There is a race condition when KvStoreWrapper and the ZMQ background threads are
destroyed. This diff is an attempt to explicitly destroy KvStoreWrapper in TearDown.

Also use unique_ptr instead of shared_ptr.

```
WARNING: ThreadSanitizer: data race (pid=49594)
  Write of size 8 at 0x7ba000000220 by thread T1:
    #0 close <null> (link_monitor_test+0xdb1f6a)
    #1 zmq::signaler_t::~signaler_t() /home/engshare/third-party2/zeromq/4.3.1/src/zeromq-4.3.1/src/signaler.cpp:114:20 (link_monitor_test+0xbd1274)

  Previous read of size 8 at 0x7ba000000220 by main thread:
    #0 epoll_ctl <null> (link_monitor_test+0xd9bdef)
    #1 epoll_del /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/epoll.c:485 (link_monitor_test+0xec4253)
    #2 folly::EventBaseEvent::eb_event_del() <null> (link_monitor_test+0xa747ef)
    #3 folly::EventHandler::~EventHandler() <null> (link_monitor_test+0xa74a70)
    #4 std::_Hashtable<int, std::pair<int const, openr::OpenrEventBase::ZmqEventHandler>, std::allocator<std::pair<int const, openr::OpenrEventBase::ZmqEventHandler> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::~_Hashtable() <null> (link_monitor_test+0x219620)
    #5 openr::OpenrEventBase::~OpenrEventBase() <null> (link_monitor_test+0x212d8d)
    #6 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbb7)
    #7 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbe9)
    #8 openr::KvStoreWrapper::~KvStoreWrapper() <null> (link_monitor_test+0x117533)
    #9 std::_Sp_counted_ptr_inplace<openr::KvStoreWrapper, std::allocator<openr::KvStoreWrapper>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() <null> (link_monitor_test+0x116fbd)
    #10 LinkMonitorTestFixture::~LinkMonitorTestFixture() <null> (link_monitor_test+0xf8285)
    #11 LinkMonitorTestFixture_BasicOperation_Test::~LinkMonitorTestFixture_BasicOperation_Test() <null> (link_monitor_test+0xf7ee9)
    #12 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (link_monitor_test+0xc4b2ae)
    #13 __libc_start_main /home/engshare/third-party2/glibc/2.26/src/glibc-2.26/csu/libc-start.c:308:16 (libc.so.6+0x211a5)
```

Reviewed By: yi-xian

Differential Revision: D20148667

fbshipit-source-id: 71634c08df8ebbc98a6b8c4aa3329166834453f2
facebook-github-bot pushed a commit that referenced this issue Oct 12, 2020
Summary:
As titled. Several different approaches have been added to address the original issue of a crash when the eventbase is destructed. See P135204939

```
(gdb) bt
#0  0x000000000187ea8e in re2::RE2::Set::Match (this=0x2, text=..., v=0x7fffd8c0d500, error_info=0x0) at re2/set.cc:110
#1  0x000000000072d304 in openr::KeyPrefix::keyMatch (this=<optimized out>, key=...) at openr/common/Util.cpp:50
#2  0x00000000006d3649 in openr::KvStoreFilters::keyMatch (this=0x7fb22c613660, key=..., value=...) at openr/kvstore/KvStore.cpp:69
#3  0x00000000006d60e4 in openr::KvStore::mergeKeyValues (kvStore=..., keyVals=..., filters=...) at openr/kvstore/KvStore.cpp:246
#4  0x00000000006e3813 in openr::KvStoreDb::mergePublication (this=0x7fb232fbad28, rcvdPublication=..., senderId=...)
    at openr/kvstore/KvStore.cpp:2835
#5  0x00000000006e32d7 in openr::KvStoreDb::processThriftSuccess (this=0x7fb232fbad28, peerName=..., pub=..., timeDelta=...)
    at openr/kvstore/KvStore.cpp:1395
#6  0x000000000070487b in openr::KvStoreDb::requestThriftPeerSync()::$_19::operator()(openr::thrift::Publication&&) const (
    this=<optimized out>, pub=...) at openr/kvstore/KvStore.cpp:1338
#7  folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}::operator()() const (this=<optimized out>)
    at folly/futures/Future-inl.h:99
#8  folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}) (fn=...) at folly/futures/Future-inl.h:91
#9  folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&) (t=..., f=...) at folly/futures/Future-inl.h:109
#10 folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, t=...)
    at folly/futures/Future-inl.h:1033
#11 folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}>::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<openr::thrift::Publication> >(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, args=..., args=...)
    at folly/futures/Future-inl.h:145
...
#19 0x00000000015d00ee in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7fb22c63b1f0) at folly/Function.h:416
#20 folly::EventBase::FunctionLoopCallback::runLoopCallback (this=0x7fb22c63b1c0) at folly/io/async/EventBase.h:188
#21 folly::EventBase::runLoopCallbacks (this=<optimized out>) at folly/io/async/EventBase.cpp:703
#22 folly::EventBase::loopBody (this=0x7fb23ecb9410, flags=1, ignoreKeepAlive=false) at folly/io/async/EventBase.cpp:402
#23 0x00000000015cdc60 in folly::EventBase::loopOnce (this=0x7fb23ecb9410, flags=0) at folly/io/async/EventBase.cpp:330
#24 folly::EventBase::~EventBase (this=0x7fb23ecb9410, vtt=<optimized out>) at folly/io/async/EventBase.cpp:211
#25 0x00000000006f284e in openr::KvStore::~KvStore (this=0x7fb23ecb9400) at openr/kvstore/KvStore.h:532
```

To fix this, we introduced a map to hold every individual future from the thrift client. However, this occasionally makes KvStore destruction hang while waiting for all futures to be fulfilled.

We should NOT track every individual future; that is NOT necessary at all.

The crash trace clearly shows that we are doing `mergePublications()` when invoking `processThriftPublication()` before checking whether the peer is valid.

Fix:
Skip the rest of the logic in the callback `processThriftPublication()` if peerName is NOT valid.
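
The guard described in the fix can be modelled in a few lines. This is a toy Python sketch under stated assumptions, not the Open/R C++ code: the class `PeerStore` and its methods are invented for illustration, and the only point is the early return when the peer is no longer known.

```python
from typing import Dict


class PeerStore:
    """Toy model of a key-value store that syncs with named peers."""

    def __init__(self) -> None:
        self.peers: Dict[str, dict] = {}
        self.data: Dict[str, str] = {}

    def add_peer(self, name: str) -> None:
        self.peers[name] = {"synced": False}

    def stop(self) -> None:
        # Teardown drops all peer state; late callbacks may still arrive.
        self.peers.clear()

    def on_sync_success(self, peer_name: str, publication: Dict[str, str]) -> bool:
        """Callback for a finished sync. Returns True if it was applied."""
        peer = self.peers.get(peer_name)
        if peer is None:
            # Peer is gone (or we are shutting down): skip the rest.
            return False
        self.data.update(publication)
        peer["synced"] = True
        return True


store = PeerStore()
store.add_peer("node-a")
print(store.on_sync_success("node-a", {"k1": "v1"}))  # True: applied
store.stop()
print(store.on_sync_success("node-a", {"k2": "v2"}))  # False: ignored
```

Checking validity first keeps a late callback from touching state that teardown has already released, without having to track each outstanding request.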

Reviewed By: saifhhasan

Differential Revision: D24262710

fbshipit-source-id: fa69aaa5c6e43cfc861de7431b9c1e26195684a0
facebook-github-bot pushed a commit that referenced this issue Jun 23, 2021
Summary: Fix all remaining pyre #9 errors in the codebase.

Reviewed By: TangoRoxy

Differential Revision: D29317017

fbshipit-source-id: b630ff9a87e7daa9980d0a7fbfaf31a563b6c522
facebook-github-bot pushed a commit that referenced this issue Jul 26, 2023
…_ destroyed

Summary:
# background
Open/R has been haunted by unclean-exit issues carried over multiple oncall rotations, especially the notorious one in `KvStore`, as in the following gdb trace from T158806075:
```
[xiangxu1121@devvm1867.nao0 ~/local/fbsource (470cb762a)]$ fboss-dbg-helper gdb fsw008.p062.f01.rva2
0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58
Enter the file's number: 0
0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dogpile.unknown_tw_task.servicerouter.1583324.QUEUE_LAG.230715-025056 - Sat Jul 15 02:50:56 2023
1) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dynocat - Fri Jul 21 15:05:59 2023
2) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr - Wed Jul 19 09:07:29 2023
Enter the file's number: 2
INFO:root:Received output from file command:
INFO:root:/home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/openr --v=1 --vmodule=BgpSerializer*=1,FiberBgp*=2 --logging=DBG1;def', real uid: 36662, effective uid: 36662, real gid: 36337, effective gid: 36337, execfn: '/usr/sbin/openr', platform: 'x86_64'
(gdb) bt
#0  0x00000000035aa5b4 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/hashtable_policy.h:431
#1  0x00000000035baf56 in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::processThriftFailure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, folly::basic_fbstring<char, std::char_traits<char>, std::allocator<char>, folly::fbstring_core<char> > const&, std::chrono::duration<long, std::ratio<1l, 1000l> >) ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unordered_map.h:869
#2  0x00000000035c4fac in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::requestThriftPeerSync()::{lambda(folly::exception_wrapper const&)#1}::operator()(folly::exception_wrapper const&) const ()
    at fbcode/openr/kvstore/KvStore-inl.h:1753
#3  0x00000000035c4d0c in _ZZNO5folly6FutureINS_4UnitEE13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINS4_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKNS_17exception_wrapperEE_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISD_EEEE5valueES2_E4typeEOSK_NS_7futures6detail18InlineContinuationEENUlONS_8Executor9KeepAliveISU_EEONS_3TryIS1_EEE_clESX_S10_ () at fbcode/folly/futures/Future-inl.h:137
#4  0x00000000035c4bea in _ZN5folly6detail8function14FunctionTraitsIFvRNS_7futures6detail8CoreBaseEONS_8Executor9KeepAliveIS7_EEPNS_17exception_wrapperEEE7callBigIZNS4_4CoreINS_4UnitEE11setCallbackIZNOS_6FutureISH_E13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINSN_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKSB_E_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISB_EEEE5valueESL_E4typeEOS12_NS4_18InlineContinuationEEUlSA_ONS_3TryISH_EEE_EEvS18_OSt10shared_ptrINS_14RequestContextEES19_EUlS6_SA_SC_E_EEvS6_SA_SC_RNS1_4DataE () at fbcode/folly/futures/detail/Core.h:619
warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/futures/detail/__core__/__objects__/Core.cpp.o(0xb8563f2cc6214b1a) referenced by CU at offset 0x10ac73 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo]
#5  0x0000000002408dc9 in folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0::operator()(folly::Executor::KeepAlive<folly::Executor>&&) () at fbcode/folly/Function.h:375
warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/io/async/__async_base__/__objects__/EventBase.cpp.o(0x7990ef623e52da16) referenced by CU at offset 0x1029d6 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo]
#6  0x000000000266571d in folly::EventBase::loopMain(int, bool) () at fbcode/folly/Function.h:375
#7  0x0000000002d20cc3 in folly::EventBase::loopOnce(int) () at fbcode/folly/io/async/EventBase.cpp:345
#8  0x0000000002542c12 in folly::EventBase::~EventBase() () at fbcode/folly/io/async/EventBase.cpp:195
#9  0x00000000035a420e in openr::KvStore<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::~KvStore() ()
    at fbcode/openr/kvstore/KvStore.h:629
#10 0x0000000003585a9e in main ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unique_ptr.h:85
```
# RCA
At a high level, the issue happens in the following order:
- Step1: `KvStoreDb::stop()` destructs the `thriftPeers_` object.
- Step2: `KvStore::stop()` invokes `OpenrEventBase::stop()`, which destructs `folly::EventBase`, which in turn calls `loopOnce` to clean up callbacks injected into the eventbase.
- Step3: In the crash scenario, more than one callback is invoked (e.g. `processThriftSuccess` or `processThriftFailure`).
- Step4: Sample failures are:
   - `processThriftSuccess` P576623566
   - `processThriftFailure` P592005831
We can see that callbacks are being invoked from:
   - `requestThriftPeerSync`
   - `keepAlive`

# Fix
The fix should be straightforward: cancel the scheduled callbacks on the `folly::AsyncTimeout` side via a `reset()` or `cancelTimeout()` call.

NOTE: for the keepAlive `AsyncTimeout`, we can consider removing it by leveraging the socket keepalive option, which would simplify this logic.
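
The ordering problem and the fix can be sketched outside of folly. The following is an analogy in Python's `asyncio`, not the actual change: `PeerSyncer` and its members are invented names, and `loop.call_later()` with `TimerHandle.cancel()` stands in for `folly::AsyncTimeout` and `cancelTimeout()`. The idea is to cancel pending timers before dropping the state their callbacks would touch.

```python
import asyncio
from typing import Dict, List


class PeerSyncer:
    """Toy model: one pending timer per peer, cancelled on stop()."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        self.peers: Dict[str, int] = {}
        self.timers: List[asyncio.TimerHandle] = []
        self.late_callbacks = 0

    def add_peer(self, name: str, delay_s: float) -> None:
        self.peers[name] = 0
        self.timers.append(self.loop.call_later(delay_s, self._on_timer, name))

    def _on_timer(self, name: str) -> None:
        if name not in self.peers:
            # In the C++ case this would touch already-destroyed state.
            self.late_callbacks += 1
            return
        self.peers[name] += 1

    def stop(self) -> None:
        # Cancel pending timers first, then drop the state they refer to.
        for timer in self.timers:
            timer.cancel()
        self.timers.clear()
        self.peers.clear()


async def demo() -> int:
    syncer = PeerSyncer(asyncio.get_running_loop())
    syncer.add_peer("node-a", 0.01)
    syncer.stop()
    await asyncio.sleep(0.05)  # long enough for an uncancelled timer to fire
    return syncer.late_callbacks


if __name__ == "__main__":
    print(asyncio.run(demo()))  # 0: no callback ran after stop()
```

Without the `timer.cancel()` loop in `stop()`, the timer would still fire after the peers were cleared and `late_callbacks` would be 1.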

Reviewed By: TangoRoxy

Differential Revision: D47691545

fbshipit-source-id: 67108399a7ce9f96d08682d45d03799e6ef608fd