
invalid CIDR #23

Closed
mabunixda opened this issue Jan 15, 2019 · 7 comments

Comments

@mabunixda

mabunixda commented Jan 15, 2019

Hi,

I tested stellar on two VMs before moving to bare metal, and everything worked. Now, on bare metal, I get the following error log:

```
$ stellar config --nic enp3s0 > stellar.config
$ stellar -D server --config ./stellar.config
DEBU[0000] seed peers seedPeers="[]"
DEBU[0000] getPeersFromCache: []
DEBU[0000] cluster peers peers="[]" seed_peers="[]"
INFO[0000] registered service id=stellar.services.version.v1
INFO[0000] registered service id=stellar.services.node.v1
INFO[0000] registered service id=stellar.services.health.v1
INFO[0000] registered service id=stellar.services.cluster.v1
INFO[0000] registered service id=stellar.services.datastore.v1
INFO[0000] registered service id=stellar.services.gateway.v1
INFO[0000] registered service id=stellar.services.network.v1
INFO[0000] registered service id=stellar.services.application.v1
INFO[0000] registered service id=stellar.services.nameserver.v1
INFO[0000] registered service id=stellar.services.proxy.v1
INFO[0000] registered service id=stellar.services.events.v1
DEBU[0000] starting server agent
DEBU[0000] starting grpc server addr="172.16.0.6:9000"
DEBU[0000] initializing server
DEBU[0000] network init
DEBU[0000] allocating network subnet for node y
DEBU[0000] service.network allocating subnet
FATA[0000] invalid CIDR address:
```

The config file looks like this:

```json
{
  "NodeID": "y",
  "GRPCAddress": "172.16.0.6:9000",
  "TLSServerCertificate": "",
  "TLSServerKey": "",
  "TLSClientCertificate": "",
  "TLSClientKey": "",
  "TLSInsecureSkipVerify": false,
  "ContainerdAddr": "/run/containerd/containerd.sock",
  "Namespace": "default",
  "DataDir": "/var/lib/stellar",
  "StateDir": "/run/stellar",
  "Bridge": "stellar0",
  "UpstreamDNSAddr": "8.8.8.8:53",
  "ProxyHTTPPort": 80,
  "ProxyHTTPSPort": 443,
  "ProxyTLSEmail": "",
  "GatewayAddress": "172.16.0.6:9001",
  "EventsAddress": "172.16.0.6:4222",
  "EventsClusterAddress": "172.16.0.6:5222",
  "EventsHTTPAddress": "172.16.0.6:4322",
  "CNIBinPaths": [
    "/opt/containerd/bin",
    "/opt/cni/bin"
  ],
  "ConnectionType": "local",
  "ClusterAddress": "172.16.0.6:7946",
  "AdvertiseAddress": "172.16.0.6:7946",
  "Debug": false,
  "Peers": [],
  "Subnet": "172.16.0.0/12",
  "ProxyHealthcheckInterval": "5s"
}
```

I tried changing the subnet and reviewed the code that produces this output, but I cannot find the cause of the failure :-(
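The empty text after `invalid CIDR address:` is itself a clue. This assumes stellar parses the subnet with Go's standard `net.ParseCIDR`. That function appends the rejected input after the colon, so an empty tail suggests the string being parsed was empty, not the configured `172.16.0.0/12`. The following minimal sketch reproduces the message. The `parseSubnet` helper is hypothetical and not part of stellar. It quotes the input so an empty value cannot go unnoticed:

```go
package main

import (
	"fmt"
	"net"
)

// parseSubnet is a hypothetical wrapper around net.ParseCIDR that quotes the
// input in the error, so an empty string shows up as "" instead of nothing.
func parseSubnet(s string) (*net.IPNet, error) {
	_, ipnet, err := net.ParseCIDR(s)
	if err != nil {
		return nil, fmt.Errorf("parsing subnet %q: %w", s, err)
	}
	return ipnet, nil
}

func main() {
	// The value from the config file parses without error.
	if n, err := parseSubnet("172.16.0.0/12"); err == nil {
		fmt.Println("ok:", n)
	}
	// An empty string produces "invalid CIDR address: " with nothing after
	// the colon, matching the FATA line in the log.
	if _, err := parseSubnet(""); err != nil {
		fmt.Println(err)
	}
}
```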

Thanks, Martin

@ehazlett
Owner

Hmm, yeah, everything looks OK in the config. What does `ip a s` show for your network devices?

@mabunixda
Author

```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master br0 state DOWN group default qlen 1000
    link/ether 00:01:2e:78:2b:e2 brd ff:ff:ff:ff:ff:ff
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:01:2e:78:2b:e3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.6/24 brd 172.16.0.255 scope global enp3s0
       valid_lft forever preferred_lft forever
    inet6 fe80::201:2eff:fe78:2be3/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:bc:c6:52:c8 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
```

@ehazlett
Owner

I'm trying to re-create this. In the meantime, I've pushed some more debug logging around the subnet allocation. Can you try the latest master to see if that tells us any more? Thanks!

@mabunixda
Author

The new output is:

```
DEBU[0000] service.network allocating subnet
DEBU[0000] local subnet from datastore subnet="[60 110 105 108 62]"
FATA[0000] error parsing subnet "" (): invalid CIDR address:
```

@ehazlett
Owner

OK, for some reason the subnet is `<nil>` in the db: the bytes `[60 110 105 108 62]` are the ASCII codes for `<nil>`. I'm not sure how it got that value. I'm going to debug this and see if we can add some checks to prevent erroneous routes from being assigned.

@mabunixda
Author

With the latest debug information, I found the root cause: a misconfigured datastore left over from an initial startup where I had used the subnet 192.168.0.0/24, because my local LAN is within 172.16.0.0/16. I removed the local datastore in /var/lib/stellar, and startup then worked with the default setup, but it killed the routing on the box :-D
After switching to 192.168.0.0/16, stellar started, and I was able to use sctl and deploy the sample.

Must the subnet be a /16 range?
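A likely reason the default setup broke routing, inferred from the config and `ip a s` output above rather than confirmed in the thread, is an address collision. The default `Subnet` of `172.16.0.0/12` spans 172.16.0.0 to 172.31.255.255, so it overlaps both `enp3s0` (172.16.0.6/24) and `docker0` (172.17.0.1/16). The following minimal Go sketch checks a cluster subnet against those networks. It relies on the fact that two CIDR blocks either nest or are disjoint, so they overlap exactly when one contains the other's network address. The `overlaps` and `mustCIDR` helpers are illustrative and not part of stellar:

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether two CIDR blocks share addresses. CIDR blocks
// either nest or are disjoint, so checking each network address suffices.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// mustCIDR parses s and returns its network (host bits cleared).
func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	lan := mustCIDR("172.16.0.6/24")    // enp3s0 from `ip a s`
	docker := mustCIDR("172.17.0.1/16") // docker0 from `ip a s`
	for _, s := range []string{"172.16.0.0/12", "192.168.0.0/16"} {
		c := mustCIDR(s)
		fmt.Printf("%s overlaps LAN: %v, docker0: %v\n", s, overlaps(c, lan), overlaps(c, docker))
	}
}
```

With these inputs, `172.16.0.0/12` reports `true` for both networks and `192.168.0.0/16` reports `false` for both, which is consistent with 192.168.0.0/16 working on this host.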

@ehazlett
Owner

Thanks for the update! I was looking at a way to clean up the boltdb. This is a bug :)

First, we need to detect when the Subnet changes in the config. Right now, if there is a subnet (good or bad) in the db, it will be used.
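The change detection described above could look roughly like the Go sketch below. It is a minimal illustration under stated assumptions, not stellar's code. The `store` map stands in for the boltdb datastore, and `subnetKey` and `resolveSubnet` are hypothetical names. The idea is to validate the configured subnet and reuse the stored one only when it parses and matches the config. Otherwise, the stored value is overwritten. A real fix would probably also need to re-allocate the per-node subnets derived from the old value.

```go
package main

import (
	"fmt"
	"net"
)

// store is a hypothetical in-memory stand-in for the node's datastore.
type store map[string]string

const subnetKey = "cluster/subnet" // hypothetical key name

// resolveSubnet returns the subnet to use and whether the stored value was
// replaced. A stored value is reused only if it parses and equals the config.
func resolveSubnet(db store, configured string) (*net.IPNet, bool, error) {
	_, want, err := net.ParseCIDR(configured)
	if err != nil {
		return nil, false, fmt.Errorf("config subnet %q: %w", configured, err)
	}
	if saved, ok := db[subnetKey]; ok {
		if _, have, err := net.ParseCIDR(saved); err == nil && have.String() == want.String() {
			return have, false, nil // valid and unchanged: keep it
		}
	}
	// Missing, unparsable (e.g. "<nil>") or changed: persist the config value.
	db[subnetKey] = want.String()
	return want, true, nil
}

func main() {
	db := store{subnetKey: "<nil>"}           // the bad value from this issue
	n, reset, _ := resolveSubnet(db, "192.168.0.0/16")
	fmt.Println(n, reset)                     // prints: 192.168.0.0/16 true
	n, reset, _ = resolveSubnet(db, "192.168.0.0/16")
	fmt.Println(n, reset)                     // prints: 192.168.0.0/16 false
}
```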

Second, there appears to be an issue with the subnet division. If a /24 is used, it does not calculate a valid route for some reason. I'm going to debug this and add tests for various subnet sizes.

Thanks for the debug! It helps tremendously 👍
