mkfifo: cannot create fifo 'f': File exists #11142

Closed
khauser opened this Issue Jan 25, 2018 · 10 comments


khauser commented Jan 25, 2018

Rancher versions:
rancher/server: 1.6.12
rancher/agent: 1.2.7

Infrastructure Stack versions:
healthcheck: 0.3.3
ipsec: 0.2.1
network-services: v0.6.3
scheduler:
kubernetes (if applicable):

Docker version: (docker version,docker info preferred)
17.09.1-ce
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
RancherOS v1.1.3 (4.9.75)

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
vSphere
Setup details: (single node rancher vs. HA rancher, internal DB vs. external DB)
HA rancher + external DB
Environment Template: (Cattle/Kubernetes/Swarm/Mesos)
Cattle
Steps to Reproduce:
I am having problems setting up a Kafka cluster that already works in another environment with exactly the same input.

We assume to have rancher network issues..

IPSec says:

25.1.2018 11:10:19Refer to router sidekick for logs
25.1.2018 11:10:19mkfifo: cannot create fifo 'f': File exists

network-manager says:

25.1.2018 11:06:01time="2018-01-25T10:06:01Z" level=info msg="hostports: : Applying new rules"
25.1.2018 11:06:01iptables-restore: line 34 failed
25.1.2018 11:06:01hostports: failed to apply rules --- start
25.1.2018 11:06:01
25.1.2018 11:06:01hostports: failed to apply rules --- end
25.1.2018 11:06:01time="2018-01-25T10:06:01Z" level=error msg="hostports:  Failed to apply host rules: exit status 1"
25.1.2018 11:06:01time="2018-01-25T10:06:01Z" level=info msg="arpsync: Network router changed, syncing ARP tables 1/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:06:05time="2018-01-25T10:06:05Z" level=info msg="hostports: : Applying new rules"
25.1.2018 11:06:07time="2018-01-25T10:06:07Z" level=info msg="arpsync: Network router changed, syncing ARP tables 2/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:06:12time="2018-01-25T10:06:12Z" level=info msg="arpsync: Network router changed, syncing ARP tables 3/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:17time="2018-01-25T10:07:17Z" level=info msg="arpsync: Network router changed, syncing ARP tables 4/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:22time="2018-01-25T10:07:22Z" level=info msg="arpsync: Network router changed, syncing ARP tables 5/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:26time="2018-01-25T10:07:26Z" level=info msg="CNI down" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec
25.1.2018 11:07:27time="2018-01-25T10:07:27Z" level=info msg="arpsync: Network router changed, syncing ARP tables 6/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:28time="2018-01-25T10:07:28Z" level=info msg="Setting up resolv.conf for ContainerId [3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491]"
25.1.2018 11:07:28time="2018-01-25T10:07:28Z" level=info msg="CNI up" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec
25.1.2018 11:07:28time="2018-01-25T10:07:28Z" level=info msg="CNI up done" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec result="IP4:{IP:{IP:10.42.125.239 Mask:ffff0000} Gateway:10.42.0.1 Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:10.42.0.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="CNI down" cid=b568a2f55a5274bcde36f07af12dc23fb544978dae44d80bc99d55c8067a2daa networkMode=ipsec
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: Network router changed, syncing ARP tables 7/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:32 Type:1 Flags:0 IP:10.42.234.245 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:1 Type:1 Flags:0 IP:10.42.178.9 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:1 Type:1 Flags:0 IP:10.42.101.158 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:32 Type:1 Flags:0 IP:10.42.248.79 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:1 Type:1 Flags:0 IP:10.42.211.87 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:1 Type:1 Flags:0 IP:10.42.27.137 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=info msg="arpsync: (3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491) wrong ARP entry found={LinkIndex:3 Family:2 State:32 Type:1 Flags:0 IP:10.42.36.5 HardwareAddr:}(expected: 02:fe:f2:93:69:64) for local container, fixing it"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=error msg="arpsync: got error while syncing arp tables for containers=failed to open netns /proc/0/ns/net: failed to Statfs \"/proc/0/ns/net\": no such file or directory"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=error msg="arpsync: while syncing, got error: failed to open netns /proc/0/ns/net: failed to Statfs \"/proc/0/ns/net\": no such file or directory"
25.1.2018 11:07:32time="2018-01-25T10:07:32Z" level=error msg="macsync: i: 5, error syncing MAC addresses: failed to open netns /proc/0/ns/net: failed to Statfs \"/proc/0/ns/net\": no such file or directory"
25.1.2018 11:07:37time="2018-01-25T10:07:37Z" level=info msg="arpsync: Network router changed, syncing ARP tables 8/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:42time="2018-01-25T10:07:42Z" level=info msg="arpsync: Network router changed, syncing ARP tables 9/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:47time="2018-01-25T10:07:47Z" level=info msg="arpsync: Network router changed, syncing ARP tables 10/10 in containers, new MAC: 02:fe:f2:93:69:64"
25.1.2018 11:07:47time="2018-01-25T10:07:47Z" level=info msg="Setting up resolv.conf for ContainerId [64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047]"
25.1.2018 11:07:47time="2018-01-25T10:07:47Z" level=info msg="CNI up" cid=64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047 networkMode=ipsec
25.1.2018 11:07:47time="2018-01-25T10:07:47Z" level=info msg="CNI up done" cid=64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047 networkMode=ipsec result="IP4:{IP:{IP:10.42.96.62 Mask:ffff0000} Gateway:10.42.0.1 Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:10.42.0.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}"
25.1.2018 11:10:17time="2018-01-25T10:10:17Z" level=info msg="CNI down" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec
25.1.2018 11:10:17time="2018-01-25T10:10:17Z" level=info msg="hostports: : Applying new rules"
25.1.2018 11:10:19time="2018-01-25T10:10:19Z" level=info msg="Setting up resolv.conf for ContainerId [3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491]"
25.1.2018 11:10:19time="2018-01-25T10:10:19Z" level=info msg="CNI up" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec
25.1.2018 11:10:19time="2018-01-25T10:10:19Z" level=info msg="CNI up done" cid=3bbcacaaa411857ec2a682a06a4c65df82b8e2ecb71e577a3e0eb61e8f3bd491 networkMode=ipsec result="IP4:{IP:{IP:10.42.125.239 Mask:ffff0000} Gateway:10.42.0.1 Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:10.42.0.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}"
25.1.2018 11:10:19time="2018-01-25T10:10:19Z" level=info msg="hostports: : Applying new rules"
25.1.2018 11:10:50time="2018-01-25T10:10:50Z" level=info msg="CNI down" cid=64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047 networkMode=ipsec
25.1.2018 11:10:52time="2018-01-25T10:10:52Z" level=info msg="Setting up resolv.conf for ContainerId [64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047]"
25.1.2018 11:10:52time="2018-01-25T10:10:52Z" level=info msg="CNI up" cid=64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047 networkMode=ipsec
25.1.2018 11:10:52time="2018-01-25T10:10:52Z" level=info msg="CNI up done" cid=64add4cfa7292b1964dd5c4e3fe588f8c40262661772b787ad6183cc6b47e047 networkMode=ipsec result="IP4:{IP:{IP:10.42.96.62 Mask:ffff0000} Gateway:10.42.0.1 Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:10.42.0.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}"

What can I do about this?

Author

khauser commented Jan 25, 2018

The workaround for now was to delete all user and infrastructure stacks.


wochinge commented Jan 26, 2018

Hey,

I have the same problem. After everything has worked fine for a while, the network totally collapses. For me, restarting all hosts and the Rancher server helps (which probably has the same effect as deleting and re-adding each stack).

EDIT: My problem was caused by running a Rancher agent on the same host as the Rancher master. I simply removed it (explicitly defining the IP should also be fine) and everything runs perfectly.


egor20041 commented Jan 30, 2018

Hi, I have the same problem. Only restarting the hosts helps. Once, restarting only Rancher helped.

UPDATE: The problem was that rancher-server was on the same host as rancher-agent. This is not allowed.


vainkop commented Feb 8, 2018

Same problem here on AWS.
1.6.14 + rds

Member

superseb commented Feb 8, 2018

The message "mkfifo: cannot create fifo 'f': File exists" is harmless and doesn't cause interruption in the network.

You wrote: "We assume to have rancher network issues.." Can you elaborate on this? What exactly is not working: is the overlay network not functioning at all, or only between certain hosts or certain containers? The same goes for the other people experiencing network issues. The Rancher version and infrastructure stack versions in use are also valuable information.
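
For context on the mkfifo message discussed above: mkfifo reports "File exists" whenever the target path is already present, for example when a FIFO is left over from an earlier run. The sketch below reproduces that in a throwaway directory and shows an idempotent guard. The helper name `ensure_fifo` and the temporary directory are assumptions for illustration; this is not the script the ipsec service actually runs, and this thread does not show where that script creates `f`.

```shell
#!/bin/sh
# Sketch: reproduce "File exists" from mkfifo and show an idempotent guard.
# The temp directory stands in for wherever the real script runs (unknown here).

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
fifo="$workdir/f"

mkfifo "$fifo"                      # first creation succeeds

# A second attempt fails because the path already exists.
if mkfifo "$fifo" 2>/dev/null; then
    second_attempt=ok
else
    second_attempt=failed
fi

# Hypothetical helper: create the FIFO only when it is not already there.
ensure_fifo() {
    [ -p "$1" ] || mkfifo "$1"
}

ensure_fifo "$fifo"                 # no error: the path is already a FIFO
ensure_fifo_status=$?

echo "second mkfifo: $second_attempt, ensure_fifo exit status: $ensure_fifo_status"
```

The failed second call leaves the existing FIFO in place and usable, which is consistent with the message being noise rather than a fault.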


Shifter2600 commented Feb 14, 2018

Same problem here on Bare Metal fresh install.
1.6.14

Member

superseb commented Feb 15, 2018

As described earlier, this error does not cause anything to stop functioning. @khauser Can you elaborate on your problem so we can investigate? For the others, please search through the other issues or create one of your own so we can take a look. Nothing reported in this issue can cause problems in the environment.

superseb closed this Feb 15, 2018


CasparChou commented Feb 23, 2018

I found a solution:

  1. Enter the container with bash
  2. rm -rf f
  3. Restart all load balancers
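
The removal step above can be sketched a little more defensively. The helper below deletes `f` only when it really is a FIFO, so an unrelated file or directory with that name is left alone. The function name `remove_stale_fifo` and the demo directory are assumptions for illustration; the thread does not say which container or working directory holds `f`, and it does not confirm that removing it fixes any network problem.

```shell
#!/bin/sh
# Hypothetical helper: remove 'f' from a directory only if it is a FIFO.
remove_stale_fifo() {
    dir=${1:-.}
    if [ -p "$dir/f" ]; then
        rm -f "$dir/f"
        echo "removed stale FIFO $dir/f"
    else
        echo "no FIFO named f in $dir, nothing removed"
    fi
}

# Demo in a throwaway directory (a stand-in for the container's working directory).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkfifo "$demo_dir/f"

first_run=$(remove_stale_fifo "$demo_dir")    # FIFO present, gets removed
second_run=$(remove_stale_fifo "$demo_dir")   # already gone, nothing to do

echo "$first_run"
echo "$second_run"
```

Inside a container, the function would be pasted into the bash session from step 1 and called with the directory that contains `f`.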

jonathascarrijo commented Mar 20, 2018

The same is happening to me, out of nowhere (I had just added some new hosts and scaled up some services). It causes my load balancers to never finish initializing. My production environment is affected. I need a workaround!

Member

superseb commented Mar 20, 2018

As mentioned before:

The message "mkfifo: cannot create fifo 'f': File exists" is harmless and doesn't cause interruption in the network.

If you experience issues, please open a new issue with the versions used and the observed behavior so we can investigate.

rancher locked as resolved and limited conversation to collaborators Mar 20, 2018
