From Joel in IT:
Grigore,
Fslweb is back up and serving the website. Unfortunately, we are not in control of the network update schedule, but we do apologize for the outage. That said, here is some technical information about what happened.
The logs were recording many messages of this form:
Oct 26 03:30:33 fslweb NetworkManager[1795]: <warn> error parsing timestamps file '/var/lib/NetworkManager/timestamps': Too many open files
Oct 26 03:30:33 fslweb NetworkManager[1795]: <warn> error saving timestamp: Failed to create file '/var/lib/NetworkManager/timestamps.F7ZEOX': Too many open files
These messages appeared days, and perhaps months, prior to the outage. This leads me to believe there was an issue already present that the network
Nov 1 05:40:40 fslweb NetworkManager[1795]: <warn> sysctl: failed to open '/proc/sys/net/ipv6/conf/eth0/accept_ra': (24) Too many open files
Nov 1 05:40:40 fslweb NetworkManager[1795]: <error> [1414838440.4364] [nm-device.c:3486] nm_device_update_ip4_address(): couldn't open control socket.
Nov 1 05:40:40 fslweb NetworkManager[1795]: <error> [1414838440.4476] [nm-system.c:771] nm_system_device_is_up_with_iface(): couldn't open control socket.
Nov 1 05:40:40 fslweb NetworkManager[1795]: <info> (eth0): bringing up device.
That is the time of the actual network outage. You can see that it fails to come back up because no file handles are available. It then spams the same line repeatedly, up until I restarted the machine this morning:
Nov 2 03:45:52 fslweb NetworkManager[1795]: <error> [1414921552.22183] [nm-system.c:771] nm_system_device_is_up_with_iface(): couldn't open control socket.
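The recurring messages quoted above can be tallied per day to see how far back the file-handle shortage goes. The following is a minimal sketch, assuming syslog-style lines like the ones quoted here. The function name `count_fd_errors_by_day` and the `sample` list are illustrative only and are not part of any existing tooling on fslweb.

```python
import re
from collections import Counter

# Leading syslog timestamp, e.g. "Oct 26 03:30:33" or "Nov 1 05:40:40".
TIMESTAMP = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s"
)

def count_fd_errors_by_day(lines, needle="Too many open files"):
    """Count log lines containing `needle`, grouped by (month, day)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[(match.group("month"), int(match.group("day")))] += 1
    return counts

# Sample lines copied from the log excerpts in this report.
sample = [
    "Oct 26 03:30:33 fslweb NetworkManager[1795]: <warn> error parsing timestamps file '/var/lib/NetworkManager/timestamps': Too many open files",
    "Oct 26 03:30:33 fslweb NetworkManager[1795]: <warn> error saving timestamp: Failed to create file '/var/lib/NetworkManager/timestamps.F7ZEOX': Too many open files",
    "Nov 1 05:40:40 fslweb NetworkManager[1795]: <warn> sysctl: failed to open '/proc/sys/net/ipv6/conf/eth0/accept_ra': (24) Too many open files",
    "Nov 1 05:40:40 fslweb NetworkManager[1795]: <info> (eth0): bringing up device.",
]

for (month, day), n in count_fd_errors_by_day(sample).items():
    print(f"{month} {day}: {n} matching line(s)")
```

In practice the same function could be fed an open log file instead of the `sample` list. Which log file holds these messages on fslweb is not stated here.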
I checked the maximum number of open file handles: the system will open up to 1,620,366 files concurrently. That's more than a million and a half open files. From the backup stats, it looks like the machine itself has almost 7 million files in just 115 GB of space. This leads me to believe that what kept the machine from coming back up after the networking outage was the open files, not something directly related to the network. I need to run to a meeting, but I'll provide additional information this afternoon.
Joel
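As a follow-up check, it may help to compare the per-process file descriptor limit with the system-wide one. On Linux, error 24 (`EMFILE`, "Too many open files") is normally a per-process error, while system-wide exhaustion is reported as error 23 (`ENFILE`, "Too many open files in system"). The sketch below assumes a Linux host with `/proc` mounted. The helper names `fd_limits` and `open_fd_count` are illustrative, and treating the 1,620,366 figure as the system-wide maximum is an assumption, not something the email confirms.

```python
import os
import resource

def read_int_fields(path):
    """Return the integer fields of a /proc file, or None if unreadable."""
    try:
        with open(path) as f:
            return [int(x) for x in f.read().split()]
    except (OSError, ValueError):
        return None

def fd_limits():
    """Collect the per-process and system-wide file descriptor limits."""
    # Per-process limit: the one that produces errno 24 (EMFILE).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # System-wide ceiling and current allocation (Linux-specific paths).
    file_max = read_int_fields("/proc/sys/fs/file-max")
    file_nr = read_int_fields("/proc/sys/fs/file-nr")
    return {
        "process_soft_limit": soft,
        "process_hard_limit": hard,
        "system_max": file_max[0] if file_max else None,
        "system_allocated": file_nr[0] if file_nr else None,
    }

def open_fd_count(pid="self"):
    """Count descriptors currently open in a process, or None if unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None

if __name__ == "__main__":
    for name, value in fd_limits().items():
        print(f"{name}: {value}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

Passing a daemon's PID to `open_fd_count` (1795 for NetworkManager in the logs above, which requires sufficient privileges) would show whether that one process is near its own limit. If so, that would point to a descriptor leak in that process rather than to the total number of files on disk.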