System D /run/systemd/private #266

Closed
Arlion opened this issue Sep 8, 2016 · 3 comments

Arlion commented Sep 8, 2016

After extensive troubleshooting I am at an impasse, and I hope there is enough information here.

Symptoms:
During startup, gmond attempts to start but fails. I later discovered that the service pauses for about 3 minutes while it waits for a port to open, and then pauses for another 2 minutes.

Once the system has finished booting, starting the service manually succeeds.

Troubleshooting:

  • Looking at the logs
    • Oddly, systemd was hanging in such a way that journalctl produced no output. A regular systemctl call issued at the same time as "systemctl start gmond.service" showed that the service was dead.
    • Modified the systemd unit to include additional debug output:
/lib/systemd/system/gmond.service
[Unit]
Description=Ganglia Monitoring Daemon
After=multi-user.target

[Service]
Type=notify
ExecStart=/usr/sbin/gmond
Environment=SYSTEMD_LOG_LEVEL=debug
Requires=dbus.service  ## added to ensure dbus service was up before gmond started.
[Install]
WantedBy=multi-user.target

This did not produce any additional logs (there were still none).

Finally, I wrote a script to attach strace to the process at startup. The output is here:
http://paste.fedoraproject.org/423402/47325873/

Here are a few excerpts:

10:07:19 connect(3, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, 22) = 0
10:07:19 getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
10:07:19 getsockopt(3, SOL_SOCKET, SO_PEERSEC, 0x7f8947fef810, 0x7ffd40deba50) = -1 ENOPROTOOPT (Protocol not available)
10:07:19 fstat(3, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
10:07:19 recvmsg(3, 0x7ffd40dea8a0, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
10:07:19 ppoll([{fd=3, events=POLLIN}], 1, {24, 999975000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {24, 999930711})
10:07:19 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\1\10\0\0\0\6\0\0\0\17\0\0\0\5\1u\0\3\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
10:07:19 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"\10\1g\0\1v\0\0\1b\0\0\0\0\0\0", 16}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 16
10:07:19 recvmsg(3, 0x7ffd40dea950, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
10:07:19 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1K\0\0\0\7\0\0\0p\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., 179}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 179
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1@\0\0\0\10\0\0\0q\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
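
For anyone reading the excerpts, the 24-byte reads on either side of the pause look like the start of D-Bus messages. Below is a minimal sketch for decoding them, assuming the standard D-Bus wire layout (endianness byte, message type, flags, protocol version, then three 32-bit integers for body length, serial, and header-field array length). `decode_fixed_header` is a hypothetical helper name for illustration only, not part of gmond or systemd. The byte strings are copied from the strace lines above.

```python
import struct

# Message type codes as defined by the D-Bus wire protocol.
MESSAGE_TYPES = {1: "method_call", 2: "method_return", 3: "error", 4: "signal"}


def decode_fixed_header(data: bytes) -> dict:
    """Decode the first 16 bytes of a D-Bus message (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    endian = chr(data[0])
    if endian not in ("l", "B"):
        raise ValueError("not a D-Bus message: bad endianness byte")
    prefix = "<" if endian == "l" else ">"
    body_len, serial, fields_len = struct.unpack_from(prefix + "III", data, 4)
    return {
        "type": MESSAGE_TYPES.get(data[1], "unknown"),
        "flags": data[2],
        "version": data[3],
        "body_len": body_len,
        "serial": serial,
        "fields_len": fields_len,
    }


# Bytes from the 10:07:19 read (before the pause) and the 10:10:01 read (after it).
before_pause = b"l\2\1\1\10\0\0\0\6\0\0\0\17\0\0\0\5\1u\0\3\0\0\0"
after_pause = b"l\4\1\1K\0\0\0\7\0\0\0p\0\0\0\1\1o\0\31\0\0\0"

if __name__ == "__main__":
    print(decode_fixed_header(before_pause))
    print(decode_fixed_header(after_pause))
```

If this reading is right, the last message before the pause is a method return and the message that ends the pause is a signal from /org/freedesktop/systemd1, which would be consistent with the client blocking until systemd reports back on a job. That interpretation is a guess from the headers alone.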

Finally the service continues and then pauses again for another two minutes. The link above contains the complete, unedited logs.
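
To locate the pauses in a long trace without reading it by eye, something like the following could be used. It is a rough sketch that assumes every line starts with an HH:MM:SS timestamp, as in the excerpts here. `find_gaps` and the 60-second threshold are illustrative choices, and the second sample line is abbreviated.

```python
from datetime import datetime


def find_gaps(lines, threshold_seconds=60):
    """Return (previous_time, next_time, seconds) for each pause between
    consecutive timestamped lines that lasts at least threshold_seconds.

    Lines without a leading HH:MM:SS timestamp are skipped.
    Traces that cross midnight are not handled in this sketch.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            current = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue
        if previous is not None:
            seconds = int((current - previous).total_seconds())
            if seconds >= threshold_seconds:
                gaps.append(
                    (previous.strftime("%H:%M:%S"), current.strftime("%H:%M:%S"), seconds)
                )
        previous = current
    return gaps


# Two consecutive lines from the excerpt (the second one abbreviated).
sample = [
    "10:07:19 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])",
    "10:10:01 recvmsg(3, ...) = 24",
]

if __name__ == "__main__":
    for start, end, seconds in find_gaps(sample):
        print(f"{start} -> {end}: {seconds}s")
```

On the two sample lines this reports a single gap of 162 seconds, which matches the first pause of roughly three minutes.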

Server details:
CentOS 7.2
Fully up to date
ganglia.x86_64                   3.7.2-2.el7                         @epel/7    
ganglia-gmond.x86_64             3.7.2-2.el7                         @epel/7    
ganglia-gmond-python.x86_64      3.7.2-2.el7                         @epel/7    
systemd.x86_64                   219-19.el7_2.12                     @updates/7 
systemd-libs.x86_64              219-19.el7_2.12                     @updates/7 
systemd-sysv.x86_64              219-19.el7_2.12                     @updates/7 
dbus.x86_64                      1:1.6.12-14.el7_2                   @updates/7 
dbus-glib.x86_64                 0.100-7.el7                         @anaconda/7
dbus-libs.x86_64                 1:1.6.12-14.el7_2                   @updates/7 
dbus-python.x86_64               1.1.1-9.el7                         @anaconda/7
ls -al /run/systemd/private
srwxrwxrwx 1 root root 0 Sep  7 13:49 /run/systemd/private

Thank you for your time.


vvuksan commented Sep 13, 2016

I am wondering whether

After=multi-user.target

should be changed to

After=network-online.target


Arlion commented Apr 11, 2017

Pull request #282 has been created to address this.


Arlion commented Apr 11, 2017

Pull Request #282 has been merged. Closing issue.

Arlion closed this as completed Apr 11, 2017