Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unhandled exception when plugin timeout happens during logging a collected file #2722

Closed
pmoravec opened this issue Oct 8, 2021 · 4 comments · Fixed by #2746
Closed

unhandled exception when plugin timeout happens during logging a collected file #2722

pmoravec opened this issue Oct 8, 2021 · 4 comments · Fixed by #2746

Comments

@pmoravec
Copy link
Contributor

pmoravec commented Oct 8, 2021

Reproducer:

sos report -o selinux --batch --build -k selinux.timeout=1 -vv
..
[plugin:selinux] copying path '/var/lib/selinux/targeted/active/modules/100/webalizer/hll' to archive:'/var/lib/selinux/targeted/active/modules/100/webalizer/hll'

 Plugin selinux timed out

[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/postgresql/hll'

[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/uuidd/hll'
..
[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/sudo/hll'
Your sosreport build tree has been generated in:
	/var/tmp/sosreport-pmoravec-rhel8-2021-10-08-uyaoush


[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/snort/cil'
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
ValueError: I/O operation on closed file.
Call stack:
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 69, in _worker
    work_item.run()
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/sos-main/sos/report/__init__.py", line 1083, in collect_plugin
    plug.collect()
  File "/root/sos-main/sos/report/plugins/__init__.py", line 2667, in collect
    self._collect_copy_specs()
  File "/root/sos-main/sos/report/plugins/__init__.py", line 2631, in _collect_copy_specs
    self._log_info("collecting path '%s'" % path)
  File "/root/sos-main/sos/report/plugins/__init__.py", line 706, in _log_info
    self.soslog.info(self._format_msg(msg))
Message: "[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/ntp/cil'"
Arguments: ()
[plugin:selinux] collecting path '/var/lib/selinux/targeted/active/modules/100/ntp/cil'

The above appears many times, for _collect_copy_specs as well as for _collect_cmds or for "collected plugin '%s' in %s" % fields log.

The cause is that https://github.com/sosreport/sos/blob/main/sos/report/plugins/__init__.py#L706 attempts to log verobse messages (-vv is crucial for this reproducer) at a time when the plugin is timeouted already and the temporary sos.log file is moved to its final destination - while (timeouted) plugin still attempts to log to the temp file.

@pmoravec
Copy link
Contributor Author

pmoravec commented Oct 8, 2021

selinux plugin is not the required plugin here; simply chose a plugin that collects files longer than its timeout is (and enable verbose logs).

@TurboTurtle
Copy link
Member

A race condition, certainly, but is this at all reproduce-able with timeouts longer than 1 second? Locally, for 1 second timeouts I see the same issue, but for anything longer than 1 second I am unable to reproduce this for any plugin - our normal timeout handling functions as expected.

@pmoravec
Copy link
Contributor Author

pmoravec commented Oct 8, 2021

It started to occur on systems with a high number of CPUs after applying #2665 - processor plugin timeouts after the default 300s.

@pmoravec
Copy link
Contributor Author

pmoravec commented Nov 3, 2021

This is more severe: when calling without verbose mode, the process segfaults after an attempt to write to closed/removed file. Reproducer:

python3 bin/sos report -o kernel,selinux -k selinux.timeout=1 --batch --build

pmoravec added a commit to pmoravec/sos that referenced this issue Nov 3, 2021
Wait for shutting down threads of timeouted plugins, to prevent
them in writing to moved auxiliary files like sos_logs/sos.log

Resolves: sosreport#2722
Closes: sosreport#2746

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
TurboTurtle pushed a commit that referenced this issue Nov 3, 2021
Wait for shutting down threads of timeouted plugins, to prevent
them in writing to moved auxiliary files like sos_logs/sos.log

Resolves: #2722
Closes: #2746

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants