Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Crash during statedump parsing #65

Open
jdesfossez opened this issue Jul 7, 2016 · 3 comments
Open

Crash during statedump parsing #65

jdesfossez opened this issue Jul 7, 2016 · 3 comments
Labels

Comments

@jdesfossez
Copy link
Contributor

jdesfossez commented Jul 7, 2016

$ ~/projects/src/lttng/lttng-analyses/lttng-iousagetop --skip-validation --tid 2873 --debug lttng-startup-1551/
Traceback (most recent call last):      
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 73, in _run_step
    fn()
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 341, in _run_analysis
    self._automaton.process_event(event)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 75, in process_event
    sp.process_event(ev)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/sp.py", line 33, in process_event
    self._cbs[name](ev)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/statedump.py", line 102, in _process_lttng_statedump_file_descriptor
    cpu_id=event['cpu_id'])
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 56, in send_notification_cb
    cb(**kwargs)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/core/io.py", line 314, in _process_update_fd
    fd_list = self.tids[tid].fds[fd]
KeyError: 1662
Error: Cannot run analysis: 1662

The error at the end is wrong, it has nothing to do with an analysis, 1662 is a FD.
It looks like we are trying to update the filename of an FD that is not in the analysis state.

We cannot have access to the trace that triggers this problem.

@milianw
Copy link
Contributor

milianw commented Jul 11, 2016

I use this patch locally to make the scripts work for me:

diff --git a/lttnganalyses/core/io.py b/lttnganalyses/core/io.py
index b33a41d..42c72a5 100644
--- a/lttnganalyses/core/io.py
+++ b/lttnganalyses/core/io.py
@@ -311,8 +311,11 @@ class IoAnalysis(Analysis):
         fd = kwargs['fd']

         new_filename = parent_proc.fds[fd].filename
-        fd_list = self.tids[tid].fds[fd]
-        fd_list[-1].filename = new_filename
+        try:
+            fd_list = self.tids[tid].fds[fd]
+            fd_list[-1].filename = new_filename
+        except:
+            pass


 class DiskStats():

@abusque
Copy link
Contributor

abusque commented Jul 24, 2016

So the issue appears to stem from the way the --tid filter list is handled. The automaton (the state system) is unaware of the list, and will therefore keep track of all file descriptors for all TIDs, regardless of what the analysis filters. That's why it would still send a notification to update the filename tied to this FD for this process, despite it being filtered out by the analysis.

The solution is to perform a _filter_process check in _process_update_fd. I'll submit a patch with this change.

However, the error message generated is very unhelpful and quite confusing. That's because we only print the name of the step (in this case, "run analysis") during which the exception occurred, and then the exception's message, which, in the case of a KeyError like here, is only the value of the key.

I'm not sure how to go about solving this. There are definitely instances where exception messages will be helpful to the user (mostly ones we generate ourselves), so we need to print them, but here it's more confusing than anything else.

Obviously exceptions that we don't raise intentionally shouldn't happen ever, ideally, but that's pretty much unavoidable.

Thoughts?

@jdesfossez
Copy link
Contributor Author

Please wait before working around here, we have a big patchset coming in that changes a lot of code in the analysis to handle multiple concurrent periods (branch overlap_periods in my personal repo if you want to have a look).

The filter by tid/cpu has been removed from the I/O analysis because it only make sense to filter in the CLI, we need all the data in the analysis. As soon as the patchset is merged we'll look if this is still an issue.

Thanks,

Julien

@jdesfossez jdesfossez added the bug label Jul 27, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants