more precise structural summarization instead of NLP #105

Closed
mircealungu opened this issue May 3, 2018 · 8 comments

Comments

@mircealungu
Member

mircealungu commented May 3, 2018

@bogdanp05, one idea that we should think about is the following:

  • instead of summarizing stuff with NLP, maybe we can take inspiration from the Python profiler, which has the same task as ours: summarizing multiple stack traces.

the profiler, afaik, wakes up multiple times, observes the stack trace, and then it summarizes everything at the end with a view like the following one:

[image: example profiler summary view]

maybe we could do something similar, since after all, we also have a bunch of stack traces that we want to summarize!
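For reference, below is a minimal sketch of the "wake up, look at the stack, summarize at the end" loop in plain Python. It assumes CPython, because it relies on `sys._current_frames()`, which is an implementation-specific function. `sample_stacks` and `busy` are hypothetical names used only for illustration and are not part of Flask-MonitoringDashboard. The standard library's `cProfile` works differently: it is a deterministic profiler that records every call instead of waking up periodically. The periodic behavior described above is how sampling profilers work.

```python
import sys
import threading
import time
import traceback
from collections import Counter


def sample_stacks(thread_id, samples=20, interval=0.005):
    """Capture the call stack of one thread at regular intervals.

    Returns one tuple of function names per sample, outermost call first.
    """
    stacks = []
    for _ in range(samples):
        # Topmost frame of every live thread, keyed by thread identifier
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            # extract_stack walks outwards from `frame`; result is oldest first
            stacks.append(tuple(f.name for f in traceback.extract_stack(frame)))
        time.sleep(interval)
    return stacks


def busy(stop):
    """Hypothetical stand-in for a slow request handler."""
    while not stop.is_set():
        sum(range(1000))


stop = threading.Event()
worker = threading.Thread(target=busy, args=(stop,))
worker.start()
stacks = sample_stacks(worker.ident)
stop.set()
worker.join()

# In how many samples each function was on the stack (counted once per sample)
counts = Counter(name for stack in stacks for name in set(stack))
print(counts.most_common(3))
```

Each sample is simply a stack trace, so the summarizing step is the same problem as summarizing outlier stack traces.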

sure, the profiler estimates the time spent in a given method, whereas we have to count how many times an outlier was caught in that method while being called from the previous one, but it should be a similar thing.
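A minimal sketch of the counting described here follows. It assumes each stack is already a sequence of function names with the outermost call first, and it counts how many stacks contain each (caller, callee) pair. `count_call_edges` and the sample stacks are hypothetical.

```python
from collections import Counter


def count_call_edges(stacks):
    """Count how many stacks contain each (caller, callee) pair.

    Each stack is a sequence of function names, outermost call first;
    a pair is counted at most once per stack.
    """
    edges = Counter()
    for stack in stacks:
        edges.update(set(zip(stack, stack[1:])))
    return edges


# Two hypothetical outlier stacks
stacks = [("a", "b", "c", "d"), ("a", "b", "c", "e")]
for (caller, callee), n in sorted(count_call_edges(stacks).items()):
    print(f"{caller} -> {callee}: {n}")
# a -> b: 2
# b -> c: 2
# c -> d: 1
# c -> e: 1
```

Counting a pair at most once per stack keeps the number meaning "outliers seen in this call", even when recursion repeats the same pair.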

think about this, and let's discuss it the next time we meet!

bogdanp05 self-assigned this May 3, 2018
@bogdanp05
Member

Ok, this approach looks like it could provide way more useful results than NLP.
It also means changing the way we collect outliers, right? Because right now we don't have any duration info, just the stack trace.

@mircealungu
Member Author

i'm not sure exactly how these time-sampling profilers work.

i'm imagining something like this:

stack trace 1
a
- b
-- c
--- d

stack trace 2
a
- b
-- c
--- e

could in theory be summarized like this:

a --> 2
- b --> 2
-- c --> 2
--- d --> 1
--- e --> 1

i think that exploring something like this might be one way of summarizing many traces, right?
what do you think?
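A minimal sketch of this summary follows. It treats each trace as a single call chain, so `d` and `e` both end up directly below `c`, and it counts how many traces pass through every path prefix. `prefix_counts` and `render` are hypothetical helpers.

```python
from collections import Counter


def prefix_counts(traces):
    """For every call-path prefix, count how many traces pass through it."""
    counts = Counter()
    for trace in traces:
        for depth in range(1, len(trace) + 1):
            counts[tuple(trace[:depth])] += 1
    return counts


def render(counts):
    """Print-friendly lines; sorting the tuples keeps parents before children."""
    lines = []
    for path in sorted(counts):
        dashes = "-" * (len(path) - 1)
        label = f"{dashes} {path[-1]}" if dashes else path[-1]
        lines.append(f"{label} --> {counts[path]}")
    return lines


traces = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]
print("\n".join(render(prefix_counts(traces))))
# a --> 2
# - b --> 2
# -- c --> 2
# --- d --> 1
# --- e --> 1
```

Keying on the whole path rather than on the function name alone keeps `c` called via `a -> b` separate from `c` called from anywhere else.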

@mircealungu
Member Author

basically, what i think i'm saying is that every stack trace is a graph (directed and acyclic; in fact, a graph degenerated into a list)

thus, we could summarize all of them with a prefix tree where every node holds the count of paths that pass through it. or something like this, it's late now :)
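A minimal prefix-tree (trie) sketch of this idea follows. Each node holds one stack frame and the number of traces that pass through it. Frames can be any hashable value (plain function names here, or richer tuples). The `Node` class is hypothetical and is not the project's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Prefix-tree node: one stack frame and how many traces pass through it."""
    frame: object
    count: int = 0
    children: dict = field(default_factory=dict)

    def insert(self, trace):
        """Add one stack trace (outermost frame first) below this node."""
        node = self
        node.count += 1
        for frame in trace:
            node = node.children.setdefault(frame, Node(frame))
            node.count += 1

    def walk(self, depth=0):
        """Yield (depth, frame, count) for all descendants, depth-first."""
        for child in self.children.values():
            yield depth, child.frame, child.count
            yield from child.walk(depth + 1)


root = Node("<root>")
for trace in (["a", "b", "c", "d"], ["a", "b", "c", "e"]):
    root.insert(trace)
for depth, frame, count in root.walk():
    print("  " * depth + f"{frame} ({count})")
```

An explicit tree, unlike a flat table of counts, can later be merged with other trees or handed to a visualization.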

@bogdanp05
Member

Ok, I think I see what you mean. I will look into this.

@bogdanp05
Member

I worked on this issue and the script I have so far is here.
First, I parsed the stack traces as they appeared in the db and tried to represent them in a unified way (one stack trace element per line).
Then, I represented each stack trace element uniquely as a 4-element tuple:
(file_name, line_number, function, text of line).
What's left now is to actually visualize one list of tuples (i.e., a stack trace) as a tree, and then to merge such trees together.
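The linked script is not reproduced here. Below is a minimal, independent sketch of the parsing step. It assumes the stored text follows CPython's standard `traceback` layout (a `File "...", line N, in func` header, optionally followed by the source line indented by four spaces). `parse_stack_text` and `FRAME_RE` are hypothetical names.

```python
import re

# Header line of one frame in CPython's standard traceback formatting
FRAME_RE = re.compile(r'^  File "(?P<file>.+)", line (?P<line>\d+), in (?P<func>.+)$')


def parse_stack_text(text):
    """Turn a formatted stack trace into
    (file_name, line_number, function, text of line) tuples, outermost first."""
    lines = text.splitlines()
    frames = []
    for i, raw in enumerate(lines):
        match = FRAME_RE.match(raw)
        if match is None:
            continue
        # The source line, when available, follows indented by four spaces
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        code = nxt.strip() if nxt.startswith("    ") else ""
        frames.append((match["file"], int(match["line"]), match["func"], code))
    return frames


sample = """Traceback (most recent call last):
  File "app.py", line 12, in view
    result = compute(x)
  File "lib.py", line 40, in compute
    return slow_query()
"""
for frame in parse_stack_text(sample):
    print(frame)
# ('app.py', 12, 'view', 'result = compute(x)')
# ('lib.py', 40, 'compute', 'return slow_query()')
```

When stacks are captured in-process, text parsing can be skipped entirely. `traceback.extract_stack()` already returns `FrameSummary` objects whose `filename`, `lineno`, `name` and `line` attributes map directly onto this 4-tuple.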
There are still 2 points that should be discussed here:

  1. Across versions, an endpoint's code can change significantly, which means its stack traces will differ as well. Thus, it might be better to visualize the stack traces of outliers per endpoint per version.
  2. A stack trace of an outlier contains all the individual stack traces of the running threads. Visualizing one such stack trace could be done by simply having one tree per thread. But how could we merge these resulting trees for multiple outliers?
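For point 1, a minimal sketch of keeping one tree per endpoint and version follows. It assumes each outlier record exposes an endpoint, a version and one call chain. The record layout and the helper names are hypothetical.

```python
from collections import defaultdict


def new_node():
    return {"count": 0, "children": {}}


def add_trace(root, trace):
    """Insert one call chain (outermost frame first), counting visits per node."""
    node = root
    node["count"] += 1
    for frame in trace:
        node = node["children"].setdefault(frame, new_node())
        node["count"] += 1


def trees_per_version(outliers):
    """Build a separate prefix tree for every (endpoint, version) pair."""
    trees = defaultdict(new_node)
    for outlier in outliers:
        add_trace(trees[(outlier["endpoint"], outlier["version"])], outlier["stack"])
    return dict(trees)


# Hypothetical outlier records
outliers = [
    {"endpoint": "index", "version": "1.0", "stack": ["index", "query_db"]},
    {"endpoint": "index", "version": "1.0", "stack": ["index", "render"]},
    {"endpoint": "index", "version": "1.1", "stack": ["index", "read_cache"]},
]
trees = trees_per_version(outliers)
print(sorted(trees))                     # [('index', '1.0'), ('index', '1.1')]
print(trees[("index", "1.0")]["count"])  # 2
```

Because the grouping key is just a tuple, switching to per-endpoint-only trees later means dropping `version` from the key.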

@mircealungu
Member Author

  1. agreed with visualizing per version
  2. interesting. can't we just create a big graph that contains all the stack traces, w/o worrying about the individual thread where the action happens?
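A minimal sketch of that "one big graph" follows. It assumes each outlier is stored as a mapping from thread id to that thread's call chain. The thread id is ignored and every chain goes into one shared prefix tree. The data layout and `merge_outliers` are hypothetical.

```python
def merge_outliers(outliers):
    """Merge the per-thread stacks of many outliers into one prefix tree.

    Each outlier maps a thread id to that thread's call chain (outermost
    frame first). Thread ids are ignored: every chain goes into the same tree.
    """
    root = {"count": 0, "children": {}}
    for threads in outliers:
        for chain in threads.values():
            node = root
            node["count"] += 1
            for frame in chain:
                node = node["children"].setdefault(frame, {"count": 0, "children": {}})
                node["count"] += 1
    return root


# Two hypothetical outliers, each captured while several threads were running
outliers = [
    {1: ["main", "view", "query"], 2: ["worker", "poll"]},
    {1: ["main", "view", "render"], 3: ["worker", "poll"]},
]
tree = merge_outliers(outliers)
print(tree["count"])                        # 4 (one per thread stack)
print(tree["children"]["worker"]["count"])  # 2
```

Here the counts are per thread stack. Counting each node at most once per outlier would instead give the number of outliers that touched it, which may be closer to what the dashboard wants to show.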

@bogdanp05
Member

  1. We'll probably go for that, at least in the beginning.

@bogdanp05
Member

Implemented in #164

Flask-MonitoringDashboard automation moved this from To do to Done Jun 11, 2018

2 participants