
Trace Explorer - Filter traces by response time, custom context, endpoint, etc. #33

Open · lmansur opened this issue Feb 22, 2018 · 17 comments


lmansur commented Feb 22, 2018

I have a custom context to differentiate Web requests from API requests. It would be very useful if I could filter my traces by that specific context, since most of the time I want to focus on improving web-related issues first.

This would also be very useful when trying to find a performance issue for a specific User, since I also have a context with their information.
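For illustration, a minimal sketch of how this kind of tagging can look, assuming Scout's Python agent and its Context.add(key, value) call (the Ruby agent's ScoutApm::Context.add is the equivalent); the tag_request helper and the field names are hypothetical:

```python
# Sketch: attach custom context to the current request's trace so traces can
# later be filtered by request type or user. Context.add(key, value) is
# Scout's Python agent API, used here for illustration.
import scout_apm.api

def tag_request(request, current_user):
    # Distinguish web traffic from API traffic at trace time.
    request_type = "api" if request.path.startswith("/api/") else "web"
    scout_apm.api.Context.add("request_type", request_type)
    # Record who the request ran for, so a slow trace can be tied to a user.
    scout_apm.api.Context.add("user_id", current_user.id)
```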

itsderek23 (Member) commented Feb 26, 2018

@lmansur - agreed. Many performance issues are directly related to the size of the data being operated on, which is often correlated with the current user in the session, the current account, etc.

We've started working on a POC for this. No ETA yet, but it's a focus for us.

itsderek23 (Member) commented Mar 7, 2018

We're moving along well on this.

We've come across a couple of tools we're excited about for filtering traces, which form a multi-dimensional dataset. These let you filter data in real time, making exploration significantly faster than constructing queries that require server-side execution.
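As a rough, plain-Python illustration of that kind of in-memory filtering (the real UI uses client-side JavaScript tooling; all field names here are made up):

```python
# Minimal sketch of multi-dimensional trace filtering done entirely in memory,
# so narrowing one dimension updates the view without a server round trip.
traces = [
    {"endpoint": "/orders", "duration_ms": 1240, "context": {"request_type": "web", "user_id": 7}},
    {"endpoint": "/api/orders", "duration_ms": 310, "context": {"request_type": "api", "user_id": 7}},
    {"endpoint": "/orders", "duration_ms": 95, "context": {"request_type": "web", "user_id": 12}},
]

def filter_traces(traces, min_duration_ms=0, **context_filters):
    """Keep traces over a duration threshold whose context matches every filter."""
    return [
        t for t in traces
        if t["duration_ms"] >= min_duration_ms
        and all(t["context"].get(k) == v for k, v in context_filters.items())
    ]

slow_web = filter_traces(traces, min_duration_ms=500, request_type="web")
# -> only the 1240ms /orders trace
```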

lmansur (Author) commented Mar 7, 2018

Very interesting tools, thank you for sharing and keeping the issue up to date!

itsderek23 (Member) commented Mar 20, 2018

Here's a video of the current state - this isn't yet exposed in the UI, but you can get a flavor for the interaction:

[video: Trace Explorer interaction demo]

lmansur (Author) commented Mar 20, 2018

Thank you, Derek!

itsderek23 changed the title from "Filter traces by Custom Context" to "Filter traces by response time, custom context, endpoint, etc." on Apr 13, 2018

itsderek23 changed the title from "Filter traces by response time, custom context, endpoint, etc." to "Trace Explorer - Filter traces by response time, custom context, endpoint, etc." on Apr 18, 2018

itsderek23 (Member) commented Apr 18, 2018

@lmansur @qrush @pjuanda @justinstern @nathansamson @jonzlin95 - this is now available under our Tech Preview Program. You'll see a new "Traces" link at the top of the app nav. Click this to access Trace Explorer:

[screenshot: the new "Traces" link in the app nav]

There are numerous rough edges, but we've been getting a lot of value from Trace Explorer internally and figure others will too. We'll work through these issues as usual.


Share your initial feedback via this issue or by emailing support@scoutapp.com!

nathansamson commented Apr 27, 2018

My 2 cents:
A) Context filters: you only see the top 5, and there's no easy way to discover the other values.
B) Context filters: you can't select default context (e.g. node).
C) There is a weird bug when clicking on the "By Response Time" diagram (to select only certain values): it actually "detects" the click around 50% further along the diagram. Once you have the estimated zone, you can drag it to the right location. (Firefox; not tested in other browsers.)
D) Selecting all values except one (or a few) does not seem to be possible. I would expect this to work with ctrl-click on the selected value (which would deselect it).

But it's a great feature, and it has helped us tremendously in spotting a few pages that act up sometimes (but not often enough to impact the averages much).

itsderek23 (Member) commented Apr 30, 2018

> A) Context filters: you only see the top 5, and there's no easy way to discover the other values.

See #53. Makes sense.

> B) Context filters: you can't select default context (e.g. node).

Anything you are missing besides the node name?

> C) There is a weird bug when clicking on the "By Response Time" diagram (to select only certain values): it actually "detects" the click around 50% further along the diagram. Once you have the estimated zone, you can drag it to the right location. (Firefox; not tested in other browsers.)

That is weird. See #52.

> D) Selecting all values except one (or a few) does not seem to be possible. I would expect this to work with ctrl-click on the selected value (which would deselect it).

Makes sense. See #54.

Please follow these specific issues (and 👍 if you're interested) for updates.

nathansamson commented May 3, 2018

About B) I think node is the most important one; others include:

  • URI (but that is mostly handled by the endpoint)
  • Time since startup, but that does not seem too helpful either
  • User IP, which we already have
  • Git revision might be helpful, to see if a given deploy has all the slow requests (see the sketch below)
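For that last point, a hedged sketch of attaching the deploy's revision as context, again using Scout's Python-style Context.add for illustration; the GIT_REVISION environment variable is an assumption about how the deploy exposes it:

```python
# Sketch: tag every trace with the running deploy's git revision so slow
# requests can be correlated with a specific deploy.
import os
import scout_apm.api

def tag_deploy_revision():
    # Assumes the deploy process exports the revision; adjust to your setup.
    revision = os.environ.get("GIT_REVISION", "unknown")
    scout_apm.api.Context.add("git_revision", revision)
```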
itsderek23 (Member) commented May 3, 2018

nathansamson commented May 9, 2018

[screenshot: User ID context filter displayed as an ordered range]

Did this change recently? Today or so?

It is now displaying the User IDs as if they have an ordering. Previously they were displayed like any other field, which made more sense to me.

itsderek23 (Member) commented May 9, 2018

> Did this change recently? Today or so?

Yes.

See #63 for a proposed fix.

nathansamson commented May 9, 2018

Another issue I noticed today.

When I run traces for the past 12h, the trace with the max duration is 27s.
When I run traces for the past 6h, I get a trace with a max duration of 70s (and it happened a few hours ago, not in the last minute).

How is that possible? The only logical explanation I can imagine is the "1,000 selected out of 1,000 traces": since we have way more than 1,000 traces in the 12h view, it only takes the first 1,000, which did not include the later (very slow) ones...

What I would like to be able to do is check, once a day, ALL traces from the past day that took longer than a given threshold so I can optimize them. As it is, they are lost if the first few traces of the day are quick...

I can understand that from a perf POV it is not possible to load 10k traces in the view, but then you should let me pre-filter them (everything with response time greater than X, for example).
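A minimal sketch of that requested behavior (illustrative only, not how the product works today): apply the response-time threshold first, then cap the result.

```python
import random

def select_traces(all_traces, limit=1000, min_duration_ms=None):
    """Pre-filter by response time before sampling down to `limit` traces."""
    candidates = list(all_traces)
    if min_duration_ms is not None:
        # Threshold first: only slow traces compete for the 1,000 slots,
        # so a 70s outlier can no longer be dropped by the sampling step.
        candidates = [t for t in candidates if t["duration_ms"] >= min_duration_ms]
    if len(candidates) <= limit:
        return candidates
    return random.sample(candidates, limit)
```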

PS: If you'd prefer that I open new issues directly instead of commenting on this one, I'm happy to do so.

itsderek23 (Member) commented May 9, 2018

> How is that possible?

We select a 1k sampling of all traces over the time period. Over a 6-hour period, up to 3.6k traces could be collected.

Mind creating a separate issue? I can see two modes to start (sketched below):

  1. Random sample (best for diversity)
  2. Slowest
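For what it's worth, the arithmetic behind the disappearing 70s trace, assuming a uniform random sample and the collection ceiling above (10 per minute, implied by 3.6k over 6 hours):

```python
# A uniform sample of `limit` traces keeps any given trace with probability
# limit / collected, so a slow outlier survives the 6h window more often
# than the 12h one.
limit = 1000
for hours in (6, 12):
    collected = 10 * 60 * hours  # collection ceiling: up to 10 traces/minute
    print(f"{hours}h: up to {collected} traces, P(kept) >= {limit / collected:.0%}")
# 6h: up to 3600 traces, P(kept) >= 28%
# 12h: up to 7200 traces, P(kept) >= 14%
```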
nathansamson commented May 9, 2018

Created issue #64 for this.

Just out of curiosity: why the 3.6k limit? Where is that controlled? Is the client only sending one detailed trace per 10 seconds?

itsderek23 (Member) commented May 9, 2018

> Created issue #64 for this.

Thanks!

> Just out of curiosity: why the 3.6k limit? Where is that controlled? Is the client only sending one detailed trace per 10 seconds?

The agent sends up to 10 traces per minute. We have an algorithm (running both in the agent and on the server) that determines which traces are interesting; it then takes traces across all hosts and selects up to 10 per minute per app.

Collecting detailed traces adds more overhead, so this is sampled.
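The selection shape, roughly (the real "interesting" scoring isn't public, so duration stands in for it here; all names are illustrative):

```python
from collections import defaultdict

PER_MINUTE_BUDGET = 10  # up to 10 detailed traces per minute per app

def select_interesting(candidates):
    """candidates: dicts with "app", "minute", and "duration_ms" keys."""
    buckets = defaultdict(list)
    for trace in candidates:
        buckets[(trace["app"], trace["minute"])].append(trace)
    kept = []
    for bucket in buckets.values():
        # Duration is a stand-in for the real interestingness score.
        bucket.sort(key=lambda t: t["duration_ms"], reverse=True)
        kept.extend(bucket[:PER_MINUTE_BUDGET])
    return kept
```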

itsderek23 (Member) commented Jun 15, 2018

We've added a chart for memory allocations.

[screenshot: memory allocations chart]

This needs #64 so you can view the worst performers.
