Support scrolling for elasticsearch #696
Conversation
Force-pushed from 2f570b0 to 0102035
What problem is this solving?
@yurishkuro Solving #668.
A few remarks:
IMO it would be great if it was possible to do some kind of paging on the trace view page, e.g. see 10k spans ordered by time and be able to go to the next/previous 10k, or something like that. Fetching everything will create new problems.
@mabn Thanks for the comments. I should have raised this concern earlier; you are totally correct. Instead of the 10K limit, would it help to increase the limit to something manageable and possibly use search_after? I am not familiar with the UI or with how to do paging from it. I'll need to check more to see how everything will fit together.
Building a sensible DAG without knowing all the spans seems pretty hard. A trace with 1MM spans is an interesting theoretical use case, but I doubt any existing tracing system would be able to deal with those today. On the other hand, traces with >10k spans are fairly common, so it would be great to solve this for ES (not a problem in Cassandra, btw). As for super-large traces, I still think the solution is to try to load the complete trace in memory, but employ some form of compression. A lot of span data is repeated, like operation names, tags, etc. Also, in order to construct a DAG, most of this data is not needed and the spans can be compressed to just the id/parent + timing + name. Anyway, this is a problem for another day (after I retire and won't care). If we ignore the possibility of not loading the complete trace, is search_after a reasonable solution? From the help page it seems like it will retrieve just as much data as needed when used to retrieve spans for a given trace ID.
I'm only concerned about the search page: if such a 1MM-span trace is matched by the search criteria (and if it's still happening, then most likely some of its spans will match), then I'd like to show something rather than explode on memory or wait 30s for the JSON to download. And it happens in practice :( due to bugs and poor design.
@mabn if traces with more than 10K spans work in Cassandra, won't this be a problem in Cassandra too? I think it really becomes a practical limitation on the number of spans we can fetch in the UI.
I haven't run Jaeger with Cassandra in a while, but it might be a "problem" there as well. So it might make sense to fetch everything from ES and open a separate issue to limit it on the search results page?
Yes, I can modify this PR to use search_after and we can have a separate issue to support pagination in the UI.
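For reference, a minimal, hypothetical sketch of the search_after approach being discussed, assuming the gopkg.in/olivere/elastic.v5 client the ES reader is already built on; the index name, field names, page size, and function name are illustrative, not the PR's actual code:

// Hypothetical sketch of paging past the 10K window with search_after.
package main

import (
	"context"

	elastic "gopkg.in/olivere/elastic.v5"
)

const pageSize = 10000 // matches the default index.max_result_window in ES

// fetchAllSpanIDs pages through every span document of one trace, using
// search_after instead of from/size so it can go beyond the 10K window.
func fetchAllSpanIDs(ctx context.Context, client *elastic.Client, index, traceID string) ([]string, error) {
	var ids []string
	var searchAfter []interface{}
	for {
		svc := client.Search(index).
			Query(elastic.NewTermQuery("traceID", traceID)).
			Sort("startTime", true). // search_after requires a deterministic sort
			Size(pageSize)
		if searchAfter != nil {
			svc = svc.SearchAfter(searchAfter...)
		}
		result, err := svc.Do(ctx)
		if err != nil {
			return nil, err
		}
		hits := result.Hits.Hits
		if len(hits) == 0 {
			break
		}
		for _, hit := range hits {
			ids = append(ids, hit.Id)
		}
		// Resume the next page after the sort values of the last hit.
		searchAfter = hits[len(hits)-1].Sort
		if len(hits) < pageSize {
			break // short page: nothing left to fetch
		}
	}
	return ids, nil
}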
Force-pushed from ceca6a6 to 037fd07
Force-pushed from 83aef4d to dcc0563
Made search requests in reader.go simpler; fixed unit tests. Signed-off-by: sramakr <sramakr@gmail.com>
I think we don't use jaeger-query in perf tests. Tests read ES directly.
@pavolloffay we don't use jaeger-query now, but I've been working on this. I hope to have some information tomorrow.
@pavolloffay @jpkrohling I wrote a test where I created 100,000 traces and then retrieved them multiple times in chunks of 20,000. When run (just on my laptop with nothing else running) against Jaeger master, the average time to retrieve 20,000 traces was 5727 milliseconds. When run using this PR, the average dropped to 3909 milliseconds. This looks to be a decent performance improvement. I can run something additional if you think it's necessary.
Would it be possible to run the test a couple more times, just to make sure the second run didn't perform better because of better temporal conditions? Ideally, this would run on a perf server, to avoid desktop-specific issues, but if you have consistent results after a few runs, I'm happy with it :)
@jpkrohling I did run it multiple times and got fairly consistent results. Let me see if I can set up something to run on the CNCF CI Jenkins in the next couple of days.
Any update on this issue?
@ledor473 Are you referring to the issue of "Span graph is not showing anymore"? I didn't get a chance to look into it. Is that a UI issue?
I meant the overall state of this PR. It seems @kevinearls conducted the load testing needed.
OK, cool @ledor473. I am still waiting for a 👍 from the reviewers.
I'm giving my +1, as the code looks sane to me, but I guess we'll know about the perf improvements only once this change is delivered.
if err != nil {
    return nil, err
}
var nextTime uint64
can this be initialized to startTime?
I am not sure, because nextTime is derived from the results of the query on line 232.
Since nextTime is derived from the results of the previous 10K, I don't think we can initialize it.
Doesn't this mean that the first query always searches from 0 (searchAfterTime isn't initialized) even though we're only interested in things after startTime? (I guess since we're looking at a subset of indices, this isn't too bad performance-wise.)
@black-adder I thought about this a bit. I agree, we should initialize nextTime to startTime; I don't see a problem there. I am going to modify this and update the PR.
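For illustration, a rough sketch (not the PR's actual code) of seeding the cursor with startTime and advancing it from the last hit of each page; the function names are hypothetical and the snippet assumes the same time and olivere/elastic imports as the surrounding reader:

// Hypothetical helpers: the search_after cursor starts at the query's startTime
// (in epoch microseconds) instead of 0, and each page advances it using the
// sort value of the page's last hit.
func initialSearchAfter(startTime time.Time) uint64 {
	return uint64(startTime.UnixNano() / int64(time.Microsecond))
}

func advanceSearchAfter(current uint64, hits []*elastic.SearchHit) uint64 {
	if len(hits) == 0 {
		return current
	}
	if sort := hits[len(hits)-1].Sort; len(sort) > 0 {
		if micros, ok := sort[0].(float64); ok { // ES returns numeric sort values as JSON numbers
			return uint64(micros)
		}
	}
	return current
}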
}
// set traceIDs to empty
traceIDs = nil
results, err := s.client.MultiSearch().Add(searchRequests...).Index(indices...).Do(s.ctx)
Is the 10000 limit for the entire query? I.e., if we search for 10 traces and the first trace takes up all 10000 spans, will this return a response for only the first trace?
No, it's per trace. Obviously, there is a downside: if there are 2 traces, and the first trace has only 100 spans while the next has something like 50K spans, the entire query is going to wait to fetch all 50,100 spans before it returns.
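To make the "per trace" point concrete, here is a hypothetical sketch of building one search request per trace ID, each with its own size cap, to pass to the MultiSearch call quoted above; the function and field names are illustrative, not the PR's actual code:

// Hypothetical sketch: each trace ID gets its own request (and therefore its own
// 10K document window) inside the single multi-search call.
func buildSearchRequests(traceIDs []string) []*elastic.SearchRequest {
	requests := make([]*elastic.SearchRequest, 0, len(traceIDs))
	for _, traceID := range traceIDs {
		source := elastic.NewSearchSource().
			Query(elastic.NewTermQuery("traceID", traceID)).
			Size(10000) // per-trace cap, not a cap on the whole multi-search
		requests = append(requests, elastic.NewSearchRequest().Source(source))
	}
	return requests
}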
Force-pushed from 240cdfa to 58c9ba3
Initialize nextTime with startTime, so the query will start from startTime instead of 0. Signed-off-by: sramak001 <sriram_ramakrishnan@cable.comcast.com>
Thanks!
Resolves #668
This commit adds support for search_after when fetching traces from Elasticsearch.
The steps are:
Signed-off-by: Sriram Ramakrishnan sramakr@gmail.com