Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix2308: Add query parameter to Search API to configure spans-per-span-set in response #2411

Closed
wants to merge 8 commits into from

Conversation

mghildiy
Copy link
Contributor

@mghildiy mghildiy commented Apr 30, 2023

Summary of changes:

  • Add query parameter spss to API /api/search in http.go
  • set number of spans during filtering
  • Update http_test.go for new query parameter
  • Update engine_test.go
  • Update instance_search_test.go

Adds a query parameter to Search API to make spans-per-span-set in response configurable:
Fixes #2308

Checklist

  • [http_test.go, engine_test.go,instance_search_test.go ] Tests updated
  • [docs/sources/tempo/api_docs/_index.md] Documentation added
  • [CHANGELOG.md] updated

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good but we need a few tweaks before we can get it in. A few other things

If you have noticed CI is a mess at the moment. We are working on it, but for now if things are passing locally feel free to submit and i will rerun CI.

Thanks for giving this a shot!

@@ -110,7 +110,7 @@ func Handler(r *http.Request) (*tempopb.SearchResponse, *HTTPError) {
var resp *tempopb.SearchResponse

if api.IsTraceQLQuery(searchReq.SearchReq) {
engine := traceql.NewEngine()
engine := traceql.NewEngineWithSpansPerSpanSet(int(searchReq.SearchReq.SpansPerSpanSet))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have NewEngine() take the spans per spanset parameter instead of making a new method. There are 2 places in instance_search that will also need this parameter for it to work.

Copy link
Contributor Author

@mghildiy mghildiy May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see three invocations of NewEngine in instance_search, 2 of which have access to request. This one doesn't have, so I guess I can pass default value of 3 here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a little deeper and I'm wondering why we don't just use the value on the searchReq? We pass it in when we call ExecuteSearch. Can't we just the value on that struct?

Copy link
Contributor Author

@mghildiy mghildiy May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to add spanperspanset field in SearchBlockRequest instead of SearchRequest?

pkg/api/http.go Outdated Show resolved Hide resolved
pkg/tempopb/tempo.proto Show resolved Hide resolved
@mghildiy
Copy link
Contributor Author

mghildiy commented May 3, 2023

I have made the changes as suggested above. But engine_test:TestEngine_Execute still fails because what filter function does is not reflected in the final response, for eactly the same reason as discussed here.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes as suggested above. But engine_test:TestEngine_Execute still fails because what filter function does is not reflected in the final response, for eactly the same reason as discussed #2307.

This should work though b/c every span returned by the filter function is decorated with metadata by the second pass of the fetch layer. I don't think we'll have the same issue here.

pkg/traceql/engine.go Outdated Show resolved Hide resolved
@@ -110,7 +110,7 @@ func Handler(r *http.Request) (*tempopb.SearchResponse, *HTTPError) {
var resp *tempopb.SearchResponse

if api.IsTraceQLQuery(searchReq.SearchReq) {
engine := traceql.NewEngine()
engine := traceql.NewEngineWithSpansPerSpanSet(int(searchReq.SearchReq.SpansPerSpanSet))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a little deeper and I'm wondering why we don't just use the value on the searchReq? We pass it in when we call ExecuteSearch. Can't we just the value on that struct?

pkg/api/http.go Outdated Show resolved Hide resolved
@mghildiy
Copy link
Contributor Author

mghildiy commented May 4, 2023

Now I understand the cause of the test failure. Its down to the way we are using those filtered spans in mocked iterator defined in engine_test. There we seem to be not correctly assigning them to the response.

r.Spans = r.Spans[len(ss):]

I think we need to assign only those spans which are now in ss as it represents the filtered result.

PS:
So I modified the mock iterator to take into account what I said above, and now test runs as expected. But I observe another thing.

SpanSet has a field 'Matched' which I understand represents the matching spans found in the storage. We are populating this field here

SpanSet: &tempopb.SpanSet{
			Matched: uint32(len(spanset.Spans)),
		}

spanset represents the filtered result. So it seems, Matched is just the number of spans left after filtering(filtering involves query conditions + taking only number of spans as per spansPerSpanSet configuration). So Matched is actually not exactly the number of spans matched in storage.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 5, 2023

Test failures here made me to try to run tests locally, and I see many test failures in e2e, wal, OTHERS etc. Need to look in more details why those failures happening.

It may take some more time as make tasks are not running out of box in windows. I may have to check those tests one-by-one.

@joe-elliott
Copy link
Member

So Matched is actually not exactly the number of spans matched in storage.

Right, this is what you were looking at fixing previously, but it is going to require some additional work that I was going to do anyway when implementing grouping and select. It's on the list to fix!

Test failures here made me to try to run tests locally, and I see many test failures in e2e, wal, OTHERS etc. Need to look in more details why those failures happening.

Yeah, our CI is not great at the moment. @mapno is looking into cleaning it up some.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 6, 2023

Yeah, our CI is not great at the moment. @mapno is looking into cleaning it up some.

It would be really helpful if makefile can be made window compatible.

@mghildiy mghildiy requested a review from stoewer as a code owner May 7, 2023 12:39
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mghildiy
❌ manish chandra ghildiyal


manish chandra ghildiyal seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@stoewer
Copy link
Collaborator

stoewer commented May 8, 2023

It would be really helpful if makefile can be made window compatible

Have you considered using WSL for development?

There also is a vscode plugin to seamlessly develop using WSL. I have not tried it myself as I don't have a windows environment - but it looks worth trying.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 8, 2023

It would be really helpful if makefile can be made window compatible

Have you considered using WSL for development?

There also is a vscode plugin to seamlessly develop using WSL. I have not tried it myself as I don't have a windows environment - but it looks worth trying.

Yeah, I tried with WSL, but things become pretty slow with it.
Anyways, I have got hold of a mac machine, so not an issue anymore.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 8, 2023

I ran docker-compose for local, referring to a tempo image build with local changes.
And on UI I see:

image

I don't see the new query parameter I added for spans-pers-span-set. It seems I need to make UI changes too.

@joe-elliott
Copy link
Member

I don't see the new query parameter I added for spans-pers-span-set. It seems I need to make UI changes too.

The UI will not automatically update for added query parameters. Once this PR is in we will submit an issue in the grafana repo to support the new param. If you are interested in working on it, you'd be more than welcome to. If not, the team will get to it when they can.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 8, 2023

I don't see the new query parameter I added for spans-pers-span-set. It seems I need to make UI changes too.

The UI will not automatically update for added query parameters. Once this PR is in we will submit an issue in the grafana repo to support the new param. If you are interested in working on it, you'd be more than welcome to. If not, the team will get to it when they can.

Ok, got it. Yup, I am interested on Grafana UI work.
But I am not sure if any more changes are needed on this ticket now.

@joe-elliott
Copy link
Member

joe-elliott commented May 8, 2023

There is one change I'd like you to make. Let's remove the parameter from NewEngine() and just use the value passed into ExecuteSearch:

image

There's no need to construct the engine with this value since we're already passing it into the only place that uses it.

Also, double check the output of "Vendor Check". It looks like proto generation is having issues.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 10, 2023

So I was able to run vendor-check locally, courtesy input I got on Slack channel.
Issue is that proto doesnt seem to allow uint8, or even uint16. I would need to use uint32 instead.

@mghildiy mghildiy force-pushed the fix2308 branch 4 times, most recently from 89d6072 to 3d0ded3 Compare May 10, 2023 14:41
@@ -72,8 +69,9 @@ func (e *Engine) ExecuteSearch(ctx context.Context, searchReq *tempopb.SearchReq

// reduce all evalSS to their max length to reduce meta data lookups
for i := range evalSS {
if len(evalSS[i].Spans) > e.spansPerSpanSet {
evalSS[i].Spans = evalSS[i].Spans[:e.spansPerSpanSet]
spansPerSpanSet := int(searchReq.SpansPerSpanSet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are breaking b/c they are passing in 0 for this value and it is truncating all responses. i think if the value is 0 here we should default to 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to update requests created in testAdvancedTraceQLCompleteBlock. Seems updating above code is a better solution.

@mghildiy
Copy link
Contributor Author

I am not able to reproduce failure of test-with-cover-others locally.

@mghildiy
Copy link
Contributor Author

mghildiy commented May 12, 2023

test-with-cover-tempodb seems to be flaky. Sometimes it fails, sometimes not.

@joe-elliott
Copy link
Member

test-with-cover-tempodb seems to be flaky. Sometimes it fails, sometimes not.

Yup. We are working on it, but it's really struggling right now. If anyone is interested in trying to improve some of those tests please do.

The bigger problem here is:
image
for some reason the CLA job is not kicking off? Maybe due to GH issues the past few days?

fwiw, i think this is ready to merge, i'm just waiting on CI to work itself out.

@mghildiy
Copy link
Contributor Author

I am not sure why it shows so, but I have signed CLA few times.

@joe-elliott
Copy link
Member

I am not sure why it shows so, but I have signed CLA few times.

The job is not even kicking off. It has nothing to do with whether or not you've signed the CLA. One option is to push an empty commit and see if that kick starts it. Otherwise we can wait for it to give up and force restart it. As long as its pending I can't do anything.

@mghildiy
Copy link
Contributor Author

If anyone is interested in trying to improve some of those tests please do.

I can debug it.
Any specific tests that can be directly looked into, as there are many tests in 'test-with-cover-tempodb'.
Or any approach I can follow to zero-in on faulty ones? Currently I see tests failing randomly.

@mghildiy mghildiy closed this May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add query parameter to Search API to configure the spans per span set
4 participants