Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose timeout query param for search requests #2748

Merged
merged 10 commits into from Nov 2, 2023
Merged

Expose timeout query param for search requests #2748

merged 10 commits into from Nov 2, 2023

Conversation

coszio
Copy link
Contributor

@coszio coszio commented Sep 29, 2023

Continues upon #2293 and solves #2623

Exposes timeout param for search requests.

  • in query params for REST
  • in top level args for gRPC

Affects at local shard level and internal client request. For group requests, it affects at the GroupBy level

Checklist:

  • Expose for search and batch search
  • Expose for recommend and batch recommend
  • Expose for group (search/recommend) requests
  • Implement timeout for GroupBy
  • Make tests

Caveats

Search and recommend requests return a 500 Service Error status because of this conversion, even when it is just a timeout error. This is different from what grouping requests return now, which is a standard 408 Request Timeout. We can make both return 500 if desired, but I feel we should fix the former case.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@coszio coszio requested a review from timvisee October 4, 2023 13:12
@coszio coszio marked this pull request as ready for review October 4, 2023 13:12
@coszio coszio requested a review from generall October 4, 2023 13:12
@timvisee
Copy link
Member

timvisee commented Oct 4, 2023

This is different from what grouping requests return now, which is a standard 408 Request Timeout. We can make both return 500 if desired, but I feel we should fix the former case.

I haven't looked into the details yet, but I'd also prefer the former case returning HTTP 408 if we can keep backwards compatibility in a reasonable manner. We can potentially explore this in another PR.

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I haven't been able to actually test it jet but did want to post my review comments thus far.

I'll try to test tomorrow.

src/actix/api/read_params.rs Outdated Show resolved Hide resolved
lib/api/build.rs Show resolved Hide resolved
openapi/openapi-main.ytt.yaml Show resolved Hide resolved
@timvisee
Copy link
Member

timvisee commented Oct 5, 2023

Sadly, the recent #2761 and #2762 merges brought some conflicts.

@coszio
Copy link
Contributor Author

coszio commented Oct 5, 2023

An update:
@generall noted that the current changes do not extend the default timeout for remote shard communication.

Here's what I've tried:

  • The way it works with these changes is that we add a timeout to the Request the internal grpc client makes to the remote shard.
  • We also have available our method of with_channel_timeout, but after using it, it doesn't solve our problem.

In first scenario we are limited by the default max_timeout of with_channel_timeout so we sometimes get an error like

"Timeout {}ms reached for uri: {}"

or

"Tonic status error: The operation was cancelled"

In second scenario, we only get the second error.

So what's happening? with_channel_timeout is only applying a separate timeout from the actual channel timeout.

I propose to deal with that complexity in another PR, and we leave the constraint that, for remote shards, timeout will be the lowest of global or query param.

My second proposal is to refactor to get rid of the custom layer of timeout of with_channel_timeout so that the only way to affect it is setting the Request timeout.

Lastly, to be able to actually extend the channel timeout we'd need to always create a new channel, which is not desired. So a solution would be to set it very high by default, and make sure we always set the Request timeout to a default max if it was not specified before

cc @timvisee

@timvisee
Copy link
Member

timvisee commented Oct 6, 2023

After some investigation from my end I came to the same conclusion.

I agree that the right approach is to remove (or make very high) the channel timeout and set it per request everywhere.

Thanks for sharing your findings!

@timvisee timvisee mentioned this pull request Oct 9, 2023
9 tasks
@timvisee
Copy link
Member

timvisee commented Oct 9, 2023

#2771 implements the above suggestion. When we merge it, I think we can merge this as well.

@coszio
Copy link
Contributor Author

coszio commented Nov 2, 2023

Now that #2771 has been merged, I've rebased this one, and manually tested the case of setting timeout to more than the configured one. It now works in both cases.

lib/api/src/grpc/proto/points.proto Outdated Show resolved Hide resolved
@coszio coszio merged commit 4700e2a into dev Nov 2, 2023
18 checks passed
@coszio coszio deleted the search-timeout branch November 2, 2023 16:45
generall pushed a commit that referenced this pull request Dec 6, 2023
* add timeout query param for search requests

* enable timeout for recommend requests

* Add query timeout for group by requests

* update openapi models

* Don't decrease timeout after recommend preprocessing

* Add openapi test

* code review

* add timeout to individual group by requests, non-decreasing

* handle timeout for discover

* Update timeout field tag in SearchBatchPoints
message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants