Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query service "search" api 'limit', 'total' and 'offset' fields are not supplied on response #2027

Open
jkandasa opened this issue Jan 17, 2020 · 17 comments
Labels
area/storage enhancement help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@jkandasa
Copy link
Member

Requirement - what kind of business use case are you trying to solve?

set of fields are not supplied on the query service search response

Service: jaeger query
API: /apt/traces
Current response:

{
  data: "...",
  total: 0,
  limit: 0,
  offset: 0
}

Response struct:

type structuredResponse struct {

Response part:
structuredRes := structuredResponse{

Fields total, limit and offset are not supplied on the response part.

I am ready yo submit a PR if someone guide me to get these fields.

@ghost ghost added the needs-triage label Jan 17, 2020
@yurishkuro
Copy link
Member

Thet are not supported by the backend, and pagination is also not supported. There's no benefit in returning them in the current state.

@ohetrifork
Copy link

This really makes the API difficult to use. So there is not way to order the search results, and there is no way to use pagination? In effect, you need to query down to eg. second long time intervals if you need to make sure that you pick up all results in an even moderately loaded system.
I would highly recommend introducing time sorting or pagination

@jpkrohling
Copy link
Contributor

Do you want to give it a try? A first step would probably be to create an interface that defines the pagination functions, and make at least one storage plugin to implement it. The Query handler can then check if the current storage plugin implements the interface and will call the appropriate functions to get paginated results.

@jpkrohling jpkrohling added enhancement good first issue Good for beginners hacktoberfest help wanted Features that maintainers are willing to accept but do not have cycles to implement area/storage and removed needs-triage labels Sep 10, 2020
@sagaranand015
Copy link
Contributor

A first step would probably be to create an interface that defines the pagination functions, and make at least one storage plugin to implement it. The Query handler can then check if the current storage plugin implements the interface and will call the appropriate functions to get paginated results.

@jpkrohling: I was going through this issue and tried a few approaches.
As I understand, the idea is to make the API paginated. To introduce pagination, say in case of api/traces API, can we not use the TraceQueryParams? We already have a limit queryString, adding support for an offset queryString would allow the offset data being passed to the appropriate storage layer implementation and returned paginated (and sorted, if any) from the storage layer itself.

As I understand the approach you mentioned, is it correct to say that the querying method of the storage layer would remain the same? The storage implementation would get the results like it does now and then a pagination function is called to return a paginated response. Is this correct?

I think getting pagination done on the storage query might help the performance, and looks straightforward to implement.
Inputs Please?

Thanks

@yurishkuro
Copy link
Member

  1. Offsets are a poor way of implementing pagination (there's an unnecessary long blog post that explains why: https://tusharsharma.dev/posts/api-pagination-the-right-way#offset-pagination)
  2. Our storage backends (Cassandra & ES) currently do not organize the data in a way that would allow for consistent result ordering to implement pagination

@pranoyk
Copy link

pranoyk commented Sep 17, 2023

@yurishkuro
As I understand in our case, we need to provide the ability to jump to a page number once pagination is implemented. I don't think that is achievable with cursor based approach. Only offset based approach can work for us there. This blog explains the issue: https://dev.to/rahul_ramfort/understanding-offset-vs-cursor-based-pagination-1582

Let me know if my understanding is correct. I would like to implement it.

@yurishkuro
Copy link
Member

@pranoyk the blob you linked also clearly explain why you SHOULD NOT implement offset-based pagination.

We have no specific requirement to implement pagination exactly as page-number access

@pranoyk
Copy link

pranoyk commented Sep 18, 2023

@yurishkuro Alright! In that case, should we go ahead with cursor-based pagination? If so, I can start looking at the implementation since as you mentioned our storage data is not organized to allow pagination.

@yurishkuro
Copy link
Member

feel free to give it a try

@pranoyk
Copy link

pranoyk commented Sep 18, 2023

From what I understand, in ES mapping we have spanID which would be unique, or traceID which can be used as a cursor(need your input on which field would be more suitable) and sort it based on startTime.

PS. Currently, I am planning only for ES. Once this is done, I will have a look at Cassandra.

@pranoyk
Copy link

pranoyk commented Sep 25, 2023

@yurishkuro should I go ahead and give the above a shot? I will have to create a new interface that would do the pagination and then use it. If this is not required I will look into some other issue. Thanks!

@yurishkuro
Copy link
Member

Please explain the approach you want to take.

@pranoyk
Copy link

pranoyk commented Sep 27, 2023

We would have to implement scroll search in elastic search.

  • In order to do that we need to make a initial search request and get a scrollID
    • we need to define a time for which we want to keep the scroll active. In our case we might have to hardcode this value
  • use that scrollID to get subsequent pages
    • we would have to again send the scroll time in this case
  • we need to make sure that we clear the scroll once

All of these are 3 different APIs and hence we would require separate methods. Our api/traces must be able to tell us whether the request is the first request or subsequent page request else we might have to keep track of it somehow (not sure we are already doing something like this)

We also need to make a note that if there are new traces after a scrollID is been created then we won't be able to access them. In such cases we would have to first clear the scroll and then recreate it.

@yurishkuro
Copy link
Member

  • This sounds very specific to ES, I don't see how other storage backends would work via similar API. We need an approach that could be implemented by different backends, and then abstract it such that the implementation details do not leak through the storage interface.
  • ESv7 has explicit warning that ScrollAPI is not recommended

@pranoyk
Copy link

pranoyk commented Sep 29, 2023

I went through the other storage backends we use and we can abstract out the above solution for all of them. We need our api/traces to tell us whether it's the first request or the subsequent one.

  • With Cassandra we get Page State that can be used as a cursor (in Cassandra Page State is new for every request).
  • With Badger we can get the last key of the current page and use it as the cursor for the next page in it.seek()
  • With gRPC I need some help. I understand that we support multiple storage backends through it (currently 2, are we gonna support more?) If we are going to support more then we need to make sure that those storage backends have the ability to implement pagination via cursor.

Also, one open question is how to move backward on the page. Right now, it looks like there are very different approaches for our storage backend. I will have to look more into it in order to abstract it.

@yurishkuro
Copy link
Member

It would be useful when you throw in backend-specific terms like "Page State" if you included links to the documentation.

I am still quite skeptical about any built-in pagination APIs because the tracing data and the way we write it does land itself to stable ordering. The best approximation of a token would be the timestamp, such that the next page (as in "more" not "page #X") would be a query with ts > Tprev. But look, for example, at the duration index in Cassandra:

    PRIMARY KEY ((service_name, operation_name, bucket), duration, start_time, trace_id)

Here, after you execute the first query, more spans can be stored with later ts, but smaller duration (still in the range).

So unless the storage somehow materializes the query and natively supports "next page" operation, it's difficult to support it.

With gRPC I need some help

If there were some more or less common abstraction, for example returning some opaque blob of "current page token" that is specific to the backend, it would provide a unified API for the UI to use.

@pranoyk
Copy link

pranoyk commented Oct 3, 2023

I got your point. I will have to check how we can create some wrapper to paginate rather than using the inbuilt pagination and compare between using the inbuilt pagination APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage enhancement help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

No branches or pull requests

6 participants