Skip to content

Support point in time searches #5151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 7, 2020
Merged

Support point in time searches #5151

merged 3 commits into from
Dec 7, 2020

Conversation

stevejgordon
Copy link
Contributor

This PR adds initial support to allow searching over a point in time. It adds a representation of point in time for use in the request and a point in time ID to the response.

During testing, we identified that the current URL resolution for searches against all indices preferred using the _all/_search path. This causes issues with point in time which expects no indices to be specified. I've added logic to ApiUrls which detects when we are searching with _all as the path parameter. It then prefers a match against the root /search URL instead. This required a few test assertions elsewhere to be updated.

Questions

  1. I've opted for some ctors on the PointInTime and PointInTimeDescriptor since an ID is always necessary and this reduces consumer code. I've not seen this applied much in the non-generated code so perhaps this is negative implications? In my opinion it make consumption simpler for the test cases we have at the moment.

By adding this the Fluent syntax is r.Query(...).AllIndices().PointInTime("an-id") and the initializer syntax is:

new SearchRequest<Project>(Nest.Indices.All)
{
	Size = 1,
	Query = new QueryContainer(new MatchAllQuery()),
	PointInTime = new Nest.PointInTime("an-id", "1m")
}
  1. I've used the Time type for KeepAlive on the PointInTimeRequest. This seemed appropriate from other usages, however, the OpenPointInTimeResponse uses a string at the moment (we could change that if it's preferred?)

  2. Would it be nicer if the OpenPointInTimeResponse was deserialised into a PointInTime (without keep alive) so that it could then be passed into the search, rather than creating an instance? One thing to note is this line from the docs

The open point in time request and each subsequent search request can return different id; thus always use the most recently received id for the next search request.

So while convenient for the first request, this might not be super helpful on subsequent searches unless we also converted the pit_id into a PointInTime on the search response also. Some of this we may be able to abstract more nicely behind a helper.

  1. Should we review if ES should allow PIT searches against _all by opening an issue there? The behaviour right now is valid since _all implies to search everything, while a PIT search limits to the scope of the PIT, however, it forces our consumers to use AllIndices.

  2. Future discussion perhaps - Is there any clean way of inferring AllIndices when PointInTime is used, without the consumer having to specify it, or adding support for this?

@Mpdreamz
Copy link
Member

Mpdreamz commented Dec 4, 2020

I've opted for some ctors on the PointInTime and PointInTimeDescriptor since an ID is always necessary and this reduces consumer code. I've not seen this applied much in the non-generated code so perhaps this is negative implications? In my opinion it make consumption simpler for the test cases we have at the moment.

We do this very rarely, largely because we don't do client side validation to enforce required properties. Quite often required properties are contextual. I think in this case it makes a lot of sense.

I've used the Time type for KeepAlive on the PointInTimeRequest. This seemed appropriate from other usages, however, the OpenPointInTimeResponse uses a string at the moment (we could change that if it's preferred?)

Using time on responses has some benefits (e.g ToTimeSpan)

Would it be nicer if the OpenPointInTimeResponse was deserialised into a PointInTime (without keep alive) so that it could then be passed into the search, rather than creating an instance? One thing to note is this line from the docs

The open point in time request and each subsequent search request can return different id; thus always use the most recently received id for the next search request.

So while convenient for the first request, this might not be super helpful on subsequent searches unless we also converted the pit_id into a PointInTime on the search response also. Some of this we may be able to abstract more nicely behind a helper.

Maintaining a 1-1 mapping is key. Long term all of this will be generated. I'd rather focus on the helper.

Should we review if ES should allow PIT searches against _all by opening an issue there? The behaviour right now is valid since _all implies to search everything, while a PIT search limits to the scope of the PIT, however, it forces our consumers to use AllIndices.

This is worthy of a discussion on `elastic/elasticsearch for sure.

Future discussion perhaps - Is there any clean way of inferring AllIndices when PointInTime is used, without the consumer having to specify it, or adding support for this?

I think there is a route to do this through RequestBase.GetUrl

stevejgordon and others added 2 commits December 4, 2020 10:23
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
This allows us to special case pit searches to remove the index
parameter.

Includes new usage tests for doc generation.
@stevejgordon
Copy link
Contributor Author

@Mpdreamz:

I've updated this to support overriding the ResolveUrl method, which I've done for search to handle cases where point in time is in use. This means that consumers are no longer required to know that they must use AllIndices when including a pit ID. I've removed the more hacky search URL fixup I had previously.

I thought one thing and wrote another earlier in regards to OpenPointInTimeResponse using Time. The response only includes the ID so that's a non issue. What I meant to query is the auto-generated OpenPointInTimeRequest where KeepAlive is a string by default based on the spec.

@Mpdreamz
Copy link
Member

Mpdreamz commented Dec 4, 2020

What I meant to query is the auto-generated OpenPointInTimeRequest where KeepAlive is a string by default based on the spec.

Can this also be specified in the body? If so you can disable its generation by adding a CodeConfigurationFile for it e.g:

https://github.com/elastic/elasticsearch-net/blob/master/src/ApiGenerator/Configuration/Overrides/Endpoints/ScrollOverrides.cs#L12

Copy link
Member

@Mpdreamz Mpdreamz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@stevejgordon stevejgordon merged commit 0e7e4cc into 7.10 Dec 7, 2020
@stevejgordon stevejgordon deleted the feature/pit-search branch December 7, 2020 08:28
github-actions bot pushed a commit that referenced this pull request Dec 7, 2020
* Support point in time searches
* Support override of ResolveUrl

This allows us to special case pit searches to remove the index parameter.
Includes new usage tests for doc generation.
github-actions bot pushed a commit that referenced this pull request Dec 7, 2020
* Support point in time searches
* Support override of ResolveUrl

This allows us to special case pit searches to remove the index parameter.
Includes new usage tests for doc generation.
stevejgordon added a commit that referenced this pull request Dec 7, 2020
* Support point in time searches
* Support override of ResolveUrl

This allows us to special case pit searches to remove the index parameter.
Includes new usage tests for doc generation.

Co-authored-by: Steve Gordon <sgordon@hotmail.co.uk>
stevejgordon added a commit that referenced this pull request Dec 8, 2020
* Support point in time searches
* Support override of ResolveUrl

This allows us to special case pit searches to remove the index
parameter.
Includes new usage tests for doc generation.
stevejgordon added a commit that referenced this pull request Dec 8, 2020
* Support point in time searches (#5151)

* Support point in time searches
* Support override of ResolveUrl

This allows us to special case pit searches to remove the index
parameter.
Includes new usage tests for doc generation.

* manual fixup

Co-authored-by: Steve Gordon <sgordon@hotmail.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants