Using the Elastic DSL iterate with an Index

`Search.scan` was previously the way to use the Scroll API via `elasticsearch-dsl`. However, the Scroll API is no longer recommended for many use cases, in favor of the `search_after` approach. To support this, `elasticsearch-dsl` has a `Search.iterate` method that handles pagination for the user with default settings, in that you can't set the page size.

Now, suppose you have a `Search` object. You can set the index via `Search(index='some-index')` or with `Search().index('some-index')`. Either way, you have a `Search` object on an index, and you can then iterate over the documents in that index.
```
from elasticsearch.dsl import Search

# Assumes a default connection has already been configured.
for x in Search(index='some-index').iterate():
    pass  # do thing
```

However, this does not behave as I expect it to. In `iterate`,
https://github.com/elastic/elasticsearch-py/blob/44cbf67bf9cfc29bf4253cff3e48fb0286e471ff/elasticsearch/dsl/_sync/search.py#L159
a point-in-time (PIT) is opened up, which makes sense to avoid the data changing under you.

However, my issue lies within the `point_in_time` method.
https://github.com/elastic/elasticsearch-py/blob/44cbf67bf9cfc29bf4253cff3e48fb0286e471ff/elasticsearch/dsl/_sync/search.py#L141

It opens the point in time with the appropriate index. Next, however, notice how it takes `self`, i.e. the current `Search` object, and clears the index. It then yields this search object. This might make sense in the situation the docstring describes, where you are constructing a point in time for multiple queries, e.g.
```
with s.point_in_time() as neo_s:
    neo_s.index('a').execute()
    neo_s.index('b').execute()
```
However, in the context of `iterate`, this causes problems because the index is never set again. Thus, each `/search` query made by `iterate` will be against all indices, which could fail if the user doesn't have permission to read from all indices.
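To make the flow I'm describing concrete, here is a toy model. `ToySearch` and everything in it is hypothetical and is not the library's code; it only mimics the shape of `point_in_time` and `iterate` as I read them, and records the request paths the client would send. It does not model how the server resolves a PIT.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class ToySearch:
    """Hypothetical stand-in for a Search object; not the library's class."""

    def __init__(self, index: Optional[str] = None) -> None:
        self._index = index
        self._pit: Optional[str] = None
        self.requests: List[str] = []  # paths this object would have requested

    def _clone(self) -> "ToySearch":
        s = ToySearch(self._index)
        s._pit = self._pit
        s.requests = self.requests  # share the log so the caller can inspect it
        return s

    def index(self, *names: str) -> "ToySearch":
        # Calling index() with no names clears the index on the clone.
        s = self._clone()
        s._index = ",".join(names) if names else None
        return s

    def execute(self) -> None:
        path = f"/{self._index}/_search" if self._index else "/_search"
        self.requests.append(path)

    @contextmanager
    def point_in_time(self) -> Iterator["ToySearch"]:
        # The PIT is opened against the configured index ...
        self.requests.append(f"/{self._index}/_pit")
        # ... then the index is cleared on the object that gets yielded.
        s = self.index()
        s._pit = "toy-pit-id"
        yield s

    def iterate(self, pages: int = 2) -> None:
        with self.point_in_time() as s:
            for _ in range(pages):
                s.execute()  # the index is never set again here


search = ToySearch(index="some-index")
search.iterate()
print(search.requests)
```

In this toy, the index appears only in the request that opens the PIT; the search requests that follow carry no index in their path.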

Please correct me if I'm wrong. This is just what seemed to be the issue when I tried to iterate on an index as a user with fixed read permissions.

As an aside, since search_after utilizes the response values of the last hit in a query (per https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after )
I am a bit confused as to why
https://github.com/elastic/elasticsearch-py/blob/44cbf67bf9cfc29bf4253cff3e48fb0286e471ff/elasticsearch/dsl/_sync/search.py#L175
is using the `s` object from the context manager rather than `r`, the response. That is, why isn't it `r.search_after()`, per
https://github.com/elastic/elasticsearch-py/blob/44cbf67bf9cfc29bf4253cff3e48fb0286e471ff/elasticsearch/dsl/response/__init__.py#L186
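For reference, this is roughly how I understand the `search_after` flow when done by hand. It is a minimal sketch, not what `iterate` does internally. The function name `iterate_with_search_after` and the `page_size` parameter are my own; I am assuming `client` is a low-level `Elasticsearch` client from elasticsearch-py 8.x exposing `open_point_in_time`, `search`, and `close_point_in_time` with these keyword arguments.

```python
from typing import Any, Dict, Iterator, List, Optional


def iterate_with_search_after(
    client: Any, index: str, page_size: int = 1000, keep_alive: str = "1m"
) -> Iterator[Dict[str, Any]]:
    """Yield every hit in `index`, paging with a PIT and search_after.

    Hypothetical helper; `client` is assumed to behave like the
    elasticsearch-py 8.x low-level client.
    """
    # The index is given once, when the PIT is opened.
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    try:
        search_after: Optional[List[Any]] = None  # no cursor for the first page
        while True:
            resp = client.search(
                pit={"id": pit_id, "keep_alive": keep_alive},
                size=page_size,
                sort=[{"_shard_doc": "asc"}],
                search_after=search_after,
            )
            # Keep using the most recently returned PIT id.
            pit_id = resp["pit_id"]
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # The cursor for the next page is the sort values of the last hit.
            search_after = hits[-1]["sort"]
    finally:
        client.close_point_in_time(id=pit_id)
```

The cursor here comes from the last hit of the previous response, which is why I would have expected the response object to be the thing driving the next page.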
