I am following the docs: http://www.rubydoc.info/gems/elasticsearch-api/Elasticsearch/API/Actions#scroll-instance_method
When I try to reproduce the example 'Call the scroll API until all the documents are returned', I notice that this call
# Call the `scroll` API until empty results are returned
while r = client.scroll(scroll_id: r['_scroll_id'], scroll: '5m') and not r['hits']['hits'].empty? do
  puts "--- BATCH #{defined?($i) ? $i += 1 : $i = 1} -------------------------------------------------"
  puts r['hits']['hits'].map { |d| d['_source']['title'] }.inspect
  puts
end
does not contain the results of this initial call:
# Open the "view" of the index with the `scan` search_type
r = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10
So in the end, the first batch of documents (as many as the `size` of the initial call) is missing.
Example:
If the index contains ['test1', 'test2', 'test3', ..., 'test100'],
calling the scroll API after an initial search with size 10 returns ['test11', 'test12', ..., 'test100']; the first 10 results are missing.
I get the same results in the Elasticsearch console: the first scroll call does not include the results of the initial call, so it seems that the scroll method itself works as intended.
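To make the behaviour concrete, here is a minimal sketch of a loop that processes the hits of the initial search before asking for the next batch. It assumes, as observed above, that the initial search response already carries the first `size` documents. `FakeScrollClient` and `each_scroll_batch` are hypothetical names invented for this illustration; the fake client only mimics the response shape ('_scroll_id' and 'hits' => 'hits') so the example runs without a cluster, and it is not part of any gem.

```ruby
# Hypothetical in-memory stand-in for an Elasticsearch client, used only so
# this sketch runs without a cluster. It mimics the response shape seen in
# this issue: '_scroll_id' plus 'hits' => 'hits'.
class FakeScrollClient
  def initialize(titles)
    @titles = titles
    @cursors = {}
  end

  # Assumption: the initial search already returns the first `size` documents.
  def search(index:, scroll:, size:)
    scroll_id = "scroll-#{@cursors.size + 1}"
    @cursors[scroll_id] = { offset: 0, size: size }
    next_page(scroll_id)
  end

  # Each scroll call returns the batch that follows the previous one.
  def scroll(scroll_id:, scroll:)
    next_page(scroll_id)
  end

  private

  def next_page(scroll_id)
    cursor = @cursors.fetch(scroll_id)
    batch = @titles[cursor[:offset], cursor[:size]] || []
    cursor[:offset] += batch.size
    hits = batch.map { |title| { '_source' => { 'title' => title } } }
    { '_scroll_id' => scroll_id, 'hits' => { 'hits' => hits } }
  end
end

# Yields every batch, starting with the hits of the initial search response,
# and only then calls `scroll` for the following batch.
def each_scroll_batch(client, index:, size:, scroll: '5m')
  response = client.search(index: index, scroll: scroll, size: size)
  until response['hits']['hits'].empty?
    yield response['hits']['hits']
    response = client.scroll(scroll_id: response['_scroll_id'], scroll: scroll)
  end
end

titles = (1..25).map { |i| "test#{i}" }
client = FakeScrollClient.new(titles)

collected = []
each_scroll_batch(client, index: 'test', size: 10) do |hits|
  collected.concat(hits.map { |hit| hit['_source']['title'] })
end

puts collected.first # test1
puts collected.size  # 25
```

Because the batch is yielded before `response` is reassigned, the documents from the initial call are not skipped. With a real client the same loop shape should apply, though the exact parameters accepted by `search` and `scroll` depend on the gem and Elasticsearch versions in use.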
But my actual question is about find_each.
According to the docs:
Iterate effectively over models using the `find_in_batches` method.
#
# All the options are passed to `find_in_batches` and each result is yielded to the passed block.
#
# @example Print out the people's names by scrolling through the index
#
# Person.find_each { |person| puts person.name }
#
# # # GET http://localhost:9200/people/person/_search?scroll=5m&search_type=scan&size=20
# # # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# # Test 0
# # Test 1
# # Test 2
# # ...
# # # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# # Test 20
# # Test 21
# # Test 22
#
But in fact it returns:
# # Test 20
# # Test 21
# # Test 22
I think the problem is that the 'response' variable is overwritten in find_in_batches.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
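# `response` is assumed to already hold the result of the initial search call
loop do
  hits = response.dig('hits', 'hits')
  break if hits.empty?

  hits.each do |hit|
    # Process/do something with the hit or hits
  end

  # Fetch the next batch only after the current one has been processed
  response = Search::Client.scroll(
    :body => { :scroll_id => response['_scroll_id'] },
    :scroll => '1m'
  )
end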
Elasticsearch 5.1