Revert "Deprecate sorting in reindex (#49458)"
This reverts commit 27d45c9.
henningandersen committed Nov 29, 2019
1 parent 7cf1708 commit 1d745f1
Showing 8 changed files with 48 additions and 95 deletions.
@@ -83,6 +83,7 @@
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.tasks.TaskId;

import java.util.Collections;
@@ -832,6 +833,10 @@ public void testReindex() throws Exception {
// tag::reindex-request-pipeline
request.setDestPipeline("my_pipeline"); // <1>
// end::reindex-request-pipeline
// tag::reindex-request-sort
request.addSortField("field1", SortOrder.DESC); // <1>
request.addSortField("field2", SortOrder.ASC); // <2>
// end::reindex-request-sort
// tag::reindex-request-script
request.setScript(
new Script(
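The doc-test snippet adds two sort clauses, `field1` descending and then `field2` ascending, so the second clause only matters for documents that tie on the first. The plain-Java sketch below models that ordering with a chained `Comparator`; `SourceDoc` and `SortClauses` are hypothetical names, not Elasticsearch classes, and the model only describes the order in which the source search returns hits. As the deprecation text in this diff notes, sorting in reindex was never guaranteed to index documents in that order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a source document with the two sorted fields.
final class SourceDoc {
    final int field1;
    final String field2;

    SourceDoc(int field1, String field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    @Override
    public String toString() {
        return field1 + "/" + field2;
    }
}

// Plain-Java model (not Elasticsearch code) of the two sort clauses.
final class SortClauses {
    // Primary key: field1 DESC; tie-breaker: field2 ASC.
    static final Comparator<SourceDoc> ORDER =
            Comparator.comparingInt((SourceDoc d) -> d.field1).reversed()
                    .thenComparing(d -> d.field2);

    // Returns a sorted copy and leaves the input list untouched.
    static List<SourceDoc> sorted(List<SourceDoc> docs) {
        List<SourceDoc> copy = new ArrayList<>(docs);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<SourceDoc> docs = List.of(
                new SourceDoc(1, "b"), new SourceDoc(2, "z"),
                new SourceDoc(2, "a"), new SourceDoc(1, "a"));
        System.out.println(sorted(docs)); // [2/a, 2/z, 1/a, 1/b]
    }
}
```

Reversing only the first comparator before chaining keeps the tie-breaker ascending, which mirrors adding `SortOrder.DESC` and then `SortOrder.ASC`.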
10 changes: 10 additions & 0 deletions docs/java-rest/high-level/document/reindex.asciidoc
@@ -89,6 +89,16 @@ include-tagged::{doc-tests-file}[{api}-request-pipeline]
--------------------------------------------------
<1> set pipeline to `my_pipeline`

If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more
selective query to maxDocs and sort.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-sort]
--------------------------------------------------
<1> add descending sort to`field1`
<2> add ascending sort to `field2`

+{request}+ also supports a `script` that modifies the document. It allows you to
also change the document's metadata. The following example illustrates that.

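The restored Java REST docs recommend a more selective query over `maxDocs` plus sort whenever possible. The plain-Java model below is an illustration, not Elasticsearch code: sort-and-limit has to order every candidate before it can stop, while a range filter on the same field returns the same documents without ordering them, provided the boundary value is already known. `SubsetSelection` is a hypothetical name, and each document is reduced to its `date` value for simplicity.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model (not Elasticsearch code) of two ways to pick a subset
// of source documents. A document is represented only by its date value.
final class SubsetSelection {

    // Like sort {"date": "desc"} plus max_docs: order all candidates,
    // then keep the first maxDocs of them.
    static List<Long> sortAndLimit(List<Long> dates, int maxDocs) {
        return dates.stream()
                .sorted(Comparator.reverseOrder())
                .limit(maxDocs)
                .collect(Collectors.toList());
    }

    // Like a range query on "date": keep matching documents, no ordering needed.
    static List<Long> rangeFilter(List<Long> dates, long fromInclusive) {
        return dates.stream()
                .filter(d -> d >= fromInclusive)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> dates = List.of(10L, 40L, 20L, 50L, 30L);
        System.out.println(sortAndLimit(dates, 2));  // [50, 40]
        System.out.println(rangeFilter(dates, 40L)); // [40, 50]
    }
}
```

Both calls select the same two documents; only the first one needs a full ordering to do so.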
42 changes: 30 additions & 12 deletions docs/reference/docs/reindex.asciidoc
@@ -476,14 +476,9 @@ which defaults to a maximum size of 100 MB.
(Optional, integer) Total number of slices.

`sort`:::
+
--
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
Use in conjunction with `max_docs` to control what documents are reindexed.

deprecated::[7.6, Sort in reindex is deprecated. Sorting in reindex was never guaranteed to index documents in order and prevents further development of reindex such as resilience and performance improvements. If used in combination with `max_docs`&#44; consider using a query filter instead.]
--

`_source`:::
(Optional, string) If `true` reindexes all source fields.
Set to a list to reindex select fields.
@@ -607,8 +602,8 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-select-max-docs]]
===== Reindex select documents with `max_docs`
[[docs-reindex-select-sort]]
===== Reindex select documents with sort

You can limit the number of processed documents by setting `max_docs`.
For example, this request copies a single document from `twitter` to
@@ -629,6 +624,28 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex.
Sorting makes the scroll less efficient but in some contexts it's worth it.
If possible, it's better to use a more selective query instead of `max_docs` and `sort`.

For example, following request copies 10000 documents from `twitter` into `new_twitter`:

[source,console]
--------------------------------------------------
POST _reindex
{
"max_docs": 10000,
"source": {
"index": "twitter",
"sort": { "date": "desc" }
},
"dest": {
"index": "new_twitter"
}
}
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-multiple-indices]]
===== Reindex from multiple indices

@@ -808,10 +825,11 @@ POST _reindex
"index": "twitter",
"query": {
"function_score" : {
"random_score" : {},
"min_score" : 0.9 <1>
"query" : { "match_all": {} },
"random_score" : {}
}
}
},
"sort": "_score" <1>
},
"dest": {
"index": "random_twitter"
@@ -820,8 +838,8 @@ POST _reindex
----------------------------------------------------------------
// TEST[setup:big_twitter]

<1> You may need to adjust the `min_score` depending on the relative amount of
data extracted from source.
<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.

[[reindex-scripts]]
===== Modify documents during reindexing
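The random-sampling example switches from a `min_score` cut-off on `random_score` to sorting by `_score`, because `_reindex` otherwise sorts by `_doc` and the random scores have no effect. The plain-Java sketch below models both approaches. It assumes each document gets a uniform random score in [0, 1), and it assumes a document limit (such as `max_docs`) is applied to the sorted variant; that limit is not shown in the hunk. With `_doc` order the same limit would simply keep the first documents in index order. `RandomSample` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Plain-Java model (not Elasticsearch code) of two random-sampling approaches.
final class RandomSample {

    // Uniform random score in [0, 1) per document id, standing in for random_score.
    static double[] scores(int docCount, long seed) {
        Random random = new Random(seed);
        double[] scores = new double[docCount];
        for (int i = 0; i < docCount; i++) {
            scores[i] = random.nextDouble();
        }
        return scores;
    }

    // min_score approach: keep every document scoring at least minScore.
    static List<Integer> aboveMinScore(double[] scores, double minScore) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < scores.length; id++) {
            if (scores[id] >= minScore) {
                ids.add(id);
            }
        }
        return ids;
    }

    // sort-by-_score approach: order ids by score, highest first, and keep
    // the first `limit` of them (assumes a limit such as max_docs).
    static List<Integer> topByScore(double[] scores, int limit) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < scores.length; id++) {
            ids.add(id);
        }
        ids.sort(Comparator.comparingDouble((Integer id) -> scores[id]).reversed());
        return ids.subList(0, Math.min(limit, ids.size()));
    }

    public static void main(String[] args) {
        double[] s = scores(1000, 42L);
        System.out.println("min_score 0.9 kept " + aboveMinScore(s, 0.9).size());
        System.out.println("top 100 by score kept " + topByScore(s, 100).size());
    }
}
```

In this model the `min_score` variant keeps roughly 10% of documents at 0.9, with a size that varies by seed, while the sorted variant returns exactly the requested number.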
6 changes: 1 addition & 5 deletions docs/reference/ilm/ilm-with-existing-indices.asciidoc
@@ -352,10 +352,6 @@ will mean that all documents in `ilm-mylogs-000001` come before all documents in
`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
the sort will allow the data to be reindexed more quickly.

NOTE: Sorting in reindex is deprecated, see
<<docs-reindex-api-request-body,reindex request body>>. Instead use timestamp
ranges to partition data in separate reindex runs.

IMPORTANT: If your data uses document IDs generated by means other than
Elasticsearch's automatic ID generation, you may need to do additional
processing to ensure that the document IDs don't conflict during the reindex, as
@@ -408,4 +404,4 @@ PUT _cluster/settings
All of the reindexed data should now be accessible via the alias set up above,
in this case `mylogs`. Once you have verified that all the data has been
reindexed and is available in the new indices, the existing indices can be
safely removed.
safely removed.
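The removed ILM note suggested partitioning data by timestamp ranges in separate reindex runs rather than sorting. As an illustration only, the hypothetical `TimestampPartitions.split` helper below cuts a time span into consecutive half-open windows; each window's bounds could serve as the `gte` and `lt` of a `range` query on the timestamp field in one reindex run. Half-open windows share their boundaries without overlapping, so every document falls into exactly one run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Elasticsearch): split [start, end) into
// consecutive half-open windows, one per reindex run.
final class TimestampPartitions {

    static final class Window {
        final Instant gte; // inclusive lower bound
        final Instant lt;  // exclusive upper bound

        Window(Instant gte, Instant lt) {
            this.gte = gte;
            this.lt = lt;
        }

        @Override
        public String toString() {
            return "[" + gte + ", " + lt + ")";
        }
    }

    static List<Window> split(Instant start, Instant end, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Window> windows = new ArrayList<>();
        for (Instant from = start; from.isBefore(end); from = from.plus(step)) {
            Instant to = from.plus(step);
            // Clamp the last window so it never extends past end.
            windows.add(new Window(from, to.isAfter(end) ? end : to));
        }
        return windows;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2019-11-01T00:00:00Z");
        Instant end = Instant.parse("2019-11-04T12:00:00Z");
        split(start, end, Duration.ofDays(1)).forEach(System.out::println);
    }
}
```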
@@ -40,7 +40,6 @@
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.logging.DeprecationLogger;
import org.elasticsearch.common.lucene.uid.Versions;
import org.elasticsearch.common.xcontent.DeprecationHandler;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
@@ -52,7 +51,6 @@
import org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptService;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
@@ -73,9 +71,6 @@
public class Reindexer {

private static final Logger logger = LogManager.getLogger(Reindexer.class);
private static final DeprecationLogger deprecationLogger = new DeprecationLogger(logger);
static final String SORT_DEPRECATED_MESSAGE = "The sort option in reindex is deprecated. " +
"Instead consider using query filtering to find the desired subset of data.";

private final ClusterService clusterService;
private final Client client;
@@ -93,10 +88,6 @@ public class Reindexer {
}

public void initTask(BulkByScrollTask task, ReindexRequest request, ActionListener<Void> listener) {
SearchSourceBuilder searchSource = request.getSearchRequest().source();
if (searchSource != null && searchSource.sorts() != null && searchSource.sorts().isEmpty() == false) {
deprecationLogger.deprecated(SORT_DEPRECATED_MESSAGE);
}
BulkByScrollParallelizationHelper.initTaskState(task, request, client, listener);
}


This file was deleted.

@@ -123,9 +123,8 @@
---
"Sorting and max_docs in body combined":
- skip:
version: " - 7.5.99"
reason: "max_docs introduced in 7.3.0, but sort deprecated in 7.6"
features: "warnings"
version: " - 7.2.99"
reason: "max_docs introduced in 7.3.0"

- do:
index:
@@ -141,9 +140,6 @@
indices.refresh: {}

- do:
warnings:
- The sort option in reindex is deprecated. Instead consider using query
filtering to find the desired subset of data.
reindex:
refresh: true
body:
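The skip range in this test changes from " - 7.5.99" to " - 7.2.99", so the test runs again on 7.3.0 and later instead of only on 7.6.0 and later. The sketch below is a simplified, hypothetical model of how such a range reads (an empty side is unbounded, both bounds are inclusive, and only plain numeric versions are handled); it is not the YAML test runner's actual parser.

```java
// Simplified, hypothetical model of a YAML REST test "skip: version" range.
final class SkipRange {

    // Compare dotted versions such as "7.2.99" numerically, part by part.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the test would be skipped on a node running nodeVersion.
    static boolean skips(String range, String nodeVersion) {
        String[] bounds = range.split("-", -1);
        if (bounds.length != 2) {
            throw new IllegalArgumentException("expected 'lower - upper': " + range);
        }
        String lower = bounds[0].trim();
        String upper = bounds[1].trim();
        boolean aboveLower = lower.isEmpty() || compare(nodeVersion, lower) >= 0;
        boolean belowUpper = upper.isEmpty() || compare(nodeVersion, upper) <= 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(skips(" - 7.2.99", "7.3.0")); // false: test runs
        System.out.println(skips(" - 7.5.99", "7.3.0")); // true: test skipped
    }
}
```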
@@ -189,10 +189,7 @@ public ReindexRequest setSourceQuery(QueryBuilder queryBuilder) {
*
* @param name The name of the field to sort by
* @param order The order in which to sort
* @deprecated Specifying a sort field for reindex is deprecated. If using this in combination with maxDocs, consider using a
* query filter instead.
*/
@Deprecated
public ReindexRequest addSortField(String name, SortOrder order) {
this.getSearchRequest().source().sort(name, order);
return this;