Merge branch '6.x' into ccr-6.x

* 6.x:
  Share common readFrom/writeTo code in AcknowledgeResponse (#30983)
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  Fix rest test skip version
  Fix docs build.
  Add a doc value format to binary fields. (#30860)
  Only auto-update license signature if all nodes ready (#30859)
  Add BlobContainer.writeBlobAtomic() (#30902)
  Move caching of the size of a directory to `StoreDirectory`. (#30581)
  Clarify docs about boolean operator precedence. (#30808)
  Docs: remove notes on sparsity. (#30905)
  Improve documentation of dynamic mappings. (#30952)
  Decouple MultiValueMode. (#31075)
  Docs: Clarify constraints on scripted similarities. (#31076)
elastic · Jun 5, 2018 · 0b26d22
2 parents 51963e5 + f485e8c
Showing 97 changed files with 1,077 additions and 975 deletions.
diff --git a/buildSrc/src/main/resources/checkstyle_suppressions.xml b/buildSrc/src/main/resources/checkstyle_suppressions.xml
@@ -524,8 +524,6 @@
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]settings[/\\]ClusterSettingsIT.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]shards[/\\]ClusterSearchShardsIT.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]structure[/\\]RoutingIteratorTests.java" checks="LineLength" />
-  <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]blobstore[/\\]FsBlobStoreContainerTests.java" checks="LineLength" />
-  <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]blobstore[/\\]FsBlobStoreTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]breaker[/\\]MemoryCircuitBreakerTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]geo[/\\]ShapeBuilderTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]hash[/\\]MessageDigestsTests.java" checks="LineLength" />

diff --git a/docs/reference/how-to/general.asciidoc b/docs/reference/how-to/general.asciidoc
@@ -40,94 +40,3 @@ better. For instance if a user searches for two words `foo` and `bar`, a match
 across different chapters is probably very poor, while a match within the same
 paragraph is likely good.
 
-[float]
-[[sparsity]]
-=== Avoid sparsity
-
-The data-structures behind Lucene, which Elasticsearch relies on in order to
-index and store data, work best with dense data, ie. when all documents have the
-same fields. This is especially true for fields that have norms enabled (which
-is the case for `text` fields by default) or doc values enabled (which is the
-case for numerics, `date`, `ip` and `keyword` by default).
-
-The reason is that Lucene internally identifies documents with so-called doc
-ids, which are integers between 0 and the total number of documents in the
-index. These doc ids are used for communication between the internal APIs of
-Lucene: for instance searching on a term with a `match` query produces an
-iterator of doc ids, and these doc ids are then used to retrieve the value of
-the `norm` in order to compute a score for these documents. The way this `norm`
-lookup is implemented currently is by reserving one byte for each document.
-The `norm` value for a given doc id can then be retrieved by reading the
-byte at index `doc_id`. While this is very efficient and helps Lucene quickly
-have access to the `norm` values of every document, this has the drawback that
-documents that do not have a value will also require one byte of storage.
-
-In practice, this means that if an index has `M` documents, norms will require
-`M` bytes of storage *per field*, even for fields that only appear in a small
-fraction of the documents of the index. Although slightly more complex with doc
-values due to the fact that doc values have multiple ways that they can be
-encoded depending on the type of field and on the actual data that the field
-stores, the problem is very similar. In case you wonder: `fielddata`, which was
-used in Elasticsearch pre-2.0 before being replaced with doc values, also
-suffered from this issue, except that the impact was only on the memory
-footprint since `fielddata` was not explicitly materialized on disk.
-
-Note that even though the most notable impact of sparsity is on storage
-requirements, it also has an impact on indexing speed and search speed since
-these bytes for documents that do not have a field still need to be written
-at index time and skipped over at search time.
-
-It is totally fine to have a minority of sparse fields in an index. But beware
-that if sparsity becomes the rule rather than the exception, then the index
-will not be as efficient as it could be.
-
-This section mostly focused on `norms` and `doc values` because those are the
-two features that are most affected by sparsity. Sparsity also affect the
-efficiency of the inverted index (used to index `text`/`keyword` fields) and
-dimensional points (used to index `geo_point` and numerics) but to a lesser
-extent.
-
-Here are some recommendations that can help avoid sparsity:
-
-[float]
-==== Avoid putting unrelated data in the same index
-
-You should avoid putting documents that have totally different structures into
-the same index in order to avoid sparsity. It is often better to put these
-documents into different indices, you could also consider giving fewer shards
-to these smaller indices since they will contain fewer documents overall.
-
-Note that this advice does not apply in the case that you need to use
-parent/child relations between your documents since this feature is only
-supported on documents that live in the same index.
-
-[float]
-==== Normalize document structures
-
-Even if you really need to put different kinds of documents in the same index,
-maybe there are opportunities to reduce sparsity. For instance if all documents
-in the index have a timestamp field but some call it `timestamp` and others
-call it `creation_date`, it would help to rename it so that all documents have
-the same field name for the same data.
-
-[float]
-==== Avoid types
-
-Types might sound like a good way to store multiple tenants in a single index.
-They are not: given that types store everything in a single index, having
-multiple types that have different fields in a single index will also cause
-problems due to sparsity as described above. If your types do not have very
-similar mappings, you might want to consider moving them to a dedicated index.
-
-[float]
-==== Disable `norms` and `doc_values` on sparse fields
-
-If none of the above recommendations apply in your case, you might want to
-check whether you actually need `norms` and `doc_values` on your sparse fields.
-`norms` can be disabled if producing scores is not necessary on a field, this is
-typically true for fields that are only used for filtering. `doc_values` can be
-disabled on fields that are neither used for sorting nor for aggregations.
-Beware that this decision should not be made lightly since these parameters
-cannot be changed on a live index, so you would have to reindex if you realize
-that you need `norms` or `doc_values`.
-
diff --git a/docs/reference/index-modules/similarity.asciidoc b/docs/reference/index-modules/similarity.asciidoc
@@ -326,7 +326,18 @@ Which yields:
 // TESTRESPONSE[s/"took": 12/"took" : $body.took/]
 // TESTRESPONSE[s/OzrdjxNtQGaqs4DmioFw9A/$body.hits.hits.0._node/]
 
-You might have noticed that a significant part of the script depends on
+WARNING: While scripted similarities provide a lot of flexibility, there is
+a set of rules that they need to satisfy. Failing to do so could make
+Elasticsearch silently return wrong top hits or fail with internal errors at
+search time:
+
+ - Returned scores must be positive.
+ - All other variables remaining equal, scores must not decrease when
+   `doc.freq` increases.
+ - All other variables remaining equal, scores must not increase when
+   `doc.length` increases.
+
+You might have noticed that a significant part of the above script depends on
 statistics that are the same for every document. It is possible to make the
 above slightly more efficient by providing an `weight_script` which will
 compute the document-independent part of the score and will be available
@@ -491,7 +502,6 @@ GET /index/_search?explain=true
 
 ////////////////////
 
-
 Type name: `scripted`
 
 [float]
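
Note on the similarity.asciidoc hunk above: the three rules in the new WARNING can be spot-checked before a scoring formula is turned into a script. The sketch below is only an illustration in plain Java. The class name, the example formula and the sample grid are all hypothetical, it does not use any Elasticsearch API, and passing the check on a finite grid does not prove the rules hold everywhere.

```java
import java.util.function.DoubleBinaryOperator;

final class SimilarityRuleCheck {

    /** Hypothetical score: saturating term frequency, dampened by document length. */
    static double exampleScore(double freq, double length) {
        return freq / (freq + 1.2 * length / 100.0);
    }

    /**
     * Samples a small grid of (freq, length) pairs and returns true only if
     * every sampled score is positive, does not decrease when freq grows,
     * and does not increase when length grows.
     */
    static boolean satisfiesRules(DoubleBinaryOperator score) {
        for (int freq = 1; freq <= 50; freq++) {
            for (int length = freq; length <= 500; length += 7) {
                double s = score.applyAsDouble(freq, length);
                if (!(s > 0)) {
                    return false; // rule 1: scores must be positive
                }
                if (score.applyAsDouble(freq + 1, length) < s) {
                    return false; // rule 2: must not decrease with freq
                }
                if (score.applyAsDouble(freq, length + 1) > s) {
                    return false; // rule 3: must not increase with length
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesRules(SimilarityRuleCheck::exampleScore));
        // A formula that grows with document length breaks the third rule.
        System.out.println(satisfiesRules((freq, length) -> freq * length));
    }
}
```

A formula that fails this kind of check is one the WARNING says could lead to wrong top hits or internal errors at search time.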

diff --git a/docs/reference/mapping/dynamic/field-mapping.asciidoc b/docs/reference/mapping/dynamic/field-mapping.asciidoc
@@ -135,6 +135,6 @@ PUT my_index/_doc/1
 }
 --------------------------------------------------
 // CONSOLE
-<1> The `my_float` field is added as a <<number,`double`>> field.
+<1> The `my_float` field is added as a <<number,`float`>> field.
 <2> The `my_integer` field is added as a <<number,`long`>> field.
 
diff --git a/docs/reference/mapping/dynamic/templates.asciidoc b/docs/reference/mapping/dynamic/templates.asciidoc
@@ -46,11 +46,22 @@ name as an existing template, it will replace the old version.
 [[match-mapping-type]]
 ==== `match_mapping_type`
 
-The `match_mapping_type` matches on the datatype detected by
-<<dynamic-field-mapping,dynamic field mapping>>, in other words, the datatype
-that Elasticsearch thinks the field should have.  Only the following datatypes
-can be automatically detected: `boolean`, `date`, `double`, `long`, `object`,
-`string`.  It also accepts `*` to match all datatypes.
+The `match_mapping_type` is the datatype detected by the json parser. Since
+JSON doesn't allow to distinguish a `long` from an `integer` or a `double` from
+a `float`, it will always choose the wider datatype, ie. `long` for integers
+and `double` for floating-point numbers.
+
+The following datatypes may be automatically detected:
+
+ - `boolean` when `true` or `false` are encountered.
+ - `date` when <<date-detection,date detection>> is enabled and a string is
+   found that matches any of the configured date formats.
+ - `double` for numbers with a decimal part.
+ - `long` for numbers without a decimal part.
+ - `object` for objects, also called hashes.
+ - `string` for character strings.
+
+`*` may also be used in order to match all datatypes.
 
 For example, if we wanted to map all integer fields as `integer` instead of
 `long`, and all `string` fields as both `text` and `keyword`, we
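
Note on the templates.asciidoc hunk above: the detection rules in the new list can be pictured as a mapping from parsed JSON values to `match_mapping_type` names. The following is a minimal sketch in plain Java, assuming the JSON has already been parsed into standard Java objects. The class and method names are hypothetical and this is not how Elasticsearch implements it. Date detection is left out, so date-like strings are reported as `string` here.

```java
import java.util.Map;

final class MatchMappingTypeSketch {

    /** Returns the match_mapping_type name for a value already parsed from JSON. */
    static String detect(Object value) {
        if (value instanceof Boolean) {
            return "boolean";
        }
        // JSON cannot tell an integer from a long, so the wider type is used.
        if (value instanceof Integer || value instanceof Long) {
            return "long";
        }
        // Likewise, any number with a decimal part is reported as double.
        if (value instanceof Float || value instanceof Double) {
            return "double";
        }
        if (value instanceof Map) {
            return "object";
        }
        if (value instanceof String) {
            return "string";
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(detect(42));
        System.out.println(detect(1.5f));
        System.out.println(detect("hello"));
    }
}
```

The point of the sketch is that narrower types such as `integer` or `float` never come out of detection, which is why a dynamic template has to match on `long` or `double` to remap them.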

diff --git a/docs/reference/query-dsl/query-string-syntax.asciidoc b/docs/reference/query-dsl/query-string-syntax.asciidoc
@@ -235,26 +235,10 @@ states that:
 * `news` must not be present
 * `quick` and `brown` are optional -- their presence increases the relevance
 
-The familiar operators `AND`, `OR` and `NOT` (also written `&&`, `||` and `!`)
-are also supported.  However, the effects of these operators can be more
-complicated than is obvious at first glance.  `NOT` takes precedence over
-`AND`, which takes precedence over `OR`.  While the `+` and `-` only affect
-the term to the right of the operator, `AND` and `OR` can affect the terms to
-the left and right.
-
-****
-Rewriting the above query using `AND`, `OR` and `NOT` demonstrates the
-complexity:
-
-`quick OR brown AND fox AND NOT news`::
-
-This is incorrect, because `brown` is now a required term.
-
-`(quick OR brown) AND fox AND NOT news`::
-
-This is incorrect because at least one of `quick` or `brown` is now required
-and the search for those terms would be scored differently from the original
-query.
+The familiar boolean operators `AND`, `OR` and `NOT` (also written `&&`, `||`
+and `!`) are also supported but beware that they do not honor the usual
+precedence rules, so parentheses should be used whenever multiple operators are
+used together. For instance the previous query could be rewritten as:
 
 `((quick AND fox) OR (brown AND fox) OR fox) AND NOT news`::
 
@@ -272,7 +256,6 @@ would look like this:
         }
     }
 
-****
 
 ===== Grouping
 

diff --git a/...-stats/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSource.java b/...-stats/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSource.java
@@ -47,7 +47,7 @@ public NumericDoubleValues getField(final int ordinal, LeafReaderContext ctx) th
             if (ordinal > names.length) {
                 throw new IndexOutOfBoundsException("ValuesSource array index " + ordinal + " out of bounds");
             }
-            return multiValueMode.select(values[ordinal].doubleValues(ctx), Double.NEGATIVE_INFINITY);
+            return multiValueMode.select(values[ordinal].doubleValues(ctx));
         }
     }
 

diff --git a/...g-expression/src/main/java/org/elasticsearch/script/expression/DateMethodValueSource.java b/...g-expression/src/main/java/org/elasticsearch/script/expression/DateMethodValueSource.java
@@ -54,7 +54,7 @@ class DateMethodValueSource extends FieldDataValueSource {
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
         final Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
             @Override
             public double doubleVal(int docId) throws IOException {

diff --git a/...g-expression/src/main/java/org/elasticsearch/script/expression/DateObjectValueSource.java b/...g-expression/src/main/java/org/elasticsearch/script/expression/DateObjectValueSource.java
@@ -56,7 +56,7 @@ class DateObjectValueSource extends FieldDataValueSource {
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
         MutableDateTime joda = new MutableDateTime(0, DateTimeZone.UTC);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
             @Override
             public double doubleVal(int docId) throws IOException {

diff --git a/...ng-expression/src/main/java/org/elasticsearch/script/expression/FieldDataValueSource.java b/...ng-expression/src/main/java/org/elasticsearch/script/expression/FieldDataValueSource.java
@@ -68,7 +68,7 @@ public int hashCode() {
     @SuppressWarnings("rawtypes") // ValueSource uses a rawtype
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
           @Override
           public double doubleVal(int doc) throws IOException {

diff --git a/modules/rank-eval/src/test/java/org/elasticsearch/index/rankeval/RatedRequestsTests.java b/modules/rank-eval/src/test/java/org/elasticsearch/index/rankeval/RatedRequestsTests.java
@@ -131,6 +131,7 @@ public void testXContentRoundtrip() throws IOException {
         }
     }
 
+    @AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/31104")
     public void testXContentParsingIsNotLenient() throws IOException {
         RatedRequest testItem = createTestItem(randomBoolean());
         XContentType xContentType = randomFrom(XContentType.values());

diff --git a/rest-api-spec/src/main/resources/rest-api-spec/test/search/190_index_prefix_search.yml b/rest-api-spec/src/main/resources/rest-api-spec/test/search/190_index_prefix_search.yml
@@ -66,7 +66,7 @@ setup:
 ---
 "search index prefixes with span_multi":
   - skip:
-      version: " - 6.2.99"
+      version: " - 6.3.99"
       reason: span_multi throws an exception with prefix fields on < versions
 
   - do:

diff --git a/.../org/elasticsearch/action/admin/cluster/repositories/delete/DeleteRepositoryResponse.java b/.../org/elasticsearch/action/admin/cluster/repositories/delete/DeleteRepositoryResponse.java
@@ -20,14 +20,9 @@
 package org.elasticsearch.action.admin.cluster.repositories.delete;
 
 import org.elasticsearch.action.support.master.AcknowledgedResponse;
-import org.elasticsearch.common.io.stream.StreamInput;
-import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.xcontent.ConstructingObjectParser;
-import org.elasticsearch.common.xcontent.ToXContentObject;
 import org.elasticsearch.common.xcontent.XContentParser;
 
-import java.io.IOException;
-
 /**
  * Unregister repository response
  */
@@ -47,18 +42,6 @@ public class DeleteRepositoryResponse extends AcknowledgedResponse {
         super(acknowledged);
     }
 
-    @Override
-    public void readFrom(StreamInput in) throws IOException {
-        super.readFrom(in);
-        readAcknowledged(in);
-    }
-
-    @Override
-    public void writeTo(StreamOutput out) throws IOException {
-        super.writeTo(out);
-        writeAcknowledged(out);
-    }
-
     public static DeleteRepositoryResponse fromXContent(XContentParser parser) {
         return PARSER.apply(parser, null);
     }

diff --git a/...n/java/org/elasticsearch/action/admin/cluster/repositories/put/PutRepositoryResponse.java b/...n/java/org/elasticsearch/action/admin/cluster/repositories/put/PutRepositoryResponse.java
@@ -20,13 +20,9 @@
 package org.elasticsearch.action.admin.cluster.repositories.put;
 
 import org.elasticsearch.action.support.master.AcknowledgedResponse;
-import org.elasticsearch.common.io.stream.StreamInput;
-import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.xcontent.ConstructingObjectParser;
 import org.elasticsearch.common.xcontent.XContentParser;
 
-import java.io.IOException;
-
 /**
  * Register repository response
  */
@@ -46,18 +42,6 @@ public class PutRepositoryResponse extends AcknowledgedResponse {
         super(acknowledged);
     }
 
-    @Override
-    public void readFrom(StreamInput in) throws IOException {
-        super.readFrom(in);
-        readAcknowledged(in);
-    }
-
-    @Override
-    public void writeTo(StreamOutput out) throws IOException {
-        super.writeTo(out);
-        writeAcknowledged(out);
-    }
-
     public static PutRepositoryResponse fromXContent(XContentParser parser) {
         return PARSER.apply(parser, null);
     }
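
Note on the two repository response hunks above: both drop identical readFrom/writeTo overrides, which fits the commit title about sharing that code in the base class. The updated AcknowledgedResponse itself is not shown here, so the sketch below is only an assumption about the general pattern. It uses java.io.DataInput and DataOutput as stand-ins for StreamInput and StreamOutput, and every class name in it is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SharedAckSerializationSketch {

    /** Hypothetical base class that owns serialization of the acknowledged flag. */
    static class AckResponse {
        protected boolean acknowledged;

        AckResponse() {}

        AckResponse(boolean acknowledged) {
            this.acknowledged = acknowledged;
        }

        void readFrom(DataInput in) throws IOException {
            acknowledged = in.readBoolean();
        }

        void writeTo(DataOutput out) throws IOException {
            out.writeBoolean(acknowledged);
        }

        boolean isAcknowledged() {
            return acknowledged;
        }
    }

    /** A subclass with no extra state needs no serialization overrides. */
    static final class DeleteRepoResponse extends AckResponse {
        DeleteRepoResponse() {}

        DeleteRepoResponse(boolean acknowledged) {
            super(acknowledged);
        }
    }

    /** Writes a response to a byte array and reads it back into a fresh instance. */
    static boolean roundTrip(boolean acknowledged) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DeleteRepoResponse(acknowledged).writeTo(new DataOutputStream(bytes));
            DeleteRepoResponse copy = new DeleteRepoResponse();
            copy.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.isAcknowledged();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(true));
        System.out.println(roundTrip(false));
    }
}
```

Subclasses that carry additional fields still need their own overrides, and they have to keep the field order compatible with older wire formats.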

diff --git a/.../src/main/java/org/elasticsearch/action/admin/cluster/reroute/ClusterRerouteResponse.java b/.../src/main/java/org/elasticsearch/action/admin/cluster/reroute/ClusterRerouteResponse.java
@@ -63,22 +63,32 @@ public RoutingExplanations getExplanations() {
 
     @Override
     public void readFrom(StreamInput in) throws IOException {
-        super.readFrom(in);
-        state = ClusterState.readFrom(in, null);
-        readAcknowledged(in);
-        explanations = RoutingExplanations.readFrom(in);
+        if (in.getVersion().onOrAfter(Version.V_6_4_0)) {
+            super.readFrom(in);
+            state = ClusterState.readFrom(in, null);
+            explanations = RoutingExplanations.readFrom(in);
+        } else {
+            state = ClusterState.readFrom(in, null);
+            acknowledged = in.readBoolean();
+            explanations = RoutingExplanations.readFrom(in);
+        }
     }
 
     @Override
     public void writeTo(StreamOutput out) throws IOException {
-        super.writeTo(out);
-        if (out.getVersion().onOrAfter(Version.V_6_3_0)) {
+        if (out.getVersion().onOrAfter(Version.V_6_4_0)) {
+            super.writeTo(out);
             state.writeTo(out);
+            RoutingExplanations.writeTo(explanations, out);
         } else {
-            ClusterModule.filterCustomsForPre63Clients(state).writeTo(out);
+            if (out.getVersion().onOrAfter(Version.V_6_3_0)) {
+                state.writeTo(out);
+            } else {
+                ClusterModule.filterCustomsForPre63Clients(state).writeTo(out);
+            }
+            out.writeBoolean(acknowledged);
+            RoutingExplanations.writeTo(explanations, out);
         }
-        writeAcknowledged(out);
-        RoutingExplanations.writeTo(explanations, out);
     }
 
     @Override