Update/main (#104974)
* Change release version lookup to an instance method (#104902)

* Upgrade to Lucene 9.9.2 (#104753)

This commit upgrades to Lucene 9.9.2.

* Improve `CANNOT_REBALANCE_CAN_ALLOCATE` explanation (#104904)

Clarify that in this situation there is a rebalancing move that would
improve the cluster balance, but there's some reason why rebalancing is
not happening. Also points at the `can_rebalance_cluster_decisions` as
well as the node-by-node decisions since the action needed could be
described in either place.

* Get from translog fails with large dense_vector (#104700)

This change fixes the engine to apply the current codec when retrieving documents from the translog.
We need to use the same codec as the main index to ensure that all the source data is indexable.
The internal codec treats some fields differently than the default one; for instance, dense_vectors are limited to 1024 dimensions.
This PR ensures that these customizations are applied when indexing documents for translog retrieval.

Closes #104639

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

* [Connector Secrets] Add delete API endpoint (#104815)

* Add DELETE endpoint for /_connector/_secret/{id}
* Add endpoint to write_connector_secrets cluster privilege

* Merge Aggregations into InternalAggregations (#104896)

This commit merges Aggregations into InternalAggregations in order to remove the unnecessary hierarchy.

* [Profiling] Simplify cost calculation (#104816)

* [Profiling] Add the number of cores to HostMetadata

* Update AWS pricelist (remove cost_factor, add usd_per_hour)

* Switch cost calculations from 'cost_factor' to 'usd_per_hour'

* Remove superfluous CostEntry.toXContent()

* Check for Number type in CostEntry.fromSource()

* Add comment

* Retry get_from_translog during relocations (#104579)

During a promotable relocation, a `get_from_translog` sent by the
unpromotable shard to handle a real-time get might encounter
`ShardNotFoundException` or `IndexNotFoundException`. In these cases,
we should retry.

This is just for `GET`. I'll open a second PR for `mGET`. The relevant
IT is in the Stateless PR.

Relates ES-5727
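
As a rough illustration of the retry idea, here is a minimal sketch of a bounded retry loop. It is not the code from this PR: `RetrySketch`, `ShardMissingException`, and `getWithRetry` are hypothetical names, and the exception is only a stand-in for `ShardNotFoundException` / `IndexNotFoundException`.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class RetrySketch {
    // Stand-in for ShardNotFoundException / IndexNotFoundException.
    static final class ShardMissingException extends RuntimeException {
        ShardMissingException(String message) {
            super(message);
        }
    }

    // Calls the action until it succeeds or maxAttempts is reached. Only the
    // "shard missing" failure is retried; any other exception propagates at once.
    static <T> T getWithRetry(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ShardMissingException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (ShardMissingException e) {
                last = e; // the shard may still be relocating; try again
            }
        }
        throw last;
    }
}
```

A real implementation would more likely wait for a cluster state change between attempts rather than retrying immediately; the loop above only shows which failures are treated as retryable.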

* indicating fix for 8.12.1 for int8_hnsw (#104912)

* Removing the assumption from some tests that the request builder's request() method always returns the same object (#104881)

* [DOCS] Adds get setting and update settings asciidoc files to security API index (#104916)

* [DOCS] Adds get setting and update settings asciidoc files to security API index.

* [DOCS] Fixes references in docs.

* Reuse APMMeterService of APMTelemetryProvider (#104906)

* Mute more tests that tend to leak searchhits (#104922)

* ESQL: Fix SearchStats#count(String) to count values not rows (#104891)

SearchStats#count incorrectly counts the number of documents (or rows)
in which the field appears instead of the actual number of values.
This PR fixes this by looking at the term frequency instead of the doc
count.

Fix #104795
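
To make the rows-versus-values distinction concrete, here is a minimal sketch that does not touch `SearchStats` or Lucene. `ValueCountSketch` and its methods are hypothetical; each document is modeled as the list of values it holds for one field.

```java
import java.util.List;

// Hypothetical illustration only; not the SearchStats implementation.
final class ValueCountSketch {
    // Number of documents (rows) that have at least one value for the field.
    static long countRows(List<List<String>> docs) {
        return docs.stream().filter(values -> values.isEmpty() == false).count();
    }

    // Number of values across all documents; a multivalued document
    // contributes once per value.
    static long countValues(List<List<String>> docs) {
        return docs.stream().mapToLong(List::size).sum();
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("a", "b"), List.of("a"), List.<String>of());
        System.out.println(countRows(docs));   // 2
        System.out.println(countValues(docs)); // 3
    }
}
```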

* Adding request source for cohere (#104926)

* Fixing a broken javadoc comment in ReindexDocumentationIT (#104930)

This fixes a javadoc comment that was broken by #104881

* Fix enabling / disabling of APM agent "recording" in APMAgentSettings (#104324)

* Add `type` parameter support, for sorting, to the Query API Key API (#104625)

This adds support for the `type` parameter, for sorting, to the Query API key API.
The type for an API Key can currently be either `rest` or `cross_cluster`.
This was overlooked in #103695 when support for the `type` parameter
was first introduced only for querying.

* Apply publish plugin to es-opensaml-security-api project (#104933)

* Support `match` for the Query API Key API (#104594)

This adds support for the `match` query type to the Query API key Information API.
Note that since string values associated with API keys are mapped as `keyword`,
a `match` query with no analyzer parameter is effectively equivalent to a `term` query
for such fields (e.g. `name`, `username`, `realm_name`).

Relates: #101691

* [Connectors API] Relax strict response parsing for get/list operations (#104909)

* Limit concurrent shards per node for ESQL (#104832)

Today, we allow ESQL to execute against an unlimited number of shards 
concurrently on each node. This can lead to cases where we open and hold
too many shards, equivalent to opening too many file descriptors or
using too much memory for FieldInfos in ValuesSourceReaderOperator.

This change limits the number of concurrent shards to 10 per node. This 
number was chosen based on the _search API, which limits it to 5.
Besides the primary reason stated above, this change has other
implications:

We might execute fewer shards for queries with LIMIT only, leading to 
scenarios where we execute only some high-priority shards then stop. 
For now, we don't have a partial reduce at the node level, but if we
introduce one in the future, it might not be as efficient as executing
all shards at the same time. There are pauses between batches because
batches are executed sequentially, one by one. However, I believe the
performance of queries executing against many shards (after can_match)
is less important than resiliency.

Closes #103666
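
The batching trade-off described above can be sketched as follows. This is an illustration under assumptions, not the ES|QL compute code: `ShardBatchSketch`, `batches`, and `shardsExecuted` are hypothetical names, and `rowsPerShard` is a stand-in for real per-shard work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the ES|QL implementation.
final class ShardBatchSketch {
    // Splits shard ids into consecutive batches of at most maxPerBatch entries.
    static List<List<Integer>> batches(List<Integer> shardIds, int maxPerBatch) {
        List<List<Integer>> result = new ArrayList<>();
        for (int from = 0; from < shardIds.size(); from += maxPerBatch) {
            int to = Math.min(from + maxPerBatch, shardIds.size());
            result.add(new ArrayList<>(shardIds.subList(from, to)));
        }
        return result;
    }

    // Processes batches one after another and stops once `limit` rows have
    // been collected. Returns the number of shards that were executed.
    static int shardsExecuted(List<Integer> shardIds, int maxPerBatch, int rowsPerShard, int limit) {
        int rows = 0;
        int executed = 0;
        for (List<Integer> batch : batches(shardIds, maxPerBatch)) {
            if (rows >= limit) {
                break; // LIMIT satisfied: remaining batches are never opened
            }
            executed += batch.size();
            rows += batch.size() * rowsPerShard;
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Integer> shards = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            shards.add(i);
        }
        System.out.println(batches(shards, 10).size());            // 3
        System.out.println(shardsExecuted(shards, 10, 100, 1500)); // 20
    }
}
```

With 25 shards and a batch size of 10, a query whose limit is met after two batches never opens the last five shards, which is the "execute only some shards then stop" scenario mentioned above.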

* [DOCS] Support for nested functions in ES|QL STATS...BY (#104788)

* Document nested expressions for stats

* More docs

* Apply suggestions from review

- count-distinct.asciidoc
  - Content restructured, moving the section about approximate counts to end of doc.

- count.asciidoc
  - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.

- percentile.asciidoc
  - Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.

- stats.asciidoc
  - Clarified the `STATS` command
  - Added a note indicating that individual `null` values are skipped during aggregation

* Comment out mentioning a buggy behavior

* Update sum with inline function example, update test file

* Fix typo

* Delete line

* Simplify wording

* Fix conflict fix typo

---------

Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* [ML] Passing input type through to cohere request (#104781)

* Pushing input type through to cohere request

* switching logic to allow request to always override

* Fixing failure

* Removing getModelId calls

* Addressing feedback

* Switching to enumset

* [Transform] Unmute 2 remaining continuous tests: HistogramGroupByIT and TermsGroupByIT (#104898)

* Adding ActionRequestLazyBuilder implementation of RequestBuilder (#104927)

This introduces a second implementation of RequestBuilder (#104778). As opposed
to ActionRequestBuilder, ActionRequestLazyBuilder does not create its request
until the request() method is called, and does not hold onto that request (so each
call to request() gets a new request instance).
This PR also updates BulkRequestBuilder to inherit from ActionRequestLazyBuilder
as an example of its use.
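
The difference between the two builder styles can be sketched with stand-in classes. Nothing below is the Elasticsearch API: `LazyBuilderSketch`, `Request`, `EagerBuilder`, and `LazyBuilder` are hypothetical and only mirror the behavior described above.

```java
// Hypothetical stand-ins; not the Elasticsearch classes.
final class LazyBuilderSketch {
    static final class Request {
        final String index;

        Request(String index) {
            this.index = index;
        }
    }

    // Eager: creates the request up front and returns the same instance every time.
    static final class EagerBuilder {
        private final Request request;

        EagerBuilder(String index) {
            this.request = new Request(index);
        }

        Request request() {
            return request;
        }
    }

    // Lazy: stores only the parameters and builds a fresh request on each call.
    static final class LazyBuilder {
        private String index;

        LazyBuilder setIndex(String index) {
            this.index = index;
            return this;
        }

        Request request() {
            return new Request(index);
        }
    }
}
```

Because the lazy variant does not hold onto a request, callers cannot assume that two `request()` calls return the same object, which is the test assumption removed in #104881.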

* Update versions to skip after backport to 8.12 (#104953)

* Update/Cleanup references to old tracing.apm.* legacy settings in favor of the telemetry.* settings (#104917)

* Exclude tests that do not work in a mixed cluster scenario (#104935)

* ES|QL: Improve type validation in aggs for UNSIGNED_LONG and better support for VERSION (#104911)

* [Connector API] Make update configuration action non-additive (#104615)

* Save allocating enum values array in two hot spots (#104952)

Our readEnum code instantiates/clones enum value arrays on read.
Normally, this doesn't matter much but the two spots adjusted here are
visibly hot during bulk indexing, causing GBs of allocations during e.g.
the http_logs indexing run.
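
For context, `values()` on a Java enum returns a new array on every call. A minimal sketch of the caching idea, assuming a hypothetical `OpType` enum and reader methods rather than the actual readEnum code:

```java
// Hypothetical illustration only; not the Elasticsearch stream-reading code.
final class EnumReadSketch {
    enum OpType { INDEX, CREATE, UPDATE, DELETE }

    // values() hands out a fresh copy each time, so keep one shared copy.
    private static final OpType[] OP_TYPES = OpType.values();

    // Allocates a new array for every ordinal that is decoded.
    static OpType readAllocating(int ordinal) {
        return OpType.values()[ordinal];
    }

    // Reuses the cached array; it must never be mutated or exposed to callers.
    static OpType readCached(int ordinal) {
        return OP_TYPES[ordinal];
    }
}
```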

* ESQL: Correct out-of-range filter pushdowns (#99961)

Fix pushed down filters for binary comparisons that compare a
byte/short/int/long with an out of range value, like
WHERE some_int_field < 1E300.
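
One way to think about such a comparison is that it can be decided without consulting the index at all. The sketch below is not the optimizer rule from this PR: `OutOfRangeFilterSketch`, `Folded`, and `lessThan` are hypothetical, and it only covers `int_field < literal`.

```java
// Hypothetical illustration only; not the ES|QL optimizer.
final class OutOfRangeFilterSketch {
    enum Folded { MATCH_ALL, MATCH_NONE, PUSH_DOWN }

    // Decides how `int_field < value` can be handled when `value` is a double literal.
    static Folded lessThan(double value) {
        if (Double.isNaN(value)) {
            return Folded.MATCH_NONE; // comparisons with NaN are never true
        }
        if (value > Integer.MAX_VALUE) {
            return Folded.MATCH_ALL; // every non-null int is below the literal
        }
        if (value <= Integer.MIN_VALUE) {
            return Folded.MATCH_NONE; // no int is below the literal
        }
        return Folded.PUSH_DOWN; // in range: safe to express as a range query
    }
}
```

For `WHERE some_int_field < 1E300`, `lessThan(1E300)` yields `MATCH_ALL`, so the literal never has to be narrowed to an `int`.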

* [DOCS] Dense vector element type should be float for OpenAI (#104966)

* Fix test assertions (#104963)

* Move functions that generate lucene geometries under a utility class (#104928)

We have functions that generate lucene geometries scattered in different places of the code. This commit moves 
everything under a utility class.

* fixing index versions

---------

Co-authored-by: Simon Cooper <simon.cooper@elastic.co>
Co-authored-by: Chris Hegarty <62058229+ChrisHegarty@users.noreply.github.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Navarone Feekery <13634519+navarone-feekery@users.noreply.github.com>
Co-authored-by: Ignacio Vera <ivera@apache.org>
Co-authored-by: Tim Rühsen <tim.ruehsen@gmx.de>
Co-authored-by: Pooya Salehi <pxsalehi@users.noreply.github.com>
Co-authored-by: Keith Massey <keith.massey@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: Moritz Mack <mmack@apache.org>
Co-authored-by: Costin Leau <costin@users.noreply.github.com>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Co-authored-by: Mark Vieira <portugee@gmail.com>
Co-authored-by: Jedr Blaszyk <jedrazb@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Joe Gallo <joe.gallo@elastic.co>
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: Luigi Dell'Aquila <luigi.dellaquila@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
1 parent e1b74aa commit 47ca7ae
Showing 289 changed files with 5,805 additions and 1,895 deletions.
12 changes: 6 additions & 6 deletions TRACING.md
@@ -23,18 +23,18 @@ You must supply configuration and credentials for the APM server (see below).
In your `elasticsearch.yml` add the following configuration:

```
tracing.apm.enabled: true
telemetry.tracing.enabled: true
telemetry.agent.server_url: https://<your-apm-server>:443
```

When using a secret token to authenticate with the APM server, you must add it to the Elasticsearch keystore under `tracing.apm.secret_token`. For example, execute:
When using a secret token to authenticate with the APM server, you must add it to the Elasticsearch keystore under `telemetry.secret_token`. For example, execute:

bin/elasticsearch-keystore add tracing.apm.secret_token
bin/elasticsearch-keystore add telemetry.secret_token

then enter the token when prompted. If you are using API keys, change the keystore key name to `tracing.apm.api_key`.
then enter the token when prompted. If you are using API keys, change the keystore key name to `telemetry.api_key`.

All APM settings live under `tracing.apm`. All settings related to the Java agent
go under `telemetry.agent`. Anything you set under there will be propagated to
All APM settings live under `telemetry`. Tracing related settings go under `telemetry.tracing` and settings
related to the Java agent go under `telemetry.agent`. Anything you set under there will be propagated to
the agent.

For agent settings that can be changed dynamically, you can use the cluster
@@ -201,10 +201,10 @@ public void beforeStart() {
try {
mockServer.start();
node.setting("telemetry.metrics.enabled", "true");
node.setting("tracing.apm.enabled", "true");
node.setting("tracing.apm.agent.transaction_sample_rate", "0.10");
node.setting("tracing.apm.agent.metrics_interval", "10s");
node.setting("tracing.apm.agent.server_url", "http://127.0.0.1:" + mockServer.getPort());
node.setting("telemetry.tracing.enabled", "true");
node.setting("telemetry.agent.transaction_sample_rate", "0.10");
node.setting("telemetry.agent.metrics_interval", "10s");
node.setting("telemetry.agent.server_url", "http://127.0.0.1:" + mockServer.getPort());
} catch (IOException e) {
logger.warn("Unable to start APM server", e);
}
@@ -213,9 +213,10 @@ public void beforeStart() {
// if metrics were not enabled explicitly for gradlew run we should disable them
else if (node.getSettingKeys().contains("telemetry.metrics.enabled") == false) { // metrics
node.setting("telemetry.metrics.enabled", "false");
} else if (node.getSettingKeys().contains("tracing.apm.enabled") == false) { // tracing
node.setting("tracing.apm.enable", "false");
}
} else if (node.getSettingKeys().contains("telemetry.tracing.enabled") == false
&& node.getSettingKeys().contains("tracing.apm.enabled") == false) { // tracing
node.setting("telemetry.tracing.enable", "false");
}

}
}
@@ -108,7 +108,6 @@ public void testExtractSecureSettings() {

public void testExtractSettings() throws UserException {
Function<String, Settings.Builder> buildSettings = (prefix) -> Settings.builder()
.put("tracing.apm.enabled", true)
.put(prefix + "server_url", "https://myurl:443")
.put(prefix + "service_node_name", "instance-0000000001");

@@ -158,7 +157,6 @@ public void testExtractSettings() throws UserException {
IllegalStateException.class,
() -> APMJvmOptions.extractApmSettings(
Settings.builder()
.put("tracing.apm.enabled", true)
.put("tracing.apm.agent.server_url", "https://myurl:443")
.put("telemetry.agent.server_url", "https://myurl-2:443")
.build()
5 changes: 5 additions & 0 deletions docs/changelog/104594.yaml
@@ -0,0 +1,5 @@
pr: 104594
summary: Support of `match` for the Query API Key API
area: Authentication
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/104625.yaml
@@ -0,0 +1,6 @@
pr: 104625
summary: "Add support for the `type` parameter, for sorting, to the Query API Key\
\ API"
area: Security
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/104753.yaml
@@ -0,0 +1,5 @@
pr: 104753
summary: Upgrade to Lucene 9.9.2
area: Search
type: upgrade
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/104832.yaml
@@ -0,0 +1,6 @@
pr: 104832
summary: Limit concurrent shards per node for ESQL
area: ES|QL
type: bug
issues:
- 103666
6 changes: 6 additions & 0 deletions docs/changelog/104891.yaml
@@ -0,0 +1,6 @@
pr: 104891
summary: "ESQL: Fix `SearchStats#count(String)` to count values not rows"
area: ES|QL
type: bug
issues:
- 104795
5 changes: 5 additions & 0 deletions docs/changelog/104904.yaml
@@ -0,0 +1,5 @@
pr: 104904
summary: Improve `CANNOT_REBALANCE_CAN_ALLOCATE` explanation
area: Allocation
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/104909.yaml
@@ -0,0 +1,5 @@
pr: 104909
summary: "[Connectors API] Relax strict response parsing for get/list operations"
area: Application
type: enhancement
issues: []
7 changes: 7 additions & 0 deletions docs/changelog/104911.yaml
@@ -0,0 +1,7 @@
pr: 104911
summary: "ES|QL: Improve type validation in aggs for UNSIGNED_LONG better support\
\ for VERSION"
area: ES|QL
type: bug
issues:
- 102961
5 changes: 5 additions & 0 deletions docs/changelog/104927.yaml
@@ -0,0 +1,5 @@
pr: 104927
summary: Adding `ActionRequestLazyBuilder` implementation of `RequestBuilder`
area: Ingest Node
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/99961.yaml
@@ -0,0 +1,6 @@
pr: 99961
summary: "ESQL: Correct out-of-range filter pushdowns"
area: ES|QL
type: bug
issues:
- 99960
19 changes: 17 additions & 2 deletions docs/reference/esql/functions/avg.asciidoc
@@ -10,7 +10,9 @@ AVG(expression)
----

`expression`::
Numeric expression. If `null`, the function returns `null`.
Numeric expression.
//If `null`, the function returns `null`.
// TODO: Remove comment when https://github.com/elastic/elasticsearch/issues/104900 is fixed.

*Description*

@@ -20,7 +22,7 @@ The average of a numeric expression.

The result is always a `double` no matter the input type.

*Example*
*Examples*

[source.merge.styled,esql]
----
@@ -30,3 +32,16 @@ include::{esql-specs}/stats.csv-spec[tag=avg]
|===
include::{esql-specs}/stats.csv-spec[tag=avg-result]
|===

The expression can use inline functions. For example, to calculate the average
over a multivalued column, first use `MV_AVG` to average the multiple values per
row, and use the result with the `AVG` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
|===
64 changes: 38 additions & 26 deletions docs/reference/esql/functions/count-distinct.asciidoc
@@ -6,13 +6,13 @@

[source,esql]
----
COUNT_DISTINCT(column[, precision_threshold])
COUNT_DISTINCT(expression[, precision_threshold])
----

*Parameters*

`column`::
Column for which to count the number of distinct values.
`expression`::
Expression that outputs the values on which to perform a distinct count.

`precision_threshold`::
Precision threshold. Refer to <<esql-agg-count-distinct-approximate>>. The
@@ -23,29 +23,6 @@ same effect as a threshold of 40000. The default value is 3000.

Returns the approximate number of distinct values.

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values as the required memory usage and the need to communicate those
per-shard sets between nodes would utilize too many resources of the cluster.

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values with some interesting
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The precision_threshold options allows to trade memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is 40000, thresholds above this number will have the
same effect as a threshold of 40000. The default value is `3000`.

*Supported types*

Can take any field type as input.
@@ -71,3 +48,38 @@ include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===

The expression can use inline functions. This example splits a string into
multiple values using the `SPLIT` function and counts the unique values:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result]
|===

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values as the required memory usage and the need to communicate those
per-shard sets between nodes would utilize too many resources of the cluster.

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values with some interesting
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The precision_threshold options allows to trade memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is 40000, thresholds above this number will have the
same effect as a threshold of 40000. The default value is `3000`.
20 changes: 16 additions & 4 deletions docs/reference/esql/functions/count.asciidoc
@@ -6,14 +6,14 @@

[source,esql]
----
COUNT([input])
COUNT([expression])
----

*Parameters*

`input`::
Column or literal for which to count the number of values. If omitted, returns a
count all (the number of rows).
`expression`::
Expression that outputs values to be counted.
If omitted, equivalent to `COUNT(*)` (the number of rows).

*Description*

@@ -44,3 +44,15 @@ include::{esql-specs}/docs.csv-spec[tag=countAll]
|===
include::{esql-specs}/docs.csv-spec[tag=countAll-result]
|===

The expression can use inline functions. This example splits a string into
multiple values using the `SPLIT` function and counts the values:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression-result]
|===
21 changes: 17 additions & 4 deletions docs/reference/esql/functions/max.asciidoc
@@ -6,17 +6,17 @@

[source,esql]
----
MAX(column)
MAX(expression)
----

*Parameters*

`column`::
Column from which to return the maximum value.
`expression`::
Expression from which to return the maximum value.

*Description*

Returns the maximum value of a numeric column.
Returns the maximum value of a numeric expression.

*Example*

@@ -28,3 +28,16 @@ include::{esql-specs}/stats.csv-spec[tag=max]
|===
include::{esql-specs}/stats.csv-spec[tag=max-result]
|===

The expression can use inline functions. For example, to calculate the maximum
over an average of a multivalued column, use `MV_AVG` to first average the
multiple values per row, and use the result with the `MAX` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression-result]
|===
20 changes: 17 additions & 3 deletions docs/reference/esql/functions/median-absolute-deviation.asciidoc
@@ -6,13 +6,13 @@

[source,esql]
----
MEDIAN_ABSOLUTE_DEVIATION(column)
MEDIAN_ABSOLUTE_DEVIATION(expression)
----

*Parameters*

`column`::
Column from which to return the median absolute deviation.
`expression`::
Expression from which to return the median absolute deviation.

*Description*

@@ -44,3 +44,17 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation-result]
|===

The expression can use inline functions. For example, to calculate the
median absolute deviation of the maximum values of a multivalued column, first
use `MV_MAX` to get the maximum value per row, and use the result with the
`MEDIAN_ABSOLUTE_DEVIATION` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression-result]
|===