Add docs for cross cluster search in ES|QL (#105934) (#106093)
This change adds documentation for cross-cluster search in ES|QL.

Relates #102954
Closes #105529
dnhatn committed Mar 7, 2024
1 parent 71a6b5e commit cdc59db
Showing 6 changed files with 254 additions and 2 deletions.
224 changes: 224 additions & 0 deletions docs/reference/esql/esql-across-clusters.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
[[esql-cross-clusters]]
=== Using {esql} across clusters

++++
<titleabbrev>Using {esql} across clusters</titleabbrev>
++++

[partintro]

preview::["{ccs-cap} for {esql} is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."]

With {esql}, you can execute a single query across multiple clusters.

==== Prerequisites

include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-prereqs]

include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-gateway-seed-nodes]

include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-proxy-mode]

[discrete]
[[ccq-remote-cluster-setup]]
==== Remote cluster setup
include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-remote-cluster-setup]

<1> Since `skip_unavailable` was not set on `cluster_three`, it uses
the default of `false`. See the <<ccq-skip-unavailable-clusters>>
section for details.

[discrete]
[[ccq-from]]
==== Query across multiple clusters

In the `FROM` command, specify data streams and indices on remote clusters
using the format `<remote_cluster_name>:<target>`. For instance, the following
{esql} request queries the `my-index-000001` index on a single remote cluster
named `cluster_one`:

[source,esql]
----
FROM cluster_one:my-index-000001
| LIMIT 10
----

Similarly, this {esql} request queries the `my-index-000001` index from
three clusters:

* The local ("querying") cluster
* Two remote clusters, `cluster_one` and `cluster_two`

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
| LIMIT 10
----

Likewise, this {esql} request queries the `my-index-000001` index from all
remote clusters (`cluster_one`, `cluster_two`, and `cluster_three`):

[source,esql]
----
FROM *:my-index-000001
| LIMIT 10
----
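
As a rough illustration of the `<remote_cluster_name>:<target>` format, the
following Python sketch assembles a `FROM` clause from a list of cluster
aliases. The helper name `build_from_clause` and its parameters are
hypothetical and are not part of any Elastic client. The helper only
concatenates strings and does not contact a cluster.

```python
from typing import Iterable


def build_from_clause(target: str,
                      remote_clusters: Iterable[str] = (),
                      include_local: bool = False) -> str:
    """Build an ES|QL FROM clause (hypothetical helper, string building only)."""
    parts = []
    if include_local:
        # A bare target refers to the local ("querying") cluster.
        parts.append(target)
    for alias in remote_clusters:
        # Remote targets use the <remote_cluster_name>:<target> format.
        parts.append(f"{alias}:{target}")
    if not parts:
        raise ValueError("at least one local or remote target is required")
    return "FROM " + ",".join(parts)


query = build_from_clause("my-index-000001",
                          remote_clusters=["cluster_one", "cluster_two"],
                          include_local=True) + "\n| LIMIT 10"
print(query)
```

Passing `["*"]` as the alias list yields the wildcard form `FROM *:my-index-000001`.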

[discrete]
[[ccq-enrich]]
==== Enrich across clusters

Enrich in {esql} across clusters operates similarly to <<esql-enrich,local enrich>>.
If the enrich policy and its enrich indices are consistent across all clusters, write
the enrich command as you would without remote clusters. In this default mode,
{esql} can execute the enrich command on either the querying cluster or the fulfilling
clusters, aiming to minimize computation or inter-cluster data transfer. For {esql} to
produce a consistent query result, the policy must exist with consistent data on both
the querying cluster and the fulfilling clusters.

In the following example, the enrich with the `hosts` policy can be executed on
either the querying cluster or the remote cluster `cluster_one`.

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001
| ENRICH hosts ON ip
| LIMIT 10
----

Even when an {esql} query targets only remote clusters, the enrich can still
run on the querying cluster. The following query therefore requires the `hosts`
enrich policy to exist on the querying cluster as well.

[source,esql]
----
FROM cluster_one:my-index-000001,cluster_two:my-index-000001
| LIMIT 10
| ENRICH hosts ON ip
----

[discrete]
[[esql-enrich-coordinator]]
==== Enrich with coordinator mode

{esql} provides the enrich `_coordinator` mode to force {esql} to execute the enrich
command on the querying cluster. Use this mode when the enrich policy is not
available on the remote clusters or when maintaining consistency of enrich indices
across clusters is challenging.

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001
| ENRICH _coordinator:hosts ON ip
| SORT host_name
| LIMIT 10
----

[discrete]
[IMPORTANT]
====
Enrich with the `_coordinator` mode usually increases inter-cluster data transfer and
workload on the querying cluster.
====

[discrete]
[[esql-enrich-remote]]
==== Enrich with remote mode

{esql} also provides the enrich `_remote` mode to force {esql} to execute the enrich
command independently on each fulfilling cluster where the target indices reside.
This mode is useful for managing different enrich data on each cluster, for example
when each region's cluster holds detailed information about its own hosts and the
target (main) indices contain log events from those hosts.

In the following example, the `hosts` enrich policy must exist on all
fulfilling clusters: the querying cluster (because local indices are included)
and the remote clusters `cluster_one` and `cluster_two`.

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
| ENRICH _remote:hosts ON ip
| SORT host_name
| LIMIT 10
----

A `_remote` enrich cannot be executed after a <<esql-stats-by,stats>>
command. The following example would result in an error:

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
| STATS COUNT(*) BY ip
| ENRICH _remote:hosts ON ip
| SORT host_name
| LIMIT 10
----

[discrete]
[[esql-multi-enrich]]
==== Multiple enrich commands

You can include multiple enrich commands in the same query with different
modes. {esql} will attempt to execute them accordingly. For example, this
query performs two enriches, first with the `hosts` policy on any cluster
and then with the `vendors` policy on the querying cluster.

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
| ENRICH hosts ON ip
| ENRICH _coordinator:vendors ON os
| LIMIT 10
----

A `_remote` enrich command can't be executed after a `_coordinator` enrich
command. The following example would result in an error.

[source,esql]
----
FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
| ENRICH _coordinator:hosts ON ip
| ENRICH _remote:vendors ON os
| LIMIT 10
----
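
The two ordering rules described in this document (no `_remote` enrich after a
`STATS` command, and no `_remote` enrich after a `_coordinator` enrich) can be
sketched as a client-side check. This is only an illustration, assuming a
naive split on `|`; the function `check_enrich_order` is hypothetical and does
not reflect how {es} itself validates queries.

```python
import re
from typing import List

# Matches "ENRICH [_mode:]policy" at the start of a pipe-separated command.
_ENRICH = re.compile(r"^ENRICH\s+(?:(_\w+):)?[\w-]+", re.IGNORECASE)


def check_enrich_order(query: str) -> List[str]:
    """Return the ordering problems found in an ES|QL query string."""
    problems = []
    seen_stats = False
    seen_coordinator = False
    # Naive split: assumes no "|" appears inside string literals.
    for command in (part.strip() for part in query.split("|")):
        if command.upper().startswith("STATS"):
            seen_stats = True
            continue
        match = _ENRICH.match(command)
        if not match:
            continue
        mode = (match.group(1) or "").lower()
        if mode == "_remote" and seen_stats:
            problems.append("_remote enrich after STATS")
        if mode == "_remote" and seen_coordinator:
            problems.append("_remote enrich after _coordinator enrich")
        if mode == "_coordinator":
            seen_coordinator = True
    return problems
```

A check like this could run before a query is submitted, but the server's own error response remains authoritative.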

[discrete]
[[ccq-exclude]]
==== Excluding clusters or indices from an {esql} query

To exclude an entire cluster, prefix the cluster alias with a minus sign in
the `FROM` command, for example `-my_cluster:*`:

[source,esql]
----
FROM my-index-000001,cluster*:my-index-000001,-cluster_three:*
| LIMIT 10
----

To exclude a specific remote index, prefix the index with a minus sign in
the `FROM` command, such as `my_cluster:-my_index`:

[source,esql]
----
FROM my-index-000001,cluster*:my-index-*,cluster_three:-my-index-000001
| LIMIT 10
----
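
The two exclusion forms differ only in where the minus sign goes. The
following Python sketch shows that difference with two hypothetical helpers,
`exclude_cluster` and `exclude_index`, which only build strings and are not
part of any Elastic client.

```python
def exclude_cluster(cluster_alias: str) -> str:
    # Excludes every target on the cluster, e.g. "-cluster_three:*".
    return f"-{cluster_alias}:*"


def exclude_index(cluster_alias: str, index: str) -> str:
    # Excludes one index on the cluster, e.g. "cluster_three:-my-index-000001".
    return f"{cluster_alias}:-{index}"


targets = ["my-index-000001", "cluster*:my-index-*",
           exclude_index("cluster_three", "my-index-000001")]
query = "FROM " + ",".join(targets) + "\n| LIMIT 10"
print(query)
```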

[discrete]
[[ccq-skip-unavailable-clusters]]
==== Optional remote clusters

{ccs-cap} for {esql} currently does not respect the `skip_unavailable`
setting. As a result, if a remote cluster specified in the request is
unavailable or fails, {ccs} for {esql} queries fail regardless of the setting.

We are actively working to align the behavior of {ccs} for {esql} with other
{ccs} APIs. This includes providing detailed execution information for each cluster
in the response, such as execution time, selected target indices, and shards.

[discrete]
[[ccq-during-upgrade]]
==== Query across clusters during an upgrade

include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-during-upgrade]
4 changes: 4 additions & 0 deletions docs/reference/esql/esql-using.asciidoc
@@ -12,10 +12,14 @@ and set up alerts.
Using {esql} in {elastic-sec} to investigate events in Timeline, create
detection rules, and build {esql} queries using Elastic AI Assistant.

<<esql-cross-clusters>>::
Using {esql} to query across multiple clusters.

<<esql-task-management>>::
Using the <<tasks,task management API>> to list and cancel {esql} queries.

include::esql-rest.asciidoc[]
include::esql-kibana.asciidoc[]
include::esql-security-solution.asciidoc[]
include::esql-across-clusters.asciidoc[]
include::task-management.asciidoc[]
2 changes: 1 addition & 1 deletion docs/reference/esql/index.asciidoc
@@ -56,7 +56,7 @@ GROK>> and <<esql-enrich-data,data enrichment with ENRICH>>.

<<esql-using>>::
An overview of using the <<esql-rest>>, <<esql-kibana>>,
<<esql-elastic-security>>, and <<esql-task-management>>.
<<esql-elastic-security>>, <<esql-cross-clusters>>, and <<esql-task-management>>.

<<esql-limitations>>::
The current limitations of {esql}.
4 changes: 4 additions & 0 deletions docs/reference/esql/processing-commands/enrich.asciidoc
@@ -15,6 +15,10 @@ ENRICH policy [ON match_field] [WITH [new_name1 = ]field1, [new_name2 = ]field2,
The name of the enrich policy. You need to <<esql-set-up-enrich-policy,create>>
and <<esql-execute-enrich-policy,execute>> the enrich policy first.

`mode`::
The mode of the enrich command in cross-cluster {esql}.
See <<ccq-enrich, enrich across clusters>>.

`match_field`::
The match field. `ENRICH` uses its value to look for records in the enrich
index. If not specified, the match will be performed on the column with the same
10 changes: 10 additions & 0 deletions docs/reference/esql/source-commands/from.asciidoc
@@ -66,6 +66,16 @@ or aliases:
FROM employees-00001,other-employees-*
----

Use the format `<remote_cluster_name>:<target>` to query data streams and indices
on remote clusters:

[source,esql]
----
FROM cluster_one:employees-00001,cluster_two:other-employees-*
----

See <<esql-cross-clusters, using {esql} across clusters>>.

Use the optional `METADATA` directive to enable <<esql-metadata-fields,metadata fields>>:

[source,esql]
Original file line number Diff line number Diff line change
@@ -22,10 +22,11 @@ The following APIs support {ccs}:
* experimental:[] <<eql-search-api,EQL search>>
* experimental:[] <<sql-search-api,SQL search>>
* experimental:[] <<search-vector-tile-api,Vector tile search>>
* experimental:[] <<esql,ES|QL>>

[discrete]
[[ccs-prereqs]]
=== Prerequisites
// tag::ccs-prereqs[]

* {ccs-cap} requires remote clusters. To set up remote clusters on {ess},
see link:{cloud}/ec-enable-ccs.html[configure remote clusters on {ess}]. If you
@@ -39,15 +40,19 @@ To ensure your remote cluster configuration supports {ccs}, see

* The local coordinating node must have the
<<remote-node,`remote_cluster_client`>> node role.
// end::ccs-prereqs[]

[[ccs-gateway-seed-nodes]]
// tag::ccs-gateway-seed-nodes[]
* If you use <<sniff-mode,sniff mode>>, the local coordinating node
must be able to connect to seed and gateway nodes on the remote cluster.
+
We recommend using gateway nodes capable of serving as coordinating nodes.
The seed nodes can be a subset of these gateway nodes.
// end::ccs-gateway-seed-nodes[]

[[ccs-proxy-mode]]
// tag::ccs-proxy-mode[]
* If you use <<proxy-mode,proxy mode>>, the local coordinating node must be able
to connect to the configured `proxy_address`. The proxy at this address must be
able to route connections to gateway and coordinating nodes on the remote
@@ -56,6 +61,7 @@ cluster.
* {ccs-cap} requires different security privileges on the local cluster and
remote cluster. See <<remote-clusters-privileges-ccs>> and
<<remote-clusters>>.
// end::ccs-proxy-mode[]

[discrete]
[[ccs-example]]
@@ -64,6 +70,7 @@ remote cluster. See <<remote-clusters-privileges-ccs>> and
[discrete]
[[ccs-remote-cluster-setup]]
==== Remote cluster setup
// tag::ccs-remote-cluster-setup[]

The following <<cluster-update-settings,cluster update settings>> API request
adds three remote clusters: `cluster_one`, `cluster_two`, and `cluster_three`.
@@ -99,6 +106,7 @@ PUT _cluster/settings
--------------------------------
// TEST[setup:host]
// TEST[s/35.238.149.\d+:930\d+/\${transport_host}/]
// end::ccs-remote-cluster-setup[]

<1> Since `skip_unavailable` was not set on `cluster_three`, it uses
the default of `false`. See the <<skip-unavailable-clusters>>
@@ -1393,6 +1401,7 @@ cluster as the local cluster when running a {ccs}.
[[ccs-during-upgrade]]
==== {ccs-cap} during an upgrade

// tag::ccs-during-upgrade[]
You can still search a remote cluster while performing a
rolling upgrade on the local cluster. However, the local
coordinating node's "upgrade from" and "upgrade to" version must be compatible
@@ -1403,3 +1412,4 @@ duration of an upgrade is not supported.

For more information about upgrades, see
{stack-ref}/upgrading-elasticsearch.html[Upgrading {es}].
// end::ccs-during-upgrade[]
