Add docs for cross cluster search in ES|QL (#105934) (#106093)

This change adds documentation for cross-cluster search in ES|QL. Relates #102954. Closes #105529.
elastic · Mar 7, 2024 · cdc59db
1 parent 71a6b5e

Showing 6 changed files with 254 additions and 2 deletions.
diff --git a/docs/reference/esql/esql-across-clusters.asciidoc b/docs/reference/esql/esql-across-clusters.asciidoc
@@ -0,0 +1,224 @@
+[[esql-cross-clusters]]
+=== Using {esql} across clusters
+
+++++
+<titleabbrev>Using {esql} across clusters</titleabbrev>
+++++
+
+[partintro]
+
+preview::["{ccs-cap} for {esql} is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."]
+
+With {esql}, you can execute a single query across multiple clusters.
+
+==== Prerequisites
+
+include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-prereqs]
+
+include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-gateway-seed-nodes]
+
+include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-proxy-mode]
+
+[discrete]
+[[ccq-remote-cluster-setup]]
+==== Remote cluster setup
+include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-remote-cluster-setup]
+
+<1> Since `skip_unavailable` was not set on `cluster_three`, it uses
+the default of `false`. See the <<ccq-skip-unavailable-clusters>>
+section for details.
+
+[discrete]
+[[ccq-from]]
+==== Query across multiple clusters
+
+In the `FROM` command, specify data streams and indices on remote clusters
+using the format `<remote_cluster_name>:<target>`. For instance, the following
+{esql} request queries the `my-index-000001` index on a single remote cluster
+named `cluster_one`:
+
+[source,esql]
+----
+FROM cluster_one:my-index-000001
+| LIMIT 10
+----
+
+Similarly, this {esql} request queries the `my-index-000001` index from
+three clusters:
+
+* The local ("querying") cluster
+* Two remote clusters, `cluster_one` and `cluster_two`
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
+| LIMIT 10
+----
+
+Likewise, this {esql} request queries the `my-index-000001` index from all
+remote clusters (`cluster_one`, `cluster_two`, and `cluster_three`):
+
+[source,esql]
+----
+FROM *:my-index-000001
+| LIMIT 10
+----
+
+[discrete]
+[[ccq-enrich]]
+==== Enrich across clusters
+
+Enrich in {esql} across clusters operates similarly to <<esql-enrich,local enrich>>.
+If the enrich policy and its enrich indices are consistent across all clusters, simply
+write the enrich command as you would without remote clusters. In this default mode,
+{esql} can execute the enrich command on either the querying cluster or the fulfilling
+clusters, aiming to minimize computation or inter-cluster data transfer. Ensuring that
+the policy exists with consistent data on both the querying cluster and the fulfilling
+clusters is critical for {esql} to produce a consistent query result.
+
+In the following example, the enrich with `hosts` policy can be executed on
+either the querying cluster or the remote cluster `cluster_one`.
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001
+| ENRICH hosts ON ip
+| LIMIT 10
+----
+
+When an {esql} query targets only remote clusters, the enrich can still be executed on
+the querying cluster. The following query therefore requires the `hosts` enrich
+policy to exist on the querying cluster as well.
+
+[source,esql]
+----
+FROM cluster_one:my-index-000001,cluster_two:my-index-000001
+| LIMIT 10
+| ENRICH hosts ON ip
+----
+
+[discrete]
+[[esql-enrich-coordinator]]
+==== Enrich with coordinator mode
+
+{esql} provides the enrich `_coordinator` mode to force {esql} to execute the enrich
+command on the querying cluster. Use this mode when the enrich policy is
+not available on the remote clusters or when maintaining consistency of enrich indices
+across clusters is challenging.
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001
+| ENRICH _coordinator:hosts ON ip
+| SORT host_name
+| LIMIT 10
+----
+
+[discrete]
+[IMPORTANT]
+====
+Enrich with the `_coordinator` mode usually increases inter-cluster data transfer and
+workload on the querying cluster.
+====
+
+[discrete]
+[[esql-enrich-remote]]
+==== Enrich with remote mode
+
+{esql} also provides the enrich `_remote` mode to force {esql} to execute the enrich
+command independently on each fulfilling cluster where the target indices reside.
+This mode is useful for managing different enrich data on each cluster, for example
+detailed host information for each region, where the target (main) indices contain
+log events from those hosts.
+
+In the following example, the `hosts` enrich policy must exist on all
+fulfilling clusters: the querying cluster (because local indices are included)
+and the remote clusters `cluster_one` and `cluster_two`.
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
+| ENRICH _remote:hosts ON ip
+| SORT host_name
+| LIMIT 10
+----
+
+A `_remote` enrich cannot be executed after a <<esql-stats-by,stats>>
+command. The following example would result in an error:
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
+| STATS COUNT(*) BY ip
+| ENRICH _remote:hosts ON ip
+| SORT host_name
+| LIMIT 10
+----
+
+[discrete]
+[[esql-multi-enrich]]
+==== Multiple enrich commands
+
+You can include multiple enrich commands in the same query with different
+modes. {esql} will attempt to execute them accordingly. For example, this
+query performs two enriches, first with the `hosts` policy on any cluster
+and then with the `vendors` policy on the querying cluster.
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
+| ENRICH hosts ON ip
+| ENRICH _coordinator:vendors ON os
+| LIMIT 10
+----
+
+A `_remote` enrich command cannot be executed after a `_coordinator` enrich
+command. The following example would result in an error:
+
+[source,esql]
+----
+FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index-000001
+| ENRICH _coordinator:hosts ON ip
+| ENRICH _remote:vendors ON os
+| LIMIT 10
+----
+
+[discrete]
+[[ccq-exclude]]
+==== Excluding clusters or indices from an {esql} query
+
+To exclude an entire cluster, prefix the cluster alias with a minus sign in
+the `FROM` command, for example `-my_cluster:*`:
+
+[source,esql]
+----
+FROM my-index-000001,cluster*:my-index-000001,-cluster_three:*
+| LIMIT 10
+----
+
+To exclude a specific remote index, prefix the index with a minus sign in
+the `FROM` command, such as `my_cluster:-my_index`:
+
+[source,esql]
+----
+FROM my-index-000001,cluster*:my-index-*,cluster_three:-my-index-000001
+| LIMIT 10
+----
+
+[discrete]
+[[ccq-skip-unavailable-clusters]]
+==== Optional remote clusters
+
+{ccs-cap} for {esql} currently does not respect the `skip_unavailable`
+setting. As a result, if a remote cluster specified in the request is
+unavailable or has failed, {ccs} for {esql} queries fail regardless of the setting.
+
+We are actively working to align the behavior of {ccs} for {esql} with other
+{ccs} APIs. This includes providing detailed execution information for each cluster
+in the response, such as execution time, selected target indices, and shards.
+
+[discrete]
+[[ccq-during-upgrade]]
+==== Query across clusters during an upgrade
+
+include::{es-repo-dir}/search/search-your-data/search-across-clusters.asciidoc[tag=ccs-during-upgrade]
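
Reviewer note (not part of the commit): the `FROM` target syntax documented in this file can be illustrated with a small client-side sketch. It only assembles the query text, following the `<remote_cluster_name>:<target>` format and the `-<cluster>:*` exclusion form described above. The helper names `remote_target` and `build_from` are hypothetical and do not correspond to any Elasticsearch client API.

```python
def remote_target(cluster: str, target: str) -> str:
    """Format one remote target as <remote_cluster_name>:<target>."""
    return f"{cluster}:{target}"


def build_from(targets, excluded_clusters=(), limit=None) -> str:
    """Assemble an ES|QL FROM command as plain text (hypothetical helper).

    targets: local or remote targets, already formatted.
    excluded_clusters: cluster aliases to exclude entirely.
    limit: optional row limit appended as a LIMIT command.
    """
    parts = list(targets)
    # An entire cluster is excluded by prefixing its alias with a minus sign.
    parts += [f"-{cluster}:*" for cluster in excluded_clusters]
    query = "FROM " + ",".join(parts)
    if limit is not None:
        query += f"\n| LIMIT {limit}"
    return query


if __name__ == "__main__":
    print(
        build_from(
            ["my-index-000001", remote_target("cluster*", "my-index-000001")],
            excluded_clusters=["cluster_three"],
            limit=10,
        )
    )
```

With these arguments the sketch should reproduce the cluster-exclusion query shown in the "Excluding clusters or indices" section. It does not validate cluster or index names.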
diff --git a/docs/reference/esql/esql-using.asciidoc b/docs/reference/esql/esql-using.asciidoc
@@ -12,10 +12,14 @@ and set up alerts.
 Using {esql} in {elastic-sec} to investigate events in Timeline, create
 detection rules, and build {esql} queries using Elastic AI Assistant.
 
+<<esql-cross-clusters>>::
+Using {esql} to query across multiple clusters.
+
 <<esql-task-management>>::
 Using the <<tasks,task management API>> to list and cancel {esql} queries.
 
 include::esql-rest.asciidoc[]
 include::esql-kibana.asciidoc[]
 include::esql-security-solution.asciidoc[]
+include::esql-across-clusters.asciidoc[]
 include::task-management.asciidoc[]
diff --git a/docs/reference/esql/index.asciidoc b/docs/reference/esql/index.asciidoc
@@ -56,7 +56,7 @@ GROK>> and <<esql-enrich-data,data enrichment with ENRICH>>.
 
 <<esql-using>>::
 An overview of using the <<esql-rest>>, <<esql-kibana>>,
-<<esql-elastic-security>>, and <<esql-task-management>>.
+<<esql-elastic-security>>, <<esql-cross-clusters>>, and <<esql-task-management>>.
 
 <<esql-limitations>>::
 The current limitations of {esql}.

diff --git a/docs/reference/esql/processing-commands/enrich.asciidoc b/docs/reference/esql/processing-commands/enrich.asciidoc
@@ -15,6 +15,10 @@ ENRICH policy [ON match_field] [WITH [new_name1 = ]field1, [new_name2 = ]field2,
 The name of the enrich policy. You need to <<esql-set-up-enrich-policy,create>>
 and <<esql-execute-enrich-policy,execute>> the enrich policy first.
 
+`mode`::
+The mode of the enrich command in cross cluster {esql}.
+See <<ccq-enrich, enrich across clusters>>.
+
 `match_field`::
 The match field. `ENRICH` uses its value to look for records in the enrich
 index. If not specified, the match will be performed on the column with the same
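
Reviewer note (not part of the commit): the ordering rules that the new cross-cluster page states for the `mode` prefix (a `_remote` enrich cannot follow a `STATS` command or a `_coordinator` enrich) can be sketched as a client-side check. This is an illustration of the documented rules only, not how {esql} validates queries; the function name `check_enrich_order` and the `(command, mode)` tuple representation are assumptions made for this example.

```python
def check_enrich_order(commands):
    """Return a list of rule violations; an empty list means none were found.

    commands: the query's processing commands in order, as (name, mode)
    tuples. mode is "_remote", "_coordinator", or None for the default
    mode and for non-ENRICH commands. (Hypothetical representation.)
    """
    problems = []
    seen_stats = False
    seen_coordinator = False
    for position, (name, mode) in enumerate(commands, start=1):
        name = name.upper()
        if name == "STATS":
            seen_stats = True
        elif name == "ENRICH":
            if mode == "_remote" and seen_stats:
                problems.append(f"command {position}: _remote enrich after STATS")
            if mode == "_remote" and seen_coordinator:
                problems.append(
                    f"command {position}: _remote enrich after _coordinator enrich"
                )
            if mode == "_coordinator":
                seen_coordinator = True
    return problems


if __name__ == "__main__":
    # Mirrors the documented failing example: STATS followed by a _remote enrich.
    print(check_enrich_order([("STATS", None), ("ENRICH", "_remote")]))
```

The check covers only the two restrictions named in the documentation; other restrictions may exist and are not modeled here.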

diff --git a/docs/reference/esql/source-commands/from.asciidoc b/docs/reference/esql/source-commands/from.asciidoc
@@ -66,6 +66,16 @@ or aliases:
 FROM employees-00001,other-employees-*
 ----
 
+Use the format `<remote_cluster_name>:<target>` to query data streams and indices
+on remote clusters:
+
+[source,esql]
+----
+FROM cluster_one:employees-00001,cluster_two:other-employees-*
+----
+
+See <<esql-cross-clusters, using {esql} across clusters>>.
+
 Use the optional `METADATA` directive to enable <<esql-metadata-fields,metadata fields>>:
 
 [source,esql]

diff --git a/docs/reference/search/search-your-data/search-across-clusters.asciidoc b/docs/reference/search/search-your-data/search-across-clusters.asciidoc
@@ -22,10 +22,11 @@ The following APIs support {ccs}:
 * experimental:[] <<eql-search-api,EQL search>>
 * experimental:[] <<sql-search-api,SQL search>>
 * experimental:[] <<search-vector-tile-api,Vector tile search>>
+* experimental:[] <<esql,ES|QL>>
 
 [discrete]
-[[ccs-prereqs]]
 === Prerequisites
+// tag::ccs-prereqs[]
 
 * {ccs-cap} requires remote clusters. To set up remote clusters on {ess},
 see link:{cloud}/ec-enable-ccs.html[configure remote clusters on {ess}]. If you
@@ -39,15 +40,19 @@ To ensure your remote cluster configuration supports {ccs}, see
 
 * The local coordinating node must have the
 <<remote-node,`remote_cluster_client`>> node role.
+// end::ccs-prereqs[]
 
 [[ccs-gateway-seed-nodes]]
+// tag::ccs-gateway-seed-nodes[]
 * If you use <<sniff-mode,sniff mode>>, the local coordinating node
 must be able to connect to seed and gateway nodes on the remote cluster.
 +
 We recommend using gateway nodes capable of serving as coordinating nodes.
 The seed nodes can be a subset of these gateway nodes.
+// end::ccs-gateway-seed-nodes[]
 
 [[ccs-proxy-mode]]
+// tag::ccs-proxy-mode[]
 * If you use <<proxy-mode,proxy mode>>, the local coordinating node must be able
 to connect to the configured `proxy_address`. The proxy at this address must be
 able to route connections to gateway and coordinating nodes on the remote
@@ -56,6 +61,7 @@ cluster.
 * {ccs-cap} requires different security privileges on the local cluster and
 remote cluster. See <<remote-clusters-privileges-ccs>> and
 <<remote-clusters>>.
+// end::ccs-proxy-mode[]
 
 [discrete]
 [[ccs-example]]
@@ -64,6 +70,7 @@ remote cluster. See <<remote-clusters-privileges-ccs>> and
 [discrete]
 [[ccs-remote-cluster-setup]]
 ==== Remote cluster setup
+// tag::ccs-remote-cluster-setup[]
 
 The following <<cluster-update-settings,cluster update settings>> API request
 adds three remote clusters: `cluster_one`, `cluster_two`, and `cluster_three`.
@@ -99,6 +106,7 @@ PUT _cluster/settings
 --------------------------------
 // TEST[setup:host]
 // TEST[s/35.238.149.\d+:930\d+/\${transport_host}/]
+// end::ccs-remote-cluster-setup[]
 
 <1> Since `skip_unavailable` was not set on `cluster_three`, it uses
 the default of `false`. See the <<skip-unavailable-clusters>>
@@ -1393,6 +1401,7 @@ cluster as the local cluster when running a {ccs}.
 [[ccs-during-upgrade]]
 ==== {ccs-cap} during an upgrade
 
+// tag::ccs-during-upgrade[]
 You can still search a remote cluster while performing a
 rolling upgrade on the local cluster. However, the local
 coordinating node's "upgrade from" and "upgrade to" version must be compatible
@@ -1403,3 +1412,4 @@ duration of an upgrade is not supported.
 
 For more information about upgrades, see
 {stack-ref}/upgrading-elasticsearch.html[Upgrading {es}].
+// end::ccs-during-upgrade[]