From e7dfb1c9e168e560b787026aff07f98244f937e2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Wed, 12 Feb 2025 10:21:47 +0100 Subject: [PATCH] [E&A] Refines aggregations. --- explore-analyze/aggregations.md | 31 +++------ ...-data-with-aggregations-using-query-dsl.md | 65 ++++--------------- 2 files changed, 21 insertions(+), 75 deletions(-) diff --git a/explore-analyze/aggregations.md b/explore-analyze/aggregations.md index 92d7dbb64a..fd03ee9362 100644 --- a/explore-analyze/aggregations.md +++ b/explore-analyze/aggregations.md @@ -18,8 +18,7 @@ An aggregation summarizes your data as metrics, statistics, or other analytics. * [Bucket](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html) aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria. * [Pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html) aggregations that take input from other aggregations instead of documents or fields. - -## Run an aggregation [run-an-agg] +## Run an aggregation [run-an-agg] You can run aggregations as part of a [search](../solutions/search/querying-for-search.md) by specifying the [search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html)'s `aggs` parameter. The following search runs a [terms aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) on `my-field`: @@ -68,9 +67,7 @@ Aggregation results are in the response’s `aggregations` object: 1. Results for the `my-agg-name` aggregation. 
- - -## Change an aggregation’s scope [change-agg-scope] +## Change an aggregation’s scope [change-agg-scope] Use the `query` parameter to limit the documents on which an aggregation runs: @@ -95,8 +92,7 @@ GET /my-index-000001/_search } ``` - -## Return only aggregation results [return-only-agg-results] +## Return only aggregation results [return-only-agg-results] By default, searches containing an aggregation return both search hits and aggregation results. To return only aggregation results, set `size` to `0`: @@ -114,7 +110,6 @@ GET /my-index-000001/_search } ``` - ## Run multiple aggregations [run-multiple-aggs] You can specify multiple aggregations in the same request: @@ -137,8 +132,7 @@ GET /my-index-000001/_search } ``` - -## Run sub-aggregations [run-sub-aggs] +## Run sub-aggregations [run-sub-aggs] Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an [avg](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html) sub-aggregation calculates an average value for each bucket of documents. There is no level or depth limit for nesting sub-aggregations. @@ -188,8 +182,6 @@ The response nests sub-aggregation results under their parent aggregation: 1. Results for the parent aggregation, `my-agg-name`. 2. Results for `my-agg-name`'s sub-aggregation, `my-sub-agg-name`. - - ## Add custom metadata [add-metadata-to-an-agg] Use the `meta` object to associate custom metadata with an aggregation: @@ -228,8 +220,7 @@ The response returns the `meta` object in place: } ``` - -## Return the aggregation type [return-agg-type] +## Return the aggregation type [return-agg-type] By default, aggregation results include the aggregation’s name but not its type. To return the aggregation type, use the `typed_keys` query parameter. @@ -249,11 +240,10 @@ GET /my-index-000001/_search?typed_keys The response returns the aggregation type as a prefix to the aggregation’s name. 
-::::{important} +::::{important} Some aggregations return a different aggregation type from the type in the request. For example, the terms, [significant terms](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html), and [percentiles](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html) aggregations return different aggregation types depending on the data type of the aggregated field. :::: - ```console-result { ... @@ -267,8 +257,6 @@ Some aggregations return a different aggregation type from the type in the reque 1. The aggregation type, `histogram`, followed by a `#` separator and the aggregation’s name, `my-agg-name`. - - ## Use scripts in an aggregation [use-scripts-in-an-agg] When a field doesn’t exactly match the aggregation you need, you should aggregate on a [runtime field](../manage-data/data-store/mapping/runtime-fields.md): @@ -295,15 +283,12 @@ GET /my-index-000001/_search?size=0 Scripts calculate field values dynamically, which adds a little overhead to the aggregation. In addition to the time spent calculating, some aggregations like [`terms`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) and [`filters`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html) can’t use some of their optimizations with runtime fields. In total, the performance cost of using a runtime field varies from aggregation to aggregation. - -## Aggregation caches [agg-caches] +## Aggregation caches [agg-caches] For faster responses, {{es}} caches the results of frequently run aggregations in the [shard request cache](https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html). 
To get cached results, use the same [`preference` string](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-shard-routing.html#shard-and-node-preference) for each search. If you don’t need search hits, [set `size` to `0`](#return-only-agg-results) to avoid filling the cache. {{es}} routes searches with the same preference string to the same shards. If the shards' data doesn’t change between searches, the shards return cached aggregation results. - -## Limits for `long` values [limits-for-long-values] +## Limits for `long` values [limits-for-long-values] When running aggregations, {{es}} uses [`double`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) values to hold and represent numeric data. As a result, aggregations on [`long`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) numbers greater than `2^53` are approximate. - diff --git a/explore-analyze/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md b/explore-analyze/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md index 71b76a69b7..a0e068a5c7 100644 --- a/explore-analyze/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md +++ b/explore-analyze/aggregations/tutorial-analyze-ecommerce-data-with-aggregations-using-query-dsl.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/aggregations-tutorial.html --- - - # Tutorial: Analyze eCommerce data with aggregations using Query DSL [aggregations-tutorial] - This hands-on tutorial shows you how to analyze eCommerce data using {{es}} [aggregations](../aggregations.md) with the `_search` API and Query DSL. 
You’ll learn how to: @@ -18,7 +15,6 @@ You’ll learn how to: * Compare performance across product categories * Track moving averages and cumulative totals - ## Requirements [aggregations-tutorial-requirements] You’ll need: @@ -39,8 +35,6 @@ You’ll need: * Select the **Other sample data sets** collapsible. * Add the **Sample eCommerce orders** data set. This will create and populate an index called `kibana_sample_data_ecommerce`. - - ## Inspect index structure [aggregations-tutorial-inspect-data] Before we start analyzing the data, let’s examine the structure of the documents in our sample eCommerce index. Run this command to see the field [mappings](../../manage-data/data-store/index-basics.md#elasticsearch-intro-documents-fields-mappings): @@ -52,6 +46,7 @@ GET kibana_sample_data_ecommerce/_mapping The response shows the field mappings for the `kibana_sample_data_ecommerce` index. ::::{dropdown} Example response + ```console-response { "kibana_sample_data_ecommerce": { @@ -268,34 +263,28 @@ The response shows the field mappings for the `kibana_sample_data_ecommerce` ind 3. `geoip.location`: Geographic coordinates stored as geo_point for location-based queries 4. 
`products.properties`: Nested structure containing details about items in each order - :::: - The sample data includes the following [field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html): * [`text`](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) and [`keyword`](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) for text fields - - * Most `text` fields have a `.keyword` subfield for exact matching using [multi-fields](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) + * Most `text` fields have a `.keyword` subfield for exact matching using [multi-fields](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) * [`date`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) for date fields * 3 [numeric](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) types: - - * `integer` for whole numbers - * `long` for large whole numbers - * `half_float` for floating-point numbers + * `integer` for whole numbers + * `long` for large whole numbers + * `half_float` for floating-point numbers * [`geo_point`](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html) for geographic coordinates * [`object`](https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html) for nested structures such as `products`, `geoip`, `event` Now that we understand the structure of our sample data, let’s start analyzing it. - ## Get key business metrics [aggregations-tutorial-basic-metrics] Let’s start by calculating important metrics about orders and customers. - ### Get average order size [aggregations-tutorial-order-value] Calculate the average order value across all orders in the dataset using the [`avg`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html) aggregation. 
@@ -318,8 +307,8 @@ GET kibana_sample_data_ecommerce/_search 2. A meaningful name that describes what this metric represents 3. Configures an `avg` aggregation, which calculates a simple arithmetic mean - ::::{dropdown} Example response + ```console-result { "took": 0, @@ -351,11 +340,8 @@ GET kibana_sample_data_ecommerce/_search 3. Results appear under the name we specified in the request 4. The average order value is calculated dynamically from all the orders in the dataset - :::: - - ### Get multiple order statistics at once [aggregations-tutorial-order-stats] Calculate multiple statistics about orders in one request using the [`stats`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-stats-aggregation.html) aggregation. @@ -377,8 +363,8 @@ GET kibana_sample_data_ecommerce/_search 1. A descriptive name for this set of statistics 2. `stats` returns count, min, max, avg, and sum at once - ::::{dropdown} Example response + ```console-result { "aggregations": { @@ -399,22 +385,17 @@ GET kibana_sample_data_ecommerce/_search 4. `"avg"`: Average value per order across all orders 5. `"sum"`: Total revenue from all orders combined - :::: - ::::{tip} The [stats aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-stats-aggregation.html) is more efficient than running individual min, max, avg, and sum aggregations. :::: - - ## Analyze sales patterns [aggregations-tutorial-sales-patterns] Let’s group orders in different ways to understand sales patterns. - ### Break down sales by category [aggregations-tutorial-category-breakdown] Group orders by category to see which product categories are most popular, using the [`terms`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) aggregation. @@ -441,8 +422,8 @@ GET kibana_sample_data_ecommerce/_search 4. Limit to top 5 categories 5. 
Order by number of orders (descending) - ::::{dropdown} Example response + ```console-result { "took": 4, @@ -498,11 +479,8 @@ GET kibana_sample_data_ecommerce/_search 4. Category name. 5. Number of orders in this category. - :::: - - ### Track daily sales patterns [aggregations-tutorial-daily-sales] Group orders by day to track daily sales patterns using the [`date_histogram`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html) aggregation. @@ -530,8 +508,8 @@ GET kibana_sample_data_ecommerce/_search 4. Formats dates in response using [date patterns](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html) (e.g. "yyyy-MM-dd"). Refer to [date math expressions](https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math) for additional options. 5. When `min_doc_count` is 0, returns buckets for days with no orders, useful for continuous time series visualization. - ::::{dropdown} Example response + ```console-result { "took": 2, @@ -720,16 +698,12 @@ GET kibana_sample_data_ecommerce/_search 4. `key` is the same date represented as the Unix timestamp for this bucket 5. `doc_count` counts the number of documents that fall into this time bucket - :::: - - ## Combine metrics with groupings [aggregations-tutorial-combined-analysis] Now let’s calculate [metrics](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html) within each group to get deeper insights. - ### Compare category performance [aggregations-tutorial-category-metrics] Calculate metrics within each category to compare performance across categories. @@ -773,8 +747,8 @@ GET kibana_sample_data_ecommerce/_search 4. Average order value in the category 5. Total number of items sold - ::::{dropdown} Example response + ```console-result { "aggregations": { @@ -810,11 +784,8 @@ GET kibana_sample_data_ecommerce/_search 4. 
Average order value for this category 5. Total quantity of items sold - :::: - - ### Analyze daily sales performance [aggregations-tutorial-daily-metrics] Let’s combine metrics to track daily trends: daily revenue, unique customers, and average basket size. @@ -856,8 +827,8 @@ GET kibana_sample_data_ecommerce/_search 2. Uses the [`cardinality`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html) aggregation to count unique customers per day 3. Average number of items per order - ::::{dropdown} Example response + ```console-result { "took": 119, @@ -1321,13 +1292,10 @@ GET kibana_sample_data_ecommerce/_search :::: - - ## Track trends and patterns [aggregations-tutorial-trends] You can use [pipeline aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html) on the results of other aggregations. Let’s analyze how metrics change over time. - ### Smooth out daily fluctuations [aggregations-tutorial-moving-average] Moving averages help identify trends by reducing day-to-day noise in the data. Let’s observe sales trends more clearly by smoothing daily revenue variations, using the [Moving Function](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-movfn-aggregation.html) aggregation. @@ -1368,8 +1336,8 @@ GET kibana_sample_data_ecommerce/_search 5. Use a 3-day window — use different window sizes to see trends at different time scales. 6. Use the built-in unweighted average function in the `moving_fn` aggregation. - ::::{dropdown} Example response + ```console-result { "took": 13, @@ -1744,17 +1712,13 @@ GET kibana_sample_data_ecommerce/_search 4. First day has no smoothed value as it needs previous days for the calculation 5. 
Moving average starts from second day, using a 3-day window - :::: - ::::{tip} Notice how the smoothed values lag behind the actual values - this is because they need previous days' data to calculate. The first day will always be null when using moving averages. :::: - - ### Track running totals [aggregations-tutorial-cumulative] Track running totals over time using the [`cumulative_sum`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-cumulative-sum-aggregation.html) aggregation. @@ -1790,8 +1754,8 @@ GET kibana_sample_data_ecommerce/_search 2. `cumulative_sum` adds up values across buckets 3. Reference the revenue we want to accumulate - ::::{dropdown} Example response + ```console-result { "took": 4, @@ -2166,11 +2130,8 @@ GET kibana_sample_data_ecommerce/_search 4. `revenue`: Daily revenue for this date 5. `cumulative_revenue`: Running total of revenue up to this date - :::: - - ## Next steps [aggregations-tutorial-next-steps] Refer to the [aggregations reference](../aggregations.md) for more details on all available aggregation types.