diff --git a/docs/en/stack/ml/anomaly-detection/anomaly-how-tos.asciidoc b/docs/en/stack/ml/anomaly-detection/anomaly-how-tos.asciidoc index a8d6ab6cd..783c4e688 100644 --- a/docs/en/stack/ml/anomaly-detection/anomaly-how-tos.asciidoc +++ b/docs/en/stack/ml/anomaly-detection/anomaly-how-tos.asciidoc @@ -14,6 +14,7 @@ The guides in this section describe some best practices for generating useful * <> * <> * <> +* <> * <> * <> * <> diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-anomalies.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomalies.png new file mode 100644 index 000000000..8ff3c3996 Binary files /dev/null and b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomalies.png differ diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-anomaly.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomaly.png new file mode 100644 index 000000000..da367f8d5 Binary files /dev/null and b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomaly.png differ diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-wizard.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-wizard.png new file mode 100644 index 000000000..14e86d686 Binary files /dev/null and b/docs/en/stack/ml/anomaly-detection/images/ml-population-wizard.png differ diff --git a/docs/en/stack/ml/anomaly-detection/index.asciidoc b/docs/en/stack/ml/anomaly-detection/index.asciidoc index d3d2af798..6a24d3ba0 100644 --- a/docs/en/stack/ml/anomaly-detection/index.asciidoc +++ b/docs/en/stack/ml/anomaly-detection/index.asciidoc @@ -38,6 +38,8 @@ include::{es-repo-dir}/ml/anomaly-detection/ml-configuring-detector-custom-rules include::ml-detect-categories.asciidoc[leveloffset=+2] +include::ml-population-analysis.asciidoc[leveloffset=+2] + include::ml-revert-model-snapshot.asciidoc[leveloffset=+2] include::geographic-anomalies.asciidoc[leveloffset=+2] diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-job-types.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-job-types.asciidoc index 35d109e74..4dc04db87 100644 --- a/docs/en/stack/ml/anomaly-detection/ml-ad-job-types.asciidoc +++ b/docs/en/stack/ml/anomaly-detection/ml-ad-job-types.asciidoc @@ -82,7 +82,9 @@ event is anomalous if the request rate of an IP address is unusually high or low compared to the request rate of all IP addresses in the population. The population job builds a model of the typical number of requests for the IP addresses collectively and compares the behavior of each IP address against that -collective model to detect outliers. +collective model to detect outliers. + +Refer to <> to learn more. [discrete] @@ -114,6 +116,8 @@ job can detect anomalous behavior, such as an unusual number of events in a category by using the `count` function or messages that rarely occur by using the `rare` function. +Refer to <> to learn more. + [discrete] [[rare-jobs]] diff --git a/docs/en/stack/ml/anomaly-detection/ml-detect-categories.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-detect-categories.asciidoc index b929c9358..fbd0aefe3 100644 --- a/docs/en/stack/ml/anomaly-detection/ml-detect-categories.asciidoc +++ b/docs/en/stack/ml/anomaly-detection/ml-detect-categories.asciidoc @@ -34,7 +34,7 @@ Avoid using human-generated data for categorization analysis. == Creating categorization jobs . In {kib}, navigate to **{ml-app} > Anomaly Detection > Jobs**. -. Click **Create {anomaly-jobs}**, select the {data-view} you want to analyze. +. Click **Create job**, select the {data-view} you want to analyze. . Select the **Categorization** wizard from the list. . Choose a categorization detector - it's the `count` function in this example - and the field you want to categorize - the `message` field in this example. + diff --git a/docs/en/stack/ml/anomaly-detection/ml-population-analysis.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-population-analysis.asciidoc new file mode 100644 index 000000000..6156f0728 --- /dev/null +++ b/docs/en/stack/ml/anomaly-detection/ml-population-analysis.asciidoc @@ -0,0 +1,96 @@ +[[ml-configuring-populations]] += Performing population analysis + +Population analysis is a method of detecting anomalies by comparing the behavior of entities or events within a specified population. +In this approach, {ml} analytics create a profile of what is considered "typical" behavior for users, machines, or other entities over a specified time period. +An entity is considered as anomalous when its behavior deviates from that of the population, indicating abnormal activity compared to the rest of the population. + +This type of analysis is most effective when the behavior within a group is generally homogeneous, allowing for the identification of unusual patterns. +However, it is less useful when members of the population show vastly different behaviors. +In such cases, you can segment your data into groups with similar behaviors and run separate jobs for each. +This can be done by using a query filter in the datafeed or by applying the `partition_field_name` to split the analysis across different groups. + +Population analysis is resource-efficient and scales well, enabling the analysis of populations consisting of hundreds of thousands or even millions of entities with a lower resource footprint than analyzing each series individually. + + + +[discrete] +[[population-recommendations]] +== Recommendations + +* Use population analysis when the behavior within a group is mostly homogeneous, as it helps identify anomalous patterns effectively. +* Leverage population analysis when dealing with large-scale datasets. +* Avoid using population analysis when members of the population exhibit vastly different behaviors, as it may not be effective. + + +[discrete] +[[creating-population-jobs]] +== Creating population jobs + +. In {kib}, navigate to **{ml-app} > Anomaly Detection > Jobs**. +. Click **Create job**, select the {data-source} you want to analyze. +. Select the **Population** wizard from the list. +. Choose a population field - it's the `clientip` field in this example - and the metric you want to use for the analysis - `Mean(bytes)` in this example. ++ +-- +[role="screenshot"] +image::images/ml-population-wizard.png[Creating a population job in Kibana] +-- +. Click **Next**. +. Provide a job ID and click **Next**. +. If the validation is successful, click **Next** to review the summary of the job creation. +. Click **Create job**. + +[%collapsible] +.API example +==== +To specify the population, use the `over_field_name` property. For example: + +[source,console] +---------------------------------- +PUT _ml/anomaly_detectors/population +{ + "description" : "Population analysis", + "analysis_config" : { + "bucket_span":"15m", + "influencers": [ + "clientip" + ], + "detectors": [ + { + "function": "mean", + "field_name": "bytes", + "over_field_name": "clientip" <1> + } + ] + }, + "data_description" : { + "time_field":"timestamp", + "time_format": "epoch_ms" + } +} +---------------------------------- +// TEST[skip:needs-licence] + +<1> This `over_field_name` property indicates that the metrics for each client (as identified by their IP address) are analyzed relative to other clients in each bucket. +==== + +[discrete] +[[population-job-results]] +=== Viewing the job results + +Use the **Anomaly Explorer** in {kib} to view the analysis results: + +[role="screenshot"] +image::images/ml-population-anomalies.png["Population results in the Anomaly Explorer"] + +The results are often quite sparse. +There might be just a few data points for the selected time period. +Population analysis is particularly useful when you have many entities and the data for specific entitles is sporadic or sparse. +If you click on a section in the timeline or swim lanes, you can see more details about the anomalies: + +[role="screenshot"] +image::images/ml-population-anomaly.png["Anomaly details for a specific user"] + +In this example, the client IP address `167.145.234.154` received a high volume of bytes on the date and time shown. +This event is anomalous because the mean is four times higher than the expected behavior of the population. \ No newline at end of file diff --git a/docs/en/stack/ml/redirects.asciidoc b/docs/en/stack/ml/redirects.asciidoc index ba4bc5bfc..6c63e76ca 100644 --- a/docs/en/stack/ml/redirects.asciidoc +++ b/docs/en/stack/ml/redirects.asciidoc @@ -13,16 +13,6 @@ This page has moved. See <>. This page has moved. See <>. -[role="exclude",id="ml-configuring-pop"] -=== Performing population analysis - -This page has been removed. Refer to <>. - -[role="exclude",id="ml-configuring-populations"] -=== Configuring population analysis - -This page has been removed. Refer to <>. - [role="exclude",id="ml-inference-models"] === Trained {ml} models as functions