[DOCS] Updates ML/anomaly detection terms in the Kibana guide #41965

Merged: 6 commits, Jul 30, 2019
Changes from 5 commits
1 change: 1 addition & 0 deletions docs/ml/creating-df-kib.asciidoc
@@ -1,3 +1,4 @@
[role="xpack"]
[[creating-df-kib]]
== Creating {dataframe-transforms}

14 changes: 7 additions & 7 deletions docs/ml/creating-jobs.asciidoc
@@ -1,8 +1,8 @@
[role="xpack"]
[[ml-jobs]]
== Creating machine learning jobs
== Creating {anomaly-jobs}

Machine learning jobs contain the configuration information and metadata
{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform an analytics task.

{kib} provides the following wizards to make it easier to create jobs:
@@ -33,7 +33,7 @@ appears:
[role="screenshot"]
image::ml/images/ml-data-recognizer-sample.jpg[A screenshot of the {kib} sample data web log job creation wizard]

TIP: Alternatively, after you load a sample data set on the {kib} home page, you can click *View data* > *ML jobs*. There are {ml} jobs for both the sample eCommerce orders data set and the sample web logs data set.
TIP: Alternatively, after you load a sample data set on the {kib} home page, you can click *View data* > *ML jobs*. There are {anomaly-jobs} for both the sample eCommerce orders data set and the sample web logs data set.

If you use {filebeat-ref}/index.html[{filebeat}]
to ship access logs from your
@@ -57,17 +57,17 @@ wizards appear:
[role="screenshot"]
image::ml/images/ml-data-recognizer-metricbeat.jpg[A screenshot of the {metricbeat} job creation wizards]

These wizards create {ml} jobs, dashboards, searches, and visualizations that
are customized to help you analyze your {auditbeat}, {filebeat}, and
These wizards create {anomaly-jobs}, dashboards, searches, and visualizations
that are customized to help you analyze your {auditbeat}, {filebeat}, and
{metricbeat} data.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
Machine learning analysis is still possible, however, by using APIs to
{anomaly-detect-cap} is still possible, however, by using APIs to
create and manage jobs and post data to them. For more information, see
{ref}/ml-apis.html[Machine Learning APIs].
{ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
===============================
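
As a rough illustration of the API route that the note mentions, the following Python sketch only builds the two requests (create a job, then post data to it) and sends nothing. The endpoint paths reflect the 7.x anomaly detection APIs as I understand them and should be checked against the linked API reference. The job ID, field names, and helper function names are hypothetical.

```python
import json


def build_job_request(job_id, bucket_span, function, field_name, time_field):
    """Return (method, path, body) for a create-job request; nothing is sent."""
    body = {
        "analysis_config": {
            "bucket_span": bucket_span,
            # A job needs at least one detector: a function applied to a field.
            "detectors": [{"function": function, "field_name": field_name}],
        },
        "data_description": {"time_field": time_field},
    }
    return "PUT", f"_ml/anomaly_detectors/{job_id}", body


def build_post_data_request(job_id, records):
    """Return (method, path, payload) with records as newline-delimited JSON."""
    payload = "\n".join(json.dumps(record) for record in records) + "\n"
    return "POST", f"_ml/anomaly_detectors/{job_id}/_data", payload


# Hypothetical job and field names, for illustration only.
method, path, body = build_job_request(
    "example-job", "15m", "mean", "responsetime", "timestamp"
)
print(method, path)
print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes the payloads easy to inspect before they are sent with whatever client you prefer.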

////
47 changes: 24 additions & 23 deletions docs/ml/index.asciidoc
@@ -1,35 +1,36 @@
[role="xpack"]
[[xpack-ml]]
= Machine Learning
= {ml-cap}

[partintro]
--

As datasets increase in size and complexity, the human effort required to
inspect dashboards or maintain rules for spotting infrastructure problems,
cyber attacks, or business issues becomes impractical. The Elastic {ml-features}
automatically model the normal behavior of your time series data — learning
trends, periodicity, and more — in real time to identify anomalies, streamline
root cause analysis, and reduce false positives.
cyber attacks, or business issues becomes impractical. The Elastic {ml}
{anomaly-detect} feature automatically models the normal behavior of your time
series data — learning trends, periodicity, and more — in real time to identify
anomalies, streamline root cause analysis, and reduce false positives.

The {ml-features} run in and scale with {es}, and include an
intuitive UI on the {kib} *Machine Learning* page for creating anomaly detection
jobs and understanding results.
{anomaly-detect-cap} runs in and scales with {es}, and includes an
intuitive UI on the {kib} *Machine Learning* page for creating {anomaly-jobs}
and understanding results.

If you have a basic license, you can use the *Data Visualizer* to learn more
about your data. In particular, if your data is stored in {es} and contains a
time field, you can use the *Data Visualizer* to identify possible fields for
{ml} analysis:
{anomaly-detect}:

[role="screenshot"]
image::ml/images/ml-data-visualizer-sample.jpg[Data Visualizer for sample flight data]

experimental[] You can also upload a CSV, NDJSON, or log file (up to 100 MB in size).
The {ml-features} identify the file format and field mappings. You can then
optionally import that data into an {es} index.
experimental[] You can also upload a CSV, NDJSON, or log file (up to 100 MB in
size). The *Data Visualizer* identifies the file format and field mappings. You
can then optionally import that data into an {es} index.

If you have a trial or platinum license, you can <<ml-jobs,create {ml} jobs>>
and manage jobs and {dfeeds} from the *Job Management* pane:
If you have a trial or platinum license, you can
<<ml-jobs,create {anomaly-jobs}>> and manage jobs and {dfeeds} from the *Job
Management* pane:

[role="screenshot"]
image::ml/images/ml-job-management.jpg[Job Management]
@@ -42,7 +43,7 @@ You can use the *Settings* pane to create and edit
image::ml/images/ml-settings.jpg[Calendar Management]

The *Anomaly Explorer* and *Single Metric Viewer* display the results of your
{ml} jobs. For example:
{anomaly-jobs}. For example:

[role="screenshot"]
image::ml/images/ml-single-metric-viewer.jpg[Single Metric Viewer]
@@ -56,17 +57,17 @@ occurring in your operational environment at that time:
image::ml/images/ml-annotations-list.jpg[Single Metric Viewer with annotations]

In some circumstances, annotations are also added automatically. For example, if
the {ml} analytics detect that there is missing data, it annotates the affected
the {anomaly-job} detects that there is missing data, it annotates the affected
time period. For more information, see
{stack-ov}/ml-delayed-data-detection.html[Handling delayed data].
The *Job Management* pane shows the full list of annotations for each job.
{stack-ov}/ml-delayed-data-detection.html[Handling delayed data]. The
*Job Management* pane shows the full list of annotations for each job.

NOTE: The {kib} {ml-features} use pop-ups. You must configure your
web browser so that it does not block pop-up windows or create an exception for
your {kib} URL.
NOTE: The {kib} {ml-features} use pop-ups. You must configure your web
browser so that it does not block pop-up windows or create an exception for your
{kib} URL.

For more information about {ml}, see
{stack-ov}/xpack-ml.html[Machine learning in the {stack}].
For more information about the {anomaly-detect} feature, see
{stack-ov}/xpack-ml.html[{ml-cap} {anomaly-detect}].

--

43 changes: 22 additions & 21 deletions docs/ml/job-tips.asciidoc
@@ -5,24 +5,25 @@
<titleabbrev>Job tips</titleabbrev>
++++

When you are creating a job in {kib}, the job creation wizards can provide
advice based on the characteristics of your data. By heeding these suggestions,
you can create jobs that are more likely to produce insightful {ml} results.
When you create an {anomaly-job} in {kib}, the job creation wizards can
provide advice based on the characteristics of your data. By heeding these
suggestions, you can create jobs that are more likely to produce insightful {ml}
results.

[[bucket-span]]
==== Bucket span

The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create a job in {kib}, you can choose to
estimate a bucket span value based on your data characteristics.
model data for your job. When you create an {anomaly-job} in {kib}, you can
choose to estimate a bucket span value based on your data characteristics.

NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].

If you choose a value that is larger than one day or is significantly different
than the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{xpack-ref}/ml-buckets.html[Buckets].
{stack-ov}/ml-buckets.html[Buckets].
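
The bucket span check described above could look roughly like the following Python sketch. It is not the wizard's actual logic: the `ratio` threshold for "significantly different" is an assumption, the function names are hypothetical, and only the `s`, `m`, `h`, and `d` units are handled as a simplification.

```python
import re

# Seconds per supported unit (simplified; other time units are not handled).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_span(span):
    """Convert a string such as '15m' into seconds, or raise ValueError."""
    match = re.fullmatch(r"(\d+)([smhd])", span)
    if not match or int(match.group(1)) == 0:
        raise ValueError(f"not a valid time interval: {span!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def bucket_span_messages(chosen, estimated, ratio=4.0):
    """Return informational messages for a chosen bucket span.

    `ratio` is an assumed threshold for 'significantly different'.
    """
    chosen_s = parse_span(chosen)
    estimated_s = parse_span(estimated)
    messages = []
    if chosen_s > _UNIT_SECONDS["d"]:
        messages.append("bucket span is larger than one day")
    larger, smaller = max(chosen_s, estimated_s), min(chosen_s, estimated_s)
    if larger / smaller > ratio:
        messages.append("bucket span differs significantly from the estimate")
    return messages


print(bucket_span_messages("2d", "15m"))
```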

[[cardinality]]
==== Cardinality
@@ -40,14 +41,14 @@ job uses more memory resources. In particular, if the cardinality of the
Likewise if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{xpack-ref}/ml-configuring-pop.html[Performing Population Analysis].
{stack-ov}/ml-configuring-pop.html[Performing Population Analysis].
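
To make the population analysis advice concrete, this minimal Python sketch counts the distinct values of an `over_field_name` in a sample of records and flags the field when the count is below 10, the threshold stated above. The function name, record layout, and message wording are hypothetical, and the real wizard presumably measures cardinality in {es} rather than in client-side code.

```python
def over_field_advice(records, over_field_name, minimum=10):
    """Return an advisory string if the field's cardinality is below `minimum`."""
    # Cardinality here means the number of distinct values seen for the field.
    values = {r[over_field_name] for r in records if over_field_name in r}
    cardinality = len(values)
    if cardinality < minimum:
        return (
            f"{over_field_name} has cardinality {cardinality}; "
            "it might not be a suitable over field"
        )
    return None


# Hypothetical sample: only three distinct users.
sample = [{"user": name} for name in ["a", "b", "c", "a", "b"]]
print(over_field_advice(sample, "user"))
```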

[[detectors]]
==== Detectors

Each job must have one or more _detectors_. A detector applies an analytical
function to specific fields in your data. If your job does not contain a
detector or the detector does not contain a
Each {anomaly-job} must have one or more _detectors_. A detector applies an
analytical function to specific fields in your data. If your job does not
contain a detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
@@ -57,9 +58,9 @@ duplicates if they have the same `function`, `field_name`, `by_field_name`,
[[influencers]]
==== Influencers

When you create a job, you can specify _influencers_, which are also sometimes
referred to as _key fields_. Picking an influencer is strongly recommended for
the following reasons:
When you create an {anomaly-job}, you can specify _influencers_, which are also
sometimes referred to as _key fields_. Picking an influencer is strongly
recommended for the following reasons:

* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results
@@ -78,11 +79,11 @@ The job creation wizards in {kib} can suggest which fields to use as influencers
[[model-memory-limits]]
==== Model memory limits

For each job, you can optionally specify a `model_memory_limit`, which is the
approximate maximum amount of memory resources that are required for analytical
processing. The default value is 1 GB. Once this limit is approached, data
pruning becomes more aggressive. Upon exceeding this limit, new entities are not
modeled.
For each {anomaly-job}, you can optionally specify a `model_memory_limit`, which
is the approximate maximum amount of memory resources that are required for
analytical processing. The default value is 1 GB. Once this limit is approached,
data pruning becomes more aggressive. Upon exceeding this limit, new entities
are not modeled.

You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
By default, it's not set, which means there is no upper bound on the acceptable
@@ -92,9 +93,9 @@ TIP: If you set the `model_memory_limit` too high, it will be impossible to open
the job; jobs cannot be allocated to nodes that have insufficient memory to run
them.
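
The warning condition that this section describes for these limits (an estimated requirement greater than the job's `model_memory_limit` or the cluster's `xpack.ml.max_model_memory_limit`) can be sketched in Python as follows. This is an illustration, not the {kib} implementation: the function name is hypothetical, values are plain megabytes, and the 1 GB default is taken to be 1024 MB.

```python
def model_memory_warnings(estimated_mb, job_limit_mb=1024, cluster_max_mb=None):
    """Return warnings for an estimated model memory requirement, in MB.

    job_limit_mb defaults to 1024 (the 1 GB default model_memory_limit).
    cluster_max_mb of None models an unset xpack.ml.max_model_memory_limit,
    which means there is no cluster-wide upper bound to compare against.
    """
    warnings = []
    if estimated_mb > job_limit_mb:
        warnings.append("estimate exceeds the job's model_memory_limit")
    if cluster_max_mb is not None and estimated_mb > cluster_max_mb:
        warnings.append("estimate exceeds xpack.ml.max_model_memory_limit")
    return warnings


print(model_memory_warnings(1500))
print(model_memory_warnings(1500, job_limit_mb=2048, cluster_max_mb=1024))
```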

If the estimated model memory limit for a job is greater than the model memory
limit for the job or the maximum model memory limit for the cluster, the job
creation wizards in {kib} generate a warning. If the estimated memory
If the estimated model memory limit for an {anomaly-job} is greater than the
model memory limit for the job or the maximum model memory limit for the cluster,
the job creation wizards in {kib} generate a warning. If the estimated memory
requirement is only a little higher than the `model_memory_limit`, the job will
probably produce useful results. Otherwise, the actions you take to address
these warnings vary depending on the resources available in your cluster: