plinde/ml-anomaly-injector: Generate a dataset containing a simple anomaly and load it into Elasticsearch ML

This utility provides a 'training' dataset for Elastic's X-Pack Machine Learning product. It may be useful for learning the ML product, because the dataset is simple and contains a known anomaly.

The utility generates a linear time series of 'HTTP access log' style events and loads them into Elasticsearch, along with a semi-randomized "anomaly" period. In the initial version, the anomaly is a "10x" traffic spike.
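As a rough illustration of the idea (not the actual code in generator.py), the sketch below builds one event per minute over the previous 7 days and emits 10 events per minute inside a randomly placed 30-minute window. The function names, the one-minute base interval, and the 30-minute window length are assumptions chosen for the example.

```python
import random
from datetime import datetime, timedelta


def pick_anomaly_window(end, days=7, minutes=30, rng=random):
    """Pick a random window of `minutes` length inside the previous `days`."""
    span_seconds = days * 24 * 3600 - minutes * 60
    offset = rng.randint(0, span_seconds)
    start = end - timedelta(days=days) + timedelta(seconds=offset)
    return start, start + timedelta(minutes=minutes)


def generate_timestamps(end, days=7, base_interval=60, spike_factor=10, rng=random):
    """Return (timestamps, anomaly_start, anomaly_end).

    Outside the anomaly window there is one event per `base_interval`
    seconds; inside it there are `spike_factor` evenly spaced events.
    """
    start = end - timedelta(days=days)
    a_start, a_end = pick_anomaly_window(end, days=days, rng=rng)
    events = []
    t = start
    while t < end:
        count = spike_factor if a_start <= t < a_end else 1
        step = float(base_interval) / count
        for i in range(count):
            events.append(t + timedelta(seconds=i * step))
        t += timedelta(seconds=base_interval)
    return events, a_start, a_end


if __name__ == "__main__":
    events, a_start, a_end = generate_timestamps(datetime.utcnow())
    print("Anomaly Start Time %s" % a_start.strftime("%Y-%m-%dT%H:%M:%SZ"))
    print("Anomaly End Time %s" % a_end.strftime("%Y-%m-%dT%H:%M:%SZ"))
    print("%d events generated" % len(events))
```

Passing a seeded `random.Random` instance as `rng` makes the anomaly position reproducible, which can help when comparing ML job results across runs.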

NOTE: Redis and Logstash are used for simplicity. Some may argue that requiring Redis and Logstash adds complexity rather than reducing it, but IMO these tools are already integral to working with the Elastic Stack and provide a comfortable abstraction. This code could easily be modified to write directly into Elasticsearch.
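If you want to skip Redis and Logstash, one option is to send events through the Elasticsearch bulk API. The sketch below only builds the newline-delimited request body; the `to_bulk_body` helper, the `event` document type, and the sample fields are hypothetical and not taken from generator.py. The index name follows the smoke_event* pattern used later in this README.

```python
import json


def to_bulk_body(events, index="smoke_event", doc_type="event"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each event becomes two lines: an action line and the document itself.
    The body must end with a trailing newline.
    """
    lines = []
    for event in events:
        # Action line: index this document into `index` (5.x still needs a type).
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"timestamp": "2017-03-01T18:31:23Z", "status": 200, "path": "/"},
        {"timestamp": "2017-03-01T18:31:29Z", "status": 200, "path": "/"},
    ]
    print(to_bulk_body(sample))
```

The resulting string can be POSTed to `/_bulk` on your Elasticsearch host with a `Content-Type: application/x-ndjson` header, using whichever HTTP client and authentication settings match your environment.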

* In ml/jobs, there are example ML jobs which can be imported for use with this dataset.
* In logstash/, there are some Logstash configs which help with ingestion of the events.

Expectations

Python 2.7+
Elastic Stack 5.4+
Install the Python dependencies:

pip install -r requirements.txt

Configuration

Edit the ES_ variables in generator.py to match your environment (host, port, auth, SSL).

Run the utility

python generator.py

Output:

creating time series for previous 7 days
creating anomaly time series range within the previous 7 days
Anomaly Start Time 2017-03-01T18:31:23Z
Anomaly End Time 2017-03-01T19:01:23Z

Open Kibana (e.g. localhost:5601) and define an index pattern for "smoke_event*", using timestamp as the time field. You should see time-series data for the requested period (e.g. 7 days), with a spike in the histogram. This is your anomaly.

In ML (assuming you already have it running), start with a Single Metric job and select smoke_event* as your index pattern. You should see a histogram with an obvious anomaly/spike.

If you're adventurous, you can define your job from the JSON in ml/jobs. Since this is a fixed time series with a definite endpoint, you can configure the ML job to terminate rather than run in real time.
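For orientation, here is a minimal sketch of what such a job definition might look like when built from Python. It is not a copy of the files in ml/jobs. The field names follow the X-Pack ML job and datafeed APIs, but verify them against the documentation for your stack version; the helper names, the 5-minute bucket span, and the description text are assumptions.

```python
import json


def build_job_config(time_field="timestamp", bucket_span="5m"):
    """Return a simple event-count job definition as a dict."""
    return {
        "description": "Event count over smoke_event* (illustrative)",
        "analysis_config": {
            "bucket_span": bucket_span,
            # A bare count detector flags buckets with unusual event volume.
            "detectors": [{"function": "count"}],
        },
        "data_description": {"time_field": time_field},
    }


def build_datafeed_window(start, end):
    """Return start/end bounds so the datafeed stops at `end`
    instead of continuing in real time."""
    return {"start": start, "end": end}


if __name__ == "__main__":
    print(json.dumps(build_job_config(), indent=2))
    print(json.dumps(build_datafeed_window(
        "2017-02-22T00:00:00Z", "2017-03-01T23:59:59Z"), indent=2))
```

Supplying an explicit end time when starting the datafeed is what makes the analysis finite: once the feed reaches that point, it stops rather than waiting for new data.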

