[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/elastic/geneve/main?labpath=docs/events_generation_walk-through.ipynb)

# Rule-based events generation walk-through

You can access an interactive version of this document by clicking on the "launch binder" badge above. You will be able to edit the `In [ ]` cells below by clicking in the grey area and executing them by pressing `Ctrl+Enter`.

## Preliminaries

The API for generating events is exposed by the `geneve.events_emitter` module; its `SourceEvents` class provides the front-end for the most common use cases.

Here are the module imports needed for the rest of this notebook.

In [1]:
import ipynb  # needed only in Jupyter notebooks

from geneve.events_emitter import SourceEvents
from geneve.utils import load_schema, load_rules

from itertools import islice
from pathlib import Path

## Schema

Generating documents requires a specification of the field types, such as `long` or `ip`. This is the job of the _schema_.

An example of a schema defining the fields `x`, `y`, and `z` as float numbers is `{'x': {'type': 'float'}, 'y': {'type': 'float'}, 'z': {'type': 'float'}}`. Fields not defined in the schema are treated as type `keyword`, a kind of string.
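
The fallback to `keyword` can be pictured as a dictionary lookup with a default. The sketch below is only an illustration: `field_type` is a hypothetical helper, not part of the geneve API, and it assumes the schema is a plain dictionary keyed by field name, as in the example above.

```python
# Illustration only: `field_type` is a hypothetical helper, not a geneve function.
schema = {'x': {'type': 'float'}, 'y': {'type': 'float'}, 'z': {'type': 'float'}}


def field_type(schema, field):
    """Return the type declared in the schema, or 'keyword' if the field is not defined."""
    return schema.get(field, {}).get('type', 'keyword')


print(field_type(schema, 'x'))             # declared in the schema
print(field_type(schema, 'process.name'))  # not declared, falls back to the default
print(field_type({}, 'x'))                 # empty schema, every field gets the default
```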

The default schema is just `{}`, therefore every field is considered of type `keyword`. To change that, assign to `SourceEvents.schema` as follows:

In [2]:
SourceEvents.schema = {'x': {'type': 'float'}, 'y': {'type': 'float'}, 'z': {'type': 'float'}}

From now on every document generator will use the assigned schema.

In [3]:
a = SourceEvents()
a.schema

{'x': {'type': 'float'}, 'y': {'type': 'float'}, 'z': {'type': 'float'}}

In [4]:
b = SourceEvents()
b.schema

{'x': {'type': 'float'}, 'y': {'type': 'float'}, 'z': {'type': 'float'}}

It's also possible to specify the schema on a per-case basis.

In [5]:
c = SourceEvents(schema={'x': {'type': 'float'}})
c.schema

{'x': {'type': 'float'}}
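
The behaviour shown in cells 2 to 5 matches how a Python class attribute acts as a shared default that a single instance can override. The sketch below reproduces it with a hypothetical `Emitter` class; it illustrates the general pattern under that assumption and is not a description of how `SourceEvents` is implemented.

```python
class Emitter:
    """Hypothetical stand-in, used only to illustrate the default/override pattern."""

    # class-level default, shared by every instance that does not override it
    schema = {}

    def __init__(self, schema=None):
        if schema is not None:
            # the instance attribute shadows the class attribute for this object only
            self.schema = schema


# assigning to the class changes the default for all instances
Emitter.schema = {'x': {'type': 'float'}, 'y': {'type': 'float'}}

a = Emitter()
c = Emitter(schema={'x': {'type': 'float'}})

print(a.schema)  # the class-level schema
print(c.schema)  # the per-instance schema
```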

The rest of the notebook assumes the rich and standardized [ECS](https://www.elastic.co/guide/en/ecs/current/ecs-reference.html) 8.11.0 as default schema, as loaded below.

In [6]:
SourceEvents.schema = load_schema('./etc/ecs-v8.11.0.tar.gz', 'generated/ecs/ecs_flat.yml')

## Generate documents from queries

In the simplest form, documents can be generated as follows:

In [7]:
next(SourceEvents.from_query('process.name : *.exe'))

[Event(meta=None, doc={'process': {'name': 'Okb.exe'}, '@timestamp': '2023-02-06T11:54:08.391+01:00'})]

This is equivalent to:

In [8]:
se = SourceEvents()
se.add_query('process.name : *.exe')
next(se)

[Event(meta=None, doc={'process': {'name': 'zDCtQRQAWVhZQ.exe'}, '@timestamp': '2023-02-06T11:54:08.397+01:00'})]

In this second form, multiple queries can be added, and the generated documents will match any one of them:

In [9]:
se = SourceEvents()
se.add_query('process.name : *.exe')
se.add_query('source.ip : 10.0.0.0/8')
next(se)

[Event(meta=None, doc={'process': {'name': 'UlX.exe'}, '@timestamp': '2023-02-06T11:54:08.408+01:00'})]

The query language is guessed automatically. While the queries above are Kuery based, the following one uses EQL:

In [10]:
se = SourceEvents()
se.add_query('process where process.name : "*.exe"')
next(se)

[Event(meta=None, doc={'event': {'category': ['process']}, 'process': {'name': 'eXomVOjfNESLV.exe'}, '@timestamp': '2023-02-06T11:54:08.423+01:00'})]

Currently only Kuery and EQL are supported, though others will be added.

Note how `next(se)` returns a list of `Event(..)` objects. It's a list because a query could result in the generation of multiple events; think of EQL `sequence` queries.

The `Event(..)` object is used instead of a plain document so that it can also carry the meta data passed when the query was added, e.g. `add_query(.., meta=<your meta data>)`.

A way to unpack the documents is:

In [11]:
[event.doc for event in next(se)]

[{'event': {'category': ['process']},
  'process': {'name': 'yO.exe'},
  '@timestamp': '2023-02-06T11:54:08.429+01:00'}]
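
When queries are added with `meta=...`, the same kind of comprehension can keep each document paired with its meta data. The snippet below uses a stand-in `Event` named tuple and hypothetical values so that it runs without geneve; with the real library the list would come from `next(se)`.

```python
from collections import namedtuple

# Stand-in for geneve's Event, assumed here to expose `meta` and `doc` attributes
Event = namedtuple('Event', ['meta', 'doc'])

# Hypothetical events, shaped like the result of `next(se)`
events = [
    Event(meta={'query_id': 1}, doc={'process': {'name': 'a.exe'}}),
    Event(meta={'query_id': 2}, doc={'process': {'name': 'b.exe'}}),
]

# documents only
docs = [event.doc for event in events]

# documents together with the meta data of the query that produced them
docs_with_meta = [(event.meta, event.doc) for event in events]

print(docs)
print(docs_with_meta)
```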

## Generate documents from rules

Similarly to the query case, it's possible to generate documents from one or more rules.

In [12]:
# use the local copy of detection-rules 8.12.6
_, rules = load_rules('./etc/security_detection_engine-8.12.6.zip', (
    # SMTP on Port 26/TCP
    'kibana/security_rule/d7e62693-aab9-4f66-a21a-3d79ecdd603d_100.json',
))

# load the rules
se = SourceEvents()
for rule in rules:
    se.add_rule(rule)

# generate one document
next(se)

[Event(meta=None, doc={'destination': {'port': 26}, 'event': {'category': ['network_traffic'], 'dataset': 'zeek.smtp'}, 'network': {'transport': 'tcp'}, '@timestamp': '2023-02-06T11:54:09.161+01:00'})]

## Generate documents over and over

Adding queries and rules to a `SourceEvents` object triggers some operations, such as parsing the query, collecting the field constraints, and trying to generate documents for the first time.

When it's important to avoid unnecessary computations, the result of such initial operations can be reused by preserving the `SourceEvents` object between the calls to `next`.

In [13]:
se = SourceEvents.from_query('process.name : (*.exe or *.dll)')

[next(se) for n in range(5)]

[[Event(meta=None, doc={'process': {'name': 'UW.dll'}, '@timestamp': '2023-02-06T11:54:09.170+01:00'})],
 [Event(meta=None, doc={'process': {'name': 'QcPNdamyOhK.dll'}, '@timestamp': '2023-02-06T11:54:09.170+01:00'})],
 [Event(meta=None, doc={'process': {'name': 'IUOqztL.dll'}, '@timestamp': '2023-02-06T11:54:09.170+01:00'})],
 [Event(meta=None, doc={'process': {'name': 'LXiXjZzvOtUOmZe.dll'}, '@timestamp': '2023-02-06T11:54:09.170+01:00'})],
 [Event(meta=None, doc={'process': {'name': 'PzWTf.exe'}, '@timestamp': '2023-02-06T11:54:09.170+01:00'})]]
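
The cost model described above, an expensive set-up done once followed by cheap generation, can be illustrated with a hypothetical stand-in class that counts how often its set-up runs. `CountingEmitter` is invented for this sketch and says nothing about geneve's internals.

```python
from itertools import islice


class CountingEmitter:
    """Hypothetical emitter whose set-up happens once, in the constructor."""

    setups = 0  # how many times the expensive set-up has run

    def __init__(self, query):
        CountingEmitter.setups += 1  # stands for parsing and constraint collection
        self.query = query
        self.generated = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.generated += 1
        return {'n': self.generated}


# Wasteful: a new object, hence a new set-up, for every document
wasteful = [next(CountingEmitter('q')) for _ in range(5)]
setups_wasteful = CountingEmitter.setups

# Better: one object reused for all the documents
CountingEmitter.setups = 0
emitter = CountingEmitter('q')
reused = list(islice(emitter, 5))
setups_reused = CountingEmitter.setups

print(setups_wasteful, setups_reused)
```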

## Mappings of the generated documents

The `SourceEvents` object can build mappings describing all the fields used in the loaded queries and rules. This uses the same schema employed for document generation.

In [14]:
# use the local copy of detection-rules 8.12.6
_, rules = load_rules('./etc/security_detection_engine-8.12.6.zip', (
    # SMTP on Port 26/TCP
    'kibana/security_rule/d7e62693-aab9-4f66-a21a-3d79ecdd603d_100.json',
))

# load the rules
se = SourceEvents()
for rule in rules:
    se.add_rule(rule)

# generate the mappings
se.mappings()

{'properties': {'@timestamp': {'type': 'date'},
  'network': {'properties': {'transport': {'type': 'keyword'}}},
  'destination': {'properties': {'port': {'type': 'long'}}},
  'event': {'properties': {'category': {'type': 'keyword'},
    'dataset': {'type': 'keyword'}}}}}
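
To show how such a nested structure relates to dotted field names, here is a minimal sketch that builds the same mappings from a list of fields. `build_mappings` is a hypothetical helper, and the flat schema keyed by dotted field name is an assumption; geneve's own implementation may differ.

```python
def build_mappings(fields, schema):
    """Nest dotted field names under 'properties', typing each leaf from the schema."""
    mappings = {'properties': {}}
    for field in fields:
        node = mappings
        *parents, leaf = field.split('.')
        for part in parents:
            # create the intermediate object on first use, then descend into it
            node = node['properties'].setdefault(part, {'properties': {}})
        # fields missing from the schema default to 'keyword'
        field_type = schema.get(field, {}).get('type', 'keyword')
        node['properties'][leaf] = {'type': field_type}
    return mappings


# Assumed flat schema fragment; only the non-keyword fields need an entry
schema = {
    '@timestamp': {'type': 'date'},
    'destination.port': {'type': 'long'},
}

fields = ['@timestamp', 'network.transport', 'destination.port',
          'event.category', 'event.dataset']

mappings = build_mappings(fields, schema)
print(mappings)
```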

## Query validation

When the constraints of a query cannot be solved, an exception describing the problem is raised:

In [15]:
try:
    SourceEvents.from_query('destination.port < 1024 and (destination.port > 512 or destination.port > 1024)')
except Exception as e:
    print(e)

Unsolvable constraints: destination.port (empty solution space, 1025 <= x <= 1023)
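
One way to read the range `1025 <= x <= 1023` in the message is as the intersection of integer intervals: it appears to come from combining `destination.port < 1024` with `destination.port > 1024`. The sketch below reproduces that arithmetic; it is an illustration, not geneve's solver, and the port bounds are an assumption.

```python
def intersect(a, b):
    """Intersect two closed integer intervals given as (low, high) tuples."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_empty(interval):
    """An interval is empty when its lower bound exceeds its upper bound."""
    low, high = interval
    return low > high


PORT_MIN, PORT_MAX = 0, 65535  # assumed bounds for a port number

less_than_1024 = (PORT_MIN, 1023)   # destination.port < 1024
more_than_512 = (513, PORT_MAX)     # destination.port > 512
more_than_1024 = (1025, PORT_MAX)   # destination.port > 1024

# each side of the `or` is combined with the `and` condition
first_branch = intersect(less_than_1024, more_than_512)
second_branch = intersect(less_than_1024, more_than_1024)

print(first_branch, is_empty(first_branch))
print(second_branch, is_empty(second_branch))
```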


## Using as iterator

The `SourceEvents` class implements the iterator protocol, as the earlier use of `next` for generating new documents suggested. Because documents can be generated indefinitely, the iterator is infinite and some care is needed.

Uses like `list(se)` or `for docs in se: print(docs)` are troublesome: the first would sooner or later exhaust all the resources, the second would never terminate on its own. You need to decide how many documents you need or what other condition will break the loop.
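
The usual ways of bounding an infinite iterator, a fixed count or an explicit stop condition, can be sketched with a stand-in generator. `numbers` is hypothetical and takes the place of a `SourceEvents` object so that the snippet runs on its own.

```python
from itertools import count, islice, takewhile


def numbers():
    """Stand-in for an infinite document generator."""
    for n in count(1):
        yield {'n': n}


# Stop after a fixed number of items
first_three = list(islice(numbers(), 3))

# Stop as soon as a condition on the item no longer holds
below_five = list(takewhile(lambda doc: doc['n'] < 5, numbers()))

# Stop explicitly from inside the loop
collected = []
for doc in numbers():
    if doc['n'] > 2:
        break
    collected.append(doc)

print(first_three)
print(below_five)
print(collected)
```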

As an example, this prints 10 documents:

In [16]:
se = SourceEvents.from_query('process.name : (*.exe or *.dll)')

for docs in islice(se, 10):
    print(docs)

[Event(meta=None, doc={'process': {'name': 'aAiUtZuFzS.exe'}, '@timestamp': '2023-02-06T11:54:09.198+01:00'})]
[Event(meta=None, doc={'process': {'name': 'exGSt.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'fHegHLwRFdRU.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'NQUfbtQnvOfc.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'RNMkiiucklGs.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'CjwPWZufhZj.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'BaZ.dll'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'BpuTSHUHPLzr.dll'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Event(meta=None, doc={'process': {'name': 'XzxmYFKtCcN.exe'}, '@timestamp': '2023-02-06T11:54:09.199+01:00'})]
[Ev