[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/cavokz/detection-rules/emit-events?labpath=docs/signals_generation_walk-through.ipynb)

# Rule-based signals generation walk-through

## Preliminaries

The API for generating source events is provided by the `detection_rules.events_emitter` module; its `SourceEvents` class is the front-end for the most common uses.

You can access an interactive version of this document by clicking the "launch binder" badge above. There you can edit the `In [ ]` cells below by clicking in the grey area and execute them by pressing `Ctrl+Enter`.

Here are the module imports needed for the rest of this notebook.

In [1]:
import os; os.chdir('..')  # use the repo's root as base for importing local modules
from detection_rules.events_emitter import SourceEvents
from detection_rules.rule_loader import RuleCollection
from itertools import islice
from pathlib import Path

## Generate documents from queries

In the simplest form, documents can be generated as follows:

In [2]:
next(SourceEvents.from_query('process.name : *.exe'))

[{'process': {'name': 'fx.exe'}, '@timestamp': 1645549759774}]

The `from_query` shortcut is equivalent to creating a `SourceEvents` object and adding the query to it:

In [3]:
se = SourceEvents()
se.add_query('process.name : *.exe')
next(se)

[{'process': {'name': 'apbbtpkmwgptgjk.exe'}, '@timestamp': 1645549759783}]

In this second form, multiple queries can be added, and the generated documents match any one of them:

In [4]:
se = SourceEvents()
se.add_query('process.name : *.exe')
se.add_query('source.ip : 10.0.0.0/8')
next(se)

[{'source': {'ip': '10.63.97.9'}, '@timestamp': 1645549759791}]

The query language is guessed automatically. The queries above are Kuery; the following one is EQL:

In [5]:
se = SourceEvents()
se.add_query('process where process.name : "*.exe"')
next(se)

[{'process': {'name': 'ysxaf.exe'},
  'event': {'category': ['process']},
  '@timestamp': 1645549759802}]

Currently only Kuery and EQL are supported, though other languages can be added.

## Generate documents from rules

Similarly to the query case, it's possible to generate documents from one or more rules:

In [6]:
rules = RuleCollection()
rules.load_file(Path("rules/network/command_and_control_port_26_activity.toml"))
rules.load_file(Path("rules/network/command_and_control_telnet_port_activity.toml"))

se = SourceEvents()
for rule in rules:
    se.add_rule(rule)

next(se)

[{'event': {'category': ['network']},
  'network': {'transport': 'tcp'},
  'destination': {'port': 26},
  '@timestamp': 1645549760032}]

## Generate documents over and over

Adding queries and rules to a `SourceEvents` object triggers some operations, such as parsing the query, collecting the field constraints, and trying to generate documents for the first time.

When it's important to avoid unnecessary computation, the result of these initial operations can be reused by keeping the same `SourceEvents` object across calls to `next`.

In [7]:
se = SourceEvents.from_query('process.name : (*.exe or *.dll)')

[next(se) for n in range(5)]

[[{'process': {'name': 'yumkaujvfp.exe'}, '@timestamp': 1645549760043}],
 [{'process': {'name': 't.exe'}, '@timestamp': 1645549760043}],
 [{'process': {'name': 'fwdovovtnqsdtva.dll'}, '@timestamp': 1645549760043}],
 [{'process': {'name': 'xybraswwvlc.dll'}, '@timestamp': 1645549760044}],
 [{'process': {'name': 'rym.dll'}, '@timestamp': 1645549760044}]]

## Schema used for the generation

By default, the documents are generated according to the most recent version of ECS available in `detection_rules`. Any field not defined in ECS is assumed to be of type `keyword`.

In [8]:
SourceEvents.ecs_version

'1.12.1'

For fields that are neither defined in ECS nor of type `keyword`, the `custom_schema` class attribute is available; it is merged with the ECS schema when a `SourceEvents` object is created. Modifying this attribute affects only newly created objects, not existing ones.

The default value of `custom_schema` is chosen for testing purposes, so as to satisfy as many detection rules as possible.

In [9]:
SourceEvents.custom_schema

{'file.Ext.windows.zone_identifier': {'type': 'long'},
 'process.parent.Ext.real.pid': {'type': 'long'}}
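
The class-attribute behaviour described above can be illustrated with a toy model. This is a sketch under assumptions, not the actual `SourceEvents` code: the class `ToyEmitter`, the constant `BASE_SCHEMA`, and the method `field_type` are hypothetical stand-ins. Only the merge-at-creation idea and the `keyword` fallback come from the text.

```python
# Hypothetical stand-in for the ECS schema; the real one is much larger.
BASE_SCHEMA = {'destination.port': {'type': 'long'}}


class ToyEmitter:
    # Class attribute, consulted only when an instance is created.
    custom_schema = {'process.parent.Ext.real.pid': {'type': 'long'}}

    def __init__(self, schema=None):
        if schema is None:
            # The merge happens once, at creation time.
            self.schema = {**BASE_SCHEMA, **self.custom_schema}
        else:
            # An explicitly passed schema replaces the merged one.
            self.schema = dict(schema)

    def field_type(self, field):
        # Fields missing from the schema fall back to `keyword`.
        return self.schema.get(field, {'type': 'keyword'})['type']


old = ToyEmitter()
ToyEmitter.custom_schema = {**ToyEmitter.custom_schema, 'x': {'type': 'long'}}
new = ToyEmitter()

print(old.field_type('x'))  # keyword: created before the change
print(new.field_type('x'))  # long: created after the change
```

Because each object takes its own merged copy at creation, later changes to the class attribute cannot reach objects that already exist.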

It's also possible to completely override the schema by passing one at creation time:

In [10]:
se = SourceEvents(schema={"x": {"type": "long"}})
se.add_query('x > 0 and x < 100')
next(se)

[{'x': 39, '@timestamp': 1645549760065}]

## Mappings of the generated documents

The `SourceEvents` object can build mappings describing all the fields used in the loaded queries and rules. The same schema used for document generation is employed for this task.

In [18]:
rules = RuleCollection()
rules.load_file(Path("rules/network/command_and_control_port_26_activity.toml"))
rules.load_file(Path("rules/network/command_and_control_telnet_port_activity.toml"))

se = SourceEvents()
for rule in rules:
    se.add_rule(rule)

next(se)

[{'event': {'category': ['network_traffic']},
  'network': {'transport': 'tcp'},
  'destination': {'port': 23},
  '@timestamp': 1645550673085}]

## Query validation

Constraints that cannot be solved are reported with an error when the query is loaded:

In [36]:
try:
    SourceEvents.from_query('destination.port < 1024 and (destination.port > 512 or destination.port > 1024)')
except Exception as e:
    print(e)

Unsolvable constraints: destination.port (empty solution space, 1025 <= x <= 1023)
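
The bounds in the message can be reproduced by hand. The following is a minimal sketch, not the emitter's actual solver: it assumes integer fields and takes only the `destination.port < 1024` constraint together with the `destination.port > 1024` branch of the disjunction. The helper name `intersect_bounds` is hypothetical.

```python
def intersect_bounds(constraints):
    """Intersect strict integer comparisons into an inclusive (lo, hi) range.

    `constraints` is a list of (operator, value) pairs where the operator
    is '<' or '>'. Hypothetical helper, for illustration only.
    """
    lo, hi = float('-inf'), float('inf')
    for op, value in constraints:
        if op == '<':
            hi = min(hi, value - 1)  # x < v means x <= v - 1 for integers
        elif op == '>':
            lo = max(lo, value + 1)  # x > v means x >= v + 1 for integers
        else:
            raise ValueError(f'unsupported operator: {op}')
    return lo, hi


lo, hi = intersect_bounds([('<', 1024), ('>', 1024)])
if lo > hi:
    print(f'empty solution space, {lo} <= x <= {hi}')
```

With the other branch, `[('<', 1024), ('>', 512)]`, the same helper returns `(513, 1023)`, which is a non-empty range.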


## Using as iterator

The `SourceEvents` class implements the iterator protocol, as the earlier use of `next` to generate new documents suggested. Because documents can be generated indefinitely, the iterator is infinite and some care is needed.

Uses like `list(se)` or `for docs in se: print(docs)` are troublesome: the first would sooner or later exhaust the available memory, and the second would never terminate on its own. You need to decide how many documents you want, or on what other condition the loop should stop.

As an example, this prints 10 documents:

In [37]:
se = SourceEvents.from_query('process.name : (*.exe or *.dll)')

for docs in islice(se, 10):
    print(docs)

[{'process': {'name': 'see.exe'}, '@timestamp': 1645553255814}]
[{'process': {'name': 'bsougqsdhdw.dll'}, '@timestamp': 1645553255814}]
[{'process': {'name': 'z.dll'}, '@timestamp': 1645553255814}]
[{'process': {'name': 'ghzvahxmimxy.exe'}, '@timestamp': 1645553255814}]
[{'process': {'name': 'rdnsdccjprmlo.exe'}, '@timestamp': 1645553255814}]
[{'process': {'name': 'fwxxlane.exe'}, '@timestamp': 1645553255815}]
[{'process': {'name': 'l.exe'}, '@timestamp': 1645553255815}]
[{'process': {'name': 'kc.exe'}, '@timestamp': 1645553255815}]
[{'process': {'name': 'spfzowndforx.dll'}, '@timestamp': 1645553255815}]
[{'process': {'name': 'ftjkltrsj.exe'}, '@timestamp': 1645553255815}]
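
Besides a fixed count, the loop can also stop on a condition. The sketch below uses `itertools.takewhile` on a stand-in infinite generator, since the point is the iteration pattern rather than the emitter itself. `fake_source_events` is a hypothetical name and its output only imitates the shape of the documents above.

```python
from itertools import count, islice, takewhile


def fake_source_events():
    """Hypothetical infinite iterator yielding one-document batches."""
    for n in count():
        ext = 'exe' if n % 2 == 0 else 'dll'
        yield [{'process': {'name': f'proc{n}.{ext}'}}]


def name_is_not(target):
    """Build a predicate that is true until a document is named `target`."""
    return lambda docs: docs[0]['process']['name'] != target


# Stop at the first batch whose document is named 'proc3.dll'.
batches = list(takewhile(name_is_not('proc3.dll'), fake_source_events()))
print(len(batches))  # 3

# Wrapping the result in islice caps the loop even if the stop
# condition is never met.
capped = list(islice(takewhile(name_is_not('never'), fake_source_events()), 1000))
print(len(capped))  # 1000
```

Combining a stop condition with an upper bound is a cheap safeguard whenever the condition depends on randomly generated values.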
