This example demonstrates calculating [relative risks](https://en.wikipedia.org/wiki/Relative_risk)
between evidence and outcomes.

Relative risks are useful for quickly assessing associations between different health indicators.

In [1]:
# Not necessary, but useful for live-reloading changes to Obsinthe itself.
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
from obsinthe.utils.relative_risks import RelativeRisks
from obsinthe.prometheus.data import one_hot_encode

We simulate having different clusters with different versions of software installed.

In [3]:
evidence = pd.DataFrame(
    columns=["instance", "k8s-version", "elasticsearch-version"],
    data=[
        ["cluster-1", "1.30", "8.14.1"],
        ["cluster-2", "1.29", "8.13.0"],
        ["cluster-3", "1.29", "8.13.0"],
        ["cluster-4", "1.30", "8.13.0"],
        ["cluster-5", "1.29", "8.14.1"],
        ["cluster-6", "1.29", "8.14.1"],
        ["cluster-7", "1.30", "8.13.0"],
        ["cluster-8", "1.30", "8.13.0"],
        ["cluster-9", "1.30", "8.14.0"],
        ["cluster-10", "1.29", "8.14.0"],
    ],
)
evidence

,instance,k8s-version,elasticsearch-version
0,cluster-1,1.30,8.14.1
1,cluster-2,1.29,8.13.0
2,cluster-3,1.29,8.13.0
3,cluster-4,1.30,8.13.0
4,cluster-5,1.29,8.14.1
5,cluster-6,1.29,8.14.1
6,cluster-7,1.30,8.13.0
7,cluster-8,1.30,8.13.0
8,cluster-9,1.30,8.14.0
9,cluster-10,1.29,8.14.0


Outcomes are observations of particular health indicators, such as alerts.

In [4]:
outcomes = pd.DataFrame(
    columns=["instance", "alert"],
    data=[
        ["cluster-1", "ElasticsearchClusterNotHealthy"],
        ["cluster-1", "AlertmanagerReceiversNotConfigured"],
        ["cluster-2", "AlertmanagerReceiversNotConfigured"],
        ["cluster-3", "AlertmanagerReceiversNotConfigured"],
        ["cluster-4", "ElasticsearchClusterNotHealthy"],
        ["cluster-5", "ElasticsearchClusterNotHealthy"],
        ["cluster-6", "AlertmanagerReceiversNotConfigured"],
        ["cluster-8", "AlertmanagerReceiversNotConfigured"],
        ["cluster-9", "AlertmanagerReceiversNotConfigured"],
    ],
)
outcomes

,instance,alert
0,cluster-1,ElasticsearchClusterNotHealthy
1,cluster-1,AlertmanagerReceiversNotConfigured
2,cluster-2,AlertmanagerReceiversNotConfigured
3,cluster-3,AlertmanagerReceiversNotConfigured
4,cluster-4,ElasticsearchClusterNotHealthy
5,cluster-5,ElasticsearchClusterNotHealthy
6,cluster-6,AlertmanagerReceiversNotConfigured
7,cluster-8,AlertmanagerReceiversNotConfigured
8,cluster-9,AlertmanagerReceiversNotConfigured


We convert the data to a one-hot encoding using the `one_hot_encode` utility function.

In [5]:
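# Encode each evidence column separately, then join the results on the instance index.
k8s_one_hot = one_hot_encode(evidence, "instance", "k8s-version", prefix="k8s-version:")
es_one_hot = one_hot_encode(
    evidence, "instance", "elasticsearch-version", prefix="elasticsearch-version:"
)
evidence_one_hot = k8s_one_hot.join(es_one_hot)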
evidence_one_hot

instance,k8s-version:1.29,k8s-version:1.30,elasticsearch-version:8.13.0,elasticsearch-version:8.14.0,elasticsearch-version:8.14.1
cluster-1,0,1,0,0,1
cluster-10,1,0,0,1,0
cluster-2,1,0,1,0,0
cluster-3,1,0,1,0,0
cluster-4,0,1,1,0,0
cluster-5,1,0,0,0,1
cluster-6,1,0,0,0,1
cluster-7,0,1,1,0,0
cluster-8,0,1,1,0,0
cluster-9,0,1,0,1,0


In [6]:
outcomes_one_hot = one_hot_encode(outcomes, "instance", "alert")
outcomes_one_hot

alert,AlertmanagerReceiversNotConfigured,ElasticsearchClusterNotHealthy
instance,,
cluster-1,1,1
cluster-2,1,0
cluster-3,1,0
cluster-4,0,1
cluster-5,0,1
cluster-6,1,0
cluster-8,1,0
cluster-9,1,0
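
For intuition, the shape of the one-hot tables above can be reproduced with plain pandas. The sketch below uses `pd.crosstab` on a three-row subset of the outcomes; it only illustrates the idea and is not necessarily how `one_hot_encode` is implemented in Obsinthe.

```python
import pandas as pd

# A small subset of the outcomes table from above.
outcomes = pd.DataFrame(
    columns=["instance", "alert"],
    data=[
        ["cluster-1", "ElasticsearchClusterNotHealthy"],
        ["cluster-1", "AlertmanagerReceiversNotConfigured"],
        ["cluster-2", "AlertmanagerReceiversNotConfigured"],
    ],
)

# Count each alert per instance, then clip to 0/1 so that repeated
# observations of the same alert still yield a binary indicator.
one_hot = pd.crosstab(outcomes["instance"], outcomes["alert"]).clip(upper=1)
print(one_hot)
```

Each row is an instance and each column an alert name, with 1 marking that the alert was observed on that instance.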


Finally, we calculate the relative risks.

To interpret the result: instances with `elasticsearch-version:8.14.1` are about 4.7x more likely to hit `ElasticsearchClusterNotHealthy` than instances without this version.

It's important to choose the right set of evidence: for example, if we know `ElasticsearchClusterNotHealthy` is only relevant for `elasticsearch` deployments,
it's a good idea to limit the instances to clusters with `elasticsearch` installed.
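
One possible way to restrict the instances is to filter both one-hot tables by their index before computing the risks. The following is a minimal sketch on toy data: the frames and the `es_instances` list are hypothetical and stand in for whatever source tells you where `elasticsearch` is deployed.

```python
import pandas as pd

# Toy one-hot frames indexed by instance (hypothetical data for illustration).
evidence_one_hot = pd.DataFrame(
    {
        "elasticsearch-version:8.13.0": [1, 0, 0],
        "elasticsearch-version:8.14.1": [0, 1, 0],
    },
    index=pd.Index(["cluster-1", "cluster-2", "cluster-3"], name="instance"),
)
outcomes_one_hot = pd.DataFrame(
    {"ElasticsearchClusterNotHealthy": [0, 1]},
    index=pd.Index(["cluster-1", "cluster-2"], name="instance"),
)

# Hypothetical list of instances known to run elasticsearch.
es_instances = ["cluster-1", "cluster-2"]

# Keep only the rows belonging to those instances in both tables.
evidence_es = evidence_one_hot.loc[evidence_one_hot.index.isin(es_instances)]
outcomes_es = outcomes_one_hot.loc[outcomes_one_hot.index.isin(es_instances)]
```

The filtered frames can then be used in place of the unfiltered ones, so that clusters without `elasticsearch` do not dilute the comparison group.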

In [9]:
rr = RelativeRisks(evidence_one_hot, outcomes_one_hot)
rr.calculate()
rr.rr

outcomes,AlertmanagerReceiversNotConfigured,ElasticsearchClusterNotHealthy
evidence,,
k8s-version:1.29,1.0,0.5
k8s-version:1.30,1.0,2.0
elasticsearch-version:8.13.0,1.0,0.5
elasticsearch-version:8.14.0,0.8,0.0
elasticsearch-version:8.14.1,1.166667,4.666667
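
As a sanity check, the 4.666667 value for `elasticsearch-version:8.14.1` and `ElasticsearchClusterNotHealthy` can be reproduced by hand from the standard definition: the share of exposed instances with the outcome divided by the share of unexposed instances with the outcome. The sketch below assumes that clusters missing from the outcomes table (`cluster-7` and `cluster-10`) count as having no alert, which is consistent with the values shown; it does not use `RelativeRisks` itself.

```python
import pandas as pd

instances = [f"cluster-{i}" for i in range(1, 11)]

# Clusters running elasticsearch 8.14.1 (the exposed group), per the evidence table.
exposed = pd.Series(False, index=instances)
exposed.loc[["cluster-1", "cluster-5", "cluster-6"]] = True

# Clusters that fired ElasticsearchClusterNotHealthy, per the outcomes table.
# Assumption: clusters absent from the outcomes table had no alert.
alerted = pd.Series(False, index=instances)
alerted.loc[["cluster-1", "cluster-4", "cluster-5"]] = True

risk_exposed = alerted[exposed].mean()     # 2 of 3 exposed clusters alerted
risk_unexposed = alerted[~exposed].mean()  # 1 of 7 unexposed clusters alerted
relative_risk = float(risk_exposed / risk_unexposed)
print(round(relative_risk, 6))  # 4.666667
```

A value of 1.0 means the evidence makes no difference to the outcome rate, values above 1.0 mean the outcome is more frequent with the evidence present, and values below 1.0 mean it is less frequent.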


See the docstring for `RelativeRisks` for more details.