# ARGO Simility Rule Applier Example

This notebook contains an example of how the ARGO Simility Rule Applier can be used to apply the Simility rules present in a dataset (usually contained in the *sim_ll* column).

## Requirements

To run, you'll need the following:

* The Rule Applier package installed - see the readme for more information.
* A dataset containing the *sim_ll* (or equivalent) column.

----

## Import packages

In [2]:
from rule_application.sim_rule_applier import SimRuleApplier

import pandas as pd

## Read in data

Let's read in some dummy data. Note this data must contain the *sim_ll* (or equivalent) column.

In [3]:
X = pd.read_csv('dummy_data/dummy_sim_ll_data.csv', usecols=['eid', 'sim_ll'], index_col='eid')
y = pd.read_csv('dummy_data/dummy_sim_ll_data.csv', usecols=['eid', 'sim_is_fraud'], index_col='eid').squeeze()
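
The cell above parses the same file twice. As a minimal sketch of an alternative, the file could be read once and then split into features and target. The in-memory CSV below is a hypothetical stand-in for the dummy data file, and its *sim_ll* values are placeholders rather than the real Simility format:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file. The sim_ll values are placeholders,
# not the real Simility format.
csv_text = """eid,sim_ll,sim_is_fraud
1,placeholder_a,0
2,placeholder_b,1
3,placeholder_c,0
"""

# Read the file once, indexed by eid.
data = pd.read_csv(io.StringIO(csv_text), index_col='eid')

# Features: keep as a one-column dataframe.
X = data[['sim_ll']]
# Target: selecting a single column already gives a Series, so squeeze() is not needed.
y = data['sim_is_fraud']
```

Both approaches should give the same *X* and *y*; reading once simply avoids parsing the file a second time.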

----

## Apply rules

### Set up class parameters

Now we can set our class parameters for the Rule Applier. Here we're specifying an additional metric to calculate for each rule (the F1 score). However, you can omit this if you just need to calculate the standard results (Precision, Recall and PercDataFlagged).

**Please see the class docstring for more information on each parameter.**

In [4]:
from rule_optimisation.optimisation_functions import FScore
fs = FScore(beta=1)

In [5]:
params = {
    'opt_func': fs.fit,
    'sim_ll_column': 'sim_ll'
}

### Instantiate class and run apply method

Once the parameters have been set, we can instantiate the class and run the *.apply()* method to apply the Simility rules to the dataset. **Note that you can omit the *y* parameter if you have unlabelled data. However, if you are providing an optimisation function to *opt_func*, ensure that it does not expect a target column - see the *optimisation_functions* module in the *rule_optimisation* sub-package for more information.**

In [6]:
sra = SimRuleApplier(**params)
X_rules = sra.apply(X=X, y=y, sample_weight=None)

### Outputs

The *.apply()* method returns a dataframe of binary columns, one per rule, showing which records each rule flagged (1) or did not flag (0) in the dataset.

A useful attribute created by running the *.apply()* method (when the *y* parameter is given) is:

* rule_descriptions: A dataframe showing the logic of the rules and their performance metrics as applied to the dataset.

In [7]:
sra.rule_descriptions.head()

Rule,Precision,Recall,PercDataFlagged,OptMetric
A,0.375,1.0,0.8,0.545455
B,0.375,1.0,0.8,0.545455
D,0.333333,0.333333,0.3,0.333333
C,0.0,0.0,0.1,0.0
E,0.0,0.0,0.1,0.0
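
As a quick sanity check, the *OptMetric* column can be reproduced from the *Precision* and *Recall* columns. The sketch below assumes that `FScore(beta=1)` computes the standard F1 score; the helper `f_beta` is a hypothetical function written for illustration and is not part of the package:

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; returns 0.0 when the denominator is zero."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Precision and Recall values taken from the table above.
f1_a = f_beta(0.375, 1.0)    # rule A
f1_d = f_beta(1 / 3, 1 / 3)  # rule D
f1_c = f_beta(0.0, 0.0)      # rule C

print(round(f1_a, 6), round(f1_d, 6), f1_c)  # 0.545455 0.333333 0.0
```

These values agree with the *OptMetric* entries for rules A, D and C.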


In [8]:
X_rules.head()

Rule,A,B,D,C,E
0,1,1,0,0,0
1,1,0,0,1,0
2,0,1,1,0,0
3,1,1,0,0,0
4,1,0,1,0,0
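
To illustrate how binary rule columns like these relate to the standard results, here is a minimal sketch using toy data. The rule flags and labels are made up for illustration, and the formulas are the conventional definitions of precision, recall and percentage of data flagged; the package's own implementation may differ:

```python
import pandas as pd

# Toy binary rule columns and fraud labels (hypothetical values).
rules = pd.DataFrame({'A': [1, 1, 0, 1, 1], 'B': [0, 0, 1, 0, 0]})
y = pd.Series([1, 0, 0, 1, 0])

# Count the flagged records that are actually fraud, per rule.
true_positives = rules.mul(y, axis=0).sum()
# Count all records flagged by each rule.
flagged = rules.sum()

summary = pd.DataFrame({
    # Share of flagged records that are fraud (0.0 if a rule flags nothing).
    'Precision': (true_positives / flagged).fillna(0.0),
    # Share of fraud records that the rule flags.
    'Recall': true_positives / y.sum(),
    # Share of all records that the rule flags.
    'PercDataFlagged': rules.mean(),
})
print(summary)
```

Here rule A flags four of the five records and catches both fraud cases, giving a precision of 0.5, a recall of 1.0 and 0.8 of the data flagged.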


----

## The End

That's it folks - if you have any queries or suggestions please put them in the *#sim-datatools-help* Slack channel or email James directly.