# Optimisation Functions Example

This notebook contains an example of how optimisation functions can be applied to a dataset, how they can be used in other ARGO modules and how to create your own.

## Requirements

To run, you'll need the following:

* The Optimisation package installed (see the readme for more information).
* A dataset containing binary predictor columns and a binary target column.

----

## Import packages

In [1]:
from rule_optimisation.optimisation_functions import Precision, Recall, FScore, Revenue, AlertsPerDay, PercVolume

import pandas as pd
import numpy as np

## Create data

Let's create some dummy predictor columns and a binary target column. For this example, let's assume the dummy predictor columns represent rules that have been applied to a dataset.

In [2]:
np.random.seed(0)

y_pred = pd.Series(np.random.randint(0, 2, 1000))
y = pd.Series(np.random.randint(0, 2, 1000))
amounts = pd.Series(np.random.randint(0, 1000, 1000))

----

## Apply optimisation functions

The example applies four supervised optimisation functions and two unsupervised optimisation functions:

**Supervised optimisation functions**

* Precision score
* Recall score
* Fbeta score
* Revenue

**Unsupervised optimisation functions**

* Alerts per day (calculates the negative squared difference between the daily number of records a rule flags and the targeted daily number of records flagged)
* Percentage of volume (calculates the negative squared difference between the percentage of the overall volume that the rule flags and the targeted percentage of volume flagged)
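
Both unsupervised metrics are negative squared differences, so 0 is the best possible score and larger misses are penalised quadratically. The sketch below restates the two descriptions above in plain Python. The helper names `alerts_per_day_score` and `perc_volume_score` are hypothetical, and this is an illustration of the stated formulas rather than the package's implementation.

```python
import pandas as pd


def alerts_per_day_score(y_pred: pd.Series, n_alerts_expected_per_day: float,
                         no_of_days_in_file: int) -> float:
    """Negative squared difference between observed and targeted daily alerts."""
    # Hypothetical helper: average number of records flagged per day
    alerts_per_day = y_pred.sum() / no_of_days_in_file
    return float(-((alerts_per_day - n_alerts_expected_per_day) ** 2))


def perc_volume_score(y_pred: pd.Series, perc_vol_expected: float) -> float:
    """Negative squared difference between observed and targeted flagged share."""
    # Hypothetical helper: share of all records that the rule flags
    perc_vol = y_pred.mean()
    return float(-((perc_vol - perc_vol_expected) ** 2))


# A rule that flags 504 of 1,000 records in a file covering 10 days
flags = pd.Series([1] * 504 + [0] * 496)
apd_score = alerts_per_day_score(flags, n_alerts_expected_per_day=5, no_of_days_in_file=10)
pv_score = perc_volume_score(flags, perc_vol_expected=0.02)
print(round(apd_score, 2))  # -2061.16
print(round(pv_score, 6))   # -0.234256
```

These values equal the *AlertsPerDay* and *PercVolume* results reported in the Outputs section, which is consistent with the dummy rule flagging 504 of the 1,000 records.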

**Note that the *FScore*, *Precision* and *Recall* classes are ~100 times faster on larger datasets than the equivalent functions in Sklearn's *metrics* module.**

### Instantiate class and run fit method

We can run the *.fit()* method to calculate the optimisation metric for each column in the dataset.

#### Supervised optimisation functions

##### Precision score

In [3]:
precision = Precision()
rule_precision = precision.fit(y_true=y, y_pred=y_pred, sample_weight=None)

##### Recall score

In [4]:
recall = Recall()
rule_recall = recall.fit(y_true=y, y_pred=y_pred, sample_weight=None)

##### Fbeta score (here, beta=1)

In [5]:
f1 = FScore(beta=1)
rule_f1 = f1.fit(y_true=y, y_pred=y_pred, sample_weight=None)
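
For reference, the three scores above can all be derived from the same confusion counts. The following is a minimal NumPy sketch of the standard definitions, not the package's own code. The name `fbeta_from_counts` is hypothetical, and sample weights are ignored for simplicity.

```python
import numpy as np


def fbeta_from_counts(y_true, y_pred, beta: float = 1.0):
    """Return (precision, recall, F-beta) for binary arrays, ignoring weights."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = float((y_true * y_pred).sum())  # flagged and actually positive
    n_flagged = float(y_pred.sum())      # TP + FP
    n_positive = float(y_true.sum())     # TP + FN
    precision = tp / n_flagged if n_flagged else 0.0
    recall = tp / n_positive if n_positive else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # F-beta is the weighted harmonic mean of precision and recall
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


y_true_demo = np.array([1, 1, 0, 0, 1])
y_pred_demo = np.array([1, 0, 1, 0, 1])
p, r, f = fbeta_from_counts(y_true_demo, y_pred_demo, beta=1)
```

In this toy example two of the three flagged records are true positives and two of the three positives are flagged, so precision, recall and F1 all come out at two thirds.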

##### Revenue

In [6]:
rev = Revenue(y_type='Fraud', chargeback_multiplier=2)
rule_rev = rev.fit(y_true=y, y_pred=y_pred, sample_weight=amounts)

#### Unsupervised optimisation functions

##### Alerts per day

In [7]:
apd = AlertsPerDay(n_alerts_expected_per_day=5, no_of_days_in_file=10)
rule_apd = apd.fit(y_pred=y_pred)

##### Percentage of volume

In [8]:
pv = PercVolume(perc_vol_expected=0.02)
rule_pv = pv.fit(y_pred=y_pred)

### Outputs

The *.fit()* method returns the optimisation metric defined by the class:

In [9]:
rule_precision

0.5376984126984127

In [10]:
rule_recall

0.5262135922330097

In [11]:
rule_f1

0.5318940137389597

In [12]:
rule_rev

35645

In [13]:
rule_apd

-2061.16

In [14]:
rule_pv

-0.234256

The *.fit()* method can be passed as an argument to various ARGO modules (wherever the *opt_func* parameter appears). For example, in the *RuleGeneratorOpt* module, the method you pass sets the metric used to optimise the rules.
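
The ARGO module signatures are not shown in this notebook, so the sketch below uses stand-ins to illustrate the pattern only. `SimplePrecision` and `rank_rules` are hypothetical and are not part of ARGO. The point is that the bound *.fit* method is passed as a callable, and the consumer calls it once per rule column.

```python
from typing import Callable, Optional

import pandas as pd


class SimplePrecision:
    """Stand-in metric with the same .fit() signature as the classes above."""

    def fit(self, y_true: pd.Series, y_pred: pd.Series,
            sample_weight: Optional[pd.Series] = None) -> float:
        n_flagged = y_pred.sum()
        return float((y_true * y_pred).sum() / n_flagged) if n_flagged else 0.0


def rank_rules(X_rules: pd.DataFrame, y: pd.Series, opt_func: Callable) -> pd.Series:
    """Hypothetical consumer: score every rule column with opt_func, best first."""
    scores = {
        rule: opt_func(y_true=y, y_pred=X_rules[rule], sample_weight=None)
        for rule in X_rules.columns
    }
    return pd.Series(scores).sort_values(ascending=False)


X_rules = pd.DataFrame({
    'rule_a': [1, 1, 0, 0],
    'rule_b': [1, 0, 1, 1],
})
y_demo = pd.Series([1, 1, 0, 1])

# Pass the bound method itself (no parentheses), not the result of calling it
ranking = rank_rules(X_rules, y_demo, opt_func=SimplePrecision().fit)
```

Here `rule_a` ranks first because both of the records it flags are positive, while only two of the three records flagged by `rule_b` are.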

----

## Creating your own optimisation function

Say we want to create a class which calculates the positive likelihood ratio (true positive rate / false positive rate).

The class needs a *.fit()* method which takes three arguments: the binary target, the binary predictor and any event-specific weights to apply. This method should return a single numeric value.

In [15]:
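class PositiveLikelihoodRatio:

    def fit(self, y_true: np.ndarray, y_pred: np.ndarray, sample_weight: np.ndarray = None) -> float:
        # sample_weight is accepted to match the expected signature but is not used here
        is_negative = np.where(y_true == 0, 1, 0)
        n_positive = y_true.sum()
        n_negative = is_negative.sum()
        # Guard against division by zero when one of the classes is absent
        if n_positive == 0 or n_negative == 0:
            return 0.0
        # TPR: share of actual positives that are flagged
        tpr = (y_true * y_pred).sum() / n_positive
        # FPR: share of actual negatives that are flagged
        fpr = (is_negative * y_pred).sum() / n_negative
        if tpr == 0 or fpr == 0:
            return 0.0
        return tpr / fpr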

We can then apply the *.fit()* method to the dataset to check it works:

In [16]:
plr = PositiveLikelihoodRatio()
rule_plr = plr.fit(y_true=y, y_pred=y_pred, sample_weight=None)

In [17]:
rule_plr

1.0953373057210716
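
The class above accepts *sample_weight* but does not use it. If event-specific weights should influence the ratio, one possible approach is to let each record count *sample_weight* times in both rates. The following is a sketch under that assumption. `WeightedPositiveLikelihoodRatio` is a hypothetical name, and treating a missing weight vector as all ones is an assumption too.

```python
import numpy as np


class WeightedPositiveLikelihoodRatio:
    """Sketch of a PLR variant where each record counts sample_weight times."""

    def fit(self, y_true, y_pred, sample_weight=None) -> float:
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        # Assumption: no weights supplied means every record has weight 1
        w = np.ones(len(y_true)) if sample_weight is None else np.asarray(sample_weight)
        positives = (w * y_true).sum()
        negatives = (w * (1 - y_true)).sum()
        # Guard against division by zero when one class carries no weight
        if positives == 0 or negatives == 0:
            return 0.0
        tpr = (w * y_true * y_pred).sum() / positives
        fpr = (w * (1 - y_true) * y_pred).sum() / negatives
        if tpr == 0 or fpr == 0:
            return 0.0
        return float(tpr / fpr)


y_true_demo = np.array([1, 1, 0, 0])
y_pred_demo = np.array([1, 0, 1, 0])
wplr = WeightedPositiveLikelihoodRatio()
unweighted = wplr.fit(y_true_demo, y_pred_demo)
weighted = wplr.fit(y_true_demo, y_pred_demo, sample_weight=np.array([3, 1, 1, 1]))
```

Giving the correctly flagged positive a weight of 3 raises the weighted TPR from 0.5 to 0.75 while the FPR stays at 0.5, so the ratio moves from 1.0 to 1.5.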

Finally, after instantiating the class, we can pass the *.fit()* method to a relevant ARGO module. For example, we can pass it to the *opt_func* parameter of the *RuleGeneratorOpt* class so that the generated rules maximise the positive likelihood ratio.

----

## The End

That's it folks - if you have any queries or suggestions please put them in the *#sim-datatools-help* Slack channel or email James directly.