## Evaluation of prediction results

 <a target="_blank" href="https://colab.research.google.com/github/moj-analytical-services/splink/blob/splink4_dev/docs/demos/tutorials/07_Quality_assurance.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In the previous tutorial, we looked at various ways to visualise the results of our model.
These are useful for evaluating a linkage pipeline because they allow us to understand how our model works and verify that it is doing something sensible. They can also be useful to identify examples where the model is not performing as expected.

In addition to these spot checks, Splink also has functions to perform more formal accuracy analysis. These functions allow you to understand the likely prevalence of false positives and false negatives in your linkage models.

They rely on the existence of a sample of labelled (ground truth) record pairs, marked as matches or non-matches, which may have been produced, for example, by human clerical review. For the accuracy analysis to be unbiased, the sample should be representative of the overall dataset.


```python
# Uncomment and run this cell if you're running in Google Colab.
# !pip install git+https://github.com/moj-analytical-services/splink.git@splink4_dev
```

```python
# Rerun our predictions so we're ready to view the charts
import pandas as pd

from splink import DuckDBAPI, Linker, splink_datasets

pd.options.display.max_columns = 1000

db_api = DuckDBAPI()
df = splink_datasets.fake_1000
```

```python
import json
import urllib.request

url = "https://raw.githubusercontent.com/moj-analytical-services/splink_demos/master/demo_settings/saved_model_from_demo.json"

# Load the previously trained model settings from the demo repository
with urllib.request.urlopen(url) as u:
    settings = json.loads(u.read().decode())

# Reuse the DuckDB API created above, and discard pairs below a match probability of 0.2
linker = Linker(df, settings, database_api=db_api)
df_predictions = linker.predict(threshold_match_probability=0.2)
```


```text
You have called predict(), but there are some parameter estimates which have neither been estimated or specified in your settings dictionary.  To produce predictions the following untrained trained parameters will use default values.
Comparison: 'email':
    m values not fully trained
```


## Load in labels

The labels file contains a list of pairwise comparisons which represent matches and non-matches.

The required format of the labels file is described [here](https://moj-analytical-services.github.io/splink/linkerqa.html#splink.linker.Linker.roc_chart_from_labels).
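As a rough illustration, a labels table is a list of labelled record pairs. The sketch below assumes the columns `source_dataset_l`, `unique_id_l`, `source_dataset_r`, `unique_id_r` and `clerical_match_score` (a score between 0 and 1, where 1 means a definite match); check the linked documentation for the exact requirements of your Splink version. The record IDs and scores are invented, and `check_label_rows` is a hypothetical helper, not part of Splink.

```python
# Assumed column names for a Splink labels table (check the docs for your version)
REQUIRED_COLUMNS = {
    "source_dataset_l",
    "unique_id_l",
    "source_dataset_r",
    "unique_id_r",
    "clerical_match_score",
}

# Hypothetical labels: record IDs and scores are invented for illustration only
labels = [
    {"source_dataset_l": "fake_1000", "unique_id_l": 0,
     "source_dataset_r": "fake_1000", "unique_id_r": 1,
     "clerical_match_score": 1.0},  # labelled as a match
    {"source_dataset_l": "fake_1000", "unique_id_l": 0,
     "source_dataset_r": "fake_1000", "unique_id_r": 5,
     "clerical_match_score": 0.0},  # labelled as a non-match
]


def check_label_rows(rows):
    """Hypothetical helper: return a list of problems found in the label rows."""
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
            continue
        score = row["clerical_match_score"]
        if not 0.0 <= score <= 1.0:
            problems.append(f"row {i}: score {score} outside [0, 1]")
    return problems


print(check_label_rows(labels))  # prints []
```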


```python
from splink.datasets import splink_dataset_labels

df_labels = splink_dataset_labels.fake_1000_labels
df_labels.head(5)
# Register the labels with the linker's database so they can be used in accuracy analysis
labels_table = linker.register_labels_table(df_labels)
```

## Threshold selection chart

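Splink includes an interactive dashboard that shows how key accuracy statistics change as the match threshold varies. Here, `add_metrics=["f1"]` adds the F1 score to the statistics shown: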

```python
linker.accuracy_analysis_from_labels_table(
    labels_table, output_type="threshold_selection", add_metrics=["f1"]
)
```
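To make the idea behind threshold selection concrete, the sketch below sweeps candidate match-weight thresholds over a handful of invented labelled pairs and keeps the one with the highest F1 score. It assumes, for illustration, that a pair is predicted as a match when its match weight is strictly above the threshold. The data and the helpers `confusion_counts`, `f1_score` and `best_f1_threshold` are hypothetical; this is not how Splink builds the chart internally.

```python
# Invented (match_weight, is_true_match) pairs, for illustration only
scored_labels = [
    (-9.0, False), (-4.5, False), (-1.0, True), (0.5, False),
    (2.0, True), (3.5, True), (6.0, True), (8.0, True),
]


def confusion_counts(pairs, threshold):
    """Count (tp, fp, fn, tn) when pairs scoring above `threshold` are predicted matches."""
    tp = fp = fn = tn = 0
    for weight, is_match in pairs:
        predicted = weight > threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2tp / (2tp + fp + fn); 0.0 when there are no actual or predicted matches."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(pairs):
    """Try each observed weight as a threshold and return (threshold, best F1)."""
    best = None
    for threshold in sorted({w for w, _ in pairs}):
        tp, fp, fn, _ = confusion_counts(pairs, threshold)
        score = f1_score(tp, fp, fn)
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


threshold, f1 = best_f1_threshold(scored_labels)
print(threshold, round(f1, 3))  # prints -4.5 0.909
```

In this toy data, raising the threshold above -4.5 starts losing true matches faster than it removes false positives, which is the trade-off the dashboard lets you explore interactively.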

## Receiver operating characteristic curve

A [ROC chart](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) shows how the true positive rate (recall) and the false positive rate vary depending on the match threshold chosen, and hence the trade-off between false negatives and false positives. The match threshold is the match weight used as a cutoff: pairwise comparisons scoring above it are accepted as matches.
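Because the threshold is expressed as a match weight, it can help to convert between match weights and match probabilities. A match weight is the base-2 logarithm of the odds of a match, so the probability is 2^w / (1 + 2^w). The functions below are a hand-rolled illustration of that relationship, not Splink functions. The checks use thresholds from the truth table later in this tutorial, and show that the `threshold_match_probability=0.2` passed to `predict()` earlier corresponds to a match weight of -2.

```python
import math


def match_weight_to_probability(weight):
    """Convert a match weight (log2 of the match odds) to a match probability."""
    odds = 2 ** weight
    return odds / (1 + odds)


def probability_to_match_weight(probability):
    """Convert a match probability in (0, 1) back to a match weight."""
    return math.log2(probability / (1 - probability))


# Thresholds taken from the truth table; the probabilities agree with its match_probability column
for weight in (-12.3, -5.8, -3.2):
    print(weight, round(match_weight_to_probability(weight), 6))
# prints -12.3 0.000198, then -5.8 0.017632, then -3.2 0.098139

# The threshold_match_probability=0.2 passed to predict() as a match weight
print(round(probability_to_match_weight(0.2), 6))  # prints -2.0
```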


```python
linker.accuracy_analysis_from_labels_table(
    labels_table, output_type="roc"
)
```

## Precision-recall chart

An alternative way to represent the same information is a [precision-recall curve](https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves), which plots precision against recall as the match threshold varies.

This can be plotted as follows:


```python
linker.accuracy_analysis_from_labels_table(
    labels_table, output_type="precision_recall"
)
```
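The points on a precision-recall curve come from counting labelled pairs at each candidate threshold. The sketch below computes those points for a few invented labelled pairs, assuming a pair is a predicted match when its match weight is strictly above the threshold. The data and the `precision_recall_points` helper are hypothetical and only illustrate the idea.

```python
# Invented (match_weight, is_true_match) pairs, for illustration only
scored_labels = [
    (-6.0, False), (-2.5, True), (-1.0, False),
    (1.5, True), (4.0, True), (7.5, True),
]


def precision_recall_points(pairs):
    """Return (threshold, precision, recall) for each observed weight used as a threshold.

    Precision is undefined when nothing is predicted as a match, so such thresholds are skipped.
    """
    total_matches = sum(1 for _, is_match in pairs if is_match)
    points = []
    for threshold in sorted({w for w, _ in pairs}):
        # Labels of the pairs predicted as matches at this threshold
        predicted = [is_match for w, is_match in pairs if w > threshold]
        if not predicted:
            continue
        tp = sum(predicted)  # True counts as 1
        points.append((threshold, tp / len(predicted), tp / total_matches))
    return points


for threshold, precision, recall in precision_recall_points(scored_labels):
    print(f"threshold={threshold:+.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Lowering the threshold from -1.0 to -2.5 in this toy data admits one more non-match, so precision falls from 1.00 to 0.75 with no gain in recall.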

## Truth table

Finally, Splink can also report the underlying table used to construct the ROC and precision-recall curves.


```python
roc_table = linker.accuracy_analysis_from_labels_table(labels_table, output_type="table")
roc_table.as_pandas_dataframe(limit=5)
```

| | truth_threshold | match_probability | total_clerical_labels | p | n | tp | tn | fp | fn | P_rate | N_rate | tp_rate | tn_rate | fp_rate | fn_rate | precision | recall | specificity | npv | accuracy | f1 | f2 | f0_5 | p4 | phi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -12.3 | 0.000198 | 3176.0 | 2031.0 | 1145.0 | 1027.0 | 1145.0 | 0.0 | 1004.0 | 0.639484 | 0.360516 | 0.505662 | 1.0 | 0.0 | 0.494338 | 1.0 | 0.505662 | 1.0 | 0.532806 | 0.683879 | 0.671681 | 0.561141 | 0.836455 | 0.68324 | 0.519057 |
| 1 | -5.8 | 0.017632 | 3176.0 | 2031.0 | 1145.0 | 1026.0 | 1145.0 | 0.0 | 1005.0 | 0.639484 | 0.360516 | 0.50517 | 1.0 | 0.0 | 0.49483 | 1.0 | 0.50517 | 1.0 | 0.532558 | 0.683564 | 0.671246 | 0.560656 | 0.836186 | 0.682913 | 0.518683 |
| 2 | -4.7 | 0.037048 | 3176.0 | 2031.0 | 1145.0 | 1025.0 | 1145.0 | 0.0 | 1006.0 | 0.639484 | 0.360516 | 0.504677 | 1.0 | 0.0 | 0.495323 | 1.0 | 0.504677 | 1.0 | 0.532311 | 0.683249 | 0.670812 | 0.560171 | 0.835916 | 0.682586 | 0.51831 |
| 3 | -3.7 | 0.071449 | 3176.0 | 2031.0 | 1145.0 | 1023.0 | 1145.0 | 0.0 | 1008.0 | 0.639484 | 0.360516 | 0.503693 | 1.0 | 0.0 | 0.496307 | 1.0 | 0.503693 | 1.0 | 0.531816 | 0.68262 | 0.669941 | 0.5592 | 0.835375 | 0.681932 | 0.517563 |
| 4 | -3.2 | 0.098139 | 3176.0 | 2031.0 | 1145.0 | 1017.0 | 1145.0 | 0.0 | 1014.0 | 0.639484 | 0.360516 | 0.500739 | 1.0 | 0.0 | 0.499261 | 1.0 | 0.500739 | 1.0 | 0.530338 | 0.68073 | 0.667323 | 0.556285 | 0.833743 | 0.679967 | 0.515326 |
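The metric columns in the truth table are derived from the confusion counts `tp`, `tn`, `fp` and `fn`. As a check, the sketch below recomputes several of them from the first row (truth threshold -12.3) using the standard definitions. It is a plain-Python illustration, not Splink's implementation.

```python
# Confusion counts from the first row of the truth table (truth_threshold = -12.3)
tp, tn, fp, fn = 1027, 1145, 0, 1004

p = tp + fn      # labelled matches
n = tn + fp      # labelled non-matches
total = p + n    # total clerical labels

metrics = {
    "precision": tp / (tp + fp),        # share of predicted matches that are true matches
    "recall": tp / p,                   # share of true matches found (tp_rate)
    "specificity": tn / n,              # share of non-matches correctly rejected (tn_rate)
    "npv": tn / (tn + fn),              # share of predicted non-matches that are true non-matches
    "accuracy": (tp + tn) / total,
    "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
}

for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The printed values (precision 1.000000, recall 0.505662, specificity 1.000000, npv 0.532806, accuracy 0.683879, f1 0.671681) agree with row 0, and `p`, `n` and `total` reproduce its 2031, 1145 and 3176.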


!!! note "Further Reading"
    :material-tools: For more on the quality assurance tools in Splink, please refer to the [Evaluation API documentation](../../linkereval.md).

    :bar_chart: For more on the charts used in this tutorial, please refer to the [Charts Gallery](../../charts/index.md#model-evaluation).

    :material-thumbs-up-down: For more on the Evaluation Metrics used in this tutorial, please refer to the [Edge Metrics guide](../../topic_guides/evaluation/edge_metrics.md).


## :material-flag-checkered: That's it!

That wraps up the Splink tutorial! Don't worry, there are still plenty of resources to help on the next steps of your Splink journey:

:octicons-link-16: For some end-to-end notebooks of Splink pipelines, check out our [Examples](../examples/examples_index.md)

:simple-readme: For more deep dives into the different aspects of Splink, and record linkage more generally, check out our [Topic Guides](../../topic_guides/topic_guides_index.md)

:material-tools: For a reference on all the functionality available in Splink, see our [Documentation](../../documentation_index.md)
