# `confusion_matrix_from_labels_table`

!!! info "At a glance"
    **Useful for:** Summarising how Splink predictions compare with labelled data

    **API Documentation:** [confusion_matrix_from_labels_table()](../linker.md#splink.linker.Linker.confusion_matrix_from_labels_table)

    **What is needed to generate the chart?** A `linker` with some data and a corresponding labelled dataset

## Worked Example

```python
import logging
import sys

from splink.duckdb.linker import DuckDBLinker
import splink.duckdb.comparison_library as cl
import splink.duckdb.comparison_template_library as ctl
import splink.duckdb.blocking_rule_library as brl
from splink.datasets import splink_datasets, splink_dataset_labels

# Silence all log output to keep the example readable
logging.disable(sys.maxsize)

df = splink_datasets.fake_1000

settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": [
        brl.exact_match_rule("first_name"),
        brl.exact_match_rule("surname"),
        brl.exact_match_rule("city"),
        brl.exact_match_rule("dob"),
        brl.exact_match_rule("email"),
    ],
    "comparisons": [
        ctl.name_comparison("first_name"),
        ctl.name_comparison("surname"),
        ctl.date_comparison("dob", cast_strings_to_date=True),
        cl.exact_match("city", term_frequency_adjustments=True),
        ctl.email_comparison("email", include_username_fuzzy_level=False),
    ],
}

linker = DuckDBLinker(df, settings)

# Estimate u probabilities from randomly sampled record pairs
linker.estimate_u_using_random_sampling(max_pairs=1e6)

# Run two expectation maximisation passes, each with a different blocking rule
blocking_rule_for_training = brl.and_(
    brl.exact_match_rule("first_name"),
    brl.exact_match_rule("surname"),
)
linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training)

blocking_rule_for_training = brl.exact_match_rule("dob")
linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training)

# Register the labelled pairs, then chart the model's predictions against them
df_labels = splink_dataset_labels.fake_1000_labels
labels_table = linker.register_labels_table(df_labels)

linker.confusion_matrix_from_labels_table(labels_table)
```

### What the chart shows

The line chart on the left shows how **match probability** varies as a function of **match weight**. 
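
The shape of this curve can be reproduced with a short sketch. It assumes the match weight is the base-2 logarithm of the odds that a pair is a match, so the probability is `2^w / (1 + 2^w)`. The function name `match_weight_to_probability` is hypothetical and is not part of the Splink API.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a match weight (assumed to be log2 odds) into a probability."""
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)


# A match weight of 0 corresponds to even odds, i.e. a probability of 0.5.
for weight in (-10, -2, 0, 2, 10):
    print(weight, round(match_weight_to_probability(weight), 4))
```

Under this assumption the curve is S-shaped: probabilities change quickly around a weight of 0 and flatten towards 0 and 1 at the extremes.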

Hovering over this chart selects a match weight threshold and compares the results against labelled data by updating the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) shown on the right.

<hr>

### How to interpret the chart

Lowering the threshold to the extreme means far more pairs are predicted as matches. This maximises the **True Positives** (high recall) but at the expense of more **False Positives** (low precision).

You can then see the effect on the confusion matrix of raising the match threshold. As predicted matches become predicted non-matches at the higher threshold, some **True Positives** become **False Negatives**, while some **False Positives** become **True Negatives**.

This demonstrates the trade-off between **Type I (FP)** and **Type II (FN)** errors, or equivalently between precision and recall, when selecting a match threshold.
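
To make the trade-off concrete, here is a minimal sketch that recomputes the confusion matrix at three thresholds. The `labelled_pairs` data and the `confusion_counts` helper are invented for illustration; they are not Splink output or part of the Splink API, and a pair is assumed to be a predicted match when its match weight is at or above the threshold.

```python
# Hypothetical labelled pairs: (predicted match weight, is a true match)
labelled_pairs = [
    (9.5, True), (6.1, True), (3.2, True), (0.4, True), (-2.0, True),
    (4.8, False), (1.1, False), (-1.5, False), (-5.3, False), (-8.7, False),
]


def confusion_counts(pairs, threshold):
    """Count TP, FP, TN and FN when weights >= threshold are predicted matches."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for weight, is_match in pairs:
        predicted_match = weight >= threshold
        if predicted_match and is_match:
            counts["TP"] += 1
        elif predicted_match and not is_match:
            counts["FP"] += 1
        elif not predicted_match and is_match:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


for threshold in (-10, 0, 5):
    print(threshold, confusion_counts(labelled_pairs, threshold))
# -10 {'TP': 5, 'FP': 5, 'TN': 0, 'FN': 0}
# 0 {'TP': 4, 'FP': 2, 'TN': 3, 'FN': 1}
# 5 {'TP': 2, 'FP': 0, 'TN': 5, 'FN': 3}
```

In this toy data, raising the threshold from -10 to 5 removes every False Positive but turns three True Positives into False Negatives.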

<hr>

### Actions to take as a result of the chart

This chart is best used to _illustrate_ the effect of match threshold on linking performance against labelled data.

To decide on the optimal threshold, see [accuracy_chart_from_labels_table()](./accuracy_chart_from_labels_table.ipynb), which uses this confusion matrix to calculate various performance metrics.
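
As a rough guide to how such metrics follow from the confusion matrix, the sketch below derives precision, recall and F1 from raw counts using their standard definitions. The helper `precision_recall_f1` is hypothetical, the counts are made up, and returning 0.0 for an empty denominator is an assumed convention rather than Splink's behaviour.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion matrix counts."""
    # Precision: share of predicted matches that are true matches
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true matches that were predicted as matches
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up counts for a single threshold
precision, recall, f1 = precision_recall_f1(tp=4, fp=2, fn=1)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.800 f1=0.727
```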