## Splink comparison viewer

`splink_comparison_viewer` is a tool that helps you quickly understand and quality-assure the results of a Splink model. It is a separate package, available from PyPI, and can be installed with `pip install splink_comparison_viewer`.

In this demo, we use `splink_comparison_viewer` to generate an interactive dashboard.

The model used is the final model produced by the [combining estimates notebook](combining_estimates.ipynb).





## Imports and setup

The following boilerplate sets up the Spark session and a few other non-essential configuration options.

In [18]:
from utility_functions.demo_utils import get_spark

spark = get_spark()  # See utility_functions/demo_utils.py for how to set up Spark

21/12/09 21:38:17 WARN SimpleFunctionRegistry: The function jaro_winkler_sim replaced a previously registered function.
21/12/09 21:38:17 WARN SimpleFunctionRegistry: The function dmetaphone replaced a previously registered function.


## Load data and Splink model

In [19]:
df = spark.read.parquet("data/fake_1000.parquet")

In [27]:
import json
with open("data/fake_1000_combined.json") as f:
    settings = json.load(f)
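
The saved model file is read as plain JSON, and the next step looks up its `current_settings_dict` key. As a minimal sketch, assuming you want to fail early when that key is missing, the load could be wrapped as follows. The helper name `load_model_settings`, the temporary demo file, and its placeholder contents are illustrative assumptions, not part of Splink or of the demo data.

```python
import json
import os
import tempfile


def load_model_settings(path):
    """Load a saved model JSON file and check it has the key used later on.

    Hypothetical helper for illustration only; it is not part of Splink.
    """
    with open(path) as f:
        saved = json.load(f)
    if "current_settings_dict" not in saved:
        raise KeyError(f"'current_settings_dict' not found in {path}")
    return saved


# Stand-in file so the sketch runs without the demo data (placeholder contents)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_model.json")
with open(demo_path, "w") as f:
    json.dump({"current_settings_dict": {"link_type": "dedupe_only"}}, f)

saved = load_model_settings(demo_path)
print(list(saved["current_settings_dict"].keys()))
```

In the notebook itself, the same helper could be pointed at `data/fake_1000_combined.json` instead of the stand-in file.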


## Apply pre-trained Splink model to data

In [28]:
from splink import Splink

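# Build the linker from the saved settings rather than training a new model
linker = Splink(settings["current_settings_dict"], df, spark)
# Score record comparisons using the m and u probabilities stored in the settings
df_e = linker.manually_apply_fellegi_sunter_weights()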



## Generate dashboard

In [29]:
from splink_comparison_viewer import get_edges_data, render_html_vis

# The second argument (3) is the number of examples to output to the dashboard per comparison vector
edges_data = get_edges_data(df_e, 3)
render_html_vis(edges_data, settings, "splink_comparison_viewer.html", overwrite=True)
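
To make "examples per comparison vector" concrete, here is a plain-Python sketch of the idea: group scored record pairs by their combination of comparison levels and keep at most `n` pairs from each group. This illustrates the concept only and is not how `get_edges_data` is implemented; the function name, the `gamma_*` column names, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def sample_per_comparison_vector(rows, gamma_cols, n):
    """Keep at most n example rows for each distinct combination of gamma values.

    Hypothetical illustration; not the library's implementation.
    """
    examples = defaultdict(list)
    for row in rows:
        # The comparison vector is the tuple of comparison levels for this pair
        vector = tuple(row[col] for col in gamma_cols)
        if len(examples[vector]) < n:
            examples[vector].append(row)
    return dict(examples)


# Made-up scored pairs: four share the vector (1, 1), one has (0, 1)
rows = [
    {"id_l": 1, "id_r": 2, "gamma_first_name": 1, "gamma_surname": 1},
    {"id_l": 3, "id_r": 4, "gamma_first_name": 1, "gamma_surname": 1},
    {"id_l": 5, "id_r": 6, "gamma_first_name": 0, "gamma_surname": 1},
    {"id_l": 7, "id_r": 8, "gamma_first_name": 1, "gamma_surname": 1},
    {"id_l": 9, "id_r": 10, "gamma_first_name": 1, "gamma_surname": 1},
]

sampled = sample_per_comparison_vector(
    rows, ["gamma_first_name", "gamma_surname"], n=3
)
for vector, kept in sampled.items():
    print(vector, [(r["id_l"], r["id_r"]) for r in kept])
```

With `n=3`, the fourth pair that shares the vector `(1, 1)` is dropped, so a common comparison pattern cannot crowd out rarer ones in the dashboard.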



In [30]:
from IPython.display import IFrame
IFrame(src="./splink_comparison_viewer.html", width=1000, height=200)