# Spark-Matcher advanced Matcher example 

This notebook shows how to use `spark_matcher` to match entities with customized settings. First, we create a Spark session:

In [None]:
%config Completer.use_jedi = False  # for proper autocompletion
from pyspark.sql import SparkSession

In [None]:
spark = (SparkSession
             .builder
             .master("local")
             .enableHiveSupport()
             .getOrCreate())

Load the example data:

In [None]:
from spark_matcher.data import load_data

In [None]:
a, b = load_data(spark)

We now create a `Matcher` object with our own string similarity metric and blocking rules:

In [None]:
from spark_matcher.matcher import Matcher

First create a string similarity metric that checks if the first word is a perfect match:

In [None]:
def first_word(string_1, string_2):
    # 1.0 if both strings start with the same word, otherwise 0.0
    return float(string_1.split()[0] == string_2.split()[0])
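The metric above indexes the first token of each string, so an empty or missing value raises an error. If the data may contain such values, a guarded variant can help. The sketch below is only an illustration: the name `first_word_safe` and the choice to score missing values as 0.0 are assumptions, not part of `spark_matcher`.

```python
def first_word_safe(string_1, string_2):
    """Hypothetical guarded variant: 1.0 if both strings share the same first word."""
    # Treat None like an empty string (an assumption for this sketch)
    words_1 = (string_1 or "").split()
    words_2 = (string_2 or "").split()
    # No first word on either side means no match
    if not words_1 or not words_2:
        return 0.0
    return float(words_1[0] == words_2[0])


print(first_word_safe("john smith", "john smyth"))  # 1.0
print(first_word_safe("john smith", "jon smith"))   # 0.0
print(first_word_safe("", "john smith"))            # 0.0
print(first_word_safe(None, "john smith"))          # 0.0
```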

We also want to use the `token_sort_ratio` from the `thefuzz` package. Note that this package should be available on the Spark worker nodes.

In [None]:
from thefuzz.fuzz import token_sort_ratio
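To give a feel for what a token-sort comparison does, here is a rough standard-library sketch of the idea: normalise both strings, sort their tokens, then compare the sorted strings. It uses `difflib.SequenceMatcher` as a stand-in scorer, so the function `token_sort_sketch` is hypothetical and its scores may differ from those of `thefuzz`.

```python
import re
from difflib import SequenceMatcher


def token_sort_sketch(string_1, string_2):
    """Rough stand-in for a token-sort ratio, scored from 0 to 100."""
    def normalise(text):
        # Lowercase, replace non-alphanumeric runs with spaces, sort the tokens
        tokens = re.sub(r"[^0-9a-z]+", " ", text.lower()).split()
        return " ".join(sorted(tokens))

    ratio = SequenceMatcher(None, normalise(string_1), normalise(string_2)).ratio()
    return round(100 * ratio)


# Word order and punctuation no longer matter after normalisation
print(token_sort_sketch("Smith, John", "john smith"))  # 100
# A spelling difference lowers the score
print(token_sort_sketch("john smith", "jon smith"))
```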

In [None]:
field_info = {
    'name': [first_word, token_sort_ratio],
    'suburb': [token_sort_ratio],
    'postcode': [token_sort_ratio],
}
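One way to read `field_info` is as a recipe for similarity features: each listed metric is applied to the corresponding field of a candidate pair. The plain-Python sketch below illustrates that reading only. The helper `pair_features`, the feature naming scheme, and the stand-in metric `exact_match` are assumptions made for this example and do not describe the internals of `spark_matcher`.

```python
def first_word(string_1, string_2):
    return float(string_1.split()[0] == string_2.split()[0])


def exact_match(string_1, string_2):
    # Stand-in metric so the sketch has no third-party dependency
    return float(string_1 == string_2)


field_info_sketch = {'name': [first_word, exact_match], 'postcode': [exact_match]}


def pair_features(record_1, record_2, field_info):
    """Hypothetical helper: one similarity value per (field, metric) combination."""
    return {
        f"{field}_{metric.__name__}": metric(record_1[field], record_2[field])
        for field, metrics in field_info.items()
        for metric in metrics
    }


rec_a = {'name': 'john smith', 'postcode': '2000'}
rec_b = {'name': 'john smyth', 'postcode': '2000'}
print(pair_features(rec_a, rec_b, field_info_sketch))
# {'name_first_word': 1.0, 'name_exact_match': 0.0, 'postcode_exact_match': 1.0}
```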

Moreover, we want to limit blocking to the `name` field only, using its first 3 characters and its first 3 words:

In [None]:
from spark_matcher.blocker.blocking_rules import FirstNChars, FirstNWords

In [None]:
blocking_rules = [FirstNChars('name', 3), FirstNWords('name', 3)]
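Blocking restricts comparisons to records that share a key, so that not every record in `a` has to be compared with every record in `b`. The sketch below shows the general idea in plain Python for the two rules above. The helpers `first_n_chars`, `first_n_words` and `candidate_pairs` are hypothetical, and details such as case handling in the library's own rules may differ.

```python
from collections import defaultdict
from itertools import product


def first_n_chars(value, n):
    # Block key: the first n characters of the value
    return value[:n]


def first_n_words(value, n):
    # Block key: the first n whitespace-separated words of the value
    return " ".join(value.split()[:n])


def candidate_pairs(names_a, names_b, key_funcs):
    """Pairs that share a block key under at least one rule."""
    pairs = set()
    for key_func in key_funcs:
        # Each block holds the names from both sides with the same key
        blocks = defaultdict(lambda: ([], []))
        for name in names_a:
            blocks[key_func(name)][0].append(name)
        for name in names_b:
            blocks[key_func(name)][1].append(name)
        for left, right in blocks.values():
            pairs.update(product(left, right))
    return pairs


names_a = ["john smith", "mary jones"]
names_b = ["johnny smith", "mary jones", "peter pan"]
rules = [lambda s: first_n_chars(s, 3), lambda s: first_n_words(s, 3)]
print(sorted(candidate_pairs(names_a, names_b, rules)))
# [('john smith', 'johnny smith'), ('mary jones', 'mary jones')]
```

Here only 2 of the 6 possible pairs survive blocking: "john smith" and "johnny smith" share the key "joh", while the two "mary jones" records match under both rules.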

In [None]:
myMatcher = Matcher(spark, field_info=field_info, blocking_rules=blocking_rules, checkpoint_dir='path_to_checkpoints')

Now we are ready to fit the `Matcher` object:

In [None]:
myMatcher.fit(a, b)

This fitted model can now be used to predict:

In [None]:
result = myMatcher.predict(a, b)