## Problem
A database often holds multiple tables that share no common “ID” or
“KEY” column to join on. Typical scenarios include the following:

• Customer information scattered across multiple tables and systems.

• No global key to link them all together.

• A lot of variations in names and addresses.

## Solution
This can be solved by applying text similarity functions to the
demographic columns, such as first name, last name, and address. Based
on the similarity scores for a few common columns, we can decide
whether a record pair is a match or not.
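To make the idea concrete, here is a minimal sketch that scores two records column by column and averages the scores. It uses `SequenceMatcher` from Python's standard `difflib` module as a stand-in similarity function; the helper names `similarity` and `is_match`, the sample records, and the 0.8 threshold are all assumptions made for illustration, not part of the recipe.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(rec_a: dict, rec_b: dict, columns, threshold: float = 0.8) -> bool:
    """Average the per-column similarity and compare it with a cutoff.

    The 0.8 threshold is a hypothetical value; tune it on real data.
    """
    scores = [similarity(rec_a[c], rec_b[c]) for c in columns]
    return sum(scores) / len(scores) >= threshold


# Made-up records, for illustration only
a = {"first_name": "Jonathan", "last_name": "Smith", "address": "12 Main Street"}
b = {"first_name": "Jonathon", "last_name": "Smith", "address": "12 Main St"}
c = {"first_name": "Maria", "last_name": "Lopez", "address": "48 Oak Avenue"}

cols = ["first_name", "last_name", "address"]
print(is_match(a, b, cols))  # a and b differ only by small spelling variations
print(is_match(a, c, cols))  # a and c are unrelated records
```

Averaging treats every column equally. In practice you might weight columns differently or use a similarity measure designed for names, but the decision rule stays the same: compute a score per column, combine the scores, and compare with a threshold.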

## How It Works
Let’s follow the steps in this section to link the records.

Technical challenges:

• A huge number of records that need to be linked, stitched, or deduplicated.

• Records come from various systems with differing schemas.

There is no global key or customer ID to merge on. There are two possible
scenarios for data stitching or record linkage:

• Multiple records of the same customer exist in the same table, and you want to deduplicate them.

• Records of the same customers from multiple tables need to be merged.

In Recipe 3-A, let’s solve scenario 1, deduplication within a single table.
In Recipe 3-B, let’s solve scenario 2, record linkage across multiple tables.
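The two scenarios differ mainly in which record pairs get compared. The following sketch uses only the standard library and made-up record IDs (`table_a` and `table_b` are hypothetical) to show the candidate pairs for each case.

```python
from itertools import combinations, product

# Hypothetical record IDs, for illustration only
table_a = ["rec-a1", "rec-a2", "rec-a3"]
table_b = ["rec-b1", "rec-b2"]

# Scenario 1 (deduplication): compare each record with every other
# record in the same table, without repeating or self-pairing.
dedup_pairs = list(combinations(table_a, 2))

# Scenario 2 (record linkage): compare each record in one table
# with each record in the other table.
link_pairs = list(product(table_a, table_b))

print(len(dedup_pairs))  # 3 pairs = 3 * 2 / 2
print(len(link_pairs))   # 6 pairs = 3 * 2
```

For a table of n records, deduplication yields n(n-1)/2 candidate pairs, and linking tables of n and m records yields n*m pairs. This quadratic growth is why very large tables are a challenge when every pair has to be scored.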

## Deduplication in the same table
### Step 3A-1 Read and understand the data
We need the data first:

In [None]:
# Install the package (notebook shell command)
!pip install recordlinkage

import recordlinkage

# For this demo, let us use a built-in dataset from the recordlinkage library
from recordlinkage.datasets import load_febrl1

# Load the dataset into a DataFrame - dfA
dfA = load_febrl1()
dfA.head()