# Simulated PIK statistics

Here we inspect the accuracy and characteristics of the PIKs assigned,
leveraging our knowledge of ground truth from pseudopeople.

Ground truth is not available for the real PVS, but
Layne, Wagner, and Rothhaas did something similar: they redacted the SSN from real records,
sent them through PVS without it, and then used the true SSN
as ground truth.
The health care records they used are probably quite different from a CUF,
but they found **very** good overall PIK accuracy (computed in the "Definitions of accuracy" section below).

In [1]:
# Query planning is now on by default, but it has some rough edges.
# See https://github.com/dask/dask/issues/10995 for general discussion
# and https://github.com/dask/dask-expr/issues/1060 for the particular
# issue I ran into.
import dask

dask.config.set({"dataframe.query-planning": False})

<dask.config.set at 0x7f3441c624a0>

In [2]:
import datetime, os

from vivarium_research_prl import distributed_compute, utils
from IPython.display import display

In [3]:
print(datetime.datetime.now())

2024-05-15 23:43:59.852198


In [4]:
# DO NOT EDIT if this notebook is not called ground_truth_accuracy.ipynb!
# This notebook is designed to be run with papermill; this cell is tagged 'parameters'
data_to_use = "small_sample"
simulated_data_output_dir = "output/generate_simulated_data"
case_study_output_dir = "output"

# The "compute engine" is what we use on the Python side
# for our case-study-specific operations,
# as opposed to the Splink engine
compute_engine = "pandas"
# Only matters when using a distributed compute engine
compute_engine_num_jobs = 3
compute_engine_cpus_per_job = 2
compute_engine_memory_per_job = "5GB"
queue = "long.q"
local_directory = f"/tmp/{os.environ['USER']}_dask"

In [5]:
# Parameters
data_to_use = "ri"
simulated_data_output_dir = (
    "/ihme/scratch/users/zmbc/person_linkage_case_study/generate_simulated_data/"
)
case_study_output_dir = "/ihme/scratch/users/zmbc/person_linkage_case_study/results/"
compute_engine = "dask"
compute_engine_num_jobs = 20
compute_engine_memory_per_job = "30GB"
compute_engine_cpus_per_job = 2

In [6]:
# Parameters for a USA run
# data_to_use = "usa"
# simulated_data_output_dir = "/ihme/scratch/users/zmbc/person_linkage_case_study/generate_simulated_data"
# case_study_output_dir = "/ihme/scratch/users/zmbc/person_linkage_case_study/person_linkage_case_study"

# compute_engine = 'dask'
# compute_engine_num_jobs = 50
# compute_engine_memory_per_job = "120GB"
# compute_engine_cpus_per_job = 2

In [7]:
case_study_output_dir = f"{case_study_output_dir}/{data_to_use}"
simulated_data_output_dir = f"{simulated_data_output_dir}/{data_to_use}"

In [8]:
df_ops, pd = distributed_compute.start_compute_engine(
    compute_engine,
    num_jobs=compute_engine_num_jobs,
    cpus_per_job=compute_engine_cpus_per_job,
    memory_per_job=compute_engine_memory_per_job,
    queue=queue,
    local_directory=local_directory,
)

Connection method: Cluster object
Cluster type: dask_jobqueue.SLURMCluster
Dashboard: http://10.158.111.9:8787/status
Scheduler comm: tcp://10.158.111.9:43605
Workers: 20, Total threads: 20, Total memory: 558.80 GiB
Each worker: 1 thread, 27.94 GiB memory, local directory under /tmp/zmbc_dask/dask-scratch-space/

In [9]:
census_2030_piked = df_ops.read_parquet(
    f"{case_study_output_dir}/census_2030_piked.parquet"
)
confirmed_piks_with_ground_truth = df_ops.read_parquet(
    f"{case_study_output_dir}/confirmed_piks.parquet"
)

Imbalanced dataframe: too_few=False, too_many=True, too_large=False
count       291.000000
mean     935161.642612
std       16261.028538
min      889812.000000
25%      923722.000000
50%      934268.000000
75%      946355.000000
max      987218.000000
dtype: float64
Creating partitions of 100MB


In [10]:
piked_proportion = df_ops.compute(census_2030_piked.pik.notnull().mean())
# Compare with 90.28% of input records PIKed in the 2010 CUF,
# as reported in Wagner and Layne, Table 2, p. 18
print(f"{piked_proportion:.2%} of the input records were PIKed")

89.41% of the input records were PIKed


In [11]:
# Count Census rows per PIK. More than one row with the same PIK means
# the model thinks those rows are duplicates in Census.
pik_sizes = df_ops.persist(
    df_ops.groupby_agg_small_groups(
        census_2030_piked, by="pik", agg_func=lambda x: x.size()
    )
)
df_ops.compute(pik_sizes.value_counts())

1    981164
2       563
Name: count, dtype: int64
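
The group-size count above can be illustrated on a toy table. This is a minimal sketch in plain pandas, assuming `groupby_agg_small_groups` behaves like an ordinary group-by for this purpose (its implementation is not shown here); the records and PIK values are made up.

```python
import pandas as pd

# Hypothetical toy data: five Census rows, one of them without a PIK.
census = pd.DataFrame(
    {
        "record_id": ["r1", "r2", "r3", "r4", "r5"],
        "pik": ["a", "b", "b", "c", None],
    }
)

# Number of Census rows per PIK; rows with a missing PIK are dropped by groupby.
pik_sizes = census.groupby("pik").size()

# How many PIKs were assigned to exactly 1 row, 2 rows, and so on.
size_counts = pik_sizes.value_counts()

# PIKs assigned to more than one row, i.e. suspected duplicates.
duplicates = pik_sizes[pik_sizes > 1]
print(size_counts)
print(duplicates)
```

In this toy table, PIK `b` is the only one flagged as a suspected duplicate.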

In [12]:
# Interesting: in pseudopeople, sometimes siblings are assigned the same (common) first name, making them almost identical.
# The only giveaway is their age and DOB.
# Presumably, this tends not to happen in real life.
duplicate_piks = (
    pik_sizes.rename("pik_size").reset_index().pipe(lambda df: df[df.pik_size > 1])
)

df_ops.head(census_2030_piked.merge(duplicate_piks, on="pik").sort_values("pik"))

Unnamed: 0,household_id,first_name,middle_initial,last_name,age,date_of_birth,street_number,street_name,unit_number,city,state,zipcode,housing_type,relationship_to_reference_person,sex,race_ethnicity,year,record_id,pik,pik_size
63,6118_93687,Ximena,A,Silverman,7.0,03/01/2023,191,east loudon avenue,,nort smithfield,RI,2857.0,Household,Biological child,Female,Black,2030,simulated_census_2030_1_237810,2_100394,2
200,6118_93687,Mary,,Silverman,36.0,03/01/2023,191,east loudon avenue,,nort smithfield,RI,2857.0,Household,Reference person,Female,Black,2030,simulated_census_2030_1_235731,2_100394,2
227,1282_275198,Jack,A,Solano,7.0,09/02/1996,429,e arrow hwy,,coventry,RI,2907.0,Household,Grandchild,Male,Latino,2030,simulated_census_2030_0_157495,2_102403,2
348,1282_275198,Jacob,D,Solano,33.0,09/02/1996,429,e arrow hwy,,coventry,RI,,Household,Biological child,Male,Latino,2030,simulated_census_2030_0_156697,2_102403,2
4,465_115899,Diane,R,Miller,83.0,12/07/2012,906,vandre ave,,providenc,RI,2881.0,Household,Reference person,Female,White,2030,simulated_census_2030_0_62176,2_103729,2
78,465_115899,Kaiya,R,Miller,17.0,12/07/2012,906,vandre ave,,providence,RI,2881.0,Household,Grandchild,Female,White,2030,simulated_census_2030_0_62177,2_103729,2
139,1777_111062,Mia,P,Coronado,18.0,01/18/2012,71,hyde park ct,,wst warwick,RI,2904.0,Household,Biological child,Female,Latino,2030,simulated_census_2030_0_228366,2_103873,2
309,1777_111062,Caleb,,Coronado,,01/18/2012,61,hyde park ct,,wst warwick,RI,2904.0,Household,Biological child,Male,Latino,2030,simulated_census_2030_0_228365,2_103873,2
59,6191_23892,Amin,M,Cole,11.0,12/04/2018,239 1 2,southern arty,,east greenwich,RI,2905.0,Household,Sibling,Male,Latino,2030,simulated_census_2030_1_244226,2_110635,2
168,6191_23892,Eleanor,M,Cole,2.0,12/04/2018,239 1 2,southern arty,,east greenwich,RI,2905.0,Household,,Female,Multiracial or Other,2030,simulated_census_2030_1_246982,2_110635,2


## Ground truth statistics

In [13]:
census_2030_ground_truth = df_ops.persist(
    df_ops.read_parquet(
        f"{simulated_data_output_dir}/simulated_census_2030_ground_truth.parquet"
    )
)

Imbalanced dataframe: too_few=True, too_many=False, too_large=False
count    3.000000e+00
mean     2.368708e+07
std      1.133272e+07
min      1.060119e+07
25%      2.040891e+07
50%      3.021663e+07
75%      3.023002e+07
max      3.024342e+07
dtype: float64


In [14]:
# In this version of pseudopeople, there are no actual duplicates in Census,
# which means all of the duplicates identified above are wrong.
assert len(census_2030_ground_truth) == len(
    df_ops.drop_duplicates(census_2030_ground_truth)
)

Imbalanced dataframe: too_few=True, too_many=False, too_large=False
count    6.000000e+00
mean     1.037869e+07
std      1.299027e+07
min      0.000000e+00
25%      0.000000e+00
50%      4.642370e+06
75%      2.217716e+07
max      2.651276e+07
dtype: float64


In [15]:
reference_files_ground_truth = df_ops.persist(
    df_ops.concat(
        [
            df_ops.read_parquet(
                f"{simulated_data_output_dir}/simulated_geobase_reference_file_ground_truth.parquet"
            ).drop(columns=["n_unique_simulants"]),
            df_ops.read_parquet(
                f"{simulated_data_output_dir}/simulated_name_dob_reference_file_ground_truth.parquet"
            ).drop(columns=["n_unique_simulants"]),
        ],
        ignore_index=True,
    )
)

Imbalanced dataframe: too_few=True, too_many=False, too_large=False
count    2.000000e+00
mean     7.610524e+07
std      4.820347e+03
min      7.610183e+07
25%      7.610353e+07
50%      7.610524e+07
75%      7.610694e+07
max      7.610864e+07
dtype: float64


In [16]:
# However, there can be reference file records that correspond to multiple simulants,
# due to errors in the reference file construction by SSN
n_unique_simulants = df_ops.persist(
    df_ops.groupby_agg_small_groups(
        reference_files_ground_truth,
        by="record_id",
        agg_func=lambda x: x.simulant_id.nunique(),
    )
    .rename("n_unique_simulants")
    .reset_index()
)
df_ops.compute(n_unique_simulants.n_unique_simulants.value_counts())

n_unique_simulants
1    4427601
2     119327
3       5586
4        300
5         14
6          3
Name: count, dtype: int64

In [17]:
reference_files_ground_truth = df_ops.persist(
    reference_files_ground_truth.merge(
        n_unique_simulants,
        on="record_id",
        how="left",
    )
)
reference_files_ground_truth.head(n=100)

Unnamed: 0,record_id,simulant_id,n_unique_simulants
0,simulated_geobase_reference_file_0_12823,7645_262150,1
1,simulated_geobase_reference_file_0_1633,5072_308353,2
2,simulated_geobase_reference_file_0_1633,5072_197339,2
3,simulated_geobase_reference_file_0_18938,107_193795,1
4,simulated_geobase_reference_file_0_23431,3305_405686,1
...,...,...,...
95,simulated_geobase_reference_file_16_32886,3568_452390,1
96,simulated_geobase_reference_file_16_38477,7589_324805,1
97,simulated_geobase_reference_file_16_39367,40_658775,1
98,simulated_geobase_reference_file_16_42612,3454_1000174,1


In [18]:
df_ops.head(
    reference_files_ground_truth[
        reference_files_ground_truth.n_unique_simulants
        == df_ops.compute(reference_files_ground_truth.n_unique_simulants.max())
    ]
)

Unnamed: 0,record_id,simulant_id,n_unique_simulants
25339,simulated_geobase_reference_file_21_35463,7125_711460,6
25340,simulated_geobase_reference_file_21_35463,7125_711458,6
25341,simulated_geobase_reference_file_21_35463,7125_711454,6
25342,simulated_geobase_reference_file_21_35463,7125_711463,6
25343,simulated_geobase_reference_file_21_35463,7125_711462,6
25344,simulated_geobase_reference_file_21_35463,7125_624129,6
25142,simulated_geobase_reference_file_40_34229,6991_559426,6
25143,simulated_geobase_reference_file_40_34229,6991_1067323,6
25144,simulated_geobase_reference_file_40_34229,6991_559425,6
25145,simulated_geobase_reference_file_40_34229,6991_559427,6


In [19]:
census_2030_ground_truth = df_ops.persist(
    census_2030_ground_truth.merge(
        df_ops.drop_duplicates(reference_files_ground_truth[["simulant_id"]]).assign(
            possible_to_pik=1
        ),
        on="simulant_id",
        how="left",
    ).assign(possible_to_pik=lambda df: df.possible_to_pik.fillna(0))
)
possible_to_pik_proportion = df_ops.compute(
    census_2030_ground_truth.possible_to_pik.mean()
)
print(
    f"{(1 - possible_to_pik_proportion):.2%} of the input records are "
    "impossible to PIK correctly, since they are not in any reference files"
)

0.51% of the input records are impossible to PIK correctly, since they are not in any reference files


In [20]:
print(
    f"Assigned PIKs to {(piked_proportion / possible_to_pik_proportion):.2%} of PIK-able records"
)

Assigned PIKs to 89.86% of PIK-able records
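
The figure above divides one marginal proportion by another. That ratio equals the share of PIK-able records that received a PIK only if no record missing from the reference files was assigned a PIK; if some were, the ratio somewhat overstates that share. The sketch below uses made-up flags (not the notebook's data) to show the difference between the ratio and the directly computed conditional share.

```python
import pandas as pd

# Hypothetical toy flags, one row per input record.
records = pd.DataFrame(
    {
        "piked": [True, True, True, False, True, False, True, True, False, True],
        "possible_to_pik": [True, True, True, True, True, True, True, True, False, False],
    }
)

piked_proportion = records["piked"].mean()
possible_to_pik_proportion = records["possible_to_pik"].mean()

# Ratio of marginals, as in the cell above.
ratio = piked_proportion / possible_to_pik_proportion

# Share of PIK-able records that actually received a PIK.
conditional = records.loc[records["possible_to_pik"], "piked"].mean()

print(f"ratio of marginals: {ratio:.2%}")
print(f"conditional share:  {conditional:.2%}")
```

Here the last record is PIKed even though it is not PIK-able, so the ratio is higher than the conditional share.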


In [21]:
reference_file = df_ops.concat(
    [
        df_ops.read_parquet(
            f"{simulated_data_output_dir}/simulated_geobase_reference_file.parquet",
        ),
        df_ops.read_parquet(
            f"{simulated_data_output_dir}/simulated_name_dob_reference_file.parquet",
        ),
    ],
    ignore_index=True,
)

In [22]:
reference_file_piks = df_ops.persist(reference_file[["record_id", "pik"]])
reference_file_piks

Dask DataFrame Structure: npartitions=120; columns: record_id (string), pik (string)


In [23]:
assert len(reference_file_piks) == len(
    df_ops.drop_duplicates(reference_file_piks[["record_id"]])
)

In [24]:
pik_simulant_pairs = df_ops.persist(
    df_ops.drop_duplicates(
        reference_files_ground_truth.merge(reference_file_piks, on="record_id")[
            ["pik", "simulant_id"]
        ]
    )
)

In [25]:
# Likewise, there can be PIKs that correspond to multiple simulants,
# due to errors in the reference file construction by SSN
n_unique_simulants = df_ops.persist(
    df_ops.groupby_agg_small_groups(
        pik_simulant_pairs, by="pik", agg_func=lambda x: x.simulant_id.nunique()
    )
    .rename("n_unique_simulants")
    .reset_index()
)
df_ops.compute(n_unique_simulants.n_unique_simulants.value_counts())

n_unique_simulants
1    1618046
2     103567
3       6328
4        360
5         22
6          4
Name: count, dtype: int64

In [26]:
pik_simulant_pairs = df_ops.persist(
    pik_simulant_pairs.merge(
        n_unique_simulants,
        on="pik",
        how="left",
    )
)
pik_simulant_pairs

Dask DataFrame Structure: npartitions=240; columns: pik (string), simulant_id (string), n_unique_simulants (int64)


In [27]:
df_ops.head(
    pik_simulant_pairs[
        pik_simulant_pairs.n_unique_simulants
        == df_ops.compute(pik_simulant_pairs.n_unique_simulants.max())
    ]
)

Unnamed: 0,pik,simulant_id,n_unique_simulants
1942,2_490201,7870_98971,6
1943,2_490201,656_1135113,6
1944,2_490201,2721_36160,6
1945,2_490201,5950_76911,6
1946,2_490201,8133_83126,6
1947,2_490201,8501_630145,6
3870,3_13632,4743_515490,6
3871,3_13632,4743_76318,6
3872,3_13632,4743_515492,6
3873,3_13632,4743_515489,6


## Definitions of accuracy

1. (most strict) Assigning any PIK that corresponds to multiple simulants counts as incorrect
2. Assigning a PIK that corresponds to multiple simulants counts as neither incorrect nor correct (excluded from the denominator)
3. (most lenient) Assigning a PIK that corresponds to multiple simulants counts as correct, as long as at least one of those simulants matches the truth
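
The three definitions can be compared on a toy example. This is a minimal sketch in plain pandas with made-up records, PIKs, and simulant IDs; the column name `true_simulant_id` is hypothetical, and every assigned PIK is assumed to appear in the PIK-to-simulant pairs.

```python
import pandas as pd

# Hypothetical toy data: the PIK assigned to each Census record and its true simulant.
assigned = pd.DataFrame(
    {
        "record_id": ["c1", "c2", "c3", "c4", "c5"],
        "pik": ["p1", "p2", "p3", "p3", "p4"],
        "true_simulant_id": ["s1", "s9", "s3", "s5", "s6"],
    }
)
# Hypothetical PIK-to-simulant pairs: p3 is shared by two simulants.
pik_pairs = pd.DataFrame(
    {
        "pik": ["p1", "p2", "p3", "p3", "p4"],
        "simulant_id": ["s1", "s2", "s3", "s4", "s6"],
    }
)
pik_pairs["n_unique_simulants"] = pik_pairs.groupby("pik")["simulant_id"].transform(
    "nunique"
)

merged = assigned.merge(pik_pairs, on="pik")
merged["match"] = merged["true_simulant_id"] == merged["simulant_id"]

# One row per Census record: did any simulant behind its PIK match the truth?
per_record = merged.groupby("record_id").agg(
    any_match=("match", "any"),
    n_unique_simulants=("n_unique_simulants", "first"),
)
single = per_record["n_unique_simulants"] == 1
n_assigned = len(per_record)
n_single_correct = (per_record["any_match"] & single).sum()

definition_1 = n_single_correct / n_assigned          # multi-simulant PIKs are wrong
definition_2 = n_single_correct / single.sum()        # multi-simulant PIKs are excluded
definition_3 = per_record["any_match"].sum() / n_assigned  # any match counts
print(definition_1, definition_2, definition_3)
```

Definition 1 can never exceed either of the other two, since it uses the strictest numerator with the largest denominator.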

In [28]:
# All modules, Medicare database, calculated from Layne, Wagner, and Rothhaas Table 1 (p. 15)
real_life_pvs_accuracy = 1 - (2_585 + 60_709 + 129_480 + 89_094) / (
    52_406_981 + 5_170_924 + 49_374_794 + 50_327_034
)
f"{real_life_pvs_accuracy:.5%}"

'99.82079%'

### Definition 1

In [29]:
piks_assigned = df_ops.compute(census_2030_piked.pik.notnull().sum())
piks_assigned

982290

In [30]:
df_ops.head(pik_simulant_pairs[pik_simulant_pairs.n_unique_simulants > 1])

Unnamed: 0,pik,simulant_id,n_unique_simulants
0,2_100151,9159_624504,2
1,2_100151,9159_692077,2
2,2_100284,6442_1037247,2
3,2_100284,6442_310430,2
32,2_107130,446_862382,2
33,2_107130,446_211850,2
47,2_109530,6520_1136974,2
48,2_109530,6520_620122,2
62,2_113631,4203_1060381,2
63,2_113631,4203_1060383,2


In [31]:
single_sim_piks_correct = df_ops.compute(
    census_2030_piked[["record_id", "pik"]]
    .merge(pik_simulant_pairs, on="pik")
    .merge(census_2030_ground_truth, on="record_id")
    .pipe(
        lambda df: (df.simulant_id_x == df.simulant_id_y) & (df.n_unique_simulants == 1)
    )
    .sum()
)
single_sim_piks_correct

910434

In [32]:
# Overall accuracy, treating it as a black box
(single_sim_piks_correct / piks_assigned)

0.9268484866994472

In [33]:
assert len(confirmed_piks_with_ground_truth) == piks_assigned

In [34]:
df_ops.head(
    census_2030_ground_truth.rename(columns={"record_id": "record_id_census_2030"})
)

Unnamed: 0,record_id_census_2030,simulant_id,possible_to_pik
0,simulated_census_2030_0_2,28_1143,1.0
1,simulated_census_2030_0_303,28_114415,1.0
2,simulated_census_2030_0_319,28_117645,1.0
3,simulated_census_2030_0_529,28_210376,1.0
4,simulated_census_2030_0_1202,28_482019,1.0
5,simulated_census_2030_0_1412,28_557182,1.0
6,simulated_census_2030_0_1606,28_637973,1.0
7,simulated_census_2030_0_1679,28_672216,1.0
8,simulated_census_2030_0_1996,28_795711,1.0
9,simulated_census_2030_0_2041,28_815800,1.0


In [35]:
# Looking at whether the exact *record* linked was from the same simulant
single_sim_record_links_correct = df_ops.compute(
    confirmed_piks_with_ground_truth.merge(
        census_2030_ground_truth.rename(
            columns={"record_id": "record_id_raw_input_file"}
        ),
        on="record_id_raw_input_file",
    )
    .merge(
        reference_files_ground_truth.rename(
            columns={"record_id": "record_id_reference_file"}
        ),
        on="record_id_reference_file",
    )
    .pipe(
        lambda df: (df.simulant_id_x == df.simulant_id_y) & (df.n_unique_simulants == 1)
    )
    .sum()
)
single_sim_record_links_correct

917968

In [36]:
(single_sim_record_links_correct / piks_assigned)

0.9345183194372334

### Definition 2

In [37]:
single_sim_piks_assigned = len(
    census_2030_piked[["record_id", "pik"]].merge(
        pik_simulant_pairs[pik_simulant_pairs.n_unique_simulants == 1][
            ["pik", "simulant_id"]
        ]
    )
)
single_sim_piks_assigned

911011

In [38]:
# Overall accuracy, treating it as a black box
(single_sim_piks_correct / single_sim_piks_assigned)

0.9993666377244622

In [39]:
# Looking at whether the exact *record* linked was from the same simulant
single_sim_record_links_assigned = df_ops.compute(
    (
        confirmed_piks_with_ground_truth.merge(
            reference_files_ground_truth.rename(
                columns={"record_id": "record_id_reference_file"}
            ),
            on="record_id_reference_file",
        ).n_unique_simulants
        == 1
    ).sum()
)
single_sim_record_links_assigned

918549

In [40]:
(single_sim_record_links_correct / single_sim_record_links_assigned)

0.9993674806678795

### Definition 3

In [41]:
pik_simulant_pairs

Dask DataFrame Structure: npartitions=240; columns: pik (string), simulant_id (string), n_unique_simulants (int64)


In [42]:
piks_at_least_partially_correct = df_ops.persist(
    census_2030_piked[["record_id", "pik"]]
    .merge(pik_simulant_pairs, on="pik")
    .merge(census_2030_ground_truth, on="record_id")
    .pipe(df_ops.drop_duplicates)
    .assign(correct=lambda df: df.simulant_id_x == df.simulant_id_y)
    .pipe(
        df_ops.groupby_agg_small_groups,
        by=["record_id", "pik"],
        agg_func=lambda x: x.correct.any(),
    )
    .reset_index()
)
piks_at_least_partially_correct

Imbalanced dataframe: too_few=False, too_many=True, too_large=False
count    4.800000e+02
mean     2.371846e+05
std      3.398717e+05
min      0.000000e+00
25%      0.000000e+00
50%      8.286500e+03
75%      3.188702e+05
max      1.777656e+06
dtype: float64
Creating partitions of 100MB


Dask DataFrame Structure: npartitions=2; columns: record_id (string), pik (string), correct (bool[pyarrow])


In [43]:
# Overall accuracy, treating it as a black box
piks_correct_proportion = (
    df_ops.compute(piks_at_least_partially_correct.correct.sum()) / piks_assigned
)
piks_correct_proportion

0.9993759480397846

In [44]:
print(
    f"{piks_correct_proportion:.5%} of the PIKs assigned were correct; compare with {real_life_pvs_accuracy:.5%} in real life"
)

99.93759% of the PIKs assigned were correct; compare with 99.82079% in real life


In [45]:
# Looking at whether the exact *record* linked was from the same simulant
sim_record_links_at_least_partially_correct = df_ops.persist(
    confirmed_piks_with_ground_truth.merge(
        census_2030_ground_truth.rename(
            columns={"record_id": "record_id_raw_input_file"}
        ),
        on="record_id_raw_input_file",
    )
    .merge(
        reference_files_ground_truth.rename(
            columns={"record_id": "record_id_reference_file"}
        ),
        on="record_id_reference_file",
    )
    .assign(correct=lambda df: df.simulant_id_x == df.simulant_id_y)
    .pipe(
        df_ops.groupby_agg_small_groups,
        by=[
            "record_id_raw_input_file",
            "record_id_reference_file",
            "pik",
            "module_name",
            "pass_name",
        ],
        agg_func=lambda x: x.correct.any(),
    )
    .reset_index()
)
sim_record_links_at_least_partially_correct

Dask DataFrame Structure: npartitions=196; columns: record_id_raw_input_file (string), record_id_reference_file (string), pik (string), module_name (string), pass_name (string), correct (bool[pyarrow])


In [46]:
len(sim_record_links_at_least_partially_correct)

982290

In [47]:
len(
    df_ops.drop_duplicates(
        sim_record_links_at_least_partially_correct[
            ["record_id_raw_input_file", "record_id_reference_file"]
        ]
    )
)

982290

In [48]:
(
    df_ops.compute(sim_record_links_at_least_partially_correct.correct.sum())
    / piks_assigned
)

0.9993759480397846

In [49]:
assert df_ops.compute(
    (
        df_ops.groupby_agg_small_groups(
            confirmed_piks_with_ground_truth,
            by="record_id_raw_input_file",
            agg_func=lambda x: x.record_id_reference_file.nunique(),
        )
        <= 1
    ).all()
)

In [50]:
# Using definition 3 -- at the PIK level
piks_at_least_partially_correct = df_ops.persist(
    piks_at_least_partially_correct.rename(
        columns={"record_id": "record_id_raw_input_file"}
    ).merge(
        confirmed_piks_with_ground_truth[
            ["record_id_raw_input_file", "module_name", "pass_name"]
        ],
        on="record_id_raw_input_file",
    )
)
piks_at_least_partially_correct

Dask DataFrame Structure: npartitions=196; columns: record_id_raw_input_file (string), pik (string), correct (bool[pyarrow]), module_name (string), pass_name (string)


In [51]:
# Accuracy by module -- note that this shows the opposite pattern (with the sample data)
# relative to the results of Layne et al., who found GeoSearch was much *more* accurate
df_ops.compute(
    piks_at_least_partially_correct.groupby("module_name")
    .correct.agg(["mean", "size"])
    .sort_values("mean")
)

Unnamed: 0_level_0,mean,size
module_name,Unnamed: 1_level_1,Unnamed: 2_level_1
hhcompsearch,0.997934,3873
dobsearch,0.998982,982
geosearch,0.999376,937419
namesearch,0.999525,40016


In [52]:
# Accuracy by pass -- could be used to tune pass-specific cutoffs, but
# this might not be too informative while we are still using the sample data.
df_ops.compute(
    piks_at_least_partially_correct.groupby(["module_name", "pass_name"])
    .correct.agg(["mean", "size"])
    .sort_values("mean")
)

Unnamed: 0_level_0,Unnamed: 1_level_0,mean,size
module_name,pass_name,Unnamed: 2_level_1,Unnamed: 3_level_1
namesearch,year of birth and first two characters of name,0.986842,76
hhcompsearch,year of birth,0.99658,2047
geosearch,geokey name switch,0.998187,2758
dobsearch,initials name switch,0.998249,571
namesearch,DOB and initials,0.998347,2420
geosearch,geokey,0.999257,756087
hhcompsearch,initials,0.999452,1826
namesearch,DOB and NYSIIS of name,0.999627,37520
geosearch,house number and street name Soundex,0.999643,44851
geosearch,some name and DOB information,0.999985,133570


In [53]:
# Using definition 3 -- at the link level
df_ops.compute(
    sim_record_links_at_least_partially_correct.groupby("module_name")
    .correct.agg(["mean", "size"])
    .sort_values("mean")
)

Unnamed: 0_level_0,mean,size
module_name,Unnamed: 1_level_1,Unnamed: 2_level_1
hhcompsearch,0.997934,3873
dobsearch,0.998982,982
geosearch,0.999376,937419
namesearch,0.999525,40016


In [54]:
df_ops.compute(
    sim_record_links_at_least_partially_correct.groupby(["module_name", "pass_name"])
    .correct.agg(["mean", "size"])
    .sort_values("mean")
)

Unnamed: 0_level_0,Unnamed: 1_level_0,mean,size
module_name,pass_name,Unnamed: 2_level_1,Unnamed: 3_level_1
namesearch,year of birth and first two characters of name,0.986842,76
hhcompsearch,year of birth,0.99658,2047
geosearch,geokey name switch,0.998187,2758
dobsearch,initials name switch,0.998249,571
namesearch,DOB and initials,0.998347,2420
geosearch,geokey,0.999257,756087
hhcompsearch,initials,0.999452,1826
namesearch,DOB and NYSIIS of name,0.999627,37520
geosearch,house number and street name Soundex,0.999643,44851
geosearch,some name and DOB information,0.999985,133570


In [55]:
df_ops.compute(
    sim_record_links_at_least_partially_correct[
        ~sim_record_links_at_least_partially_correct.correct
    ]
    .groupby(["module_name", "pass_name"])
    .size()
).sort_values()

module_name   pass_name                                     
dobsearch     initials name switch                                1
hhcompsearch  initials                                            1
namesearch    year of birth and first two characters of name      1
geosearch     some name and DOB information                       2
namesearch    DOB and initials                                    4
geosearch     geokey name switch                                  5
hhcompsearch  year of birth                                       7
namesearch    DOB and NYSIIS of name                             14
geosearch     house number and street name Soundex               16
              geokey                                            562
dtype: int64

### Incorrect and missed PIKs

In [56]:
incorrectly_linked_pairs = df_ops.persist(
    df_ops.drop_duplicates(
        sim_record_links_at_least_partially_correct[
            ~sim_record_links_at_least_partially_correct.correct
        ][["record_id_raw_input_file", "record_id_reference_file"]]
    )
)
incorrectly_linked_pairs

Dask DataFrame Structure: npartitions=392; columns: record_id_raw_input_file (string), record_id_reference_file (string)


In [57]:
len(incorrectly_linked_pairs)

613

In [58]:
# Take a sample of up to 100 incorrect pairs to inspect by hand
incorrect_links = df_ops.head(incorrectly_linked_pairs, n=100)
incorrect_links

Unnamed: 0,record_id_raw_input_file,record_id_reference_file
0,simulated_census_2030_0_10141,simulated_geobase_reference_file_19_28111
1,simulated_census_2030_0_105524,simulated_geobase_reference_file_39_5682
0,simulated_census_2030_0_105915,simulated_geobase_reference_file_2_41002
1,simulated_census_2030_0_10709,simulated_geobase_reference_file_4_2883
2,simulated_census_2030_0_109733,simulated_geobase_reference_file_45_34511
...,...,...
3,simulated_census_2030_0_231072,simulated_geobase_reference_file_42_18818
0,simulated_census_2030_0_233315,simulated_geobase_reference_file_10_27689
1,simulated_census_2030_0_233497,simulated_geobase_reference_file_53_12651
0,simulated_census_2030_0_233575,simulated_geobase_reference_file_19_18641


In [59]:
%xdel incorrectly_linked_pairs

In [60]:
comparison_cols = [
    "first_name",
    "middle_name",
    "last_name",
    "date_of_birth",
    "street_number",
    "street_name",
    "unit_number",
    "city",
    "state",
]

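# Attach the census attributes, then the reference-file attributes, to each
# sampled pair. Only the second merge sees overlapping column names, so its
# suffixes label both sides.
incorrect_links_detail = incorrect_links.merge(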
    df_ops.compute(
        census_2030_piked[
            census_2030_piked.record_id.isin(incorrect_links.record_id_raw_input_file)
        ]
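    ).rename(
        columns={
            "record_id": "record_id_raw_input_file",
            # The census carries only a middle initial; rename it so it pairs
            # with the reference file's middle_name column.
            "middle_initial": "middle_name",
        }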
    )[
        ["record_id_raw_input_file"] + comparison_cols
    ],
    on="record_id_raw_input_file",
    how="left",
).merge(
    df_ops.compute(
        reference_file[
            reference_file.record_id.isin(incorrect_links.record_id_reference_file)
        ]
    )
    .rename(columns={"record_id": "record_id_reference_file"})
    .rename(columns=lambda c: c.replace("mailing_address_", ""))[
        ["record_id_reference_file"] + comparison_cols
    ],
    on="record_id_reference_file",
    how="left",
    suffixes=("_census", "_reference_file"),
)


def flatten(xss):
    """Flatten one level of nesting: [(a, b), (c, d)] -> [a, b, c, d]."""
    return [x for xs in xss for x in xs]


incorrect_links_detail[
    flatten([(f"{c}_census", f"{c}_reference_file") for c in comparison_cols])
]

Unnamed: 0,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
0,,Carl,J,George,Rice,Rice,09/24/1941,19410924,4100,4100,ne 19 st,NE 19 ST,,,middletown,MIDDLETOWN,RI,RI
1,Freya,Sarah,M,Marcia,Smith,Smith,11/09/2028,20281109,1059,1059,sw 15th st,SW 15TH ST,,,warren,WARREN,RI,RI
2,Scarlett,Jennifer,S,Shayla,Boyle,Boyle,03/13/1992,19920313,1018,1018,heather mist,HEATHER MIST,,,pawtucket,PAWTUCKET,RI,RI
3,,Jameson,M,Sister,Martin,Martin,03/20/2008,20080320,648,648,rosalind ave,ROSALIND AVE,,,smithfield,SMITHFIELD,RI,RI
4,Sarah,Brooklyn,A,Audrey,Kirby,Kirby,04/04/2022,20220404,6905,6905,godwin boulevard,GODWIN BOULEVARD,,,warwick,WARWICK,RI,RI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Garrett,,C,Rodney,Sivils,Sivils,03/16/1969,19690316,4744,4744,liberty st,LIBERTY ST,,,middletown,MIDDLETOWN,RI,RI
96,Colton,Colin,D,Augustus,Rodriguez,Rodriguez,07/08/2001,20010708,11,,lowell cir,,,,north smithfield,LINCOLN,RI,RI
97,Millie,Connie,A,Angela,Fulkerson,Fulkerson,06/17/1966,19660617,8345,8345,redd shop rd,REDD SHOP RD,,,providence,PROVIDENCE,RI,RI
98,Lexi,Vicente,,Ezekiel,Cushenberry,Cushenberry,01/28/2025,20250128,310,310,s muskego av,S MUSKEGO AV,ap 9,AP 9,cumberland,CUMBERLAND,RI,RI
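
In the sampled rows above, most address fields agree once case is ignored, and the dates agree once the census `MM/DD/YYYY` format is rearranged into the reference file's `YYYYMMDD`. Below is a minimal sketch of flagging which fields genuinely disagree, assuming those two formats hold throughout. `disagreement_flags` is a hypothetical helper (not part of `df_ops`), and the two sample rows are retyped from the output above with only a subset of the columns.

```python
import pandas as pd

# Rows 1 and 2 of the table above, limited to four comparison columns.
sample = pd.DataFrame(
    {
        "first_name_census": ["Freya", "Scarlett"],
        "first_name_reference_file": ["Sarah", "Jennifer"],
        "last_name_census": ["Smith", "Boyle"],
        "last_name_reference_file": ["Smith", "Boyle"],
        "date_of_birth_census": ["11/09/2028", "03/13/1992"],
        "date_of_birth_reference_file": ["20281109", "19920313"],
        "street_name_census": ["sw 15th st", "heather mist"],
        "street_name_reference_file": ["SW 15TH ST", "HEATHER MIST"],
    }
)


def disagreement_flags(df, cols):
    """Return a boolean frame: True where census and reference values differ.

    Hypothetical helper. Comparison is case-insensitive, and a missing value
    on either side is counted as a disagreement (an assumption).
    """
    flags = {}
    for col in cols:
        census = df[f"{col}_census"].astype("string")
        reference = df[f"{col}_reference_file"].astype("string")
        if col == "date_of_birth":
            # Rearrange MM/DD/YYYY into YYYYMMDD (assumes fixed-width dates)
            census = census.str[6:10] + census.str[0:2] + census.str[3:5]
        differs = census.str.casefold() != reference.str.casefold()
        flags[col] = differs.fillna(True).astype(bool)
    return pd.DataFrame(flags)


flags = disagreement_flags(
    sample, ["first_name", "last_name", "date_of_birth", "street_name"]
)
print(flags)
```

For these two rows only `first_name` is flagged, which fits a link to a different person at the same address with the same surname and date of birth.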


In [61]:
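# Missed links: census records that received no PIK, paired through the ground
# truth with reference-file records for the same simulant. Reference records
# are kept only when n_unique_simulants == 1.
missed_links = df_ops.persist(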
    census_2030_piked[census_2030_piked.pik.isnull()][["record_id"]]
    .merge(census_2030_ground_truth, on="record_id")
    .merge(
        reference_files_ground_truth[
            reference_files_ground_truth.n_unique_simulants == 1
        ],
        on="simulant_id",
        suffixes=("_census", "_reference_file"),
    )
)

In [62]:
len(missed_links)

285682

In [63]:
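# The first 100 missed pairs cover fewer than 100 simulants, because one
# simulant can have several reference-file records.
simulants_missed = df_ops.head(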
    missed_links[["simulant_id"]], n=100
).simulant_id.unique()
simulants_missed

<ArrowStringArray>
[  '446_413479',  '1219_176298',  '2863_430981',  '3298_854796',
  '4561_608363',  '3528_397660',  '5812_765316',   '6991_86918',
  '4344_401701',  '5188_162962',  '1832_850867',  '2277_153565',
  '3357_333795',   '5628_98457',  '6520_109176',  '4621_579210',
  '7817_985124', '9159_1174439',  '3723_959309', '4941_1189209',
   '5628_60718', '6539_1178939',     '8305_416',     '99_73298',
    '682_49973', '7179_1069048', '1597_1188844',  '3624_119898',
  '7344_892187',  '9772_621190',  '2721_917949', '3298_1082703',
  '9740_924180',  '5875_557972',  '9526_137552', '3621_1148314',
 '4802_1177756', '8291_1049376',  '440_1076142']
Length: 39, dtype: string

In [64]:
# Materialize the missed pairs for the sampled simulants
missed_pairs = df_ops.compute(
    missed_links[missed_links.simulant_id.isin(list(simulants_missed))]
)
missed_pairs

Unnamed: 0,record_id_census,simulant_id,possible_to_pik,record_id_reference_file,n_unique_simulants
0,simulated_census_2030_0_49755,446_413479,1.0,simulated_geobase_reference_file_59_17351,1
1,simulated_census_2030_0_49755,446_413479,1.0,simulated_name_dob_reference_file_19_33351,1
2,simulated_census_2030_0_49755,446_413479,1.0,simulated_geobase_reference_file_59_17350,1
3,simulated_census_2030_0_49755,446_413479,1.0,simulated_geobase_reference_file_59_17352,1
4,simulated_census_2030_0_134475,1219_176298,1.0,simulated_name_dob_reference_file_7_2918,1
...,...,...,...,...,...
96,simulated_census_2030_0_44149,440_1076142,1.0,simulated_geobase_reference_file_13_39804,1
97,simulated_census_2030_0_44149,440_1076142,1.0,simulated_geobase_reference_file_13_39802,1
98,simulated_census_2030_0_44149,440_1076142,1.0,simulated_geobase_reference_file_13_39801,1
99,simulated_census_2030_0_44149,440_1076142,1.0,simulated_geobase_reference_file_13_39803,1


In [65]:
%xdel missed_links

In [66]:
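# Same enrichment as for the incorrect links: attach census and reference-file
# attributes to each missed pair.
missed_links_detail = missed_pairs.merge(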
    df_ops.compute(
        census_2030_piked[
            census_2030_piked.record_id.isin(list(missed_pairs.record_id_census))
        ]
    ).rename(
        columns={"record_id": "record_id_census", "middle_initial": "middle_name"}
    ),
    on="record_id_census",
).merge(
    df_ops.compute(
        reference_file[
            reference_file.record_id.isin(missed_pairs.record_id_reference_file)
        ]
    )
    .rename(columns=lambda c: c.replace("mailing_address_", ""))
    .rename(columns={"record_id": "record_id_reference_file"}),
    on="record_id_reference_file",
    suffixes=("_census", "_reference_file"),
)
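
One pattern worth looking for in the per-simulant tables that follow is a date of birth whose day and month are transposed between the two files, for example `07/12/1964` in the census against `19641207` in the reference file. Below is a minimal sketch of a check for that pattern, assuming the census uses `MM/DD/YYYY` and the reference file uses `YYYYMMDD`. `is_day_month_swap` is a hypothetical helper, and the example values are retyped from the output of the final cell.

```python
def is_day_month_swap(census_dob, reference_dob):
    """True if the two dates match only after swapping day and month.

    Hypothetical helper. Assumes census_dob is MM/DD/YYYY and reference_dob
    is YYYYMMDD; anything missing or malformed returns False.
    """
    try:
        month, day, year = census_dob.split("/")
    except (AttributeError, ValueError):
        return False
    if month == day:
        # Swapping identical day and month changes nothing, so not a swap
        return False
    # The reference file would show the census day in the month position
    return reference_dob == f"{year}{day}{month}"


examples = [
    ("07/12/1964", "19641207"),  # simulant 6991_86918
    ("12/04/2002", "20020412"),  # simulant 2721_917949
    ("07/13/1980", "19800713"),  # simulant 4344_401701: dates agree
    ("06/18/1989", "20120903"),  # simulant 1219_176298: unrelated dates
]
swap_results = [is_day_month_swap(c, r) for c, r in examples]
print(swap_results)
```

The check cannot say which file holds the corrupted value, only that the two dates differ by a day/month transposition.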

In [67]:
# Show census and reference-file values side by side, one table per simulant
for simulant in simulants_missed:
    print(simulant)
    display(
        missed_links_detail[missed_links_detail.simulant_id == simulant][
            ["simulant_id"]
            + flatten([(f"{c}_census", f"{c}_reference_file") for c in comparison_cols])
        ]
    )

446_413479


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
0,446_413479,Jennifer,Jennifer,D,Dana,Hunter,Hunter,08/15/1968,,2307,1607.0,ne jack london st,NE JACK LONDON ST,,,pawtucket,PAWTUCKET,RI,RI
1,446_413479,Jennifer,Jennifer,D,Dana,Hunter,Hunter,08/15/1968,,2307,,ne jack london st,,,,pawtucket,,RI,
2,446_413479,Jennifer,Jennifer,D,Dana,Hunter,Hunter,08/15/1968,,2307,2307.0,ne jack london st,NE JACK LONDON ST,,,pawtucket,PAWTUCKET,RI,OH
3,446_413479,Jennifer,Jennifer,D,Dana,Hunter,Hunter,08/15/1968,,2307,2307.0,ne jack london st,NE JACK LONDON ST,,,pawtucket,PAWTUCKET,RI,RI


1219_176298


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
4,1219_176298,Ella,Ella,J,Jordan,Lopez,Lopez,06/18/1989,20120903,509,,meigs street,,lot 3r,,narragansett,,RI,
5,1219_176298,Ella,Ella,J,Jordan,Lopez,Lopez,06/18/1989,20120903,509,509.0,meigs street,MEIGS STREET,lot 3r,LOT 3R,narragansett,NARRAGANSETT,RI,RI


2863_430981


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
6,2863_430981,Michael,Michael,R,Roberto,Wong,Wong,04/30/1983,19610611,10802,10802.0,spring mountain way,SPRING MOUNTAIN WAY,,,barrington,BARRINGTON,RI,RI
7,2863_430981,Michael,Michael,R,Roberto,Wong,Wong,04/30/1983,19610611,10802,,spring mountain way,,,,barrington,,RI,
8,2863_430981,Michael,Michael,R,Roberto,Wong,Wong,04/30/1983,19610611,10802,10802.0,spring mountain way,SPRING MOUNTAIN WAY,,,barrington,BARAHINGTON,RI,RI


3298_854796


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
9,3298_854796,Sharlene,Sharlene,,Angel,A Royster,Royster,03/06/1981,19810306,1410,,pansy rd,,,,portsmouth,,RI,
10,3298_854796,Sharlene,Sharlene,,Angel,A Royster,Royster,03/06/1981,19810306,1410,1410.0,pansy rd,PANSY RD,,,portsmouth,PORTSMOUTH,RI,RI


4561_608363


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
11,4561_608363,Kaylee,,,Angela,James,James,02/28/1985,19850228,,145.0,e hawthorne st,E HAWTHORNE ST,# 368,# 368,westerly,WESTERLY,RI,RI
12,4561_608363,Kaylee,,,Angela,James,James,02/28/1985,19850228,,,e hawthorne st,,# 368,,westerly,,RI,


3528_397660


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
13,3528_397660,Nathan,Nathan,Q,Q,Loh,Lope,03/30/1995,,4763,,mdw l,,,,central falls,,RI,
14,3528_397660,Nathan,Nathan,Q,Q,Loh,Lope,03/30/1995,,4763,4763.0,mdw l,MDW L,,,central falls,CENTRAL FALLS,RI,


5812_765316


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
15,5812_765316,Campbell,Cam,E,Evelyn,King,King,03/30/2006,,1833,,john stre,,,,providence,,RI,
16,5812_765316,Campbell,Cam,E,Evelyn,King,King,03/30/2006,,1833,1833.0,john stre,JOHN STRE,,,providence,PROVIDENCE,RI,RI


6991_86918


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
17,6991_86918,Maureen,Maureen,S,Shelley,Barnes,Barnes,07/12/1964,19641207,20,20.0,mayflower road,MAYFLOWER ROAD,,,cranston,CRANSTON,RI,RI
18,6991_86918,Maureen,Maureen,S,Shelley,Barnes,Barnes,07/12/1964,19641207,20,,mayflower road,,,,cranston,,RI,


4344_401701


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
19,4344_401701,Phillip,Phillip,G,Glenn,Ostlie,Ostlie,07/13/1980,19800713,5524,5524.0,cedar crest village drive,CEDAR CREST VILLAGE DRIVE,,,johnston,JOHNSTON,RI,RI
20,4344_401701,Phillip,Phillip,G,Glenn,Ostlie,Ostlie,07/13/1980,19800713,5524,5524.0,cedar crest village drive,CEDAR CREST VILLAGE DRIVE,,,johnston,JOHNSTON,RI,
21,4344_401701,Phillip,Phillip,G,Glenn,Ostlie,Ostlie,07/13/1980,19800713,5524,,cedar crest village drive,,,,johnston,,RI,
22,4344_401701,Phillip,Phillip,G,Glenn,Ostlie,Ostlie,07/13/1980,19800713,5524,5524.0,cedar crest village drive,CEDAR CREST VILLAGE DRIVE,,,johnston,JOHNSTON,RI,RI


5188_162962


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
23,5188_162962,Justin,Jonathan,J,Justin,Morgan,Morgan,,20031028,1953,1953.0,lincoln raod,LINCOLN RAOD,,,portsmouth,PORTSMOUTH,RI,RI
24,5188_162962,Justin,Jonathan,J,Justin,Morgan,Morgan,,20031028,1953,,lincoln raod,,,,portsmouth,,RI,
25,5188_162962,Justin,Jonathan,J,Justin,Morgan,Morgan,,20031028,1953,1953.0,lincoln raod,LINCOLN RAOD,,,portsmouth,PORTSMOUTH,RI,RI


1832_850867


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
26,1832_850867,Heather,Heather,H,Hailey,Syeil,Shell,07/30/2006,20060730,7304,,ventura blvd,,,,lincoln,,RI,
27,1832_850867,Heather,Heather,H,Hailey,Syeil,Shell,07/30/2006,20060730,7304,,ventura blvd,,,,lincoln,,RI,


2277_153565


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
28,2277_153565,Keith,Keith,R,Roberto,Thomas,Thomas,01/05/1964,19640105,502,502.0,hunter ln,HUNTER LN,,,warwick,WARWICK,RI,RI
29,2277_153565,Keith,Keith,R,Roberto,Thomas,Thomas,01/05/1964,19640105,502,,hunter ln,,,,warwick,,RI,


3357_333795


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
30,3357_333795,Alexandra,Alexandra,A,Abigail,Coh,Keel,06/10/2025,20150610,,,short st,,,,north providence,,RI,
31,3357_333795,Alexandra,Alexandra,A,Abigail,Coh,Keel,06/10/2025,20150610,,,short st,SHORT ST,,,north providence,NORTH PROVIDENCE,RI,RI


5628_98457


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
32,5628_98457,,Mitch,C,Connor,Boyer,Boyer,04/09/1996,,1724,1724.0,s westwood ave,S WESTWOOD AVE,,,hopkinton,HOPKINTON,RI,RI
33,5628_98457,,Mitch,C,Connor,Boyer,Boyer,04/09/1996,,1724,,s westwood ave,,,,hopkinton,,RI,


6520_109176


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
34,6520_109176,Destiny,,E,Emily,Vargas,Vargas,03/18/2010,20100318,3501,,e 5675 s,,,,cranston,,RI,
35,6520_109176,Destiny,,E,Emily,Vargas,Vargas,03/18/2010,20100318,3501,,e 5675 s,,,,cranston,PROVIDENCE,RI,RI


4621_579210


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
36,4621_579210,Jon,Jon,L,Larry,Rooks,Rooks,01/03/1950,19461214,350,350.0,adams avenue,ADAMS AVENUE,unit f,UNIT F,warren,WARREN,RI,RI
37,4621_579210,Jon,Jon,L,Larry,Rooks,Rooks,01/03/1950,19461214,350,350.0,adams avenue,ADAMZ AVENUE,unit f,UNIT F,warren,WARREN,RI,RI
38,4621_579210,Jon,Jon,L,Larry,Rooks,Rooks,01/03/1950,19461214,350,,adams avenue,,unit f,,warren,,RI,


7817_985124


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
39,7817_985124,Nyla,Nyla,M,Maria,Nardin,Nardin,04/04/1997,19630719,7,7.0,state street,STATE STREET,,,providence,PROVIDENCE,RI,RI
40,7817_985124,Nyla,Nyla,M,Maria,Nardin,Nardin,04/04/1997,19630719,7,7.0,state street,STATE STRFET,,,providence,PROVIDENCE,RI,RI
41,7817_985124,Nyla,Nyla,M,Maria,Nardin,Nardin,04/04/1997,19630719,7,,state street,,,,providence,,RI,
42,7817_985124,Nyla,Nyla,M,Maria,Nardin,Nardin,04/04/1997,19630719,7,7.0,state street,STATE STREET,,,providence,PROVIDENCE,RI,RI


9159_1174439


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
43,9159_1174439,August,August,S,Sebastian,Dante,Dante,10/31/2029,20281207,5581,5581.0,e lincoln ave,E LINCOLN AVE,,,burrillville,BURRILLVILLE,RI,RI
44,9159_1174439,August,August,S,Sebastian,Dante,Dante,10/31/2029,20281207,5581,,e lincoln ave,,,,burrillville,,RI,


3723_959309


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
45,3723_959309,Lily,Lily,R,Riya,Mula,Mula,08/12/2006,20061208,602,602.0,tumbleweed way,TUMBLEWEED WAY,,,newport,NEWPORT,RI,RI
46,3723_959309,Lily,Lily,R,Riya,Mula,Mula,08/12/2006,20061208,602,602.0,tumbleweed way,TUMBLEWEED WAY,,,newport,NEWPORT,RI,SC
47,3723_959309,Lily,Lily,R,Riya,Mula,Mula,08/12/2006,20061208,602,,tumbleweed way,,,,newport,,RI,


4941_1189209


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
48,4941_1189209,Abigail,Abigail,D,Deanna,White,White,10/04/2029,20050711,15542,,stafford rd,,,,smithfield,,RI,
49,4941_1189209,Abigail,Abigail,D,Deanna,White,White,10/04/2029,20050711,15542,,stafford rd,,,,smithfield,,RI,


5628_60718


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
50,5628_60718,Elizabet,Elizabeth,S,Shannon,Davis,Davis,,19970818,212,,n westlawn ave,,,,richmond,EXETER,RI,RI
51,5628_60718,Elizabet,Elizabeth,S,Shannon,Davis,Davis,,19970818,212,,n westlawn ave,,,,richmond,,RI,


6539_1178939


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
52,6539_1178939,Benjamin A,Benjamin,,Augustine,Mitro,Mitro,03/16/2129,20290316,69,,targee stree,,,,providence,,RI,
53,6539_1178939,Benjamin A,Benjamin,,Augustine,Mitro,Mitro,03/16/2129,20290316,69,,targee stree,,,,providence,,RI,


8305_416


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
54,8305_416,Brittny,Brittny,J,J,Llamas Jacobs,Llam,12/15/1990,,12730,,airport rd,,,,cranston,,RI,
55,8305_416,Brittny,Brittny,J,J,Llamas Jacobs,Llam,12/15/1990,,12730,12730.0,airport rd,AIRPORT RD,,,cranston,CRANSTON,RI,RI


99_73298


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
56,99_73298,Carol,Carol,K,K,Oena,Pena,10/01/1959,,1525,1525.0,west howard street,WEST HOWARD STREET,,,westerly,WESTERLY,RI,RI
57,99_73298,Carol,Carol,K,K,Oena,Pena,10/01/1959,,1525,,west howard street,,,,westerly,,RI,
58,99_73298,Carol,Carrie,K,K,Oena,Pena,10/01/1959,,1525,,west howard street,,,,westerly,,RI,
59,99_73298,Carol,Carrie,K,K,Oena,Pena,10/01/1959,,1525,1525.0,west howard street,WEST HOWARD STREET,,,westerly,WESTERLY,RI,RI


682_49973


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
60,682_49973,Samuel,Samuel,C,Christian,Vangent,Vangent,10/07/1991,19960710,506,506.0,93rd blvd ne,93RD BLVD NE,,,cumberland,KUMBERLEAND,RI,RI
61,682_49973,Samuel,Samuel,C,Christian,Vangent,Vangent,10/07/1991,19960710,506,506.0,93rd blvd ne,93RD BLVD NE,,,cumberland,CUMBERLAND,RI,RI
62,682_49973,Samuel,Samuel,C,Christian,Vangent,Vangent,10/07/1991,19960710,506,,93rd blvd ne,,,,cumberland,,RI,


7179_1069048


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
63,7179_1069048,Maia,Maia,L,Lilyana,Henry,Henry,07/21/1964,20221216,2509,,westwood dr,,,,little compton,,NH,
64,7179_1069048,Maia,Maia,L,Lilyana,Henry,Henry,07/21/1964,20221216,2509,2509.0,westwood dr,WESTWOOD DR,,,little compton,LITTLE CORNPTON,NH,RI
65,7179_1069048,Maia,Maia,L,Lilyana,Henry,Henry,07/21/1964,20221216,2509,2509.0,westwood dr,WESTWOOD DR,,,little compton,LITTLE COMPTON,NH,RI


1597_1188844


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
66,1597_1188844,G,Dan,C,Connor,Abramowitz,Abramowitz,09/25/2029,20290925,7003,,sherwlod ave,,,,block island,,RI,
67,1597_1188844,G,Dan,C,Connor,Abramowitz,Abramowitz,09/25/2029,20290925,7003,7003.0,sherwlod ave,SHERWOOD AVE,,,block island,BLOCK ISLAND,RI,RI


3624_119898


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
68,3624_119898,Erin,Erin,E,Erica,Alley,Alley,10/17/1992,20261112,186,186.0,paxton st,PAXTON ST,,,west warwick,WEST WARWICK,RI,RI
69,3624_119898,Erin,Erin,E,Erica,Alley,Alley,10/17/1992,20261112,186,,paxton st,,,,west warwick,,RI,
70,3624_119898,Erin,Erin,E,Erica,Alley,Alley,10/17/1992,20261112,186,186.0,paxton st,PAXTON ST,,,west warwick,WEST WARWICK,RI,RI


7344_892187


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
71,7344_892187,Robert,Robert,S,Sean,Kim,Of House,02/14/2604,20040214,5,,delaware avenu,,,,exeter,,RI,
72,7344_892187,Robert,Robert,S,Sean,Kim,Of House,02/14/2604,20040214,5,5.0,delaware avenu,DELAWARE AVENU,,,exeter,EXETER,RI,RI


9772_621190


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
73,9772_621190,Lynn,Lynn,K,Kathleen,Farrell,Farrell,02/18/1959,19590608,24831,24831.0,s avenue h,S AVENUE H,apartment 469,APARTMENT 469,north kingstown,NORTH KINGSTOWN,RI,RI
74,9772_621190,Lynn,Lynn,K,Kathleen,Farrell,Farrell,02/18/1959,19590608,24831,,s avenue h,,apartment 469,,north kingstown,,RI,


2721_917949


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
75,2721_917949,Gavin,Gavin,J,Justice,Moreno,Moreno,12/04/2002,20020412,5530,5530.0,e lee st,E LEE ST,,,portsmouth,,RI,RI
76,2721_917949,Gavin,Gavin,J,Justice,Moreno,Moreno,12/04/2002,20020412,5530,,e lee st,,,,portsmouth,,RI,
77,2721_917949,Gavin,Gavin,J,Justice,Moreno,Moreno,12/04/2002,20020412,5530,5530.0,e lee st,E LEE ST,,,portsmouth,PORTSMOUTH,RI,RI
78,2721_917949,Gavin,Gavin,J,Justice,Moreno,Moreno,12/04/2002,20020412,5530,5530.0,e lee st,E LEE ST,,,portsmouth,PORTSMOUTH,RI,RI


3298_1082703


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
79,3298_1082703,Genevieve,Genevieve,Z,Zoe,Mahoney,Mahoney,09/10/2023,20231009,1410,1410.0,pansy rd,PANSY RD,,,portsmouth,PORTSMOUTH,RI,RI
80,3298_1082703,Genevieve,Genevieve,Z,Zoe,Mahoney,Mahoney,09/10/2023,20231009,1410,,pansy rd,,,,portsmouth,,RI,
81,3298_1082703,Genevieve,Genevieve,Z,Zoe,Mahoney,Mahoney,09/10/2023,20231009,1410,1410.0,pansy rd,PANSY RD,,,portsmouth,PODTWMOUTH,RI,RI


9740_924180


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
82,9740_924180,Lucia,Lucia,R,Rhonda,Duffy,The House,11/26/1961,,11460,,larimer ave,,,,charlestown,NEWPORT,RI,RI
83,9740_924180,Lucia,Lucia,R,Rhonda,Duffy,The House,11/26/1961,,11460,,larimer ave,,,,charlestown,,RI,


5875_557972


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
84,5875_557972,L,Penelope,Penelope,Lily,Smith,Smith,01/13/2017,20170113,39,39.0,gate house trl,GATE HOUSE TRL,,,cranston,CRANSTON,RI,RI
85,5875_557972,L,Penelope,Penelope,Lily,Smith,Smith,01/13/2017,20170113,39,,gate house trl,,,,cranston,,RI,


9526_137552


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
86,9526_137552,Juanita,Juanita,T,Tammy,Joseph,Joseph,03/08/2027,19690713,9877,9877.0,burt st,BURT ST,,,south kingstown,SOUTH KINGSTOWN,RI,RI
87,9526_137552,Juanita,Juanita,T,Tammy,Joseph,Joseph,03/08/2027,19690713,9877,,burt st,BURT ST,,,south kingstown,SOUTH KINGSTOWN,RI,NH
88,9526_137552,Juanita,Juanita,T,Tammy,Joseph,Joseph,03/08/2027,19690713,9877,9877.0,burt st,BURT ST,,,south kingstown,SOUTH KINGSTOWN,RI,RI
89,9526_137552,Juanita,Juanita,T,Tammy,Joseph,Joseph,03/08/2027,19690713,9877,,burt st,,,,south kingstown,,RI,


3621_1148314


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
90,3621_1148314,Bree,Bree,S,Shannon,Touchstone,Touchstone,08/10/1985,19820217,850,850.0,,ARLINGTON AVENUE,,,middletown,MIDDLETOWN,RI,RI
91,3621_1148314,Bree,Bree,S,Shannon,Touchstone,Touchstone,08/10/1985,19820217,850,,,,,,middletown,,RI,


4802_1177756


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
92,4802_1177756,Nephew,Zachary,A,Alan,Cox,Cox,03/22/2005,20020622,,,saner rd,SANER RD,,,cumberland,CUMBERLAND,RI,RI
93,4802_1177756,Nephew,Zachary,A,Alan,Cox,Cox,03/22/2005,20020622,,,saner rd,,,,cumberland,,RI,


8291_1049376


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
94,8291_1049376,Daniel,Daniel,J,Jeffrey,,Landini,02/13/1974,19700213,438,,drayton rd,,,,tiverton,,,
95,8291_1049376,Daniel,Daniel,J,Jeffrey,,Landini,02/13/1974,19700213,438,438.0,drayton rd,DRAYTON RD,,,tiverton,TIVERTON,,RI


440_1076142


Unnamed: 0,simulant_id,first_name_census,first_name_reference_file,middle_name_census,middle_name_reference_file,last_name_census,last_name_reference_file,date_of_birth_census,date_of_birth_reference_file,street_number_census,street_number_reference_file,street_name_census,street_name_reference_file,unit_number_census,unit_number_reference_file,city_census,city_reference_file,state_census,state_reference_file
96,440_1076142,Hudson,Hudsog,N,Nolan,Kemmerer,Kemmerer,,20230430,580,580.0,school st,SCHOOL ST,,,johnston,JOHNSTON,RI,ND
97,440_1076142,Hudson,Hudsog,N,Nolan,Kemmerer,Kemmerer,,20230430,580,580.0,school st,SCHOOL ST,,,johnston,JOHNSYOH,RI,RI
98,440_1076142,Hudson,Hudsog,N,Nolan,Kemmerer,Kemmerer,,20230430,580,580.0,school st,ACHIOL ST,,,johnston,JOHNSTON,RI,RI
99,440_1076142,Hudson,Hudsog,N,Nolan,Kemmerer,Kemmerer,,20230430,580,580.0,school st,SCHOOL ST,,,johnston,JOHNSTON,RI,RI
100,440_1076142,Hudson,Hudsog,N,Nolan,Kemmerer,Kemmerer,,20230430,580,,school st,,,,johnston,,RI,
