# SSN coverage

We need a rate at which to assign SSNs in these situations:
- Initializing simulants sampled from the PUMS
- New births during the simulation -- coverage is essentially 100% according to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8712929/
- New immigrants during the simulation

In [1]:
import pandas as pd, numpy as np
import seaborn as sns
from numpy.random import default_rng
import scipy.stats as stats

pd.set_option('display.min_rows', 20)

! whoami
! date

zmbc
Tue Nov  8 16:42:14 PST 2022


In [2]:
acs = pd.read_hdf('../data/acs_2020_5yr_person.hdf', key='acs')

In [3]:
# Duplicate indices! In the future, should probably deal with this in download_acs!
# Filter to relevant columns to save memory
acs = acs[['SERIALNO', 'MIG', 'RELSHIPP', 'HISP', 'RAC1P', 'AGEP', 'SEX', 'NATIVITY', 'CIT', 'ST', 'PUMA', 'PWGTP']].reset_index(drop=True)

In [4]:
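# NATIVITY == 2: foreign born
acs['immigrant_ever'] = (acs.NATIVITY == 2).astype(int)
# MIG == 2: lived outside the US and Puerto Rico one year ago
acs['recent_immigrant'] = (acs.MIG == 2).astype(int)
# CIT == 5: not a US citizen
acs['citizen'] = (acs.CIT != 5).astype(int)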

## Initializing simulants

As noted above (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8712929/), SSN coverage is very high for US-born people, and has been for
quite some time.
The rough calculation below gives about 98% coverage, so we can simply assume that 100% of people born in the US have an SSN.

In [5]:
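# Rough guess of % of native-born population with an SSN.
# Assumed coverage by age, using 2016 as the reference year:
#   under 64 (born after about 1952): 100%; 64 to 96: 90%; 97 and over: 50%.
(
    acs[acs.immigrant_ever == 0]
        .assign(ssn_coverage_era=lambda x: np.where(x.AGEP < 2016 - 1952, 1, np.where(x.AGEP < 2016 - 1919, 0.90, 0.50)))
        .assign(ssn_covered=lambda x: x.PWGTP * x.ssn_coverage_era)
        .ssn_covered.sum()
    # Divide the weighted covered count by the total native-born person weight
    / acs[acs.immigrant_ever == 0].PWGTP.sum()
)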

0.9828152570699517
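
The cell above is a person-weighted average of assumed coverage rates by age band. As a minimal sketch of the same arithmetic on three made-up rows (the helper name `era_coverage` and the toy data are hypothetical, not part of the notebook):

```python
import numpy as np
import pandas as pd


def era_coverage(age, reference_year=2016):
    """Assumed SSN coverage by age, mirroring the nested np.where above."""
    age = np.asarray(age)
    return np.select(
        [age < reference_year - 1952, age < reference_year - 1919],
        [1.0, 0.90],
        default=0.50,
    )


# Toy data (made up): one person per age band, with PUMS-style weights.
toy = pd.DataFrame({"AGEP": [10, 70, 99], "PWGTP": [50, 30, 20]})

coverage = era_coverage(toy.AGEP)                      # per-row assumed coverage
weighted = np.average(coverage, weights=toy.PWGTP)     # weight by PWGTP
print(coverage, weighted)
```

Here the weighted average is (50 × 1.0 + 30 × 0.9 + 20 × 0.5) / 100 = 0.87.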

What about for those not born in the US?

We will use the percentage of the foreign-born population who are undocumented (also called "unauthorized" in government parlance) immigrants to approximate the percentage without SSNs.
In reality, a small number of undocumented immigrants may have SSNs (especially if they were issued before 2001, when SSN assignment security was improved).
Conversely, some *documented* immigrants, namely those who do not have work authorization, may not have SSNs.

We'll use the 2018 estimate of the undocumented population -- 11.4 million -- from the DHS: https://www.dhs.gov/sites/default/files/publications/immigration-statistics/Pop_Estimate/UnauthImmigrant/unauthorized_immigrant_population_estimates_2015_-_2018.pdf

In [6]:
1 - (11_400_000 / acs[(acs.immigrant_ever == 1)].PWGTP.sum())

0.7433111790615956
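
One way these rates could be applied when initializing simulants is an independent Bernoulli draw per person. The sketch below is an assumption about how the rates might be used, not the simulation's actual code: the function `assign_has_ssn`, the `has_ssn` column name, and the rounded rate constant are all hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

# Assumed rates: 100% for US-born, and the foreign-born rate rounded
# from the cell above.
NATIVE_BORN_SSN_RATE = 1.0
FOREIGN_BORN_SSN_RATE = 0.743


def assign_has_ssn(simulants, rng):
    """Draw a boolean SSN flag per simulant (hypothetical helper)."""
    rate = np.where(
        simulants["immigrant_ever"] == 1,
        FOREIGN_BORN_SSN_RATE,
        NATIVE_BORN_SSN_RATE,
    )
    # rng.random() is in [0, 1), so a rate of 1.0 always yields True.
    draws = rng.random(len(simulants))
    return pd.Series(draws < rate, index=simulants.index, name="has_ssn")


# Toy population (made up): 5 US-born and 10,000 foreign-born simulants.
toy = pd.DataFrame({"immigrant_ever": [0] * 5 + [1] * 10_000})
has_ssn = assign_has_ssn(toy, default_rng(0))
print(has_ssn[toy.immigrant_ever == 1].mean())
```

Independent draws ignore any correlation within households or with other attributes, which may or may not be acceptable depending on how SSNs are used downstream.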

## New immigrants during the simulation

Again, we will assume that those born in the US have SSNs 100% of the time.

As before, for the foreign-born population, we assume that the undocumented share approximates the share without an SSN.
The DHS report does not include the annual flow, but this paper does: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6150478/.
Note that the DHS report criticizes the paper's estimates for the 1990s as inaccurate, but that does not matter here because we use only
the most recent year.

For 2017, the most recent year available, the paper reports 463,190 undocumented immigrants in the US who had entered in
the past year.
We assume that no undocumented immigrants were born in the US (otherwise, they would have citizenship).

It makes some sense that the SSN coverage rate for recent immigrants (computed below) is lower than the rate for the whole
foreign-born population: almost everyone who entered the US before 1980 has legal status, which pushes the all-time rate higher.

In [7]:
1 - (463_190 / acs[(acs.recent_immigrant == 1) & (acs.immigrant_ever == 1)].PWGTP.sum())

0.6252505058661051
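
As a rough cross-check, the weighted denominators implied by the two printed rates can be backed out from the numerators. This is only a sketch of the arithmetic; the variable names are hypothetical, and the results are only as accurate as the printed rates and the cited estimates.

```python
# Numerators cited in the text.
undocumented_stock_2018 = 11_400_000   # DHS estimate of the undocumented population
undocumented_flow_2017 = 463_190       # entered in the past year, per the cited paper

# Rates printed by the cells above.
foreign_born_ssn_rate = 0.7433111790615956
recent_immigrant_ssn_rate = 0.6252505058661051

# rate = 1 - numerator / denominator  =>  denominator = numerator / (1 - rate)
implied_foreign_born = undocumented_stock_2018 / (1 - foreign_born_ssn_rate)
implied_recent_arrivals = undocumented_flow_2017 / (1 - recent_immigrant_ssn_rate)

print(f"Implied foreign-born population: {implied_foreign_born:,.0f}")
print(f"Implied foreign-born arrivals in the past year: {implied_recent_arrivals:,.0f}")
```

If either implied total looked implausible against published population figures, that would suggest a mismatch between the numerator's definition and the ACS filter used for the denominator.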