# Impossible relationships

My assumption is that, given how relationships are coded in ACS, each household should contain no more than 1 spouse and no more than 2 parents of the reference person.
This notebook checks that assumption. I also aim to determine whether any other relationships have logical limits, though it is not obvious how to decide this conclusively.

## Load ACS

In [1]:
import pandas as pd, numpy as np

! whoami
! date

zmbc
Thu Sep 29 09:44:20 PDT 2022


In [2]:
acs = pd.read_hdf('../../data/acs_2020_5yr.hdf', key='acs')

In [3]:
print(f'{len(acs):,}')

15,441,673


## Relationship code meanings
|Code|Relationship|
|---|---|
|20|Reference person|
|21|Opposite-sex husband/wife/spouse|
|22|Opposite-sex unmarried partner|
|23|Same-sex husband/wife/spouse|
|24|Same-sex unmarried partner|
|25|Biological son or daughter|
|26|Adopted son or daughter|
|27|Stepson or stepdaughter|
|28|Brother or sister|
|29|Father or mother|
|30|Grandchild|
|31|Parent-in-law|
|32|Son-in-law or daughter-in-law|
|33|Other relative|
|34|Roommate or housemate|
|35|Foster child|
|36|Other nonrelative|
|37|Institutionalized group quarters population|
|38|Noninstitutionalized group quarters population|

## Maximum number of each individual relationship code, and how many times the maximum occurs

For relationships with a logical limit, I'd expect to see a low maximum, with many households at that maximum.

In [4]:
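# Count people per (household, relationship code), then for each code report
# the largest per-household count and how many households reach it.
(
    acs.groupby(['SERIALNO', 'RELSHIPP']).size().rename('num')
        .reset_index()
        .groupby('RELSHIPP').apply(lambda x: pd.Series({'max': x.num.max(), 'num_max': (x.num == x.num.max()).sum()}))
)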

|RELSHIPP|max|num_max|
|---|---|---|
|20|1|6017646|
|21|1|3044255|
|22|1|330391|
|23|1|30221|
|24|1|20761|
|25|18|1|
|26|13|1|
|27|14|1|
|28|11|1|
|29|4|10|


## Parent relationship

As seen above, there can be up to 4 parents for a single reference person! How common is this?

In [5]:
acs[acs.RELSHIPP == 29].groupby('SERIALNO').size().value_counts()

1    113168
2     24669
3        30
4        10
dtype: int64

Very uncommon: only 40 of the 137,877 households with any parent present list more than 2 (about 0.03%), so **it seems reasonable for us to enforce that this never happens**.
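A minimal sketch of how such a limit could be checked, run here on toy data rather than on `acs`. The helper name `households_over_limit` and the toy rows are hypothetical; only the `SERIALNO` and `RELSHIPP` column names come from the notebook.

```python
import pandas as pd

def households_over_limit(df, code, limit):
    """Return per-household counts of relationship `code` that exceed `limit`."""
    counts = df[df.RELSHIPP == code].groupby('SERIALNO').size()
    return counts[counts > limit]

# Toy data (hypothetical, not from ACS): household 'B' lists three parents.
toy = pd.DataFrame({
    'SERIALNO': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'RELSHIPP': [20, 29, 29, 20, 29, 29, 29],
})

over = households_over_limit(toy, code=29, limit=2)
print(over)
```

Returning the offending households, rather than asserting, makes it possible to inspect them before deciding how to handle them.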

Are the parents being duplicated somehow?

In [6]:
acs[acs.SERIALNO == '2019HU1185509'][['AGEP', 'SEX', 'RELSHIPP', 'FOD1P', 'INDP']]

| |AGEP|SEX|RELSHIPP|FOD1P|INDP|
|---|---|---|---|---|---|
|2194684|40|1|20|NaN|NaN|
|2194685|30|2|21|NaN|7970.0|
|2194686|11|1|25|NaN|NaN|
|2194687|11|1|25|NaN|NaN|
|2194688|8|1|25|NaN|NaN|
|2194689|6|1|25|NaN|NaN|
|2194690|63|2|29|NaN|8990.0|
|2194691|63|1|29|NaN|4971.0|
|2194692|63|2|29|NaN|8370.0|
|2194693|62|1|29|NaN|4971.0|


The narrow age band (all four parents are 62 or 63) makes duplication look plausible, but each parent row differs from every other on at least one attribute (age, sex, or industry code).
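The same comparison can be made explicit with `DataFrame.duplicated`. This is only a sketch: the four rows are copied by hand from the output above, `FOD1P` is left out because it is missing for all of them, and treating "identical on these columns" as the definition of a duplicate is an assumption.

```python
import pandas as pd

# The four RELSHIPP == 29 rows from the household shown above, copied by hand.
parents = pd.DataFrame({
    'AGEP': [63, 63, 63, 62],
    'SEX': [2, 1, 2, 1],
    'RELSHIPP': [29, 29, 29, 29],
    'INDP': [8990.0, 4971.0, 8370.0, 4971.0],
}, index=[2194690, 2194691, 2194692, 2194693])

# keep=False flags every member of a duplicated group, not just the repeats.
exact_dupes = parents.duplicated(keep=False)
print(exact_dupes.any())
```

No row is flagged, which matches the reading above. This rules out exact copies only; it says nothing about near-duplicates.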

## Spouse relationships

In [7]:
# There is never more than one partner/spouse, across all partner/spouse relationship types.
assert np.all(acs[acs.RELSHIPP.isin([21, 22, 23, 24])].groupby('SERIALNO').size() == 1)
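Pooling the four codes matters here: a per-code maximum of 1 would not by itself rule out a household with, say, one spouse and one unmarried partner. A small sketch on toy data (the rows and the `PARTNER_CODES` name are hypothetical) shows the difference:

```python
import pandas as pd

PARTNER_CODES = [21, 22, 23, 24]

# Toy data (hypothetical, not from ACS): household 'B' has one
# opposite-sex spouse (21) and one opposite-sex unmarried partner (22).
toy = pd.DataFrame({
    'SERIALNO': ['A', 'A', 'B', 'B', 'B'],
    'RELSHIPP': [20, 21, 20, 21, 22],
})

partners = toy[toy.RELSHIPP.isin(PARTNER_CODES)]

# Per-code view: no household has more than one person with any single code.
per_code_max = partners.groupby(['SERIALNO', 'RELSHIPP']).size().max()

# Pooled view: household 'B' has two partners/spouses in total.
combined = partners.groupby('SERIALNO').size()
print(per_code_max)
print(combined[combined > 1])
```

On the toy data the per-code check passes while the pooled check flags household 'B', which is why the assertion above groups all partner and spouse codes together.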