#### SIPP1991 401K data

#### logs

5/15/2024 WL: template provided. TODO: write a brief intro to the data covering the background and the meaning of the variables; add a short data summary and regressions along the lines of the ldw data analysis.


Data source: SIPP 1991 (Abadie, 2003), pages 231-263  
The paper investigates the effect of 401(k) programs on savings. To address selection bias, it proposes a statistical method that allows the benefit from 401(k) participation to differ across individuals, works with continuous outcomes (such as the amount saved), and avoids strong assumptions about how the variables are related.  
The variables in the data set include:

| Variable | Description |
|---|---|
| `net_tfa` | Net total financial assets |
| `e401` | = 1 if the employer offers a 401(k) |
| `p401` | = 1 if the employee participates in a 401(k) |
| `age` | Age |
| `inc` | Income |
| `fsize` | Family size |
| `educ` | Years of education |
| `db` | = 1 if the individual has a defined benefit pension |
| `marr` | = 1 if married |
| `twoearn` | = 1 if two-earner household |
| `pira` | = 1 if the individual participates in an IRA |
| `hown` | = 1 if home owner |
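
Of the variables listed, `e401` (eligibility) and `p401` (participation) are the pair that an instrumental-variable analysis of these data relies on, with eligibility serving as the instrument for participation. As a rough illustration only, the sketch below computes the simple unconditional Wald ratio: the difference in mean `net_tfa` between eligible and non-eligible units, divided by the difference in participation rates. This is not the semiparametric estimator from Abadie (2003), and it ignores all covariates. The helper name `wald_estimate` and the toy values are hypothetical and are not taken from SIPP.

```python
import pandas as pd


def wald_estimate(df: pd.DataFrame, outcome: str, treatment: str, instrument: str) -> float:
    """Unconditional Wald (IV) ratio for a binary 0/1 instrument."""
    # Mean outcome and mean treatment take-up within each instrument group.
    means = df.groupby(instrument)[[outcome, treatment]].mean()
    reduced_form = means.loc[1, outcome] - means.loc[0, outcome]
    first_stage = means.loc[1, treatment] - means.loc[0, treatment]
    if first_stage == 0:
        raise ValueError("Instrument does not shift treatment; ratio is undefined.")
    return reduced_form / first_stage


# Toy data invented for illustration only (NOT SIPP values).
toy = pd.DataFrame(
    {
        "e401": [0, 0, 0, 0, 1, 1, 1, 1],
        "p401": [0, 0, 0, 0, 1, 1, 0, 1],
        "net_tfa": [1000.0, 2000.0, 0.0, 1000.0, 9000.0, 7000.0, 1000.0, 11000.0],
    }
)

print(wald_estimate(toy, "net_tfa", "p401", "e401"))
```

On the real data the same call would be `wald_estimate(df, "net_tfa", "p401", "e401")`, assuming `df` holds the columns described above. A covariate-adjusted analysis would be needed before reading the number as a causal effect.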

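The sample described in the paper consists of 9,275 observations from the Survey of Income and Program Participation (SIPP) of 1991. It is restricted to households of persons aged 25-64 in which at least one individual is employed and no persons are self-employed. Additionally, family income ranges between \$10,000 and \$200,000. Note that the file loaded below contains 9,915 rows, and its `inc` column takes values outside this interval (see the summary output), so it does not apply exactly the same restrictions.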

In [2]:
import io
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import requests
from data_process import load_rdata

# Print the current working directory
print("Current working directory: {0}".format(os.getcwd()))

Current working directory: c:\Users\I1000928\Projects\Personal-Projects\401K


In [15]:
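# Path is relative to the working directory printed above (the project root).
data_path = os.path.join("data", "sipp1991.Rdata")
fin_data = load_rdata(data_path)
# Take the first object stored in the .Rdata file.
df = list(fin_data.values())[0]
df.head()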

| rownames | nifa | net_tfa | tw | age | inc | fsize | educ | db | marr | twoearn | e401 | p401 | pira | hown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 4500.0 | 47 | 6765.0 | 2 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 6215.0 | 1015.0 | 22390.0 | 36 | 28452.0 | 1 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0.0 | -2000.0 | -2000.0 | 37 | 3300.0 | 6 | 12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 15000.0 | 15000.0 | 155000.0 | 58 | 52590.0 | 2 | 16 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 5 | 0.0 | 0.0 | 58000.0 | 32 | 21804.0 | 1 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
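
The introduction states that the sample is restricted to ages 25-64 and family income between 10,000 and 200,000 dollars, yet the first rows above already include incomes of 6,765 and 3,300. A minimal sketch of how such restrictions could be checked is given below. It rebuilds the `age` and `inc` columns of the five displayed rows so that it runs on its own; the helper name `restriction_mask` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import pandas as pd

# The age and inc columns of the five rows displayed in the df.head() output.
head = pd.DataFrame(
    {
        "age": [47, 36, 37, 58, 32],
        "inc": [6765.0, 28452.0, 3300.0, 52590.0, 21804.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="rownames"),
)


def restriction_mask(df, age_range=(25, 64), inc_range=(10_000, 200_000)):
    """Boolean mask of rows meeting the stated age and income restrictions.

    Bounds are treated as inclusive (an assumption; Series.between is
    inclusive on both ends by default).
    """
    age_ok = df["age"].between(*age_range)
    inc_ok = df["inc"].between(*inc_range)
    return age_ok & inc_ok


mask = restriction_mask(head)
print(f"{mask.sum()} of {len(mask)} rows satisfy the stated restrictions")
print(head[~mask])  # rows that fall outside the stated ranges
```

Applied to the full data as `restriction_mask(df)`, the same mask would show how many of the loaded rows fall inside the ranges quoted in the introduction.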


In [None]:
# Data summary

print(df.describe())
# Vectorized boolean sums instead of Python's built-in sum().
print("Number of people with zero net_tfa:", (df["net_tfa"] == 0).sum())
print("Number of people not participating in 401k:", (df["p401"] == 0).sum())

               nifa       net_tfa            tw          age            inc  \
count  9.915000e+03  9.915000e+03  9.915000e+03  9915.000000    9915.000000   
mean   1.392864e+04  1.805153e+04  6.381685e+04    41.060212   37200.623197   
std    5.490488e+04  6.352250e+04  1.115297e+05    10.344505   24774.288006   
min    0.000000e+00 -5.023020e+05 -5.023020e+05    25.000000   -2652.000000   
25%    2.000000e+02 -5.000000e+02  3.291500e+03    32.000000   19413.000000   
50%    1.635000e+03  1.499000e+03  2.510000e+04    40.000000   31476.000000   
75%    8.765500e+03  1.652450e+04  8.148750e+04    48.000000   48583.500000   
max    1.430298e+06  1.536798e+06  2.029910e+06    64.000000  242124.000000   

             fsize         educ           db         marr      twoearn  \
count  9915.000000  9915.000000  9915.000000  9915.000000  9915.000000   
mean      2.865860    13.206253     0.271004     0.604841     0.380837   
std       1.538937     2.810382     0.444500     0.488909     0.48