In [143]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('dark_background')

# Exploratory Intro to NEETs

In this chapter, we'll take an exploratory look at NEETs (young people who are neither employed nor in school), using the 2024 CPS monthly samples. For this, we'll need the following variables (all available variables can be [found here](https://cps.ipums.org/cps-action/variables/group)):

- Demographic variables: `AGE`, `SEX`, `RACE`, `HISPAN`, `NCHILD`, `DIFFANY`

- Geographic variables: `STATEFIP`

- Socioeconomic variables: `EMPSTAT`, `LABFORCE`, `SCHLCOLL`

These variables will be enough for our current purposes.

## Load in the data

Using `ipumspy` and a helper function (`get_CPS`) to simplify the process, we can get the data fairly easily (though the extract may take some time). Remember to assign your IPUMS API key to an environment variable named `IPUMS_API_KEY`. In bash, for example:

    export IPUMS_API_KEY=2477c3178c3178247
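
Before requesting an extract, it can help to confirm that the key is actually visible to Python. The sketch below is only an illustration: `require_api_key` is a hypothetical helper (it is not part of `ipumspy` or of `get_CPS`), and it assumes the key is read from the `IPUMS_API_KEY` environment variable as described above.

```python
import os


def require_api_key(var_name="IPUMS_API_KEY"):
    """Return the key stored in the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of ipumspy.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before requesting an extract."
        )
    return key
```

Calling `require_api_key()` at the top of the notebook would surface a missing key immediately, rather than partway through a slow extract request.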


In [144]:
import sys
import os
from ipumspy import readers

sys.path.append('..')

from scripts.clean_ipums import get_CPS

In [2]:
my_vars = ['AGE', 'SEX', 'RACE', 'HISPAN', 'NCHILD', 
           'DIFFANY', 'EMPSTAT', 'LABFORCE', 'SCHLCOLL', 'STATEFIP'] 

get_CPS(years=2024, 
        vars=my_vars, 
        filename='exploratory_NEETs_2024', 
        filepath='../datasets' 
        ) # extracting a csv file & xml, named '../datasets/exploratory_NEETs_2024.[csv.gz/xml]'

Let's load our dataset in now.

In [145]:
%%capture

ddi_cps24 = readers.read_ipums_ddi('../datasets/exploratory_NEETs_2024.xml')
cps24 = readers.read_microdata(ddi=ddi_cps24, filename='../datasets/exploratory_NEETs_2024.csv.gz')

Now, we can take a brief look at the data, but there's one more pesky thing. When we extracted, we also extracted the Annual Social and Economic Supplement [(ASEC)](https://www.census.gov/data/datasets/time-series/demo/cps/cps-asec.html) data, which we don't need. Let's remove that.

In [146]:
cps24 = cps24[cps24['ASECFLAG'].isna()]

We can now look at some basic characteristics of our data, and clean it up a bit.

In [147]:
cps24.shape

(794955, 22)

In [26]:
cps24.columns

Index(['YEAR', 'SERIAL', 'MONTH', 'HWTFINL', 'CPSID', 'ASECFLAG', 'ASECWTH',
       'STATEFIP', 'PERNUM', 'WTFINL', 'CPSIDP', 'CPSIDV', 'ASECWT', 'AGE',
       'SEX', 'RACE', 'NCHILD', 'HISPAN', 'EMPSTAT', 'LABFORCE', 'SCHLCOLL',
       'DIFFANY'],
      dtype='object')

Let's look at the gender split and median age in our data.

In [9]:
print('Share Male:' , np.mean(cps24['SEX'] == 1))

Share Male: 0.48874967765471


In [None]:
print('Median Age:', np.median(cps24['AGE']))

Median Age: 42.0


## Overall

Our key variables here are `EMPSTAT` and `SCHLCOLL`, as those will let us define whether an individual is employed or in school. The codes for each are as follows:

`EMPSTAT`:
- `0 == NIU`
- `1 == Armed Forces`
- `10,12 == Employed`
- `20-36 == Unemployed or Not In Labor Force`

`SCHLCOLL`:
- `0 == NIU`
- `1,2,3,4 == In High School or College`
- `5 == Not in High School or College`

(note that `SCHLCOLL` is only applicable for individuals aged 16-24. Technically, ASEC has it available for 16-54, but let's limit to 16-24)
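
To make the coding rule concrete, here is a minimal sketch that classifies a single record from its two codes. The function name `neet_status` is hypothetical. It follows the code lists above and treats Armed Forces (`EMPSTAT == 1`) as not NEET, which is an assumption consistent with how this chapter defines the NEET condition.

```python
def neet_status(empstat, schlcoll):
    """Classify one record from its EMPSTAT and SCHLCOLL codes.

    Returns None for NIU records (code 0 in either variable).
    Hypothetical helper for illustration only.
    """
    if empstat == 0 or schlcoll == 0:
        return None
    not_working = 20 <= empstat <= 36   # unemployed or not in labor force
    not_in_school = schlcoll == 5       # not in high school or college
    return "neet" if (not_working and not_in_school) else "not_neet"


for codes in [(10, 5), (21, 5), (34, 3), (1, 5), (0, 5)]:
    print(codes, neet_status(*codes))
```

Only records that are both out of work and out of school are labelled `neet`; working or studying is enough to fall into `not_neet`.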

Let's get the NEET rate (weighted individuals not in school or employed / weighted individuals):
- by gender
- by age ranges
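
The weighted rate is simply the sum of weights for NEET records divided by the sum of weights for all records. A small sketch with made-up weights (the numbers and the name `weighted_neet_rate` are illustrative assumptions, not CPS data) shows why weighting matters:

```python
def weighted_neet_rate(records):
    """records: list of (weight, status) pairs, status in {'neet', 'not_neet'}."""
    total = sum(weight for weight, _ in records)
    neet = sum(weight for weight, status in records if status == "neet")
    return 100 * neet / total


# Made-up sample: one NEET respondent out of four, but with a small weight.
toy = [(1000.0, "neet"), (3000.0, "not_neet"), (2000.0, "not_neet"), (4000.0, "not_neet")]
print(weighted_neet_rate(toy))  # 10.0
```

The unweighted share here would be 25% (one record in four), while the weighted rate is 10%, because the NEET respondent represents fewer people in the population.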


In [148]:
cps24['SCHLCOLL'].value_counts()

cps24 = cps24.query('EMPSTAT > 0 and SCHLCOLL > 0').copy() # remove NIU observations
print('num_obs:', cps24.shape[0])

num_obs: 375027


In [149]:
gender_codes = {1 : 'men', 2 : 'women'} # map gender codes
cps24['sex'] = cps24['SEX'].map(gender_codes)

cps24['NEET'] = 'not_neet'
cps24.loc[(cps24['EMPSTAT'].isin(range(20,37))) & (cps24['SCHLCOLL'] == 5), 'NEET'] = 'neet' # NEET conditions

age_ranges = [range(16,25), range(16,21), range(20,25)] # three age ranges to test

for i in age_ranges:
    filtered_df = cps24.query('AGE in @i').copy()
    agg_wt = filtered_df.groupby(['sex', 'NEET'])['WTFINL'].sum().unstack() # get weighted sum by gender and NEET status
    agg_wt= agg_wt.eval('neet_rate = neet / (neet + not_neet) * 100')
    print('age', str(i), sep='_')
    print('Men: {0}%'.format(agg_wt['neet_rate'].loc['men'].round(2)))
    print('Women: {0}%\n'.format(agg_wt['neet_rate'].loc['women'].round(2)))

age_range(16, 25)
Men: 14.92%
Women: 15.66%

age_range(16, 21)
Men: 14.48%
Women: 13.8%

age_range(20, 25)
Men: 15.72%
Women: 17.54%



This is about in line with my expectations, and similar to what [CEPR](https://cepr.net/publications/are-young-men-falling-behind-young-women-the-neet-rate-helps-shed-light-on-the-matter/) found: overall, men are slightly less likely to be NEETs, and NEET rates for both genders are higher among 20-24 year-olds than among 16-20 year-olds. In 2024, 14.9% of men aged 16-24 were neither employed nor in school, slightly less than the 15.7% of women. The one exception is the 16-20 range, where the male rate (14.5%) is slightly above the female rate (13.8%).

## By race

Next, let's look at NEET rates by race and gender.

In [153]:
race_codes = [
    ((cps24['RACE'] == 100) & (cps24['HISPAN'] == 0)),
    ((cps24['RACE'] == 200) & (cps24['HISPAN'] == 0)),
    ((cps24['RACE'].isin(range(650, 653))) & (cps24['HISPAN'] == 0)),
    ((cps24['HISPAN'] > 0) & (cps24['HISPAN'] < 902))
]

race_choices = ['white', 'black', 'asian', 'hispanic']

cps24['race_cat'] = np.select(race_codes, race_choices, default = 'other')

filtered_df = cps24.query("AGE >= 16 and AGE <= 24")
agg_wt = filtered_df.groupby(['race_cat', 'sex', 'NEET'])['WTFINL'].sum().unstack() # get weighted sum by gender and NEET status
agg_wt= agg_wt.eval('neet_rate = neet / (neet + not_neet) * 100')
print('NEET rates by race and gender for 16-24 year-olds')
print(agg_wt['neet_rate'].round(2).unstack())

NEET rates by race and gender for 16-24 year-olds
sex         men  women
race_cat              
asian     13.56  13.23
black     21.64  19.00
hispanic  15.58  18.51
other     17.46  17.52
white     12.81  13.43


Intra-racial NEET rates are higher for White and Hispanic women, but they're lower for Black and Asian women. By race and gender, Black men have the highest NEET rates, with about one-in-five Black men aged 16-24 not being in education or employed; White men have the lowest NEET rates at about 13%.
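
The `race_cat` assignment relies on `np.select`, which returns the label of the first condition that matches and falls back to the default otherwise. As a rough sketch, the same rule for a single record can be written in plain Python. The function name `race_category` is hypothetical, and the code thresholds are copied from the conditions in the cell above.

```python
def race_category(race, hispan):
    """Mirror the np.select conditions for one record; the first match wins."""
    rules = [
        (race == 100 and hispan == 0, "white"),
        (race == 200 and hispan == 0, "black"),
        (650 <= race <= 652 and hispan == 0, "asian"),
        (0 < hispan < 902, "hispanic"),
    ]
    for condition, label in rules:
        if condition:
            return label
    return "other"  # same role as default='other' in np.select


print(race_category(100, 0))    # white
print(race_category(100, 100))  # hispanic
print(race_category(300, 0))    # other
```

Because the first three conditions all require `HISPAN == 0`, anyone with a Hispanic origin code lands in `hispanic` regardless of their `RACE` code, so the categories are mutually exclusive.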

## Geography

To get NEET rates by state, we'll be using `STATEFIP`, which gives the state FIPS code for each individual in our dataset.

`STATEFIP in range(1,57)` includes the fifty states and Washington D.C. (the codes occasionally skip a number). To get the corresponding names, we'll use the `us` library.
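
State FIPS codes are conventionally written as two-character strings, so a numeric code such as `6` has to be zero-padded to `'06'` before it can be used as a lookup key. A minimal sketch of that step, using a hand-written three-entry dictionary in place of the full mapping (`fips_key` and `toy_names` are illustrative names, not part of the `us` library):

```python
def fips_key(statefip):
    """Turn a numeric state FIPS code into a zero-padded two-character key."""
    return str(int(statefip)).zfill(2)


# Tiny stand-in for the full FIPS-to-name mapping.
toy_names = {"01": "Alabama", "06": "California", "11": "District of Columbia"}

print(fips_key(6))                      # 06
print(toy_names.get(fips_key(6)))       # California
print(toy_names.get(fips_key(11)))      # District of Columbia
```

Without the padding, `'6'` would not match the key `'06'`, and the state name would come back missing.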

In [160]:
import us 

state_names = us.states.mapping('fips', 'name') # dictionary with STATEFIP as key
state_names['11'] = 'District of Columbia' # DC isn't included, so manually add it
cps24['state'] = cps24['STATEFIP'].astype(str).str.zfill(2) 
cps24['state'] = cps24['state'].map(state_names)

filtered_df = cps24.query('AGE >= 16 and AGE <= 24')
agg_wt = filtered_df.groupby(['state', 'sex', 'NEET'])['WTFINL'].sum().unstack() # get weighted sum by gender and NEET status
agg_wt= agg_wt.eval('neet_rate = neet / (neet + not_neet) * 100')
agg_wt2 = agg_wt['neet_rate'].round(2).unstack()
print('NEET rates by state and gender for 16-24 year-olds')
print(agg_wt2)


NEET rates by state and gender for 16-24 year-olds
sex                     men  women
state                             
Alabama               13.35  19.87
Alaska                17.98  19.40
Arizona               15.87  14.13
Arkansas              17.93  19.36
California            15.16  15.95
Colorado              11.70  17.57
Connecticut           11.47  10.84
Delaware              12.01  14.07
District of Columbia  20.56  15.60
Florida               14.34  16.60
Georgia               17.19  15.44
Hawaii                12.02  13.58
Idaho                 16.80  13.37
Illinois              16.66  14.71
Indiana               14.84  15.25
Iowa                  10.34  11.52
Kansas                12.09   9.62
Kentucky              13.38  19.89
Louisiana             15.45  18.90
Maine                  9.41  12.12
Maryland              13.72  15.83
Massachusetts         13.38  10.25
Michigan              18.93  18.72
Minnesota              9.45   9.55
Mississippi           20.42  19.49
Misso

Let's look at the states with the maximum/minimum male/female NEET rates, and states where the NEET rate is higher/lower for men/women.

In [161]:
max_st_male = agg_wt2['men'].sort_values(ascending=False)
max_st_female = agg_wt2['women'].sort_values(ascending=False)

print('State with highest male NEET rate: {0} -- {1}%'.format(max_st_male.index[0], max_st_male.iloc[0]))
print('State with highest female NEET rate: {0} -- {1}%'.format(max_st_female.index[0], max_st_female.iloc[0]))

State with highest male NEET rate: New Mexico -- 20.58%
State with highest female NEET rate: Wyoming -- 23.59%


In [162]:
min_st_male = agg_wt2['men'].sort_values(ascending=True)
min_st_female = agg_wt2['women'].sort_values(ascending=True)

print('State with lowest male NEET rate: {0} -- {1}%'.format(min_st_male.index[0], min_st_male.iloc[0]))
print('State with lowest female NEET rate: {0} -- {1}%'.format(min_st_female.index[0], min_st_female.iloc[0]))

State with lowest male NEET rate: Maine -- 9.41%
State with lowest female NEET rate: Nebraska -- 9.04%


In [163]:
higher_with_men = agg_wt2[agg_wt2['men'] > agg_wt2['women']].sort_values(by='men', ascending=False)
print('States with higher male NEET rates (including {0} states) :\n {1}'.format(higher_with_men.shape[0] , higher_with_men))

States with higher male NEET rates (including 22 states) :
 sex                     men  women
state                             
New Mexico            20.58  13.64
District of Columbia  20.56  15.60
Mississippi           20.42  19.49
Nevada                19.95  14.64
Michigan              18.93  18.72
Rhode Island          17.66  11.22
West Virginia         17.64  16.60
South Carolina        17.58  16.57
Georgia               17.19  15.44
Idaho                 16.80  13.37
Illinois              16.66  14.71
Pennsylvania          16.04  13.91
Arizona               15.87  14.13
New York              15.86  14.52
Oregon                14.19  10.47
Utah                  13.81  13.52
Massachusetts         13.38  10.25
Vermont               12.54  11.25
Wisconsin             12.27   9.16
Kansas                12.09   9.62
Connecticut           11.47  10.84
Nebraska              11.33   9.04


In [164]:
higher_with_women = agg_wt2[agg_wt2['men'] < agg_wt2['women']].sort_values(by='women', ascending=False)
print('States with lower male NEET rates (including {0} states) :\n {1}'.format(higher_with_women.shape[0] , higher_with_women))

States with lower male NEET rates (including 28 states) :
 sex              men  women
state                      
Wyoming        13.05  23.59
Missouri       12.75  22.95
Tennessee      17.98  22.44
Kentucky       13.38  19.89
Alabama        13.35  19.87
Alaska         17.98  19.40
Arkansas       17.93  19.36
Louisiana      15.45  18.90
Oklahoma       15.19  18.86
Washington     13.46  17.80
Colorado       11.70  17.57
Texas          15.68  17.46
Virginia       14.48  16.96
Florida        14.34  16.60
Montana        14.75  16.50
California     15.16  15.95
North Dakota   13.28  15.90
Maryland       13.72  15.83
Indiana        14.84  15.25
Ohio           13.41  14.34
Delaware       12.01  14.07
Hawaii         12.02  13.58
New Jersey     10.75  12.72
Maine           9.41  12.12
South Dakota   10.36  11.55
Iowa           10.34  11.52
New Hampshire  10.24  10.35
Minnesota       9.45   9.55


That's weird, only 50 states (including DC as a 'state'). Is there a state with *equal* NEET rates for men and women?

In [165]:
equal_rates = agg_wt2[agg_wt2['men'] == agg_wt2['women']]
print('State with equal NEET rates:\n{0}'.format(equal_rates))

State with equal NEET rates:
sex               men  women
state                       
North Carolina  15.71  15.71


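Huh, very strange: identical to two decimal places. Keep in mind that `agg_wt2` stores rates already rounded to two decimals, so the unrounded rates may still differ slightly. Perhaps North Carolina has the smallest gender gap of all the states? Something to look at in the future (though I'll probably forget).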
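
One caveat with this equality check is that it compares rounded values, so two rates that differ in the third decimal can still compare equal. A sturdier way to ask the same question is to rank states by the absolute gap. The sketch below uses hypothetical unrounded rates for the first part, and a few rounded values taken from the tables above for the second; `smallest_gap` is an illustrative helper name.

```python
# Hypothetical unrounded rates: equal after rounding, not equal before.
men, women = 15.7149, 15.7071
print(round(men, 2) == round(women, 2))  # True
print(men == women)                      # False


def smallest_gap(rates):
    """rates: {state: (men_rate, women_rate)}; return the state with the smallest absolute gap."""
    return min(rates, key=lambda state: abs(rates[state][0] - rates[state][1]))


# A few (men, women) pairs copied from the state tables above.
sample = {
    "Minnesota": (9.45, 9.55),
    "New Hampshire": (10.24, 10.35),
    "Michigan": (18.93, 18.72),
}
print(smallest_gap(sample))  # Minnesota
```

Ranking by the gap also avoids the situation above, where a state silently drops out of both the "higher" and "lower" lists.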

## Other variables (disability and caretaking)

Okay, we now have a general sense of NEET rates in the US. We know that (at least based on 2024):

- **around 15% of young (16-24 y.o.) men and women are not employed or in education.**
- **men are slightly less likely to be NEETs, but this depends on age.**
- **among groups by race and gender, White men are the least likely to be NEETs, and Black men are the most likely.**
- **most states have higher female NEET rates, and NEET rates vary considerably by state, for both men and women.**

Now we might be curious about *why* some young man/woman is a NEET and another isn't. We can't entirely solve that here obviously, but we can look at some accompanying characteristics. 

Two interesting correlates, based on other research, are:

- having a disability
- raising a child

where someone having one of these characteristics would be more likely to be a NEET. We can look at these relationships using variables in our dataset:
- `DIFFANY` : This variable indicates whether a surveyed individual has any [physical or cognitive disability](https://cps.ipums.org/cps-action/variables/DIFFANY#description_section), and is based on responding 'Yes' to any of six other specific disability variables. Research finds that CPS disability prevalence estimates are lower than those from other, similar data sources, but the direction of the relationship should be the same.
- `NCHILD` : This variable indicates the number of the surveyed individual's own children present in their household, including step-children and adopted children. 

While `DIFFANY` should apply equally no matter how we define our age range to calculate the NEET rate, `NCHILD` won't (e.g., a twenty-nine year-old should be more likely to have a kid than a twenty-two year-old). Given our dataset, we'll stick with 16-24 while having this in the back of our heads. 

We'll try out a few different queries, so let's make a basic function that lets us calculate NEET rates repeatedly, based on certain restrictions.

In [221]:
def get_NEET(df = cps24, query_str = '', group_by = None):
    """
    calculates the weighted NEET rate (%) based on query and grouping options 

    **df**
    - dataframe, defaults to cps24

    **query_str**
    - query string to be placed in pandas query method (empty string = no filter)
    - ex. query_str = 'AGE >= 16 and AGE <= 24'

    **group_by**
    - grouping columns; 'NEET' must come last so it can be unstacked into columns
    - defaults to ['sex', 'NEET']
    - ex. group_by = ['race_cat', 'sex', 'NEET']
    """
    group_by = list(group_by) if group_by else ['sex', 'NEET']
    filtered_df = df.query(query_str) if query_str else df # use the df argument, not the global cps24
    agg_wt = filtered_df.groupby(group_by)['WTFINL'].sum().unstack() # weighted sum per group, one column per NEET status
    agg_wt = agg_wt.eval('neet_rate = neet / (neet + not_neet) * 100')
    
    if len(group_by) > 2:
        # two grouping columns besides NEET: pivot the second into columns
        return agg_wt['neet_rate'].round(2).unstack()
    else:
        return agg_wt['neet_rate']

Let's briefly look at `DIFFANY` and `NCHILD` before getting our rates.

In [169]:
cps24.loc[:, ['DIFFANY', 'NCHILD']].describe()

        DIFFANY     NCHILD
count  375027.0   375027.0
mean   1.066601   0.847979
std    0.249329   1.196087
min         1.0        0.0
25%         1.0        0.0
50%         1.0        0.0
75%         1.0        2.0
max         2.0        9.0


`DIFFANY` equaling 1 means 'no difficulty', with 2 meaning 'has difficulty.' `NCHILD` is an integer variable, with 0 meaning no kids, and 9 meaning 9 or more kids. Before calculating any rates, let's map both variables to binary categories.

In [175]:
child_dict = {i : 'has_child' for i in range(10) if i > 0}
child_dict[0] = 'no_child'

disability_dict = {1 : 'no_dis', 2 : 'has_dis'}

cps24['kids'] = cps24['NCHILD'].map(child_dict)
cps24['dis'] = cps24['DIFFANY'].map(disability_dict)

First, the difference in NEET rates between people with a disability and those without, by gender.

In [179]:
by_disability = get_NEET(cps24, 'AGE >= 16 and AGE <= 24', ['sex', 'dis', 'NEET'])
print(by_disability)

dis    has_dis  no_dis
sex                   
men      39.65   13.45
women    33.82   14.67


Wow, disabled men are about three times as likely to be NEETs as men without a disability: almost four in ten are NEETs. The gap is not as big for women, but still striking. What's the sample size for disabled men and women in our data?
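
The "three times" figure can be checked directly from the rates printed above. This short sketch recomputes the ratio and the percentage-point gap for each gender; the variable names are illustrative.

```python
# Rates (%) copied from the by_disability output above.
rates = {
    "men": {"has_dis": 39.65, "no_dis": 13.45},
    "women": {"has_dis": 33.82, "no_dis": 14.67},
}

# Ratio of the NEET rate with a disability to the rate without one.
ratios = {sex: r["has_dis"] / r["no_dis"] for sex, r in rates.items()}

for sex, r in rates.items():
    gap = r["has_dis"] - r["no_dis"]
    print(f"{sex}: ratio {ratios[sex]:.2f}, gap {gap:.2f} percentage points")
```

The ratio comes out at roughly 2.95 for men and 2.31 for women, which matches the reading that the relative gap is larger for men.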

In [212]:
dis_counts = cps24['dis'].sort_values().value_counts()
print(dis_counts)
male_dis_counts = cps24.query("sex == 'men'")['dis'].sort_values().value_counts()
female_dis_counts = cps24.query("sex == 'women'")['dis'].sort_values().value_counts()
print(male_dis_counts)

dis
no_dis     350050
has_dis     24977
Name: count, dtype: int64
dis
no_dis     173048
has_dis     12303
Name: count, dtype: int64


Okay, let's look at kids now. How do the NEET rates of those with kids differ from those without kids?

In [213]:
by_kids = get_NEET(cps24, 'AGE >= 16 and AGE <= 24', ['sex', 'kids', 'NEET'])
print(by_kids)

kids   has_child  no_child
sex                       
men        15.64     14.90
women      42.99     13.84


There's a small gap between men with kids and men without, but a much larger gap for women. About 43% of women aged 16-24 with kids are NEETs, compared to about 14% of women without any kids.

Well, how about the intersections? What is the NEET rate for men and women who are not disabled and don't have children?

In [222]:
no_kids_no_diff = get_NEET(cps24, 
                           "AGE >= 16 and AGE <= 24 and kids == 'no_child' and dis == 'no_dis' ", 
                           ['sex', 'NEET'])
print(no_kids_no_diff)

sex
men      13.421133
women    12.799778
Name: neet_rate, dtype: float64


Another finding: while the overall NEET rate (for 16-24 year-olds) is higher for women than men, the rate among those with no kids and no disability is slightly higher for men (13.4% versus 12.8%). Again, the two are very close, but the direction flips.

## Up next

We've made some progress on this question. But we've only looked at 2024. The natural next question is:

**'How have NEET rates changed over time in the US, and how has this change (if any) varied across demographic groups?'**