# Unemployment Rate by Age Group in CA
https://data.ca.gov/dataset/unemployment-rate-by-age-groups

This dataset, from the same LGHC survey, reports unemployment rates by age group in CA. I'll look at what the rows and columns contain and check whether they reveal any meaningful insights or patterns.

Based on what I find, I'll decide whether this factor is worth carrying on into data analysis and discussing in my data story!

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import math

In [2]:
# Read in data set:

unemploy_df = pd.read_csv('../data/Raw/adult_unemployment.csv')
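The rows in this file are not stored in chronological order (index 0 is May 2019, index 137 is July 2019 and index 148 is June 2020), which matters for any time-series plot later on. A minimal sketch, assuming the Date column always uses the MM/DD/YYYY format seen in the outputs of this notebook: it parses the dates into real timestamps and sorts by them. The inline CSV is a small excerpt of rows copied from the displayed output, not the real file.

```python
import io

import pandas as pd

# A few rows copied from the notebook output, deliberately out of date order
raw_csv = """Area Type,Area Name,Date,Year,Month,Age 16-19,Age 20-24
California,State,05/01/2019,2019,May,0.2,0.1
California,State,10/01/2018,2018,October,0.2,0.1
California,State,06/01/2020,2020,June,0.2,0.1
California,State,02/01/2010,2010,February,0.3,0.2
"""

df = pd.read_csv(io.StringIO(raw_csv))

# Parse "Date" as MM/DD/YYYY so pandas stores timestamps instead of strings
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Sort oldest-to-newest and renumber the index from 0
df = df.sort_values("Date").reset_index(drop=True)
print(df[["Date", "Age 16-19"]])
```

On the real data the same two steps would apply to unemploy_df right after read_csv.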

In [3]:
unemploy_df

|  | Area Type | Area Name | Date | Year | Month | Age 16-19 | Age 20-24 | Age 25-34 | Age 35-44 | Age 45-54 | Age 55-64 | Age 65+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | California | State | 05/01/2019 | 2019 | May | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | California | State | 04/01/2019 | 2019 | April | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | California | State | 03/01/2019 | 2019 | March | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | California | State | 02/01/2019 | 2019 | February | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | California | State | 01/01/2019 | 2019 | January | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 148 | California | State | 06/01/2020 | 2020 | June | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 149 | California | State | 07/01/2020 | 2020 | July | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 150 | California | State | 08/01/2020 | 2020 | August | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 151 | California | State | 09/01/2020 | 2020 | September | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |


In [4]:
# Look at columns:

unemploy_df.columns

Index(['Area Type', 'Area Name', 'Date', 'Year', 'Month', 'Age 16-19',
       'Age 20-24', 'Age 25-34', 'Age 35-44', 'Age 45-54', 'Age 55-64',
       'Age 65+'],
      dtype='object')

Interesting: there are Area Type and Area Name columns, so let's check whether the data actually covers different areas or just CA as a whole:

In [5]:
unemploy_df.sample(30)

|  | Area Type | Area Name | Date | Year | Month | Age 16-19 | Age 20-24 | Age 25-34 | Age 35-44 | Age 45-54 | Age 55-64 | Age 65+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | California | State | 01/01/2016 | 2016 | January | 0.2 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50 | California | State | 03/01/2015 | 2015 | March | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 61 | California | State | 04/01/2014 | 2014 | April | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 7 | California | State | 10/01/2018 | 2018 | October | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 137 | California | State | 07/01/2019 | 2019 | July | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 151 | California | State | 09/01/2020 | 2020 | September | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 111 | California | State | 02/01/2010 | 2010 | February | 0.3 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 26 | California | State | 03/01/2017 | 2017 | March | 0.2 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 105 | California | State | 08/01/2010 | 2010 | August | 0.4 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 75 | California | State | 02/01/2013 | 2013 | February | 0.3 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |


In [6]:
unemploy_df.nunique()

Area Type      1
Area Name      1
Date         153
Year          13
Month         12
Age 16-19      4
Age 20-24      2
Age 25-34      2
Age 35-44      2
Age 45-54      2
Age 55-64      2
Age 65+        2
dtype: int64

After sampling the data several times and counting the unique values in each column, it seems the data covers the entire state of CA rather than specific areas: Area Type and Area Name each hold only one value.
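The same check can be done programmatically instead of by eye. This sketch uses DataFrame.nunique to list the columns that hold a single value and then drops them; the helper name `constant_columns` and the small demo frame (values copied from the first rows shown above) are hypothetical, for illustration only.

```python
import pandas as pd


def constant_columns(df):
    """Return the names of columns that contain exactly one distinct value."""
    counts = df.nunique(dropna=False)
    return counts[counts == 1].index.tolist()


# Hypothetical demo frame built from rows 0, 1 and 3 of unemploy_df
demo = pd.DataFrame({
    "Area Type": ["California", "California", "California"],
    "Area Name": ["State", "State", "State"],
    "Date": ["05/01/2019", "04/01/2019", "02/01/2019"],
    "Age 16-19": [0.2, 0.2, 0.1],
})

to_drop = constant_columns(demo)
trimmed = demo.drop(columns=to_drop)
print(to_drop)
```

Applied to the full unemploy_df, the nunique output above suggests this would flag only Area Type and Area Name.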

I'm going to rename the columns to shorter names that are easier to work with:

In [7]:
# Renaming columns:

cname_dict = {
    'Year' : 'year',
    'Month' : 'month',
    'Age 16-19' : '16-19',
    'Age 20-24' : '20-24',
    'Age 25-34' : '25-34',
    'Age 35-44' : '35-44',
    'Age 45-54' : '45-54',
    'Age 55-64' : '55-64',
    'Age 65+' : '65+'
}

In [8]:
unemploy_df = unemploy_df.rename(columns=cname_dict)
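Because every age column follows the same "Age <range>" pattern, the mapping in cname_dict could also be generated rather than typed out. A minimal sketch, assuming the column names are exactly those listed by unemploy_df.columns earlier; `build_short_names` is a hypothetical helper:

```python
import pandas as pd


def build_short_names(columns):
    """Map 'Age 16-19' -> '16-19' and 'Year'/'Month' -> lowercase; skip the rest."""
    mapping = {}
    for col in columns:
        if col.startswith("Age "):
            mapping[col] = col[len("Age "):]
        elif col in ("Year", "Month"):
            mapping[col] = col.lower()
    return mapping


# Column names as printed by unemploy_df.columns
columns = ['Area Type', 'Area Name', 'Date', 'Year', 'Month', 'Age 16-19',
           'Age 20-24', 'Age 25-34', 'Age 35-44', 'Age 45-54', 'Age 55-64',
           'Age 65+']

mapping = build_short_names(columns)

# An empty frame with the same columns is enough to check the rename
renamed = pd.DataFrame(columns=columns).rename(columns=mapping)
print(list(renamed.columns))
```

The generated mapping matches the hand-written cname_dict, so either one can be passed to rename.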

In [9]:
unemploy_df

|  | Area Type | Area Name | Date | year | month | 16-19 | 20-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | California | State | 05/01/2019 | 2019 | May | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | California | State | 04/01/2019 | 2019 | April | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | California | State | 03/01/2019 | 2019 | March | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | California | State | 02/01/2019 | 2019 | February | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | California | State | 01/01/2019 | 2019 | January | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 148 | California | State | 06/01/2020 | 2020 | June | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 149 | California | State | 07/01/2020 | 2020 | July | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 150 | California | State | 08/01/2020 | 2020 | August | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 151 | California | State | 09/01/2020 | 2020 | September | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |


In [10]:
# Choosing the columns I want to keep (note: '25-34' is not included here):

cols_to_use = [
    'year',
    'month',
    '16-19',
    '20-24',
    '35-44',
    '45-54',
    '55-64',
    '65+'
]

# Columns were already renamed in In [8], so selecting them is enough
unemploy_df = unemploy_df[cols_to_use]
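Typing the column list by hand makes it easy to leave a group out: '25-34' does not appear in cols_to_use. If the intent is to keep every age group, one option is to select them by pattern with DataFrame.filter(regex=...). This sketch assumes the renamed age-group columns are the only ones whose names start with a digit, which holds after the rename above; the two demo rows are copied from the renamed table.

```python
import pandas as pd

# Two rows copied from the renamed table (May and April 2019)
renamed = pd.DataFrame({
    "Area Type": ["California", "California"],
    "Area Name": ["State", "State"],
    "Date": ["05/01/2019", "04/01/2019"],
    "year": [2019, 2019],
    "month": ["May", "April"],
    "16-19": [0.2, 0.2],
    "20-24": [0.1, 0.1],
    "25-34": [0.0, 0.0],
    "35-44": [0.0, 0.0],
    "45-54": [0.0, 0.0],
    "55-64": [0.0, 0.0],
    "65+": [0.0, 0.0],
})

# Age-group names ('16-19', ..., '65+') are the only ones starting with a digit
age_cols = renamed.filter(regex=r"^\d").columns.tolist()
selected = renamed[["year", "month"] + age_cols]
print(selected.columns.tolist())
```

Selecting by pattern keeps the column list in sync with the data, at the cost of relying on the naming convention.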

In [11]:
unemploy_df

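|  | year | month | 16-19 | 20-24 | 35-44 | 45-54 | 55-64 | 65+ |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | May | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2019 | April | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2019 | March | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2019 | February | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2019 | January | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 148 | 2020 | June | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 149 | 2020 | July | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 150 | 2020 | August | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 151 | 2020 | September | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |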


In [12]:
unemploy_df.sample(20)

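|  | year | month | 16-19 | 20-24 | 35-44 | 45-54 | 55-64 | 65+ |
|---|---|---|---|---|---|---|---|---|
| 78 | 2012 | November | 0.3 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 |
| 140 | 2019 | October | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 58 | 2014 | July | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 62 | 2014 | March | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 37 | 2016 | April | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2019 | January | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 63 | 2014 | February | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 40 | 2016 | January | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 60 | 2014 | May | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 148 | 2020 | June | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |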


In [15]:
# Selecting only the year and the 55-64 age group:

col_to_use = [
    'year',
    '55-64'
]

unemploy_df1 = unemploy_df[col_to_use]

In [16]:
unemploy_df1

|  | year | 55-64 |
|---|---|---|
| 0 | 2019 | 0.0 |
| 1 | 2019 | 0.0 |
| 2 | 2019 | 0.0 |
| 3 | 2019 | 0.0 |
| 4 | 2019 | 0.0 |
| ... | ... | ... |
| 148 | 2020 | 0.1 |
| 149 | 2020 | 0.1 |
| 150 | 2020 | 0.1 |
| 151 | 2020 | 0.1 |
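The head/tail view above only shows a handful of months. Averaging each year's monthly values gives a compact view of how the 55-64 rate moves over time. A minimal sketch using groupby("year").mean() on a small demo frame whose values are copied from rows shown earlier in this notebook (not the full dataset):

```python
import pandas as pd

# (year, 55-64 rate) pairs copied from earlier outputs
demo = pd.DataFrame({
    "year": [2010, 2010, 2014, 2017, 2019, 2019, 2020, 2020],
    "55-64": [0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.1, 0.1],
})

# One value per year: the mean of that year's monthly rates
yearly = demo.groupby("year")["55-64"].mean()
print(yearly)
```

On the full data, the equivalent call would be unemploy_df1.groupby('year')['55-64'].mean().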


### Observations:

* Nothing too meaningful: the unemployment rates show no big spikes and stay fairly static over the years. Most age groups take only two distinct values across the whole period, and since the rates are recorded to one decimal place, any smaller changes would not show up.
* However, I can still mention unemployment briefly in my data story as an example of the socioeconomic factors that may be predictors of mental health issues such as depression.
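To back the "fairly static" observation with numbers, each age group's smallest value, largest value and number of distinct values can be summarized in one call. A minimal sketch on a few rows copied from the outputs above (August 2010, April 2014, May 2019, September 2020); on the real data the same call would be unemploy_df.drop(columns=['year', 'month']).agg(['min', 'max', 'nunique']).

```python
import pandas as pd

# Rows copied from earlier outputs: Aug 2010, Apr 2014, May 2019, Sep 2020
demo = pd.DataFrame({
    "16-19": [0.4, 0.3, 0.2, 0.2],
    "20-24": [0.2, 0.1, 0.1, 0.1],
    "55-64": [0.1, 0.1, 0.0, 0.1],
})

# Smallest value, largest value and number of distinct values per age group
summary = demo.agg(["min", "max", "nunique"])
print(summary)
```

A column with only two distinct values and a 0.1 spread is about as flat as one-decimal data can be while still changing at all.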

With that, I will not be moving on to data analysis for unemployment rates.

Let's move on to the analysis folder now! Thank you!