# Analyzing Education Levels in Adults over 21
I wanted to look at education levels among adults, so below is my quick analysis.

In [None]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as plt_ticker
import seaborn as sns
%matplotlib inline

In [None]:
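# DECADE is read here but overwritten below with an age-decade value.
cols_to_keep = ['AGEP', 'DECADE', 'SEX', 'SCHL']
ss1 = pd.read_csv('../input/ss14pusa.csv', usecols=cols_to_keep)
ss2 = pd.read_csv('../input/ss14pusb.csv', usecols=cols_to_keep)
# ignore_index avoids duplicate row labels after stacking the two files.
ss = pd.concat([ss1, ss2], ignore_index=True)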

# Filtering Down the Data
Since I wanted to look at adult education levels, I filtered out anyone younger than 21.
I also overwrite the DECADE field with a number from 2 to 9 representing people in their 20s, 30s, and so on.

In [None]:
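gte21 = ss.loc[:, 'AGEP'] >= 21
# Copy the filtered rows so the DECADE assignment modifies an independent frame
# instead of a view of ss (avoids SettingWithCopyWarning).
gte21df = ss.loc[gte21].copy()
# Ages 21-29 become 2.0, 30-39 become 3.0, and so on.
gte21df.loc[:, 'DECADE'] = np.floor(gte21df.loc[:, 'AGEP'] / 10)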
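
To make the filtering and decade step concrete, here is a minimal sketch on a tiny made-up table. The `people` and `adults` names and all values are assumptions for illustration only; the column names `AGEP` and `SCHL` match the ones used above.

```python
import pandas as pd

# Toy stand-in for the person records (values are made up for illustration).
people = pd.DataFrame({
    "AGEP": [5, 20, 21, 29, 30, 67, 94],
    "SCHL": [3, 16, 19, 21, 22, 16, 12],
})

# The boolean mask keeps adults aged 21 and over; .copy() makes the result
# an independent frame so a new column can be added safely.
adults = people.loc[people["AGEP"] >= 21].copy()

# Integer division by 10 maps ages 21-29 to 2, 30-39 to 3, and so on.
adults["DECADE"] = adults["AGEP"] // 10

print(adults)
```

Integer division gives whole-number decades directly, whereas `np.floor(age / 10)` yields the same values as floats.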

# Histogram of Education Levels of Adults over 21
Below I started with a plot of everyone.  The table below shows what each number represents.  As you can see, code 16 (regular high school diploma) is the largest category in the data set.  I was interested to see how this changes across age groups.

## Table showing what each number means

 - bb N/A (less than 3 years old) 
 - 01 No schooling completed 
 - 02 Nursery school, preschool 
 - 03 Kindergarten
 - 04 Grade 1
 - 05 Grade 2
 - 06 Grade 3 
 - 07 Grade 4 
 - 08 Grade 5 
 - 09 Grade 6
 - 10 Grade 7 
 - 11 Grade 8 
 - 12 Grade 9
 - 13 Grade 10
 - 14 Grade 11
 - 15 12th grade - no diploma
 - 16 Regular high school diploma 
 - 17 GED or alternative credential 
 - 18 Some college, but less than 1 year 
 - 19 1 or more years of college credit, no degree
 - 20 Associate's degree 
 - 21 Bachelor's degree 
 - 22 Master's degree 
 - 23 Professional degree beyond a bachelor's degree
 - 24 Doctorate degree
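
If you would rather see labels than numeric codes, the table above can be turned into a lookup. This is only a sketch: the `SCHL_LABELS` dictionary and the sample `codes` series are hypothetical names introduced here, and the labels are copied from the table.

```python
import pandas as pd

# SCHL codes and labels copied from the table above.
SCHL_LABELS = {
    1: "No schooling completed",
    2: "Nursery school, preschool",
    3: "Kindergarten",
    4: "Grade 1",
    5: "Grade 2",
    6: "Grade 3",
    7: "Grade 4",
    8: "Grade 5",
    9: "Grade 6",
    10: "Grade 7",
    11: "Grade 8",
    12: "Grade 9",
    13: "Grade 10",
    14: "Grade 11",
    15: "12th grade - no diploma",
    16: "Regular high school diploma",
    17: "GED or alternative credential",
    18: "Some college, but less than 1 year",
    19: "1 or more years of college credit, no degree",
    20: "Associate's degree",
    21: "Bachelor's degree",
    22: "Master's degree",
    23: "Professional degree beyond a bachelor's degree",
    24: "Doctorate degree",
}

# Made-up sample of SCHL values; the real column may be read as floats,
# so cast to int before looking up the label.
codes = pd.Series([16.0, 21.0, 19.0, 16.0, 24.0])
labels = codes.astype(int).map(SCHL_LABELS)

print(labels.value_counts())
```

The same `map` call could be applied to the `SCHL` column before plotting so that the axis shows category names.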

In [None]:
#plt.hist(gte21df['SCHL'], bins=24)
plt.figure(figsize=(8, 6))
g = sns.countplot(x="SCHL", data=gte21df)
g.set_xticklabels(g.get_xticklabels(), rotation=45)

In [None]:
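def createDecadeSubPlots(nrows, chartFilters):
    # One histogram per decade, stacked vertically
    fig, axes = plt.subplots(nrows, 1, figsize=(7,15))
    chartFilters_list = chartFilters.tolist()
    for row in axes:
        # pop() takes the last value, so the oldest decade is drawn first
        d1 = chartFilters_list.pop()
        rowFilter = gte21df.loc[:,'DECADE']==d1
        x = gte21df.loc[rowFilter,'SCHL']
        title1 = "Decade %0d" % (d1)
        plot(row, x, title1)
    plt.tight_layout()
    plt.show()

def plot(axrow, x, title1):
    # A color must be a flat RGB triple; rand(3,1) gives a 3x1 array,
    # which current matplotlib rejects as an invalid color
    axrow.hist(x, color=tuple(np.random.rand(3)))
    axrow.set_title(title1)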

# Histogram of Different Generations and Education Levels
I wanted to see whether education levels of adults over 21 have changed across generations, and as the charts below show, they have.  Each chart is labeled Decade 9, Decade 8, and so on, grouping together everyone in their 90s, 80s, etc.
The distributions start to shift with people in their 60s, and for people in their 40s or younger the scales tip toward more college graduates than people whose highest level is a high school diploma.

In [None]:
chartFilters = np.arange(gte21df.loc[:,'DECADE'].min(),gte21df.loc[:,'DECADE'].max()+1,1)
# Derive the number of subplots from the decades present instead of hard-coding 8.
createDecadeSubPlots(len(chartFilters), chartFilters)
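
The shift between high school and college that the histograms suggest can also be checked numerically. The sketch below uses a made-up `toy` frame, not the real data, and the `hs_only` and `college_grad` column names are assumptions introduced here. The code ranges come from the SCHL table: 16 and 17 are a high school diploma or GED, and 21 and above are a bachelor's degree or higher.

```python
import pandas as pd

# Made-up records for illustration only; not drawn from the survey files.
toy = pd.DataFrame({
    "DECADE": [2, 2, 2, 2, 6, 6, 6, 6],
    "SCHL":   [21, 22, 16, 19, 16, 16, 21, 12],
})

# Flag each person by highest level reached, using codes from the SCHL table.
toy["hs_only"] = toy["SCHL"].isin([16, 17])
toy["college_grad"] = toy["SCHL"] >= 21

# The mean of a boolean column is the fraction of rows where it is True,
# so this gives the share of each group within every decade.
shares = toy.groupby("DECADE")[["hs_only", "college_grad"]].mean()

print(shares)
```

Running the same `groupby` on `gte21df` should show where, if anywhere, the college share overtakes the high school share.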

In [None]:
def plotGender(decade, col='SCHL', binSize=10):
    # SEX is coded 1 for male and 2 for female
    male = gte21df.loc[(gte21df.loc[:,'DECADE']==decade) & (gte21df.loc[:,'SEX']==1.0)]
    female = gte21df.loc[(gte21df.loc[:,'DECADE']==decade) & (gte21df.loc[:,'SEX']==2.0)]
    fig = plt.figure(figsize=(7,12))
    fig.suptitle('Gender by Decade Hist of Schooling')
    ax1 = fig.add_subplot(2, 1, 1)
    ax2 = fig.add_subplot(2, 1, 2)
    
    ax1.hist(male.loc[:,col], color='blue', bins=binSize)
    ax2.hist(female.loc[:,col], color='green', bins=binSize)
    
    ax1.set_title('Males')
    ax2.set_title('Females')
    
    plt.show()

# Gender Differences
I wanted to take the analysis one level deeper: by decade and gender.  I started with people in their 20s in the chart below.  I added an interactive widget, but I am not sure it will come through on Kaggle.  If not, download the files and you can toggle the decade, the number of bins in the histogram, and the column being plotted.
What was interesting here is that people in their 20s show a different distribution for males and females.  It appears that, in this data set, more females than males have college degrees and fewer have only a high school diploma.

In [None]:
interact(plotGender, decade=widgets.FloatSlider(min=2.0,max=9.0,step=1.0,value=2.0));
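
The male and female histograms have different totals, so comparing proportions can be easier than comparing bar heights. Here is a minimal sketch using `pd.crosstab` on made-up rows; the `toy` frame and its values are assumptions for illustration, with `SEX` coded 1 for male and 2 for female as in `plotGender`.

```python
import pandas as pd

# Made-up rows for a single decade; not drawn from the survey files.
toy = pd.DataFrame({
    "SEX":  [1, 1, 1, 1, 2, 2, 2, 2],
    "SCHL": [16, 16, 19, 21, 16, 21, 21, 22],
})

# Replace the numeric codes with readable row labels.
toy["SEX"] = toy["SEX"].map({1: "Male", 2: "Female"})

# normalize="index" converts each row of counts into proportions that sum to 1,
# so the two groups are comparable even when their sizes differ.
table = pd.crosstab(toy["SEX"], toy["SCHL"], normalize="index")

print(table.round(2))
```

On the real data, the rows would first be filtered to one decade, for example `gte21df[gte21df['DECADE'] == 2]`.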