# Interactive Visualization of Changes in the Percentage of Female College Majors with Altair

- toc: true
- categories: [project]
- image: images/female_college_major.png

In this project, I build a few simple interactive graphs that show how the proportion of female college majors changed from year to year in different disciplines. The data comes from [The Department of Education Statistics](http://nces.ed.gov/programs/digest/2013menu_tables.asp). It contains the proportion of female majors in each of 17 disciplines for every year from 1970 to 2011. To make the analysis easier to follow, I split the disciplines into three groups (STEM, Liberal Arts, and Others) and plot the disciplines in each group.

I use Python for this project. The visualization tool is Altair, a declarative statistical visualization library for Python. Altair's source code is on [GitHub](http://github.com/altair-viz/altair).

This project consists of three parts:
* a preview of the data to get a brief understanding of it
* preprocessing to make the data easier to visualize
* interactive graphs for exploring the data

# A preview of the data

In [1]:
import pandas as pd

women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
women_degrees.iloc[:, :8].head()

| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science |
|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 |
| 1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 |
| 2 | 1972 | 7.42071 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 |
| 3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 |
| 4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.20485 | 40.5 | 18.9 |


In [2]:
women_degrees.columns

Index(['Year', 'Agriculture', 'Architecture', 'Art and Performance', 'Biology',
       'Business', 'Communications and Journalism', 'Computer Science',
       'Education', 'Engineering', 'English', 'Foreign Languages',
       'Health Professions', 'Math and Statistics', 'Physical Sciences',
       'Psychology', 'Public Administration', 'Social Sciences and History'],
      dtype='object')

# Preprocessing

## Tidying the data

In [3]:
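# Reshape from wide (one column per discipline) to long
# (one row per year and discipline), which is the layout Altair expects.
tidy_data = women_degrees.melt(id_vars="Year", value_vars=women_degrees.columns[1:],
                               var_name="Discipline", value_name="Female proportion")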
tidy_data.head(10)

| | Year | Discipline | Female proportion |
|---|---|---|---|
| 0 | 1970 | Agriculture | 4.229798 |
| 1 | 1971 | Agriculture | 5.452797 |
| 2 | 1972 | Agriculture | 7.42071 |
| 3 | 1973 | Agriculture | 9.653602 |
| 4 | 1974 | Agriculture | 14.074623 |
| 5 | 1975 | Agriculture | 18.333162 |
| 6 | 1976 | Agriculture | 22.25276 |
| 7 | 1977 | Agriculture | 24.640177 |
| 8 | 1978 | Agriculture | 27.146192 |
| 9 | 1979 | Agriculture | 29.633365 |
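
To make the reshaping step concrete, here is a minimal sketch of `melt` on a tiny frame. The frame `wide_df` and its rounded values are made up for illustration (two years and two disciplines, in the same wide layout as the CSV); they are not the real dataset.

```python
import pandas as pd

# Hypothetical sample in the same wide layout as the CSV:
# one row per year, one column per discipline (values rounded for illustration).
wide_df = pd.DataFrame({
    "Year": [1970, 1971],
    "Agriculture": [4.2, 5.5],
    "Biology": [29.1, 29.4],
})

# Every column other than "Year" is stacked into (Discipline, Female proportion) pairs.
long_df = wide_df.melt(id_vars="Year",
                       var_name="Discipline",
                       value_name="Female proportion")

print(long_df)
```

Each of the two discipline columns contributes one row per year, so the two-row wide frame becomes a four-row long frame with the columns `Year`, `Discipline`, and `Female proportion`.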


## Dividing disciplines into groups

In [4]:
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 
             'Physical Sciences', 'Computer Science', 
             'Engineering']

lib_arts_cats = ['Foreign Languages', 'English', 
                 'Communications and Journalism', 
                 'Art and Performance', 
                 'Social Sciences and History']

other_cats = ['Health Professions', 'Public Administration', 
              'Education', 'Agriculture',
              'Business', 'Architecture']

def which_type(subject):
    """Return the group name for a discipline; unlisted names fall into "Others"."""
    if subject in stem_cats:
        return "STEM"
    elif subject in lib_arts_cats:
        return "Liberal Arts"
    else:
        return "Others"

tidy_data["Type"] = tidy_data["Discipline"].apply(which_type)

tidy_data.head()

| | Year | Discipline | Female proportion | Type |
|---|---|---|---|---|
| 0 | 1970 | Agriculture | 4.229798 | Others |
| 1 | 1971 | Agriculture | 5.452797 | Others |
| 2 | 1972 | Agriculture | 7.42071 | Others |
| 3 | 1973 | Agriculture | 9.653602 | Others |
| 4 | 1974 | Agriculture | 14.074623 | Others |
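
One possible variation on the grouping step is a lookup dictionary instead of an `if`/`elif` chain. The sketch below is only an alternative under stated assumptions, not the approach used in this project: the names `groups`, `discipline_to_type`, and `unassigned` are hypothetical, and the three lists are repeated so the block runs on its own.

```python
stem_cats = ['Psychology', 'Biology', 'Math and Statistics',
             'Physical Sciences', 'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English',
                 'Communications and Journalism',
                 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration',
              'Education', 'Agriculture', 'Business', 'Architecture']

# Hypothetical names: map each group label to its list of disciplines ...
groups = {"STEM": stem_cats, "Liberal Arts": lib_arts_cats, "Others": other_cats}

# ... then invert that into a discipline -> group lookup.
discipline_to_type = {discipline: group
                      for group, disciplines in groups.items()
                      for discipline in disciplines}


def unassigned(disciplines):
    """Return the discipline names that appear in none of the three lists."""
    return sorted(set(disciplines) - set(discipline_to_type))


# With the tidy frame from above, the lookup could be applied as:
#   tidy_data["Type"] = tidy_data["Discipline"].map(discipline_to_type)
print(len(discipline_to_type))
print(unassigned(["Biology", "Nursing"]))
```

The design difference is that `Series.map` leaves a missing value for any name not in the dictionary, whereas `which_type` silently puts it in "Others". That makes a misspelled column name easier to spot.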


# Plotting

## Trends of groups

In [5]:
#collapse-hide
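import altair as alt

# Clicking a legend entry selects that group (shift-click adds more groups).
selection = alt.selection_multi(encodings=["color"], bind='legend')

# One point per group and year: the median over the disciplines in the group.
point = alt.Chart(data=tidy_data).mark_point(filled=True).encode(
    x="Year:N",
    y="median(Female proportion)",
    color="Type",
    tooltip=["Type", "Year", "median(Female proportion)"]
).properties(
    width=700,
    height=300
)

# Reuse the same encodings for a line layer.
line = point.mark_line()

# Fade the groups that are not selected in the legend.
(point + line).encode(
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
).add_selection(
    selection
)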
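
The `median(Female proportion)` shorthand asks the chart to aggregate for us: as far as I understand it, the median is taken within each combination of the other encoded fields, here `Type` and `Year`. A rough pandas equivalent is sketched below on a made-up sample; the frame `sample` and its values are hypothetical and only illustrate the grouping.

```python
import pandas as pd

# Hypothetical tidy sample: three STEM disciplines and one "Others" discipline in 1970.
sample = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1970],
    "Discipline": ["Biology", "Engineering", "Computer Science", "Agriculture"],
    "Female proportion": [30.0, 10.0, 20.0, 5.0],
    "Type": ["STEM", "STEM", "STEM", "Others"],
})

# One median per (Type, Year) pair, which is what each point in the chart shows.
medians = (sample
           .groupby(["Type", "Year"], as_index=False)["Female proportion"]
           .median())

print(medians)
```

Computing the medians in pandas first and plotting the result would give the same points; letting Altair aggregate simply keeps the plotting cell shorter.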

## STEM

In [20]:
#collapse-hide

stem = tidy_data[tidy_data.Type == "STEM"]

selection = alt.selection_multi(encodings=["color"], bind='legend')

point = alt.Chart(data=stem).mark_point(filled=True).encode(
    alt.X("Year", type="nominal"),
    alt.Y("Female proportion", title="Female Proportion (%)"),
    color="Discipline",
    tooltip=["Discipline", "Year", "Female proportion"],
).properties(
    width=800,
    height=500,
    title="Female proportions in STEM majors"
)

line = point.mark_line()
chart = point + line

chart.encode(
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
).add_selection(
    selection
)

## Liberal Arts

In [17]:
#collapse-hide

liberal = tidy_data[tidy_data.Type == "Liberal Arts"]

selection = alt.selection_multi(encodings=["color"], bind='legend')

point = alt.Chart(data=liberal).mark_point(filled=True).encode(
    alt.X("Year", type="nominal"),
    alt.Y("Female proportion", title="Female Proportion (%)"),
    color="Discipline",
    tooltip=["Discipline", "Year", "Female proportion"],
).properties(
    width=800,
    height=500,
    title="Female proportions in Liberal Arts majors"
)

line = point.mark_line()
chart = point + line

chart.encode(
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
).add_selection(
    selection
)

## Others

In [18]:
#collapse-hide

other = tidy_data[tidy_data.Type == "Others"]

selection = alt.selection_multi(encodings=["color"], bind='legend')

point = alt.Chart(data=other).mark_point(filled=True).encode(
    alt.X("Year", type="nominal"),
    alt.Y("Female proportion", title="Female Proportion (%)"),
    color="Discipline",
    tooltip=["Discipline", "Year", "Female proportion"],
).properties(
    width=800,
    height=500,
    title="Female proportions in other majors"
)

line = point.mark_line()
chart = point + line

chart.encode(
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
).add_selection(
    selection
)

# Summary