# Gender diversity at the FNWI
---

In [None]:
# Environment setup
import matplotlib.pyplot as plt
from matplotlib import rc, colors
import numpy as np
import pandas as pd
import sys
import os

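# Matplotlib config: the 'seaborn-*' styles were renamed to 'seaborn-v0_8-*'
# in Matplotlib 3.6, so use whichever name this installation provides
plt.style.use('seaborn-v0_8-muted' if 'seaborn-v0_8-muted' in plt.style.available else 'seaborn-muted')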

In [None]:
# Import data sources
COL_NAMES = ['TYPE HOGER ONDERWIJS', 
             'INSTELLINGSNAAM ACTUEEL', 
             'CROHO ONDERDEEL', 
             'OPLEIDINGSNAAM ACTUEEL', 
             'OPLEIDINGSVORM', 
             'GESLACHT', 
             '2015', 
             '2016', 
             '2017', 
             '2018', 
             '2019']
enrollments = pd.read_csv('data/inschrijvingen-wo-2019.csv', sep=';', header=0, usecols=COL_NAMES)

In [None]:
# Keep only the UvA FNWI rows (institution UvA, CROHO sector 'natuur')
is_uva = enrollments['INSTELLINGSNAAM ACTUEEL'] == 'Universiteit van Amsterdam'
is_nature = enrollments['CROHO ONDERDEEL'] == 'natuur'
is_bachelor = enrollments['TYPE HOGER ONDERWIJS'] == 'bachelor'
is_master = enrollments['TYPE HOGER ONDERWIJS'] == 'master'

uva_fnwi_bachelors = enrollments[is_uva & is_nature & is_bachelor]
uva_fnwi_masters = enrollments[is_uva & is_nature & is_master]

In [None]:
# Preview of what our data looks like
uva_fnwi_bachelors

## Bachelor Programs

Note: In the data for both Physics and Chemistry, there are duplicate rows: one with '(joint degree)' in the program name and one without. Since the numbers in these rows differ, we assume that they count complementary groups of students and can be added together. However, we should still verify that this is really the case.
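
To illustrate what the merge step does, here is a minimal sketch on a made-up frame. The counts are invented, and the gender labels 'man'/'vrouw' are an assumption about how the DUO file encodes `GESLACHT`. Renaming the joint-degree label and then grouping by program and gender adds the two rows together. Note that this only sums the counts; it cannot by itself confirm that the two rows describe disjoint groups of students.

```python
import pandas as pd

# Toy data in the same layout as the DUO file; the numbers are made up
# purely to illustrate the merge and are not real enrollment figures.
# The 'man'/'vrouw' gender labels are an assumption.
toy = pd.DataFrame({
    'OPLEIDINGSNAAM ACTUEEL': ['B Scheikunde', 'B Scheikunde (joint degree)',
                               'B Scheikunde', 'B Scheikunde (joint degree)'],
    'GESLACHT': ['man', 'man', 'vrouw', 'vrouw'],
    '2019': [40, 10, 30, 5],
})

# Map the joint-degree label onto the regular program name...
merged = toy.replace({'B Scheikunde (joint degree)': 'B Scheikunde'})

# ...so that grouping by program and gender adds the two rows together.
totals = merged.groupby(['OPLEIDINGSNAAM ACTUEEL', 'GESLACHT'])['2019'].sum()
print(totals)
```
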

In [None]:
# Merge the joint-degree rows into their regular program names
uva_fnwi_bachelors = uva_fnwi_bachelors.replace({
    'B Scheikunde (joint degree)': 'B Scheikunde',
    'B Natuur- en Sterrenkunde (joint degree)': 'B Natuur- en Sterrenkunde',
})

# Sum only the yearly enrollment columns; the other columns hold text, not counts
YEARS = ['2015', '2016', '2017', '2018', '2019']
grouped_data = uva_fnwi_bachelors.groupby(['OPLEIDINGSNAAM ACTUEEL', 'GESLACHT'])[YEARS].sum()

### Enrollments in the last 5 years

The table below shows, for each program and gender, the number of enrollments per year from 2015 to 2019.

TODO: find a way to visualize this that is organised but complete.

In [None]:
grouped_data
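
One possible way to approach the TODO above is to condense the table into a single number per program and year: the percentage of female enrollments. The sketch below does this on a toy stand-in for `grouped_data`; the numbers are invented, and 'vrouw' is assumed to be the female label in `GESLACHT`.

```python
import pandas as pd

# Toy stand-in for `grouped_data`: index (program, gender), one column per year.
# All counts are invented; the 'man'/'vrouw' labels are an assumption.
index = pd.MultiIndex.from_tuples(
    [('B Informatica', 'man'), ('B Informatica', 'vrouw'),
     ('B Biologie', 'man'), ('B Biologie', 'vrouw')],
    names=['OPLEIDINGSNAAM ACTUEEL', 'GESLACHT'])
toy_grouped = pd.DataFrame({'2018': [80, 20, 40, 60],
                            '2019': [75, 25, 30, 70]}, index=index)

# Total enrollments per program and year (all genders combined).
totals = toy_grouped.groupby(level='OPLEIDINGSNAAM ACTUEEL').sum()

# Female enrollments per program and year (gender level dropped).
female = toy_grouped.xs('vrouw', level='GESLACHT')

# Percentage female; pandas aligns the two frames on program name.
female_share = (100 * female / totals).round(1)
print(female_share)
```

On the real data, substituting `grouped_data` for `toy_grouped` should give one row per bachelor program, and `female_share.T.plot(marker='o')` would then draw one line per program across the years.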

### Gender diversity in 2019

In [None]:
data_2019 = uva_fnwi_bachelors.groupby(['OPLEIDINGSNAAM ACTUEEL', 'GESLACHT'])['2019'].sum()
data_2019.to_frame()

In [None]:
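# unstack() orders the columns by sorted GESLACHT value; the legend labels
# below assume this puts the male category first and the female one second
ax = data_2019.unstack().plot(kind='bar', figsize=(15, 10))
ax.legend(['male', 'female'], title='Gender')
ax.set_xlabel('Program')
ax.set_ylabel('Number of enrollments (2019)')
plt.show()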
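
In the bar chart of absolute counts, large programs dominate and the gender split of small programs is hard to read. A hedged alternative is to normalize each program to 100% and stack the genders, so the split is directly comparable between programs. The sketch uses a toy `Series` shaped like `data_2019`; the counts are invented and the 'man'/'vrouw' labels are assumed.

```python
import pandas as pd

# Toy stand-in for `data_2019`: a Series indexed by (program, gender).
# Counts are invented; the 'man'/'vrouw' labels are an assumption.
index = pd.MultiIndex.from_tuples(
    [('B Informatica', 'man'), ('B Informatica', 'vrouw'),
     ('B Biologie', 'man'), ('B Biologie', 'vrouw')],
    names=['OPLEIDINGSNAAM ACTUEEL', 'GESLACHT'])
toy_2019 = pd.Series([150, 50, 60, 140], index=index, name='2019')

# One row per program, one column per gender.
counts = toy_2019.unstack('GESLACHT')

# Divide each row by its total so every bar adds up to 100%.
shares = counts.div(counts.sum(axis=1), axis=0) * 100

# Stacked bars of percentages; in the notebook, follow with plt.show().
ax = shares.plot(kind='bar', stacked=True, figsize=(15, 10))
ax.set_xlabel('Program')
ax.set_ylabel('Share of enrollments (%)')
ax.legend(title='Gender')
```

With the real data, `counts = data_2019.unstack()` takes the place of the toy frame.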