## Key numbers on acceptance to DTU
Students accepted in software technology, undergraduate.

Henriette Steenhoff, s134869

----
![DTU logo](http://dtu-studiedatavarehus.ait.dtu.dk/images/IndexDeko.jpg)

All work here is based on data from the [DTU Study Data Warehouse](http://dtu-studiedatavarehus.ait.dtu.dk/Default.aspx) -- sadly this site is only available in Danish.

The fun starts in the [Plotting](#Plotting) section -- fast forward to that point if you are less interested in the nitty-gritty details.

----

## Prerequisites

In [125]:
# IMPORTS
import re
import urllib2
import json
import numpy as np
import pandas as pd

# Plotting tools
import plotly 
from IPython.display import Image 
import plotly.plotly as py
import plotly.graph_objs as go
# API access to plotting tools
plotly.tools.set_credentials_file(username='frksteenhoff2', api_key ='duu8hsfRmuI5rF2EU8o5')
import matplotlib.pyplot as plt
%matplotlib inline

# QUERY FOR HISTORIC DATA ON ACCEPTANCE, SOFTWARE TECHNOLOGY
request_url = 'http://dtu-studiedatavarehus.ait.dtu.dk/vis_noegletal_optag.aspx?aar=0&sem=e&udd=1&ret=12&kon=Alle&alder=0&nt=0&vd=0&land=0&region=0&kv=0&eks=0'

response = urllib2.urlopen(request_url)
#temp_data = json.load(response)

In [127]:
# Fetching data directly from the webpage works poorly -- the response is a raw stream of page content, not structured data (e.g. JSON)
# The information is in there and could be extracted, but it would take one hell of a regular expression.
response.read(100)

'nQCA2QMFCsAGRYIHgROYW1lBQpVRERBTk5FTFNFHgpJc1JlYWRPbmx5aB4EVHlwZRkrAh4JRGF0YUZpZWxkBQpVRERBTk5FTFNFF'
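If the numbers turn out to sit in an ordinary HTML table, a small parser is usually less painful than a regular expression. The sketch below is only an illustration under that assumption: the class `TableCells`, the markup in `sample_html` and its column headers are hypothetical, and the real page layout has not been checked. It is written for Python 3 and uses only the standard library.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup (an assumption): the actual page structure may differ
sample_html = """
<table>
  <tr><th>Year</th><th>Accepted</th></tr>
  <tr><td>2004</td><td>56</td></tr>
  <tr><td>2005</td><td>66</td></tr>
</table>
"""

parser = TableCells()
parser.feed(sample_html)
rows = parser.rows
```

In a real run, `sample_html` would be replaced by the decoded response body, and the rows would still need to be filtered down to the table of interest.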

----
## Key numbers
Since fetching the content directly from the website works poorly (I had hoped for an API or open data solution, but nope), I entered the numbers of accepted students by hand, reading them off the page returned by this query:

[`http://dtu-studiedatavarehus.ait.dtu.dk/vis_noegletal_optag.aspx?aar=0&sem=e&udd=1&ret=12&kon=Alle&alder=0&nt=0&vd=0&land=0&region=0&kv=0&eks=0`](http://dtu-studiedatavarehus.ait.dtu.dk/vis_noegletal_optag.aspx?aar=0&sem=e&udd=1&ret=12&kon=Alle&alder=0&nt=0&vd=0&land=0&region=0&kv=0&eks=0)

This query returns the number of students accepted in each year (2004-2017) to the BSc Eng programme in Computer Science at DTU.

For the graduated students, I used this query:

[`http://dtu-studiedatavarehus.ait.dtu.dk/vis_noegletal_faerdige.aspx?aar=0&ret=67&udd=1&kon=Alle&alder=0&nt=0&vd=0&land=0&region=&kv=0&eks=&stud=0`](http://dtu-studiedatavarehus.ait.dtu.dk/vis_noegletal_faerdige.aspx?aar=0&ret=67&udd=1&kon=Alle&alder=0&nt=0&vd=0&land=0&region=&kv=0&eks=&stud=0)

This query returns the number of students who graduated in each year (2004-2017) from the BSc Eng programme in Computer Science at DTU.


### Accepted

In [106]:
# Percentage male/female accepted on undergrad Software technology at DTU

YEAR         = range(2004,2018)

# Percentage accepted female/male
DTU_CS_P_F   = np.asarray([5, 2, 2, 0,  10,5, 9, 7, 7, 8, 12,7, 10,19])
DTU_CS_P_M   = np.asarray([95,98,98,100,90,95,91,93,93,92,88,93,90,81])

# All accepted, total
CS_YEARLY_ACCEPTED = np.asarray([56,66,62,63,62,55,54,56,55,61,68,68,63,80])

# Number of people accepted (calculated from the above)
no_females = np.round(DTU_CS_P_F * CS_YEARLY_ACCEPTED / 100.0)
no_males   = np.round(DTU_CS_P_M * CS_YEARLY_ACCEPTED / 100.0)
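As a quick sanity check on the conversion from percentages to head counts, the rounded numbers of women and men should add up to the yearly totals. The sketch below repeats the arrays from the cell above so that it runs on its own; the short names `pct_f`, `pct_m`, `total`, `n_f` and `n_m` are mine and only used for illustration.

```python
import numpy as np

# Same numbers as in the cell above, repeated so the snippet is self-contained
pct_f = np.asarray([5, 2, 2, 0, 10, 5, 9, 7, 7, 8, 12, 7, 10, 19])
pct_m = np.asarray([95, 98, 98, 100, 90, 95, 91, 93, 93, 92, 88, 93, 90, 81])
total = np.asarray([56, 66, 62, 63, 62, 55, 54, 56, 55, 61, 68, 68, 63, 80])

# Percentage -> head count, rounded to whole students
n_f = np.round(pct_f * total / 100.0)
n_m = np.round(pct_m * total / 100.0)

# The percentages should sum to 100 and the head counts to the yearly total
assert np.all(pct_f + pct_m == 100)
assert np.all(n_f + n_m == total)
```

Because the source only publishes whole percentages, the head counts are estimates; the check merely confirms that the rounding does not lose or invent students in any year.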

### Graduated

In [128]:
GRAD_CS_F = np.asarray([0,0,0,0,5, 0, 0, 2, 1, 0, 2, 0, 2, 8])
GRAD_CS_M = np.asarray([0,0,0,9,40,39,42,33,39,41,42,47,48,55])

## Functions

To make the plotting a little sleeker (and to avoid repeating code), I wrapped it up in two functions: one for grouped bar charts and one for plain bar charts.

In [130]:
# Plotting bar chart with two groups
def plot2bar(bar1, name1, bar2, name2, x_in, title, xaxis, yaxis):
    trace1 = go.Bar(
        x    = x_in,
        y    = bar1,
        name = name1
    )

    trace2 = go.Bar(
        x    = x_in,
        y    = bar2,
        name = name2
    )
 
    data = [trace1,trace2]
    layout= go.Layout(
        barmode='group',
        title=title,
        xaxis=dict(
            title=xaxis
        ),
        yaxis=dict(
            title=yaxis
        )
    )

    fig = go.Figure(data=data, layout=layout)
    return py.iplot(fig, sharing='public', filename='grouped-bar')

In [131]:
# Plotting bar chart - no grouping
def plotbar(bar, name, x_in, title, xaxis, yaxis):
    trace = go.Bar(
        x    = x_in,
        y    = bar,
        name = name
    )
 
    data = [trace]
    layout= go.Layout(
        title=title,
        xaxis=dict(
            title=xaxis
        ),
        yaxis=dict(
            title=yaxis
        )
    )

    fig = go.Figure(data=data, layout=layout)
    # Separate filename so this chart does not overwrite the grouped one
    return py.iplot(fig, sharing='public', filename='single-bar')
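The two functions above upload each figure to Plotly's hosted service, which requires the credentials set in the first cell. For readers who only want to reproduce the charts locally, here is a rough matplotlib counterpart to `plot2bar`. The name `plot2bar_offline` and the `width` parameter are my own and not part of the notebook; the example call reuses the graduation numbers from the Graduated section.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot2bar_offline(bar1, name1, bar2, name2, x_in, title, xaxis, yaxis, width=0.4):
    """Grouped bar chart with two series, drawn locally with matplotlib."""
    x = np.arange(len(x_in))  # one slot per category on the x axis
    fig, ax = plt.subplots(figsize=(10, 4))
    # Shift the two series half a bar width to either side of each slot
    ax.bar(x - width / 2, bar1, width, label=name1)
    ax.bar(x + width / 2, bar2, width, label=name2)
    ax.set_xticks(x)
    ax.set_xticklabels([str(v) for v in x_in])
    ax.set_title(title)
    ax.set_xlabel(xaxis)
    ax.set_ylabel(yaxis)
    ax.legend()
    return ax


# Example call with the graduation numbers used in this notebook
years = list(range(2004, 2018))
grad_f = [0, 0, 0, 0, 5, 0, 0, 2, 1, 0, 2, 0, 2, 8]
grad_m = [0, 0, 0, 9, 40, 39, 42, 33, 39, 41, 42, 47, 48, 55]
ax = plot2bar_offline(grad_f, "Female", grad_m, "Male", years,
                      "Graduated, Software Technology", "Year", "Frequency")
```

With `%matplotlib inline` active, the figure shows up directly below the cell; no account or network access is involved.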

## Plotting ##
This is where the fun starts!

### Acceptance

In [132]:
plot2bar(DTU_CS_P_F, "Female", 
         DTU_CS_P_M, "Male", 
         YEAR, 
         "Accepted software technology, percentage", 
         "Year", "Percentage [%]")

In [134]:
plot2bar(no_females, "Female", 
         no_males, "Male", 
         YEAR, 
         "Accepted software technology, numbers", 
         "Year", "Frequency")

----
#### Female students only

In [135]:
plotbar(no_females, "Female", YEAR, "Females accepted", "Year", "Frequency")

So there has definitely been an increase in the number of women accepted to the CS undergraduate programme at DTU, but the only substantial jump is in 2017, when almost 20% of the accepted students were women.

### Graduated

In [136]:
plot2bar(GRAD_CS_F, "Female", GRAD_CS_M, "Male", YEAR, "Graduated, Software Technology", "Year", "Frequency")

At DTU, a bachelor's degree nominally takes 3 years. Ideally, then, the number of students accepted in year `x` would match the number graduating in year `x+3`, which is also why there are no graduates in 2004-2006. In practice there is quite a large number of drop-outs, and the number of women who actually graduate is very low. Note also that quite a few students spend an extra semester completing their bachelor, so the graduates of one intake do not all land in a single year.
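To make the `x` versus `x+3` comparison concrete, the sketch below lines up each intake year with the graduates three years later. It repeats the arrays from the earlier cells so that it runs on its own. The names `cohort_years`, `grads_total` and `ratio` are mine, and the ratio is only a rough indicator, since students who take longer than three years are counted in a later year.

```python
import numpy as np

years = np.arange(2004, 2018)
accepted = np.asarray([56, 66, 62, 63, 62, 55, 54, 56, 55, 61, 68, 68, 63, 80])
grad_f = np.asarray([0, 0, 0, 0, 5, 0, 0, 2, 1, 0, 2, 0, 2, 8])
grad_m = np.asarray([0, 0, 0, 9, 40, 39, 42, 33, 39, 41, 42, 47, 48, 55])

grads_total = grad_f + grad_m
shift = 3  # nominal length of the bachelor in years

# Intake years 2004-2014 paired with graduation years 2007-2017
cohort_years = years[:-shift]
ratio = grads_total[shift:] / accepted[:-shift].astype(float)

for y, r in zip(cohort_years, ratio):
    print("accepted %d -> graduated %d: %.0f%%" % (y, y + shift, 100 * r))
```

The first pairing is visibly low, which may simply reflect how the early graduation years were recorded rather than an unusually weak intake.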

**From 2004 to 2017, only 20 women graduated with a degree in Computer Science, which averages out to about 1.4 women per year.**

**Of the women accepted before 2016 (about 45, going by the rounded head counts computed earlier), fewer than half had graduated by 2017.**
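The totals behind these two statements can be recomputed from the arrays in this notebook. The sketch below repeats the relevant numbers so that it runs on its own. The head counts of accepted women are the rounded estimates derived from whole percentages, so the share is approximate; all variable names are mine.

```python
import numpy as np

years = np.arange(2004, 2018)
pct_f = np.asarray([5, 2, 2, 0, 10, 5, 9, 7, 7, 8, 12, 7, 10, 19])
total = np.asarray([56, 66, 62, 63, 62, 55, 54, 56, 55, 61, 68, 68, 63, 80])
grad_f = np.asarray([0, 0, 0, 0, 5, 0, 0, 2, 1, 0, 2, 0, 2, 8])

# Estimated number of women accepted per year (rounded from percentages)
accepted_f = np.round(pct_f * total / 100.0)

women_graduated = grad_f.sum()
per_year = women_graduated / float(len(years))

# Women accepted in 2004-2015 versus all women graduated by 2017
accepted_before_2016 = accepted_f[years < 2016].sum()
share = women_graduated / accepted_before_2016

print("women graduated 2004-2017: %d" % women_graduated)
print("average per year: %.2f" % per_year)
print("women accepted before 2016: %d" % accepted_before_2016)
print("share graduated: %.0f%%" % (100 * share))
```

Keep in mind that women accepted in 2015 could not have finished a three-year programme by 2017, so restricting the comparison to intakes up to 2014 gives a somewhat higher share.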