# Time taken to complete degree
Obtain all the data for the Bachelor students, starting from 2007. Keep only the students for whom you have an entry for both Bachelor semestre 1 and Bachelor semestre 6. Compute how many months it took each student to go from the first to the sixth semester. Partition the data between male and female students and compute the average for each group. Is the difference between the two averages statistically significant?

In [31]:
import requests as req
import urllib
import pandas as pd
from bs4 import BeautifulSoup as bes

# base_url is the page holding the search form; full_url is the same form with filters applied
base_url = "http://isa.epfl.ch/imoniteur_ISAP/!gedpublicreports.htm?ww_i_reportmodel=133685247"
full_url = "http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270&zz_x_UNITE_ACAD=&ww_x_UNITE_ACAD=249847&zz_x_PERIODE_ACAD=&ww_x_PERIODE_ACAD=213638028&zz_x_PERIODE_PEDAGO=&ww_x_PERIODE_PEDAGO=249108&zz_x_HIVERETE=&ww_x_HIVERETE=2936286&dummy=ok"

Applying the filters for Bachelor semestre 1 in Informatique for 2015-2016 gives the following URL:
```
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270&zz_x_UNITE_ACAD=&ww_x_UNITE_ACAD=249847&zz_x_PERIODE_ACAD=&ww_x_PERIODE_ACAD=213638028&zz_x_PERIODE_PEDAGO=&ww_x_PERIODE_PEDAGO=249108&zz_x_HIVERETE=&ww_x_HIVERETE=2936286&dummy=ok
```
Feeding this URL to Postman Interceptor gives the following parameter values:
```
ww_b_list:1
ww_i_reportmodel:133685247
ww_c_langue:
ww_i_reportModelXsl:133685270
zz_x_UNITE_ACAD:
ww_x_UNITE_ACAD:249847
zz_x_PERIODE_ACAD:
ww_x_PERIODE_ACAD:213638028
zz_x_PERIODE_PEDAGO:
ww_x_PERIODE_PEDAGO:249108
zz_x_HIVERETE:
ww_x_HIVERETE:2936286
dummy:ok
```
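The same parameter list can also be recovered without Postman. The following is a minimal sketch using the standard library's `urllib.parse` on the filter URL above; the variable names `filter_url` and `query_params` are illustrative and not part of the original notebook.

```python
from urllib.parse import urlsplit, parse_qsl

# The filter URL quoted above, split over several lines for readability
filter_url = (
    "http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter"
    "?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue="
    "&ww_i_reportModelXsl=133685270&zz_x_UNITE_ACAD=&ww_x_UNITE_ACAD=249847"
    "&zz_x_PERIODE_ACAD=&ww_x_PERIODE_ACAD=213638028"
    "&zz_x_PERIODE_PEDAGO=&ww_x_PERIODE_PEDAGO=249108"
    "&zz_x_HIVERETE=&ww_x_HIVERETE=2936286&dummy=ok"
)

# keep_blank_values=True preserves empty parameters such as ww_c_langue
pairs = parse_qsl(urlsplit(filter_url).query, keep_blank_values=True)
query_params = {key: value for key, value in pairs}

for key, value in query_params.items():
    print(f"{key}:{value}")
```

Each printed line has the same `name:value` shape as the listing above, so the two can be compared directly.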
We used the browser's Inspect Element tool to get the URL of the page that displays only the data table, without the form. The URL was as follows:
```
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=1897032870&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=213638028&ww_x_PERIODE_PEDAGO=249108&ww_x_HIVERETE=2936286
```
Feeding this URL to Postman Interceptor gives the following parameter values:
```
ww_x_GPS:1897032870
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:213638028
ww_x_PERIODE_PEDAGO:249108
ww_x_HIVERETE:2936286
```
Looking at the HTML code, we can figure out what each of the parameters stands for:
+ **ww_x_GPS**: Several lists might match our search, and this parameter specifies which one to open. The value -1 opens 'Tous' (all of them), so we set it to -1.
+ **ww_i_reportModel and ww_i_reportModelXsl**: These select whether the output is HTML or Excel. We always want HTML, so we fix them to the values obtained above with Postman Interceptor.
+ **ww_x_UNITE_ACAD**: We only consider 'Informatique', so we fix it to the value found above with Postman Interceptor.
+ **ww_x_PERIODE_ACAD**: We need to vary the academic period. Using Beautiful Soup, we will build a dictionary from the HTML source that maps each academic year to its value.
+ **ww_x_PERIODE_PEDAGO**: We need a dictionary for this as well, built the same way as for ww_x_PERIODE_ACAD.
+ **ww_x_HIVERETE**: Same as above.
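To make the role of each parameter concrete, here is a minimal sketch that assembles the table-only URL from the fixed and variable parameters described above. The helper name `build_table_url` and the constant `TABLE_URL` are hypothetical, and whether the server accepts `ww_x_GPS=-1` in this exact form is taken from the description above rather than verified here.

```python
from urllib.parse import urlencode

# Address of the table-only page, taken from the URL shown above
TABLE_URL = "http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html"


def build_table_url(periode_acad, periode_pedago, hiverete):
    """Return the table-only URL for one year / semester / season combination."""
    params = {
        "ww_x_GPS": "-1",                    # 'Tous': open all matching lists
        "ww_i_reportModel": "133685247",     # fixed
        "ww_i_reportModelXsl": "133685270",  # fixed to HTML output
        "ww_x_UNITE_ACAD": "249847",         # fixed to Informatique
        "ww_x_PERIODE_ACAD": periode_acad,
        "ww_x_PERIODE_PEDAGO": periode_pedago,
        "ww_x_HIVERETE": hiverete,
    }
    return TABLE_URL + "?" + urlencode(params)


# Values for 2015-2016, Bachelor semestre 1, taken from the listing above
url = build_table_url("213638028", "249108", "2936286")
print(url)
```

Passing the same dictionary as `params=` to `requests.get` would be an equivalent alternative; the URL is built explicitly here only so that it can be inspected before any request is sent.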

In [26]:
ww_x_GPS = '-1'  #for 'tous'. fixed
ww_i_reportModel = '133685247'  #fixed
ww_i_reportModelXsl = '133685270'  #fixed to HTML
ww_x_UNITE_ACAD = '249847'  #fixed to informatique


Now we need to build the dictionaries for the remaining fields (ww_x_PERIODE_ACAD, ww_x_PERIODE_PEDAGO and ww_x_HIVERETE).

In [33]:
r = req.get( full_url )
soup = bes(r.text, 'lxml')
#print (soup.prettify())

In [58]:
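# Map each <select> name to a {label: value} dictionary of its options.
# Note: the name `dict` shadows the Python built-in for the rest of the notebook.
dict = {}
for select in soup.find_all('select'):
    name = select['name'].strip()
    dict[name] = {}
    for option in select.find_all('option'):
        # Skip the placeholder option whose value is the string 'null'
        if option['value'] != 'null':
            strng = option.string.strip()
            dict[name][strng] = option['value'].strip()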
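
Once built, the mapping translates human-readable labels into the codes the server expects. The following is a minimal sketch of how the year and semester codes could be enumerated. It uses a hand-copied excerpt of the values from this notebook's output rather than the scraped dictionary itself, and the names `periode_acad`, `periode_pedago` and `combos` are illustrative. The ww_x_HIVERETE code would have to be added in the same way.

```python
from itertools import product

# Excerpt of the scraped mapping (values copied from the notebook output)
periode_acad = {"2007-2008": "978181", "2008-2009": "978187"}
periode_pedago = {"Bachelor semestre 1": "249108", "Bachelor semestre 6": "942175"}

# One (year label, semester label, year code, semester code) tuple per combination
combos = [
    (year, semester, periode_acad[year], periode_pedago[semester])
    for year, semester in product(periode_acad, periode_pedago)
]

for year, semester, year_code, semester_code in combos:
    print(year, semester, year_code, semester_code)
```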

In [59]:
dict

{'ww_x_HIVERETE': {"Semestre d'automne": '2936286',
  'Semestre de printemps': '2936295'},
 'ww_x_PERIODE_ACAD': {'2007-2008': '978181',
  '2008-2009': '978187',
  '2009-2010': '978195',
  '2010-2011': '39486325',
  '2011-2012': '123455150',
  '2012-2013': '123456101',
  '2013-2014': '213637754',
  '2014-2015': '213637922',
  '2015-2016': '213638028',
  '2016-2017': '355925344'},
 'ww_x_PERIODE_PEDAGO': {'Bachelor semestre 1': '249108',
  'Bachelor semestre 2': '249114',
  'Bachelor semestre 3': '942155',
  'Bachelor semestre 4': '942163',
  'Bachelor semestre 5': '942120',
  'Bachelor semestre 5b': '2226768',
  'Bachelor semestre 6': '942175',
  'Bachelor semestre 6b': '2226785',
  'Master semestre 1': '2230106',
  'Master semestre 2': '942192',
  'Master semestre 3': '2230128',
  'Master semestre 4': '2230140',
  'Mineur semestre 1': '2335667',
  'Mineur semestre 2': '2335676',
  'Mise à niveau': '2063602308',
  'Projet Master automne': '249127',
  'Projet Master printemps': '3781783