# IS-Academia Analysis
This is an analysis of the IS-Academia data that anyone can access without authentication.

**Goal**:
* Find out how long EPFL Computer Science students take to complete their Bachelor's degree.
* Do a similar analysis for the Master's degree.

---

## Collecting the Data

Before analysing the data, we first have to extract it from the IS-Academia website. By inspecting this [page](http://isa.epfl.ch/imoniteur_ISAP/%21gedpublicreports.htm?ww_i_reportmodel=133685247) with *Beautiful Soup*, we can read the names and possible values of the HTML form fields (`<input>` and `<select>`). With those, we can generate a valid request for the data we want.

### Analysing the requests using Postman

With the *Postman Interceptor*, we can capture the requests sent when submitting the form we are interested in.
A valid request URL looks like this:

`http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270&zz_x_UNITE_ACAD=Informatique&ww_x_UNITE_ACAD=249847&zz_x_PERIODE_ACAD=2016-2017&ww_x_PERIODE_ACAD=355925344&zz_x_PERIODE_PEDAGO=Bachelor+semestre+1&ww_x_PERIODE_PEDAGO=249108&zz_x_HIVERETE=Semestre+d%27automne&ww_x_HIVERETE=2936286&dummy=ok`

The URL contains redundant information, for example `zz_x_UNITE_ACAD=Informatique` and `ww_x_UNITE_ACAD=249847`. One of the two should be enough, and this is indeed the case: all the `zz_x_*` parameters can be dropped. They only carry the text displayed in the dropdowns, while the `ww_x_*` parameters carry the actual `value` attributes of the HTML dropdown options.

So the following request is also valid, and it is much simpler and shorter:

`http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=355925344&ww_x_PERIODE_PEDAGO=249108&ww_x_HIVERETE=2936286&dummy=ok`
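Dropping the `zz_x_*` parameters does not have to be done by hand. The following is a minimal sketch using only the Python standard library. The helper name `strip_display_params` is hypothetical, and the URL is the intercepted request shown above. It only rewrites the URL and sends no request.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_display_params(url, prefix="zz_x_"):
    """Return `url` without the query parameters whose name starts with `prefix`."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as ww_c_langue=
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if not name.startswith(prefix)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# The full request intercepted with Postman
full_url = (
    "http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter"
    "?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue="
    "&ww_i_reportModelXsl=133685270"
    "&zz_x_UNITE_ACAD=Informatique&ww_x_UNITE_ACAD=249847"
    "&zz_x_PERIODE_ACAD=2016-2017&ww_x_PERIODE_ACAD=355925344"
    "&zz_x_PERIODE_PEDAGO=Bachelor+semestre+1&ww_x_PERIODE_PEDAGO=249108"
    "&zz_x_HIVERETE=Semestre+d%27automne&ww_x_HIVERETE=2936286"
    "&dummy=ok"
)

short_url = strip_display_params(full_url)
print(short_url)
```

The order of the remaining parameters is preserved, so the result can be compared directly with the shortened URL.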

The URL of the empty form page is
`http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportmodel=133685247`
which differs slightly from the IS-Academia link above: that page uses frames, and the HTML of the frames is not included in the base page.

From here, you can already guess the form parameters we are going to use, but the goal is to extract them from the page rather than hardcode them. At least we now know the base URL and the format of the request.
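To make the request format concrete, here is a small sketch that assembles a request URL from the four `ww_x_*` values. The function name `build_report_url` and its argument names are hypothetical. The fixed parameters (`ww_b_list`, `ww_i_reportmodel`, `ww_c_langue`, `ww_i_reportModelXsl`, `dummy`) are copied from the intercepted request and are assumed to stay the same for every query. The IDs in the usage example are the ones from that request and serve only as an illustration, since the real values are meant to be extracted from the form.

```python
from urllib.parse import urlencode

BASE_URL = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter'

def build_report_url(unit, academic_period, pedagogic_period, season):
    """Assemble a request URL from the four dropdown values (ww_x_* parameters)."""
    params = [
        # Fixed parameters, copied from the intercepted request (assumed constant)
        ('ww_b_list', '1'),
        ('ww_i_reportmodel', '133685247'),
        ('ww_c_langue', ''),
        ('ww_i_reportModelXsl', '133685270'),
        # Variable parameters: the `value` attributes of the dropdown options
        ('ww_x_UNITE_ACAD', unit),
        ('ww_x_PERIODE_ACAD', academic_period),
        ('ww_x_PERIODE_PEDAGO', pedagogic_period),
        ('ww_x_HIVERETE', season),
        ('dummy', 'ok'),
    ]
    return BASE_URL + '?' + urlencode(params)

# Informatique, 2016-2017, Bachelor semestre 1, Semestre d'automne
example_url = build_report_url('249847', '355925344', '249108', '2936286')
print(example_url)
```

A list of pairs is used instead of a dictionary only to make the parameter order explicit. Whether the server depends on that order is not established here.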


### Getting the parameters using Beautiful Soup

In [62]:
# Import Requests and Beautiful Soup
import requests as rq
from bs4 import BeautifulSoup

# Define the IS-Academia page
empty_form_url = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportmodel=133685247'

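# Get the page with an HTTP GET request and fail early on an HTTP error
empty_form = rq.get(empty_form_url)
empty_form.raise_for_status()

# Parse the HTML into a soup
form_soup = BeautifulSoup(empty_form.text, 'html.parser')

# Show the cell next to the "Période académique" header: it holds the dropdown
form_soup.find_all('th', string="Période académique")[0].next_sibling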

<td><input name="zz_x_PERIODE_ACAD" type="hidden" value=""><select name="ww_x_PERIODE_ACAD" onchange="document.f.zz_x_PERIODE_ACAD.value=document.f.ww_x_PERIODE_ACAD.options[document.f.ww_x_PERIODE_ACAD.selectedIndex].text"><option value="null"></option><option value="355925344">2016-2017</option><option value="213638028">2015-2016</option><option value="213637922">2014-2015</option><option value="213637754">2013-2014</option><option value="123456101">2012-2013</option><option value="123455150">2011-2012</option><option value="39486325">2010-2011</option><option value="978195">2009-2010</option><option value="978187">2008-2009</option><option value="978181">2007-2008</option></select></input></td>
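The output above shows that each `<option>` pairs a displayed label with the `value` sent in the `ww_x_*` parameter. A minimal sketch of turning such a dropdown into a label-to-value dictionary follows. The helper name `select_options` is hypothetical, and the HTML string is a shortened copy of the output above so that the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Shortened copy of the dropdown shown above (first options only)
sample_html = (
    '<td><input name="zz_x_PERIODE_ACAD" type="hidden" value="">'
    '<select name="ww_x_PERIODE_ACAD">'
    '<option value="null"></option>'
    '<option value="355925344">2016-2017</option>'
    '<option value="213638028">2015-2016</option>'
    '</select></td>'
)

def select_options(soup, select_name):
    """Map each option label of the named <select> to its value, skipping the empty choice."""
    select = soup.find('select', attrs={'name': select_name})
    return {
        option.get_text(strip=True): option['value']
        for option in select.find_all('option')
        if option['value'] != 'null'  # the first option is the empty placeholder
    }

sample_soup = BeautifulSoup(sample_html, 'html.parser')
periods = select_options(sample_soup, 'ww_x_PERIODE_ACAD')
print(periods)
```

Applied to `form_soup`, the same helper could be called once per dropdown name to collect all the parameter values without hardcoding them.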

### Getting the actual data