See the function get_table at the bottom: it returns a pandas DataFrame of the requested data, scraped from IS-Academia.

In [97]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

base_url = 'http://isa.epfl.ch/imoniteur_ISAP/%21gedpublicreports.htm?ww_i_reportmodel=133685247'


Submitting the form sends the following request:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270&zz_x_UNITE_ACAD=Informatique&ww_x_UNITE_ACAD=249847&zz_x_PERIODE_ACAD=&ww_x_PERIODE_ACAD=null&zz_x_PERIODE_PEDAGO=&ww_x_PERIODE_PEDAGO=null&zz_x_HIVERETE=&ww_x_HIVERETE=null&dummy=ok

The params are:
ww_b_list:1
ww_i_reportmodel:133685247
ww_c_langue:
ww_i_reportModelXsl:133685270
zz_x_UNITE_ACAD:Informatique
ww_x_UNITE_ACAD:249847
zz_x_PERIODE_ACAD:
ww_x_PERIODE_ACAD:null
zz_x_PERIODE_PEDAGO:
ww_x_PERIODE_PEDAGO:null
zz_x_HIVERETE:
ww_x_HIVERETE:null
dummy:ok
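Rather than copying the parameters out by hand, they can be read off the URL with the standard library. This is a minimal sketch using urllib.parse. The helper name query_params is purely illustrative. keep_blank_values=True is needed so that empty parameters such as ww_c_langue are not dropped:

```python
from urllib.parse import urlsplit, parse_qsl

# The filter request observed when submitting the form (copied from above)
FILTER_URL = ('http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?'
              'ww_b_list=1&ww_i_reportmodel=133685247&ww_c_langue=&ww_i_reportModelXsl=133685270'
              '&zz_x_UNITE_ACAD=Informatique&ww_x_UNITE_ACAD=249847&zz_x_PERIODE_ACAD='
              '&ww_x_PERIODE_ACAD=null&zz_x_PERIODE_PEDAGO=&ww_x_PERIODE_PEDAGO=null'
              '&zz_x_HIVERETE=&ww_x_HIVERETE=null&dummy=ok')


def query_params(url):
    """Return the query-string parameters of url as a dict of strings, in URL order."""
    # keep_blank_values=True keeps empty parameters such as ww_c_langue
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


# Prints one "name:value" line per parameter, like the list above
for name, value in query_params(FILTER_URL).items():
    print(f'{name}:{value}')
```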

Notice that we have specified zz_x_UNITE_ACAD:Informatique, i.e. computer science (ww_x_UNITE_ACAD:249847).

This gives us the list of all the available student lists for computer science.

This request allows us to select a list and display the students:

http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=71454914&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=null&ww_x_HIVERETE=null

The params are:
ww_x_GPS:71454914
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:null
ww_x_HIVERETE:null
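The same URL can be rebuilt from a dictionary of parameters. This is less error-prone than editing the long string by hand. Below is a minimal sketch with the standard library's urlencode, which keeps the dictionary's insertion order. LIST_URL and build_list_url are illustrative names, not part of IS-Academia. The requests library does the same encoding when given params=:

```python
from urllib.parse import urlencode

# Endpoint that displays one student list (taken from the request above)
LIST_URL = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html'


def build_list_url(params):
    """Append params, in insertion order, to LIST_URL as a URL-encoded query string."""
    return LIST_URL + '?' + urlencode(params)


# The parameters of the list-display request above; integers are converted with str()
params = {
    'ww_x_GPS': 71454914,
    'ww_i_reportModel': 133685247,
    'ww_i_reportModelXsl': 133685270,
    'ww_x_UNITE_ACAD': 249847,
    'ww_x_PERIODE_ACAD': 'null',
    'ww_x_PERIODE_PEDAGO': 'null',
    'ww_x_HIVERETE': 'null',
}
print(build_list_url(params))
```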


For the first question we need to filter on Bachelor, the academic year and the semester.
Here is the request for Bachelor semester 1, 2004-2005:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=2225262&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=249108&ww_x_HIVERETE=null

ww_x_GPS:2225262
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:249108
ww_x_HIVERETE:null

And for Bachelor semester 6, 2004-2005:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=2225150&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=942175&ww_x_HIVERETE=null
ww_x_GPS:2225150
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:942175
ww_x_HIVERETE:null
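Throughout this section we compare requests by eye to see which parameters changed. A small helper can do that comparison instead. This is a sketch using only the standard library. query_params and diff_params are hypothetical helper names. Applied to the semester 1 and semester 6 requests above, it reports that only ww_x_GPS and ww_x_PERIODE_PEDAGO differ:

```python
from urllib.parse import urlsplit, parse_qsl


def query_params(url):
    """Return the query-string parameters of url as a dict of strings."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def diff_params(url_a, url_b):
    """Map each parameter whose value differs between the two URLs to (value_a, value_b)."""
    a, b = query_params(url_a), query_params(url_b)
    # A parameter missing from one URL shows up with None on that side
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


BASE = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?'
SEM1_2004 = BASE + ('ww_x_GPS=2225262&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270'
                    '&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=249108'
                    '&ww_x_HIVERETE=null')
SEM6_2004 = BASE + ('ww_x_GPS=2225150&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270'
                    '&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=942175'
                    '&ww_x_HIVERETE=null')

for name, (v1, v6) in sorted(diff_params(SEM1_2004, SEM6_2004).items()):
    print(f'{name}: {v1} -> {v6}')
# ww_x_GPS: 2225262 -> 2225150
# ww_x_PERIODE_PEDAGO: 249108 -> 942175
```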

Remark: ww_x_PERIODE_PEDAGO seems to take a fixed value for each semester:

In [2]:
bachelor_semestre = {1:249108, 6:942175}


Now let's look at how changing the academic period affects the params:
Bachelor semester 6, 2005-2006:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=2225237&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=942175&ww_x_HIVERETE=null

ww_x_GPS:2225237
ww_i_reportModel:133685247  # Seems to be invariant
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847  # Section: Computer Science
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:942175  # Semestre
ww_x_HIVERETE:null

As expected, ww_x_PERIODE_PEDAGO is 942175.
The only value that has changed is ww_x_GPS.

Bachelor semester 6, 2006-2007:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=2225324&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=942175&ww_x_HIVERETE=null
ww_x_GPS:2225324
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:942175
ww_x_HIVERETE:null

Again, only ww_x_GPS has changed. There is no obvious pattern, except that it seems to increase with the year. Let's check this, as an increasing value would be enough to filter from a specific year onwards.

Bachelor 6 2014-2015:
ww_x_GPS:1378362238  # OK

Bachelor 6 2016-2017:
ww_x_GPS:1744378039 # OK the trend seems to hold
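The trend can be checked on the values collected so far. This is a small sketch. It only tests the handful of Bachelor semester 6 values listed above, so it supports the trend rather than proving it. Note that ww_x_GPS also varies with the semester: the 2004-2005 semester 1 list has a larger value (2225262) than the semester 6 list of the same year. Values are therefore only compared within one semester:

```python
# ww_x_GPS values observed above for Bachelor semester 6, keyed by academic year
gps_bachelor_6 = {
    '2004-2005': 2225150,
    '2005-2006': 2225237,
    '2006-2007': 2225324,
    '2014-2015': 1378362238,
    '2016-2017': 1744378039,
}


def is_increasing(values_by_year):
    """True if the values strictly increase when the 'YYYY-YYYY' keys are sorted."""
    # 'YYYY-YYYY' strings sort chronologically, so a plain sort orders them by year
    ordered = [values_by_year[year] for year in sorted(values_by_year)]
    return all(a < b for a, b in zip(ordered, ordered[1:]))


print(is_increasing(gps_bachelor_6))  # True
```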



Let's look at how to differentiate between Bachelor and Master lists.
Master 1 2015-2016:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=1897033225&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=null&ww_x_PERIODE_PEDAGO=2230106&ww_x_HIVERETE=null
ww_x_GPS:1897033225
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:2230106
ww_x_HIVERETE:null

Apart from ww_x_GPS, the only value that has changed is ww_x_PERIODE_PEDAGO, which is now 2230106.
We can guess that every Master semester 1 list has this value, whatever the year. Let's check:

Master 1 2013-2014:
ww_x_GPS:1378438423
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:null
ww_x_PERIODE_PEDAGO:2230106
ww_x_HIVERETE:null

Great. The theory holds.
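The code depends only on the student type and the semester. A single table keyed on both could therefore replace the Bachelor-only bachelor_semestre dictionary and also cover Master. This is a sketch. SEMESTER_CODES and semester_code are hypothetical names, and only the three codes observed so far are filled in:

```python
# Observed ww_x_PERIODE_PEDAGO codes, keyed by (student type, semester number).
# Only these three combinations have been looked up so far.
SEMESTER_CODES = {
    ('bachelor', 1): 249108,
    ('bachelor', 6): 942175,
    ('master', 1): 2230106,
}


def semester_code(student_type, semester):
    """Return the ww_x_PERIODE_PEDAGO code, or raise ValueError if it has not been recorded."""
    try:
        return SEMESTER_CODES[(student_type, semester)]
    except KeyError:
        raise ValueError(
            f'No code recorded for {student_type} semester {semester}; '
            'look it up in the form and add it to SEMESTER_CODES'
        ) from None


print(semester_code('master', 1))  # 2230106
```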


Let's now try filtering on the year using the form.
Master 1 2016-2017:
http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=2021044028&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=355925344&ww_x_PERIODE_PEDAGO=2230106&ww_x_HIVERETE=null
ww_x_GPS:2021044028
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:355925344
ww_x_PERIODE_PEDAGO:2230106
ww_x_HIVERETE:null

Interesting: ww_x_PERIODE_ACAD now has a value. Let's check that it represents the year:

Bachelor 6 2016-2017:
ww_x_GPS:1744378039
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:355925344
ww_x_PERIODE_PEDAGO:942175
ww_x_HIVERETE:null

OK, the Bachelor list for the same year has the same value, so this parameter does represent the year. What logic is behind it?
Let's go down a year:

Bachelor 6 2015-2016:
ww_x_GPS:1650772010
ww_i_reportModel:133685247
ww_i_reportModelXsl:133685270
ww_x_UNITE_ACAD:249847
ww_x_PERIODE_ACAD:213638028
ww_x_PERIODE_PEDAGO:942175
ww_x_HIVERETE:null

Bachelor 6 2014-2015
ww_x_PERIODE_ACAD:213637922

There is no obvious pattern, so we will simply collect the values for the years we need:

In [3]:
# Values for ww_x_PERIODE_ACAD
years = {
    '2016-2017': 355925344,
    '2015-2016': 213638028,
    '2014-2015': 213637922,
    '2013-2014': 213637754,
    '2012-2013': 123456101,
    '2011-2012': 123455150,
    '2010-2011': 39486325,
    '2009-2010': 978195,
    '2008-2009': 978187,
    '2007-2008': 978181,
}

There also seems to be no obvious way to obtain ww_x_GPS. Note, however, that when we request all the lists this parameter is set to -1. This will do nicely.

In [4]:
def get_data(semestre, year):
    """Request the student list for the given semester code and academic-year code."""
    url = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html'
    params = {
        'ww_x_GPS': -1,  # -1 gives us all the lists
        'ww_i_reportModel': 133685247,  # Always the same
        'ww_i_reportModelXsl': 133685270,  # Always the same
        'ww_x_UNITE_ACAD': 249847,  # The section: we are always looking at computer science
        'ww_x_PERIODE_ACAD': year,  # Code for the school year, e.g. years['2016-2017']
        'ww_x_PERIODE_PEDAGO': semestre,  # Code for the semester, e.g. bachelor_semestre[1]
        'ww_x_HIVERETE': 'null'  # Unknown param, always seems to be null
    }
    # requests URL-encodes params and appends them to url as the query string
    r = requests.get(url, params=params)
    return r

In [5]:
# Let's test our function
# assert get_data(bachelor_semestre[1], years['2016-2017']).status_code == 200

In [6]:
# with open('samplehtml.html', 'w', encoding='utf-8') as f:
#     f.write(get_data(bachelor_semestre[1], years['2016-2017']).text)

Let's have a look at how we can parse this data.
The data we want is in an HTML table.


In [35]:
data = get_data(bachelor_semestre[1], years['2016-2017'])
soup = BeautifulSoup(data.text, 'html.parser')

In [36]:
# soup

In [37]:
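# len() of a tag counts its direct children (including whitespace text nodes), not the number of rows
len(soup.table)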

237

In [38]:
table = soup.table

In [87]:
data_list = []
rows = table.find_all('tr')
for row in rows:
    cols = [ele.text.strip() for ele in row.find_all('td')]
    cols = cols[:-1]  # There's an empty column at the end
    data_list.append(cols)
# The first two rows do not contain student data, so we skip them
df = pd.DataFrame(data_list[2:], columns=['Civilité', 'Nom Prénom', 'Orientation Bachelor',
                                          'Orientation Master', 'Spécialisation', 'Filière opt.',
                                          'Mineur', 'Statut', 'Type Echange', 'Ecole Echange',
                                          'No Sciper'])
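If the page layout ever changes, a row with the wrong number of cells only surfaces as an error (or misaligned values) when the DataFrame is built. Below is a sketch of a defensive variant of the loop above. It applies the same two rules (skip the first two rows, drop the trailing empty cell) and also checks each row's width, so a layout change is reported with the offending row number. clean_rows, COLUMNS and the sample rows are hypothetical. The student row is made up:

```python
# Column names of the student table, as used above
COLUMNS = ['Civilité', 'Nom Prénom', 'Orientation Bachelor', 'Orientation Master',
           'Spécialisation', 'Filière opt.', 'Mineur', 'Statut', 'Type Echange',
           'Ecole Echange', 'No Sciper']


def clean_rows(raw_rows, n_skip=2, n_cols=len(COLUMNS)):
    """Skip the first n_skip rows, drop each row's trailing empty cell, check widths."""
    rows = []
    for i, cells in enumerate(raw_rows[n_skip:], start=n_skip):
        cells = [c.strip() for c in cells][:-1]  # the last column is always empty
        if len(cells) != n_cols:
            raise ValueError(f'row {i} has {len(cells)} cells, expected {n_cols}')
        rows.append(cells)
    return rows


# Hypothetical input: two non-student rows, then one made-up student row
# ending with the empty cell the real table has
sample_rows = [
    ['Title'],
    ['Header'],
    ['Madame', 'Doe Jane', '', '', '', '', '', 'Présent', '', '', '000000', ''],
]
print(clean_rows(sample_rows))
```

With the real page, the raw rows would be [[td.text for td in row.find_all('td')] for row in table.find_all('tr')]. The result can then be passed to pd.DataFrame(..., columns=COLUMNS).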


In [88]:
df.head()



,Civilité,Nom Prénom,Orientation Bachelor,Orientation Master,Spécialisation,Filière opt.,Mineur,Statut,Type Echange,Ecole Echange,No Sciper
0,Monsieur,Abbey Alexandre,,,,,,Présent,,,235688
1,Monsieur,Ahn Seongho,,,,,,Présent,,,274015
2,Madame,Alemanno Sara,,,,,,Présent,,,268410
3,Monsieur,Althaus Luca,,,,,,Présent,,,271464
4,Monsieur,Assi Karim,,,,,,Présent,,,274518


Let's put all of this together in a single function that fetches the data and returns it as a pandas DataFrame.

In [93]:
def get_table(semestre, year, student_type):
    """
    @param semestre: Int representing the semester number: for example put 1 for semester 1
    @param year: String representing the academic year: for example '2016-2017'
    @param student_type: String, either 'bachelor' or 'master'
    @return: pandas dataframe with the corresponding data from IS-Academia
    """
    # Values for ww_x_PERIODE_ACAD
    years = {
        '2016-2017': 355925344,
        '2015-2016': 213638028,
        '2014-2015': 213637922,
        '2013-2014': 213637754,
        '2012-2013': 123456101,
        '2011-2012': 123455150,
        '2010-2011': 39486325,
        '2009-2010': 978195,
        '2008-2009': 978187,
        '2007-2008': 978181,
    }
    # Values for ww_x_PERIODE_PEDAGO; only semesters 1 and 6 have been looked up so far
    bachelor_semestre = {1: 249108, 6: 942175}

    if year not in years:
        raise ValueError('Invalid year selected')
    year_code = years[year]

    if student_type == 'bachelor':
        # Check against the known codes: semesters 2-5 would otherwise raise a KeyError
        if semestre not in bachelor_semestre:
            raise ValueError('Invalid semester selected (only 1 and 6 are supported)')
        semestre_code = bachelor_semestre[semestre]
    elif student_type == 'master':
        raise NotImplementedError('Master not yet implemented, need to manually fill in semester codes')
    else:
        raise ValueError('Invalid student type. Please choose either bachelor or master')

    url = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html'
    params = {
        'ww_x_GPS': -1,  # -1 gives us all the lists
        'ww_i_reportModel': 133685247,  # Always the same
        'ww_i_reportModelXsl': 133685270,  # Always the same
        'ww_x_UNITE_ACAD': 249847,  # The section: we are always looking at computer science
        'ww_x_PERIODE_ACAD': year_code,  # Code for the school year, e.g. 2016-2017
        'ww_x_PERIODE_PEDAGO': semestre_code,  # Code for the semester, e.g. Bachelor 1
        'ww_x_HIVERETE': 'null'  # Unknown param, always seems to be null
    }
    r = requests.get(url, params=params)
    r.raise_for_status()  # Fail early on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.table
    data_list = []
    for row in table.find_all('tr'):
        cols = [ele.text.strip() for ele in row.find_all('td')]
        cols = cols[:-1]  # There's an empty column at the end
        data_list.append(cols)
    columns = ['Civilité', 'Nom Prénom', 'Orientation Bachelor', 'Orientation Master',
               'Spécialisation', 'Filière opt.', 'Mineur', 'Statut', 'Type Echange',
               'Ecole Echange', 'No Sciper']
    # The first two rows do not contain student data, so we skip them
    df = pd.DataFrame(data_list[2:], columns=columns)
    return df


In [98]:
# test
get_table(1,'2016-2017', 'bachelor')

,Civilité,Nom Prénom,Orientation Bachelor,Orientation Master,Spécialisation,Filière opt.,Mineur,Statut,Type Echange,Ecole Echange,No Sciper
0,Monsieur,Abbey Alexandre,,,,,,Présent,,,235688
1,Monsieur,Ahn Seongho,,,,,,Présent,,,274015
2,Madame,Alemanno Sara,,,,,,Présent,,,268410
3,Monsieur,Althaus Luca,,,,,,Présent,,,271464
4,Monsieur,Assi Karim,,,,,,Présent,,,274518
5,Monsieur,Badoux Luc-Antoine,,,,,,Présent,,,249613
6,Monsieur,Bagnoud Jérôme,,,,,,Présent,,,262214
7,Monsieur,Barbaras Yann Quentin,,,,,,Présent,,,262239
8,Monsieur,Barras Luca,,,,,,Présent,,,257916
9,Madame,Barsi Clémence Marie Sabine,,,,,,Présent,,,271508
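To collect several years or semesters, get_table can simply be called in a loop. This is a sketch. fetch_many is a hypothetical helper that takes the fetching function as an argument, so it can be tried without network access, and it pauses between requests. The one-second default is an arbitrary choice, not a documented IS-Academia limit:

```python
import time


def fetch_many(fetch, years, semesters, student_type='bachelor', delay=1.0):
    """Call fetch(semester, year, student_type) for every (year, semester) pair.

    Returns a dict keyed by (year, semester). Sleeping `delay` seconds after
    each call keeps the load on the server low.
    """
    results = {}
    for year in years:
        for semester in semesters:
            results[(year, semester)] = fetch(semester, year, student_type)
            time.sleep(delay)
    return results


# Usage (makes real requests):
# tables = fetch_many(get_table, ['2015-2016', '2016-2017'], [1, 6])
# df_all = pd.concat(list(tables.values()), ignore_index=True)
```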
