# Table of Contents
<p>
<div class="lev1"><a href="#Data-from-the-Web"><span class="toc-item-num">1&nbsp;&nbsp;</span>Data from the Web</a></div>
<div class="lev1"><a href="#Getting-the-data"><span class="toc-item-num">2&nbsp;&nbsp;</span>Getting the data</a></div>
<div class="lev2"><a href="#Requesting-ISA-form"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Requesting ISA form</a></div>
<div class="lev2"><a href="#Finding-form-IDs"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Finding form IDs</a></div>
<div class="lev2"><a href="#Filtering-and-getting-the-data"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Filtering and getting the data</a></div>
<div class="lev2"><a href="#Extracting-data-from-the-result-page"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Extracting data from the result page</a></div>



# Data from the Web

In this homework we will extract interesting information from IS-Academia, the educational portal of EPFL. Specifically, we will focus on the part that allows public access to academic data. The list of registered students by section and semester is not offered as a downloadable dataset, so you will have to find a way to scrape the information we need. On this form you can select the data to download based on different criteria (e.g., year, semester, etc.).

You are not allowed to download all the tables manually; rather, you have to understand which parameters the server accepts and generate the HTTP requests accordingly. For this task, Postman with the Interceptor extension can help you greatly. I recommend watching this brief tutorial to quickly understand how to use it. Your code in the IPython Notebook should not contain any hardcoded URL. To fetch the content from the IS-Academia server, you can use the Requests library with a base URL, but all the other form parameters should be extracted from the HTML with BeautifulSoup. You can choose to download Excel or HTML files; both have pros and cons, as you will find out after a quick check. You can also choose to download data at different granularities (e.g., per semester, per year, etc.), but I recommend not downloading all the data in one shot because 1) the requests are likely to time out and 2) we would overload the IS-Academia server.


In [6]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import glob
import requests
import re
from bs4 import BeautifulSoup
sns.set_context('notebook')

# Getting the data

## Requesting ISA form

The first step is to find the parameters required to request the data we want.

For this purpose, we first send a GET request to the ISA form at <http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportModel=133685247>.

We then parse the HTML response with BeautifulSoup so that we can search it later.

In [178]:
r = requests.get('http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportModel=133685247')
r.headers['content-type']
html_doc = r.text
isaForm = BeautifulSoup(html_doc, 'html.parser')

In [179]:
isaForm

<html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"><div></div><title></title><script src="GEDPUBLICREPORTS.txt?ww_x_path=Gestac.Base.Palette_js&amp;ww_c_langue=fr" type="text/javascript"></script><link href="GEDPUBLICREPORTS.css?ww_x_path=Gestac.Moniteur.Style" rel="stylesheet" type="text/css"><link href="GEDPUBLICREPORTS.css?ww_x_path=Gestac.Moniteur.StyleNavigator" rel="stylesheet" type="text/css"/></link></meta></head><body alink="#666666" bgcolor="#ffffff" link="#666666" marginheight="0" marginwidth="5" vlink="#666666"><div class="filtres"><form action="!GEDPUBLICREPORTS.filter" method="GET" name="f"><input name="ww_b_list" type="hidden" value="1"><input name="ww_i_reportmodel" type="hidden" value="133685247"><input name="ww_c_langue" type="hidden" value=""><h1 id="titre">Liste des étudiants inscrits par semestre</h1><table border="0" id="format"><tr><th>Format:</th></tr><tr><td><input checked="" name="ww_i_reportModelXsl" type="radio" value="133685270">
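The hidden inputs visible in the output above can be collected programmatically rather than copied by hand. The following is a minimal sketch that runs on a trimmed copy of the form markup shown above, so it needs no network access. `SAMPLE_FORM` and `hidden_inputs` are illustrative names that do not appear elsewhere in this notebook.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the form markup printed above (only the hidden inputs).
SAMPLE_FORM = """
<form action="!GEDPUBLICREPORTS.filter" method="GET" name="f">
  <input name="ww_b_list" type="hidden" value="1">
  <input name="ww_i_reportmodel" type="hidden" value="133685247">
  <input name="ww_c_langue" type="hidden" value="">
</form>
"""


def hidden_inputs(form_soup):
    """Return {name: value} for every hidden <input> in the parsed form."""
    return {tag['name']: tag.get('value', '')
            for tag in form_soup.find_all('input', attrs={'type': 'hidden'})}


sample_soup = BeautifulSoup(SAMPLE_FORM, 'html.parser')
hidden_params = hidden_inputs(sample_soup)
form_action = sample_soup.find('form')['action']
print(form_action, hidden_params)
```

Reading the `action` attribute and the hidden inputs from the page is one way to keep everything except the base URL out of the code.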

## Finding form IDs

Now that we have the form's HTML code, we need to know which values of the form are used to filter and display the desired data. The values we're interested in are 'unité académique', 'période académique' and 'période pédagogique' (corresponding respectively to section, academic year and semester).

Inspecting the HTML code shows that the form items are 'option' elements, so their values are easy to retrieve with BeautifulSoup's find and find_all methods.

The following code simply finds the option corresponding to the section 'Informatique' and outputs its value (the ID used to filter the results).
```python
    isaForm.find('option', text = re.compile('Informatique'))['value']
```

We do the same thing for Bachelor 1st and 6th semester.
```python
    semester_ids['Bachelor semestre 1'] = isaForm.find('option', 
                                                        text = re.compile('Bachelor semestre 1'))['value']
    semester_ids['Bachelor semestre 6'] = isaForm.find('option', 
                                                        text = re.compile('Bachelor semestre 6'))['value']
```

We then get the academic year IDs from 2007-2008 to 2016-2017 using a for loop (see the cell below).

In [8]:
informatique_id = isaForm.find('option', text = re.compile('Informatique'))['value']

print("Id of informatique : ", informatique_id, "\n")

master_semester_ids = {}

#print(isaForm)

#i=1
#a = isaForm.find('option', text = re.compile('Master semestre ' + str(i)))
#a = isaForm.find('option','Bachelor semestre 1')
#print(a)

for i in range(1, 5):
    master_semester_ids['Master semestre ' + str(i)] = isaForm.find('option', text = re.compile('Master semestre ' + str(i)))['value']
    print('Id of Master semester '+ str(i)+' : '+ master_semester_ids['Master semestre '+str(i)])
    
for i in range(1, 3):
    master_semester_ids['Mineur semestre ' + str(i)] = isaForm.find('option', text = re.compile('Mineur semestre ' + str(i)))['value']
    print('Id of Mineur semester '+ str(i)+' : '+ master_semester_ids['Mineur semestre '+str(i)])  
    
master_semester_ids['Projet Master fall'] = isaForm.find('option', text = re.compile('Projet Master automne'))['value']
print('Id of Projet Master fall'+' : '+ master_semester_ids['Projet Master fall']) 

master_semester_ids['Projet Master spring'] = isaForm.find('option', text = re.compile('Projet Master printemps'))['value']
print('Id of Projet Master spring'+' : '+ master_semester_ids['Projet Master spring']) 

print('\n')

year_ids = {}
for y in range(2007, 2017):
    school_year = str(y) + "-" + str(y+1)
    year_ids[str(y) + "-" + str(y+1)] = [isaForm.find('option', text = re.compile(school_year))['value']]
    
print("years ids : (from 2007-2008 to 2016-2017)", year_ids)


Id of informatique :  249847 

Id of Master semester 1 : 2230106
Id of Master semester 2 : 942192
Id of Master semester 3 : 2230128
Id of Master semester 4 : 2230140
Id of Mineur semester 1 : 2335667
Id of Mineur semester 2 : 2335676
Id of Projet Master fall : 249127
Id of Projet Master spring : 3781783


years ids : (from 2007-2008 to 2016-2017) {'2011-2012': ['123455150'], '2014-2015': ['213637922'], '2008-2009': ['978187'], '2012-2013': ['123456101'], '2010-2011': ['39486325'], '2015-2016': ['213638028'], '2009-2010': ['978195'], '2007-2008': ['978181'], '2013-2014': ['213637754'], '2016-2017': ['355925344']}
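The lookups above use `re.compile(...)`, which matches any option whose text merely contains the pattern. As an alternative, here is a minimal sketch that reads every option of one drop-down into a dictionary and then looks labels up exactly. It assumes each filter is a `<select>` whose `name` equals the corresponding URL parameter (for example `ww_x_PERIODE_PEDAGO`); that markup is an assumption about the page, and `SAMPLE_SELECT` and `option_ids` are illustrative names. The two IDs are the ones printed above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page may structure its drop-downs differently.
SAMPLE_SELECT = """
<select name="ww_x_PERIODE_PEDAGO">
  <option value="null"></option>
  <option value="2230106">Master semestre 1</option>
  <option value="942192">Master semestre 2</option>
</select>
"""


def option_ids(soup, select_name):
    """Map each non-empty option label of the named <select> to its value."""
    select = soup.find('select', attrs={'name': select_name})
    return {opt.get_text(strip=True): opt['value']
            for opt in select.find_all('option')
            if opt.get_text(strip=True)}


semester_lookup = option_ids(BeautifulSoup(SAMPLE_SELECT, 'html.parser'),
                             'ww_x_PERIODE_PEDAGO')
print(semester_lookup['Master semestre 1'])
```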




## Filtering and getting the data

Now that we know the relevant IDs used in the form, we need to filter and request our data. For this purpose, we used Postman and Postman Interceptor to intercept and inspect the request sent when the form is submitted.
  




The picture below shows all parameters used in the URL to filter and return results for:
* Section "Informatique"
* Academic period "2016-2017"
* Pedagogic period "Bachelor semestre 1"

<p>
    <img src="img/postman.png" alt="postman" align="center"/>
</p>

After experimenting with the URL, we concluded that not all parameters are mandatory. The required parameters and their values are:

|parameter  | value |
|-----------|-------|
|ww_b_list  |must be '1'|
|ww_i_reportmodel|must be '133685247'|
|ww_i_reportModelXsl|must be '133685270'|
|ww_x_UNITE_ACAD|the ID of the section, taken from the form|
|ww_x_PERIODE_ACAD|the ID of the academic year, taken from the form|
|ww_x_PERIODE_PEDAGO|the ID of the semester, taken from the form|

We therefore create a parameters dictionary containing all the required parameters in order to build the correct URL.


The filter returns a new HTML page containing two links that display the data.
Since we used a very precise filter in the form (specifying year, semester and section), there is only one set of data to display, meaning that both links ("Tous" and "Informatique, 'years', 'semester'") lead to the same dataset.

We chose to follow the "Informatique, 'years', 'semester'" link. By inspecting the HTML code, we saw that the parameter used in this link is "ww_x_GPS", so we simply read its value from the HTML page for the desired data.

We can then request the dataset using the base URL we found thanks to Postman, the parameters used for the filter and the ww_x_GPS ID.
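To make the parameter table concrete, the sketch below assembles the filter URL with the standard library only, so it can be inspected without contacting the server. `BASE_URL` and `build_filter_url` are illustrative names; the three IDs in the example call are the values printed earlier for 'Informatique', '2016-2017' and 'Master semestre 1'.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

BASE_URL = 'http://isa.epfl.ch/imoniteur_ISAP/'


def build_filter_url(section_id, year_id, semester_id,
                     action='!GEDPUBLICREPORTS.filter'):
    """Build the filter URL from the six required parameters."""
    params = {
        'ww_b_list': '1',
        'ww_i_reportmodel': '133685247',
        'ww_i_reportModelXsl': '133685270',
        'ww_x_UNITE_ACAD': section_id,
        'ww_x_PERIODE_ACAD': year_id,
        'ww_x_PERIODE_PEDAGO': semester_id,
    }
    # urljoin resolves the form's relative action against the base URL.
    return urljoin(BASE_URL, action) + '?' + urlencode(params)


example_url = build_filter_url('249847', '355925344', '2230106')
query = parse_qs(urlsplit(example_url).query)
print(example_url)
```

Passing the same dictionary to `requests.get(url, params=...)`, as the cells below do, lets Requests perform the encoding instead.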


In [9]:
def getFilteredPage(academic_year, semester):
    params = {'ww_b_list':'1',
            'ww_i_reportmodel':'133685247',
            'ww_i_reportModelXsl':'133685270',
            'ww_x_UNITE_ACAD':informatique_id,
            'ww_x_PERIODE_ACAD':year_ids[academic_year],
            'ww_x_PERIODE_PEDAGO':master_semester_ids[semester]}
    r = requests.get('http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?', params)
    html_doc = r.text
    return BeautifulSoup(html_doc, 'html.parser'), params


In [10]:
def getResultPage(academic_year, semester):
    filteredPage, params = getFilteredPage(academic_year, semester)
    params['ww_x_GPS'] = filteredPage.find_all('a')[1].get('onclick').split("ww_x_GPS=")[1].split("')")[0]
    r = requests.get('http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.bhtml?', params)
    return BeautifulSoup(r.text, 'html.parser')
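`getResultPage` relies on the second link of the page and on two chained `split` calls. A slightly more defensive variant, sketched below, pulls the `ww_x_GPS` value out of every `onclick` attribute with a regular expression and keys it by the link text. The sample markup (including the `loadReport` function name and both ID values) is hypothetical; only the `ww_x_GPS=...')` shape is taken from the parsing code above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical result-page links; the real onclick handlers may differ.
SAMPLE_RESULT_LINKS = """
<a onclick="loadReport('ww_x_GPS=-1')">Tous</a>
<a onclick="loadReport('ww_x_GPS=2021043255')">Informatique, 2016-2017, Master semestre 1</a>
"""

GPS_PATTERN = re.compile(r"ww_x_GPS=(-?\d+)")


def gps_ids(soup):
    """Map the text of each link to the ww_x_GPS value found in its onclick."""
    ids = {}
    for link in soup.find_all('a', onclick=True):
        match = GPS_PATTERN.search(link['onclick'])
        if match:
            ids[link.get_text(strip=True)] = match.group(1)
    return ids


found_gps = gps_ids(BeautifulSoup(SAMPLE_RESULT_LINKS, 'html.parser'))
print(found_gps)
```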

In [184]:
def getResultAllYears(semester):
    global_table = []
    for year_id in year_ids:
        soup = getResultPage(year_id, semester)

        students_tr = soup.body.hr.table.find_all('tr')[2:]
        students = []
        for i in range (0,len(students_tr)):
            student = students_tr[i].find_all('td')
            students.append([student[0].text,student[1].text.replace(u'\xa0', u' '),student[7].text,student[10].text])

        pd_student = pd.DataFrame(students, columns=['Gender', 'Name', 'Status_' + year_id, 'Sciper'])

        global_table.append(pd_student)
    return global_table

In [186]:
def getResultAllYearsSpec(semester):
    global_table = []
    for year_id in year_ids:
        soup = getResultPage(year_id, semester)

        students_tr = soup.body.hr.table.find_all('tr')[2:]
        students = []
        for i in range (0,len(students_tr)):
            student = students_tr[i].find_all('td')
            students.append([student[0].text,student[1].text.replace(u'\xa0', u' '),student[7].text,student[10].text,student[4].text])

        pd_student = pd.DataFrame(students, columns=['Gender', 'Name', 'Status_' + year_id, 'Sciper','Spécialisation'])

        global_table.append(pd_student)
    return global_table

In [12]:
def joinTables(global_table):
    joined_table = global_table[0]
    for single_table in global_table[1:len(global_table)]:
        joined_table = pd.merge(joined_table, single_table, how='outer', on=['Gender','Name','Sciper'])
    return joined_table
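The effect of the successive outer merges in `joinTables` is easier to see on a toy example. The sketch below uses two tiny, made-up yearly tables (placeholder names and Sciper numbers) and folds them with `functools.reduce`, which is equivalent to the loop above.

```python
from functools import reduce

import pandas as pd

KEYS = ['Gender', 'Name', 'Sciper']

# Placeholder data, not real students.
table_2015 = pd.DataFrame({'Gender': ['Madame', 'Monsieur'],
                           'Name': ['Student A', 'Student B'],
                           'Sciper': ['1', '2'],
                           'Status_2015-2016': ['Présent', 'Présent']})
table_2016 = pd.DataFrame({'Gender': ['Monsieur', 'Madame'],
                           'Name': ['Student B', 'Student C'],
                           'Sciper': ['2', '3'],
                           'Status_2016-2017': ['Présent', 'Congé']})

# An outer merge keeps every student; years without a record become NaN.
joined_example = reduce(
    lambda left, right: pd.merge(left, right, how='outer', on=KEYS),
    [table_2015, table_2016])
print(joined_example)
```

Student B, present in both years, ends up on a single row, while students A and C each have one missing status.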

In [13]:
def formatTable(joined_table_bch):
    bch_no_string = joined_table_bch.drop(joined_table_bch.columns[[0,1,3]], axis=1)
    bch = bch_no_string.sort_index(axis=1)
    
    year = 2007
    for bch_col in bch:
        bch[bch_col] = bch[bch_col].replace('Présent', year)
        year = year + 1
    return bch
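`formatTable` assigns years by counting columns from 2007, which is only correct if every academic year from 2007-2008 onward is present after sorting. As a hedged alternative, the sketch below reads the year from each `Status_YYYY-YYYY` column name instead. `status_to_years` is an illustrative name and the sample table is made up.

```python
import numpy as np
import pandas as pd


def status_to_years(table, present='Présent'):
    """Replace each 'Présent' by the first year in its column name, else NaN."""
    years = pd.DataFrame(index=table.index)
    for col in sorted(c for c in table.columns if c.startswith('Status_')):
        start_year = int(col[len('Status_'):].split('-')[0])
        years[col] = np.where(table[col] == present, start_year, np.nan)
    return years


# Placeholder data: two students, two non-consecutive academic years.
sample_status = pd.DataFrame({
    'Sciper': ['1', '2'],
    'Status_2016-2017': ['Présent', 'Présent'],
    'Status_2014-2015': ['Présent', np.nan],
})
sample_years = status_to_years(sample_status)
first_year = sample_years.min(axis=1)
last_year = sample_years.max(axis=1)
print(first_year.tolist(), last_year.tolist())
```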
 
    

### Get Value Master semester 1

In [43]:
global_table_master1 = getResultAllYears('Master semestre 1')


In [73]:
#global_table_master1

In [47]:
joined_table_master1 = joinTables(global_table_master1)
joined_table_master1.head()

Unnamed: 0,Gender,Name,Status_2011-2012,Sciper,Status_2014-2015,Status_2008-2009,Status_2012-2013,Status_2010-2011,Status_2015-2016,Status_2009-2010,Status_2007-2008,Status_2013-2014,Status_2016-2017
0,Monsieur,Arnfred Jonas,Présent,184772,,,,,,,,,
1,Monsieur,Asgari Ehsaneddin,Présent,211754,,,,,,,,,
2,Monsieur,Baeriswyl Jonathan,Présent,179406,,,,,,,,,
3,Madame,Bai Yi,Présent,209850,,,,,,,,,
4,Monsieur,Barroco Michael,Présent,179428,,,,,,,,,


In [48]:
joined_table_master1.shape

(901, 13)

In [49]:
#with pd.option_context('display.max_rows', 999, 'display.max_columns', 999):
#    print(joined_table_master1)

In [67]:
#table_no_conge_master1 = joined_table_master1
#print(table_no_conge_master1)
table_no_conge_master1 = joined_table_master1.replace('Congé', 'Présent').replace('Attente', np.nan).replace('Stage','Présent').dropna(thresh=4)


print('shape before replace :')
print(joined_table_master1.shape)

print('shape after replace :')
print(table_no_conge_master1.shape)

assert(joined_table_master1.shape==table_no_conge_master1.shape)


shape before replace :
(901, 13)
shape after replace :
(901, 13)
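The same status clean-up is repeated for each table in this notebook, so it could be factored into a helper. The sketch below is one possible version: it treats 'Congé' and 'Stage' as 'Présent', turns 'Attente' into a missing value, and drops rows with fewer than four non-missing cells. `clean_statuses` is an illustrative name and the sample rows are placeholders.

```python
import numpy as np
import pandas as pd

STATUS_MAP = {'Congé': 'Présent', 'Stage': 'Présent', 'Attente': np.nan}


def clean_statuses(table, min_non_null=4):
    """Normalise statuses, then keep rows with at least min_non_null values."""
    return table.replace(STATUS_MAP).dropna(thresh=min_non_null)


# Placeholder data, not real students.
raw_example = pd.DataFrame({
    'Gender': ['Madame', 'Monsieur', 'Monsieur'],
    'Name': ['Student A', 'Student B', 'Student C'],
    'Sciper': ['1', '2', '3'],
    'Status_2015-2016': ['Congé', 'Attente', 'Stage'],
    'Status_2016-2017': ['Présent', np.nan, np.nan],
})
cleaned_example = clean_statuses(raw_example)
print(cleaned_example)
```

With three identifying columns, a threshold of four means a student is kept only if at least one status remains after the replacement; Student B, who was only ever 'Attente', is dropped.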


In [68]:
formated_table_master1 = formatTable(table_no_conge_master1)
master1_min = formated_table_master1.min(axis=1)
#print(formated_table_master1)

master1_min.head()
#print(master1_min)

0    2011.0
1    2011.0
2    2011.0
3    2011.0
4    2011.0
dtype: float64

In [153]:
master1_string = table_no_conge_master1.drop(table_no_conge_master1.columns[[1,2,4,5,6,7,8,9,10,11,12]], axis=1)
master1_final = pd.merge(master1_string, master1_min.to_frame(),left_index=True, right_index=True)
master1_final.columns = ['Gender', 'Sciper', 'Start_year']
master1_final#.head()
#print(master1_final)

Unnamed: 0,Gender,Sciper,Start_year
0,Monsieur,184772,2011.0
1,Monsieur,211754,2011.0
2,Monsieur,179406,2011.0
3,Madame,209850,2011.0
4,Monsieur,179428,2011.0
5,Monsieur,184814,2011.0
6,Monsieur,179426,2011.0
7,Monsieur,185949,2011.0
8,Monsieur,212234,2011.0
9,Monsieur,179157,2011.0


### Get Values Projet Master fall and spring

In [80]:
global_table_prj_fall = getResultAllYears('Projet Master fall')
global_table_prj_spring = getResultAllYears('Projet Master spring')

In [84]:
joined_table_prj_fall = joinTables(global_table_prj_fall)
joined_table_prj_fall.head()

Unnamed: 0,Gender,Name,Status_2011-2012,Sciper,Status_2014-2015,Status_2008-2009,Status_2012-2013,Status_2010-2011,Status_2015-2016,Status_2009-2010,Status_2007-2008,Status_2013-2014,Status_2016-2017
0,Monsieur,Atitallah Samir,Présent,196669,,,,,,,,,
1,Madame,Bogos Sonia Mihaela,Stage,200065,,,,,,,,,
2,Monsieur,Buchschacher Nicolas,Présent,171619,,,,,,,,,
3,Monsieur,Frélich Lukás,Stage,200597,,,,,,,,,
4,Monsieur,Houdemer Charles-Henry,Présent,170239,,,,,,,,,


In [85]:
joined_table_prj_spring = joinTables(global_table_prj_spring)
joined_table_prj_spring.head()

Unnamed: 0,Gender,Name,Status_2011-2012,Sciper,Status_2014-2015,Status_2008-2009,Status_2012-2013,Status_2010-2011,Status_2015-2016,Status_2009-2010,Status_2007-2008,Status_2013-2014,Status_2016-2017
0,Monsieur,Bloch Marc-Olivier,Présent,178553,,,,,,,,,
1,Monsieur,Bricola Jean-Charles,Stage,180731,,,,,,,,,
2,Monsieur,Chakrabarty Soumitro,Stage,199654,,,,,,,,,
3,Monsieur,Dosmukhamedov Diyar,Présent,192861,,,,,,,,,
4,Monsieur,Gruner Samuel,Congé,170235,,,,,,,,,


In [90]:
print("fall shape : ")
print(joined_table_prj_fall.shape)
print("spring shape : ")
print(joined_table_prj_spring.shape)

fall shape : 
(86, 13)
spring shape : 
(42, 13)


In [98]:
table_no_conge_prj_fall = joined_table_prj_fall.replace('Congé', 'Présent').replace('Attente', np.nan).replace('Stage','Présent').dropna(thresh=4)

print('fall shape before replace   :')
print(joined_table_prj_fall.shape)

print('fall shape after replace    :')
print(table_no_conge_prj_fall.shape)

assert(joined_table_prj_fall.shape==table_no_conge_prj_fall.shape)


table_no_conge_prj_spring = joined_table_prj_spring.replace('Congé', 'Présent').replace('Attente', np.nan).replace('Stage','Présent').dropna(thresh=4)

print('spring shape before replace :')
print(joined_table_prj_spring.shape)

print('spring shape after replace  :')
print(table_no_conge_prj_spring.shape)

assert(joined_table_prj_spring.shape==table_no_conge_prj_spring.shape)

fall shape before replace   :
(86, 13)
fall shape after replace    :
(86, 13)
spring shape before replace :
(42, 13)
spring shape after replace  :
(42, 13)


In [134]:
formated_table_prj_fall = formatTable(table_no_conge_prj_fall)
prj_fall_max = formated_table_prj_fall.max(axis=1)

prj_fall_max.head()

formated_table_prj_spring = formatTable(table_no_conge_prj_spring)
prj_spring_max = formated_table_prj_spring.max(axis=1)

prj_spring_max.head()
#print(prj_spring_max)

0    2011.0
1    2011.0
2    2011.0
3    2011.0
4    2011.0
dtype: float64

In [135]:
prj_fall_string = table_no_conge_prj_fall.drop(table_no_conge_prj_fall.columns[[1,2,4,5,6,7,8,9,10,11,12]], axis=1)
prj_fall_final = pd.merge(prj_fall_string, prj_fall_max.to_frame(),left_index=True, right_index=True)
prj_fall_final.columns = ['Gender', 'Sciper', 'End_year_fall']
prj_fall_final.head()
#print(master1_final)


Unnamed: 0,Gender,Sciper,End_year_fall
0,Monsieur,196669,2011.0
1,Madame,200065,2011.0
2,Monsieur,171619,2011.0
3,Monsieur,200597,2011.0
4,Monsieur,170239,2011.0


In [136]:
prj_spring_string = table_no_conge_prj_spring.drop(table_no_conge_prj_spring.columns[[1,2,4,5,6,7,8,9,10,11,12]], axis=1)
prj_spring_final = pd.merge(prj_spring_string, prj_spring_max.to_frame(),left_index=True, right_index=True)
prj_spring_final.columns = ['Gender', 'Sciper', 'End_year_spring']

year_adjuster= lambda x : x+0.5
prj_spring_final['End_year_spring'] = prj_spring_final['End_year_spring'].apply(year_adjuster)
prj_spring_final.head()


#print(master1_final)

Unnamed: 0,Gender,Sciper,End_year_spring
0,Monsieur,178553,2011.5
1,Monsieur,180731,2011.5
2,Monsieur,199654,2011.5
3,Monsieur,192861,2011.5
4,Monsieur,170235,2011.5


In [137]:
projet_end = pd.merge(prj_fall_final, prj_spring_final,how='outer',on=['Gender','Sciper'])
projet_end.head()

#projet_end.shape
#projet_end.loc[projet_end['End_year_fall']==]



Unnamed: 0,Gender,Sciper,End_year_fall,End_year_spring
0,Monsieur,196669,2011.0,
1,Madame,200065,2011.0,
2,Monsieur,171619,2011.0,
3,Monsieur,200597,2011.0,
4,Monsieur,170239,2011.0,


In [151]:
projet_end_year = projet_end.drop(projet_end.columns[[0,1]],axis=1)
projet_end_year.shape
#print(projet_end_year.head)


tmp_projet_real_max_year = projet_end_year.max(axis=1)
#print(tmp_projet_real_max_year)


final_projet_max_year = pd.merge(projet_end.drop(projet_end.columns[[2,3]],axis=1),tmp_projet_real_max_year.to_frame(),left_index=True,right_index=True)
#a = projet_end.drop(projet_end.columns[[0,1]],axis=1)


final_projet_max_year.columns = ['Gender', 'Sciper', 'End_year_Project']
print(final_projet_max_year)
final_projet_max_year.drop(projet_end.columns[[0,1]],axis=1).describe()


       Gender  Sciper  End_year_Project
0    Monsieur  196669            2011.0
1      Madame  200065            2011.0
2    Monsieur  171619            2011.0
3    Monsieur  200597            2011.0
4    Monsieur  170239            2011.0
5    Monsieur  181017            2011.0
6    Monsieur  200808            2011.0
7    Monsieur  170176            2011.0
8    Monsieur  171195            2011.0
9    Monsieur  185301            2014.0
10     Madame  180027            2008.0
11   Monsieur  159852            2008.0
12   Monsieur  166805            2008.0
13   Monsieur  172264            2008.0
14   Monsieur  202059            2012.0
15   Monsieur  170235            2012.0
16   Monsieur  191471            2010.5
17   Monsieur  146742            2010.5
18   Monsieur  191313            2010.0
19   Monsieur  233184            2015.0
20   Monsieur  233543            2015.0
21     Madame  183512            2009.0
22   Monsieur  160213            2009.5
23     Madame  183605            2009.5


Unnamed: 0,End_year_Project
count,121.0
mean,2013.115702
std,3.0746
min,2007.5
25%,2010.5
50%,2014.0
75%,2016.0
max,2016.0


In [170]:


master_data = pd.merge(master1_final, final_projet_max_year,how='inner',on=['Gender','Sciper'])
master_data.head()


Unnamed: 0,Gender,Sciper,Start_year,End_year_Project
0,Monsieur,210215,2011.0,2013.0
1,Monsieur,206923,2011.0,2012.5
2,Monsieur,166075,2014.0,2016.0
3,Monsieur,194182,2013.0,2016.0
4,Monsieur,213664,2014.0,2016.0


In [176]:
master_duration = (master_data.End_year_Project - master_data.Start_year + 0.5).to_frame()

master_duration_clean = pd.merge(master_data.drop(master_data.columns[[2,3]],axis=1),master_duration,left_index=True,right_index=True)
master_duration_clean.columns = ['Gender', 'Sciper', 'Number_of_year_master']

master_duration_clean.describe()

Unnamed: 0,Number_of_year_master
count,114.0
mean,2.096491
std,0.514726
min,1.5
25%,1.5
50%,2.0
75%,2.5
max,4.0
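The duration formula used above can be checked by hand. As we read the cells, `Start_year` is the first calendar year of the academic year in which a student first appears in Master semestre 1, `End_year_Project` is the same kind of year for the last Projet Master record (plus 0.5 when it is a spring project), and the extra 0.5 counts the project semester itself. The sketch below applies this to three rows of the `master_data` preview; `master_duration_years` is an illustrative name.

```python
def master_duration_years(start_year, end_year_project):
    """Duration in years, counting the final (project) semester as 0.5."""
    return end_year_project - start_year + 0.5


# (Start_year, End_year_Project) pairs copied from the master_data preview.
preview_rows = [(2011.0, 2013.0), (2011.0, 2012.5), (2014.0, 2016.0)]
durations = [master_duration_years(start, end) for start, end in preview_rows]
print(durations)
```

For example, a student who starts in fall 2011 and does a spring project in the 2012-2013 academic year gets 2012.5 - 2011 + 0.5 = 2.0 years, i.e. four semesters.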


In [177]:
master_duration_clean.groupby('Gender').mean()

Unnamed: 0_level_0,Number_of_year_master
Gender,Unnamed: 1_level_1
Madame,2.076923
Monsieur,2.09901


## Specialisation

In [187]:
master_sem1_spec = getResultAllYearsSpec('Master semestre 1')

In [195]:
master_sem2_spec = getResultAllYearsSpec('Master semestre 2')

In [196]:
master_sem3_spec = getResultAllYearsSpec('Master semestre 3')

In [251]:
joined_table_master1_spec = joinTables(master_sem1_spec)
joined_table_master1_spec = joined_table_master1_spec.replace('',np.nan)
#joined_table_master1_spec.dropna(thresh=)
joined_table_master1_spec.shape
sciper_num_master1_spec = joined_table_master1_spec.drop(joined_table_master1_spec.columns[[0,1,2,5,7,9,11,13,15,17,19,21]],axis=1)
sciper_num_master1_spec = sciper_num_master1_spec.dropna(thresh=2)



sciper_num_master1_spec = sciper_num_master1_spec.replace(np.nan,'')
tmp= sciper_num_master1_spec[[1]]


for i in range(0,5) : 
    tmp=tmp+sciper_num_master1_spec[[i]]
    #tmp=tmp.map(str)+ column.map(str)

#sciper_num_master1_spec

tmp

#sciper_num_master1_spec = sciper_num_master1_spec.max(axis=1,skipna=True)
#sciper_num_master1_spec = sciper_num_master1_spec.replace(np.nan,'')
#sciper_num_master1_spec
#sciper_num_master1_spec = sciper_num_master1_spec.drop(sciper_num_master1_spec.columns[[1,2,3,4,5,6,7,8,9,10]],axis#=1)
#sciper_num_master1_spec
#final_projet_max_year = pd.merge(projet_end.drop(projet_end.columns[[2,3]],axis=1),tmp_projet_real_max_year.to_frame(),left_index=True,right_index=True)

Unnamed: 0,Sciper,Spécialisation_x,Spécialisation_y
11,,,
21,,,
24,,,
28,,,
48,,,
53,,,
55,,,
57,,,
63,,,
69,,,
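The empty result above suggests that merging the yearly tables is not a convenient way to collect specialisations, since the repeated 'Spécialisation' column gets `_x`/`_y` suffixes. One possible alternative, sketched below under the assumption that each yearly table has 'Sciper' and 'Spécialisation' columns, is to stack the tables and keep each student's most recent non-empty value. `latest_specialisation` is an illustrative name and the sample data are placeholders.

```python
import pandas as pd


def latest_specialisation(tables_by_year):
    """Return, per Sciper, the most recent non-empty specialisation."""
    frames = [table[['Sciper', 'Spécialisation']].assign(Year=year)
              for year, table in tables_by_year.items()]
    stacked = pd.concat(frames, ignore_index=True)
    # Drop rows where the specialisation is missing or blank.
    has_spec = stacked['Spécialisation'].fillna('').str.strip() != ''
    filled = stacked[has_spec].sort_values('Year')
    return filled.groupby('Sciper')['Spécialisation'].last()


# Placeholder data: 'Spec A' and 'Spec B' stand in for real specialisations.
spec_tables = {
    '2014-2015': pd.DataFrame({'Sciper': ['1', '2'],
                               'Spécialisation': ['', 'Spec A']}),
    '2015-2016': pd.DataFrame({'Sciper': ['1', '2'],
                               'Spécialisation': ['Spec B', 'Spec A']}),
}
spec_by_student = latest_specialisation(spec_tables)
print(spec_by_student)
```

To use it with `getResultAllYearsSpec`, the list it returns would first need to be paired with the academic years, for instance with `dict(zip(year_ids, master_sem1_spec))`.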
