# Creating a data frame with the names of Members of the European Parliament (MEPs) and their political party affiliations

The purpose of this notebook is to combine files from [Dataset on Members of the European Parliament (1979-2019)](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/V2FJEF) into one data frame with information about Members of the European Parliament (MEPs) who served during 1996-2011 (the four files used here cover the parliamentary terms from 1994 to 2014).

This data frame is used to match Members of the European Parliament with their political party affiliations, as one of the data preprocessing steps in [data_cleaning_preprocessing.ipynb](data_cleaning_preprocessing.ipynb).

To run this notebook, download the following files from [this link](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/V2FJEF):
* Dataset MEPs EP4 1994-1999.xlsx
* Dataset MEPs EP5 1999-2004.xlsx
* Dataset MEPs EP 6 2004-2009.xlsx
* Dataset MEPs 7EP 2009-2014.xlsx

Create a folder 'mep_data/' in the same folder as this notebook and place the downloaded files there. The notebook is then ready to run.

The notebook produces the file 'MEP_dataframe.csv', which is used in [data_cleaning_preprocessing.ipynb](data_cleaning_preprocessing.ipynb).

## Load files
The cells below import the libraries and build the paths to the four downloaded files in 'mep_data/'.
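Before running the remaining cells, it can help to check that all four files are actually in 'mep_data/'. The following is a minimal sketch; `missing_files` and `expected` are hypothetical names that are not part of the original notebook, and the folder layout is assumed to match the instructions above.

```python
import os

def missing_files(folder, names):
    """Return the expected file names that are not present in `folder`."""
    return [name for name in names if not os.path.isfile(os.path.join(folder, name))]

# the four files listed at the top of this notebook
expected = [
    'Dataset MEPs EP4 1994-1999.xlsx',
    'Dataset MEPs EP5 1999-2004.xlsx',
    'Dataset MEPs EP 6 2004-2009.xlsx',
    'Dataset MEPs 7EP 2009-2014.xlsx',
]

missing = missing_files(os.path.join(os.getcwd(), 'mep_data'), expected)
print('All files found.' if not missing else f'Missing from mep_data/: {missing}')
```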

In [1]:
import pandas as pd
import os
# ! pip install openpyxl  # pd.read_excel needs openpyxl to read .xlsx files

In [2]:
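# ignore warnings to keep this notebook readable; comment these lines out to see them
import warnings
import numpy as np
pd.options.mode.chained_assignment = None  # default='warn'
# np.warnings is not available in recent NumPy versions, so use the standard warnings module;
# newer NumPy versions also moved VisibleDeprecationWarning to np.exceptions
warnings.filterwarnings('ignore', category=getattr(np, 'exceptions', np).VisibleDeprecationWarning)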

In [None]:
path = os.getcwd()
data_dir = os.path.join(path, 'mep_data')
file_names = [os.path.join(data_dir, name) for name in [
    'Dataset MEPs EP4 1994-1999.xlsx',
    'Dataset MEPs EP5 1999-2004.xlsx',
    'Dataset MEPs EP 6 2004-2009.xlsx',
    'Dataset MEPs 7EP 2009-2014.xlsx',
]]
file_names

## Data frames 1994-1999 and 1999-2004
The goal of this notebook is to combine four data frames that have slightly different formats and labels. First, we preprocess each data frame separately.

The code in this section works with the following files:
* mep_data/Dataset MEPs EP4 1994-1999.xlsx
* mep_data/Dataset MEPs EP5 1999-2004.xlsx

### Extract encodings of the data frames
Each of the two .xlsx files contains several Excel sheets. One of these sheets contains the encodings of the political parties, e.g. code A corresponds to both 'Independents for a European of Nations' and 'Europe of Democracies and Diversities'.

Here we load the encodings only from 'Dataset MEPs EP4 1994-1999.xlsx', because they apply to both files processed in this section.

In [4]:
# party encodings from the first file, mep_data/Dataset MEPs EP4 1994-1999.xlsx (sheet_name=2 is the third sheet, counting from 0)
political_parties_df = pd.read_excel(file_names[0], sheet_name = 2)
political_parties_df

|  | EP Group | Abbeviation | Code |
|---|---|---|---|
| 0 | Independents for a European of Nations | I-EDN | A |
| 1 | Europe of Democracies and Diversities | EDD | A |
| 2 | European Democrats | ED | C |
| 3 | European People's Party | PPE | E |
| 4 | European People's Party-European Democrats | PPE-DE | E |
| 5 | Forza Europa | FE | F |
| 6 | Progressive European Democrats | DEP | G |
| 7 | European Democratic Alliance | RDE | G |
| 8 | Union for Europe | UPE | G |
| 9 | Union for a Europe of Nations | UEN | G |


We group the political parties by their codes (A to X) and create a dictionary that maps each code to the list of party names that share it.

In [5]:
political_party_grouped = political_parties_df.groupby(political_parties_df['Code'], as_index = False).agg(lambda x: list(x))
political_party_dic = dict(zip(political_party_grouped['Code'], political_party_grouped['EP Group']))
# political_party_dic
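The grouping step above can be illustrated on a toy frame with the first three rows of the encoding sheet. This sketch uses `groupby('Code')['EP Group'].agg(list)`, which should give the same kind of code-to-list-of-names mapping as the `as_index=False` plus `lambda` version above; `toy` and `code_to_groups` are illustrative names only.

```python
import pandas as pd

# toy copy of the first three rows of the encoding sheet shown above
toy = pd.DataFrame({
    'EP Group': ['Independents for a European of Nations',
                 'Europe of Democracies and Diversities',
                 'European Democrats'],
    'Code': ['A', 'A', 'C'],
})

# one entry per code, holding every group name that shares that code (in sheet order)
code_to_groups = toy.groupby('Code')['EP Group'].agg(list).to_dict()
print(code_to_groups)
```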

One of the Excel sheets contains information about national parties. We extract it as well and create two dictionaries that map each national party code to its name and to its party family.

In [6]:
national_parties_df = pd.read_excel(file_names[0], sheet_name = 1)
# national_parties_df

In [7]:
national_party_name_dic = dict(zip(national_parties_df['Code'], national_parties_df['National Party']))
# national_party_name_dic

In [8]:
national_party_family_dic = dict(zip(national_parties_df['Code'], national_parties_df['Party Family']))
#national_party_family_dic

### Assign a far-right or not far-right label to political parties
Now we assign each political party code (A to X) a label (0 or 1) that tells whether the party is far-right:
* 1 - far-right
* 0 - not far-right

We assign these labels manually based on public information about these political parties.

If you want to see which labels were assigned to which parties, compare far_right_dic and political_party_dic.

In [9]:
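# 1 = far-right, 0 = not far-right, assigned manually per code (compare with political_party_dic);
# an explicit dictionary does not depend on the order in which the codes appear in the sheet
far_right_dic = {'A': 1, 'C': 0, 'E': 0, 'F': 0, 'G': 1, 'L': 0, 'M': 0,
                 'O': 0, 'N': 1, 'R': 0, 'S': 0, 'V': 0, 'X': 1}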
far_right_dic

{'A': 1,
 'C': 0,
 'E': 0,
 'F': 0,
 'G': 1,
 'L': 0,
 'M': 0,
 'O': 0,
 'N': 1,
 'R': 0,
 'S': 0,
 'V': 0,
 'X': 1}
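To compare `far_right_dic` and `political_party_dic` side by side, one option is to pair each code's label with the party names it covers. The helper `label_overview` below is hypothetical, and the two dictionaries are small excerpts copied from the outputs above rather than the full ones; in the notebook you would call `label_overview(far_right_dic, political_party_dic)`.

```python
def label_overview(far_right, names):
    """Map each code to (far-right label, list of party names it stands for)."""
    return {code: (far_right[code], names.get(code, [])) for code in sorted(far_right)}

# small excerpts of far_right_dic and political_party_dic, copied from the outputs above
far_right_excerpt = {'A': 1, 'C': 0}
names_excerpt = {'A': ['Independents for a European of Nations',
                       'Europe of Democracies and Diversities'],
                 'C': ['European Democrats']}

for code, (label, groups) in label_overview(far_right_excerpt, names_excerpt).items():
    print(code, label, '; '.join(groups))
```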

### Transform data frames to new format
We create new data frames that contain names of the European Parliament members, far-right vs not far-right labels, and other information that might be useful for the analysis.
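Two small string operations drive the new columns: the surname is the first whitespace-separated token of names such as 'ADAM Gordon J.', and the term is cut out of the file name with `file_name[-14:-5]`. The sketch below isolates both; the function names are hypothetical, and `years_from_regex` is an alternative we assume might be more robust, because the fixed slice silently returns the wrong text if a file is renamed (for example, to `.xls`).

```python
import re

def surname_first_token(full_name):
    """EP4/EP5 format 'SURNAME First names': keep the first token, lower-cased."""
    return full_name.split(" ")[0].lower()

def years_from_slice(file_name):
    """The notebook's approach: the 9 characters before '.xlsx'."""
    return file_name[-14:-5]

def years_from_regex(file_name):
    """Alternative (an assumption, not used in the notebook): match 'YYYY-YYYY.xlsx' at the end."""
    match = re.search(r'(\d{4}-\d{4})\.xlsx$', file_name)
    return match.group(1) if match else None

print(surname_first_token('WURTH-POLFER Lydie'))                      # wurth-polfer
print(years_from_slice('mep_data/Dataset MEPs EP4 1994-1999.xlsx'))    # 1994-1999
print(years_from_regex('mep_data/Dataset MEPs EP 6 2004-2009.xlsx'))   # 2004-2009
```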

In [10]:
def process_table_1(df, file_name):
    """Convert an EP4/EP5 sheet into the common MEP format."""
    new_table = df[['full_name', 'country_name', 'epg_name', 'national party_id']].copy()
    # names look like 'ADAM Gordon J.': the surname is the first token
    new_table['surname'] = [x.split(" ")[0].lower() for x in new_table['full_name']]
    # the lookups raise KeyError for unknown codes, which exposes data entry mistakes;
    # Series.map keeps the list values of political_party_dic as single cells
    new_table['far_right'] = new_table['epg_name'].map(lambda code: far_right_dic[code])
    new_table['epg_names'] = new_table['epg_name'].map(lambda code: political_party_dic[code])
    new_table['national_party_name'] = new_table['national party_id'].map(lambda i: national_party_name_dic[i])
    new_table['national_party_family'] = new_table['national party_id'].map(lambda i: national_party_family_dic[i])
    # '.../Dataset MEPs EP4 1994-1999.xlsx'[-14:-5] -> '1994-1999'
    new_table['years'] = file_name[-14:-5]
    new_table = new_table.rename(columns={'epg_name': 'epg_code',
                                          'national party_id': 'national_party_id'})
    return new_table[['full_name', 'surname', 'far_right', 'years', 'country_name', 'epg_code',
                      'epg_names', 'national_party_id', 'national_party_name', 'national_party_family']]

First, we process the 'Dataset MEPs EP4 1994-1999.xlsx' file.

In [11]:
# read the file
df_1994 = pd.read_excel(file_names[0], sheet_name = 0)

# manually correct data entry mistakes
## Dagmar Reichenbach
df_1994.loc[568, 'epg_name'] = 'S'  # '\xa0\xa0\xa0\xa0\xa0S&D' -> S
df_1994.loc[568, 'national party_id'] = 2407 #'\xa0\xa0\xa0\xa0\xa0Social Democratic Party' -> 2407
## Wagenknecht Sahra
df_1994.loc[719, 'epg_name'] = 'O'  # 'EUL/NGL' -> O
df_1994.loc[719, 'national party_id'] = 1205 # 31 -> 1205

# transform to new format
df_1994 = process_table_1(df_1994, file_names[0])
df_1994

|  | full_name | surname | far_right | years | country_name | epg_code | epg_names | national_party_id | national_party_name | national_party_family |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ADAM Gordon J. | adam | 0 | 1994-1999 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 1 | AELVOET Magda G.H. | aelvoet | 0 | 1994-1999 | Belgium | V | [Green Group, Greens/European Free Alliance] | 1101 | Anders gaan arbeiden, leven en vrijen | Grn |
| 2 | AGLIETTA Maria Adelaide | aglietta | 0 | 1994-1999 | Italy | V | [Green Group, Greens/European Free Alliance] | 1609 | Verdi Arcobaleno / Federazione dei Verdi / Ver... | Grn |
| 3 | AHERN Nuala | ahern | 0 | 1994-1999 | Ireland | V | [Green Group, Greens/European Free Alliance] | 2203 | Green Party | Grn |
| 4 | AHLQVIST Birgitta | ahlqvist | 0 | 1994-1999 | Sweden | S | [Socialist Group, Party of European Socialists] | 2306 | Socialdemokratiska arbetarepartiet | Soc |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 737 | WURTH-POLFER Lydie | wurth-polfer | 0 | 1994-1999 | Luxembourg | L | [Liberal and Democratic Group, Liberal Democra... | 1802 | Parti démcratique | Lib |
| 738 | WURTZ Francis | wurtz | 0 | 1994-1999 | France | M | [Communist Group, European United Left/Nordic ... | 1409 | Parti communiste française / Gauche unitaire /... | Left |
| 739 | WYNN Terence | wynn | 0 | 1994-1999 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 740 | ZIMMERMANN Wilmya | zimmermann | 0 | 1994-1999 | Germany | S | [Socialist Group, Party of European Socialists] | 1207 | Sozialdemokratische Partei Deutschlands | Soc |
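The manual corrections above were needed because the dictionary lookups raise `KeyError` on the first code that is not in the encoding sheet. Below is a sketch of a hypothetical helper, `unknown_values`, that lists every unmatched value together with its row label in one pass, so all such mistakes can be found before fixing them. The toy column only mimics the raw values mentioned in the comments above; on the real data, a call such as `unknown_values(df_1994.index, df_1994['epg_name'], far_right_dic)` on the freshly read sheet should list the candidates for manual fixes.

```python
def unknown_values(labels, values, mapping):
    """Return (row label, value) pairs for every value that `mapping` has no entry for."""
    return [(label, value) for label, value in zip(labels, values) if value not in mapping]

# toy column mimicking the raw 'epg_name' problems mentioned in the comments above
raw_codes = ['S', 'V', '\xa0\xa0\xa0\xa0\xa0S&D', 'EUL/NGL', 'E']
codes = {'E': 0, 'S': 0, 'V': 0}  # small excerpt of far_right_dic

print(unknown_values(range(len(raw_codes)), raw_codes, codes))
```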


Then, we process the 'Dataset MEPs EP5 1999-2004.xlsx' file.

In [12]:
# read the file 
df_1999 = pd.read_excel(file_names[1], sheet_name = 0)

# transform the data frame
df_1999 = process_table_1(df_1999, file_names[1])
df_1999

|  | full_name | surname | far_right | years | country_name | epg_code | epg_names | national_party_id | national_party_name | national_party_family |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BERGER Maria | berger | 0 | 1999-2004 | Austria | S | [Socialist Group, Party of European Socialists] | 1005 | Sozialdemokratische Partei Österreichs | Soc |
| 1 | BÖSCH Herbert | bösch | 0 | 1999-2004 | Austria | S | [Socialist Group, Party of European Socialists] | 1005 | Sozialdemokratische Partei Österreichs | Soc |
| 2 | ECHERER Raina A. Mercedes | echerer | 0 | 1999-2004 | Austria | V | [Green Group, Greens/European Free Alliance] | 1002 | Die Grünen – Die Grüne Alternative | Grn |
| 3 | ETTL Harald | ettl | 0 | 1999-2004 | Austria | S | [Socialist Group, Party of European Socialists] | 1005 | Sozialdemokratische Partei Österreichs | Soc |
| 4 | FLEMMING Marialiese | flemming | 0 | 1999-2004 | Austria | E | [European People's Party, European People's Pa... | 1004 | Österreichische Volkspartei | CDem |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 687 | WYNN Terence | wynn | 0 | 1999-2004 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 688 | DONNELLY Alan John | donnelly | 0 | 1999-2004 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 689 | GREEN Pauline | green | 0 | 1999-2004 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 690 | BOOTH Graham H. | booth | 1 | 1999-2004 | U.K. | A | [Independents for a European of Nations, Europ... | 2409 | United Kingdom Independence Party | Anti-EU |


## Data frame 2004-2009
Now we repeat the steps above for the 'Dataset MEPs EP 6 2004-2009.xlsx' file, which has a slightly different format and uses different encodings.

### Extract encodings of the data frame

In [13]:
national_party_2004 = pd.read_excel(file_names[2], sheet_name = 1)
national_party_dic_2004 = dict(zip(national_party_2004['ID'], national_party_2004['Name']))
#national_party_dic_2004

### Assign a far-right or not far-right label to political parties
Now we assign each political party code (SOC, S&D, etc.) a label (0 or 1) that tells whether the party is far-right:

* 1 - far-right
* 0 - not far-right

We assign these labels manually based on public information about these political parties.

In [14]:
df_2004 = pd.read_excel(file_names[2], sheet_name = 0)
df_2004

|  | nr | full_name | country_name | epg_name | id | national party_id | Newcomer/Amateur | Rielected in the EP | Rielected in the EP after interval | Number of past legislatures in the EP (not conisdering the 2004-2009) | No political experience/amateur | National level | Code for previous national political roles | Local level | Code for previous local political roles | Gender | Role in the EP | Role in the party group | notes | Report |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | BERGER, Maria | Austria | SOC | 2285.0 | 206 | n | y |  | 2.0 |  |  |  |  |  | f |  |  |  | 5 |
| 1 |  | BÖSCH, Herbert | Austria | SOC | 2048.0 | 206 | n | y |  | 2.0 |  | MP | 1.0 |  |  | m | committee chair |  |  | 5 |
| 2 |  | ETTL, Harald | Austria | SOC | 2286.0 | 206 | n | y |  | 2.0 |  | senior and/or junior minister | 3.0 |  |  | m |  |  |  | 6 |
| 3 |  | KARAS, Othmar | Austria | EPP-ED | 4246.0 | 207 | n | y |  | 1.0 |  | member of national party leadership | 4.0 |  |  | m |  |  |  | 6 |
| 4 |  | LEICHTFRIED, Jörg | Austria | SOC | 28251.0 | 206 | y |  |  |  |  | member of national party leadership |  |  | 10.0 | m |  |  |  | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 908 |  | WILLMOTT, Glenis | UK | SOC | 35743.0 | 57 | y |  |  |  | y |  |  |  |  | f |  |  |  | 0 |
| 909 |  | SIMPSON, Brian | UK | SOC | 1309.0 | 57 | n | y |  | 3.0 |  |  |  |  |  | m |  |  |  | 0 |
| 910 |  | COLMAN, Trevor | UK | IND/DEM | 94283.0 | 66 | y |  |  |  |  | member of national party leadership | 4.0 |  |  | m |  |  |  | 0 |
| 911 |  | VILLIERS, Theresa | UK | EPP-ED | 4520.0 | 54 | n | y |  | 1.0 |  |  |  |  |  | f |  |  |  | 0 |


In [15]:
# df_2004['epg_name'].unique()  # uncomment to inspect the raw group codes
# manually correct data entry mistakes
df_2004.loc[307, 'epg_name'] = 'S&D'  # '\xa0\xa0\xa0\xa0\xa0S&D' -> 'S&D'
df_2004.loc[535, 'epg_name'] = 'na'   # NaN -> 'na'
df_2004.loc[806, 'epg_name'] = 'na'   # NaN -> 'na'

far_right_dic_2004 = { 'SOC' : 0,
                      'S&D' : 0,
                      'EPP-ED' : 0,
                      'na' : 0,
                      'ALDE' : 0,
                      'G/EFA' : 0,
                      'EUL/NGL' : 0,
                      'IND/DEM' : 1,
                      'UEN' : 1}
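The hard-coded fixes above depend on the row labels 307, 535 and 806. A possible alternative, assuming that stray whitespace and missing values are the only problems in this column, is to clean the whole column at once: Python's `str.strip()` also removes non-breaking spaces (`'\xa0'`), and `fillna` replaces missing codes. `clean_group_codes` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

def clean_group_codes(codes):
    """Strip surrounding whitespace (including non-breaking spaces) and turn missing codes into 'na'."""
    return codes.str.strip().fillna('na')

# toy column with the kinds of problems fixed by hand above
raw = pd.Series(['SOC', '\xa0\xa0\xa0\xa0\xa0S&D', np.nan, 'UEN'])
print(clean_group_codes(raw).tolist())  # ['SOC', 'S&D', 'na', 'UEN']
```

Codes that are wrong rather than padded (for example, a group name entered instead of a code) would still need a manual fix, so a check for unknown codes remains useful after this step.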


In [16]:
# SOC is not explained in the file; its members here include SPÖ and Labour Party MEPs,
# so we assume it is a socialist group (not far-right) and keep the raw code as its name
party_names_dic_2004 = {'SOC' : "SOC",
                      'S&D' : "The Progressive Alliance of Socialists and Democrats",
                      'EPP-ED' : "European People's Party Group and European Democrats",
                      'na' : 'NA',
                      'ALDE' : "Alliance of Liberals and Democrats for Europe Party",
                      'G/EFA' : "The Greens/European Free Alliance",
                      'EUL/NGL' : "The Left in the European Parliament",
                      'IND/DEM' : "Independence/Democracy",
                      'UEN' : "Union for Europe of the Nations"}

### Transform the data frame to new format
We create a new data frame that contains names of the European Parliament members, far-right vs not far-right labels, and other information that might be useful for the analysis.
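The 2004-2009 file stores names as 'SURNAME, First names'. Removing the comma brings `full_name` into the same 'SURNAME First names' form as the 1994-2004 files, and the part before the comma gives the surname. The sketch below shows both steps on one name from the output above; `split_comma_name` is a hypothetical helper. Since the surnames keep their diacritics, `strip_diacritics` is an optional extra step, assumed here for matching against text written without accents; the notebook itself does not do this.

```python
import unicodedata

def split_comma_name(raw_name):
    """EP6 format 'SURNAME, First names' -> (full name without comma, lower-case surname)."""
    surname = raw_name.split(",")[0].lower()   # keeps diacritics, e.g. 'bösch'
    full_name = raw_name.replace(",", "")       # 'BÖSCH, Herbert' -> 'BÖSCH Herbert'
    return full_name, surname

def strip_diacritics(text):
    """Optional (not used in the notebook): decompose characters and drop combining accents."""
    return ''.join(c for c in unicodedata.normalize('NFKD', text) if not unicodedata.combining(c))

print(split_comma_name("BÖSCH, Herbert"))  # ('BÖSCH Herbert', 'bösch')
print(strip_diacritics('bösch'))           # bosch
```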


In [17]:
df_2004 = df_2004[['full_name', 'country_name', 'epg_name', 'national party_id']].copy()
# names look like 'BERGER, Maria': the surname is the part before the comma (diacritics are kept)
df_2004['surname'] = [x.split(",")[0].lower() for x in df_2004['full_name']]
df_2004['full_name'] = [x.replace(',', '') for x in df_2004['full_name']]
df_2004['far_right'] = [far_right_dic_2004[x] for x in df_2004['epg_name']]
df_2004['years'] = file_names[2][-14:-5]   # '2004-2009'

# manually correct a data entry mistake (Dagmar Reichenbach):
# '\xa0\xa0\xa0\xa0\xa0Social Democratic Party' -> 151 ('Socialdemokratiet')
df_2004.loc[307, 'national party_id'] = 151

df_2004['national_party_name'] = [national_party_dic_2004[x] for x in df_2004['national party_id']]
df_2004['epg_names'] = [party_names_dic_2004[x] for x in df_2004['epg_name']]
df_2004 = df_2004.rename(columns={'epg_name': 'epg_code',
                                  'national party_id': 'national_party_id'})
df_2004

|  | full_name | country_name | epg_code | national_party_id | surname | far_right | years | national_party_name | epg_names |
|---|---|---|---|---|---|---|---|---|---|
| 0 | BERGER Maria | Austria | SOC | 206 | berger | 0 | 2004-2009 | Sozialdemokratische Partei Österreichs | SOC |
| 1 | BÖSCH Herbert | Austria | SOC | 206 | bösch | 0 | 2004-2009 | Sozialdemokratische Partei Österreichs | SOC |
| 2 | ETTL Harald | Austria | SOC | 206 | ettl | 0 | 2004-2009 | Sozialdemokratische Partei Österreichs | SOC |
| 3 | KARAS Othmar | Austria | EPP-ED | 207 | karas | 0 | 2004-2009 | Österreichische Volkspartei - Liste Ursula Ste... | European People's Party Group and European Dem... |
| 4 | LEICHTFRIED Jörg | Austria | SOC | 206 | leichtfried | 0 | 2004-2009 | Sozialdemokratische Partei Österreichs | SOC |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 908 | WILLMOTT Glenis | UK | SOC | 57 | willmott | 0 | 2004-2009 | Labour Party | SOC |
| 909 | SIMPSON Brian | UK | SOC | 57 | simpson | 0 | 2004-2009 | Labour Party | SOC |
| 910 | COLMAN Trevor | UK | IND/DEM | 66 | colman | 1 | 2004-2009 | UK Independence Party | Independence/Democracy |
| 911 | VILLIERS Theresa | UK | EPP-ED | 54 | villiers | 0 | 2004-2009 | Conservative and Unionist Party | European People's Party Group and European Dem... |


## Data frame 2009-2014
Now we repeat the steps above for the 'Dataset MEPs 7EP 2009-2014.xlsx' file, which also has a slightly different format and uses different encodings.

### Assign a far-right or not far-right label to political parties
We assign each political party code (NI, S&D, etc.) a label (0 or 1) that tells whether the party is far-right:

* 1 - far-right
* 0 - not far-right

We assign these labels manually based on public information about these political parties.

In [18]:
df_2009 = pd.read_excel(file_names[3], sheet_name = 0)
df_2009

|  | fullName | country | politicalGroup | national Party | Newcomer in the EP | Rielected in the EP | Rielected in the EP after interval | Number of past legislatures in the EP (not conisdering the 2009-2014) | No political experience/Amateur | National level | Code for previous national political roles | Local level | ode for previous local politcal roles | gender | Role in the EP | Role in the party group | Reports | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Martin Ehrenhauser | Austria | NI | Hans-Peter Martin's List | y |  |  |  | y |  |  |  |  | m |  |  | 2.0 |  |
| 1 | Karin Kadenbach | Austria | S&D | Social Democratic Party | y |  |  |  |  |  |  | Member of regional parliament | 5.0 | f |  |  | 1.0 |  |
| 2 | Othmar Karas | Austria | EPP | People's Party | n | y |  | 2.0 |  | member of national party leadership | 4.0 |  |  | m | vice president | vice chair | 5.0 |  |
| 3 | Elisabeth Köstinger | Austria | EPP | People's Party | y |  |  |  |  |  |  | member of regional party leadership |  | f |  | member of the bureau | 2.0 |  |
| 4 | Jörg Leichtfried | Austria | S&D | Social Democratic Party | N | Y |  | 1.0 |  |  |  | member of regional party leadership | 10.0 | m |  | vice chair | 8.0 |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 811 | Jean Roatta | France | EPP | Union for a Popular Movement | y |  |  |  |  | MP | 1.0 |  |  | m |  |  | 0.0 |  |
| 812 | Yves Cochet | France | G–EFA | Europe Ecology | y |  |  |  |  | MP | 1.0 |  |  | m |  |  | 0.0 |  |
| 813 | Nils TORVALDS | Finland | ALDE | Svenska folkpartiet | Y |  |  |  |  | member of national party leadership |  | council member | 9.0 | M |  |  | 1.0 |  |
| 814 | Isabelle THOMAS | France | S&D | Parti socialiste | Y |  |  |  |  | member of national party leadership | 4.0 | Member of regional parliament | 5.0 | F |  |  | 2.0 |  |


In [19]:
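# df_2009['politicalGroup'].unique()  # uncomment to inspect the raw group codes

# some codes occur both with an en dash ('G–EFA') and with a hyphen ('G-EFA'), so both spellings are listed
far_right_dic_2009 = {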
    'NI' : 0,
    'S&D' : 0, 
    'EPP' : 0,
    'G–EFA' : 0,
    'ECR' : 0,
    'ALDE' : 0,
    'NI/EFD' : 1,
    'EUL–NGL' : 0,
    'EFD' : 1,
    'EUL-NGL' : 0,
    'EUL/NGL' : 0,
    'G-EFA' : 0,
    'EPP-ED' : 0}

In [20]:
political_party_names_2009 = {
    'NI' : "Non-Inscrits",
    'S&D' : "The Progressive Alliance of Socialists and Democrats", 
    'EPP' : "European People's Party Group",
    'G–EFA' : "The Greens/European Free Alliance",
    'ECR' : "The European Conservatives and Reformists Group",
    'ALDE' : "Alliance of Liberals and Democrats for Europe Party",
    'NI/EFD' : "Non-Inscrits / Europe of Freedom and Democracy",   ## only Frank Vanhecke
    'EUL–NGL' : "The Left in the European Parliament",
    'EFD' : "Europe of Freedom and Democracy",
    'EUL-NGL' : "The Left in the European Parliament",
    'EUL/NGL' : "The Left in the European Parliament",
    'G-EFA' : "The Greens/European Free Alliance",
    'EPP-ED' : "European People's Party Group / European Democrats"}
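Both dictionaries above list some codes twice, once with an en dash (–) and once with a hyphen (-). A possible simplification, sketched below under the assumption that the two spellings always denote the same group, is to normalize the codes before the lookup so that each group needs only one key. `normalize_group_code` is hypothetical; slashes are left alone because they also separate two codes, as in 'NI/EFD'.

```python
def normalize_group_code(code):
    """Replace the en dash (U+2013) used in some raw codes with a plain hyphen."""
    return code.replace('\u2013', '-')

# excerpt of far_right_dic_2009 with both spellings
raw = {'G\u2013EFA': 0, 'G-EFA': 0, 'EUL\u2013NGL': 0, 'EUL-NGL': 0, 'EFD': 1}
normalized = {normalize_group_code(code): label for code, label in raw.items()}
print(normalized)  # {'G-EFA': 0, 'EUL-NGL': 0, 'EFD': 1}
```

In the notebook this would mean applying the same function to `df_2009['politicalGroup']` before the dictionary lookups.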

### Transform the data frame to new format
We create a new data frame that contains names of the European Parliament members, far-right vs not far-right labels, and other information that might be useful for the analysis.
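In this file names are written 'First Last', so the code takes the second whitespace-separated token as the surname. The sketch below contrasts that rule with a last-token rule on two names from the output above and one hypothetical three-part name. Neither rule handles every name (for example, surnames made of several words), so rows whose names have more than two tokens may be worth checking by hand; both function names are hypothetical.

```python
def surname_second_token(full_name):
    """The notebook's rule for 'First Last' names: keep the second token, lower-cased."""
    return full_name.split(" ")[1].lower()

def surname_last_token(full_name):
    """Alternative (an assumption, not used in the notebook): keep the last token instead."""
    return full_name.split()[-1].lower()

# the first two names appear in the output above; 'Anna Maria Example' is hypothetical
for name in ['Martin Ehrenhauser', 'Nils TORVALDS', 'Anna Maria Example']:
    print(name, '->', surname_second_token(name), '/', surname_last_token(name))
```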

In [21]:
df_2009 = df_2009[['fullName', 'country', 'politicalGroup', 'national Party']].copy()
# names look like 'Martin Ehrenhauser': the second whitespace-separated token is taken as the surname
df_2009['surname'] = [x.split(" ")[1].lower() for x in df_2009['fullName']]
df_2009['far_right'] = [far_right_dic_2009[x] for x in df_2009['politicalGroup']]
df_2009['years'] = file_names[3][-14:-5]   # '2009-2014'
df_2009['epg_names'] = [political_party_names_2009[x] for x in df_2009['politicalGroup']]
df_2009 = df_2009.rename(columns={'country': 'country_name',
                                  'fullName': 'full_name',
                                  'politicalGroup': 'epg_code',
                                  'national Party': 'national_party_name'})
df_2009

|  | full_name | country_name | epg_code | national_party_name | surname | far_right | years | epg_names |
|---|---|---|---|---|---|---|---|---|
| 0 | Martin Ehrenhauser | Austria | NI | Hans-Peter Martin's List | ehrenhauser | 0 | 2009-2014 | Non-Inscrits |
| 1 | Karin Kadenbach | Austria | S&D | Social Democratic Party | kadenbach | 0 | 2009-2014 | The Progressive Alliance of Socialists and Dem... |
| 2 | Othmar Karas | Austria | EPP | People's Party | karas | 0 | 2009-2014 | European People's Party Group |
| 3 | Elisabeth Köstinger | Austria | EPP | People's Party | köstinger | 0 | 2009-2014 | European People's Party Group |
| 4 | Jörg Leichtfried | Austria | S&D | Social Democratic Party | leichtfried | 0 | 2009-2014 | The Progressive Alliance of Socialists and Dem... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 811 | Jean Roatta | France | EPP | Union for a Popular Movement | roatta | 0 | 2009-2014 | European People's Party Group |
| 812 | Yves Cochet | France | G–EFA | Europe Ecology | cochet | 0 | 2009-2014 | The Greens/European Free Alliance |
| 813 | Nils TORVALDS | Finland | ALDE | Svenska folkpartiet | torvalds | 0 | 2009-2014 | Alliance of Liberals and Democrats for Europe ... |
| 814 | Isabelle THOMAS | France | S&D | Parti socialiste | thomas | 0 | 2009-2014 | The Progressive Alliance of Socialists and Dem... |


## Concatenate data frames

In [22]:
concatenated = pd.concat([df_1994, df_1999, df_2004, df_2009])
concatenated.reset_index(drop = True, inplace = True)
concatenated

|  | full_name | surname | far_right | years | country_name | epg_code | epg_names | national_party_id | national_party_name | national_party_family |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ADAM Gordon J. | adam | 0 | 1994-1999 | U.K. | S | [Socialist Group, Party of European Socialists] | 2404 | Labour Party | Soc |
| 1 | AELVOET Magda G.H. | aelvoet | 0 | 1994-1999 | Belgium | V | [Green Group, Greens/European Free Alliance] | 1101 | Anders gaan arbeiden, leven en vrijen | Grn |
| 2 | AGLIETTA Maria Adelaide | aglietta | 0 | 1994-1999 | Italy | V | [Green Group, Greens/European Free Alliance] | 1609 | Verdi Arcobaleno / Federazione dei Verdi / Ver... | Grn |
| 3 | AHERN Nuala | ahern | 0 | 1994-1999 | Ireland | V | [Green Group, Greens/European Free Alliance] | 2203 | Green Party | Grn |
| 4 | AHLQVIST Birgitta | ahlqvist | 0 | 1994-1999 | Sweden | S | [Socialist Group, Party of European Socialists] | 2306 | Socialdemokratiska arbetarepartiet | Soc |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3158 | Jean Roatta | roatta | 0 | 2009-2014 | France | EPP | European People's Party Group |  | Union for a Popular Movement |  |
| 3159 | Yves Cochet | cochet | 0 | 2009-2014 | France | G–EFA | The Greens/European Free Alliance |  | Europe Ecology |  |
| 3160 | Nils TORVALDS | torvalds | 0 | 2009-2014 | Finland | ALDE | Alliance of Liberals and Democrats for Europe ... |  | Svenska folkpartiet |  |
| 3161 | Isabelle THOMAS | thomas | 0 | 2009-2014 | France | S&D | The Progressive Alliance of Socialists and Dem... |  | Parti socialiste |  |
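As the last rows of the output show, `national_party_id` and `national_party_family` are empty for the 2009-2014 rows. This is how `pd.concat` behaves by default (`join='outer'`): it aligns columns by name and fills the columns that one frame lacks with `NaN`. A toy illustration with two one-row frames whose values are copied from the outputs above; `older`, `newer` and `combined` are illustrative names only.

```python
import pandas as pd

# toy rows: the 1994-2009 frames have national_party_id, the 2009-2014 frame does not
older = pd.DataFrame({'full_name': ['ADAM Gordon J.'], 'national_party_id': [2404]})
newer = pd.DataFrame({'full_name': ['Jean Roatta'],
                      'national_party_name': ['Union for a Popular Movement']})

# default join='outer': the result has the union of the columns, missing cells become NaN
combined = pd.concat([older, newer]).reset_index(drop=True)
print(combined)
```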


Check how many far-right entries are in the data frame (an MEP who served several terms appears once per term).

In [23]:
# concatenated[concatenated['far_right'] == 1]  # 311 rows

Check which names appear more than once in the data frame.

In [24]:
#concatenated['full_name'].value_counts()
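A sketch of the two checks on a toy excerpt. Exact `full_name` matches only find repeats with a consistent spelling: the same MEP appears as 'KARAS Othmar' in 2004-2009 and as 'Othmar Karas' in 2009-2014, and country names also differ between files ('U.K.' vs 'UK'). Counting by `surname` and `country_name` is one assumed alternative; it can also merge different people who share a surname and country, so its result should be checked by hand.

```python
import pandas as pd

# toy excerpt built from rows shown in the outputs above
toy = pd.DataFrame({
    'full_name': ['WYNN Terence', 'WYNN Terence', 'BOOTH Graham H.', 'KARAS Othmar', 'Othmar Karas'],
    'surname': ['wynn', 'wynn', 'booth', 'karas', 'karas'],
    'country_name': ['U.K.', 'U.K.', 'U.K.', 'Austria', 'Austria'],
    'far_right': [0, 0, 1, 0, 0],
})

# number of far-right rows (an MEP serving several terms is counted once per term)
far_right_rows = int((toy['far_right'] == 1).sum())

# exact name repeats: finds WYNN Terence, but not the two spellings of Karas
name_counts = toy['full_name'].value_counts()

# assumed alternative key that ignores the name format
key_counts = toy.groupby(['surname', 'country_name']).size()

print(far_right_rows)
print(name_counts)
print(key_counts)
```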

Save the final data frame to 'MEP_dataframe.csv'. You can use this file in [data_cleaning_preprocessing.ipynb](data_cleaning_preprocessing.ipynb).

In [25]:
concatenated.to_csv(path_or_buf = os.path.join(path, 'MEP_dataframe.csv'))
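`to_csv` writes the index by default, so when reading the file back, `index_col=0` avoids an extra 'Unnamed: 0' column. Also, the `epg_names` cells of the 1994-2004 rows hold Python lists, which the CSV stores as their text form, while the later rows hold plain names. The sketch below round-trips a toy frame through an in-memory CSV and restores the lists with `ast.literal_eval`; `parse_group_names` is a hypothetical helper, and data_cleaning_preprocessing.ipynb may handle this differently.

```python
import ast
import io
import pandas as pd

def parse_group_names(value):
    """Turn "['A', 'B']" back into a list; leave plain names such as 'SOC' unchanged."""
    if isinstance(value, str) and value.startswith('['):
        return ast.literal_eval(value)
    return value

# toy frame: a list-valued cell (1994-2004 style) and a plain name (2004-2009 style)
df = pd.DataFrame({'epg_names': [['Socialist Group', 'Party of European Socialists'], 'SOC']})

buffer = io.StringIO()
df.to_csv(buffer)   # index=True by default, so the index becomes the first column
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=0)   # no extra 'Unnamed: 0' column
restored['epg_names'] = restored['epg_names'].apply(parse_group_names)
print(restored.loc[0, 'epg_names'])   # ['Socialist Group', 'Party of European Socialists']
```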