# 05 - Format Personal Information
Starting from the data scraped from `MemberCouncil`, we want to gather all the information relevant to the deputies we're dealing with and store it in one consistently formatted place, so that it is easy to access.

In [2]:
import pandas as pd
import numpy as np
import glob
import os

Load the *Voting* data. It is heavy, but we need it to know which deputies require additional data.

In [4]:
path = '../datas/scrap/Voting'
allFiles = glob.glob(os.path.join(path, 'Session*.csv'))

# Read every session file, then stack them into a single dataframe.
dataset_tmp = [pd.read_csv(file_, index_col=0) for file_ in allFiles]
voting_df = pd.concat(dataset_tmp)
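
The same glob-and-concatenate pattern is used again further down for `MemberCouncil`. A minimal sketch of how it could be wrapped in a helper, assuming a hypothetical function name `load_csv_folder` (it is not part of the original notebook) and the same `index_col=0` convention:

```python
import glob
import os

import pandas as pd


def load_csv_folder(path, pattern):
    """Read every CSV in `path` matching `pattern` and stack them into one DataFrame.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    # Sort the matches so the row order does not depend on the filesystem.
    files = sorted(glob.glob(os.path.join(path, pattern)))
    if not files:
        raise FileNotFoundError(f"no file matching {pattern!r} in {path!r}")
    return pd.concat([pd.read_csv(f, index_col=0) for f in files])
```

With such a helper, the cell above would reduce to `voting_df = load_csv_folder('../datas/scrap/Voting', 'Session*.csv')`.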

We create an array called *names* which contains all the unique names found in the *Voting* dataframe.

In [6]:
voting_df['Name'] = voting_df['FirstName']+' '+voting_df['LastName']

names = voting_df['Name'].unique()

We now load the *MemberCouncil* dataframe, from which we'll extract the information we need on each deputy.

In [15]:
path = '../datas/scrap/MemberCouncil'
allFiles = glob.glob(os.path.join(path, 'MemberCouncilid*.csv'))

# Read every member file, then stack them into a single dataframe.
dataset_tmp = [pd.read_csv(file_, index_col=0) for file_ in allFiles]
member_df = pd.concat(dataset_tmp)

member_df['Name'] = member_df['FirstName']+' '+member_df['LastName']

We see below that quite a lot of information is available. We'll keep only a few fields:
- Active
- CantonName
- Name
- PartyName
- PartyAbbreviation
- DateOfBirth
- ...

But it will be easy to add other fields when needed.

In [14]:
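print(member_df.columns)
# Keep only the deputies that appear in the voting data.
member_df = member_df.loc[member_df['Name'].isin(names)]
# Copy the selected columns so that later assignments do not act on a slice of member_df.
member_df_final = member_df[['Name','Active','CantonName','PartyName','PartyAbbreviation','DateOfBirth']].copy()
# Parse the birth dates and keep the calendar date only.
member_df_final['DateOfBirth'] = pd.to_datetime(member_df_final['DateOfBirth']).dt.date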

Index(['Active', 'AdditionalActivity', 'AdditionalMandate',
       'BirthPlace_Canton', 'BirthPlace_City', 'Canton', 'CantonAbbreviation',
       'CantonName', 'Citizenship', 'Council', 'CouncilAbbreviation',
       'CouncilName', 'DateElection', 'DateJoining', 'DateLeaving', 'DateOath',
       'DateOfBirth', 'DateOfDeath', 'DateResignation', 'FirstName',
       'GenderAsString', 'ID', 'IdPredecessor', 'Language', 'LastName',
       'Mandates', 'MaritalStatus', 'MaritalStatusText', 'MilitaryRank',
       'MilitaryRankText', 'Modified', 'NumberOfChildren', 'OfficialName',
       'ParlGroupAbbreviation', 'ParlGroupFunction', 'ParlGroupFunctionText',
       'ParlGroupName', 'ParlGroupNumber', 'Party', 'PartyAbbreviation',
       'PartyName', 'PersonIdCode', 'PersonNumber'],
      dtype='object')
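
The filtering and date handling in the cell above can be tried on a small frame first. This is only a sketch: the names, cantons and date strings below are made up, and the real `DateOfBirth` strings may use a different format.

```python
import pandas as pd

# Toy data standing in for member_df and names (all values are made up).
toy_members = pd.DataFrame({
    'Name': ['Ada A', 'Bob B', 'Cy C'],
    'CantonName': ['Zurich', 'Vaud', 'Berne'],
    'DateOfBirth': ['1950-01-02T00:00:00', '1960-03-04T00:00:00', '1970-05-06T00:00:00'],
})
toy_names = ['Ada A', 'Cy C']

# Keep only the rows whose name is in the list, and copy to get an independent frame.
kept = toy_members.loc[toy_members['Name'].isin(toy_names), ['Name', 'DateOfBirth']].copy()

# Parse the strings, then keep only the calendar date (datetime.date objects).
kept['DateOfBirth'] = pd.to_datetime(kept['DateOfBirth']).dt.date
```

Taking a `.copy()` before assigning is what keeps pandas from warning about writing to a slice of another frame.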

Now we map the party names to the ones used in our database, in order to avoid clashes in the data as far as possible.

In [69]:
parties_name_dict = {'Alliance verte':'Parti écologiste suisse (Les Verts)', 'Groupe socialiste':'Parti socialiste', 
        "Parti vert'libéral":'Parti vert-libéral', 'PLR.Les Libéraux-Radicaux':'Parti libéral-radical',
        'Union Démocratique du Centre':'Union démocratique du centre',
        'Parti démocrate-chrétien suisse':'Parti démocrate-chrétien', 
        'Parti bourgeois-démocratique suisse':'Parti Bourgeois-Démocratique'}
parties_dict = {'PSS':'PS', 'pvl':'PVL', '-':'Sans Parti', 'nan':'Other'}
   
    
# Build a new frame instead of calling replace(inplace=True) on a column selection,
# which pandas flags with a SettingWithCopyWarning.
member_df_final = member_df_final.assign(
    PartyName=member_df_final['PartyName'].replace(parties_name_dict),
    PartyAbbreviation=member_df_final['PartyAbbreviation'].replace(parties_dict),
    Active=member_df_final['Active'].astype(str),
)
member_df_final.head()


| | Name | Active | CantonName | PartyName | PartyAbbreviation | DateOfBirth |
|---|---|---|---|---|---|---|
| 11 | Max Binder | False | Zurich | Union démocratique du centre | UDC | 1947-11-26 |
| 17 | Christoph Blocher | False | Zurich | Union démocratique du centre | UDC | 1940-10-11 |
| 22 | Roland F. Borer | False | Soleure | Union démocratique du centre | UDC | 1951-01-27 |
| 24 | Toni Bortoluzzi | False | Zurich | Union démocratique du centre | UDC | 1947-02-16 |
| 68 | Christoph Eymann | True | Bâle-Ville | Parti libéral démocrate | PLD | 1951-01-15 |
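
One detail of the mapping step is worth checking on a toy series: the `'nan'` key in `parties_dict` is a string, so it should only match the literal text `'nan'` and not a real missing value. The sketch below uses made-up abbreviations; the `fillna('Other')` step is an assumption about the intended behaviour, not something the original cell does.

```python
import numpy as np
import pandas as pd

# Toy abbreviations, including one real missing value.
abbreviations = pd.Series(['PSS', 'pvl', '-', 'UDC', np.nan])
mapping = {'PSS': 'PS', 'pvl': 'PVL', '-': 'Sans Parti', 'nan': 'Other'}

replaced = abbreviations.replace(mapping)

# Count the values that are still missing after the replacement.
still_missing = replaced.isna().sum()

# Assumption: missing abbreviations should be labelled 'Other'.
filled = replaced.fillna('Other')
```

If `still_missing` is not zero on the real data, an explicit `fillna` along these lines may be what the `'nan'` entry was meant to achieve.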


The list below shows all the parties we consider in a vote. It is more extensive than what appears in the voting fields.

In [64]:
member_df_final.PartyName.unique()

array(['Union démocratique du centre', 'Parti libéral démocrate',
       'Parti socialiste suisse', 'Parti libéral-radical', 'La Gauche',
       'Parti Bourgeois-Démocratique', 'Parti démocrate-chrétien',
       'Parti écologiste suisse', 'Parti évangélique suisse',
       'Lega dei Ticinesi', 'Parti vert-libéral',
       'Parti écologiste suisse (Les Verts)', 'Alternative Canton de Zoug',
       nan, 'sans parti', 'Parti chrétien-social',
       'Union Démocratique Féderale', 'Christlich-soziale Partei Obwalden',
       'Mouvement Citoyens Genevois', 'Grüne (Basels starke Alternative)',
       'Parti Suisse du travail'], dtype=object)

We export each deputy's information to its own JSON file. We have to perform a little trick to turn each row into the dict we want, and then format it properly for export to a `.json` file.

In [95]:
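import json

directory = '../datas/analysis/deputee_names/'
os.makedirs(directory, exist_ok=True)

for deputee in names:
    # to_dict(orient='records') returns a list with one dict per matching row.
    records = member_df_final.loc[member_df_final.Name == deputee].to_dict(orient='records')
    if not records:
        # No MemberCouncil entry for this name: nothing to export.
        continue
    deputee_info = records[0]
    # datetime.date is not JSON serialisable, so store it as an ISO string.
    deputee_info['DateOfBirth'] = deputee_info['DateOfBirth'].isoformat()
    with open(os.path.join(directory, deputee + '_info.json'), 'w') as f:
        json.dump(deputee_info, f)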
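
To illustrate why the date has to be converted before the dump, and what comes back when a file is read again, here is a small round trip. It is a sketch on a made-up record written to a temporary directory, not on the real export folder.

```python
import datetime
import json
import os
import tempfile

# Made-up record with the same fields as the exported files.
record = {
    'Name': 'Ada A',
    'Active': 'True',
    'CantonName': 'Zurich',
    'PartyName': 'Parti socialiste',
    'PartyAbbreviation': 'PS',
    'DateOfBirth': datetime.date(1950, 1, 2),
}

# json cannot serialise datetime.date, hence the ISO string conversion.
record['DateOfBirth'] = record['DateOfBirth'].isoformat()

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, record['Name'] + '_info.json')
with open(out_path, 'w') as f:
    json.dump(record, f)

# Reading it back: the date is a plain string and has to be parsed again.
with open(out_path) as f:
    loaded = json.load(f)
birth = datetime.date.fromisoformat(loaded['DateOfBirth'])
```

Any code that consumes these files has to parse `DateOfBirth` itself, since JSON has no date type.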