# Person Analysis
**Aims :**

1. We want to see how a person behaves at each vote: create a DataFrame that records whether the person voted like the average of their party (it would also be useful to have the party's official voting instruction for each vote), and whether the person was present.
2. We want to compute some global statistics about each person:
    - Percentage of votes in which the person votes like their party
    - Percentage of abstentions
    - Percentage of absences (open question: count the days or the individual votes the person missed?)
    - Date at which the person entered parliament.
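A minimal sketch of the first statistic, on a made-up toy table. It assumes that the party line can be approximated by the most frequent Yes/No decision within the group at each vote (no official voting instruction is available here). The column names mirror the Voting data (`Name`, `ParlGroupCode`, `IdVote`, `Decision`), and the helper `party_line_agreement` is hypothetical.

```python
import pandas as pd

def party_line_agreement(df):
    """Share of Yes/No votes in which each person matches the majority of their group."""
    # Keep only expressed votes (assumed coding: 1 = Yes, 2 = No)
    cast = df[df['Decision'].isin([1, 2])].copy()
    # Majority decision of each group at each vote (ties resolved by the smallest code)
    majority = (cast.groupby(['IdVote', 'ParlGroupCode'])['Decision']
                    .agg(lambda s: s.mode().iloc[0]))
    majority = majority.rename('PartyDecision').reset_index()
    cast = cast.merge(majority, on=['IdVote', 'ParlGroupCode'])
    cast['WithParty'] = cast['Decision'] == cast['PartyDecision']
    return cast.groupby('Name')['WithParty'].mean()

# Toy data: three members of one group, two votes
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'A', 'B', 'C'],
    'ParlGroupCode': ['PS'] * 6,
    'IdVote': [1, 1, 1, 2, 2, 2],
    'Decision': [1, 1, 2, 2, 2, 2],
})
agreement = party_line_agreement(toy)
print(agreement)
```

Abstentions and absences are left out of the denominator here; whether they should count as deviating from the party is a design choice still to be made.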
    
    
To do so, we need a few things :

1. The *Vote* file that we parsed, which records all the votes taking place in parliament. We need to find the earliest vote in which each person took part, then count all the votes held from that point on, and count the votes the person did not attend.
2. The *Voting* file is our primary source of information: it records the vote of every person in parliament. We need to split it by person, which was already done beforehand in the `01-ML` folder.
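The steps in item 1 could be sketched as follows, on made-up toy tables. The sketch assumes that `IdVote` in the Voting data refers to `ID` in the Vote data, and that codes 5 and 6 of `Decision` mark votes the person did not attend; all values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the Vote table (one row per vote) and the Voting table
# (one row per person and vote)
votes = pd.DataFrame({
    'ID': [10, 11, 12, 13],
    'VoteEnd': pd.to_datetime(['2010-03-01', '2010-06-01', '2011-03-01', '2011-06-01']),
})
voting = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'IdVote': [10, 11, 12, 13, 12, 13],
    'Decision': [1, 5, 2, 6, 1, 1],
})

# Attach the date of each vote, then take the earliest one per person
dated = voting.merge(votes, left_on='IdVote', right_on='ID')
first_vote = dated.groupby('Name')['VoteEnd'].min()

# Number of votes held in parliament since each person's first vote
held_since = first_vote.apply(lambda d: (votes['VoteEnd'] >= d).sum())

# Votes the person did not attend (assumed codes 5 and 6)
missed = dated[dated['Decision'].isin([5, 6])].groupby('Name').size()
missed = missed.reindex(first_vote.index, fill_value=0)

summary = pd.DataFrame({'FirstVote': first_vote, 'Held': held_since, 'Missed': missed})
summary['AbsenceRate'] = summary['Missed'] / summary['Held']
print(summary)
```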

**Ideal visualisation:**
When we open the page about someone, show some general information (maybe from Gaël's team), then a graph with the votes (Q: what to display?)

-> Idea: a vertical bar whose height corresponds to the number of laws the deputy voted on. Inside the bar, show 5 sectors: percentage of yes/no/abstention/absent/president for each period. Clicking on it opens a detailed view of a single period: the same kind of graph, except that each law shows how the deputy voted (one color each for yes/no/abstention, another color for absent), plus a color for whether the law was passed or not.

Maybe use [stacked bar chart](http://bl.ocks.org/mbostock/3886208) for the overall view (shift it so the bars are horizontal) or alternatively [grouped to stacked](https://bl.ocks.org/mbostock/3943967)


When you click on a session, maybe display something like a [Table with bar chart](http://bl.ocks.org/llad/3918637) on a separate page, with either the same chart again or a [grouped bar chart](http://bl.ocks.org/mbostock/3887051). Alternatively, we could use a mix between a [collapsed tree](http://bl.ocks.org/mbostock/1093025) and a [dendrogram with bars](https://bl.ocks.org/dahis39/f28369f0b17b456ac2f1fa9b937c5002).

In [None]:
import pandas as pd
import glob
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload 2

# There are a lot of columns in the DataFrames,
# so we raise this option to display more of them
pd.options.display.max_columns = 100

# 1. Loading and formatting the Vote DataFrame
Converting *date* into *datetime* format with `pd.to_datetime` allows us to sort the votes chronologically.

- *Attributes of datetime object:* year, month, day, hour, minute, second, microsecond, and tzinfo.
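As a small illustration of the conversion and of the attributes listed above, here is a sketch on made-up date strings (not taken from the Vote file):

```python
import pandas as pd

dates = pd.Series(['2012-03-15 10:30:00', '2009-12-01 08:00:00', '2012-01-20 17:45:00'])
parsed = pd.to_datetime(dates)

# Once the values are real timestamps, sorting is chronological
print(parsed.sort_values().tolist())
# The .dt accessor exposes year, month, day, hour, ... for a whole column
print(parsed.dt.year.tolist())
```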

A few useful German words that were unknown to me:

*Antrag*: motion (request)


In [None]:
vote_df = pd.read_csv('../datas/scrap/Vote/legiid_37-50.csv', index_col=0)
print('Entries in the DataFrame', vote_df.shape)
# Keep only the fields we need; .copy() avoids a SettingWithCopyWarning on the assignment below
vote_df = vote_df[['VoteEnd', 'BillTitle', 'BusinessTitle', 'Subject', 'MeaningNo', 'MeaningYes',
                   'BusinessShortNumber', 'ID', 'IdLegislativePeriod', 'IdSession']].copy()
# Convert the whole column at once instead of element by element
vote_df['VoteEnd'] = pd.to_datetime(vote_df['VoteEnd'])
vote_df.sort_values('VoteEnd', ascending=True).head()

N.B. The `SessionName` field is unreliable: the session it mentions is not the correct one.


Party abbreviations used below:

- PLR : 	Parti libéral-radical (PLR. Les Libéraux-Radicaux)
- PDC : 	Parti démocrate-chrétien suisse
- PS : 	Parti socialiste suisse
- UDC :	Union démocratique du centre
- PES :	Parti écologiste suisse (Les Verts)
- PVL :	Parti vert-libéral
- PBD :	Parti Bourgeois-Démocratique

# 2. Load the Voting DataFrame

In [None]:
def load_voting():
    """
        Loads the Voting DataFrame and formats it to get the correct name of parties and tags, 
        as well as only considering the fields we actually need. 
    """
    
    # 1. Load the Voting files and concatenate them as well as the session dataframe to identify the sessions
    dataset_tmp = []
    path = '../datas/scrap/Voting'
    allFiles = glob.glob(os.path.join(path, 'Session*.csv'))

    for file_ in allFiles:
        data_tmp = pd.read_csv(file_,index_col=0)
        dataset_tmp += [data_tmp] 
    voting_df = pd.concat(dataset_tmp)

    print('Entries in the DataFrame',voting_df.shape)
    
    session_path = '../datas/scrap/Session/Legiid_37-50.csv'
    session_df = pd.read_csv(session_path,index_col=0)
    session_df.set_index('ID',inplace=True)
    session_df['Year'] = pd.to_datetime(session_df['StartDate']).dt.year
    session_df_name = session_df[['SessionName','Year']]
    
    # 2. Update the names of the parties
    parties_name_dict = {'Groupe écologiste':'Parti écologiste suisse (Les Verts)', 'Groupe socialiste':'Parti socialiste', 
        "Groupe vert'libéral":'Parti vert-libéral', 'Groupe radical-démocratique':'Parti libéral-radical',
        'Groupe des Paysans, Artisans et Bourgeois':'Union démocratique du centre',
        'Groupe conservateur-catholique':'Parti démocrate-chrétien', 
        'Groupe BD':'Parti Bourgeois-Démocratique', 'Non inscrit':'Non inscrit'}
    parties_dict = {'G':'PES', 'S':'PS', 'GL':'PVL', 'RL':'PLR', 
                     'V':'UDC', 'CE':'PDC', 'BD':'PBD', 'C':'PDC', '-':'Other', 'CEg':'PDC'}
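    # Assign the result back: replace(..., inplace=True) on a selected column may not update voting_df
    voting_df['ParlGroupName'] = voting_df['ParlGroupName'].replace(parties_name_dict)
    voting_df['ParlGroupCode'] = voting_df['ParlGroupCode'].replace(parties_dict)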
    
    # 3. Process some supplementary fields
    voting_df.insert(1,'Name', voting_df['FirstName'] + ' ' + voting_df['LastName'])
    voting_df['VoteEnd'] = pd.to_datetime(voting_df['VoteEnd'])
    
    # 4. Keep only the relevant fields
    voting_df = voting_df[['Name','BillTitle','BusinessTitle', 'Decision', 'DecisionText','BusinessShortNumber','Canton','ParlGroupCode','ParlGroupName','ID','IdVote','IdSession','VoteEnd']]
    voting_df = voting_df.join(session_df_name, on='IdSession')
    return voting_df

voting_df = load_voting()

The DataFrame still holds a lot of data, but we kept as few fields as possible so that it stays readable by a human while retaining all the information we need to display.

In [None]:
voting_df.head()

# 3. Format the DataFrame
## a) Session-level statistics for each deputy
Now we format the data the way we want to export it for each person.

First of all, global stats: *Yes/No/Abstention/Absence/Excused/President* at the session level.

In [None]:
def split_df(df, field):
    """
        Splits the input df along a given field into a dictionary that maps each unique
        entry of the field to the corresponding rows of the DataFrame
    """
    # First retrieve all the unique entries of the field
    unique_field = df[field].unique()
    print('Number of unique entries in',field,':',len(unique_field))
    # Build a dictionary of DataFrames, each holding all the rows relative to a single entry
    # (a single deputee when splitting on 'Name')
    df_dict = {key: df.loc[df[field] == key] for key in unique_field}

    return df_dict

voting_dict = split_df(voting_df, 'Name')

Summary of the possible values of the `Decision` field:

|**1** | **2** | **3**      | **4**    | **5**               | **6** |                **7**                       |
|------|-------|------------|----------|---------------------|-------|--------------------------------------------|
| Yes  | No    | Abstention | No entry | Did not participate |Excused| The president of the session does not vote |


We count the share of abstentions per person to see whether anyone abstains significantly more often than the rest. This does not appear to be the case.
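To look at all outcomes at once rather than abstentions only, the codes can be mapped to labels and tabulated per person. This is a sketch on a toy table; the label dictionary `DECISION_LABELS` is a hypothetical helper whose values are taken from the summary table above.

```python
import pandas as pd

# Labels taken from the summary table of Decision codes
DECISION_LABELS = {1: 'Yes', 2: 'No', 3: 'Abstention', 4: 'No entry',
                   5: 'Did not participate', 6: 'Excused', 7: 'President'}

toy = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Decision': [1, 3, 3, 2, 1, 5],
})
labelled = toy.assign(Decision=toy['Decision'].map(DECISION_LABELS))
# One row per person, one column per outcome, each row summing to 1
shares = pd.crosstab(labelled['Name'], labelled['Decision'], normalize='index')
print(shares)
```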

In [None]:
# Share of abstentions (Decision == 3) among all the entries of each person
voting_test = voting_df['Decision'].eq(3).groupby(voting_df['Name']).mean()
voting_test.sort_values(ascending=False).head()

We first try the session-level aggregation for one deputy in particular. It will then have to be done for every deputy, and each resulting DataFrame exported to `.csv` or `.json` so that it can be read from JavaScript.
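For the `.json` route, one possible shape is a list of records, one per session. The sketch below uses a made-up aggregated table; the session IDs, names and counts are invented for illustration.

```python
import json
import pandas as pd

session_stats = pd.DataFrame(
    {'Yes': [12, 30], 'No': [5, 8], 'SessionName': ['Spring session', 'Summer session']},
    index=pd.Index([4901, 4902], name='IdSession'),
)
# orient='records' yields a list of objects, one per row, which is easy to iterate over in JavaScript
payload = session_stats.reset_index().to_json(orient='records')
records = json.loads(payload)
print(records[0])
```

`reset_index()` is called first so that `IdSession` ends up inside each record instead of being dropped with the index.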

In [None]:
test = voting_dict['Filippo Leutenegger']
test.head()

In [None]:
voting_dict['Filippo Leutenegger'].loc[voting_dict['Filippo Leutenegger'].IdSession==4913].SessionName.unique()[0]

In [None]:
test = voting_dict['Filippo Leutenegger'].groupby('IdSession')

# Each counter receives the Decision column of one session
count_yes = lambda x: np.sum(x==1)
count_no = lambda x: np.sum(x==2)
count_abstention = lambda x: np.sum(x==3)
count_absent = lambda x: np.sum(x==5)
count_excused = lambda x: np.sum(x==6)
count_president = lambda x: np.sum(x==7)
# SessionName and Year are constant within a session, so we keep the first value
name_session = lambda x: x.unique()[0]
year_session = lambda x: x.unique()[0]

# Named aggregation (pandas >= 0.25); the nested-dict renaming syntax was removed in pandas 1.0
test = test.agg(Yes=('Decision', count_yes), No=('Decision', count_no),
                Abstention=('Decision', count_abstention), Absent=('Decision', count_absent),
                Excused=('Decision', count_excused), President=('Decision', count_president),
                SessionName=('SessionName', name_session), Year=('Year', year_session))

test = test[['Yes','No','Abstention','Absent','Excused','President','SessionName','Year']]
test

Aggregating further up at a yearly level

In [None]:
# Drop the text column, then sum the counts of all the sessions of each year
test = test.drop(columns='SessionName').groupby('Year').sum()

Displaying the result.

In [None]:
test['Presence'] = test['Yes']+test['No']+test['Abstention']+test['President']
test['Absence'] = test['Excused']+test['Absent']

# The index (Year) is used as the x axis by default
test[['Yes','No','Abstention','Excused','Absent','President']].plot.bar()
#test[['Presence','Absence']].plot.bar()

N.B. We checked for consistency: from the time a person is elected, they appear in all subsequent votes (the sum of all the fields gives the total number of votes in the session).
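One way such a consistency check could be written is sketched below, on made-up per-session counts. The Series `votes_per_session` (number of votes held in each session) is a hypothetical input, and the sketch assumes that code 4 ("No entry") does not occur once the person is elected.

```python
import pandas as pd

# Made-up per-session counts for one person
counts = pd.DataFrame(
    {'Yes': [10, 4], 'No': [3, 2], 'Abstention': [1, 0],
     'Absent': [0, 1], 'Excused': [1, 0], 'President': [0, 0]},
    index=pd.Index([4901, 4902], name='IdSession'),
)
# Hypothetical number of votes held in each session
votes_per_session = pd.Series([15, 7], index=pd.Index([4901, 4902], name='IdSession'))

# Sessions where the six counts do not add up to the number of votes held
mismatch = counts.sum(axis=1) != votes_per_session
print(mismatch[mismatch].index.tolist())
```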

### Exporting the results to the `analysis` folder in `datas`

In [None]:
def export_session_vote_csv(directory, deputee, df):
    """
        Computes the session-level aggregation for a deputee and stores it as a CSV file
        in the given directory. Requires the name of the deputee and their associated DataFrame.
    """
    df_grouped = df.groupby('IdSession')

    # Build a counter for one Decision code
    count = lambda code: (lambda x: np.sum(x == code))
    # The session name and year are unique within a session, so we keep the first value
    first = lambda x: x.unique()[0]

    # Named aggregation (pandas >= 0.25) replaces the nested-dict renaming removed in pandas 1.0
    df = df_grouped.agg(Yes=('Decision', count(1)), No=('Decision', count(2)),
                        Abstention=('Decision', count(3)), Absent=('Decision', count(5)),
                        Excused=('Decision', count(6)), President=('Decision', count(7)),
                        SessionName=('SessionName', first), Year=('Year', first))

    count_cols = ['Yes','No','Abstention','Absent','Excused','President']
    # Sum only the vote counts: SessionName and Year must not enter the total
    df['Total'] = df[count_cols].sum(axis=1)

    df.to_csv(os.path.join(directory, deputee + '_vote_session.csv'))

In [None]:
directory = '../datas/analysis/'
# exist_ok=True makes the explicit existence check unnecessary
os.makedirs(directory, exist_ok=True)
    
for deputee, df in voting_dict.items():
    export_session_vote_csv(directory,deputee,df)