This notebook formats the 2008 and 2014 city council election results for the 7 main parties. We can then use these data to model election results at the district level ("arrondissement"), which we'll do in another notebook.

In [47]:
%load_ext lab_black

import numpy as np
import pandas as pd

from typing import Set

AFFILIATIONS = {
    "farleft": ["LFG", "LEXG"],
    "left": ["LUG", "LSOC"],
    "green": ["LVEC"],
    "right": ["LUMP", "LUDI", "LUD"],
    "farright": ["LFN"],
}



First, here are some helper functions:

In [57]:
def extract_nuances(nuances_df: pd.DataFrame) -> Set[str]:
    """
    Extract the nuances competing in this election.
    From the dataframe of nuances, we check each column for each line. 
    If the cell is not empty and the nuance is not already counted, we add it to the set of nuances.
    """
    nuances_set = set()

    for _, line in nuances_df.iterrows():
        for col in nuances_df.columns:
            if pd.notnull(line[col]):
                nuances_set.add(line[col])

    return nuances_set


def format_results(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    Take the raw df, for each line switch the nuance's label to column name, 
    and match it with the corresponding score of this party.
    Return a dataframe with the proper format.
    """
    res = {
        "date": df.date.values,
        "ville": df.ville.values,
        "arrondissement": df.arrondissement.values,
        "Exprimés": df["Exprimés"].values,
    }
    res.update({nuance: [] for nuance in nuances_set})

    nuances_lbls = df.filter(like="Code Nuance").columns
    scores_lbls = df.filter(like="Voix").columns

    # each line is an arrondissement:
    for _, line in df.iterrows():
        tempset = nuances_set.copy()

        # iterate over nuances in line:
        for n, s in zip(nuances_lbls, scores_lbls):
            name = line[n]
            score = line[s]
            if pd.notnull(name):
                # if 1st time we see this nuance in this line:
                if name in tempset:
                    res[name].append(score)
                    tempset.remove(name)
                # if we already saw this nuance in this line:
                else:
                    res[name][-1] += score
        # if nuance still in tempset after iteration, then it's not competing in this arrondissement:
        for nuance in tempset:
            res[nuance].append(np.nan)

    return pd.DataFrame(data=res)


def attribute_parties(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    From a dataframe with one column per official nuance, map nuances to broad party
    families (see AFFILIATIONS), summing the nuances that belong to the same family.
    Then aggregate all remaining nuances into "other" and drop their original columns.
    """
    for p in AFFILIATIONS.keys():
        # which candidate represents the party this year?
        intersection = list(nuances_set & set(AFFILIATIONS[p]))
        # when LFG ran, keep it alone for farleft (LEXG then falls into "other"):
        if "LFG" in intersection:
            df = df.rename(columns={"LFG": p})
        else:
            # add candidates with same nuance, then drop:
            if len(intersection) >= 2:
                df[p] = df[intersection].sum(axis=1)
                df.drop(intersection, axis=1, inplace=True)
            # rename column of only candidate of this party:
            elif len(intersection) == 1:
                df = df.rename(columns={intersection[0]: p})

    # aggregate other parties (all remaining nuance columns start with "L"):
    rest = df.filter(regex="^L")
    df["other"] = rest.sum(axis=1)
    df.drop(rest.columns, axis=1, inplace=True)

    return df

### Municipales 2014
Let's begin by formatting the 2014 city council election results. They are already at the district level ("arrondissement"), so they are the easier case.

In [62]:
d = pd.read_excel("data/election_results_1st_round/munic2014-ardmnt.xlsx")
d["date"] = d["Date de l'export"].dt.normalize()  # only interested in the date
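# "Lyon secteur 1" -> ville="Lyon", arrondissement=1
# (unpacking the .str accessor directly no longer works in recent pandas)
parts = d["Libellé de la commune"].str.split(expand=True)
d["ville"] = parts[0]
d["arrondissement"] = parts[2].astype(int)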
d = d.sort_values(["ville", "arrondissement"])
d.head()

Unnamed: 0,Date de l'export,Code du département,Type de scrutin,Libellé du département,Code de la commune,Libellé de la commune,Inscrits,Abstentions,% Abs/Ins,Votants,...,Liste.10,Sièges / Elu.10,Sièges Secteur.10,Sièges CC.10,Voix.10,% Voix/Ins.10,% Voix/Exp.10,date,ville,arrondissement
0,2014-03-25 12:52:00,69,LI2,RHONE,123SR01,Lyon secteur 1,16482,6936,42.08,9546,...,,,,,,,,2014-03-25,Lyon,1
1,2014-03-25 12:52:00,69,LI2,RHONE,123SR02,Lyon secteur 2,16863,6658,39.48,10205,...,,,,,,,,2014-03-25,Lyon,2
2,2014-03-25 12:52:00,69,LI2,RHONE,123SR03,Lyon secteur 3,52133,22494,43.15,29639,...,,,,,,,,2014-03-25,Lyon,3
3,2014-03-25 12:52:00,69,LI2,RHONE,123SR04,Lyon secteur 4,22557,9096,40.32,13461,...,,,,,,,,2014-03-25,Lyon,4
4,2014-03-25 12:52:00,69,LI2,RHONE,123SR05,Lyon secteur 5,28373,11724,41.32,16649,...,,,,,,,,2014-03-25,Lyon,5


The main difficulty lies in the fact that party labels (what we'll call "nuances") and party results are separated (they each have their dedicated column). Our first task is then to match each label with its results. Let's begin by isolating the useful columns:

In [63]:
subset = ["date", "ville", "arrondissement", "Exprimés"]
for i, j in zip(
    d.filter(like="Code Nuance").columns, d.columns[d.columns.str.startswith("Voix")]
):
    subset.append(i)
    subset.append(j)
d = d[subset]
d.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,Code Nuance,Voix,Code Nuance.1,Voix.1,Code Nuance.2,Voix.2,...,Code Nuance.6,Voix.6,Code Nuance.7,Voix.7,Code Nuance.8,Voix.8,Code Nuance.9,Voix.9,Code Nuance.10,Voix.10
0,2014-03-25,Lyon,1,9433,LEXG,86,LFG,3156,LSOC,2447,...,LFN,583.0,,,,,,,,
1,2014-03-25,Lyon,2,10055,LFG,487,LSOC,2737,LVEC,609,...,,,,,,,,,,
2,2014-03-25,Lyon,3,29134,LFG,1579,LSOC,11256,LVEC,2854,...,LFN,3603.0,,,,,,,,
3,2014-03-25,Lyon,4,13199,LEXG,123,LFG,1323,LSOC,4522,...,LDIV,375.0,LUD,3493.0,LFN,1131.0,,,,
4,2014-03-25,Lyon,5,16405,LEXG,154,LFG,752,LSOC,5954,...,LFN,1857.0,,,,,,,,


The functions `extract_nuances` and `format_results` are designed to do just that: extract the unique nuances competing in this election, and then match each nuance with the corresponding score of the party:

In [64]:
nuances_set = extract_nuances(d.filter(like="Code Nuance"))
d = format_results(d, nuances_set)

d["LFG"] = d["LFG"].fillna(d["LPG"])  # same party
d = d.drop("LPG", axis=1)
d.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,LVEC,LFN,LDVD,LUMP,LUDI,LEXG,LDIV,LFG,LUG,LUD,LDVG,LSOC
0,2014-03-25,Lyon,1,9433,1064.0,583.0,,,,86.0,293.0,3156.0,,1804.0,,2447.0
1,2014-03-25,Lyon,2,10055,609.0,1159.0,,,,,325.0,487.0,,4738.0,,2737.0
2,2014-03-25,Lyon,3,29134,2854.0,3603.0,,,,,1681.0,1579.0,,8161.0,,11256.0
3,2014-03-25,Lyon,4,13199,1567.0,1131.0,,,,123.0,630.0,1323.0,,3493.0,410.0,4522.0
4,2014-03-25,Lyon,5,16405,1340.0,1857.0,,,,154.0,498.0,752.0,,5850.0,,5954.0
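
As a possible alternative to the row-by-row loop in `format_results`, the same label/score matching can be sketched as a reshape: stack every (label, score) pair into a long table, then pivot with a sum. This is only a sketch, not the method used in this notebook. The names `wide`, `long` and `tidy` are illustrative, and the toy frame keeps just the first three label/score pairs of the first two Lyon rows shown above.

```python
import numpy as np
import pandas as pd

# Toy wide frame: first three (label, score) pairs of Lyon 1 and Lyon 2 from the table above.
wide = pd.DataFrame(
    {
        "arrondissement": [1, 2],
        "Code Nuance": ["LEXG", "LFG"],
        "Voix": [86, 487],
        "Code Nuance.1": ["LFG", "LSOC"],
        "Voix.1": [3156, 2737],
        "Code Nuance.2": ["LSOC", "LVEC"],
        "Voix.2": [2447, 609],
    }
)

label_cols = wide.filter(like="Code Nuance").columns
score_cols = wide.filter(like="Voix").columns

# Stack each (label, score) pair: one row per list per arrondissement.
long = pd.concat(
    [
        wide[["arrondissement", lbl, sc]].rename(columns={lbl: "nuance", sc: "voix"})
        for lbl, sc in zip(label_cols, score_cols)
    ],
    ignore_index=True,
).dropna(subset=["nuance"])  # empty slots carry no label

# One column per nuance; lists sharing a nuance in the same arrondissement are summed,
# and a nuance that did not run in an arrondissement is left as NaN.
tidy = long.pivot_table(
    index="arrondissement", columns="nuance", values="voix", aggfunc="sum"
)
print(tidy)
```

Like the loop version, this leaves `NaN` where a nuance is absent from an arrondissement (here `LVEC` in arrondissement 1 and `LEXG` in arrondissement 2).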


Now we have to map each official nuance to a colloquial party name. Some nuances share the same ideological leaning, or formed an alliance, so we have to sum them. The function `attribute_parties` takes care of this, then aggregates the remaining parties into the category "other" and drops their original columns:

In [65]:
d = attribute_parties(d, nuances_set)
d = d.rename(columns={"Exprimés": "N"})
d = d.reindex(
    ["date", "ville", "arrondissement", "N"] + list(AFFILIATIONS.keys()) + ["other"],
    axis=1,
)
d

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,right,farright,other
0,2014-03-25,Lyon,1,9433,3156.0,2447.0,1064.0,1804.0,583.0,379.0
1,2014-03-25,Lyon,2,10055,487.0,2737.0,609.0,4738.0,1159.0,325.0
2,2014-03-25,Lyon,3,29134,1579.0,11256.0,2854.0,8161.0,3603.0,1681.0
3,2014-03-25,Lyon,4,13199,1323.0,4522.0,1567.0,3493.0,1131.0,1163.0
4,2014-03-25,Lyon,5,16405,752.0,5954.0,1340.0,5850.0,1857.0,652.0
5,2014-03-25,Lyon,6,17920,561.0,4801.0,1110.0,8971.0,1867.0,610.0
6,2014-03-25,Lyon,7,19902,1543.0,7724.0,2165.0,4746.0,2597.0,1127.0
7,2014-03-25,Lyon,8,18543,1008.0,7473.0,1435.0,4303.0,3421.0,903.0
8,2014-03-25,Lyon,9,12220,686.0,5581.0,921.0,2701.0,1684.0,647.0
9,2014-03-25,Marseille,1,23480,2108.0,6331.0,,9063.0,3526.0,2452.0
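
A quick way to check the formatting is to verify that the party columns add up to `N`, the number of valid votes. The sketch below does this on the first two Lyon rows copied from the table above; the names `check`, `party_cols`, `gap`, `mismatches` and `shares` are illustrative and do not appear elsewhere in the notebook.

```python
import pandas as pd

# First two rows of the formatted 2014 table above.
check = pd.DataFrame(
    {
        "arrondissement": [1, 2],
        "N": [9433, 10055],
        "farleft": [3156.0, 487.0],
        "left": [2447.0, 2737.0],
        "green": [1064.0, 609.0],
        "right": [1804.0, 4738.0],
        "farright": [583.0, 1159.0],
        "other": [379.0, 325.0],
    }
)
party_cols = ["farleft", "left", "green", "right", "farright", "other"]

# sum() skips NaN, so a party that did not run simply contributes nothing.
gap = check["N"] - check[party_cols].sum(axis=1)
mismatches = check[gap != 0]  # rows where votes were lost or double-counted
print(mismatches)

# Vote shares per district, often the more convenient quantity for modeling.
shares = check[party_cols].div(check["N"], axis=0)
print(shares.round(3))
```

An empty `mismatches` frame means every valid vote ended up in exactly one category for the rows checked.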


### Municipales 2008

And with that, we're done formatting the 2014 city council elections. Let's now turn to the 2008 election results, which have the same structure but are reported at the polling-station level, so we must first aggregate them to the district level:

In [None]:
# https://opendata.paris.fr/explore/dataset/bureaux-de-votes/table/
# https://fr.wikipedia.org/wiki/%C3%89lections_municipales_de_2008_%C3%A0_Paris#R%C3%A9sultats_par_arrondissement
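
The aggregation step could look roughly like the sketch below. It rests on assumptions that are not confirmed by the data: the rows are made-up numbers, not real 2008 results; the `bureau` column is a hypothetical polling-station identifier; and within a district the lists are assumed to occupy the same `Code Nuance` slots at every polling station.

```python
import pandas as pd

# Made-up polling-station rows, for illustration only (NOT real 2008 results).
bureaux = pd.DataFrame(
    {
        "ville": ["Paris"] * 4,
        "arrondissement": [1, 1, 2, 2],
        "bureau": [1, 2, 1, 2],  # hypothetical polling-station id
        "Exprimés": [100, 150, 200, 50],
        "Code Nuance": ["LSOC", "LSOC", "LSOC", "LSOC"],
        "Voix": [40, 60, 90, 20],
        "Code Nuance.1": ["LUMP", "LUMP", "LVEC", "LVEC"],
        "Voix.1": [60, 90, 110, 30],
    }
)

label_cols = list(bureaux.filter(like="Code Nuance").columns)
vote_cols = ["Exprimés"] + list(bureaux.filter(like="Voix").columns)

# Sum votes over polling stations; grouping on the label columns keeps them in the
# result, and dropna=False avoids discarding groups whose unused slots are NaN.
by_district = (
    bureaux.groupby(["ville", "arrondissement"] + label_cols, dropna=False)[vote_cols]
    .sum()
    .reset_index()
)
print(by_district)
```

The result has one row per district with the same wide layout as the 2014 file, so it could then go through `extract_nuances`, `format_results` and `attribute_parties` as before.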