This notebook formats the results of the 2020 Paris city-council elections at the district (arrondissement) level, so that we can evaluate the model that was built to predict these elections.
This notebook will probably be deleted in future iterations, since the 2020 election now joins our pool of past elections, from which subsequent models will learn.

In [1]:
import numpy as np
import pandas as pd

from typing import Dict, Set

AFFILIATIONS = {
    "farleft": ["LFI"],
    "left": ["LUG"],
    "green": ["LVEC"],
    "center": ["LUC"],
    "right": ["LUD"],
    "farright": ["LRN"],
}

First, here are some helper functions:

In [2]:
def select_columns(df: pd.DataFrame, party_col: str) -> pd.DataFrame:
    """
    party_col: either 'Nom', 'Nuance' or 'Code Nuance'.
    """
    subset = ["date", "ville", "arrondissement", "Exprimés"]
    for n, s in zip(
        df.filter(like=party_col).columns, df.columns[df.columns.str.startswith("Voix")]
    ):
        subset.append(n)
        subset.append(s)

    return df[subset]


def extract_nuances(nuances_df: pd.DataFrame) -> Set[str]:
    """
    Extract the nuances competing in this election.
    From the dataframe of nuances, we check each column for each line. 
    If the cell is not empty and the nuance is not already counted, we add it to the set of nuances.
    """
    nuances_set = set()

    for _, line in nuances_df.iterrows():
        for col in nuances_df.columns:
            if pd.notnull(line[col]):
                nuances_set.add(line[col])

    return nuances_set


def format_results(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    Take the raw df, for each line switch the nuance's label to column name, 
    and match it with the corresponding score of this party.
    Return a dataframe with the proper format.
    """
    res = {
        "date": df.date.values,
        "ville": df.ville.values,
        "arrondissement": df.arrondissement.values,
        "Exprimés": df["Exprimés"].values,
    }
    res.update({nuance: [] for nuance in nuances_set})

    if not df.filter(like="Code Nuance").columns.empty:
        nuances_lbls = df.filter(like="Code Nuance").columns
    # "Nuance" also matches "Nuance Liste" columns, so no separate branch is needed:
    elif not df.filter(like="Nuance").columns.empty:
        nuances_lbls = df.filter(like="Nuance").columns
    elif not df.filter(like="Nom").columns.empty:
        nuances_lbls = df.filter(like="Nom").columns
    else:
        raise ValueError("no 'Code Nuance', 'Nuance' or 'Nom' column found")
    scores_lbls = df.filter(like="Voix").columns

    # each line is an arrondissement:
    for _, line in df.iterrows():
        tempset = nuances_set.copy()

        # iterate over nuances in line:
        for n, s in zip(nuances_lbls, scores_lbls):
            name = line[n]
            score = line[s]
            if pd.notnull(name):
                # if 1st time we see this nuance in this line:
                if name in tempset:
                    res[name].append(score)
                    tempset.remove(name)
                # if we already saw this nuance in this line:
                else:
                    res[name][-1] += score
        # if nuance still in tempset after iteration, then it's not competing in this arrondissement:
        for nuance in tempset:
            res[nuance].append(np.nan)

    return pd.DataFrame(data=res)


def attribute_parties(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    From a dataframe with general party denomination, attribute colloquial party names and 
    add parties with same nuance.
    Then aggregate the rest of the parties, drop all useless ones, and reorder columns.
    """
    for p in AFFILIATIONS.keys():
        # which candidate represents the party this year?
        intersection = list(nuances_set & set(AFFILIATIONS[p]))
        # far-left labels other than "LFI"; never triggered with the AFFILIATIONS
        # defined above, where farleft is only "LFI":
        if (
            ("LFG" in intersection)
            or ("FG" in intersection)
            or ("MÉLENCHON" in intersection)
            or ("FI" in intersection)
        ):
            df = df.rename(columns={"LFG": p, "FG": p, "MÉLENCHON": p, "FI": p})
        else:
            # add candidates with same nuance, then drop:
            if len(intersection) >= 2:
                df[p] = df[intersection].sum(axis=1)
                df.drop(intersection, axis=1, inplace=True)
            # rename column of only candidate of this party:
            elif len(intersection) == 1:
                df = df.rename(columns={intersection[0]: p})

    # aggregate other parties:
    core_cols = ["date", "ville", "arrondissement", "Exprimés"] + list(
        AFFILIATIONS.keys()
    )
    rest = df[df.columns.difference(core_cols)]
    df["other"] = rest.sum(axis=1)
    df.drop(rest.columns, axis=1, inplace=True)

    # reorder columns:
    df = df.rename(columns={"Exprimés": "N"})
    df = df.reindex(
        ["date", "ville", "arrondissement", "N"]
        + list(AFFILIATIONS.keys())
        + ["other"],
        axis=1,
    )

    return df
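
For readers who prefer a vectorized formulation, the reshaping done by `format_results` can also be sketched with `pd.concat` and `pivot_table`. This is only an illustration on made-up data, not the code used in this notebook: the toy frame `raw`, its values, and the column suffixes `.1`, `.2` are assumptions about the raw layout.

```python
import numpy as np
import pandas as pd

# Toy data in the assumed raw layout: one row per arrondissement,
# with repeated "Code Nuance"/"Voix" column pairs (values are made up).
raw = pd.DataFrame(
    {
        "arrondissement": [1, 5],
        "Code Nuance": ["LUG", "LUG"],
        "Voix": [100, 80],
        "Code Nuance.1": ["LUD", "LVEC"],
        "Voix.1": [60, 40],
        "Code Nuance.2": ["LUG", np.nan],
        "Voix.2": [10, np.nan],
    }
)

nuance_cols = raw.filter(like="Code Nuance").columns
score_cols = raw.filter(like="Voix").columns

# Stack each (nuance, score) pair into long format and drop empty slots.
long = pd.concat(
    [
        raw[["arrondissement", n, s]].rename(columns={n: "nuance", s: "voix"})
        for n, s in zip(nuance_cols, score_cols)
    ],
    ignore_index=True,
).dropna(subset=["nuance"])

# One column per nuance: lists sharing a nuance within an arrondissement
# are summed, and a nuance absent from an arrondissement stays NaN.
wide = long.pivot_table(
    index="arrondissement", columns="nuance", values="voix", aggfunc="sum"
).reset_index()
wide.columns.name = None
```

On this toy input, the two `LUG` lists of arrondissement 1 are added together, while `LVEC` is missing there and `LUD` is missing in arrondissement 5, which mirrors what the row-by-row loop in `format_results` does.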

All this is very similar to the formatting we did in `munic_format_results.ipynb`, with just a few tweaks at the beginning to adapt to this election's idiosyncrasies:

In [3]:
m20 = pd.read_excel("../data/raw_election_results_1st_round/munic2020-paris-ardmnt.xlsx")
m20["date"] = pd.to_datetime("2020-03-15")
m20["ville"] = m20["Libellé du département"]
m20["arrondissement"] = m20["Libellé de la commune"].str.extract(r'(\d+)').astype(int)
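
As a small illustration of the last line, here is how the regular expression pulls the district number out of a label. The labels below are hypothetical (the actual values of `Libellé de la commune` are not shown in this notebook), and `expand=False` is used here only so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical commune labels, for illustration only.
labels = pd.Series(
    ["Paris 1er secteur", "Paris 5ème Arrondissement", "Paris 13ème Arrondissement"]
)

# Capture the first run of digits in each label and convert it to an integer.
numbers = labels.str.extract(r"(\d+)", expand=False).astype(int)
```

Note that `.astype(int)` would fail on a label containing no digits, since the extraction returns a missing value in that case.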

In [4]:
m20 = select_columns(m20, "Code Nuance")
nuances_set = extract_nuances(m20.filter(like="Code Nuance"))
m20 = format_results(m20, nuances_set)
m20 = attribute_parties(m20, nuances_set)
m20["type"] = "municipale"
m20

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2020-03-15,Paris,1,29762,739.0,10038,3090.0,6271.0,6096.0,328.0,3200.0,municipale
1,2020-03-15,Paris,5,17501,527.0,4448,1997.0,4987.0,3023.0,211.0,2308.0,municipale
2,2020-03-15,Paris,6,11565,142.0,2155,883.0,2623.0,4424.0,168.0,1170.0,municipale
3,2020-03-15,Paris,7,15146,101.0,1731,608.0,3506.0,7678.0,216.0,1306.0,municipale
4,2020-03-15,Paris,8,10292,93.0,1047,591.0,2738.0,3574.0,161.0,2088.0,municipale
5,2020-03-15,Paris,9,18046,465.0,4522,1729.0,6660.0,2841.0,231.0,1598.0,municipale
6,2020-03-15,Paris,10,24420,1154.0,10109,3764.0,3553.0,3368.0,367.0,2105.0,municipale
7,2020-03-15,Paris,11,39829,1905.0,17148,5340.0,5638.0,5147.0,240.0,4411.0,municipale
8,2020-03-15,Paris,12,39078,1905.0,13049,4670.0,6464.0,8935.0,394.0,3661.0,municipale
9,2020-03-15,Paris,13,43728,2663.0,17545,4605.0,5534.0,7460.0,936.0,4985.0,municipale


You'll notice that the dataframe above has no rows for districts 2 to 4. That's because, starting in 2020, election results in the first four districts of Paris are aggregated into a single district, which shows up here as district 1 (don't ask me why, just roll with it). It's no big deal though: to compare with past elections, we'll just have to sum the results of the first four districts for elections prior to 2020.
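
Here is a minimal sketch of what that summing could look like, assuming past results share the column layout used in this notebook. The frame `past` and its numbers are made up for illustration, and this is not necessarily how the model code performs the aggregation.

```python
import pandas as pd

# Hypothetical slice of past results (numbers are made up).
past = pd.DataFrame(
    {
        "date": pd.to_datetime(["2014-03-23"] * 5),
        "ville": "Paris",
        "arrondissement": [1, 2, 3, 4, 5],
        "type": "municipale",
        "N": [100, 200, 300, 400, 500],
        "left": [40, 90, 150, 200, 180],
        "right": [60, 110, 150, 200, 320],
    }
)

# Relabel districts 2 to 4 as district 1, then sum the vote counts
# within each election. min_count=1 keeps NaN for a party that is
# missing in all of the merged districts instead of turning it into 0.
merged = (
    past.assign(arrondissement=past["arrondissement"].replace({2: 1, 3: 1, 4: 1}))
    .groupby(["date", "ville", "type", "arrondissement"], as_index=False)
    .sum(min_count=1)
)
```

After this step, each pre-2020 election has a single row for district 1 that is comparable with the 2020 row, and districts 5 and above are left untouched.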

Now we just have to concatenate this dataframe with the dataframe of past election results. When I wrote the city-council model (early 2020), the 2019 European elections were my test dataset, so I didn't include them in the big dataframe of past results. That no longer makes sense now that the 2020 city-council elections have passed, so I'm adding the 2019 European results to the file along with the 2020 results:

In [5]:
d = pd.read_excel("../data/results_by_districts_paris.xlsx", index_col=0)

euro2019 = pd.read_excel(
    "../data/raw_election_results_1st_round/euro2019-districts.xlsx"
).rename(columns={"district": "arrondissement"})
euro2019["date"] = pd.to_datetime("2019-05-25")
euro2019["ville"] = "Paris"
euro2019["type"] = "european"

d = (
    pd.concat([d, euro2019, m20], axis=0, sort=False)
    .sort_values(["arrondissement", "date"])
    .reset_index(drop=True)
)
d

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2007-04-22,Paris,1,9152,239.0,2530,205.0,2051.0,3595.0,418.0,114.0,president
1,2007-06-10,Paris,1,6825,243.0,0,1969.0,818.0,3430.0,158.0,207.0,legislative
2,2008-03-09,Paris,1,6127,75.0,2289,439.0,531.0,2641.0,152.0,0.0,municipale
3,2009-06-07,Paris,1,5212,275.0,665,1493.0,419.0,1808.0,127.0,425.0,european
4,2010-03-14,Paris,1,4843,295.0,1077,1038.0,177.0,1758.0,261.0,237.0,regional
...,...,...,...,...,...,...,...,...,...,...,...,...
272,2015-12-06,Paris,20,49130,6100.0,18315,7348.0,,8340.0,4977.0,4050.0,regional
273,2017-04-23,Paris,20,89574,28512.0,12469,,27399.0,11451.0,5305.0,4438.0,president
274,2017-06-11,Paris,20,57413,11546.0,10700,7766.0,6505.0,4300.0,2446.0,14150.0,legislative
275,2019-05-25,Paris,20,59313,5493.0,6187,14684.0,12721.0,2556.0,4430.0,13242.0,european


And let's finish by saving these beautiful data:

In [6]:
d.to_excel("../data/results_by_districts_paris.xlsx")

In [7]:
%load_ext watermark
%watermark -a AlexAndorra -n -u -v -iv

numpy  1.19.1
pandas 1.0.5
AlexAndorra 
last updated: Fri Jul 24 2020 

CPython 3.8.5
IPython 7.16.1
