This notebook formats the first-round results of the 2008 and 2014 city council elections in Paris, as well as those of all higher-level elections (presidential, legislative, European, regional) held between 2006 and 2017. We start in 2006 because our district-level predictors start at this date. We can then use these data to try to model election results in Paris at the district level ("arrondissement"); we'll do that in another notebook.

The most recent elections as of this writing are the 2019 European elections. They will be our test dataset, which is why we will handle the extraction and formatting of the associated polls, results and predictors in a separate notebook.

In [1]:
import numpy as np
import pandas as pd

from typing import Dict, Set

AFFILIATIONS = {
    "farleft": [
        "LFG",
        "FG",
        "LEXG",
        "LCOP",
        "MÉLENCHON",
        "FI",
        "EXG",
        "COM",
        "BESANCENOT",
        "BUFFET",
        "LAGUILLER",
    ],
    "left": ["LUG", "LSOC", "HAMON", "RDG", "SOC", "ROYAL", "HOLLANDE"],
    "green": ["LVEC", "VEC", "ECO", "BOVÉ", "VOYNET", "JOLY"],
    "center": ["LCMD", "MACRON", "MDM", "REM", "CEN", "NCE", "LUC", "UDFD", "BAYROU"],
    "right": [
        "LUMP",
        "LUDI",
        "LUD",
        "LMAJ",
        "FILLON",
        "LR",
        "UDI",
        "UMP",
        "MAJ",
        "SARKOZY",
    ],
    "farright": ["LFN", "LE PEN", "FN"],
}

ELECTIONS = {
    "euro2009": {
        "date": "2009-06-07",
        "file": "euro2009.xls",
        "sheet": "Cantons",
        "denom": "Nuance Liste",
        "type": "european",
    },
    "euro2014": {
        "date": "2014-05-25",
        "file": "euro2014.xlsx",
        "sheet": "Cantons",
        "denom": "Nuance Liste",
        "type": "european",
    },
    "legis2007": {
        "date": "2007-06-10",
        "file": "leg2007.xls",
        "sheet": "Cantons T1",
        "denom": "Nuance",
        "type": "legislative",
    },
    "legis2012": {
        "date": "2012-06-10",
        "file": "leg2012.xls",
        "sheet": "Cantons T1",
        "denom": "Code Nuance",
        "type": "legislative",
    },
    "legis2017": {
        "date": "2017-06-11",
        "file": "leg2017.xlsx",
        "sheet": "Cantons T1",
        "denom": "Code Nuance",
        "type": "legislative",
    },
    "presid2007": {
        "date": "2007-04-22",
        "file": "pres2007.xls",
        "sheet": "Cantons T1",
        "denom": "Nom",
        "type": "president",
    },
    "presid2012": {
        "date": "2012-04-22",
        "file": "pres2012.xls",
        "sheet": "Cantons T1",
        "denom": "Nom",
        "type": "president",
    },
    "presid2017": {
        "date": "2017-04-23",
        "file": "pres2017.xls",
        "sheet": "Canton Tour 1",
        "denom": "Nom",
        "type": "president",
    },
    "regio2010": {
        "date": "2010-03-14",
        "file": "reg2010.xls",
        "sheet": "Cantons T1",
        "denom": "Nuance Liste",
        "type": "regional",
    },
    "regio2015": {
        "date": "2015-12-06",
        "file": "reg2015.xlsx",
        "sheet": "Cantons",
        "denom": "Nuance Liste",
        "type": "regional",
    },
}
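`AFFILIATIONS` maps each bloc to the nuance codes and candidate names it covers, so a label accidentally listed under two blocs would be counted in both. A minimal sketch of a reverse lookup (label to bloc) that fails loudly on such duplicates; `invert_affiliations` and the excerpt dictionary are hypothetical, for illustration only (the excerpt copies a few labels from the dictionary above):

```python
from typing import Dict, List

# Hypothetical excerpt with the same shape as AFFILIATIONS: bloc -> labels
AFFILIATIONS_EXCERPT: Dict[str, List[str]] = {
    "farleft": ["LFG", "LEXG", "MÉLENCHON"],
    "left": ["LSOC", "HAMON"],
    "green": ["LVEC", "JOLY"],
    "center": ["MACRON", "LCMD"],
    "right": ["LUMP", "FILLON"],
    "farright": ["LFN", "LE PEN"],
}


def invert_affiliations(affiliations: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: build a label -> bloc lookup, rejecting labels listed twice."""
    lookup: Dict[str, str] = {}
    for bloc, labels in affiliations.items():
        for label in labels:
            if label in lookup:
                raise ValueError(
                    f"{label!r} is listed under both {lookup[label]!r} and {bloc!r}"
                )
            lookup[label] = bloc
    return lookup


lookup = invert_affiliations(AFFILIATIONS_EXCERPT)
print(lookup["HAMON"])  # left
```

Running the same check on the full `AFFILIATIONS` dictionary is a cheap way to catch a copy-paste mistake before any bloc totals are computed.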

First, here are some helper functions:

In [2]:
def extract_nuances(nuances_df: pd.DataFrame) -> Set[str]:
    """
    Extract the nuances competing in this election.
    From the dataframe of nuances, we check each column for each line. 
    If the cell is not empty and the nuance is not already counted, we add it to the set of nuances.
    """
    nuances_set = set()

    for _, line in nuances_df.iterrows():
        for col in nuances_df.columns:
            if pd.notnull(line[col]):
                nuances_set.update({line[col]})

    return nuances_set


def format_results(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    Take the raw df, for each line switch the nuance's label to column name, 
    and match it with the corresponding score of this party.
    Return a dataframe with the proper format.
    """
    res = {
        "date": df.date.values,
        "ville": df.ville.values,
        "arrondissement": df.arrondissement.values,
        "Exprimés": df["Exprimés"].values,
    }
    res.update({nuance: [] for nuance in nuances_set})

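    # pick the columns holding party labels, most specific name first
    # ("Nuance" would also match "Nuance Liste", so it is tested after it):
    if not df.filter(like="Code Nuance").columns.empty:
        nuances_lbls = df.filter(like="Code Nuance").columns
    elif not df.filter(like="Nuance Liste").columns.empty:
        nuances_lbls = df.filter(like="Nuance Liste").columns
    elif not df.filter(like="Nuance").columns.empty:
        nuances_lbls = df.filter(like="Nuance").columns
    elif not df.filter(like="Nom").columns.empty:
        nuances_lbls = df.filter(like="Nom").columns
    else:
        raise ValueError(
            "no party-label column ('Code Nuance', 'Nuance Liste', 'Nuance' or 'Nom') found"
        )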
    scores_lbls = df.filter(like="Voix").columns

    # each line is an arrondissement:
    for _, line in df.iterrows():
        tempset = nuances_set.copy()

        # iterate over nuances in line:
        for n, s in zip(nuances_lbls, scores_lbls):
            name = line[n]
            score = line[s]
            if pd.notnull(name):
                # if 1st time we see this nuance in this line:
                if name in tempset:
                    res[name].append(score)
                    tempset.remove(name)
                # if we already saw this nuance in this line:
                else:
                    res[name][-1] += score
        # if nuance still in tempset after iteration, then it's not competing in this arrondissement:
        for nuance in tempset:
            res[nuance].append(np.nan)

    return pd.DataFrame(data=res)


def attribute_parties(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    From a dataframe with one column per nuance, map nuances to colloquial bloc names,
    summing nuances that belong to the same bloc.
    Then aggregate the remaining parties into "other", drop their columns, and reorder columns.
    """
    for p in AFFILIATIONS.keys():
        # which candidate represents the party this year?
        intersection = list(nuances_set & set(AFFILIATIONS[p]))
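        # far left: if a Front de Gauche / France Insoumise label ran (LFG, FG, MÉLENCHON, FI),
        # count only that label as farleft; the other far-left nuances end up in "other":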
        if (
            ("LFG" in intersection)
            or ("FG" in intersection)
            or ("MÉLENCHON" in intersection)
            or ("FI" in intersection)
        ):
            df = df.rename(columns={"LFG": p, "FG": p, "MÉLENCHON": p, "FI": p})
        else:
            # add candidates with same nuance, then drop:
            if len(intersection) >= 2:
                df[p] = df[intersection].sum(axis=1)
                df.drop(intersection, axis=1, inplace=True)
            # rename column of only candidate of this party:
            elif len(intersection) == 1:
                df = df.rename(columns={intersection[0]: p})

    # aggregate other parties:
    core_cols = ["date", "ville", "arrondissement", "Exprimés"] + list(
        AFFILIATIONS.keys()
    )
    rest = df[df.columns.difference(core_cols)]
    df["other"] = rest.sum(axis=1)
    df.drop(rest.columns, axis=1, inplace=True)

    # reorder columns:
    df = df.rename(columns={"Exprimés": "N"})
    df = df.reindex(
        ["date", "ville", "arrondissement", "N"]
        + list(AFFILIATIONS.keys())
        + ["other"],
        axis=1,
    )

    return df

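### 2014 city council elections (municipales)
Let's begin by formatting the 2014 city council election results: they are already reported by district ("arrondissement", or "secteur" in Lyon and Marseille, which the file also covers), so they are easier to handle. We'll keep only Paris when saving the final results.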

In [3]:
m14 = pd.read_excel("data/raw_election_results_1st_round/munic2014-ardmnt.xlsx")
m14["date"] = pd.to_datetime("2014-03-23")
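# split labels such as "Lyon secteur 1" into tokens; expand=True avoids iterating over .str,
# which is deprecated in recent pandas versions
commune_parts = m14["Libellé de la commune"].str.split(expand=True)
m14["ville"] = commune_parts[0]
m14["arrondissement"] = commune_parts[2].astype(int)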
m14 = m14.sort_values(["ville", "arrondissement"])
m14.head()

| | Date de l'export | Code du département | Type de scrutin | Libellé du département | Code de la commune | Libellé de la commune | Inscrits | Abstentions | % Abs/Ins | Votants | ... | Liste.10 | Sièges / Elu.10 | Sièges Secteur.10 | Sièges CC.10 | Voix.10 | % Voix/Ins.10 | % Voix/Exp.10 | date | ville | arrondissement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-03-25 12:52:00 | 69 | LI2 | RHONE | 123SR01 | Lyon secteur 1 | 16482 | 6936 | 42.08 | 9546 | ... | | | | | | | | 2014-03-23 | Lyon | 1 |
| 1 | 2014-03-25 12:52:00 | 69 | LI2 | RHONE | 123SR02 | Lyon secteur 2 | 16863 | 6658 | 39.48 | 10205 | ... | | | | | | | | 2014-03-23 | Lyon | 2 |
| 2 | 2014-03-25 12:52:00 | 69 | LI2 | RHONE | 123SR03 | Lyon secteur 3 | 52133 | 22494 | 43.15 | 29639 | ... | | | | | | | | 2014-03-23 | Lyon | 3 |
| 3 | 2014-03-25 12:52:00 | 69 | LI2 | RHONE | 123SR04 | Lyon secteur 4 | 22557 | 9096 | 40.32 | 13461 | ... | | | | | | | | 2014-03-23 | Lyon | 4 |
| 4 | 2014-03-25 12:52:00 | 69 | LI2 | RHONE | 123SR05 | Lyon secteur 5 | 28373 | 11724 | 41.32 | 16649 | ... | | | | | | | | 2014-03-23 | Lyon | 5 |


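The main difficulty is that party labels (what we'll call "nuances") and party results are stored separately: each list occupies a pair of columns (`Code Nuance`/`Voix`, `Code Nuance.1`/`Voix.1`, ...), and the k-th pair does not hold the same party from one arrondissement to the next (in Lyon, the first pair is `LEXG` in the 1st arrondissement but `LFG` in the 2nd). Our first task is therefore to match each label with its results. Let's begin by isolating the useful columns: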

In [4]:
subset = ["date", "ville", "arrondissement", "Exprimés"]
for n, s in zip(
    m14.filter(like="Code Nuance").columns,
    m14.columns[m14.columns.str.startswith("Voix")],
):
    subset.append(n)
    subset.append(s)
m14 = m14[subset]
m14.head()

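| | date | ville | arrondissement | Exprimés | Code Nuance | Voix | Code Nuance.1 | Voix.1 | Code Nuance.2 | Voix.2 | ... | Code Nuance.6 | Voix.6 | Code Nuance.7 | Voix.7 | Code Nuance.8 | Voix.8 | Code Nuance.9 | Voix.9 | Code Nuance.10 | Voix.10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-03-23 | Lyon | 1 | 9433 | LEXG | 86 | LFG | 3156 | LSOC | 2447 | ... | LFN | 583.0 | | | | | | | | |
| 1 | 2014-03-23 | Lyon | 2 | 10055 | LFG | 487 | LSOC | 2737 | LVEC | 609 | ... | | | | | | | | | | |
| 2 | 2014-03-23 | Lyon | 3 | 29134 | LFG | 1579 | LSOC | 11256 | LVEC | 2854 | ... | LFN | 3603.0 | | | | | | | | |
| 3 | 2014-03-23 | Lyon | 4 | 13199 | LEXG | 123 | LFG | 1323 | LSOC | 4522 | ... | LDIV | 375.0 | LUD | 3493.0 | LFN | 1131.0 | | | | |
| 4 | 2014-03-23 | Lyon | 5 | 16405 | LEXG | 154 | LFG | 752 | LSOC | 5954 | ... | LFN | 1857.0 | | | | | | | | |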
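The cell above pairs label and score columns with `zip`, which matches them by position and silently stops at the shorter list. A sketch of a stricter pairing, assuming (as the output suggests) that repeated headers are numbered `Code Nuance`, `Code Nuance.1`, ... and `Voix`, `Voix.1`, ...; `pair_columns` is a hypothetical helper, not part of the notebook:

```python
import re
from typing import List, Tuple

import pandas as pd


def pair_columns(
    columns: pd.Index, label_prefix: str, score_prefix: str = "Voix"
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each label column with the score column sharing its suffix."""
    # Matches "Code Nuance" and "Code Nuance.k", capturing the optional ".k" suffix
    pattern = re.compile(rf"^{re.escape(label_prefix)}(\.\d+)?$")
    pairs = []
    for col in columns:
        match = pattern.match(str(col))
        if match:
            score_col = score_prefix + (match.group(1) or "")
            if score_col not in columns:
                raise KeyError(f"no score column {score_col!r} for {col!r}")
            pairs.append((col, score_col))
    return pairs


cols = pd.Index(
    ["date", "Exprimés", "Code Nuance", "Voix", "Code Nuance.1", "Voix.1", "% Voix/Exp"]
)
print(pair_columns(cols, "Code Nuance"))
# [('Code Nuance', 'Voix'), ('Code Nuance.1', 'Voix.1')]
```

Matching on the suffix rather than on position means a missing or reordered score column raises an error instead of shifting every later score onto the wrong party.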


The functions `extract_nuances` and `format_results` are designed to do just that: extract the unique nuances competing in this election, and then match each nuance with the corresponding score of the party:

In [5]:
nuances_set = extract_nuances(m14.filter(like="Code Nuance"))
m14 = format_results(m14, nuances_set)

m14["LFG"] = m14["LFG"].fillna(m14["LPG"])  # Parti de Gauche (LPG) belongs to the Front de Gauche: use it where no LFG list ran
m14 = m14.drop("LPG", axis=1)
m14.head()

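| | date | ville | arrondissement | Exprimés | LDIV | LVEC | LFG | LDVD | LUMP | LDVG | LSOC | LFN | LEXG | LUD | LUG | LUDI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-03-23 | Lyon | 1 | 9433 | 293.0 | 1064.0 | 3156.0 | | | | 2447.0 | 583.0 | 86.0 | 1804.0 | | |
| 1 | 2014-03-23 | Lyon | 2 | 10055 | 325.0 | 609.0 | 487.0 | | | | 2737.0 | 1159.0 | | 4738.0 | | |
| 2 | 2014-03-23 | Lyon | 3 | 29134 | 1681.0 | 2854.0 | 1579.0 | | | | 11256.0 | 3603.0 | | 8161.0 | | |
| 3 | 2014-03-23 | Lyon | 4 | 13199 | 630.0 | 1567.0 | 1323.0 | | | 410.0 | 4522.0 | 1131.0 | 123.0 | 3493.0 | | |
| 4 | 2014-03-23 | Lyon | 5 | 16405 | 498.0 | 1340.0 | 752.0 | | | | 5954.0 | 1857.0 | 154.0 | 5850.0 | | |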
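`format_results` loops over rows and adds up a nuance's votes when it appears several times in the same arrondissement (e.g. two `LEXG` lists). The same reshaping can also be sketched with pandas' reshaping tools: stack the (label, votes) pairs into a long table, then `pivot_table(..., aggfunc="sum")` spreads nuances back into columns, summing duplicates and leaving NaN where a nuance did not run. This is an alternative technique, not the notebook's method; `format_results_long` and the toy numbers are hypothetical:

```python
from typing import List

import numpy as np
import pandas as pd


def format_results_long(
    df: pd.DataFrame,
    id_cols: List[str],
    label_prefix: str = "Code Nuance",
    score_prefix: str = "Voix",
) -> pd.DataFrame:
    """Hypothetical vectorized variant of format_results: one column per nuance."""
    label_cols = [
        c for c in df.columns if c == label_prefix or str(c).startswith(label_prefix + ".")
    ]
    pieces = []
    for label_col in label_cols:
        score_col = score_prefix + label_col[len(label_prefix):]  # same ".k" suffix
        piece = df[id_cols + [label_col, score_col]].rename(
            columns={label_col: "nuance", score_col: "score"}
        )
        pieces.append(piece)
    # Long format: one row per (district, list); empty list slots are dropped
    long = pd.concat(pieces, ignore_index=True).dropna(subset=["nuance"])
    # Sum duplicated nuances within a district, then spread nuances into columns
    wide = long.pivot_table(index=id_cols, columns="nuance", values="score", aggfunc="sum")
    wide.columns.name = None
    return wide.reset_index()


# Hypothetical toy data: arrondissement 1 has two LEXG lists
toy = pd.DataFrame(
    {
        "arrondissement": [1, 2],
        "Exprimés": [100, 80],
        "Code Nuance": ["LEXG", "LSOC"],
        "Voix": [10, 50],
        "Code Nuance.1": ["LEXG", "LFN"],
        "Voix.1": [5, 30],
        "Code Nuance.2": ["LSOC", np.nan],
        "Voix.2": [85, 0],
    }
)
print(format_results_long(toy, ["arrondissement", "Exprimés"]))
```

Here arrondissement 1 gets `LEXG = 15` (10 + 5) and NaN for `LFN`, mirroring what the row loop in `format_results` produces.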


Now we have to map each nuance to its colloquial party (bloc) name. Several nuances may share the same ideological leaning or belong to the same alliance, so we have to add them together. The function `attribute_parties` takes care of this, aggregates the remaining parties into an "other" category, drops their individual columns and reorders the columns:

In [6]:
m14 = attribute_parties(m14, nuances_set)
m14["type"] = "municipale"
m14

| | date | ville | arrondissement | N | farleft | left | green | center | right | farright | other | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-03-23 | Lyon | 1 | 9433 | 3156.0 | 2447.0 | 1064.0 | | 1804.0 | 583.0 | 379.0 | municipale |
| 1 | 2014-03-23 | Lyon | 2 | 10055 | 487.0 | 2737.0 | 609.0 | | 4738.0 | 1159.0 | 325.0 | municipale |
| 2 | 2014-03-23 | Lyon | 3 | 29134 | 1579.0 | 11256.0 | 2854.0 | | 8161.0 | 3603.0 | 1681.0 | municipale |
| 3 | 2014-03-23 | Lyon | 4 | 13199 | 1323.0 | 4522.0 | 1567.0 | | 3493.0 | 1131.0 | 1163.0 | municipale |
| 4 | 2014-03-23 | Lyon | 5 | 16405 | 752.0 | 5954.0 | 1340.0 | | 5850.0 | 1857.0 | 652.0 | municipale |
| 5 | 2014-03-23 | Lyon | 6 | 17920 | 561.0 | 4801.0 | 1110.0 | | 8971.0 | 1867.0 | 610.0 | municipale |
| 6 | 2014-03-23 | Lyon | 7 | 19902 | 1543.0 | 7724.0 | 2165.0 | | 4746.0 | 2597.0 | 1127.0 | municipale |
| 7 | 2014-03-23 | Lyon | 8 | 18543 | 1008.0 | 7473.0 | 1435.0 | | 4303.0 | 3421.0 | 903.0 | municipale |
| 8 | 2014-03-23 | Lyon | 9 | 12220 | 686.0 | 5581.0 | 921.0 | | 2701.0 | 1684.0 | 647.0 | municipale |
| 9 | 2014-03-23 | Marseille | 1 | 23480 | 2108.0 | 6331.0 | | | 9063.0 | 3526.0 | 2452.0 | municipale |
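To make the bloc attribution concrete, here is a sketch of the same logic as `attribute_parties`, applied to the first 2014 Lyon row shown above; it reproduces that row's bloc totals (farleft 3156, other 379 = 293 `LDIV` + 86 `LEXG`, ...). `to_blocs` and the affiliation excerpt are hypothetical. One deliberate difference: blocs are summed with `min_count=1`, so a bloc with no list in a district stays NaN, whereas `attribute_parties` returns 0 when it sums two or more absent nuances:

```python
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd

# Hypothetical excerpt of AFFILIATIONS covering the nuances of the row below
AFFILIATIONS_EXCERPT: Dict[str, List[str]] = {
    "farleft": ["LFG", "LEXG"],
    "left": ["LSOC"],
    "green": ["LVEC"],
    "center": ["LCMD"],
    "right": ["LUMP", "LUD"],
    "farright": ["LFN"],
}
PREFERRED_FARLEFT: Tuple[str, ...] = ("LFG", "FG", "MÉLENCHON", "FI")


def to_blocs(
    df: pd.DataFrame,
    affiliations: Dict[str, List[str]],
    id_cols: List[str],
    preferred: Tuple[str, ...] = PREFERRED_FARLEFT,
) -> pd.DataFrame:
    """Hypothetical helper: sum nuance columns into blocs, the rest into 'other'."""
    nuance_cols = [c for c in df.columns if c not in id_cols]
    out = df[id_cols].copy()
    used = set()
    for bloc, labels in affiliations.items():
        members = [c for c in nuance_cols if c in labels]
        preferred_members = [c for c in members if c in preferred]
        if preferred_members:
            # Same rule as attribute_parties: only these count, other far-left lists go to "other"
            members = preferred_members
        # min_count=1 leaves NaN when none of the bloc's lists ran in the district
        out[bloc] = df[members].sum(axis=1, min_count=1) if members else np.nan
        used.update(members)
    leftovers = [c for c in nuance_cols if c not in used]
    out["other"] = df[leftovers].sum(axis=1)
    return out


# First Lyon row of the 2014 output above (one column per nuance)
lyon1 = pd.DataFrame(
    {
        "arrondissement": [1],
        "N": [9433],
        "LDIV": [293.0],
        "LVEC": [1064.0],
        "LFG": [3156.0],
        "LUMP": [np.nan],
        "LSOC": [2447.0],
        "LFN": [583.0],
        "LEXG": [86.0],
        "LUD": [1804.0],
    }
)
blocs = to_blocs(lyon1, AFFILIATIONS_EXCERPT, ["arrondissement", "N"])
print(blocs)
```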


### 2008 city council elections (municipales)

And with that, we're done formatting the 2014 city council elections. Let's now turn to the 2008 results, which have the same structure but are reported per ballot box (bureau de vote), so we must first aggregate them at the district level:

In [7]:
m8 = pd.read_excel("data/raw_election_results_1st_round/munic2008-bdv.xlsx")
m8 = m8.rename(columns={"Libellé de la commune": "ville"})
m8 = m8[m8.ville == "Paris"].sort_values(["Code du b.vote"])
m8["date"] = pd.to_datetime("2008-03-09")

# Retrieve arrondissement from ballot-box number with integer division (e.g. 101 -> 1):
m8["arrondissement"] = (m8["Code du b.vote"] // 100).astype(int)
m8.head()

| | Date de l'export | Code du departement | Libelle du departement | Code de la commune | ville | Code du b.vote | Inscrits | Abstentions | % Abs/Ins | Votants | ... | Sexe.11 | Nom.11 | Prenom.11 | Liste.11 | Sieges.11 | Voix.11 | % Voix/Ins.11 | % Voix/Exp.11 | date | arrondissement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-05-23 16:09:14 | 75 | PARIS | 56 | Paris | 101 | 1047 | 397 | 37.92 | 650 | ... | | | | | | | | | 2008-03-09 | 1 |
| 1 | 2008-05-23 16:09:15 | 75 | PARIS | 56 | Paris | 102 | 887 | 352 | 39.68 | 535 | ... | | | | | | | | | 2008-03-09 | 1 |
| 2 | 2008-05-23 16:09:16 | 75 | PARIS | 56 | Paris | 103 | 1393 | 607 | 43.58 | 786 | ... | | | | | | | | | 2008-03-09 | 1 |
| 3 | 2008-05-23 16:09:16 | 75 | PARIS | 56 | Paris | 104 | 1285 | 535 | 41.63 | 750 | ... | | | | | | | | | 2008-03-09 | 1 |
| 4 | 2008-05-23 16:09:16 | 75 | PARIS | 56 | Paris | 105 | 1000 | 378 | 37.8 | 622 | ... | | | | | | | | | 2008-03-09 | 1 |
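Ballot-box codes appear to carry the arrondissement in their hundreds (codes 101 to 105 all fall in the 1st arrondissement above). Assuming code = 100 * arrondissement + box number, integer division and modulo recover both parts, and agree with splitting the decimal string of `code / 100`. The codes below are hypothetical:

```python
import pandas as pd

# Hypothetical ballot-box codes, assuming code = 100 * arrondissement + box number
codes = pd.Series([101, 102, 1015, 2031])

arrondissement = codes // 100  # integer division drops the box number
box_number = codes % 100  # remainder is the box number within the arrondissement

# Splitting the string of code / 100 on "." gives the same arrondissements
via_strings = (codes / 100).astype(str).str.split(".").str[0].astype(int)

print(arrondissement.tolist())  # [1, 1, 10, 20]
```

Integer division avoids the float-to-string round trip entirely, so it does not depend on how floats are printed.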


The nuances are the same for all ballot boxes of an arrondissement, so when grouping the data by arrondissement we can simply keep the nuances listed for the first ballot box (`first()` actually takes the first non-null value of each column within the group):

In [8]:
nuances_lbls = m8.filter(like="Code Nuance").columns.tolist()
nuances_df = m8[["date", "ville", "arrondissement"] + nuances_lbls]
nuances_df = nuances_df.groupby("arrondissement").first()
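Keeping the `first()` label per arrondissement relies on every ballot box listing the same nuances in the same columns. Since `first()` skips missing values, a box with a missing or different label would go unnoticed. A sketch of a check for this assumption, counting distinct labels (NaN included) per group; `labels_constant_within` and the toy rows are hypothetical:

```python
from typing import List

import pandas as pd


def labels_constant_within(df: pd.DataFrame, group_col: str, label_cols: List[str]) -> bool:
    """Hypothetical check: in every group, each label column holds a single value (NaN included)."""
    counts = df.groupby(group_col)[label_cols].nunique(dropna=False)
    return bool((counts <= 1).all().all())


# Hypothetical ballot-box rows, two boxes per arrondissement
boxes = pd.DataFrame(
    {
        "arrondissement": [1, 1, 2, 2],
        "Code Nuance": ["LEXG", "LEXG", "LSOC", "LSOC"],
        "Code Nuance.1": ["LSOC", "LSOC", None, None],
    }
)
print(labels_constant_within(boxes, "arrondissement", ["Code Nuance", "Code Nuance.1"]))  # True
```

On the real data, the call would pass `m8` and the `nuances_lbls` list defined in the cell above.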

However, the scores of each nuance do change from one ballot box to another. When grouping by arrondissement, we must then sum all of the scores, for each party:

In [9]:
scores_lbls = m8.columns[m8.columns.str.startswith("Voix")].tolist()
scores_df = m8[["arrondissement", "Exprimés"] + scores_lbls]
scores_df = scores_df.groupby("arrondissement").sum()

Now we just have to join these two dataframes to get each nuance and its score, aggregated at the district level:

In [10]:
m8 = nuances_df.join(scores_df).reset_index()
# reorder columns:
reorder = ["date", "ville", "arrondissement", "Exprimés"]
for n, s in zip(nuances_lbls, scores_lbls):
    reorder.append(n)
    reorder.append(s)
m8 = m8[reorder]
m8.head()

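| | date | ville | arrondissement | Exprimés | Code Nuance | Voix | Code Nuance.1 | Voix.1 | Code Nuance.2 | Voix.2 | ... | Code Nuance.7 | Voix.7 | Code Nuance.8 | Voix.8 | Code Nuance.9 | Voix.9 | Code Nuance.10 | Voix.10 | Code Nuance.11 | Voix.11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-03-09 | Paris | 1 | 6127 | LEXG | 75 | LSOC | 2289 | LVEC | 439 | ... | | 0.0 | | 0.0 | | 0.0 | | 0.0 | | 0.0 |
| 1 | 2008-03-09 | Paris | 2 | 6736 | LEXG | 90 | LSOC | 2231 | LVEC | 2016 | ... | | 0.0 | | 0.0 | | 0.0 | | 0.0 | | 0.0 |
| 2 | 2008-03-09 | Paris | 3 | 11974 | LEXG | 92 | LEXG | 133 | LSOC | 6685 | ... | | 0.0 | | 0.0 | | 0.0 | | 0.0 | | 0.0 |
| 3 | 2008-03-09 | Paris | 4 | 10573 | LEXG | 151 | LSOC | 5127 | LVEC | 834 | ... | | 0.0 | | 0.0 | | 0.0 | | 0.0 | | 0.0 |
| 4 | 2008-03-09 | Paris | 5 | 23614 | LEXG | 125 | LEXG | 563 | LSOC | 8187 | ... | LDVD | 140.0 | LFN | 418.0 | | 0.0 | | 0.0 | | 0.0 |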
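The three cells above (first label, summed scores, join) can also be written as a single `groupby(...).agg(...)` call. The sketch below, on hypothetical toy rows, also explains the `0.0` scores in the output: `sum` turns an all-NaN group into 0, while `sum(min_count=1)` keeps NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical ballot-box rows: two boxes in arrondissement 1, one in arrondissement 2
boxes = pd.DataFrame(
    {
        "arrondissement": [1, 1, 2],
        "Exprimés": [300, 250, 400],
        "Code Nuance": ["LEXG", "LEXG", "LSOC"],
        "Voix": [10, 5, 200],
        "Code Nuance.1": ["LSOC", "LSOC", np.nan],
        "Voix.1": [120.0, 100.0, np.nan],
    }
)

label_cols = [c for c in boxes.columns if c.startswith("Code Nuance")]
score_cols = ["Exprimés"] + [c for c in boxes.columns if c.startswith("Voix")]

# One groupby: keep the first (non-null) label, sum the vote counts
agg_spec = {c: "first" for c in label_cols}
agg_spec.update({c: "sum" for c in score_cols})
by_district = boxes.groupby("arrondissement").agg(agg_spec)

# The default sum gives 0.0 for an all-NaN group; min_count=1 keeps it NaN
strict = boxes.groupby("arrondissement")["Voix.1"].sum(min_count=1)
print(by_district)
```

The `0.0` scores are harmless here, because `format_results` skips any slot whose label is missing.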


The data are now in the same format as the 2014 results, so we can apply the same workflow: extract the unique nuances competing in this election, match each nuance with its score, and attribute each nuance to its colloquial party name:

In [11]:
nuances_set = extract_nuances(m8.filter(like="Code Nuance"))
m8 = format_results(m8, nuances_set)
m8 = attribute_parties(m8, nuances_set)
m8["type"] = "municipale"
m8

| | date | ville | arrondissement | N | farleft | left | green | center | right | farright | other | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-03-09 | Paris | 1 | 6127 | 75.0 | 2289.0 | 439.0 | 531.0 | 2641.0 | 152.0 | 0.0 | municipale |
| 1 | 2008-03-09 | Paris | 2 | 6736 | 90.0 | 2231.0 | 2016.0 | 621.0 | 1543.0 | 167.0 | 68.0 | municipale |
| 2 | 2008-03-09 | Paris | 3 | 11974 | 225.0 | 6685.0 | 1237.0 | 1111.0 | 2458.0 | 258.0 | 0.0 | municipale |
| 3 | 2008-03-09 | Paris | 4 | 10573 | 151.0 | 5127.0 | 834.0 | 863.0 | 3312.0 | 286.0 | 0.0 | municipale |
| 4 | 2008-03-09 | Paris | 5 | 23614 | 688.0 | 8187.0 | 1287.0 | 3385.0 | 8958.0 | 418.0 | 691.0 | municipale |
| 5 | 2008-03-09 | Paris | 6 | 15488 | | 5166.0 | 590.0 | 1530.0 | 7269.0 | 356.0 | 577.0 | municipale |
| 6 | 2008-03-09 | Paris | 7 | 17967 | | 4080.0 | 535.0 | 2819.0 | 8894.0 | 537.0 | 1102.0 | municipale |
| 7 | 2008-03-09 | Paris | 8 | 12325 | | 2302.0 | 318.0 | 808.0 | 4119.0 | 293.0 | 4485.0 | municipale |
| 8 | 2008-03-09 | Paris | 9 | 20643 | 602.0 | 10163.0 | 1299.0 | 1659.0 | 6353.0 | 567.0 | 0.0 | municipale |
| 9 | 2008-03-09 | Paris | 10 | 28359 | 2158.0 | 13766.0 | 2564.0 | 2348.0 | 4513.0 | 837.0 | 2173.0 | municipale |
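A useful sanity check on the formatted table: every valid vote should end up in exactly one bloc or in `other`, so the bloc columns should add up to `N`. This holds for the rows shown above (e.g. arrondissement 1: 75 + 2289 + 439 + 531 + 2641 + 152 + 0 = 6127). A minimal sketch, assuming `N` counts only votes cast for the lists in the file; `totals_match` is a hypothetical helper, and the excerpt copies two rows of the output above:

```python
import numpy as np
import pandas as pd

BLOCS = ["farleft", "left", "green", "center", "right", "farright", "other"]


def totals_match(df: pd.DataFrame, blocs=BLOCS, total_col: str = "N") -> np.ndarray:
    """Hypothetical check: True for rows whose bloc votes add up to the valid votes."""
    # DataFrame.sum skips NaN, so a bloc with no list in a district counts as 0 votes
    summed = df[blocs].sum(axis=1)
    return np.isclose(summed.to_numpy(), df[total_col].to_numpy())


# Two rows copied from the 2008 output above (arrondissements 1 and 6)
m8_excerpt = pd.DataFrame(
    {
        "arrondissement": [1, 6],
        "N": [6127, 15488],
        "farleft": [75.0, np.nan],
        "left": [2289.0, 5166.0],
        "green": [439.0, 590.0],
        "center": [531.0, 1530.0],
        "right": [2641.0, 7269.0],
        "farright": [152.0, 356.0],
        "other": [0.0, 577.0],
    }
)
print(totals_match(m8_excerpt).all())  # True
```

A row failing this check would point to a nuance lost during formatting, for instance a duplicated column created by a rename.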


### Other elections

Now let's load and format the district-level results of the higher-level elections. Note that the variable `Code du canton` [indicates the district for Paris](https://fr.geneawiki.com/index.php/Cantons_de_Paris).

As the formatting is almost the same for all files, let's write some handy functions:

In [12]:
def load_and_clean(election: Dict[str, str], header: int = 0) -> pd.DataFrame:
    """
    Load file for given election, select only Paris, add election date and label districts (aka arrondissements).
    """
    df = pd.read_excel(
        f"data/raw_election_results_1st_round/{election['file']}",
        header=header,
        sheet_name=election["sheet"],
    )
    df["Code du département"] = df["Code du département"].astype(str)
    df = df[df["Code du département"] == "75"].reset_index(drop=True)
    df["ville"] = "Paris"
    df["date"] = pd.to_datetime(election["date"])
    # assumes the Paris rows are the 20 cantons, listed in arrondissement order:
    df["arrondissement"] = range(1, 21)

    return df


def select_columns(df: pd.DataFrame, party_col: str) -> pd.DataFrame:
    """
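    Keep the id columns, plus each party-label column followed by its matching "Voix" column.
    party_col: label-column name, either 'Nom', 'Nuance', 'Nuance Liste' or 'Code Nuance'.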
    """
    subset = ["date", "ville", "arrondissement", "Exprimés"]
    for n, s in zip(
        df.filter(like=party_col).columns, df.columns[df.columns.str.startswith("Voix")]
    ):
        subset.append(n)
        subset.append(s)

    return df[subset]

These two functions turn the raw file into a format that we can pass to the `format_results` function defined at the beginning. Let's detail how we do it for the 2017 presidential election, and then we'll process all the elections in one pass:

In [13]:
p = load_and_clean(ELECTIONS["presid2017"])
p = select_columns(p, ELECTIONS["presid2017"]["denom"])
p.head()

| | date | ville | arrondissement | Exprimés | Nom | Voix | Nom.1 | Voix.1 | Nom.2 | Voix.2 | ... | Nom.6 | Voix.6 | Nom.7 | Voix.7 | Nom.8 | Voix.8 | Nom.9 | Voix.9 | Nom.10 | Voix.10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-04-23 | Paris | 1 | 9026 | MACRON | 3561 | FILLON | 2831 | MÉLENCHON | 1231 | ... | ASSELINEAU | 58 | LASSALLE | 57 | POUTOU | 32 | ARTHAUD | 15 | CHEMINADE | 11 |
| 1 | 2017-04-23 | Paris | 2 | 11292 | MACRON | 5014 | FILLON | 2640 | MÉLENCHON | 1802 | ... | ASSELINEAU | 81 | POUTOU | 49 | LASSALLE | 46 | ARTHAUD | 17 | CHEMINADE | 17 |
| 2 | 2017-04-23 | Paris | 3 | 18485 | MACRON | 8325 | FILLON | 3994 | MÉLENCHON | 3078 | ... | ASSELINEAU | 100 | POUTOU | 92 | LASSALLE | 73 | ARTHAUD | 47 | CHEMINADE | 15 |
| 3 | 2017-04-23 | Paris | 4 | 15106 | MACRON | 6182 | FILLON | 3956 | MÉLENCHON | 2329 | ... | ASSELINEAU | 96 | LASSALLE | 83 | POUTOU | 82 | ARTHAUD | 36 | CHEMINADE | 23 |
| 4 | 2017-04-23 | Paris | 5 | 31008 | MACRON | 12316 | FILLON | 8273 | MÉLENCHON | 4960 | ... | ASSELINEAU | 222 | LASSALLE | 178 | POUTOU | 170 | ARTHAUD | 63 | CHEMINADE | 39 |


In [14]:
nuances_set = extract_nuances(p.filter(like=ELECTIONS["presid2017"]["denom"]))
p = format_results(p, nuances_set)
p = attribute_parties(p, nuances_set)
p["type"] = ELECTIONS["presid2017"]["type"]
p

| | date | ville | arrondissement | N | farleft | left | green | center | right | farright | other | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-04-23 | Paris | 1 | 9026 | 1231 | 659 | | 3561 | 2831 | 443 | 301 | president |
| 1 | 2017-04-23 | Paris | 2 | 11292 | 1802 | 1099 | | 5014 | 2640 | 399 | 338 | president |
| 2 | 2017-04-23 | Paris | 3 | 18485 | 3078 | 1963 | | 8325 | 3994 | 615 | 510 | president |
| 3 | 2017-04-23 | Paris | 4 | 15106 | 2329 | 1370 | | 6182 | 3956 | 735 | 534 | president |
| 4 | 2017-04-23 | Paris | 5 | 31008 | 4960 | 3103 | | 12316 | 8273 | 1225 | 1131 | president |
| 5 | 2017-04-23 | Paris | 6 | 22332 | 2038 | 1419 | | 8729 | 8769 | 719 | 658 | president |
| 6 | 2017-04-23 | Paris | 7 | 27798 | 1552 | 1068 | | 8785 | 14650 | 1064 | 679 | president |
| 7 | 2017-04-23 | Paris | 8 | 20698 | 1392 | 849 | | 6568 | 10448 | 916 | 525 | president |
| 8 | 2017-04-23 | Paris | 9 | 32940 | 4783 | 3163 | | 14029 | 8879 | 1092 | 994 | president |
| 9 | 2017-04-23 | Paris | 10 | 44766 | 11396 | 6343 | | 16880 | 6724 | 1817 | 1606 | president |


Got it? Let's do the same thing for all the elections at the same time now:

In [15]:
results = []
for election in ELECTIONS.values():
    df = load_and_clean(election)
    df = select_columns(df, election["denom"])

    nuances_set = extract_nuances(df.filter(like=election["denom"]))
    df = format_results(df, nuances_set)
    df = attribute_parties(df, nuances_set)
    df["type"] = election["type"]

    results.append(df)

results = (
    pd.concat(results).sort_values(["date", "arrondissement"]).reset_index(drop=True)
)
results

| | date | ville | arrondissement | N | farleft | left | green | center | right | farright | other | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2007-04-22 | Paris | 1 | 9152 | 239.0 | 2530.0 | 205.0 | 2051.0 | 3595.0 | 418.0 | 114.0 | president |
| 1 | 2007-04-22 | Paris | 2 | 10461 | 331.0 | 3642.0 | 337.0 | 2398.0 | 3280.0 | 360.0 | 113.0 | president |
| 2 | 2007-04-22 | Paris | 3 | 18198 | 606.0 | 6729.0 | 515.0 | 4331.0 | 5316.0 | 545.0 | 156.0 | president |
| 3 | 2007-04-22 | Paris | 4 | 15534 | 471.0 | 4967.0 | 377.0 | 3545.0 | 5287.0 | 663.0 | 224.0 | president |
| 4 | 2007-04-22 | Paris | 5 | 31937 | 998.0 | 10099.0 | 760.0 | 7803.0 | 10615.0 | 1247.0 | 415.0 | president |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 2017-06-11 | Paris | 16 | 51079 | 1161.0 | 1157.0 | 803.0 | 22759.0 | 19453.0 | 1675.0 | 4071.0 | legislative |
| 196 | 2017-06-11 | Paris | 17 | 56199 | 4142.0 | 4227.0 | 2040.0 | 26291.0 | 14701.0 | 1655.0 | 3143.0 | legislative |
| 197 | 2017-06-11 | Paris | 18 | 54371 | 9339.0 | 8969.0 | 6474.0 | 6828.0 | 12097.0 | 2039.0 | 8625.0 | legislative |
| 198 | 2017-06-11 | Paris | 19 | 50024 | 9798.0 | 4488.0 | 5463.0 | 18429.0 | 3790.0 | 2129.0 | 5927.0 | legislative |
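Because `load_and_clean` assigns arrondissements 1 to 20 by position, it is worth checking that every election in `results` covers all 20 arrondissements. A sketch, with a hypothetical `incomplete_elections` helper and toy data in which the second election is missing a district:

```python
import pandas as pd


def incomplete_elections(df: pd.DataFrame, expected: int = 20) -> pd.Series:
    """Hypothetical check: (date, type) groups without `expected` distinct arrondissements."""
    counts = df.groupby(["date", "type"])["arrondissement"].nunique()
    return counts[counts != expected]


# Hypothetical toy frame: the 2017 legislative election lacks arrondissement 20
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2007-04-22"] * 20 + ["2017-06-11"] * 19),
        "type": ["president"] * 20 + ["legislative"] * 19,
        "arrondissement": list(range(1, 21)) + list(range(1, 20)),
    }
)
print(incomplete_elections(toy))
```

An empty result on the real `results` dataframe would confirm that no election lost or gained a district along the way.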


And now we just have to concatenate the dataframes (keeping only Paris from the 2014 city council results) and save the formatted results. After that, we'll start working on our model, yay!

In [16]:
pd.concat([m8, m14[m14.ville == "Paris"], results]).sort_values(
    ["arrondissement", "date"]
).reset_index(drop=True).to_excel("data/results_by_districts_paris.xlsx")

In [17]:
%load_ext watermark
%watermark -a AlexAndorra -n -u -v -iv

numpy  1.17.3
pandas 0.25.2
AlexAndorra 
last updated: Wed Jan 29 2020 

CPython 3.7.5
IPython 7.9.0
