This notebook formats first-round results for the 2008 and 2014 city council elections, as well as the 2010 and 2015 regional, 2012 and 2017 legislative, 2014 European and 2017 presidential elections, grouped into the six main political families. We can then use these data to model election results in Paris at the district level ("arrondissement") - we'll do that in another notebook.

In [8]:
%load_ext lab_black
%load_ext watermark

import numpy as np
import pandas as pd

from typing import Set

AFFILIATIONS = {
    "farleft": ["LFG", "FG", "LEXG", "LCOP", "MÉLENCHON", "FI"],
    "left": ["LUG", "LSOC", "HAMON", "RDG", "SOC"],
    "green": ["LVEC", "VEC", "ECO"],
    "center": ["LCMD", "MACRON", "MDM", "REM", "CEN", "NCE", "LUC"],
    "right": ["LUMP", "LUDI", "LUD", "LMAJ", "FILLON", "LR", "UDI", "UMP"],
    "farright": ["LFN", "LE PEN", "FN"],
}

DATES_ELECTIONS = {
    "euro2009": "2009-06-07",
    "euro2014": "2014-05-25",
    "legis2007": "2007-06-10",
    "legis2012": "2012-06-10",
    "legis2017": "2017-06-11",
    "presid2007": "2007-04-22",
    "presid2012": "2012-04-22",
    "presid2017": "2017-04-23",
    "regio2010": "2010-03-14",
    "regio2015": "2015-12-06",
}



First, here are some helper functions:

In [2]:
def extract_nuances(nuances_df: pd.DataFrame) -> Set[str]:
    """
    Extract the nuances competing in this election.
    From the dataframe of nuances, we check each column for each line. 
    If the cell is not empty and the nuance is not already counted, we add it to the set of nuances.
    """
    nuances_set = set()

    for _, line in nuances_df.iterrows():
        for col in nuances_df.columns:
            if pd.notnull(line[col]):
                nuances_set.update({line[col]})

    return nuances_set


def format_results(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    Take the raw df, for each line switch the nuance's label to column name, 
    and match it with the corresponding score of this party.
    Return a dataframe with the proper format.
    """
    res = {
        "date": df.date.values,
        "ville": df.ville.values,
        "arrondissement": df.arrondissement.values,
        "Exprimés": df["Exprimés"].values,
    }
    res.update({nuance: [] for nuance in nuances_set})

    if not df.filter(like="Code Nuance").columns.empty:
        nuances_lbls = df.filter(like="Code Nuance").columns
    elif not df.filter(like="Nuance Liste").columns.empty:
        nuances_lbls = df.filter(like="Nuance Liste").columns
    elif not df.filter(like="Nom").columns.empty:
        nuances_lbls = df.filter(like="Nom").columns
    scores_lbls = df.filter(like="Voix").columns

    # each line is an arrondissement:
    for _, line in df.iterrows():
        tempset = nuances_set.copy()

        # iterate over nuances in line:
        for n, s in zip(nuances_lbls, scores_lbls):
            name = line[n]
            score = line[s]
            if pd.notnull(name):
                # if 1st time we see this nuance in this line:
                if name in tempset:
                    res[name].append(score)
                    tempset.remove(name)
                # if we already saw this nuance in this line:
                else:
                    res[name][-1] += score
        # if nuance still in tempset after iteration, then it's not competing in this arrondissement:
        for nuance in tempset:
            res[nuance].append(np.nan)

    return pd.DataFrame(data=res)


def attribute_parties(df: pd.DataFrame, nuances_set: Set[str]) -> pd.DataFrame:
    """
    From a dataframe with general party denomination, attribute colloquial party names and 
    add parties with same nuance.
    Then aggregate the rest of the parties, drop all useless ones, and reorder columns.
    """
    for p in AFFILIATIONS.keys():
        # which candidate represents the party this year?
        intersection = list(nuances_set & set(AFFILIATIONS[p]))
        # when a Front de Gauche label (LFG/FG) is present, use it alone for farleft:
        if ("LFG" in intersection) or ("FG" in intersection):
            df = df.rename(columns={"LFG": p, "FG": p})
        else:
            # add candidates with same nuance, then drop:
            if len(intersection) >= 2:
                df[p] = df[intersection].sum(axis=1)
                df.drop(intersection, axis=1, inplace=True)
            # rename column of only candidate of this party:
            elif len(intersection) == 1:
                df = df.rename(columns={intersection[0]: p})

    # aggregate other parties:
    core_cols = ["date", "ville", "arrondissement", "Exprimés"] + list(
        AFFILIATIONS.keys()
    )
    rest = df[df.columns.difference(core_cols)]
    df["other"] = rest.sum(axis=1)
    df.drop(rest.columns, axis=1, inplace=True)

    # reorder columns:
    df = df.rename(columns={"Exprimés": "N"})
    df = df.reindex(
        ["date", "ville", "arrondissement", "N"]
        + list(AFFILIATIONS.keys())
        + ["other"],
        axis=1,
    )

    return df

### Municipales 2014
Let's begin by formatting 2014 city council election results - as they are already at the district level ("arrondissement") it will be easier.

In [3]:
d = pd.read_excel("data/election_results_1st_round/munic2014-ardmnt.xlsx")
d["date"] = d["Date de l'export"].dt.normalize()  # only interested in the date
# "Lyon secteur 1" -> ville "Lyon", arrondissement 1
parts = d["Libellé de la commune"].str.split(expand=True)
d["ville"] = parts[0]
d["arrondissement"] = parts[2].astype(int)
d = d.sort_values(["ville", "arrondissement"])
d.head()

Unnamed: 0,Date de l'export,Code du département,Type de scrutin,Libellé du département,Code de la commune,Libellé de la commune,Inscrits,Abstentions,% Abs/Ins,Votants,...,Liste.10,Sièges / Elu.10,Sièges Secteur.10,Sièges CC.10,Voix.10,% Voix/Ins.10,% Voix/Exp.10,date,ville,arrondissement
0,2014-03-25 12:52:00,69,LI2,RHONE,123SR01,Lyon secteur 1,16482,6936,42.08,9546,...,,,,,,,,2014-03-25,Lyon,1
1,2014-03-25 12:52:00,69,LI2,RHONE,123SR02,Lyon secteur 2,16863,6658,39.48,10205,...,,,,,,,,2014-03-25,Lyon,2
2,2014-03-25 12:52:00,69,LI2,RHONE,123SR03,Lyon secteur 3,52133,22494,43.15,29639,...,,,,,,,,2014-03-25,Lyon,3
3,2014-03-25 12:52:00,69,LI2,RHONE,123SR04,Lyon secteur 4,22557,9096,40.32,13461,...,,,,,,,,2014-03-25,Lyon,4
4,2014-03-25 12:52:00,69,LI2,RHONE,123SR05,Lyon secteur 5,28373,11724,41.32,16649,...,,,,,,,,2014-03-25,Lyon,5


The main difficulty lies in the fact that party labels (what we'll call "nuances") and party results are separated (they each have their dedicated column). Our first task is then to match each label with its results. Let's begin by isolating the useful columns:

In [4]:
subset = ["date", "ville", "arrondissement", "Exprimés"]
for n, s in zip(
    d.filter(like="Code Nuance").columns, d.columns[d.columns.str.startswith("Voix")]
):
    subset.append(n)
    subset.append(s)
d = d[subset]
d.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,Code Nuance,Voix,Code Nuance.1,Voix.1,Code Nuance.2,Voix.2,...,Code Nuance.6,Voix.6,Code Nuance.7,Voix.7,Code Nuance.8,Voix.8,Code Nuance.9,Voix.9,Code Nuance.10,Voix.10
0,2014-03-25,Lyon,1,9433,LEXG,86,LFG,3156,LSOC,2447,...,LFN,583.0,,,,,,,,
1,2014-03-25,Lyon,2,10055,LFG,487,LSOC,2737,LVEC,609,...,,,,,,,,,,
2,2014-03-25,Lyon,3,29134,LFG,1579,LSOC,11256,LVEC,2854,...,LFN,3603.0,,,,,,,,
3,2014-03-25,Lyon,4,13199,LEXG,123,LFG,1323,LSOC,4522,...,LDIV,375.0,LUD,3493.0,LFN,1131.0,,,,
4,2014-03-25,Lyon,5,16405,LEXG,154,LFG,752,LSOC,5954,...,LFN,1857.0,,,,,,,,
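
As a side note on the column selection above, here is a minimal sketch on a toy frame (the column values are made up for illustration) of why the vote columns are picked with `str.startswith("Voix")` rather than `filter(like="Voix")`: the latter would presumably also catch percentage columns such as `% Voix/Exp`.

```python
import pandas as pd

# Toy frame mimicking the raw layout; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "Exprimés": [100],
        "Code Nuance": ["LSOC"],
        "Voix": [60],
        "% Voix/Exp": [60.0],
        "Code Nuance.1": ["LUMP"],
        "Voix.1": [40],
        "% Voix/Exp.1": [40.0],
    }
)

label_cols = toy.filter(like="Code Nuance").columns
# filter(like="Voix") matches any column *containing* "Voix", percentages included:
loose_match = toy.filter(like="Voix").columns.tolist()
# startswith keeps only the raw vote counts:
vote_cols = toy.columns[toy.columns.str.startswith("Voix")]

# Interleave label and vote columns, as the loop above does:
interleaved = [col for pair in zip(label_cols, vote_cols) for col in pair]
print(interleaved)
```

Because `zip` pairs the columns by position, this relies on labels and votes appearing in the same order in the raw file.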


The functions `extract_nuances` and `format_results` are designed to do just that: extract the unique nuances competing in this election, and then match each nuance with the corresponding score of the party:

In [5]:
nuances_set = extract_nuances(d.filter(like="Code Nuance"))
d = format_results(d, nuances_set)

d["LFG"] = d["LFG"].fillna(d["LPG"])  # same party
d = d.drop("LPG", axis=1)
d.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,LDVG,LVEC,LFN,LFG,LEXG,LUD,LSOC,LUDI,LUMP,LDIV,LDVD,LUG
0,2014-03-25,Lyon,1,9433,,1064.0,583.0,3156.0,86.0,1804.0,2447.0,,,293.0,,
1,2014-03-25,Lyon,2,10055,,609.0,1159.0,487.0,,4738.0,2737.0,,,325.0,,
2,2014-03-25,Lyon,3,29134,,2854.0,3603.0,1579.0,,8161.0,11256.0,,,1681.0,,
3,2014-03-25,Lyon,4,13199,410.0,1567.0,1131.0,1323.0,123.0,3493.0,4522.0,,,630.0,,
4,2014-03-25,Lyon,5,16405,,1340.0,1857.0,752.0,154.0,5850.0,5954.0,,,498.0,,
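
For readers who prefer a vectorized formulation, the same reshaping can be sketched with a stack-then-pivot approach. This is not the notebook's `format_results`, only an alternative under the assumption that label and vote columns are paired by position; the toy data are invented.

```python
import numpy as np
import pandas as pd

# Toy data: arrondissement 1 has two LEXG lists, arrondissement 2 has a single list.
raw = pd.DataFrame(
    {
        "arrondissement": [1, 2],
        "Code Nuance": ["LEXG", "LSOC"],
        "Voix": [10, 40],
        "Code Nuance.1": ["LSOC", None],
        "Voix.1": [30, np.nan],
        "Code Nuance.2": ["LEXG", None],
        "Voix.2": [2, np.nan],
    }
)

label_cols = raw.filter(like="Code Nuance").columns
vote_cols = raw.columns[raw.columns.str.startswith("Voix")]

# Stack each (label, votes) pair into a long table with one row per list:
pairs = [
    raw[["arrondissement", n, s]].rename(columns={n: "nuance", s: "voix"})
    for n, s in zip(label_cols, vote_cols)
]
long = pd.concat(pairs, ignore_index=True).dropna(subset=["nuance"])

# One column per nuance; lists sharing a nuance in the same arrondissement are summed,
# and a nuance that did not compete in an arrondissement stays NaN:
wide = long.pivot_table(
    index="arrondissement", columns="nuance", values="voix", aggfunc="sum"
)
print(wide)
```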


Now we have to map each official nuance label to its colloquial political family. Some labels share the same ideological leaning, or correspond to parties that formed an alliance, so we have to add their scores together. The function `attribute_parties` takes care of this, then aggregates all remaining labels into the category "other" and drops the columns that are no longer needed:

In [6]:
d = attribute_parties(d, nuances_set)
d["type"] = "municipale"
d

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2014-03-25,Lyon,1,9433,3156.0,2447.0,1064.0,,1804.0,583.0,379.0,municipale
1,2014-03-25,Lyon,2,10055,487.0,2737.0,609.0,,4738.0,1159.0,325.0,municipale
2,2014-03-25,Lyon,3,29134,1579.0,11256.0,2854.0,,8161.0,3603.0,1681.0,municipale
3,2014-03-25,Lyon,4,13199,1323.0,4522.0,1567.0,,3493.0,1131.0,1163.0,municipale
4,2014-03-25,Lyon,5,16405,752.0,5954.0,1340.0,,5850.0,1857.0,652.0,municipale
5,2014-03-25,Lyon,6,17920,561.0,4801.0,1110.0,,8971.0,1867.0,610.0,municipale
6,2014-03-25,Lyon,7,19902,1543.0,7724.0,2165.0,,4746.0,2597.0,1127.0,municipale
7,2014-03-25,Lyon,8,18543,1008.0,7473.0,1435.0,,4303.0,3421.0,903.0,municipale
8,2014-03-25,Lyon,9,12220,686.0,5581.0,921.0,,2701.0,1684.0,647.0,municipale
9,2014-03-25,Marseille,1,23480,2108.0,6331.0,,,9063.0,3526.0,2452.0,municipale
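
The grouping step can be illustrated on a toy frame. The mapping below is a hypothetical, reduced version of `AFFILIATIONS`, and the numbers are invented. The sketch also shows one subtlety of row-wise sums: a plain `sum(axis=1)` returns 0 for a row with no values at all, whereas `min_count=1` keeps it as `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced mapping, for illustration only:
families = {"left": ["LSOC", "LUG"], "right": ["LUMP", "LUD"]}

votes = pd.DataFrame(
    {
        "LSOC": [100.0, np.nan],
        "LUG": [np.nan, 80.0],
        "LUMP": [50.0, 60.0],
        "LDIV": [5.0, np.nan],
    }
)

grouped = pd.DataFrame(index=votes.index)
for family, labels in families.items():
    present = [c for c in labels if c in votes.columns]
    # min_count=1: a family with no list in a row stays NaN instead of becoming 0
    grouped[family] = votes[present].sum(axis=1, min_count=1)

# Everything that is not mapped to a family goes into "other":
mapped = [c for labels in families.values() for c in labels if c in votes.columns]
grouped["other"] = votes.drop(columns=mapped).sum(axis=1)
print(grouped)
```

Whether an absent family should read as 0 or as missing is a modeling choice; the sketch only makes the two behaviors visible.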


### Municipales 2008

And with that, we're done formatting 2014 city council elections. Let's now turn to the 2008 election results, which present the same structure, but are disaggregated at the ballot box level - so we must first aggregate them at the district level:

In [7]:
df = pd.read_excel("data/election_results_1st_round/munic2008-bdv.xlsx")
df = df.rename(columns={"Libellé de la commune": "ville"})
df = df[df.ville == "Paris"].sort_values(["Code du b.vote"])
df["date"] = df["Date de l'export"].dt.normalize()

# Retrieve arrondissement from ballot-box number:
df["arrondissement"] = (df["Code du b.vote"] // 100).astype(int)
df.head()

Unnamed: 0,Date de l'export,Code du departement,Libelle du departement,Code de la commune,ville,Code du b.vote,Inscrits,Abstentions,% Abs/Ins,Votants,...,Sexe.11,Nom.11,Prenom.11,Liste.11,Sieges.11,Voix.11,% Voix/Ins.11,% Voix/Exp.11,date,arrondissement
0,2008-05-23 16:09:14,75,PARIS,56,Paris,101,1047,397,37.92,650,...,,,,,,,,,2008-05-23,1
1,2008-05-23 16:09:15,75,PARIS,56,Paris,102,887,352,39.68,535,...,,,,,,,,,2008-05-23,1
2,2008-05-23 16:09:16,75,PARIS,56,Paris,103,1393,607,43.58,786,...,,,,,,,,,2008-05-23,1
3,2008-05-23 16:09:16,75,PARIS,56,Paris,104,1285,535,41.63,750,...,,,,,,,,,2008-05-23,1
4,2008-05-23 16:09:16,75,PARIS,56,Paris,105,1000,378,37.8,622,...,,,,,,,,,2008-05-23,1


The nuances are the same for all ballot boxes in a given arrondissement, so when we group the data by arrondissement, we can just take the nuances listed for the first ballot box:

In [8]:
nuances_lbls = df.filter(like="Code Nuance").columns.tolist()
nuances_df = df[["date", "ville", "arrondissement"] + nuances_lbls]
nuances_df = nuances_df.groupby("arrondissement").first()

However, the scores of each nuance do change from one ballot box to another. When grouping by arrondissement, we must then sum all of the scores, for each party:

In [9]:
scores_lbls = df.columns[df.columns.str.startswith("Voix")].tolist()
scores_df = df[["arrondissement", "Exprimés"] + scores_lbls]
scores_df = scores_df.groupby("arrondissement").sum()

Now we just have to join those two dataframes and we'll get each nuance and its score, aggregated at the district level:

In [10]:
df = nuances_df.join(scores_df).reset_index()
# reorder columns:
reorder = ["date", "ville", "arrondissement", "Exprimés"]
for n, s in zip(nuances_lbls, scores_lbls):
    reorder.append(n)
    reorder.append(s)
df = df[reorder]
df.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,Code Nuance,Voix,Code Nuance.1,Voix.1,Code Nuance.2,Voix.2,...,Code Nuance.7,Voix.7,Code Nuance.8,Voix.8,Code Nuance.9,Voix.9,Code Nuance.10,Voix.10,Code Nuance.11,Voix.11
0,2008-05-23,Paris,1,6127,LEXG,75,LSOC,2289,LVEC,439,...,,0.0,,0.0,,0.0,,0.0,,0.0
1,2008-05-23,Paris,2,6736,LEXG,90,LSOC,2231,LVEC,2016,...,,0.0,,0.0,,0.0,,0.0,,0.0
2,2008-05-23,Paris,3,11974,LEXG,92,LEXG,133,LSOC,6685,...,,0.0,,0.0,,0.0,,0.0,,0.0
3,2008-05-23,Paris,4,10573,LEXG,151,LSOC,5127,LVEC,834,...,,0.0,,0.0,,0.0,,0.0,,0.0
4,2008-05-23,Paris,5,23614,LEXG,125,LEXG,563,LSOC,8187,...,LDVD,140.0,LFN,418.0,,0.0,,0.0,,0.0
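
The "first label, summed votes" aggregation above can also be sketched as a single `agg` call. The toy frame below is invented, and `code_bv` is a hypothetical stand-in for the ballot-box code column.

```python
import pandas as pd

# Toy ballot boxes: 101 and 102 belong to arrondissement 1, 201 to arrondissement 2.
boxes = pd.DataFrame(
    {
        "code_bv": [101, 102, 201],
        "Code Nuance": ["LSOC", "LSOC", "LVEC"],
        "Voix": [10, 20, 7],
    }
)
# The arrondissement is the ballot-box code without its last two digits:
boxes["arrondissement"] = boxes["code_bv"] // 100

# Labels are constant within an arrondissement, so "first" is enough; votes are summed.
by_arrdt = boxes.groupby("arrondissement").agg({"Code Nuance": "first", "Voix": "sum"})
print(by_arrdt)
```

With many label and vote columns, the aggregation dictionary could be built from the two column lists instead of being written by hand.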


The data are now in the same format as the 2014 results, so we can have the same workflow: extract the unique nuances competing in this election, match each nuance with its score, and attribute each nuance to its colloquial party name:

In [11]:
nuances_set = extract_nuances(df.filter(like="Code Nuance"))
df = format_results(df, nuances_set)
df = attribute_parties(df, nuances_set)
df["type"] = "municipale"
df

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2008-05-23,Paris,1,6127,75.0,2289.0,439.0,531.0,2641.0,152.0,0.0,municipale
1,2008-05-23,Paris,2,6736,90.0,2231.0,2016.0,621.0,1543.0,167.0,68.0,municipale
2,2008-05-23,Paris,3,11974,225.0,6685.0,1237.0,1111.0,2458.0,258.0,0.0,municipale
3,2008-05-23,Paris,4,10573,151.0,5127.0,834.0,863.0,3312.0,286.0,0.0,municipale
4,2008-05-23,Paris,5,23614,688.0,8187.0,1287.0,3385.0,8958.0,418.0,691.0,municipale
5,2008-05-23,Paris,6,15488,,5166.0,590.0,1530.0,7269.0,356.0,577.0,municipale
6,2008-05-23,Paris,7,17967,,4080.0,535.0,2819.0,8894.0,537.0,1102.0,municipale
7,2008-05-23,Paris,8,12325,,2302.0,318.0,808.0,4119.0,293.0,4485.0,municipale
8,2008-05-23,Paris,9,20643,602.0,10163.0,1299.0,1659.0,6353.0,567.0,0.0,municipale
9,2008-05-23,Paris,10,28359,2158.0,13766.0,2564.0,2348.0,4513.0,837.0,2173.0,municipale


### Other elections

Now let's load and format the district-level results for several elections (2010 regional, 2012 legislative, 2014 European, 2015 regional, 2017 presidential and legislative). Note that the variable `Code du canton` [indicates the district for Paris](https://fr.geneawiki.com/index.php/Cantons_de_Paris).

As the formatting is almost the same for all files, let's first inspect one of them (the 2007 legislative results) and then write some handy functions:

In [9]:
df = pd.read_excel(
    "data/election_results_1st_round/leg2007.xls", sheet_name="Cantons T1"
)
df = df[df["Code du département"] == "75"].reset_index(drop=True)
df["ville"] = "Paris"
df["date"] = pd.to_datetime(DATES_ELECTIONS["legis2007"])
df["arrondissement"] = range(1, 21)
df

Unnamed: 0,Code du département,Code du canton,Libellé du canton,Inscrits,Abstentions,% Abs/Ins,Votants,% Vot/Ins,Blancs et nuls,Exprimés,...,Nuance.39,Voix.39,% Voix/Exp.39,Unnamed: 252,Unnamed: 253,Unnamed: 254,Unnamed: 255,ville,date,arrondissement
0,75,15,C,10813,3938,36.42,6875,63.58,50,6825,...,,,,,,,,Paris,2007-06-10,1
1,75,16,C,12367,4846,39.18,7521,60.82,45,7476,...,,,,,,,,Paris,2007-06-10,2
2,75,17,C,21496,8055,37.47,13441,62.53,113,13328,...,,,,,,,,Paris,2007-06-10,3
3,75,18,C,18373,6650,36.19,11723,63.81,109,11614,...,,,,,,,,Paris,2007-06-10,4
4,75,19,C,37153,11677,31.43,25476,68.57,232,25244,...,,,,,,,,Paris,2007-06-10,5
5,75,20,C,27475,9716,35.36,17759,64.64,176,17583,...,,,,,,,,Paris,2007-06-10,6
6,75,21,C,32759,11894,36.31,20865,63.69,122,20743,...,,,,,,,,Paris,2007-06-10,7
7,75,22,C,23768,9088,38.24,14680,61.76,115,14565,...,,,,,,,,Paris,2007-06-10,8
8,75,23,C,35833,12799,35.72,23034,64.28,152,22882,...,,,,,,,,Paris,2007-06-10,9
9,75,24,C,49775,19575,39.33,30200,60.67,279,29921,...,,,,,,,,Paris,2007-06-10,10
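
Labeling arrondissements with `range(1, 21)` relies on the rows already being in canton order. A slightly more defensive sketch is given below; `label_arrondissements` is a hypothetical helper, and it assumes that canton codes increase with the arrondissement number, as the rows shown above suggest.

```python
import pandas as pd


def label_arrondissements(df: pd.DataFrame, expected: int = 20) -> pd.DataFrame:
    """Hypothetical helper: number rows 1..expected in increasing canton-code order."""
    if len(df) != expected or df["Code du canton"].duplicated().any():
        raise ValueError(f"expected {expected} distinct cantons, got {len(df)} rows")
    out = df.sort_values("Code du canton").reset_index(drop=True)
    out["arrondissement"] = range(1, expected + 1)
    return out


# Three rows taken from the table above, deliberately shuffled:
toy = pd.DataFrame({"Code du canton": [17, 15, 16], "Exprimés": [13328, 6825, 7476]})
labeled = label_arrondissements(toy, expected=3)
print(labeled)
```

The length check makes a filtering mistake (for example, a department code read as an integer rather than a string) fail loudly instead of raising an opaque length-mismatch error.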


In [11]:
subset = ["date", "ville", "arrondissement", "Exprimés"]
for n, s in zip(
    df.filter(like="Nuance").columns, df.columns[df.columns.str.startswith("Voix")]
):
    subset.append(n)
    subset.append(s)

df = df[subset]

In [12]:
extract_nuances(df.filter(like="Nuance"))

{'COM',
 'DIV',
 'DVD',
 'DVG',
 'ECO',
 'EXD',
 'EXG',
 'FN',
 'MAJ',
 'MPF',
 'RDG',
 'SOC',
 'UDFD',
 'UMP',
 'VEC'}

In [12]:
def load_and_clean(election: str) -> pd.DataFrame:
    """
    Load file for given election, select only Paris, add election date and label arrondissements.
    """
    header = 0
    sheet = "Cantons"
    if "euro" in election:
        date = "2014-05-25"
    elif "leg" in election:
        sheet = "Cantons T1"
        if "2012" in election:
            date = "2012-06-10"
        elif "2017" in election:
            date = "2017-06-11"
    elif "pres" in election:
        header = 3
        sheet = "Canton Tour 1"
        date = "2017-04-23"
    elif "reg" in election:
        if "2010" in election:
            sheet = "Cantons T1"
            date = "2010-03-14"
        elif "2015" in election:
            date = "2015-12-06"

    df = pd.read_excel(
        f"data/election_results_1st_round/{election}", header=header, sheet_name=sheet
    ).rename(columns={"Libellé du département": "ville"})
    df["ville"] = df.ville.str.title()
    df = df[df.ville == "Paris"].reset_index(drop=True)
    df["date"] = pd.to_datetime(date)
    df["arrondissement"] = range(1, 21)

    return df


def select_columns(df: pd.DataFrame, party_col: str) -> pd.DataFrame:
    """
    party_col: either 'Nom', 'Nuance' or 'Code Nuance'.
    """
    subset = ["date", "ville", "arrondissement", "Exprimés"]
    for n, s in zip(
        df.filter(like=party_col).columns, df.columns[df.columns.str.startswith("Voix")]
    ):
        subset.append(n)
        subset.append(s)

    return df[subset]
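
As a design alternative to the `if`/`elif` chain in `load_and_clean`, the per-file settings could live in a lookup table. The sketch below is only an illustration: `ELECTION_FILES` and `read_options` are hypothetical names, and the sheet names, header rows and dates are copied from the function above.

```python
import pandas as pd

# Hypothetical lookup table: file name -> (sheet name, header row, election date).
ELECTION_FILES = {
    "euro2014.xlsx": ("Cantons", 0, "2014-05-25"),
    "leg2012.xls": ("Cantons T1", 0, "2012-06-10"),
    "leg2017.xlsx": ("Cantons T1", 0, "2017-06-11"),
    "pres2017.xls": ("Canton Tour 1", 3, "2017-04-23"),
    "reg2010.xls": ("Cantons T1", 0, "2010-03-14"),
    "reg2015.xlsx": ("Cantons", 0, "2015-12-06"),
}


def read_options(election: str) -> dict:
    """Return the reading options for one file, failing loudly on unknown files."""
    try:
        sheet, header, date = ELECTION_FILES[election]
    except KeyError:
        raise ValueError(f"unknown election file: {election!r}") from None
    return {"sheet_name": sheet, "header": header, "date": pd.to_datetime(date)}


print(read_options("pres2017.xls"))
```

An unknown file name then raises immediately, whereas the chain above would leave `date` undefined for, say, a legislative file from a year it does not list.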

These two functions basically turn the raw file into a format that we can give to the `format_results` function from the beginning. Let's detail how we do it for the 2017 presidential elections, and then we'll do all the other elections in one pass:

In [13]:
p = load_and_clean("pres2017.xls")
p = select_columns(p, "Nom")
p.head()

Unnamed: 0,date,ville,arrondissement,Exprimés,Nom,Voix,Nom.1,Voix.1,Nom.2,Voix.2,...,Nom.6,Voix.6,Nom.7,Voix.7,Nom.8,Voix.8,Nom.9,Voix.9,Nom.10,Voix.10
0,2017-04-23,Paris,1,9026,MACRON,3561,FILLON,2831,MÉLENCHON,1231,...,ASSELINEAU,58,LASSALLE,57,POUTOU,32,ARTHAUD,15,CHEMINADE,11
1,2017-04-23,Paris,2,11292,MACRON,5014,FILLON,2640,MÉLENCHON,1802,...,ASSELINEAU,81,POUTOU,49,LASSALLE,46,ARTHAUD,17,CHEMINADE,17
2,2017-04-23,Paris,3,18485,MACRON,8325,FILLON,3994,MÉLENCHON,3078,...,ASSELINEAU,100,POUTOU,92,LASSALLE,73,ARTHAUD,47,CHEMINADE,15
3,2017-04-23,Paris,4,15106,MACRON,6182,FILLON,3956,MÉLENCHON,2329,...,ASSELINEAU,96,LASSALLE,83,POUTOU,82,ARTHAUD,36,CHEMINADE,23
4,2017-04-23,Paris,5,31008,MACRON,12316,FILLON,8273,MÉLENCHON,4960,...,ASSELINEAU,222,LASSALLE,178,POUTOU,170,ARTHAUD,63,CHEMINADE,39


In [14]:
nuances_set = extract_nuances(p.filter(like="Nom"))
p = format_results(p, nuances_set)
p = attribute_parties(p, nuances_set)
p["type"] = "president"
p

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2017-04-23,Paris,1,9026,1231,659,,3561,2831,443,301,president
1,2017-04-23,Paris,2,11292,1802,1099,,5014,2640,399,338,president
2,2017-04-23,Paris,3,18485,3078,1963,,8325,3994,615,510,president
3,2017-04-23,Paris,4,15106,2329,1370,,6182,3956,735,534,president
4,2017-04-23,Paris,5,31008,4960,3103,,12316,8273,1225,1131,president
5,2017-04-23,Paris,6,22332,2038,1419,,8729,8769,719,658,president
6,2017-04-23,Paris,7,27798,1552,1068,,8785,14650,1064,679,president
7,2017-04-23,Paris,8,20698,1392,849,,6568,10448,916,525,president
8,2017-04-23,Paris,9,32940,4783,3163,,14029,8879,1092,994,president
9,2017-04-23,Paris,10,44766,11396,6343,,16880,6724,1817,1606,president
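
A cheap sanity check on the formatted table is that the family columns plus `other` add up to `N`, the number of valid votes. Here is a minimal sketch using the first two rows of the output above (`check` and `mismatched` are illustrative names):

```python
import numpy as np
import pandas as pd

# First two rows of the 2017 presidential output shown above:
check = pd.DataFrame(
    {
        "N": [9026, 11292],
        "farleft": [1231, 1802],
        "left": [659, 1099],
        "green": [np.nan, np.nan],
        "center": [3561, 5014],
        "right": [2831, 2640],
        "farright": [443, 399],
        "other": [301, 338],
    }
)

family_cols = [c for c in check.columns if c != "N"]
# sum(axis=1) skips NaN, so a family with no candidate simply contributes nothing:
gap = check["N"] - check[family_cols].sum(axis=1)
mismatched = check[gap != 0]
print(mismatched)  # an empty frame means every row adds up
```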


Got it? Let's do the same thing for the other five elections, without giving you all the details:

In [15]:
for e, c, t in zip(
    ["leg2017.xlsx", "leg2012.xls", "reg2010.xls", "reg2015.xlsx", "euro2014.xlsx"],
    ["Code Nuance", "Code Nuance", "Nuance Liste", "Nuance Liste", "Nuance Liste"],
    ["legislative", "legislative", "regional", "regional", "european"],
):
    l = load_and_clean(e)
    l = select_columns(l, c)

    nuances_set = extract_nuances(l.filter(like=c))
    l = format_results(l, nuances_set)
    l = attribute_parties(l, nuances_set)
    l["type"] = t

    p = pd.concat([p, l], ignore_index=True)
p

Unnamed: 0,date,ville,arrondissement,N,farleft,left,green,center,right,farright,other,type
0,2017-04-23,Paris,1,9026,1231.0,659.0,,3561.0,2831.0,443.0,301.0,president
1,2017-04-23,Paris,2,11292,1802.0,1099.0,,5014.0,2640.0,399.0,338.0,president
2,2017-04-23,Paris,3,18485,3078.0,1963.0,,8325.0,3994.0,615.0,510.0,president
3,2017-04-23,Paris,4,15106,2329.0,1370.0,,6182.0,3956.0,735.0,534.0,president
4,2017-04-23,Paris,5,31008,4960.0,3103.0,,12316.0,8273.0,1225.0,1131.0,president
...,...,...,...,...,...,...,...,...,...,...,...,...
115,2014-05-25,Paris,16,45436,572.0,3734.0,1768.0,6382.0,20935.0,4933.0,7112.0,european
116,2014-05-25,Paris,17,45869,1596.0,6426.0,4204.0,6801.0,14664.0,4704.0,7474.0,european
117,2014-05-25,Paris,18,45808,3799.0,10082.0,8847.0,4431.0,6269.0,4254.0,8126.0,european
118,2014-05-25,Paris,19,40066,3740.0,8721.0,7688.0,3301.0,5962.0,3894.0,6760.0,european


And now we just have to concatenate the three dataframes (2008 municipal, 2014 municipal restricted to Paris, and all the other elections) and save the formatted results. After that, we'll start working on our model, yay!

In [16]:
pd.concat([df, d[d.ville == "Paris"], p]).sort_values(
    ["arrondissement", "date"]
).reset_index(drop=True).to_excel("data/results_by_arrdmt.xlsx")

In [17]:
%watermark -a AlexAndorra -n -u -v -iv

numpy  1.17.3
pandas 0.25.2
AlexAndorra 
last updated: Mon Nov 18 2019 

CPython 3.7.5
IPython 7.9.0
