# Course 6 - Exercise - Creating a dataframe

INFMDI721 - Course - 07/11/2019

Starting from the datasets population_communes.csv and surface_departements.csv, create a new dataset containing one row per department, with these columns:

    - the sum of the "Population municipale" values for the department
    - the sum of the "Population totale" values for the department (for the distinction between "Population municipale" and "Population totale", see: https://www.insee.fr/fr/metadonnees/definition/c1270)
    - the share (as a percentage) of the municipal population relative to the total population
    - the share (as a percentage) of the department's (municipal) population within its region
    - the (municipal) population density, in inhabitants per km²

(The final dataset should look like result-exo-cc.csv)


In [39]:
import pandas as pd
import numpy as np

## Loading the datasets

In [74]:
pop = pd.read_csv("inputs/population_communes.csv")
pop.head()

Unnamed: 0,Code département,Code canton,Code arrondissement,Code région,Nom de la commune,Code commune,Nom de la région,variable,value
0,64,27.0,3,75,Aast,1,Nouvelle-Aquitaine,Population totale,184
1,64,27.0,3,75,Aast,1,Nouvelle-Aquitaine,Population municipale,177
2,55,10.0,2,44,Abainville,1,Grand Est,Population totale,310
3,55,10.0,2,44,Abainville,1,Grand Est,Population municipale,305
4,60,11.0,1,32,Abancourt,1,Hauts-de-France,Population totale,658
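
The preview above shows that the population file is in long format: each commune appears twice, once per value of `variable`. As a minimal sketch (not the approach the notebook takes below), the same per-department sums could be obtained in one step with `pivot_table`. The `sample` frame is a hypothetical stand-in built from the first preview rows, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for population_communes.csv, built from the preview rows.
sample = pd.DataFrame({
    "Code département": [64, 64, 55, 55],
    "Nom de la commune": ["Aast", "Aast", "Abainville", "Abainville"],
    "variable": ["Population totale", "Population municipale"] * 2,
    "value": [184, 177, 310, 305],
})

# One row per department, one column per population type, summing over communes.
wide = sample.pivot_table(
    index="Code département",
    columns="variable",
    values="value",
    aggfunc="sum",
)
print(wide)
```

The result is indexed by department code, so later per-department columns can be added to it directly.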


In [75]:
area = pd.read_csv("inputs/surface_departements.csv")
area.head()

Unnamed: 0,code_insee,nom,surf_km2
0,974,La Réunion,2505.0
1,11,Aude,6343.0
2,43,Haute-Loire,5003.0
3,13,Bouches-du-Rhône,5247.0
4,47,Lot-et-Garonne,5385.0


## Creating the dataframe

In [76]:
df = pd.DataFrame(columns=["Population municipale","Population totale","Population municipale %","Population municipale / région","Densité"])
df.head()

Unnamed: 0,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité


## Computing the populations

In [77]:
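# Sum the commune-level values per department, separately for each population type.
df["Population municipale"] = pop[pop["variable"] == "Population municipale"].groupby("Code département").agg({"value":"sum"})["value"]
df["Population totale"] = pop[pop["variable"] == "Population totale"].groupby("Code département").agg({"value":"sum"})["value"]
# Note: this is a ratio in [0, 1]; multiply by 100 to get the percentage the exercise asks for.
df["Population municipale %"] = df["Population municipale"] / df["Population totale"]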

In [78]:
df.head()

Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
1,638425,655171,0.97444,,
2,536136,549587,0.975525,,
3,339384,349336,0.971512,,
4,162565,167331,0.971518,,
5,141107,146148,0.965508,,


In [82]:
df.reset_index(inplace=True)
df.head()

Unnamed: 0,Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
0,1,638425,655171,0.97444,,
1,2,536136,549587,0.975525,,
2,3,339384,349336,0.971512,,
3,4,162565,167331,0.971518,,
4,5,141107,146148,0.965508,,
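
The exercise asks for this share as a percentage, whereas the column above holds a ratio between 0 and 1. A minimal sketch of the conversion, using the first two rows of the output above as sample data (the frame name `dept` and the rounding to two decimals are illustrative choices, not part of the exercise):

```python
import pandas as pd

# Sample data copied from the first two rows of the output above.
dept = pd.DataFrame(
    {
        "Population municipale": [638425, 536136],
        "Population totale": [655171, 549587],
    },
    index=pd.Index([1, 2], name="Code département"),
)

# Multiply the ratio by 100 to express it as a percentage.
dept["Population municipale %"] = (
    100 * dept["Population municipale"] / dept["Population totale"]
).round(2)
print(dept)
```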


## Computing department population / region population

In [156]:
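# Sum per (department, region), then broadcast each region's total back to its departments.
# Note: there is no filter on "variable" here, so both population types are added together.
region_pop = pop.groupby(["Code département","Code région"]).sum().groupby("Code région").transform("sum")["value"]
region_pop.index = region_pop.index.droplevel(1)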

In [159]:
df["Population municipale / région"] = df["Population municipale"] / region_pop
df.head()

Unnamed: 0,Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
0,1,638425,655171,0.97444,,110.377766
1,2,536136,549587,0.975525,,72.343274
2,3,339384,349336,0.971512,,45.993224
3,4,162565,167331,0.971518,,23.246818
4,5,141107,146148,0.965508,,24.76865


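The division does not work: the new column is NaN in every row shown. This is presumably because `df` (re-indexed from 0 by `reset_index`) and `region_pop` (indexed by department code) do not share the same index labels. The data must be transformed first.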
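
One possible transformation, sketched here on made-up numbers, is to keep the region code next to each department and let `groupby(...).transform("sum")` return the regional total row by row, so that both sides of the division share the same index. The `communes` frame and the `share` column name are hypothetical. The string department codes are also an assumption, made because Corsica's codes are "2A" and "2B".

```python
import pandas as pd

# Made-up commune-level rows in the same long layout as the population file.
communes = pd.DataFrame({
    "Code région": [84, 84, 84, 32],
    "Code département": ["01", "01", "03", "02"],
    "variable": ["Population municipale"] * 4,
    "value": [400, 200, 400, 500],
})

# Keep only the municipal population, then sum per (region, department).
municipal = communes[communes["variable"] == "Population municipale"]
dept = municipal.groupby(
    ["Code région", "Code département"], as_index=False
)["value"].sum()

# transform("sum") returns one regional total per row, aligned with dept's index.
region_total = dept.groupby("Code région")["value"].transform("sum")
dept["share"] = 100 * dept["value"] / region_total
print(dept)
```

Because `transform` preserves the row index of `dept`, no `droplevel` or re-indexing step is needed before dividing.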

## Computing the density

In [87]:
# Join on the department code, which the area file calls "code_insee".
df = df.merge(area.rename(columns={"code_insee":"Code département"}))
df.head()

Unnamed: 0,Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité,nom,surf_km2
0,1,638425,655171,0.97444,,110.377766,Ain,5784.0
1,2,536136,549587,0.975525,,72.343274,Aisne,7411.0
2,3,339384,349336,0.971512,,45.993224,Allier,7379.0
3,4,162565,167331,0.971518,,23.246818,Alpes-de-Haute-Provence,6993.0
4,5,141107,146148,0.965508,,24.76865,Hautes-Alpes,5697.0


In [89]:
df["Densité"] = df["Population municipale"]/df["surf_km2"]
df.drop(['nom', 'surf_km2'], axis=1, inplace=True)
df.head()

Unnamed: 0,Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
0,1,638425,655171,0.97444,,110.377766
1,2,536136,549587,0.975525,,72.343274
2,3,339384,349336,0.971512,,45.993224
3,4,162565,167331,0.971518,,23.246818
4,5,141107,146148,0.965508,,24.76865
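
The merge above relies on the default inner join, which silently drops any department that has no matching row in the area file. A slightly more defensive variant is sketched below, with sample values taken from the outputs above. The string codes and the frame names `dept` and `merged` are assumptions for illustration.

```python
import pandas as pd

# Sample values copied from the outputs above; codes are assumed to be strings.
dept = pd.DataFrame({
    "Code département": ["01", "02"],
    "Population municipale": [638425, 536136],
})
area = pd.DataFrame({
    "code_insee": ["01", "02", "974"],
    "nom": ["Ain", "Aisne", "La Réunion"],
    "surf_km2": [5784.0, 7411.0, 2505.0],
})

# A left join keeps every department; validate raises if a code is duplicated.
merged = dept.merge(
    area[["code_insee", "surf_km2"]],
    left_on="Code département",
    right_on="code_insee",
    how="left",
    validate="one_to_one",
)

# Departments without a surface would show up here as NaN instead of vanishing.
missing = merged["surf_km2"].isna().sum()

merged["Densité"] = merged["Population municipale"] / merged["surf_km2"]
merged = merged.drop(columns=["code_insee", "surf_km2"])
print(merged, missing)
```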


In [23]:
pop[pop["variable"] == "Population municipale"].groupby("Code région").agg({"value":"sum"})["value"]

Code région
1       394110
2       376480
3       269352
4       852924
11    12117132
24     2577866
27     2818338
28     3335929
32     6006870
44     5555186
52     3737632
53     3306529
75     5935603
76     5808435
84     7916889
93     5021928
94      330455
Name: value, dtype: int64

In [41]:
pop[pop["variable"] == "Population municipale"].groupby(["Code région", "Code département"]).agg({"value":"sum"})

Code région,Code département,value
1,971,394110
2,972,376480
3,973,269352
4,974,852924
11,75,2190327
11,77,1397665
11,78,1431808
11,91,1287330
11,92,1603268
11,93,1606660


In [24]:
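# Note: df is indexed by department code but the right-hand Series by region code, so the labels do not correspond.
df["Population municipale / région"] = df["Population municipale"] / pop[pop["variable"] == "Population municipale"].groupby("Code région").agg({"value":"sum"})["value"]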
df.head()

Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
1,638425,655171,0.97444,,
2,536136,549587,0.975525,,
3,339384,349336,0.971512,,
4,162565,167331,0.971518,,
5,141107,146148,0.965508,,


In [37]:
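# Note: this uses "Population totale" (the exercise asks for the municipal one), and area keeps its default 0..n-1 index.
df["Densité"] = df["Population totale"] / area["surf_km2"]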
df.head()

Code département,Population municipale,Population totale,Population municipale %,Population municipale / région,Densité
1,638425,655171,0.97444,,
2,536136,549587,0.975525,,
3,339384,349336,0.971512,,
4,162565,167331,0.971518,,
5,141107,146148,0.965508,,
