linogaliana
diff --git a/_quarto-prod.yml b/_quarto-prod.yml
Lines changed: 1 addition & 0 deletions
diff --git a/_quarto.yml b/_quarto.yml
Lines changed: 2 additions & 2 deletions
diff --git a/content/manipulation/02_pandas_suite.qmd b/content/manipulation/02_pandas_suite.qmd
Lines changed: 18 additions & 467 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo1_en.qmd b/content/manipulation/02_pandas_suite/_exo1_en.qmd
Lines changed: 17 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo1_fr.qmd b/content/manipulation/02_pandas_suite/_exo1_fr.qmd
Lines changed: 17 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo1_solution.qmd b/content/manipulation/02_pandas_suite/_exo1_solution.qmd
Lines changed: 82 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo2_en.qmd b/content/manipulation/02_pandas_suite/_exo2_en.qmd
Lines changed: 12 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo2_fr.qmd b/content/manipulation/02_pandas_suite/_exo2_fr.qmd
Lines changed: 12 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo2_solution.qmd b/content/manipulation/02_pandas_suite/_exo2_solution.qmd
Lines changed: 54 additions & 0 deletions
diff --git a/content/manipulation/02_pandas_suite/_exo3_en.qmd b/content/manipulation/02_pandas_suite/_exo3_en.qmd
Lines changed: 24 additions & 0 deletions
@@ -17,6 +17,7 @@ project:
     - content/manipulation/02_pandas_intro.qmd
     - content/manipulation/02_pandas_intro_en.qmd
     - content/manipulation/02_pandas_suite.qmd
+    - content/manipulation/02_pandas_suite_en.qmd
     - content/manipulation/03_geopandas_intro.qmd
     - content/manipulation/02a_pandas_tutorial.qmd
     - content/manipulation/02b_pandas_TP.qmd
 
@@ -12,8 +12,8 @@ project:
     - content/getting-started/06_rappels_fonctions.qmd
     - content/getting-started/07_rappels_classes.qmd
     - content/manipulation/index.qmd
-    - content/manipulation/02_pandas_intro.qmd
-    - content/manipulation/02_pandas_intro_en.qmd
+    - content/manipulation/02_pandas_suite.qmd
+    - content/manipulation/02_pandas_suite_en.qmd
     - content/visualisation/index.qmd
     - content/modelisation/index.qmd
     - content/NLP/index.qmd
 
@@ -0,0 +1,17 @@
+::: {.exercise}
+## Exercise 1: Group Aggregations
+
+1. Calculate the total emissions of the "Residential" sector (column `Résidentiel`) by department, then express each total relative to that of the most polluting department in this sector. What reality does this statistic actually reflect?
+
+2. Calculate the total emissions for each sector in each department. For each department, calculate the proportion of total emissions coming from each sector.  
+
+<details>
+<summary>
+Hint for this question
+</summary>
+
+* _"Group by"_ = `groupby`
+* _"Total emissions"_ = `agg({*** : "sum"})`
+</details>
+
+:::
@@ -0,0 +1,17 @@
+::: {.exercise}
+## Exercice 1 : agrégations par groupe
+
+1. Calculer les émissions totales du secteur "Résidentiel" par département et rapporter la valeur au département le plus polluant dans le domaine. En tirer des intuitions sur la réalité que cette statistique reflète.
+
+2. Calculer, pour chaque département, les émissions totales de chaque secteur. Pour chaque département, calculer la proportion des émissions totales venant de chaque secteur.  
+
+<details>
+<summary>
+Indice pour cette question
+</summary>
+
+* _"Grouper par"_ = `groupby`
+* _"émissions totales"_ = `agg({*** : "sum"})`
+</details>
+
+:::
@@ -0,0 +1,82 @@
+```{python}
+#| echo: false
+#| output: asis
+if lang == "en":
+    print("In question 1, the result should be as follows:")
+else:
+    print("A la question 1, le résultat obtenu devrait être le suivant:")
+```
+
+
+```{python}
+# Question 1
+emissions_residentielles = (
+    emissions
+    .groupby("dep")
+    .agg({"Résidentiel" : "sum"})
+    .reset_index()
+    .sort_values("Résidentiel", ascending = False)
+)
+emissions_residentielles["Résidentiel (% valeur max)"] = emissions_residentielles["Résidentiel"]/emissions_residentielles["Résidentiel"].max()
+emissions_residentielles.head(5)
+```
+
+
+```{python}
+#| echo: false
+#| output: asis
+if lang == "en":
+    print(
+        "This ranking may reflect demographics rather than the process we wish to measure. "
+        "Without additional information on the population of each département to control for this factor, "
+        "it is difficult to know whether there is a structural difference in behavior between the inhabitants of Nord (département 59) and Moselle (département 57)."
+    )
+else:
+    print(
+        "Ce classement reflète peut-être plus la démographie que le processus qu'on désire mesurer. "
+        "Sans l'ajout d'une information annexe sur la population de chaque département pour contrôler ce facteur, "
+        "on peut difficilement savoir s'il y a une différence structurelle de comportement entre les habitants du Nord (département 59) et ceux de la Moselle (département 57)."
+    )
+```
+
+
+```{python}
+# Question 2
+emissions_par_departement = (
+  emissions.groupby('dep').sum(numeric_only=True)
+)
+emissions_par_departement['total'] = emissions_par_departement.sum(axis = 1)
+emissions_par_departement["Part " + secteurs] = (
+  emissions_par_departement
+  .loc[:, secteurs]
+  .div(emissions_par_departement['total'], axis = 0)
+  .mul(100)
+)
+```
+
+
+```{python}
+#| echo: false
+#| output: asis
+if lang == "en":
+    print(
+        "At the end of question 2, let's look at the share of "
+        "emissions from agriculture and the tertiary sector "
+        "in departmental emissions:"
+    )
+else:
+    print(
+        "A l'issue de la question 2, prenons la part des émissions "
+        "de l'agriculture et du secteur tertiaire "
+        "dans les émissions départementales:"
+    )
+```
+
+
+```{python}
+emissions_par_departement.sort_values("Part Agriculture", ascending = False).head(5)
+```
+
+```{python}
+emissions_par_departement.sort_values("Part Tertiaire", ascending = False).head(5)
+```
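The question 2 solution relies on `secteurs`, which is defined outside this hunk. Below is a minimal, self-contained sketch of the same row-wise proportion computation on made-up data. It assumes `secteurs` is a pandas `Index` of sector column names (which is what would make `"Part " + secteurs` valid); `emissions_toy`, `by_dep` and `shares` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-in for the commune-level `emissions` table; values are made up.
emissions_toy = pd.DataFrame(
    {
        "dep": ["01", "01", "02"],
        "Agriculture": [10.0, 30.0, 20.0],
        "Tertiaire": [40.0, 20.0, 60.0],
    }
)

# Assumption: sector column names stored as a pandas Index, so that
# string concatenation applies element-wise.
secteurs = pd.Index(["Agriculture", "Tertiaire"])

# Total emissions per department and sector, then per department overall.
by_dep = emissions_toy.groupby("dep")[list(secteurs)].sum()
total = by_dep.sum(axis=1)

# axis=0 aligns the divisor on the row index (one total per department).
shares = by_dep.div(total, axis=0).mul(100)
shares.columns = "Part " + secteurs
print(shares)
```

Each row of `shares` sums to 100, which is a quick sanity check worth running on the real data as well.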
@@ -0,0 +1,12 @@
+::: {.exercise}
+## Exercise 2: Restructuring Data: Wide to Long
+
+1. Create a copy of the ADEME data by doing `emissions_wide = emissions.copy()`
+
+2. Restructure the data into the *long* format to have emission data by sector while keeping the commune as the level of analysis (pay attention to other identifying variables).
+
+3. Sum the emissions by sector and plot the result.
+
+4. For each department, identify the most polluting sector.
+
+:::
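To make the role of the identifying variables in question 2 concrete, here is a small sketch of a wide-to-long reshape with `melt`. The table is made up (commune codes and names are placeholders); only the column names `INSEE commune`, `Commune`, `dep` and the sector names come from the exercise.

```python
import pandas as pd

# Toy wide table: one row per commune, one column per sector (made-up values).
wide = pd.DataFrame(
    {
        "INSEE commune": ["01001", "01002"],
        "Commune": ["A", "B"],
        "dep": ["01", "01"],
        "Agriculture": [10.0, 30.0],
        "Tertiaire": [40.0, 20.0],
    }
)

# Columns listed in id_vars are repeated on every output row; every other
# column is stacked into a (secteur, emissions) pair.
long = wide.melt(
    id_vars=["INSEE commune", "Commune", "dep"],
    var_name="secteur",
    value_name="emissions",
)
print(long)
```

Any identifying column left out of `id_vars` would be treated as a measure and stacked with the sectors, which is why the exercise asks to pay attention to them.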
@@ -0,0 +1,12 @@
+::: {.exercise}
+## Exercice 2: Restructurer les données : wide to long
+
+1. Créer une copie des données de l'`ADEME` en faisant `emissions_wide = emissions.copy()`
+
+2. Restructurer les données au format *long* pour avoir des données d'émissions par secteur en gardant comme niveau d'analyse la commune (attention aux autres variables identifiantes).
+
+3. Faire la somme par secteur et représenter graphiquement
+
+4. Garder, pour chaque département, le secteur le plus polluant
+
+:::
@@ -0,0 +1,54 @@
+```{python}
+#| output: false
+#| label: question1
+# Question 1
+
+emissions_wide = emissions.copy()
+emissions_wide[['Commune','dep', "Agriculture", "Tertiaire"]].head() 
+```
+
+```{python}
+#| output: false
+#| label: question2
+# Question 2
+emissions_wide.reset_index().melt(id_vars = ['INSEE commune','Commune','dep'],
+                          var_name = "secteur", value_name = "emissions")
+```
+
+```{python}
+#| output: false
+#| label: question3
+# Question 3
+
+emissions_totales = (
+  emissions_wide.reset_index()
+  .melt(
+    id_vars = ['INSEE commune','Commune','dep'],
+    var_name = "secteur", value_name = "emissions"
+  )
+  .groupby('secteur')
+  .sum(numeric_only = True)
+)
+
+emissions_totales.plot(kind = "barh")
+```
+
+```{python}
+#| output: false
+#| label: question4
+# Question 4: most polluting sector in each department
+
+top_commune_dep = (
+  emissions_wide
+  .reset_index()
+  .melt(
+    id_vars = ['INSEE commune','Commune','dep'],
+    var_name = "secteur", value_name = "emissions"
+  )
+  .groupby(['secteur','dep'])
+  .sum(numeric_only=True).reset_index()
+  .sort_values(['dep','emissions'], ascending = False)
+  .groupby('dep').head(1)
+)
+display(top_commune_dep)
+```
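The question 4 solution picks the top sector per department by sorting and keeping the first row of each group. An equivalent pattern, shown here as a sketch on made-up long-format data rather than as the course's method, uses `idxmax` to locate the row with the largest total in each department. `long_toy`, `totals` and `top_sector` are hypothetical names.

```python
import pandas as pd

# Toy long-format table, as produced by melt (made-up values).
long_toy = pd.DataFrame(
    {
        "dep": ["01", "01", "01", "01", "02", "02"],
        "secteur": ["Agriculture", "Agriculture", "Tertiaire", "Tertiaire",
                    "Agriculture", "Tertiaire"],
        "emissions": [10.0, 30.0, 40.0, 20.0, 70.0, 20.0],
    }
)

# Total emissions per (department, sector) pair.
totals = long_toy.groupby(["dep", "secteur"], as_index=False)["emissions"].sum()

# idxmax returns, for each department, the row label of the largest total.
top_sector = totals.loc[totals.groupby("dep")["emissions"].idxmax()]
print(top_sector)
```

Both approaches keep a single row per department; in case of ties, each returns only one of the tied sectors.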
@@ -0,0 +1,24 @@
+::: {.exercise}
+## Exercise 3: Verification of Join Keys
+
+Let's start by checking the dimensions of the `DataFrames` and the structure of some key variables.
+Here, the key variables for linking our datasets are the commune-level variables.
+There are two of them: a commune code and a commune name.
+
+1. Check the dimensions of the `DataFrames`.
+
+2. Identify in `filosofi` the commune names that correspond to multiple commune codes and select their codes. In other words, identify the `LIBGEO` values associated with more than one `CODGEO` and store the corresponding codes in a vector `x` (tip: be careful with the index of `x`).
+
+We temporarily focus on observations where the label involves more than two different commune codes.
+
+* _Question 3_. Look at these observations in `filosofi`.
+
+* _Question 4_. To get a better view, reorder the obtained dataset alphabetically.
+
+* _Question 5_. Compute the average size (number of people, variable `NBPERSMENFISC16`) and some descriptive statistics for these observations. Compare them with the same statistics for the observations where each label matches a single commune code.
+
+* _Question 6_. Among major cities (more than 100,000 people), compute the proportion whose name is associated with several commune codes.
+
+* _Question 7_. Look in `filosofi` at the cities whose label is exactly Montreuil. Then look at those whose label contains the term _'Saint-Denis'_.
+
+:::
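One possible way to approach question 2 and the "more than two codes" filter is sketched below. It uses a made-up stand-in for `filosofi`: the `CODGEO` values are placeholders, not real commune codes, and `filosofi_toy`, `codes_per_name` and `subset` are hypothetical names. Only the column names `CODGEO` and `LIBGEO` come from the exercise.

```python
import pandas as pd

# Toy stand-in for `filosofi`; CODGEO values are placeholders.
filosofi_toy = pd.DataFrame(
    {
        "CODGEO": ["X0001", "X0002", "X0003", "X0004", "X0005", "X0006"],
        "LIBGEO": ["Montreuil", "Montreuil", "Montreuil",
                   "Saint-Denis", "Saint-Denis", "Paris"],
    }
)

# Number of distinct commune codes attached to each commune name.
codes_per_name = filosofi_toy.groupby("LIBGEO")["CODGEO"].nunique()

# Names shared by several codes, and the codes concerned.
ambiguous_names = codes_per_name[codes_per_name > 1].index
x = filosofi_toy.loc[filosofi_toy["LIBGEO"].isin(ambiguous_names), "CODGEO"]

# Restrict to names shared by more than two codes, sorted alphabetically.
many_codes = codes_per_name[codes_per_name > 2].index
subset = (
    filosofi_toy[filosofi_toy["LIBGEO"].isin(many_codes)]
    .sort_values("LIBGEO")
)
print(subset)
```

Note that `x` keeps the row labels of `filosofi_toy`, which is the index subtlety the exercise hints at; calling `.reset_index(drop=True)` on it would discard them if they are not needed.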