-
Notifications
You must be signed in to change notification settings - Fork 1
/
CO2_emissions_demo_sol.qmd
113 lines (72 loc) · 3.69 KB
/
CO2_emissions_demo_sol.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
title: "Exploring CO2 Emissions Over Time"
format: html
editor: visual
---
Clears the environment.
```{r}
rm(list = ls())
```
## Introduction
This data analysis project is based on [Open Case Studies on Exploring CO2 Emissions Over Time](https://www.opencasestudies.org/ocs-bp-co2-emissions/). This case study explores how different countries have contributed to Carbon Dioxide (CO2) emissions over time.
### Load the packages and data in.
```{r, warning=F, message=F}
library(OCSdata)
library(tidyverse)
library(esquisse)
library(readxl)
#raw_data("ocs-bp-co2-emissions", outpath = getwd())
CO2_emissions = read_xlsx("OCS_data/data/raw/yearly_co2_emissions_1000_tonnes.xlsx")
```
#### Examine the data.
```{r}
dim(CO2_emissions)
head(CO2_emissions)
CO2_emissions[1:5, 200:205]
```
Recall the definition of **Tidy Data**:
1. Each variable is a column; each column is a variable.
2. Each observation is a row; each row is an observation.
3. Each value is a cell; each cell is a single value.
This dataset is **not** **tidy**, because the value of the cells describe the variable Emissions, but they don't correspond back to the column names. Also, it seems that the column names belong to a variable Year.
To fix this, we use the `pivot_longer` function:
```{r}
CO2_emissions = pivot_longer(CO2_emissions, cols = -country, names_to = "Year", values_to = "Emissions")
CO2_emissions$Year = as.numeric(CO2_emissions$Year)
head(CO2_emissions)
```
The dataset is now **tidy**. The previous column names are now values of the column Year, and all previous cells are under the column Emissions.
#### Make a plot to show how the CO2 emission of United States changed over time. You will need to filter your dataset to the appropriate country of interest.
```{r}
US_emissions = CO2_emissions %>% filter(country == "United States")
ggplot(US_emissions) + aes(x = Year, y = Emissions) + geom_point()
#1900 - 2010
US_emissions_recent = US_emissions %>% filter(Year >= 1900 & Year <= 2010)
ggplot(US_emissions_recent) + aes(x = Year, y = Emissions) + geom_point()
```
#### Compare CO2 emissions of United States with other countries. You can either filter for a few countries of interest and make the comparison. Or use the `group` aesthetic to create a line or point for each country.
```{r}
ggplot(CO2_emissions) + aes(x = Year, y = Emissions, group = country) + geom_line()
```
#### What is the aggregrate CO2 emission of the world changing over time? Use `group_by` and `summarise` to create this aggregate dataframe, then make a plot out of it. You would need to use `sum` function within `summarise` to total up the CO2 emission across all countries.
```{r}
CO2_emissions_grouped = group_by(CO2_emissions, country)
CO2_emissions_summary = summarise(CO2_emissions_grouped, SumEmissions = sum(Emissions, na.rm=TRUE), n_years = n())
```
#### We load in a second dataset, `US_temperature`, that gives the average temperature of the United States every year.
```{r}
US_temperature = read.csv("OCS_data/data/raw/temperature.csv", skip = 4)
US_temperature = mutate(US_temperature, Year = as.numeric(str_sub(Date, start = 1, end = 4)))
US_temperature = US_temperature %>% mutate(country = "United States", temperature = Value) %>% select(-Anomaly, -Value)
```
#### Using `filter` and a `join` function (ie. `full_join`, `inner_join`, `left_join`, or `right_join`), merge the two datasets together.
```{r}
US_together = full_join(US_emissions, US_temperature, by = c("Year", "country"))
```
#### Make a plot to see whether there is any relationship between temperature and CO2 emissions in the United States.
```{r}
ggplot(US_together) + aes(x = temperature, y = Emissions) + geom_point()
```
#### Anything else you want to explore?
```{r}
```