This repository has been archived by the owner on Sep 7, 2021. It is now read-only.
---
title: "Exploring the Geo-PKO dataset"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: false
editor_options:
  chunk_output_type: console
---
This document walks through the steps that the project members have performed to extract meaningful information from the Geo-PKO dataset. More details on the dataset, including the version used here, can be found on its [homepage](https://www.pcr.uu.se/data/geo-pko/). The current version of this report was produced using version 1.2 of the dataset.

## Setting up
Load packages.
```{r, warning=FALSE, message=FALSE}
library(tidyverse)
library(readr)
library(ggthemes)
library(knitr)
library(kableExtra)
library(lubridate)
```
We start by importing the dataset. To get a sense of how `read_csv()` would parse the file, run `spec_csv()` on it beforehand and inspect the guessed column types.
```{r}
specs <- spec_csv("data/geopko.csv")
specs # print the guessed column specification
```
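The guessed specification shows that some columns would be parsed as logical. `read_csv()` guesses each column's type from a sample of rows (1,000 by default), so a column that is empty in those rows is guessed as logical, and values that appear further down may then fail to parse. Here we sidestep the issue by telling `read_csv()` to parse all columns as character.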
```{r, warning=FALSE, message =FALSE}
GeoPKO <- read_csv("data/geopko.csv",
                   col_types = cols(.default = "c"))
```
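Because every column is now stored as character, any column used in arithmetic or on a numeric axis has to be converted first (the plot further down does this with `as.numeric(year)`). The sketch below illustrates the pattern on a small made-up file; the file contents and the `troops` column name are assumptions for illustration and are not taken from GeoPKO.

```{r}
library(readr)
library(dplyr)

# Write a tiny made-up CSV to a temporary file (hypothetical columns)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("mission,year,troops",
             "A,1994,",
             "A,1995,",
             "B,1995,450"),
           toy_path)

# Read everything as character, as done for GeoPKO above
toy <- read_csv(toy_path, col_types = cols(.default = "c"))

# Convert only the columns that are needed as numbers;
# empty fields were read as NA and stay NA after conversion
toy_typed <- toy %>%
  mutate(year = as.integer(year),
         troops = as.numeric(troops))

str(toy_typed)
```

Converting columns explicitly, one at a time, keeps any parsing surprises visible instead of leaving them to type guessing.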
## An overview
Let's have a quick look at the dataset.
```{r}
str(GeoPKO)
# Display the first five rows in a scrollable box
kable(GeoPKO[1:5, ]) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
```
The output is long because GeoPKO includes 12,190 rows and 73 columns.

### What missions are included?
To see which missions are included in the dataset, run the following line.
```{r}
unique(GeoPKO$Mission)
```
The dataset covers the period 1994-2018. Some missions are still ongoing, while others began before 1994. Which period does the dataset cover for each mission?
```{r}
mission.period <- GeoPKO %>%
  select(Mission, year, month) %>%
  mutate(date = zoo::as.yearmon(str_c(year, month, sep = "-"))) %>%
  group_by(Mission) %>%
  summarize(start_date = min(date), end_date = max(date)) %>%
  arrange(start_date)
kable(mission.period, caption = "Missions arranged by the earliest start date",
      col.names = c("Mission", "Starting point", "End point")) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "300px")
```
One thing to note from the above table: the starting and end points are not necessarily the official start and end dates of the missions. Because the data in GeoPKO are collected from deployment maps, these timestamps reflect the publication dates of the maps.
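
Converting to `yearmon` before taking `min()` and `max()` matters because the columns were read as character: compared as plain strings, "1994-11" sorts before "1994-3". A minimal sketch with made-up dates (not taken from the dataset), passing the format explicitly:

```{r}
library(zoo)

# Made-up year-month strings, shaped like str_c(year, month, sep = "-")
ym_strings <- c("1994-3", "1994-11", "1995-1")

# Compared as text, "1994-11" sorts before "1994-3"
earliest_as_text <- min(ym_strings)

# Parsed as yearmon, the months are ordered chronologically
ym <- as.yearmon(ym_strings, format = "%Y-%m")
earliest <- min(ym)
latest <- max(ym)

earliest_as_text
format(earliest, "%Y-%m")
format(latest, "%Y-%m")
```
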
We can also count the number of active missions in each year from 1994 to 2018 and present the results in a simple line plot.
```{r}
NoMission <- GeoPKO %>%
  distinct(year, Mission) %>%
  count(year)
Plot1 <- ggplot(NoMission, aes(x = as.numeric(year), y = n)) +
  geom_point() +
  geom_line(size = 0.5) +
  scale_x_continuous("Year", breaks = seq(1994, 2018, 1)) +
  theme_classic() +
  scale_y_continuous("Number of missions", breaks = seq(0, 10, 1)) +
  theme(panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 0.5))
Plot1
```
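
The counting step relies on `distinct()` collapsing repeated observations of a mission within a year before `count()` tallies the rows per year. A minimal sketch on made-up data (the mission names are placeholders, not GeoPKO missions):

```{r}
library(dplyr)

# Made-up observations: mission "A" appears twice in 1994
toy <- data.frame(
  year = c("1994", "1994", "1994", "1995"),
  Mission = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Keep one row per year-mission pair, then count rows per year
missions_per_year <- toy %>%
  distinct(year, Mission) %>%
  count(year)

missions_per_year
```

Without the `distinct()` step, `count(year)` would tally observations rather than missions, so a mission with many deployment locations would be counted many times.
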
### Number of troops
Placeholder for average troops by year calculation.
### Other queries
Placeholder for other queries related to HQ, UNMO, UNPOL, etc.