analysis/index.Rmd

---
title: "Exploring the Geo-PKO dataset"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: false
editor_options:
  chunk_output_type: console
---

```{css, echo=FALSE}
pre {max-height: 400px;
  overflow-y: auto;
}

pre[class]{
  max-height: 400px;
}

```

## Overview

This document walks through the steps that project members took to extract meaningful information from the Geo-PKO dataset. More details on the dataset can be found on its [homepage](https://www.pcr.uu.se/data/geo-pko/).

We start by importing the dataset. To get a sense of how `read_csv` would parse the dataset, run `spec_csv()` on the dataset beforehand. 
```{r}
library(readr)

spec_csv("data/Geo_PKO_v.2.0.csv")
```

This output shows that `readr` guesses each column's type from the values it sees in the leading rows, so closely related columns can end up with different types. For example, some of the `notroopspertcc` columns are guessed as numeric, while others are guessed as logical. Most of the time the guesses are correct, but here we sidestep the issue by telling `read_csv()` to parse all columns as character. Running `str()` afterwards helps us confirm that the import went as intended.

```{r, warning=FALSE, message =FALSE, results="hide"}
GeoPKO <- readr::read_csv("data/Geo_PKO_v.2.0.csv", col_types = cols(.default="c"))
str(GeoPKO)
nrow(GeoPKO)
ncol(GeoPKO)
```
The last two code lines show us that the dataframe contains `r nrow(GeoPKO)` observations and `r ncol(GeoPKO)` columns. The output from `str()` could get rather clunky, so here is a prettier snippet produced by `kableExtra`.

```{r, warning=FALSE}
library(kableExtra)

kable(GeoPKO[1:5,]) %>% kable_styling() %>%
  scroll_box(width = "100%", height = "200px") #displaying the first five rows
```
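
Because every column is imported as character, numeric work on a column requires an explicit conversion first. The chunk below is a minimal sketch of that step on made-up data: the `toy_pko` object and all of its values are hypothetical and are not taken from Geo-PKO.

```{r}
# Hypothetical stand-in for columns read with cols(.default = "c"); values are made up
toy_pko <- data.frame(
  mission   = c("M1", "M1", "M2"),
  no.troops = c("150", "unknown", NA),
  stringsAsFactors = FALSE
)

# as.numeric() turns strings that are not numbers into NA (with a warning, silenced here)
toy_pko$no.troops <- suppressWarnings(as.numeric(toy_pko$no.troops))

str(toy_pko)
sum(is.na(toy_pko$no.troops)) # values that were missing or could not be converted
```

Counting the `NA` values after conversion is a cheap way to spot entries that were text rather than numbers.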

### What missions are included?

Previous versions of the dataset included UN-authorized peacekeeping missions in Africa between 1994 and 2018. Version 2.0, the latest to date, brings the scope of the dataset from regional to global. Specifically, it now covers 52 missions in 44 different countries. However, running `unique(GeoPKO$country)` will return a vector of length 45. This is because UNMEE, which took place along the border of Ethiopia and Eritrea, had one location in a territory contested between these two countries.

```{r}

str(TotalCount <- with(GeoPKO, table(mission, country)))
unique(GeoPKO$mission)
unique(GeoPKO$country)
```

The following table shows the number of countries (top row) and the number of missions active in that many countries (bottom row). From this, we can see that most missions are active in a single country, and only one mission is active in 9 countries.

```{r}
table(TotalCount. <- rowSums(TotalCount>0))
```
To see which missions are active in three or more countries, we can do the following:

```{r} 
rownames(TotalCount)[TotalCount.>2] #listing missions active in three countries or more
rownames(TotalCount)[TotalCount.>3] #missions active in four or more countries
rownames(TotalCount)[TotalCount.>4] #five or more countries, which captures the mission active in 9
```
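
The counting logic used here is easier to follow on a small example. The sketch below uses hypothetical missions and countries (`toy`, `M1` to `M3`, `X` to `Z` are all made up): a mission-by-country table holds the number of observations per pair, and `rowSums()` over the `> 0` version counts the distinct countries per mission.

```{r}
# Hypothetical toy data: three missions, one of which spans two countries
toy <- data.frame(
  mission = c("M1", "M1", "M2", "M2", "M3"),
  country = c("X",  "X",  "X",  "Y",  "Z")
)

toy_count <- with(toy, table(mission, country)) # observations per mission-country pair
toy_n_countries <- rowSums(toy_count > 0)       # countries with at least one observation

toy_n_countries
table(toy_n_countries)                          # how many missions cover 1, 2, ... countries
names(toy_n_countries)[toy_n_countries > 1]     # missions present in more than one country
```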

Geo-PKO v2.0 covers the period 1994-2019. Some missions included here were already well underway before 1994, and several are still ongoing. Since the data in Geo-PKO is collected from deployment maps, the timestamps reflect the publication dates of those maps. Many UN peacekeeping missions release these maps on a regular basis, which could be once a year or once every three months. We can look at the specific period for which each mission is coded in the dataset, arranged by the earliest starting timepoint. Note that these are not necessarily the start and end dates of a mission.

```{r, warning=FALSE, message=FALSE}
library(zoo)
library(tidyr)
library(dplyr)
library(lubridate)

GeoPKO %>% select(mission, year, month) %>% 
  mutate(month=as.numeric(month),
         MonthChr=as.character(month(month, label=TRUE, abbr= FALSE))) %>%   #turn numeric months into words 
  unite(timepoint, c("year", "MonthChr"), sep=" ") %>% 
  mutate(timepoint=zoo::as.yearmon(timepoint, "%Y %B")) %>% #parse our joined date string as date to arrange
  group_by(mission) %>%
  summarize(start_date=min(timepoint), end_date=max(timepoint)) %>% arrange(start_date) %>% 
  kable(., caption= "Missions arranged by the earliest start date",
        col.names=c("Mission", "Starting point", "End point")) %>% kable_styling() %>%
  scroll_box(width = "100%", height = "300px") 
```
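
The chunk above parses full month names, which depends on the locale R is running in. As a hedged alternative, the sketch below builds the timepoint directly from the numeric month using base `Date` objects. The `toy_maps` data is hypothetical, and using the first day of the month as a placeholder is an assumption made only so that the dates can be sorted.

```{r, warning=FALSE, message=FALSE}
library(dplyr)

# Hypothetical toy records; year and month are character, as in the imported data
toy_maps <- tibble(
  mission = c("M1", "M1", "M2", "M2", "M2"),
  year    = c("1995", "1994", "2001", "2003", "2002"),
  month   = c("3", "11", "6", "1", "12")
)

toy_span <- toy_maps %>%
  # first-of-month placeholder date built from the numeric month, so no month names are parsed
  mutate(timepoint = as.Date(sprintf("%s-%02d-01", year, as.integer(month)))) %>%
  group_by(mission) %>%
  summarise(start_date = min(timepoint), end_date = max(timepoint)) %>%
  arrange(start_date)

toy_span
```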

### Deployment size

Because the dataset gathers observations from deployment maps, a location may have multiple observations in a given year, depending on how often the maps are published. This makes analysis at the yearly level a bit tricky. There are several ways to aggregate the data by year, such as taking the yearly average, or taking the first or last observation of the year. The code below uses the first method, the yearly average, to obtain deployment size figures per location-year. For the sake of simplicity, we start with MINUSMA only, but the same method could be applied to the entire dataset (with some reservations - more on this later).

```{r}
GeoPKO %>% 
  filter(mission=="MINUSMA") %>% 
  mutate(no.troops=as.numeric(no.troops)) %>% 
  group_by(mission, year, location) %>% 
  summarise(YearlyAverage = round(mean(no.troops, na.rm=TRUE))) %>% 
  pivot_wider(names_from=year, values_from=YearlyAverage) %>% # turning the dataframe from long to wide format for easier viewing
  kable() %>% kable_styling() %>%
  scroll_box(width = "100%", height = "200px") #showing all location-years in a scrollable box
```
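
The other aggregation options mentioned above, taking the first or the last observation of the year, can be sketched in the same way. The example below runs on hypothetical data (`toy_dep`, its locations, and its troop numbers are all made up) and assumes that rows are sorted by month before `first()` and `last()` are applied.

```{r, warning=FALSE, message=FALSE}
library(dplyr)

# Hypothetical deployment records; none of these figures come from Geo-PKO
toy_dep <- tibble(
  location  = c("Loc A", "Loc A", "Loc A", "Loc B", "Loc B"),
  year      = c(2015, 2015, 2016, 2015, 2015),
  month     = c(3, 9, 6, 9, 3),
  no.troops = c(100, 300, 250, 650, 150)
)

toy_yearly <- toy_dep %>%
  arrange(location, year, month) %>% # chronological order within each location-year
  group_by(location, year) %>%
  summarise(YearlyAverage = round(mean(no.troops, na.rm=TRUE)),
            FirstObs = first(no.troops),
            LastObs = last(no.troops),
            .groups = "drop")

toy_yearly
```

The three summaries can differ quite a lot when deployments change within a year, so the choice should follow the research question rather than convenience.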

### Other features of the dataset

Other than locations and deployment size, version 2.0 also includes variables capturing other dimensions of peacekeeping activities. Components such as military observers, civilian and formed police, and liaison offices also make important contributions to the outcome of a mission. This information could be used to complement troop sizes in analysis, or to compare missions.

To illustrate, we start by looking at UN military observers (UNMO) and UN police (UNPOL). Which missions had both UNMO and UNPOL at the same time, and when? In other words, we are looking for missions in which, at a given time, both UNMO and UNPOL were present, regardless of location. To do this, we filter for any source map that has _at least_ one UNMO _and_ one UNPOL deployment. This gives us seven missions.

```{r}
UNMO.UNPOL <- GeoPKO %>% 
  select(source, mission, location, unmo.dummy, unpol.dummy) %>% 
  group_by(source) %>% 
  filter(any(unmo.dummy==1) & any(unpol.dummy==1)) %>% #keep maps with at least one UNMO and one UNPOL site
  ungroup()

unique(UNMO.UNPOL$mission)
```
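
How the grouped `filter()` behaves is easiest to see on a few rows. In the sketch below, `toy_src` is hypothetical data with three source maps, and only the map that has both an UNMO and an UNPOL site somewhere is kept, together with all of its rows. The dummies are stored as character to mirror the all-character import.

```{r, warning=FALSE, message=FALSE}
library(dplyr)

# Hypothetical source maps; dummies are character, as in the imported data
toy_src <- tibble(
  source      = c("map1", "map1", "map2", "map2", "map3"),
  location    = c("a", "b", "a", "b", "a"),
  unmo.dummy  = c("1", "0", "1", "0", "0"),
  unpol.dummy = c("0", "1", "0", "0", "1")
)

toy_both <- toy_src %>%
  group_by(source) %>%
  # any() is evaluated once per map, so the condition holds for every row of a qualifying map
  filter(any(unmo.dummy == 1) & any(unpol.dummy == 1)) %>%
  ungroup()

unique(toy_both$source)
```

Note that `map1` qualifies even though no single location hosts both components, which is why a separate cross-table is needed to check co-location.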

Focusing on UNOCI, we can also run a cross-table to find out whether these UNMO and UNPOL were lodged together or not. When UNMO and UNPOL were both present in UNOCI, it seems that both often appeared at the same location. 

```{r}
# the curly braces allow us to run table() directly within a pipe,
# rather than having to save the filtered tibble as a new object
UNMO.UNPOL %>% 
  filter(mission=="UNOCI") %>% 
  {table(.$unpol.dummy, .$unmo.dummy)}
```
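
An unlabelled cross-table can be hard to read, since nothing says which dummy is on the rows and which is on the columns. A small sketch with hypothetical vectors (`toy_unpol` and `toy_unmo` are made up) shows how naming the arguments to `table()` labels both dimensions.

```{r}
# Hypothetical presence dummies for five locations
toy_unpol <- c(1, 1, 0, 1, 0)
toy_unmo  <- c(1, 0, 0, 1, 1)

# Named arguments become the dimension labels of the resulting table
toy_tab <- table(UNPOL = toy_unpol, UNMO = toy_unmo)
toy_tab

toy_tab["1", "1"] # locations hosting both components
```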

Version 2.0 also records information on the type of military units deployed. While infantry units are the most common type, missions also make use of other military functions, such as engineer, medical, air force, and demining. Refer to the codebook for a full list of these variables. Overall, this may give us a better grasp of a mission's various capabilities.

As an example, the following code snippet reveals that both MINUSCA and MINUSMA employ Unmanned Aerial Vehicles in their activities. 
```{r}
GeoPKO %>% filter(uav==1) %>% distinct(mission)
```

We can also see whether missions become more specialized over time, here with MINUSMA as an example. Disregarding variation by location, for each source map we want to find out which troop functions were available in the mission. Then we count the number of troop functions at each time period.

```{r, warning=FALSE, message=FALSE}
library(ggplot2)
library(ggthemes)

GeoPKO %>% 
  filter(mission=="MINUSMA") %>% 
  unite(timepoint, c("year", "month"), sep=" ") %>% 
  mutate(timepoint=zoo::as.yearmon(timepoint, "%Y %m")) %>% 
  select(timepoint, mission, location, rpf, inf, fpu, res, fp, eng:uav) %>% #select all the troop type variables
  mutate(across(rpf:uav, as.integer)) %>%
  group_by(timepoint, mission) %>% 
  #flag a function as present (1) if any location on the map reports it; across() replaces the deprecated funs()
  summarise(across(rpf:uav, ~ as.integer(any(.x == 1, na.rm=TRUE))), .groups="drop") %>% 
  mutate(TroopCap=rowSums(across(rpf:uav), na.rm=TRUE)) %>% #number of distinct troop functions per map
  select(timepoint, mission, TroopCap) %>% 
  arrange(timepoint) %>% #put the timepoints in chronological order
  ggplot(aes(timepoint, TroopCap))+
  geom_line()
```
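
The two-step aggregation behind `TroopCap` can be checked on a small example. The sketch below uses hypothetical data (`toy_units` and its three function columns are made up): each function is first collapsed to a 0/1 flag per timepoint, then the flags are summed across columns. Treating `NA` as "not reported" is an assumption of this sketch.

```{r, warning=FALSE, message=FALSE}
library(dplyr)

# Hypothetical unit-type dummies for two timepoints and two locations each
toy_units <- tibble(
  timepoint = c("t1", "t1", "t2", "t2"),
  inf       = c(1, 0, 1, 1),
  eng       = c(0, 0, 0, 1),
  med       = c(NA, 0, 1, 0)
)

toy_cap <- toy_units %>%
  group_by(timepoint) %>%
  # 1 if any location reports the function at that timepoint, 0 otherwise (NA counts as absent)
  summarise(across(inf:med, ~ as.integer(any(.x == 1, na.rm=TRUE))), .groups="drop") %>%
  mutate(TroopCap = rowSums(across(inf:med))) # number of functions present per timepoint

toy_cap
```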