---
title: "Integrating Conflict Data"
output:
  workflowr::wflow_html:
    toc: false
editor_options:
  chunk_output_type: console
---

This page shows how to merge Geo-PKO data with conflict data and visualise the results. The examples used here are Uppsala University's ViEWS project, which forecasts conflict risk, and the Uppsala Conflict Data Programme (UCDP), one of the world's leading sources of data on armed conflict. Merging these datasets can provide insights into the links between conflict risk and peacekeeping deployments, and help policymakers make effective peacekeeping decisions where the risk of conflict is high.

## Setting up

Load packages.
```{r, warning=FALSE, message=FALSE}
# plyr is loaded before dplyr so that it does not mask dplyr's verbs
library(plyr)
library(sp)
library(tidyr)
library(dplyr)
library(geojsonio)
library(broom)
library(rgdal)
library(tidyverse)
library(ggplot2)
library(leaflet)
library(sf)
library(spdep)
library(maptools)
library(rjson)
library(RJSONIO)
library(rmapshaper)
library(htmltools)
library(htmlwidgets)
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box()
```

### Geo-PKO x ViEWS
First, we import the datasets. We're using two sets of forecast data from ViEWS. Both forecast the risk of state-based conflict, non-state conflict, and one-sided violence over the next 36 months in Africa. One is at country level, and the other is at PRIO-grid level, an innovative geospatial unit from the Peace Research Institute Oslo that divides the world into cells of 0.5 x 0.5 decimal degrees (roughly 55km x 55km at the equator). We're using a version of Geo-PKO data that has been coded to the ViEWS format and includes numerical month IDs and PRIO-grid IDs. Lastly, we're importing a version of the Geo-PKO data at the country level.

```{r}
geopkoviews <- read.csv("geopko_pgm.csv")
predictors <- read.csv("ensemble_pgm.csv")
countrypredict <- read.csv("ensemble_cm.csv")
countrygeopkoviews <- read.csv("geopko_cm.csv")
```

Here's what the Geo-PKO dataset looks like in the ViEWS format, showing six rows from the dataset.
 
```{r}
kable(geopkoviews[90545:90550,]) %>% kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
```

Months are coded numerically and, rather than coordinates, the data is mapped to a PRIO-grid ID. `pg_month_id` identifies each PRIO-grid square in a given month. To use the published Geo-PKO dataset (as opposed to the ViEWS version), you'll need to match ViEWS' `month_id` against Geo-PKO's `month` and `year` fields. This page will be updated when the new dataset is published, including how to do this merge.
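
In the meantime, one possible way to derive `month_id` from `year` and `month` is sketched below. It assumes that `month_id` counts calendar months starting from January 1980 (so `month_id` 1 is January 1980), which is consistent with months 475-486 covering July 2019 - June 2020 as used on this page, but do check it against the ViEWS documentation. The helper `to_month_id()` and the toy data frame are hypothetical and not part of either dataset.

```{r}
# Hypothetical helper: convert a calendar year and month to a ViEWS-style month_id.
# Assumption: month_id 1 corresponds to January 1980.
to_month_id <- function(year, month) {
  (year - 1980) * 12 + month
}

# Toy stand-in for the published Geo-PKO data, which has `year` and `month` fields
geopko_toy <- data.frame(
  year  = c(2019, 2019, 2020),
  month = c(6, 7, 6)
)
geopko_toy$month_id <- to_month_id(geopko_toy$year, geopko_toy$month)
geopko_toy
```

Under that assumption, the toy rows map to months 474, 475 and 486.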

### Filtering, joining and subsetting the datasets at the PRIO-grid level

The predictor dataset begins with July 2020 and forecasts the risk of conflict over the following 36 months. Here's a preview of the data within it, showing state-based (`sb`), non-state (`ns`) and one-sided violence (`os`) forecasts.

```{r}
kable(predictors[90545:90550,]) %>% kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
```

The Geo-PKO dataset we're working with includes data from previous years, but we only want the past year (July 2019 - June 2020), so we'll filter the data to that period. Both the Geo-PKO and ViEWS datasets include `pg_id`, a unique code that corresponds to a specific grid square on the map, and this is what we'll use to merge the datasets. _Please note that the ViEWS-adapted version of the Geo-PKO dataset contains observed data only up to month 474 (2018) and has not yet been updated with observed data for the months after that. In this exercise we are therefore working with extrapolated data for months 475-486. This page will be updated when the new datasets are published._

```{r}
# filtering for troop deployments over the most recent year
geopkoviews2 <- geopkoviews %>%
  filter(between(month_id, 475, 486))

# merging geopko with priogrid predictor data
# (left_join() has no na.rm argument, so none is passed)
priogriddf <- left_join(
  geopkoviews2, predictors,
  by = c("pg_id"))
```
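
Because the join key is `pg_id` alone, every deployment row is paired with every forecast row for the same grid square, so the merged data can be much longer than either input. The toy example below uses made-up values to illustrate this; `deploy_toy`, `forecast_toy` and the `step` column are illustrative names only, and base R's `merge()` with `all.x = TRUE` stands in for the left join.

```{r}
# Two deployment months for grid square 1, one for grid square 2
deploy_toy <- data.frame(
  pg_id     = c(1, 1, 2),
  month_id  = c(475, 476, 475),
  no_troops = c(100, 120, 50)
)

# Three forecast rows per grid square (hypothetical `step` column)
forecast_toy <- data.frame(
  pg_id = c(1, 1, 1, 2, 2, 2),
  step  = c(1, 2, 3, 1, 2, 3),
  average_allwthematic_sb = c(0.10, 0.20, 0.15, 0.05, 0.02, 0.03)
)

# Left join on pg_id only: 2 x 3 rows for square 1 plus 1 x 3 rows for square 2
joined_toy <- merge(deploy_toy, forecast_toy, by = "pg_id", all.x = TRUE)
nrow(joined_toy)
```

This is why the later steps reduce the merged data back down to one row per location.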

That's the dataset merged! Now, to work with it, we'll need to convert `no_troops` and the ID fields to class 'numeric' so we can run functions like maximum, average etc.

```{r}
priogriddf$no_troops <- as.numeric(priogriddf$no_troops)
priogriddf$pg_id <- as.numeric(priogriddf$pg_id)
priogriddf$month_id <- as.numeric(priogriddf$month_id)
```
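
One caveat, depending on your R version: before R 4.0, `read.csv()` converted text columns to factors by default, and calling `as.numeric()` directly on a factor returns the internal level codes rather than the original values. This matters if a column such as `no_troops` contains any non-numeric entries and is therefore read in as text. The sketch below uses made-up values to show the difference.

```{r}
# Made-up troop numbers stored as a factor, as older versions of read.csv() might do
troops_factor <- factor(c("150", "30", "800"))

as.numeric(troops_factor)                # level codes, not troop numbers
as.numeric(as.character(troops_factor))  # the original values
```

Converting via `as.character()` first is a common way to avoid this.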

Next, we're going to take the maximum value of `no_troops` over the year we're working with (July 2019 - June 2020), so we can compare conflict forecasts with the maximum number of troops deployed to a location in the year prior. We're also taking the maximum forecast value for each location (`average_allwthematic_sb`, `average_allwthematic_ns` and `average_allwthematic_os`), and removing duplicates so we're left with only one row per location.
```{r}
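# selecting relevant variables from the merged dataset
pgnewdf <- priogriddf %>% 
  select(pg_id, pg_month_id, no_troops, unpol_dummy, no_tcc,
         average_allwthematic_sb, average_allwthematic_ns, average_allwthematic_os)

# keeping the rows with the maximum troop number and maximum forecasts per location
# (the no_troops filter mirrors the country-level code further down)
pgnewdf1 <- pgnewdf %>% 
  group_by(pg_id) %>%
  dplyr::filter(no_troops == max(no_troops)) %>%
  dplyr::filter(average_allwthematic_sb == max(average_allwthematic_sb)) %>%
  dplyr::filter(average_allwthematic_ns == max(average_allwthematic_ns)) %>%
  dplyr::filter(average_allwthematic_os == max(average_allwthematic_os))

# removing duplicates so we have only one row per location
pgnewdf2 <- subset(pgnewdf1, !duplicated(subset(pgnewdf1, select=c(pg_id, no_troops))))

# head() rather than View(), which only works in an interactive session
head(pgnewdf2)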
```
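
With chained `filter()` calls, each maximum after the first is computed only over the rows that survived the previous filter. If you would rather compute each maximum independently per grid square, one possible alternative is `summarise()`. This is a sketch on made-up data, not the method used above; the `dplyr::` prefixes are there because `plyr` also exports a `summarise()`.

```{r}
library(dplyr)

# Made-up data: two rows per grid square, one missing troop value
pg_toy <- data.frame(
  pg_id     = c(1, 1, 2, 2),
  no_troops = c(100, 120, 50, NA),
  average_allwthematic_sb = c(0.10, 0.20, 0.05, 0.02)
)

# One row per grid square, each maximum computed on its own
pg_max_toy <- pg_toy %>%
  dplyr::group_by(pg_id) %>%
  dplyr::summarise(
    max_troops = max(no_troops, na.rm = TRUE),
    max_sb     = max(average_allwthematic_sb, na.rm = TRUE)
  )
pg_max_toy
```

Note that `na.rm = TRUE` ignores missing values within a group, which is a design choice you may or may not want for troop numbers.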

### Preparing the shapefile and merging with data of interest

As mentioned earlier, the PRIO-grid unit divides the entire world into small grid squares (0.5 x 0.5 decimal degrees, roughly 55km x 55km at the equator). That means that if we want to map it, we'll be working with large files, so keep that in mind when you're reading in the shapefile:

```{r}
shapefile <- rgdal::readOGR(".../ViEWS/pgc.geojson")
```

The shapefile contains both geospatial polygon data and numerical data that corresponds to the ViEWS dataset; specifically, a PRIO-grid ID and a country ID. Here's what the non-spatial data looks like, showing six rows from the dataset.
 
```{r}
kable(shapefile@data[101:106,]) %>% kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
```

To work with the data within this shapefile, we fortify it, first copying the row names into an `id` column so the polygons are easier to identify. Finally, we merge the shapefile with `pgnewdf2`, which we created earlier.

```{r}

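# fortify (the row names are stored in an id column first)
shapefile@data$id <- rownames(shapefile@data)
shapefile.df <- fortify(shapefile, region = "id")

# merge data of interest onto the spatial object
# (this replaces the fortified data frame above; leaflet uses the merged spatial object)
shapefile.df <- merge(shapefile, pgnewdf2, by.x = "pg_id", by.y = "pg_id", all.x = FALSE, all.y = TRUE, duplicateGeoms = TRUE)

# checking it worked - shapefile.df has the new attributes, yay
head(shapefile.df@data)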
```
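
Note that `rgdal` and `maptools` were retired from CRAN in 2023, so `readOGR()` may be unavailable on a fresh installation. A rough equivalent of the merge step with `sf` is sketched below on three made-up grid squares; with real data, `sf::st_read()` would take the place of the hand-built polygons. All object names here are illustrative, and this is not the approach used on this page.

```{r}
library(sf)

# Build a square polygon with its lower-left corner at (x0, y0)
make_cell <- function(x0, y0, size = 0.5) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Three toy grid squares with made-up IDs
cells_toy <- st_sf(
  pg_id    = c(1, 2, 3),
  geometry = st_sfc(make_cell(0, 0), make_cell(0.5, 0), make_cell(1, 0), crs = 4326)
)

# Made-up attribute data for two of the three squares
attrs_toy <- data.frame(pg_id = c(1, 3), no_troops = c(150, 800))

# merge() is an inner join by default, so only squares with attribute rows are kept
cells_merged_toy <- merge(cells_toy, attrs_toy, by = "pg_id")
cells_merged_toy
```

The result is still an `sf` object, which `leaflet()` accepts directly.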

### Mapping Geo-PKO and ViEWS data

To map the data, we're going to use the `leaflet` package (and a bunch of others to support it). The first thing we do is set up our colour palette and bins. We also include a small piece of CSS that fixes the spacing between the NA entry and the rest of the legend.

```{r}
bins <- c(0, 10, 20, 50, 100, 200, 500, 1000, Inf)
pal <- colorNumeric("viridis", NULL)

#to fix spacing of NA in legend
css_fix <- "div.info.legend.leaflet-control br {clear: both;}" # CSS to correct spacing
html_fix <- htmltools::tags$style(type = "text/css", css_fix)  # Convert CSS to HTML
```
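
The `bins` vector is not consumed by `colorNumeric()`, which maps values onto a continuous scale. If you would rather shade by the bins, leaflet's `colorBin()` accepts them directly. Here is a minimal sketch, with `bins_toy` and `pal_binned` as illustrative names and an arbitrary domain:

```{r}
library(leaflet)

# Same break points as `bins` above, repeated so this chunk stands alone
bins_toy <- c(0, 10, 20, 50, 100, 200, 500, 1000, Inf)
pal_binned <- colorBin("viridis", domain = c(0, 2000), bins = bins_toy)

# 5 and 9 fall in the same bin and share a colour; 15 falls in the next bin
pal_binned(c(5, 9, 15))
```

A binned palette like this could be swapped in for `pal` in the map code if stepped colours suit your data better.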

Next, let's map. We include three colour layers to shade squares according to their conflict forecast value. These layers cover state-based conflict, non-state conflict, and one-sided violence. Troop deployments are incorporated as labels, which you can see for each square on hover.
```{r}
map <- leaflet(shapefile.df) %>%
  addTiles() %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(shapefile.df$average_allwthematic_sb),
              group = "State-Based Conflict",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(shapefile.df$average_allwthematic_ns),
              group = "Non-State Conflict",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(shapefile.df$average_allwthematic_os),
              group = "One-Sided Violence",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.1, smoothFactor = 0.5,
              opacity = 0.0, fillOpacity = 0.0,
              fillColor = ~pal(shapefile.df$no_troops),
              label=paste("Troops Deployed: ", shapefile.df$no_troops),
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px", color="blue"),
                textsize = "15px", direction = "auto"),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addLegend("bottomright",
            pal = pal,
            values = shapefile.df$average_allwthematic_sb,
            title = "Conflict Forecast",
            opacity = 1) %>%
  
  addLayersControl(
    baseGroups = c("State-Based Conflict", "Non-State Conflict", "One-Sided Violence"),
    options = layersControlOptions(collapsed = FALSE)
  )

map <- map %>% htmlwidgets::prependContent(html_fix) # legend NA fix

# to save as HTML, you can use the following code:
# saveWidget(map, file="pkoviews - priogrid.html")

map

```

And there we have it: an interactive map to view recent peacekeeping deployments (2019-2020) and projected conflict risk over the next 36 months.

### Doing the same at country level

Now we're going to go through the same steps as above, but use country-level databases so we can view the same information at country level.
```{r}

# filtering geo-pko for the most recent 12 months
countrygeopkoviews2 <- countrygeopkoviews %>%
  filter(between(month_id, 475, 486))

# selecting relevant variables from the predictor dataset
countrypredict2 <- countrypredict %>%
  select(month_id, country_id, average_basewthematic_sb, average_basewthematic_ns, average_basewthematic_os)

# merging the two datasets
# (left_join() has no na.rm argument, so none is passed)
countrydf <- left_join(
  countrygeopkoviews2, countrypredict2,
  by = c("country_id", "month_id"))

# converting classes to numeric
countrydf$no_troops<-as.numeric(countrydf$no_troops)
countrydf$country_id<-as.numeric(countrydf$country_id)

# selecting relevant variables from the merged dataset
cnewdf <- countrydf %>% 
  select(country_id, country_month_id, no_troops,unpol_dummy,no_tcc,average_basewthematic_sb,average_basewthematic_ns,average_basewthematic_os)

# filtering to include only the maximum number of troops and the maximum conflict forecast per location
cnewdf1 <- cnewdf %>% 
  group_by(country_id) %>%
  dplyr::filter(no_troops == max(no_troops)) %>%
  dplyr::filter(average_basewthematic_sb == max(average_basewthematic_sb)) %>%
  dplyr::filter(average_basewthematic_ns == max(average_basewthematic_ns)) %>%
  dplyr::filter(average_basewthematic_os == max(average_basewthematic_os))

# removing duplicates so we have only one value per location for number of troops and conflict forecast
cnewdf2 <- subset(cnewdf1, !duplicated(subset(cnewdf1, select=c(country_id, no_troops))))

# reading in the shapefile
cshapefile <- rgdal::readOGR(".../ViEWS/country.geojson")

# fortifying and merging the shapefile with our dataset
# (the merge result replaces the fortified data frame; leaflet uses the merged spatial object)
cshapefile@data$id <- rownames(cshapefile@data)
cshapefile.df <- fortify(cshapefile, region = "id")
cshapefile.df <- merge(cshapefile, cnewdf2, by.x = "country_id", by.y = "country_id", all.x = FALSE, all.y = TRUE, duplicateGeoms = TRUE)

# mapping using leaflet (we already set up the bins, colour palette and legend fix in the PRIO-grid-level process)

cmap <- leaflet(cshapefile.df) %>%
  addTiles() %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(cshapefile.df$average_basewthematic_sb),
              group = "State-Based Conflict",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(cshapefile.df$average_basewthematic_ns),
              group = "Non-State Conflict",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.25, smoothFactor = 0.5,
              opacity = 0.05, fillOpacity = 0.4,
              fillColor = ~pal(cshapefile.df$average_basewthematic_os),
              group = "One-Sided Violence",
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addPolygons(color = "#444444", weight = 0.1, smoothFactor = 0.5,
              opacity = 0.0, fillOpacity = 0.0,
              fillColor = ~pal(cshapefile.df$no_troops),
              label=paste("Troops Deployed: ", cshapefile.df$no_troops),
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px", color="blue"),
                textsize = "15px", direction = "auto"),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = FALSE)) %>%
  addLegend("bottomright",
            pal = pal,
            values = cshapefile.df$average_basewthematic_sb,
            title = "Conflict Forecast",
            opacity = 1) %>%
  
  addLayersControl(
    baseGroups = c("State-Based Conflict", "Non-State Conflict", "One-Sided Violence"),
    options = layersControlOptions(collapsed = FALSE)
  )

cmap <- cmap %>% htmlwidgets::prependContent(html_fix) # legend NA fix

# to save as HTML, use the following code:
# saveWidget(cmap, file="pkoviews - country.html")

cmap
```

This now gives us insights at both the PRIO-grid level and country level. This can be extended to include more useful features; for example, a time-slider can help us identify how the risk of conflict changes given peacekeeping deployments, and vice versa. We'll soon add information on merging with UCDP data, which is already used around the world to inform conflict prevention and response.