<a href="https://colab.research.google.com/github/newton-c/haiti_monitoring/blob/main/ucdp_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [36]:
library(tidyverse) # data wrangling
library(httr) # call the api
library(jsonlite) # convert the JSON api response to a dataframe


Attaching package: ‘zoo’


The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric




### Get the last month's GED dataset
To automatically get the latest data, we may run into two problems:

1. As the data are updated monthly, we need the version for the previous month. This is usually simple, as we can subtract one from the current month. But in January, we have to subtract one from the year and set the month to 12.

2. If we call the API before the newest month is available, we'll get an error.
 To make this easier for those unfamiliar with the code, the code reports the error and then tries the month before that. This shows whether the error came from asking for data that hasn't been published yet or from another connection problem that needs to be debugged. The `month` parameter in the
`def_version` function lets the later code request that earlier month.

In [71]:
def_version <- function(month = NULL) {
  # two-digit year, e.g. 24 for 2024
  year <- as.numeric(format(Sys.Date(), "%y"))

  if (is.null(month)) {
    # default: the month before the current one
    month <- as.numeric(format(Sys.Date(), "%m")) - 1
  }

  # a month of 0 or below belongs to the previous year:
  # 0 is December, -1 is November
  if (month <= 0) {
    year <- year - 1
    month <- month + 12
  }

  version <- paste(year, 0, month, sep = ".")
  return(version)
}
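
Because `def_version` reads `Sys.Date()` directly, the January rollover can only be observed in January. Below is a minimal sketch of a date-parameterised variant, assuming versions always follow the `YY.0.M` pattern used in this notebook. The name `version_for_date` and its `months_back` argument are hypothetical and not part of the original code.

```r
# Hypothetical helper: same version logic as def_version(), but the
# reference date is an argument so edge cases can be checked on any day.
version_for_date <- function(date = Sys.Date(), months_back = 1) {
  year  <- as.numeric(format(date, "%y"))
  month <- as.numeric(format(date, "%m")) - months_back

  # months of 0 or below belong to the previous year
  while (month <= 0) {
    year  <- year - 1
    month <- month + 12
  }
  paste(year, 0, month, sep = ".")
}

version_for_date(as.Date("2024-09-25"))                   # "24.0.8"
version_for_date(as.Date("2025-01-15"))                   # "24.0.12"
version_for_date(as.Date("2025-02-10"), months_back = 2)  # "24.0.12"
```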

As of now, the only part of the API request that changes is the dataset version. This will likely change as we refine what we want to monitor.

In [80]:
def_url <- function(version) {
  # build the request URL for the given dataset version
  # and return it as the value of the function
  paste0("https://ucdpapi.pcr.uu.se/api/",
         "gedevents", "/",
         version, "?pagesize=1000&",
         "Country=41")
}
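
If more of the query needs to vary later, the same URL could be assembled from arguments. This is a sketch that assumes only the two query parameters already used above (`pagesize` and `Country`); the function name `def_url_params` and its argument names are hypothetical.

```r
# Hypothetical variant of def_url(): the country code and page size become
# arguments instead of being hard-coded in the string.
def_url_params <- function(version, country = 41, pagesize = 1000) {
  paste0("https://ucdpapi.pcr.uu.se/api/gedevents/", version,
         "?pagesize=", pagesize,
         "&Country=", country)
}

def_url_params("24.0.8")
```

With the defaults, this should produce the same URL as `def_url("24.0.8")`.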

In [99]:
# get the version from last month
version <- def_version()
version

# define the API's URL
api_url <- def_url(version = version)
api_url

This code makes the API request, assuming last month's data is the latest available. It checks whether the connection was successful (`status_code` should be 200). If not, it reports the error along with the `status_code` (which can be looked up online to better understand the cause) and then tries the API again with the version from two months ago. If that is still unsuccessful, the program prints the latest `status_code` and stops running.

In [96]:
# make the API request
api <- GET(api_url)

if (api$status_code != 200) {
  message("Error: could not connect to the API | ",
          "Status: ", api$status_code,
          " | The latest month may not be available. ",
          "Trying the previous month")

  # redefine the API call with the version from two months ago
  month <- as.numeric(format(Sys.Date(), "%m")) - 2
  version <- def_version(month = month)
  api_url <- def_url(version = version)
  api <- GET(api_url)
  if (api$status_code != 200) {
    stop(paste0("Error: Still cannot connect to the API | Status: ",
         api$status_code))
  } else {
    print(paste0("Connected successfully | Status: ", api$status_code))
  }
}

print(api)

Response [https://ucdpapi.pcr.uu.se/api/gedevents/24.0.8?pagesize=1000&Country=41]
  Date: 2024-09-25 15:50
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 13.8 kB
{
  "TotalCount": 7,
  "TotalPages": 1,
  "PreviousPageUrl": null,
  "NextPageUrl": "",
  "Result": [
    {
      "id": 535457,
      "relid": "HAI-2024-1-1-XXX41-32",
      "year": 2024,
...
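
The two-step fallback described above generalises to trying a list of versions in order. The sketch below is not the notebook's implementation: `first_available` is a hypothetical helper, and its `fetch` argument stands in for the real request (for example `function(v) GET(def_url(v))$status_code`) so that the logic can be run without a network connection.

```r
# Hypothetical helper: try each version in turn and return the first one
# whose request succeeds (status 200). `fetch` must return a status code.
first_available <- function(versions, fetch) {
  for (v in versions) {
    status <- fetch(v)
    if (status == 200) {
      return(v)
    }
    message("Version ", v, " not available | Status: ", status)
  }
  stop("None of the requested versions could be retrieved")
}

# Stand-in for the API: pretends that only version 24.0.8 has been published.
fake_fetch <- function(v) if (v == "24.0.8") 200 else 404

first_available(c("24.0.9", "24.0.8"), fake_fetch)  # "24.0.8"
```

Passing the request as a function keeps the retry logic separate from the network call, which makes it possible to check the fallback behaviour before the next monthly release.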


In [102]:
api_df = fromJSON(rawToChar(api$content))$Result # API to dataframe
print(colnames(api_df))
head(api_df)

 [1] "id"                "relid"             "year"             
 [4] "active_year"       "code_status"       "type_of_violence" 
 [7] "conflict_dset_id"  "conflict_new_id"   "conflict_name"    
[10] "dyad_dset_id"      "dyad_new_id"       "dyad_name"        
[13] "side_a_dset_id"    "side_a_new_id"     "side_a"           
[16] "side_b_dset_id"    "side_b_new_id"     "side_b"           
[19] "number_of_sources" "source_article"    "source_office"    
[22] "source_date"       "source_headline"   "source_original"  
[25] "where_prec"        "where_coordinates" "where_description"
[28] "adm_1"             "adm_2"             "latitude"         
[31] "longitude"         "geom_wkt"          "priogrid_gid"     
[34] "country"           "country_id"        "region"           
[37] "event_clarity"     "date_prec"         "date_start"       
[40] "date_end"          "deaths_a"          "deaths_b"         
[43] "deaths_civilians"  "deaths_unknown"    "best"             
[46] "high"              

,id,relid,year,active_year,code_status,type_of_violence,conflict_dset_id,conflict_new_id,conflict_name,dyad_dset_id,⋯,date_end,deaths_a,deaths_b,deaths_civilians,deaths_unknown,best,high,low,gwnoa,gwnob
,<int>,<chr>,<int>,<lgl>,<chr>,<int>,<chr>,<int>,<chr>,<chr>,⋯,<chr>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<lgl>,<lgl>
1,535457,HAI-2024-1-1-XXX41-32,2024,False,Check dyad,1,XXX41,70,XXX41,1-XXX41,⋯,2024-08-11T00:00:00,0,7,0,0,7,7,7,,
2,539259,HAI-2024-1-1-XXX41-33,2024,False,Check dyad,1,XXX41,70,XXX41,1-XXX41,⋯,2024-08-16T00:00:00,0,12,0,0,12,12,12,,
3,539262,HAI-2024-1-1-XXX41-34,2024,False,Check dyad,1,XXX41,70,XXX41,1-XXX41,⋯,2024-08-28T00:00:00,1,0,0,0,1,1,1,,
4,539865,HAI-2024-1-1-XXX41-35,2024,False,Check dyad,1,XXX41,70,XXX41,1-XXX41,⋯,2024-08-24T00:00:00,0,4,0,0,4,4,4,,
5,539867,HAI-2024-2-2-XXX41-27,2024,False,Check dyad,2,2-XXX41,70,XXX41 - XXX41,2-XXX41,⋯,2024-08-15T00:00:00,0,0,0,1,1,1,1,,
6,539871,HAI-2024-3-3-XXX41-19,2024,False,Check dyad,3,XXX41,4329,XXX41 - Civilians,XXX41,⋯,2024-08-11T00:00:00,0,0,1,0,1,1,1,,
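
To see why `$Result` is the part that becomes the dataframe, here is a sketch that parses a hand-written JSON string shaped like the response printed earlier. The string is an illustration only: it keeps three fields per record (values retyped from the first two rows above), and `TotalCount` is set to match the trimmed example rather than the real response.

```r
library(jsonlite)

# Hand-written, trimmed stand-in for the API response body (illustration only)
body <- '{
  "TotalCount": 2,
  "TotalPages": 1,
  "Result": [
    {"id": 535457, "relid": "HAI-2024-1-1-XXX41-32", "year": 2024},
    {"id": 539259, "relid": "HAI-2024-1-1-XXX41-33", "year": 2024}
  ]
}'

parsed <- fromJSON(body)

names(parsed)          # envelope fields: TotalCount, TotalPages, Result
class(parsed$Result)   # "data.frame": the array of records is simplified
nrow(parsed$Result)    # 2
```

The envelope fields describe the page, while `Result` holds one record per event, which `fromJSON` simplifies into one row per event.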


### Most Violent Areas

**WARNING: In all monitoring, we have to be careful to check whether a lack of reported violence in an area is due to there being no violence, a lack of coverage, the violence not falling under a dataset's definition, or some other cause that could create a false sense of security.**

In [113]:
api_df$events <- 1
events_adm2 <- api_df |>
  group_by(adm_2) |>
  summarise(events = sum(events, na.rm = FALSE))

deaths_adm2 <- api_df |>
  group_by(adm_2) |>
  summarise(total_deaths = sum(best, na.rm = FALSE))

civ_deaths_adm2 <- api_df |>
  group_by(adm_2) |>
  summarise(civilian_deaths = sum(deaths_civilians, na.rm = FALSE))

events_adm2
deaths_adm2
civ_deaths_adm2

adm_2,events
<chr>,<dbl>
Port-au-Prince arrondissement,6
Saint-Marc arrondissement,1


adm_2,total_deaths
<chr>,<int>
Port-au-Prince arrondissement,15
Saint-Marc arrondissement,12


adm_2,civilian_deaths
<chr>,<int>
Port-au-Prince arrondissement,2
Saint-Marc arrondissement,0
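
In line with the warning above, one way to keep unreported areas visible is to join the counts against a full list of areas, so that zeros are explicit rather than missing. This is a sketch only: `monitored_areas` is a hypothetical, hand-written list (its third entry is a placeholder, not a real arrondissement), and the counts are retyped from the `events_adm2` output above.

```r
library(dplyr)

# Counts retyped from the events_adm2 output above
events_adm2 <- tibble(
  adm_2  = c("Port-au-Prince arrondissement", "Saint-Marc arrondissement"),
  events = c(6, 1)
)

# Hypothetical list of areas being monitored; "Example arrondissement"
# is a placeholder, not a real administrative unit
monitored_areas <- tibble(
  adm_2 = c("Port-au-Prince arrondissement",
            "Saint-Marc arrondissement",
            "Example arrondissement")
)

# Keep every monitored area and turn missing counts into explicit zeros
coverage <- monitored_areas |>
  left_join(events_adm2, by = "adm_2") |>
  mutate(reported = !is.na(events),
         events   = coalesce(events, 0))

coverage
```

A zero in this table means only that no event was recorded in the dataset, so it still has to be read with the caveats in the warning.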
