# PC Session 2

**Author:**
[Helge Liebert](https://hliebert.github.io/)

# Web APIs

## Requirements

In [None]:
## Libraries
library(xml2)
library(jsonlite)
library(httr)
library(rvest)

## Examples

You query an API by sending an HTTP GET request for the data you want. The request is specified by a *uniform resource identifier* (URI) string. This works much like the URL (*uniform resource locator*) you type into your browser to visit a website. The documentation of each API describes how to construct queries for it.
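To make the parts of a URI concrete, the sketch below splits one with `httr::parse_url()`. The example URI is the Kiva search query used further down. Its components are the scheme, the host serving the API, the path naming the API method, and the query string carrying the parameters.

```r
library(httr)

## Split an example API request URI into its components
example_uri <- "https://api.kivaws.org/v1/loans/search.json?sector=Agriculture&country_code=VN"
parts <- parse_url(example_uri)

parts$scheme    # protocol
parts$hostname  # server that hosts the API
parts$path      # API method
parts$query     # named list of query parameters
```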

This tutorial mostly relies on the API of `kiva.org`, a crowdfunding platform for small business loans to entrepreneurs in developing countries. Links to the API documentation are given in the code cell below.

In [None]:
## Request specific info from KIVA API

## Examples
## https://build.kiva.org/api
## https://build.kiva.org/docs/getting_started
## https://web.archive.org/web/20181110112914/http://build.kiva.org/

## Get the 20 most recent loans
newloans <- fromJSON("https://api.kivaws.org/v1/loans/newest.json", flatten = TRUE)
newloans

Simple queries can be passed directly to the `jsonlite::fromJSON()` function. You can vary the method of your request and specify additional parameters to narrow it down. Methods and parameters are listed in the documentation.

In [None]:
## Only (sector == Agriculture), returned as HTML; look at it in your browser
## https://api.kivaws.org/v1/loans/search.html?sector=Agriculture

## Only (sector == Agriculture) & (country == Vietnam)
vnagsectorloans <- fromJSON("https://api.kivaws.org/v1/loans/search.json?sector=Agriculture&country_code=VN", flatten = TRUE)
head(vnagsectorloans)

In [None]:
## All lenders for a particular loan id
loans <- fromJSON("http://api.kivaws.org/v1/loans/38239/lenders.json")
str(loans)

JSON (*JavaScript Object Notation*), like XML (*Extensible Markup Language*), is a tree-like, nested data format. Both formats are popular response formats for APIs. JSON was designed as a lightweight format for exchanging data between programs, such as a web server and a browser.
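A small, made-up JSON string with the same shape as a Kiva search response (a `paging` object plus an array of `loans`) shows how `fromJSON()` maps the tree onto R objects. JSON objects become named lists, and arrays of objects sharing the same keys are simplified to data frames. `toJSON()` converts back. The values are invented for illustration only.

```r
library(jsonlite)

## Made-up example mimicking the shape of a Kiva response
txt <- '{
  "paging": {"page": 1, "pages": 3},
  "loans": [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"}
  ]
}'

x <- fromJSON(txt)
str(x)          # named list: paging is a list, loans is a data frame
x$paging$pages
x$loans

## Back to JSON: a data frame is written row-wise as an array of objects
toJSON(x$loans)
```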

In [None]:
toJSON(loans, pretty = TRUE)

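Using either the `flatten = TRUE` option of `fromJSON()` or the `jsonlite::flatten()` function simplifies the structure of some nested elements: columns that are themselves data frames are expanded into regular columns whose names are joined with a dot (e.g. `description.languages`).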
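The effect of flattening is easiest to see on a tiny, made-up input in which each record contains a nested `location` object. Without flattening, `location` becomes a data frame stored inside a column; with `flatten = TRUE` (or `flatten()` afterwards) it is spread into ordinary columns.

```r
library(jsonlite)

## Made-up records with a nested "location" object
txt <- '[
  {"id": 1, "location": {"country": "Vietnam", "town": "Hue"}},
  {"id": 2, "location": {"country": "Cambodia", "town": "Kampot"}}
]'

nested <- fromJSON(txt)
names(nested)            # "location" is itself a data frame column
class(nested$location)

flat <- fromJSON(txt, flatten = TRUE)
names(flat)              # nested columns expanded with dotted names

## Same result when flattening after parsing
names(flatten(nested))
```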

In [None]:
## Simplify structure
loans <- fromJSON("http://api.kivaws.org/v1/loans/38239/lenders.json",
                  flatten = TRUE)
str(loans)
loans <- as.data.frame(loans$loans)
toJSON(loans, pretty = TRUE)

The Kiva API also has a method that lists all available methods. The API is well documented on their website. 

In [None]:
## List of all API methods
methods <- fromJSON("https://api.kivaws.org/v1/methods.json", flatten = TRUE)
methods

## Request specific info from KIVA API

The next cell assembles the request URI from its components, each stored in a separate string variable.

In [None]:
## Parameters
baseurl <- "https://api.kivaws.org/v1/"
method <- "loans/search.json?"
## method <- "loans/search.xml?"
## method <- "loans/search.html?"
country <- "VN,KH"
sector <- "Agriculture"
type <- "individuals"
status <- "funded"
sortby <- "newest"

## Construct URL
query <- paste0("country_code=", country, "&",
                "sector=", sector, "&",
                "borrower_type=", type, "&",
                "status=", status, "&",
                "sort_by=", sortby)
query
uri <- paste0(baseurl, method, query )
uri
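The `paste0()` approach works as long as the parameter values contain no characters with a special meaning in a URI. A slightly more robust sketch, using base R only, percent-encodes each value with `URLencode(reserved = TRUE)` before joining the `name=value` pairs with `&`. `build_query()` is a hypothetical helper, not part of any package. Note that it also encodes the comma in `VN,KH` as `%2C`, which a standards-compliant server should decode back to a comma.

```r
## Hypothetical helper: percent-encode values and join name=value pairs
build_query <- function(params) {
  values <- vapply(params, function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

params <- c(country_code  = "VN,KH",
            sector        = "Agriculture",
            borrower_type = "individuals",
            status        = "funded",
            sort_by       = "newest")

uri_encoded <- paste0("https://api.kivaws.org/v1/loans/search.json?",
                      build_query(params))
uri_encoded
```

Alternatively, httr can build and encode the query string itself, e.g. `GET(paste0(baseurl, "loans/search.json"), query = list(country_code = country, sector = sector))`.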

Sometimes you need more control over the request, for example to check the HTTP status code before parsing the response. This is also useful for catching errors when embedding your requests in a larger program.

In [None]:
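## Send HTTP GET request, check status, handle response content, library(httr)
response <- GET(uri)
response
if (status_code(response) == 200) {
    ## Declare the encoding explicitly to avoid httr's default-encoding message
    jsontable <- content(response, as = "text", encoding = "UTF-8")
} else {
    stop("HTTP response not OK: status ", status_code(response))
}
jsontable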
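When requests run unattended, transient failures such as timeouts, rate limits or server errors are common. One option is to wrap the request in a small retry loop that catches the error with `tryCatch()`, waits, and tries again. `with_retry()` is a hypothetical helper sketched for this tutorial; httr also ships a built-in `RETRY()` function for the same purpose.

```r
## Hypothetical helper: call f() up to `times` times, pausing between failures
with_retry <- function(f, times = 3, wait = 1) {
  for (i in seq_len(times)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < times) Sys.sleep(wait)
  }
  stop("All ", times, " attempts failed; last error: ", conditionMessage(result))
}
```

Usage with the request above: `jsontable <- with_retry(function() { r <- GET(uri); stop_for_status(r); content(r, as = "text", encoding = "UTF-8") })`. Here `stop_for_status()` turns HTTP error codes into R errors so that the retry loop sees them. Retrying rarely helps with client errors such as 404, so in practice you may want to retry only on timeouts and 5xx responses.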

By default, the API returns a single page of 20 loans per request. To get more, you need to request additional pages. The paging metadata, such as the total number of records and pages, is returned in the `paging` element of the response. The loans themselves are contained in the `loans` element.
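The number of requests follows directly from the paging metadata: with `total` matching records and `page_size` records per page, you need ceiling(total / page_size) pages. The sketch below computes this and builds the page URIs. The helper names are hypothetical, and the metadata field names `total` and `page_size` are assumed to match Kiva's `paging` element.

```r
## Hypothetical helpers for paginated requests
n_pages <- function(total, page_size) ceiling(total / page_size)

page_uris <- function(uri, pages) paste0(uri, "&page=", seq_len(pages))

## e.g. 45 matching loans at 20 per page need 3 requests
n_pages(45, 20)
page_uris("https://api.kivaws.org/v1/loans/search.json?sector=Agriculture", 3)
```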

In [None]:
## Parse json data
data <- fromJSON(jsontable, flatten = TRUE)
#str(data)
#names(data)
data$paging
data <- data$loans
head(data)
dim(data)

Again, we can also simply pass the URI directly.

In [None]:
## Even more simple, pass URI directly
data <- fromJSON(uri, flatten = TRUE)
data <- data$loans
#str(data)
names(data)
dim(data)
head(data[, c("tags", "themes", "description.languages")])


The returned data frame still contains a few list columns (each cell holds several values) that `flatten = TRUE` does not resolve. The anonymous function below collapses each cell into a single comma-separated string.

In [None]:
## Nested elements need to be flattened
data$tags <- sapply(data$tags, function(x) paste(unlist(x), collapse = ", "))
data$themes <- sapply(data$themes, function(x) paste(unlist(x), collapse = ", "))
data$description.languages <- sapply(data$description.languages, function(x) paste(unlist(x), collapse = ", "))
head(data)    
head(data[, c("tags", "themes", "description.languages")])
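Instead of naming each list column by hand, you can detect all list columns and collapse them in one go, using the same `paste(unlist(x), collapse = ", ")` rule as above. `collapse_list_columns()` is a hypothetical helper; it assumes nested data frame columns have already been expanded with `flatten = TRUE`, since those would also be detected by `is.list()`. The example data are made up.

```r
## Hypothetical helper: collapse every list column into comma-separated strings
collapse_list_columns <- function(df) {
  is_list_col <- vapply(df, is.list, logical(1))
  df[is_list_col] <- lapply(df[is_list_col], function(col)
    vapply(col, function(x) paste(unlist(x), collapse = ", "), character(1)))
  df
}

## Small made-up example with a list column (one cell is empty)
df <- data.frame(id = 1:3)
df$tags <- list(c("#Parent", "#Woman Owned Business"), character(0), "#Animals")
str(collapse_list_columns(df))
```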

## Simple script to collect more information

This script reads the paging metadata and iterates over all pages to collect all records for a specific search query.

We first set the parameters, then request the first page to read the metadata. Responses have a fixed page length, so you need to send multiple requests, iterating over the page numbers.

In [None]:
## Get all data, multiple requests, iterate over pages

## Note: very simple proof of concept
## (should check HTTP responses for errors and have better tests)
## (for large queries, it is more efficient to write results to file immediately)

## Parameters
baseurl <- "https://api.kivaws.org/v1/"
method  <- "loans/search.json?"
country <- "VN"
sector  <- "Agriculture"
type    <- "individuals"
status  <- "funded"
sortby  <- "oldest" # (o/w duplicates may occur when new entries are added)
pagelength <- 20 # max page length allowed is 500

## Construct URL
query <- paste0("country_code=", country, "&",
                "sector=", sector, "&",
                "borrower_type=", type, "&",
                "status=", status, "&",
                ## "per_page=", pagelength, "&",
                "sort_by=", sortby)
uri <- paste0(baseurl, method, query)

## Get maxpagenumber and other information for iteration
response <- fromJSON(uri, flatten = TRUE)
#response$paging
maxpages <- response$paging$pages
records  <- response$paging$total
columns  <- ncol(response$loans)

## Open csv, write header (create the output folder if it does not exist)
dir.create("Data", showWarnings = FALSE)
header <- names(response$loans)
write.table(t(header), file = "Data/kiva.csv", sep = ";",
            col.names = FALSE, row.names = FALSE)

# Or collect in data frame (don't do this for large jobs)
## data <- data.frame(matrix(nrow = 0, ncol = columns))
## names(data) <- header
header
maxpages 

In [None]:
## Simple helper function to flatten columns
unnest <- function(col) paste(unlist(col), collapse = ", ")

In [None]:
## Iterate over pages, limit to first three for test
## (use seq_len(maxpages) to collect all pages)
for (p in seq_len(min(maxpages, 3))) {

    ## Info
    print(paste0(p, "/", maxpages))

    ## Append page to uri
    pquery <- paste0(uri, "&page=", p)

    ## Get data, assert completeness (only the last page may be shorter)
    loans <- fromJSON(pquery, flatten = TRUE)$loans
    stopifnot(nrow(loans) == pagelength || p == maxpages)
    stopifnot(ncol(loans) == columns)

    ## Fix nested list columns
    loans$tags <- sapply(loans$tags, unnest)
    ## loans$themes <- sapply(loans$themes, unnest) # missing for older records
    loans$description.languages <- sapply(loans$description.languages, unnest)
 
    ## Collect loans in data frame
    ## data <- rbind(data, loans)

    ## Better to append to file
    write.table(loans, "Data/kiva.csv", sep = ";", append = TRUE,
                col.names = FALSE, row.names = FALSE)

}

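## Read the collected records back in to inspect them
## (comment.char = "" because Kiva tags contain "#")
kiva <- read.table("Data/kiva.csv", sep = ";", header = TRUE,
                   quote = "\"", comment.char = "")
head(kiva)
dim(kiva)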
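If you do want the records in memory, growing a data frame with `rbind()` inside the loop copies it on every iteration. A common alternative is to store each page in a list and bind once at the end. The sketch below takes a hypothetical `fetch_page()` function as an argument so the logic can run without network access; in the script above it would wrap `fromJSON(paste0(uri, "&page=", p), flatten = TRUE)$loans` plus the list-column fixes.

```r
## Hypothetical helper: fetch each page with fetch_page(p), bind once at the end
collect_pages <- function(fetch_page, pages) {
  chunks <- vector("list", length(pages))
  for (i in seq_along(pages)) {
    chunks[[i]] <- fetch_page(pages[i])
  }
  do.call(rbind, chunks)
}

## Offline stand-in for an API call returning two records per page
fake_page <- function(p) data.frame(id = (p - 1) * 2 + 1:2, page = p)
all_pages <- collect_pages(fake_page, 1:3)
dim(all_pages)
```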

## Kiva GraphQL API

### Simple request using the new Kiva API

In [None]:
baseurl <- "https://api.kivaws.org/graphql?query="
query <- "{lend {loan (id: 1568001){id name}}}"
q <- URLencode(paste0(baseurl, query))
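GraphQL queries contain braces and spaces, which are not allowed verbatim in a URI; `URLencode()` percent-encodes them. With the default `reserved = FALSE`, URI delimiters such as `:`, `/`, `?`, `=` and parentheses are left unchanged, which is why the whole string including the base URL can be encoded at once. The sketch uses a separate variable name to leave `query` untouched.

```r
## Which characters does URLencode() change in the GraphQL query?
gql <- "{lend {loan (id: 1568001){id name}}}"
encoded <- URLencode(gql)
encoded  # braces become %7B / %7D, spaces become %20

## The base URL itself passes through unchanged
URLencode("https://api.kivaws.org/graphql?query=")
```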

In [None]:
response <- GET(q)
response <- fromJSON(content(response, as = "text"))
response

In [None]:
## more elaborate query, nicer formatting.
query <- "{
  lend {
    loans(sortBy: newest) {
      values {
        id
        name
        loanAmount
      }
    }
  }
}"
q <- URLencode(paste0(baseurl, query))
q

In [None]:
response <- GET(q)
response <- fromJSON(content(response, as = "text"))
response
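GraphQL servers commonly also accept the query as a JSON body in an HTTP POST request. This avoids URL encoding and length limits, and lets you pass values as separate `variables` instead of pasting them into the query string. The sketch below assumes the Kiva endpoint `https://api.kivaws.org/graphql` accepts such POST requests; `graphql_body()` and `kiva_graphql()` are hypothetical helpers.

```r
library(httr)
library(jsonlite)

## Hypothetical helper: build the JSON body {"query": ..., "variables": ...}
graphql_body <- function(query, variables = NULL) {
  payload <- list(query = query)
  if (!is.null(variables)) payload$variables <- variables
  as.character(toJSON(payload, auto_unbox = TRUE))
}

## Hypothetical helper: POST the query and parse the JSON response
kiva_graphql <- function(query, variables = NULL,
                         url = "https://api.kivaws.org/graphql") {
  resp <- POST(url, content_type_json(),
               body = graphql_body(query, variables))
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

graphql_body("{lend {loan (id: 1568001){id name}}}")
```

For example, `kiva_graphql("query ($id: Int!) {lend {loan (id: $id){id name}}}", variables = list(id = 1568001L))` would request the same loan as the first GraphQL example, assuming the schema declares the `id` argument as `Int!`; check the schema for the exact type.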

### Using a dedicated GraphQL library

In [None]:
library("ghql")

In [None]:
## GraphQL endpoint of the Kiva API
link <- "https://api.kivaws.org/graphql"

In [None]:
## Connection
conn <- GraphqlClient$new(url = link)

In [None]:
## Define a Graphql Query
query <- '{
  lend {
    loans(sortBy: newest) {
      values {
        id
        name
        loanAmount
      }
    }
  }
}'

In [None]:
## Create a ghql Query object and register the query string under the name "link"
new <- Query$new()$query("link", query)

In [None]:
## Inspect the query object
str(new)
new$link

In [None]:
# Execute query
result <- conn$exec(new$link)
result

In [None]:
# Parse the JSON result into an R list
result.injson <- fromJSON(result, flatten = FALSE)
result.injson

In [None]:
## more elaborate query
query <- '{
  lend {
    loans (filters: {gender: female, country: ["KE", "US"]}, limit: 5) {
      totalCount
      values {
        name
        loanAmount
        image {
          url(presetSize: small)
        }
        activity {
          name
        }
        geocode {
          country {
            isoCode
            name
          }
        }
        lenders (limit: 0) {
          totalCount
        }
        ... on LoanPartner {
          partnerName
        }
        ... on LoanDirect {
          trusteeName
        }
      }
    }
  }
}'

In [None]:
# Set up the new query, execute it, and parse the JSON result
new <- Query$new()$query("link", query)
result.injson <- fromJSON(conn$exec(new$link), flatten = FALSE)
result.injson

## More examples: UK Parliament info

This example uses a different API, from `theyworkforyou.com`. The website provides data about UK politics and the UK Parliament. You need to request an API key to use their API. The basic plan is free for educational or charitable purposes.

In [None]:
## TheyWorkForYou.com Example
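## Read your personal API key from an environment variable instead of
## hard-coding it (the name TWFY_API_KEY is an arbitrary choice)
apikey <- Sys.getenv("TWFY_API_KEY")
base <- "https://www.theyworkforyou.com/api/"
format <- "js"
func <- "getMPs?"
query <- paste0("key=", apikey, "&", "output=", format)
uri <- paste0(base, func, query)
uri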

In [None]:
## listofmps <- fromJSON(uri) # problem with encoding, maybe xml is better
response <- GET(uri)
response <- content(response, as = "raw")
listofmps <- fromJSON(rawToChar(response))
head(listofmps)
