# Pulling data from the NHS BSA Open Data Portal (ODP) using R

In [0]:
# Load any packages
library(jsonlite)

The ODP (https://opendata.nhsbsa.net/) offers two programmatic methods for accessing its data:

* `datastore_search` e.g. https://opendata.nhsbsa.net/api/3/action/datastore_search?resource_id=EPD_201401&limit=5
* `datastore_search_sql` e.g. https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?sql=SELECT%20*%20FROM%20EPD_201401%20LIMIT%205
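As a minimal sketch of how the two example URLs above can be assembled in base R (nothing is sent to the API here; the variable names `search_url`, `sql` and `sql_url` are illustrative, not part of the ODP):

```r
# Base URL shared by both API actions (taken from the example links above)
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action"

# Method 1: datastore_search takes key=value query parameters
search_url <- paste0(
    base_endpoint,
    "/datastore_search",
    "?resource_id=", "EPD_201401",
    "&limit=", 5
)

# Method 2: datastore_search_sql takes a single URL-encoded SQL statement
sql <- "SELECT * FROM EPD_201401 LIMIT 5"
sql_url <- paste0(
    base_endpoint,
    "/datastore_search_sql?sql=",
    URLencode(sql) # Spaces become %20
)

print(search_url)
print(sql_url)
```

Either string can be pasted into a browser or passed to `jsonlite::fromJSON()` to retrieve the data.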

The following code demonstrates the process using the SQL-style query (`datastore_search_sql`). It is the more flexible of the two methods and straightforward if you already know some SQL (if not, don't worry: the code is there for you to follow).

In [0]:
# Define the url for the API call
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action"
action_method <- "/datastore_search_sql?sql=" # SQL

# Define the parameters for the SQL query
resource_name <- "EPD_202001"
pco_code <- "13T00" # Newcastle Gateshead CCG
bnf_chemical_substance <- "0407010H0" # Paracetamol

# Construct the SQL query
query <- paste0(
    "
    SELECT 
        * 
    FROM ", 
        resource_name, " 
    WHERE 
        1=1 
    AND pco_code = '", pco_code, "' 
    AND bnf_chemical_substance = '", bnf_chemical_substance, "'"
)

# Send the API call and parse the JSON response into an R list
response <- jsonlite::fromJSON(paste0(
    base_endpoint,
    action_method, 
    URLencode(query) # Encode spaces in the url
))


The response from the API is held as a named list (R's closest equivalent to a dictionary). You can view it using the `print()` command below:

In [0]:
# Try to print some of the data we have... e.g. print(response), print(query)
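
Printing a large response in full can be overwhelming, so one option is to look at its structure first. The sketch below uses a hand-built stand-in called `mock_response` (hypothetical, with made-up values) so that it runs without an internet connection. Its nesting mirrors the extraction path used in this notebook, and the `success` flag is assumed to follow the usual CKAN-style response layout. Swap in `response` to inspect the real data.

```r
# Stand-in for `response`, built by hand so the example runs offline.
# The nesting mirrors response$result$result$records; the values are made up.
mock_response <- list(
    success = TRUE,
    result = list(
        result = list(
            records = data.frame(
                BNF_DESCRIPTION = c("Example tablets", "Example oral suspension"),
                QUANTITY = c(100, 200),
                stringsAsFactors = FALSE
            )
        )
    )
)

# Names of the top-level elements
print(names(mock_response))

# Compact overview of the nesting, limited to three levels deep
str(mock_response, max.level = 3)

# Stop early if the API reported a failure (assumed CKAN-style flag)
if (!isTRUE(mock_response$success)) {
    stop("The API call did not succeed")
}
```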

Now we can use base `R` to analyse the data in a tabular format. For plotting we can use `ggplot2`, the most popular R package for producing plots.

In [0]:
# Extract records in the response to a dataframe
result_df <- response$result$result$records

# View the first 6 rows of data
head(result_df)
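
As a rough illustration of the kind of tabular analysis base `R` allows, the sketch below works on a small made-up data frame, `example_df` (hypothetical values, with only the `BNF_DESCRIPTION` and `QUANTITY` columns). The same calls should apply to `result_df`, assuming `QUANTITY` has been parsed as a number.

```r
# Made-up stand-in for result_df so the example runs offline
example_df <- data.frame(
    BNF_DESCRIPTION = c(
        "Example tablets", "Example tablets", "Example oral suspension"
    ),
    QUANTITY = c(100, 32, 200),
    stringsAsFactors = FALSE
)

# Number of rows and the type of each column
print(nrow(example_df))
str(example_df)

# Summary statistics for a single column
print(summary(example_df$QUANTITY))

# Total QUANTITY for each BNF_DESCRIPTION
totals <- aggregate(QUANTITY ~ BNF_DESCRIPTION, data = example_df, FUN = sum)
print(totals)
```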

Next up we can use `ggplot2` to create some quick and easy visualisations.

In [0]:
# Let's inspect the QUANTITY column

# Can we try removing the background?

# How about using more bins?

# What about one bin per value of QUANTITY?

# Let's see if QUANTITY varies by BNF_DESCRIPTION

# We can see that BNF_DESCRIPTION contains different forms for the drugs... 
# why don't we limit this to 'tablet' and check again

# We can see there are peaks for certain QUANTITY so let's examine the 10 most 
# common QUANTITY
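
One possible way to work through the prompts above is sketched below. It is not the only approach. It assumes `ggplot2` is installed and uses a small made-up data frame, `example_df` (hypothetical values), in place of `result_df`. The bin counts and the choice of `theme_minimal()` are illustrative.

```r
library(ggplot2)

# Made-up stand-in for result_df so the example runs offline
example_df <- data.frame(
    BNF_DESCRIPTION = c(
        "Example 500mg tablets", "Example 500mg tablets",
        "Example 500mg tablets", "Example 250mg/5ml oral suspension",
        "Example 250mg/5ml oral suspension"
    ),
    QUANTITY = c(100, 100, 32, 200, 500),
    stringsAsFactors = FALSE
)

# Basic histogram of QUANTITY
p_basic <- ggplot(example_df, aes(x = QUANTITY)) +
    geom_histogram(bins = 30)

# Remove the grey background and use more bins
p_clean <- ggplot(example_df, aes(x = QUANTITY)) +
    geom_histogram(bins = 100) +
    theme_minimal()

# One bin per value of QUANTITY
p_unit <- ggplot(example_df, aes(x = QUANTITY)) +
    geom_histogram(binwidth = 1) +
    theme_minimal()

# One panel per BNF_DESCRIPTION
p_facet <- p_unit + facet_wrap(~ BNF_DESCRIPTION)

# Keep only the rows whose description mentions 'tablet'
is_tablet <- grepl("tablet", example_df$BNF_DESCRIPTION, ignore.case = TRUE)
tablet_df <- example_df[is_tablet, ]
p_tablet <- ggplot(tablet_df, aes(x = QUANTITY)) +
    geom_histogram(binwidth = 1) +
    theme_minimal()

# The 10 most common QUANTITY values, most frequent first
top_quantities <- head(sort(table(tablet_df$QUANTITY), decreasing = TRUE), 10)
print(top_quantities)
```

In a notebook cell, typing the name of a plot object (for example `p_tablet`) on its own line renders it. In a script, wrap it in `print()`.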


Now recreate the previous graph but for 'oral suspension' instead of 'tablet'

In [0]:
# Try to create a data frame called oral_suspension_df and then produce a histogram from it