# PIC-SURE API test notebook

This notebook reproduces ongoing issues with the PIC-SURE API. It has two parts: (1) environment set-up and (2) reproduction of the issues.

# Environment set-up

### Installation of external dependencies

In [None]:
list_packages <- c("jsonlite", 
                   "ggplot2",
                   "plyr",
                   "dplyr",
                   "tidyr",
                   "purrr",
                   "stringr",
                   "ggrepel",
                   "devtools")

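# Install any missing package, then attach it
for (package in list_packages){
     # installed.packages() returns a matrix; package names are its row names
     if(! package %in% rownames(installed.packages())){
         # character.only is an argument of library(), not of install.packages()
         install.packages(package, dependencies = TRUE)
     }
     library(package, character.only = TRUE)
}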

#### Installing latest R PIC-SURE API libraries from github

In [None]:
# Use TRUE rather than T: T is an ordinary variable that can be reassigned
devtools::install_github("hms-dbmi/pic-sure-r-client", force = TRUE)
devtools::install_github("hms-dbmi/pic-sure-r-adapter-hpds", force = TRUE)

### Connecting to a PIC-SURE network

Testing environment: BioData Catalyst 

In [None]:
PICSURE_network_URL <- "https://biodatacatalyst.integration.hms.harvard.edu/picsure"
resource_id <- "02e23f52-f354-4e8b-992c-d37c8b9ba140"
token_file <- "token.txt"

In [None]:
token <- scan(token_file, what = "character")

In [None]:
myconnection <- picsure::connect(url = PICSURE_network_URL,
                                 token = token)

In [None]:
resource <- hpds::get.resource(myconnection,
                               resourceUUID = resource_id)

### Retrieving variables dictionary from HPDS Database

NB: the dictionary methods work fine; they are included here only because they are useful for getting variable names.

In [None]:
random_variable_name <- "\\NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese (THRV)\\Laboratory Measurements\\Blood and Urine Measurements\\Insulin\\"

In [None]:
random_variable_name
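
The variable name above is a backslash-delimited path (study first, then nested categories, then the variable itself). Each `\\` in the R source is a single backslash in the actual string. As an illustration only, such a path can be split into its levels with base R. `split_variable_path` is a hypothetical helper, not part of the `hpds` package, and `example_path` is a made-up name.

```r
# Hypothetical helper (not part of hpds): split a backslash-delimited
# variable name into its levels.
split_variable_path <- function(path) {
  # fixed = TRUE treats the separator as a literal backslash, not a regex
  parts <- strsplit(path, "\\", fixed = TRUE)[[1]]
  # drop the empty string produced by the leading backslash
  parts[nzchar(parts)]
}

# Made-up path, shaped like the variable name used in this notebook
example_path <- "\\Study A\\Laboratory Measurements\\Insulin\\"
levels_found <- split_variable_path(example_path)
print(levels_found)
```

The last element is the variable itself; the first is the study.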

# Error reproduction

## Issue 1: query.anyof.add() → HTTP Error 

The `anyof` query method throws an HTTP error, although the other query methods (`select`, `add`, `filter`) work fine.

In [None]:
print(random_variable_name)

In [None]:
my_query <- hpds::new.query(resource = resource)
hpds::query.anyof.add(query = my_query, 
                      keys = random_variable_name)
facts <- hpds::query.run(query = my_query, result.type = "dataframe")

In [None]:
dim(facts)

In [None]:
head(facts)

In [None]:
hpds::query.show(my_query)

## Issue 2: Retrieving variables dictionary from HPDS Database doesn't work when no string is specified

In [None]:
plain_variablesDict <- hpds::find.in.dictionary(resource, "")

In [None]:
plain_variablesDict <- hpds::find.in.dictionary(resource, "NHLBI TOPMed")

In [None]:
plain_variablesDict

## Issue 3: Retrieving the whole dictionary from HPDS database takes forever

Retrieving the whole dictionary is much slower than with the Python client; the R implementation should be checked.
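
To put a number on "takes forever", the call can be timed with base R's `system.time()`. The sketch below is only an illustration: `time_call` is a hypothetical helper, and `fake_dictionary_call` is a stand-in that sleeps briefly so the example runs without a PIC-SURE connection.

```r
# Hypothetical helper: call a function and report the elapsed
# wall-clock time in seconds alongside its result.
time_call <- function(fun, ...) {
  timing <- system.time(result <- fun(...))
  list(result = result, elapsed = timing[["elapsed"]])
}

# Stand-in for a slow dictionary lookup (NOT the real hpds function)
fake_dictionary_call <- function(term) {
  Sys.sleep(0.1)
  data.frame(name = paste0("\\", term, "\\variable\\"),
             stringsAsFactors = FALSE)
}

timed <- time_call(fake_dictionary_call, "NHLBI TOPMed")
cat("elapsed seconds:", timed$elapsed, "\n")

# With a live connection, the same helper could wrap the real call:
# timed <- time_call(hpds::find.in.dictionary, resource, "NHLBI TOPMed")
```

Running the equivalent lookup from the Python client with the same search term would give a like-for-like comparison.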

## Issue 4: Variable names in returned DataFrame are not the same as the ones from the dictionary
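
How the names differ is not recorded here. One common cause in R, offered only as an assumption to check, is that `data.frame()` and `read.csv()` sanitise column names with `make.names()` by default (`check.names = TRUE`), which rewrites backslashes, spaces and colons. The sketch below uses a made-up name and a hypothetical helper, `match_dictionary_names`, to map dictionary names onto sanitised column names.

```r
# Made-up dictionary name, shaped like the ones used in this notebook
dictionary_name <- "\\Study A\\Laboratory Measurements\\Insulin\\"

# make.names() turns a string into a syntactically valid R name,
# translating invalid characters (backslash, space, ...) to "."
sanitised_name <- make.names(dictionary_name)
print(sanitised_name)

# Hypothetical helper: for each dictionary name, find the column whose
# name equals its sanitised form (NA when there is no such column).
match_dictionary_names <- function(dictionary_names, column_names) {
  idx <- match(make.names(dictionary_names), column_names)
  setNames(column_names[idx], dictionary_names)
}

# Toy data frame standing in for a query result
facts_like <- data.frame(Patient.ID = 1:2, placeholder = c(5.1, 6.3))
names(facts_like)[2] <- sanitised_name

mapping <- match_dictionary_names(dictionary_name, names(facts_like))
print(mapping)
```

If this is indeed the cause, building the data frame with `check.names = FALSE` would keep the dictionary names unchanged.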

## Issue 5: Error returned by query.getResultDataFrame() when using an invalid token is misleading

The error states that the variable doesn't exist in the resource, instead of reporting that the token is invalid.

In [None]:
# `consent_dic` must already exist in the session; it is not created in this notebook
hpds::query.select.add(query = my_query, 
                       keys = consent_dic[["name"]])
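
Because the error message does not point to the token, it can help to rule out the most basic token problems locally before querying. This is a minimal sketch using base R only. `read_token` is a hypothetical helper; it cannot tell whether the server will accept the token, only whether the file yields exactly one non-empty string.

```r
# Hypothetical helper: read a token file and fail early with a clear
# message when the file is missing, empty, or holds several lines.
read_token <- function(path) {
  if (!file.exists(path)) {
    stop("Token file not found: ", path)
  }
  # trimws() removes stray spaces around the token
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  if (length(lines) != 1) {
    stop("Expected exactly one non-empty line in ", path,
         ", found ", length(lines))
  }
  lines
}

# Demonstration with a temporary file standing in for token.txt
demo_file <- tempfile(fileext = ".txt")
writeLines(c("  not-a-real-token  ", ""), demo_file)
demo_token <- read_token(demo_file)
print(demo_token)
```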

## Issue 6: Filter on a categorical variable accepts only a list as an argument, not a vector

- The other methods also expect a list rather than a vector, but they attempt a conversion and only raise a warning, without further consequences.
- Vectors and lists are often used interchangeably in R. A single line converting the vector to a list would be enough: `my_vector %>% list()`
- The filter method, by contrast, does not return a specific error; the message it does return is misleading.
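
Two conversions are available in base R, and they are not equivalent: `list(my_vector)` (which is what `my_vector %>% list()` evaluates to) wraps the whole vector in a one-element list, whereas `as.list(my_vector)` creates one list element per value. Which of the two the `hpds` functions expect is not stated here, so it should be checked before adding the conversion. The values below are made up.

```r
my_vector <- c("case", "control")

wrapped  <- list(my_vector)     # a single element holding the whole vector
per_item <- as.list(my_vector)  # one element per value of the vector

cat("list(my_vector) has length", length(wrapped), "\n")
cat("as.list(my_vector) has length", length(per_item), "\n")

str(wrapped)
str(per_item)
```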

## Issue 7: Query shouldn't be modified in place; the function should return it for use with the assignment operator

In [None]:
# BAD: current implementation, modifies the query in place
hpds::query.filter.add(query = my_query, 
                       keys = consent_dic[["name"]], 
                       phs_copdgene)
# Usual R idiom: return the modified query and assign it
my_query <- hpds::query.filter.add(query = my_query, 
                                   keys = consent_dic[["name"]], 
                                   phs_copdgene)
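
To illustrate the difference, here is a toy sketch with a plain list standing in for the query object. `new_toy_query` and `toy_filter_add` are hypothetical and unrelated to the actual `hpds` internals; the point is only that an ordinary R function returns a modified copy and leaves its argument untouched.

```r
# Toy query object: a plain list (NOT the real hpds query class)
new_toy_query <- function() {
  list(filters = list())
}

# Returns a modified copy; the caller decides whether to keep it
toy_filter_add <- function(query, key, values) {
  query$filters[[key]] <- values
  query
}

q0 <- new_toy_query()
q1 <- toy_filter_add(q0, "\\Study A\\Sex\\", list("Female"))

cat("filters in q0:", length(q0$filters), "\n")  # original is unchanged
cat("filters in q1:", length(q1$filters), "\n")  # the copy holds the filter
```

With this style, intermediate queries can be kept, compared, or discarded, which is what most R users expect.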

## Functions to consider removing
- `query.delete` functions: not useful
- maybe:
    - `dictionary.get.entries`
    - `get.crosscount`


## Implementing a way to specify the query using OR instead of AND

For instance, one could want variables that meet one specific `query.filter()` condition OR another specific `query.filter()` condition. This might be hard to implement, though, and it is not 100% certain that it is actually needed.
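
Until such an option exists, one possible workaround (an assumption, not something the API provides) is to run the two filtered queries separately and combine the results on the client side. The sketch uses small made-up data frames in place of `hpds::query.run()` output, and assumes both results share the same columns, including an identifier column called `patient_id` here.

```r
# Made-up results of two separately filtered queries
facts_a <- data.frame(patient_id = c(1, 2, 3), insulin = c(5.1, 7.4, 6.0))
facts_b <- data.frame(patient_id = c(3, 4),    insulin = c(6.0, 8.2))

# Rows meeting condition A OR condition B: stack both results,
# then keep each patient only once
combined <- rbind(facts_a, facts_b)
either <- combined[!duplicated(combined$patient_id), ]
print(either)
```

This costs two round trips to the server instead of one, so a server-side OR would still be preferable for large results.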