# UDN Data Characterization

This notebook characterizes the clinical data in the UDN, giving a first look at what the network contains and how it is described.

## 1. DATA: UDN Network Resource 

The Undiagnosed Diseases Network (UDN), funded by the NIH Common Fund, is a research study to improve diagnosis and care of patients with undiagnosed conditions. The UDN established a nationwide network of clinicians and researchers who use both basic and clinical research to uncover the underlying disease mechanisms associated with these conditions. In its first 20 months, the UDN accepted 601 participants undiagnosed by traditional medical practices. Of those who completed their UDN evaluation during this time, 35% were given a diagnosis. Many of these diagnoses were rare genetic diseases including 31 previously unknown syndromes. 

The specific goals of UDN are to: (1) improve the level of diagnosis and care for patients with undiagnosed diseases through the development of common protocols designed by a large community of investigators; (2) facilitate research into the etiology of undiagnosed diseases, by collecting and sharing standardized, high-quality clinical and laboratory data including genotyping, phenotyping, and documentation of environmental exposures; and (3) create an integrated and collaborative community across multiple clinical sites and among laboratory and clinical investigators prepared to investigate the pathophysiology of these new and rare diseases.

For more information, please refer to https://commonfund.nih.gov/diseases

### PIC-SURE API

Databases exposed through the PIC-SURE API span a wide range of architectures and data models. PIC-SURE hides this complexity, letting researchers access data in a normalized way and focus on analysis and medical insights. The API is available for the Python and R programming languages.

The API is actively developed by the Avillach Lab at Harvard Medical School. For more information, please refer to the GitHub repositories:
* https://github.com/hms-dbmi/pic-sure-python-adapter-hpds
* https://github.com/hms-dbmi/pic-sure-python-client

---

### Environment setup

Prerequisites:

* R >= 3.6.1
* Anaconda
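
The prerequisites can be checked from R before creating the environment. The following is a minimal sketch, assuming `conda` is expected on the `PATH`; the variable names are illustrative and not part of the original notebook.

```r
# Minimum R version stated in the prerequisites
required_r <- "3.6.1"

# getRversion() returns a version object that compares against a string
r_ok <- getRversion() >= required_r

# Sys.which() returns "" when the executable is not found on the PATH
conda_path <- unname(Sys.which("conda"))
conda_ok <- nzchar(conda_path)

message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " does not meet",
        " the >= ", required_r, " requirement")
message(if (conda_ok) paste("conda found at", conda_path)
        else "conda not found on PATH")
```

If either check fails, fix the installation before running the environment setup cells.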

In [None]:
# list the packages required to create your R environment with conda
system('cat udn-r.yml', intern=TRUE)

In [None]:
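# set up environment
system('conda env create -f udn-r.yml', intern=TRUE)
# NOTE: system() runs each command in its own child shell, so this activation
# does not carry over to the current R session; activate `udn-r` in the
# terminal before launching the notebook instead
system('conda activate udn-r', intern=TRUE)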

### Packages 

#### Install R packages for the analysis example

In [2]:
# R packages for analysis
list_packages <- c("devtools")

for (package in list_packages) {
    # compare against installed package names only, not the whole matrix
    if (!package %in% rownames(installed.packages())) {
        install.packages(package, dependencies = TRUE)
    }
    library(package, character.only = TRUE)
}

#### Install latest R PIC-SURE API libraries from GitHub

In [4]:
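# PIC-SURE API libraries
devtools::install_github("hms-dbmi/pic-sure-r-client", force = TRUE)
# uncomment to install the HPDS adapter, which the `hpds::` calls below appear to rely on
# devtools::install_github("hms-dbmi/pic-sure-r-adapter-hpds", force = TRUE)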

Downloading GitHub repo hms-dbmi/pic-sure-r-client@master


askpass (1.0   -> 1.1  ) [CRAN]
curl    (3.3   -> 4.3  ) [CRAN]
httr    (1.4.0 -> 1.4.1) [CRAN]
mime    (0.6   -> 0.8  ) [CRAN]
openssl (1.3   -> 1.4.1) [CRAN]
R6      (2.4.0 -> 2.4.1) [CRAN]
stringi (1.4.3 -> 1.4.5) [CRAN]
sys     (3.2   -> 3.3  ) [CRAN]


Installing 8 packages: askpass, curl, httr, mime, openssl, R6, stringi, sys


ERROR: Error in i.p(...): (converted from warning) installation of package ‘stringi’ had non-zero exit status
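
The log above stops because upgrading the `stringi` dependency failed. One possible workaround, sketched here under the assumption that the already-installed dependency versions are sufficient for the client, is to install the client without upgrading dependencies. The helper name `install_picsure_client` is hypothetical and not part of the original notebook.

```r
# Hypothetical helper: not part of the original notebook
install_picsure_client <- function(repo = "hms-dbmi/pic-sure-r-client") {
  # upgrade = "never" skips the dependency upgrades listed in the log,
  # including the stringi upgrade that failed
  devtools::install_github(repo, upgrade = "never", force = TRUE)
}

# Call explicitly when needed (requires network access and devtools):
# install_picsure_client()
```

If the client does need the newer `stringi`, running `install.packages("stringi")` on its own shows the full build log and usually reveals the underlying cause.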


#### Load user functions

In [None]:
# R_lib for pic-sure
source("R_lib/utils.R")

## 2. DATA ACCESS Workflow
### 1. Connect to the UDN data resource using the HPDS adapter

In [None]:
# token is the individual key given to connect to the UDN resource
token_file <- "token.txt"
token <- scan(token_file, what = "character")
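
Because a missing or empty token file only surfaces later as a failed connection, it can help to validate the token when it is read. The following is a minimal sketch; `read_token` is a hypothetical helper, and the demonstration uses a temporary file with a dummy value rather than a real token.

```r
# Hypothetical helper: read a token from a file and fail early if it looks wrong
read_token <- function(path) {
  if (!file.exists(path)) {
    stop("Token file not found: ", path)
  }
  # Join all lines and strip surrounding whitespace and newlines
  token <- trimws(paste(readLines(path, warn = FALSE), collapse = ""))
  if (!nzchar(token)) {
    stop("Token file is empty: ", path)
  }
  token
}

# Demonstration with a temporary file holding a dummy value (not a real token)
demo_file <- tempfile(fileext = ".txt")
writeLines("  dummy-token-value  ", demo_file)
demo_token <- read_token(demo_file)
unlink(demo_file)
```

In the notebook, the equivalent call would be `token <- read_token(token_file)`.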

In [None]:
# PIC-SURE endpoint URL and UUID of the UDN resource
PICSURE_network_URL <- "https://udn.hms.harvard.edu/picsure"
resource_id <- "8e8c7ed0-87ea-4342-b8da-f939e46bac26"

In [None]:
myconnection <- picsure::connect(url = PICSURE_network_URL,
                                 token = token)

In [None]:
resource <- hpds::get.resource(myconnection,
                               resourceUUID = resource_id)

In [None]:
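# get object information
# NOTE: in R, `resource.help` is an ordinary function name, not a method call
# on `resource`; it must be defined by a loaded package or by R_lib/utils.R
resource.help()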

### 2. Explore data: data structure description

**Methods**:

* Search: Dictionary method
* Retrieve: Query method

**Data structures**:

* Dictionary object structure
* Query object structure
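
The search-then-retrieve pattern can be illustrated offline with plain data frames. This is a toy sketch only: it does not use the PIC-SURE API or UDN data, and `toy_dictionary`, `toy_data`, `search_dictionary`, and `run_query` are hypothetical names invented for the illustration.

```r
# Toy stand-ins; NOT the PIC-SURE API and NOT UDN data
toy_dictionary <- data.frame(
  name = c("age", "sex", "phenotype"),
  type = c("continuous", "categorical", "categorical"),
  stringsAsFactors = FALSE
)
toy_data <- data.frame(
  age = c(4, 37, 12),
  sex = c("F", "M", "F"),
  phenotype = c("seizures", "ataxia", "hypotonia"),
  stringsAsFactors = FALSE
)

# Search step: find dictionary entries whose name contains the term
search_dictionary <- function(dictionary, term) {
  dictionary[grepl(term, dictionary$name, fixed = TRUE), , drop = FALSE]
}

# Retrieve step: pull only the selected variables from the data
run_query <- function(data, variables) {
  data[, variables, drop = FALSE]
}

hits <- search_dictionary(toy_dictionary, "age")
result <- run_query(toy_data, hits$name)
```

The dictionary describes which variables exist and what type they are, while the query returns the values for the variables selected from it.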

### 3. Data characterization
#### Download data
##### demographics