# Clinical Queries

## Set Up the Client and Log In to *pyopencga*

**Configuration and Credentials** 

Let's assume *pyopencga* is already installed in our Python environment (all the steps are described in [pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)).

You need to provide **at least** a host server URL in the standard OpenCGA configuration format, either as a Python dictionary or in a JSON file.
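
As a minimal sketch of the two options, the same configuration can be kept as a dictionary or written to a JSON file and read back. The file name `opencga_config.json` is a hypothetical choice for illustration; the host URL is the one used in this notebook.

```python
import json
import os
import tempfile

# Configuration as a Python dictionary (same shape as used in this notebook)
config_dict = {'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'}}

# The same configuration stored in a JSON file.
# 'opencga_config.json' is a hypothetical file name chosen for this sketch.
config_path = os.path.join(tempfile.mkdtemp(), 'opencga_config.json')
with open(config_path, 'w') as handle:
    json.dump(config_dict, handle, indent=2)

# Read the file back to confirm that both forms carry the same content
with open(config_path) as handle:
    config_from_file = json.load(handle)

print(config_from_file['rest']['host'])
```

The dictionary form is the one passed to `ClientConfiguration` in Step 4 of the cell that follows.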


In [17]:
## Step 1. Import pyopencga dependencies
from pyopencga.opencga_config import ClientConfiguration # import configuration module
from pyopencga.opencga_client import OpencgaClient # import client module
from pprint import pprint
from IPython.display import JSON
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

## Step 2. User credentials
user = 'demouser'
####################################

## Step 3. Create the ClientConfiguration dict
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'
config_dict = {'rest': {
                       'host': host
                    }
               }

## Step 4. Create the ClientConfiguration and OpenCGA client
config = ClientConfiguration(config_dict)
oc = OpencgaClient(config)

## Step 5. Log in to OpenCGA using the OpenCGA client; enter the password when prompted
oc.login(user)

print('Logged in successfully to {}, your token is: {} well done!'.format(host, oc.token))


## Define some common variables

Here you can define some variables that will be used repeatedly throughout the notebook.

In [14]:
# Define the study id
study = 'reanalysis:rd38'

# Define a clinicalCaseId
case_id = 'OPA-10044-1'

# Define an interpretationId
interpretation_id = 'OPA-10044-1__2'

## 1. Common Queries for Clinical Analysis

### Retrieve cases in a study
----
The query below retrieves the cases in a study. For performance reasons, we have limited the number of results retrieved in the query.

You can change the `limit` parameter to control the number of cases retrieved by the query.

You can also control which information is retrieved from the cases with the `include` parameter of the query, and which fields are printed with the `fields` parameter of `print_results`.

In [16]:
## Query using the clinical search web service
cases_search = oc.clinical.search(study=study, include='id,type,proband,description,panels,interpretation', limit=5)
cases_search.print_results(title='Cases found for study {}'.format(study), fields='id,type,proband.id,panels.id,interpretation.id')

## Uncomment next line to display an interactive JSON viewer
# JSON(cases_search.get_results())
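
To make the dotted names in `include` and `fields` more concrete, here is a small sketch of how a name such as `proband.id` maps onto a nested result. It is only an illustration, not how *pyopencga* implements `print_results`: the helper `pick` and the mock case record are hypothetical and invented for this example.

```python
def pick(record, path):
    """Follow a dotted path such as 'proband.id' through nested dicts and lists."""
    value = record
    for key in path.split('.'):
        if isinstance(value, list):
            # A list of objects (e.g. panels) yields one value per element
            value = [item.get(key) if isinstance(item, dict) else None for item in value]
        elif isinstance(value, dict):
            value = value.get(key)
        else:
            # The path goes deeper than the data does
            return None
    return value

# Mock case record, invented for illustration only
mock_case = {
    'id': 'CASE-1',
    'type': 'FAMILY',
    'proband': {'id': 'PROBAND-1'},
    'panels': [{'id': 'panel-a'}, {'id': 'panel-b'}],
    'interpretation': {'id': 'CASE-1__1'},
}

fields = 'id,type,proband.id,panels.id,interpretation.id'
row = {field: pick(mock_case, field) for field in fields.split(',')}
print(row)
```

Each comma-separated entry names one column, and a dot steps one level down into the object.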

### Proband information: List of disorders and HPO terms from proband of a case
-------
The proband field from a case contains all the information related to a proband, including phenotypes and disorders.

You can retrieve all the phenotypes and disorders of a proband by inspecting the case at the proband level. We'll use the `case_id` defined above:

In [18]:
## Query using the clinical info web service
disorder_search = oc.clinical.info(clinical_analysis=case_id, study=study, include='id,type,proband')
disorder_search.print_results(title='Disorders and phenotypes', fields='id,type,proband.id')

disorder_object = disorder_search.get_results()[0]['proband']

## Uncomment next line to display an interactive JSON viewer
# JSON(disorder_object)
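
Once the proband object is available, its disorders and phenotypes can be listed with a few lines of plain Python. The sketch below assumes the proband carries `disorders` and `phenotypes` lists whose items have `id` and `name` keys; check this against the JSON viewer output before relying on it. The helper `summarize_proband` and the mock proband are hypothetical and invented for illustration.

```python
def summarize_proband(proband):
    """Return (disorders, phenotypes) as lists of 'id (name)' strings."""
    def label(items):
        # 'items or []' tolerates a missing or null list
        return ['{} ({})'.format(item.get('id'), item.get('name')) for item in items or []]
    return label(proband.get('disorders')), label(proband.get('phenotypes'))

# Mock proband, invented for illustration; not taken from the study.
# The key names are an assumption about the proband structure.
mock_proband = {
    'id': 'PROBAND-1',
    'disorders': [{'id': 'disorder-1', 'name': 'Example disorder'}],
    'phenotypes': [{'id': 'HP:0000365', 'name': 'Hearing impairment'}],
}

disorders, phenotypes = summarize_proband(mock_proband)
print('Disorders:', disorders)
print('Phenotypes:', phenotypes)
```

With real data, `disorder_object` would take the place of `mock_proband`.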

### Check the interpretation id of a case
----
You can find the `interpretation id` of a case. This is useful to perform subsequent queries for that interpretation.

Note that you can control the fields that are printed by the function `print_results` with the parameter `fields`. To see the whole clinical analysis object, you can use the interactive JSON viewer below.

In [27]:
# Query using the clinical info web service
clinical_info = oc.clinical.info(clinical_analysis=case_id, study=study)
clinical_info.print_results(fields='id,interpretation.id,type,proband.id')

## Uncomment next line to display an interactive JSON viewer
# JSON(clinical_info.get_results()[0]['interpretation'])

### Inspect the Interpretation object
----
Here you will retrieve a lot of useful information from a case interpretation.

In [116]:
## Query using the clinical info_interpretation web service
interpretation_object = oc.clinical.info_interpretation(interpretations='OPA-12120-1__2', study=study).get_results()

## Uncomment next line to display an interactive JSON viewer
# JSON(interpretation_object)

### Check reported pathogenic variants in a case interpretation and list the variant tier
-----
Run the cell below to retrieve the interpretation stats, including the pathogenic variants reported in a case. 

In [69]:
## Query using the clinical info_interpretation web service
interpretation_stats = oc.clinical.info_interpretation(interpretations='OPA-12120-1__2', include='stats', study=study).get_results()[0]['stats']['primaryFindings']

## Uncomment next line to display an interactive JSON viewer
# JSON(interpretation_stats)

### Retrieve the annotation for the reported variants
----

Run the cell below to retrieve the annotation for the variants reported in the interpretation.

In [77]:
## Query using the clinical info_interpretation web service
variant_annotation = oc.clinical.info_interpretation(interpretations='OPA-12120-1__2', include='primaryFindings.annotation', study=study).get_results()[0]['primaryFindings']

## Uncomment next line to display an interactive JSON viewer
# JSON(variant_annotation)

### PanelApp panels applied in the original analysis 
--------

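Obtain the panels, and with them the list of genes, that were applied at the time of the original analysis. The cell prints the panel ids; uncomment the JSON viewer to inspect the full panel objects.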

In [42]:
cases_search = oc.clinical.search(study=study, include='id,panels', limit=5)
cases_search.print_results(title='Cases found for study {}'.format(study), fields='id,panels.id')

## Uncomment next line to display an interactive JSON viewer
# JSON(cases_search.get_results())

## 2. Use Case

**Situation**: I want to retrieve a case and check whether it has a reported pathogenic variant, then retrieve the annotation information about these variants, if available.
Finally, I want to come up with the list of tier 1, 2 and 3 variants for the sample.

### 1. Search cases in the study and select one random case
- First, search over the cases in the study. Uncomment the last line of the cell to have a look at the JSON with the cases returned.

Note that this query can take time because there is a lot of information. It is recommended to restrict the search to a number of cases with the parameter `limit`, as below:

In [88]:
## Search the cases
cases_search = oc.clinical.search(study=study, limit=3)
## Uncomment next line to display an interactive JSON viewer
# JSON(cases_search.get_results())

- Now you can select one random case id for the subsequent analysis:

In [39]:
## Define an empty list to keep the case ids:
case_ids = []

## Iterate over the cases and retrieve the ids:
for case in oc.clinical.search(study=study, include='id').result_iterator():
    case_ids.append(case['id'])

## Uncomment for printing the list with all the case ids
# print(case_ids)

## Select a random case from the list
import random

if case_ids:
    print('There are {} cases in study {}'.format(len(case_ids), study))
    selected_case = random.choice(case_ids)
    print('Case selected for analysis is {}'.format(selected_case))
else:
    print('There are no cases in the study', study)
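
Because the selection is random, each run of the notebook may pick a different case. If a repeatable choice is preferred, one option is a dedicated seeded generator, sketched below. The helper `pick_case`, the seed value and the mock ids are hypothetical and chosen only for illustration.

```python
import random

# Mock list of case ids, invented for illustration
case_ids = ['CASE-1', 'CASE-2', 'CASE-3', 'CASE-4']

def pick_case(ids, seed):
    """Pick one id with a dedicated, seeded generator so the choice is repeatable."""
    if not ids:
        return None
    # random.Random(seed) leaves the global random state untouched
    return random.Random(seed).choice(ids)

first = pick_case(case_ids, seed=42)
second = pick_case(case_ids, seed=42)
print(first, second)
```

The same seed and the same list order give the same case, which helps when sharing results from the notebook.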

### 2. Retrieve the interpretation id(s) from the selected case

In [40]:
## Query using the clinical info web service
interpretation_info = oc.clinical.info(clinical_analysis=selected_case, study=study)
interpretation_info.print_results(fields='id,interpretation.id,type,proband.id')

## Select interpretation object 
interpretation_object = interpretation_info.get_results()[0]['interpretation']

## Select interpretation id 
interpretation_id = interpretation_info.get_results()[0]['interpretation']['id']

## Uncomment next line to display an interactive JSON viewer
# JSON(interpretation_object)

print('The interpretation id for case {} is {}'.format(selected_case, interpretation_object['id']))
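
The cell above indexes `['interpretation']['id']` directly, which raises an error if a case has no interpretation yet. A defensive variant is sketched below; the helper `get_interpretation_id` and the mock cases are hypothetical, and whether such cases actually occur in the study is an assumption.

```python
def get_interpretation_id(case):
    """Return the interpretation id of a case, or None if it has none."""
    # 'or {}' covers both a missing key and an explicit null value
    interpretation = case.get('interpretation') or {}
    return interpretation.get('id')

# Mock cases, invented for illustration
case_with = {'id': 'CASE-1', 'interpretation': {'id': 'CASE-1__1'}}
case_without = {'id': 'CASE-2', 'interpretation': None}

print(get_interpretation_id(case_with))
print(get_interpretation_id(case_without))
```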

### 3. Retrieve reported variants and the annotation, including tiering
- **Obtain the interpretation stats from the case**

In [33]:
## Query using the clinical info_interpretation web service
interpretation_stats = oc.clinical.info_interpretation(interpretations=interpretation_id, include='stats', study=study).get_results()[0]['stats']['primaryFindings']

## Uncomment next line to display an interactive JSON viewer
# JSON(interpretation_stats)

- **Obtain the annotation of the variants reported in a case interpretation as a JSON object**

In [34]:
## Query using the clinical info_interpretation web service
primary_findings = oc.clinical.info_interpretation(interpretations=interpretation_id, study=study).get_results()[0]['primaryFindings']

## Uncomment next line to display an interactive JSON viewer
# JSON(primary_findings)

- **Obtain tiering: variant ids, genes, and tier from a case interpretation**

In [35]:
## Perform the query
variants_reported = oc.clinical.info_interpretation(interpretations=interpretation_id, study=study)

## Define empty lists to store the variants, genes and tiering
variant_list = []
gene_id_list = []
genename_list = []
tier_list = []

## Iterate over the reported variants; only the first evidence of each variant is used
for variant in variants_reported.get_results()[0]['primaryFindings']:
    first_evidence = variant['evidences'][0]
    variant_list.append(variant['id'])
    gene_id_list.append(first_evidence['genomicFeature']['id'])
    genename_list.append(first_evidence['genomicFeature']['geneName'])
    tier_list.append(first_evidence['classification']['tier'])

## Construct a DataFrame and display the first 5 rows
df = pd.DataFrame(data={'variant_id': variant_list, 'gene_id': gene_id_list, 'gene_name': genename_list, 'tier': tier_list})
df.head()
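
The use case asks for the list of variants per tier. One possible way to get there, sketched below, is to group the variant ids by the tier found in every evidence rather than only the first one. The helper `variants_by_tier` is hypothetical; the mock findings, variant ids and tier labels are placeholders invented for illustration, so the labels returned by the server may differ.

```python
from collections import defaultdict

def variants_by_tier(primary_findings):
    """Group variant ids by tier, looking at every evidence of every variant."""
    tiers = defaultdict(set)
    for variant in primary_findings:
        for evidence in variant.get('evidences', []):
            tier = (evidence.get('classification') or {}).get('tier')
            if tier is not None:
                tiers[tier].add(variant['id'])
    # Sorted lists are easier to read and compare than sets
    return {tier: sorted(ids) for tier, ids in sorted(tiers.items())}

# Mock findings, invented for illustration (ids and tier labels are placeholders)
mock_findings = [
    {'id': 'var-1', 'evidences': [{'classification': {'tier': 'tier1'}},
                                  {'classification': {'tier': 'tier3'}}]},
    {'id': 'var-2', 'evidences': [{'classification': {'tier': 'tier3'}}]},
    {'id': 'var-3', 'evidences': []},
]

print(variants_by_tier(mock_findings))
```

With real data, the `primaryFindings` list of the interpretation would replace `mock_findings`. A variant with evidences in several tiers appears under each of them, and a variant without evidences is left out.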
