# *pyopencga* Catalog: Clinical Data and Other Metadata

------
**[NOTE]** The server methods used by the *pyopencga* client are defined at the following swagger URL:
- http://bioinfodev.hpc.cam.ac.uk/opencga-test/webservices

**[NOTE]** Currently implemented methods are listed in the following spreadsheet:
- https://docs.google.com/spreadsheets/d/1QpU9yl3UTneqwRqFX_WAqCiCfZBk5eU-4E3K-WVvuoc/edit?usp=sharing

This notebook provides guidance for querying an OpenCGA server through *pyopencga* to explore the studies the user has access to, the clinical data provided in a study (samples, individuals, genotypes, etc.) and other types of metadata, such as permissions.

A good first step when starting to work with OpenCGA is to retrieve information about our user, including which projects and studies we are allowed to see.<br>
It is also recommended to get a feel for the clinical data in the study: How many samples and individuals does the study have? Are there any defined cohorts? Can we get some statistics about the genotypes of the samples in the study?

For guidance on how to log in and get started with *opencga*, refer to [001-pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training).
 

## 1. Set up the Client and Log in to *pyopencga* 

**Configuration and Credentials** 

Let's assume we already have *pyopencga* installed in our Python setup (all the steps are described in [001-pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)).

You need to provide **at least** a host server URL in the standard configuration format for OpenCGA, either as a Python dictionary or in a JSON file.


In [2]:
from pyopencga.opencga_config import ClientConfiguration # import configuration module
from pyopencga.opencga_client import OpencgaClient # import client module
from pprint import pprint
import json

## Configuration parameters
# OpenCGA host
host = 'https://ws.opencb.org/opencga-prod'

# User credentials
user = 'demouser'
passwd = 'demouser' ## you can skip this, see below.
study = 'demo@family:platinum'
####################################

# Creating ClientConfiguration dict
config_dict = {'rest': {
                       'host': host 
                    }
               }
print('Config information:\n',config_dict)

# Pass the config_dict dictionary to the ClientConfiguration method
config = ClientConfiguration(config_dict)

# Create the client
oc = OpencgaClient(config)

# Pass the credentials to the client
# (here we put only the user in order to be asked for the password interactively)
# oc.login(user)

# or you can pass the user and passwd
oc.login(user, passwd)



Config information:
 {'rest': {'host': 'https://ws.opencb.org/opencga-prod'}}
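
The same configuration can also be kept in a JSON file, as mentioned above. The following is a minimal sketch that uses only the standard library: it writes the dictionary to a file, reads it back and checks that a host is present. The file name `opencga_config.json` and the helper `load_config` are illustrative assumptions and are not part of *pyopencga*.

```python
import json
import os
import tempfile

def load_config(path):
    """Read a JSON configuration file and check that it defines rest.host."""
    with open(path) as handle:
        config = json.load(handle)
    host = config.get('rest', {}).get('host')
    if not host:
        raise ValueError("configuration must define ['rest']['host']")
    return config

# Same structure as config_dict in the cell above
example_config = {'rest': {'host': 'https://ws.opencb.org/opencga-prod'}}

# Hypothetical file name, written to a temporary directory for this sketch
config_path = os.path.join(tempfile.mkdtemp(), 'opencga_config.json')
with open(config_path, 'w') as handle:
    json.dump(example_config, handle, indent=2)

loaded_config = load_config(config_path)
print(loaded_config['rest']['host'])
```

The dictionary returned by `load_config` has the same shape as `config_dict`, so it can be passed to `ClientConfiguration` in the same way.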


# 2. Use Cases 

In this section we show how to work with some of the most common scenarios.<br>
The use cases addressed here constitute a high-level introduction that gives users a basis for their own explorations. Each example can be adapted to an individual use case.



## 2.1 Working with our User: see permissions for Projects and studies

In this use case we cover retrieving information for our user.

First, we can get the list of available methods for the user client object:

In [3]:
# Get the user client object and list its public methods
user_client = oc.users
print([name for name in dir(user_client) if not name.startswith('_')])


Depending on the permissions granted, a user can be the owner of a study or just have access to some studies owned by other users.<br>We can retrieve information about our user and its permissions with:

In [8]:
## getting user information
## [NOTE] User needs the query_id string directly --> (user)
user_info = user_client.info(user).get_result(0)

print('user info:')
print('name: {}\towned_projects: {}'.format(user_info['name'], len(user_info['projects'])))

## [TODO] It would be useful to also show the creation date and other user info


user info:
name: OpenCGA Demo User	owned_projects: 0


We can see that our user (demouser) owns **no** projects, but has been granted access to some projects owned by the `demo` user. Let's see how to find them.

### Exploring the projects for our user

We can list the projects our user has access to with the **project client** `search()` function.

In [5]:
## Getting user projects
## [NOTE] Client specific methods have the query_id as a key:value (i.e (user=user_id)) 
project_client = oc.projects
projects_info = project_client.search()

projects_info.print_results(fields='id,name,organism.scientificName,organism.assembly', metadata=False)

#id	name	organism.scientificName	organism.assembly
family	Family Studies GRCh37	Homo sapiens	GRCh37
population	Population Studies GRCh38	Homo sapiens	GRCh38


Our user (demouser) has access to 2 different projects:
- Project: **family**
- Project: **population**
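
The `fields` argument above uses dotted paths such as `organism.assembly` to reach nested values. As a rough illustration of that idea, the sketch below resolves dotted paths against plain dictionaries. The helper `get_field` and the sample records are assumptions built from the output shown above; this is not a description of how `print_results` is implemented.

```python
def get_field(record, dotted_path, default='.'):
    """Follow a dotted path (e.g. 'organism.assembly') through nested dicts."""
    value = record
    for key in dotted_path.split('.'):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value

# Sample records mirroring the two projects listed above (illustrative only)
sample_projects = [
    {'id': 'family', 'name': 'Family Studies GRCh37',
     'organism': {'scientificName': 'Homo sapiens', 'assembly': 'GRCh37'}},
    {'id': 'population', 'name': 'Population Studies GRCh38',
     'organism': {'scientificName': 'Homo sapiens', 'assembly': 'GRCh38'}},
]

fields = 'id,name,organism.scientificName,organism.assembly'.split(',')
rows = [[get_field(record, field) for field in fields] for record in sample_projects]
for row in rows:
    print('\t'.join(row))
```

A helper like this can be handy when working with the raw result dictionaries instead of the printed table.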

### Exploring the Studies for our user

All the user permissions are established at a study level in OpenCGA. One project may contain different studies.

#### Full Qualified Name (fqn) of Studies 

It is also important to understand that in OpenCGA, projects and studies have a full qualified name (**fqn**) with the format `[owner]@[project]:[study]`.

We can access the studies for the specific project *family*:

In [6]:
project_id = 'family'  # The project we want to retrieve info about

## you can also use the following command:
## projects = project_client.search(id=project_id)

projects = project_client.info(project_id)
project = projects.get_result(0)

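# Print the studies in the project
# (the loop variable is named study_info so it does not overwrite `study` defined earlier)
for study_info in project['studies']:
    print("project:{}\t study:{}".format(project_id, study_info['id']))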

project:family	 study:platinum
project:family	 study:corpasome


Our user (demouser) has access to 2 different studies within the *family* project:

Project: *family*
- study: *platinum*
- study: *corpasome*
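
Since the fqn follows the `[owner]@[project]:[study]` pattern, it can be split into its parts or assembled from them. The helpers below, `parse_fqn` and `build_fqn`, are hypothetical and not part of *pyopencga*; they only sketch the string format, assuming that the owner part is optional (as in `family:platinum`) and that both studies listed above are owned by `demo`.

```python
def parse_fqn(fqn):
    """Split '[owner]@[project]:[study]' into a dict; the owner may be absent."""
    owner, sep, rest = fqn.partition('@')
    if not sep:                      # no '@': the string has no owner part
        owner, rest = None, fqn
    project_name, sep, study_name = rest.partition(':')
    if not sep or not project_name or not study_name:
        raise ValueError('expected [owner]@[project]:[study], got {!r}'.format(fqn))
    return {'owner': owner, 'project': project_name, 'study': study_name}

def build_fqn(owner, project_name, study_name):
    """Assemble a full qualified study name from its three parts."""
    return '{}@{}:{}'.format(owner, project_name, study_name)

parts = parse_fqn('demo@family:platinum')
print(parts)

# Build the fqn of each study listed above for the project 'family'
# (assumption: both studies belong to the owner 'demo')
study_fqns = [build_fqn('demo', 'family', name) for name in ('platinum', 'corpasome')]
print(study_fqns)
```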

## 2.2 Browsing samples or individuals

### Exploring Samples

Once we know the studies our user (demouser) has access to, we can explore the samples that a study contains.<br>
To fetch samples, use the sample client built into pyopencga.<br>

Remember that it is recommended to use the **[fqn](#Full-Qualified-Name-(fqn)-of-Studies)** when referencing studies, since other projects may contain studies with the same name (e.g. a study named *platinum* could exist in two different projects, *GRCh37_project* and *GRCh38_project*).

In this case, we can see the samples from the study *platinum*.

In [9]:
## Let's print samples from platinum using project:study notation
study_id = 'family:platinum'   # study in project:study notation
samples = oc.samples.search(study=study_id, count=True, limit=2) ## other possible params: count=False, id='NA12880,NA12881'
samples.print_results()

#Time: 66
#Num matches: 17
#Num results: 2
#Num inserted: 0
#Num updated: 0
#Num deleted: 0
#id	annotationSets	uuid	qualityControl	release	version	creationDate	modificationDate	description	somatic	phenotypes	individualId	fileIds	status	internal	attributes
NA12877	.	eba106b2-0172-0004-0001-0090f938ae01	{'fileIds': [], 'comments': [], 'alignmentMetrics': [], 'variantMetrics': {'variantStats': [], 'signatures': [], 'vcfFileIds': []}}	1	1	20200625131818	20201117012312		False	.	NA12877	data:platinum-genomes-vcf-NA12877_S1.genome.vcf.gz	{'name': '', 'description': '', 'date': ''}	{'status': {'name': 'READY', 'date': '20200625131818', 'description': ''}}	{}
NA12878	.	eba10c89-0172-0004-0001-8c90462fc396	{'fileIds': [], 'comments': [], 'alignmentMetrics': [], 'variantMetrics': {'variantStats': [], 'signatures': [], 'vcfFileIds': []}}	1	1	20200625131819	20201117015700		False	.	NA12878	data:platinum-genomes-vcf-NA12878_S1.genome.vcf.gz	{'name': '', 'description': '', 'date': ''}	{'status': {'nam

We can see that the study *platinum* has **17 samples** (given by `#Num matches`). The count is returned because we have set the parameter `count=True`.

However, only information about **2 samples** is returned, because we have set the parameter `limit=2`.
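
Because `limit` caps the number of results per call, retrieving all 17 samples would require either a larger `limit` or several calls. The sketch below pages through results in a generic way. It assumes the server accepts a `skip` parameter alongside `limit` (an assumption here; check the swagger documentation), and it uses a hypothetical in-memory `fake_search` function in place of `oc.samples.search` so that it runs without a server. The sample IDs are made up.

```python
def fetch_all(search, page_size=5, **query):
    """Call search(limit=..., skip=..., **query) until a short page is returned."""
    results = []
    skip = 0
    while True:
        page = search(limit=page_size, skip=skip, **query)
        results.extend(page)
        if len(page) < page_size:    # last page reached
            break
        skip += page_size
    return results

# Hypothetical stand-in for a server holding 17 samples (IDs are made up)
ALL_SAMPLE_IDS = ['sample_{:02d}'.format(i) for i in range(1, 18)]

def fake_search(limit, skip, **query):
    """Return one page of sample IDs, mimicking limit/skip semantics."""
    return ALL_SAMPLE_IDS[skip:skip + limit]

all_ids = fetch_all(fake_search, page_size=5, study='family:platinum')
print(len(all_ids))
```

With a real client, the `search` argument would be a small wrapper around `oc.samples.search` that returns the list of results from the response object.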

### Exploring Individuals

### How many Samples are in a given Individual?

## 2.3 Browsing Genotypes and Phenotypes