# pyopencga Basic User Usage

------


**[NOTE]** The server methods used by pyopencga client are defined in the following swagger URL:
- http://bioinfo.hpc.cam.ac.uk/opencga-demo/webservices


For tutorials and more information about accessing the OpenCGA REST web services, please read the documentation at http://docs.opencb.org/display/opencga/Python

## Loading pyOpenCGA

You have two main options:

a) From source code: if you need to import from the source code, remember that Python 3 does not support implicit relative imports, so you need to append the module path to `sys.path`.

b) Installed pyopencga (recommended): you can import pyopencga directly (skip the next section) if you have installed it with `pip install pyopencga`.
 

#### Preparing environment for importing from source

In [21]:
# Initialize PYTHONPATH for pyopencga
import sys
import os
from pprint import pprint

cwd = os.getcwd()
print("current_dir: ...."+cwd[-10:])

base_modules_dir = os.path.dirname(cwd)
print("base_modules_dir: ...."+base_modules_dir[-10:])

sys.path.append(base_modules_dir)


current_dir: ..../notebooks
base_modules_dir: ....ain/python
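
Re-running the cell above appends the same directory to `sys.path` each time. A minimal sketch of a guard against that is shown here; `add_to_sys_path` is a hypothetical helper name, not part of pyopencga.

```python
import os
import sys


def add_to_sys_path(path):
    """Append an absolute directory to sys.path unless it is already present."""
    abs_path = os.path.abspath(path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


# Register the parent of the current working directory, as the cell above does
base_modules_dir = add_to_sys_path(os.path.dirname(os.getcwd()))
```

Calling the helper a second time with the same directory leaves `sys.path` unchanged.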



#### Importing pyopencga Library

This is the recommended way of using *pyopencga* 

In [2]:
from pyopencga.opencga_config import ClientConfiguration
from pyopencga.opencga_client import OpencgaClient
from pprint import pprint
import json


## Setup client and login


**Configuration and Credentials** 

You need to provide a server URL in the standard OpenCGA configuration format, either as a dict or in a JSON file.

Regarding credentials, if you do not pass the password, it will be requested interactively without echo.


In [3]:
# server host
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'

# user credentials
user = "demouser"
passwd = "demouser" ## you can skip this, see below.

# demouser accesses projects owned by the user demo
prj_owner = "demo"


#### Creating ClientConfiguration for server connection configuration

In [4]:
# Creating ClientConfiguration dict
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'

config_dict = {"rest": {
                       "host": host 
                    }
               }

print("Config information:\n",config_dict)


Config information:
 {'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'}}
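
If you prefer to keep the configuration in a JSON file, one cautious route is to load the file into a dict yourself and pass that dict on. The sketch below only uses the standard library and a temporary file for illustration; the file location is an assumption, and only the `rest.host` key shown above is used.

```python
import json
import os
import tempfile

host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'
config_dict = {"rest": {"host": host}}

# Write the configuration to a JSON file (temporary here, for illustration only)
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump(config_dict, handle, indent=2)
    config_path = handle.name

# Read it back into a dict with the same structure as config_dict
with open(config_path) as handle:
    loaded_config = json.load(handle)

# Clean up the temporary file
os.remove(config_path)
```

The resulting `loaded_config` has the same content as `config_dict`, so it can be used wherever the dict form is accepted.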


#### Initialize the client configuration

You can pass a dictionary to the ClientConfiguration 


In [5]:
config = ClientConfiguration(config_dict)
oc = OpencgaClient(config)


#### Log in

In [6]:
# here we put only the user in order to be asked for the password interactively
# oc.login(user)

In [7]:
# or you can pass the user and passwd
oc.login(user, passwd)

**You are now connected to OpenCGA**
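
If you would rather not hard-code the password in a notebook, a possible alternative is to read it from an environment variable and fall back to an interactive prompt. This is only a sketch: the variable name `OPENCGA_PASSWORD` and the helper `resolve_password` are hypothetical and not defined by pyopencga.

```python
import os
from getpass import getpass


def resolve_password(env_var="OPENCGA_PASSWORD"):
    """Return the password from the environment, or prompt for it without echo."""
    password = os.environ.get(env_var)
    if password is None:
        # getpass hides the typed characters, like the interactive login prompt
        password = getpass("OpenCGA password: ")
    return password


# Hypothetical usage with the client created above:
# oc.login(user, resolve_password())
```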

## Working with RestResponse results

All OpenCGA client libraries, including pyopencga, implement a **RestResponse** wrapper object that makes it easier to work with REST web service responses. REST responses include metadata, and OpenCGA 2.0.0 has been designed to work in federation mode; together these can make the responses a bit difficult to work with at first. Please read this brief documentation: http://docs.opencb.org/display/opencga/RESTful+Web+Services#RESTfulWebServices-OpenCGA2.x

Let's see a quick example of how to use the RestResponse wrapper in pyopencga. You can find some extra information here: http://docs.opencb.org/display/opencga/Python#Python-WorkingwiththeRestResponse. Let's execute a first simple query to fetch all projects for the user **demouser** (already logged in):

In [8]:
## Let's fetch the available projects.
## First let's get the project client and execute the search() function
project_client = oc.projects
projects = project_client.search()

## Uncomment this line to view the JSON response.
## NOTE: it includes study information so this can be big
##pprint(projects.get_responses())

Although you can iterate through all the projects by executing the following, this is **not the recommended** way.

In [9]:
## Loop through all the different projects
for project in projects.responses[0]['results']:
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38
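
Indexing `responses[0]` assumes there is exactly one entry in `responses`, which may not hold once several servers answer in federation mode. The sketch below illustrates the idea with a mocked response body; `mock_body` and `iter_all_results` are hypothetical names, the mock contains only the keys used in the cell above, and this is not how pyopencga itself is implemented.

```python
# Mock of a decoded response body; only the keys used in the cell above are shown
mock_body = {
    "responses": [
        {"results": [
            {"id": "family", "name": "Family Studies GRCh37"},
            {"id": "population", "name": "Population Studies GRCh38"},
        ]}
    ]
}


def iter_all_results(body):
    """Yield every result from every entry of body['responses']."""
    for response in body.get("responses", []):
        for result in response.get("results", []):
            yield result


# Collect the project IDs without hard-coding the index of the response
project_ids = [project["id"] for project in iter_all_results(mock_body)]
```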


### Using RestResponse object makes things much easier!

You can use `get_results` function to iterate over all results, in this case the projects:

In [10]:
## Loop through all the different projects
for project in projects.get_results():
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38


You can also iterate over the results with an iterator, which is especially useful when fetching many results from the server:

In [11]:
## Iterate through all the different projects
for project in projects.result_iterator():
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38


**IMPORTANT**: The RestResponse object implements a very powerful custom function to print results :-)

In [12]:
## This function iterates over all the results, it can be configured to exclude metadata, change separator or even select the fields!
projects.print_results()

#Time: 80
#Num matches: -1
#Num results: 2
#Num inserted: 0
#Num updated: 0
#Num deleted: 0
#id	name	uuid	fqn	creationDate	modificationDate	description	organism	currentRelease	studies	internal	attributes
family	Family Studies GRCh37	eba0e1c7-0172-0001-0001-c7af712652b2	demo@family	20200625131808	20200625131808		{'scientificName': 'Homo sapiens', 'commonName': '', 'assembly': 'GRCh37'}	1	.	{'datastores': {}, 'status': {'name': 'READY', 'date': '20200625131808', 'description': ''}}	{}
population	Population Studies GRCh38	25f2842a-0173-0001-0001-e7bcbedc77ff	demo@population	20200706210517	20200706210517	Some population reference studies for GRCh38	{'scientificName': 'Homo sapiens', 'commonName': '', 'assembly': 'GRCh38'}	1	.	{'datastores': {}, 'status': {'name': 'READY', 'date': '20200706210517', 'description': ''}}	{}


With `print_results` you can even print nested fields using dot notation:

In [14]:
## Let's exclude metadata and print only a few fields; use dot notation for nested fields
projects.print_results(fields='id,name,organism.scientificName,organism.assembly',metadata=False)

## You can change separator
print()
print('With a different separator:\n')
projects.print_results(fields='id,name,organism.scientificName,organism.assembly', separator=',', metadata=False)


#id	name	organism.scientificName	organism.assembly
family	Family Studies GRCh37	Homo sapiens	GRCh37
population	Population Studies GRCh38	Homo sapiens	GRCh38

With a different separator:

#id,name,organism.scientificName,organism.assembly
family,Family Studies GRCh37,Homo sapiens,GRCh37
population,Population Studies GRCh38,Homo sapiens,GRCh38
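
To make the dot notation concrete, here is a small sketch of how a dotted field such as `organism.assembly` can be resolved against a nested dict and joined into a row. `get_nested` and `format_row` are hypothetical helpers written for illustration; they are not the code behind `print_results`, and the `.` placeholder for missing values is an assumption.

```python
def get_nested(record, dotted_field, default="."):
    """Follow a dotted path such as 'organism.assembly' through nested dicts."""
    value = record
    for key in dotted_field.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


def format_row(record, fields, separator="\t"):
    """Build one output row from a comma-separated list of dotted field names."""
    return separator.join(str(get_nested(record, f)) for f in fields.split(","))


project = {
    "id": "family",
    "name": "Family Studies GRCh37",
    "organism": {"scientificName": "Homo sapiens", "commonName": "", "assembly": "GRCh37"},
}
row = format_row(project, "id,name,organism.scientificName,organism.assembly", separator=",")
```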


## Working with Users

In [15]:
# Getting the user client object
user_client = oc.users


In [16]:
## Getting user information
## [NOTE] User needs the query_id string directly --> (user)
user_info = user_client.info(user).get_result(0)

print("user info:")
print("name: {}\towned_projects: {}".format(user_info["name"], len(user_info["projects"])))


user info:
name: OpenCGA Demo User	owned_projects: 0


The demouser does **not** own any projects, but has been granted access to some projects owned by the `demo` user. Let's see how to find this out.

We need to list the user's projects using the **project client** `search()` function.

In [17]:
## Getting user projects
## [NOTE] Client-specific methods take the query_id as a key=value argument (e.g. user=user_id)
project_client = oc.projects
projects_info = project_client.search()

projects_info.print_results(fields='id,name,organism.scientificName,organism.assembly', metadata=False)


#id	name	organism.scientificName	organism.assembly
family	Family Studies GRCh37	Homo sapiens	GRCh37
population	Population Studies GRCh38	Homo sapiens	GRCh38


**User demouser has access to two projects: demo@family and demo@population**

## Working with Projects

NOTE: in OpenCGA, projects and studies have a `fully qualified name (fqn)` with the format [owner]@[project]:[study]

As seen above, you can fetch projects and studies for the logged-in user by executing this:

In [18]:
## Getting all projects from logged in user
project_client = oc.projects
projects = project_client.search()

for project in projects.get_results():
    print("Name: {}\tFQN: {}".format(project["name"], project["fqn"]))

Name: Family Studies GRCh37	FQN: demo@family
Name: Population Studies GRCh38	FQN: demo@population
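
The FQN format [owner]@[project]:[study] can be split into its parts with plain string handling. The sketch below is only an illustration of the format; `parse_fqn` is a hypothetical helper, not a pyopencga function, and it treats the study part as optional because project FQNs such as `demo@family` have none.

```python
def parse_fqn(fqn):
    """Split '[owner]@[project]:[study]' into (owner, project, study or None)."""
    owner, at_sign, rest = fqn.partition("@")
    if not at_sign or not owner or not rest:
        raise ValueError("Expected [owner]@[project]:[study], got: {!r}".format(fqn))
    project, _, study = rest.partition(":")
    if not project:
        raise ValueError("Missing project in FQN: {!r}".format(fqn))
    return owner, project, study or None


project_parts = parse_fqn("demo@family")
study_parts = parse_fqn("demo@family:platinum")
```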


All OpenCGA REST web services accept many parameters to filter results: 

In [19]:
## Getting information from a specific project
project_id = 'family'

## you can also use the following command:
## projects = project_client.search(id=project_id)

projects = project_client.info(project_id)
project = projects.get_result(0)

# Print the studies in the project
for study in project['studies']:
    print("project:{}\t study:{}".format(project_id, study['id']))

project:family	 study:platinum
project:family	 study:corpasome


Fetch studies for a given project:

In [20]:
## Fetching the studies from a project using the studies method
# studies = project_client.studies(project_id)
# for study in studies.get_results():
#     pprint(study)

## Working with Samples

To fetch samples you need to use the sample client built in pyopencga:

In [21]:
## Let's print samples from platinum using project:study notation
study_id = 'family:platinum'
samples = oc.samples.search(study=study_id, count=True, limit=2)  ## other params, e.g. id='NA12880,NA12881'
samples.print_results()

#Time: 73
#Num matches: 17
#Num results: 2
#Num inserted: 0
#Num updated: 0
#Num deleted: 0
#id	annotationSets	uuid	release	version	creationDate	modificationDate	description	somatic	phenotypes	individualId	fileIds	status	internal	attributes
NA12877	.	eba106b2-0172-0004-0001-0090f938ae01	1	1	20200625131818	20201002112834		False	.	NA12877	data:platinum-genomes-vcf-NA12877_S1.genome.vcf.gz	{'name': '', 'description': '', 'date': ''}	{'status': {'name': 'READY', 'date': '20200625131818', 'description': ''}}	{}
NA12878	.	eba10c89-0172-0004-0001-8c90462fc396	1	1	20200625131819	20201002113649		False	.	NA12878	data:platinum-genomes-vcf-NA12878_S1.genome.vcf.gz	{'name': '', 'description': '', 'date': ''}	{'status': {'name': 'READY', 'date': '20200625131819', 'description': ''}}	{}
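
The query above returns 2 of the 17 matching samples because of `limit=2`. One way to walk through all matches is to request consecutive pages. The sketch below shows the paging logic against a fake in-memory fetch function; it assumes the server accepts a `skip` parameter alongside `limit`, which is not shown in this notebook and should be checked in the swagger documentation. `fetch_all` and `fake_fetch_page` are hypothetical names.

```python
def fetch_all(fetch_page, page_size):
    """Yield items page by page until the source returns a short or empty page."""
    skip = 0
    while True:
        page = fetch_page(limit=page_size, skip=skip)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break
        skip += page_size


# Stand-in for the server: 17 fake sample IDs, matching "Num matches: 17" above
FAKE_SAMPLES = ["S{:02d}".format(i) for i in range(1, 18)]


def fake_fetch_page(limit, skip):
    """Return one slice of the fake sample list, like a paged query would."""
    return FAKE_SAMPLES[skip:skip + limit]


all_samples = list(fetch_all(fake_fetch_page, page_size=5))

# Hypothetical real usage, assuming `skip` is supported by the samples endpoint:
# fetch = lambda limit, skip: oc.samples.search(study=study_id, limit=limit, skip=skip).get_results()
```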
