# pyopencga Basic User Usage

------


**[NOTE]** The server methods used by pyopencga client are defined in the following swagger URL:
- http://bioinfo.hpc.cam.ac.uk/opencga-demo/webservices


For tutorials and more information about accessing the OpenCGA REST API, please read the documentation at http://docs.opencb.org/display/opencga/Python

## Loading pyOpenCGA

You have two main options:

a) From source code: If you need to import from the source code, remember that Python 3 does not support implicit relative imports, so you need to append the module path to `sys.path`.

b) Installed pyopencga (recommended): You can import pyopencga directly (skip the next section) if you have installed pyopencga with `pip install pyopencga`.
 

#### Preparing environment for importing from source

In [21]:
# Initialize PYTHONPATH for pyopencga
import sys
import os
from pprint import pprint

cwd = os.getcwd()
print("current_dir: ...."+cwd[-10:])

base_modules_dir = os.path.dirname(cwd)
print("base_modules_dir: ...."+base_modules_dir[-10:])

sys.path.append(base_modules_dir)


current_dir: ..../notebooks
base_modules_dir: ....ain/python



#### Importing pyopencga Library

This is the recommended way of using *pyopencga* 

In [46]:
from pyopencga.opencga_config import ClientConfiguration
from pyopencga.opencga_client import OpencgaClient
from pprint import pprint
import json



## Creating some useful functions to manage the results

In [28]:
def get_not_private_methods(client):
    all_methods = dir(client)
    
    # show all methods (except the ones starting with "_", as they are private for the API)
    methods = [method for method in all_methods if not method.startswith("_")]
    return methods
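
As a quick check of what this helper returns, the sketch below applies it to a small stand-in class. `DummyClient` and its members are hypothetical and exist only for this illustration; a real pyopencga client will expose a different set of methods.

```python
def get_not_private_methods(client):
    """Return the names of all attributes that do not start with an underscore."""
    return [name for name in dir(client) if not name.startswith("_")]


class DummyClient:
    """Hypothetical stand-in for a client object (not part of pyopencga)."""

    def search(self):
        return []

    def info(self, query_id):
        return {"id": query_id}

    def _private_helper(self):
        return None


# dir() returns names in alphabetical order; underscore-prefixed names are dropped
public_names = get_not_private_methods(DummyClient())
print(public_names)
```

Because the filter only looks at the leading underscore, it lists attributes as well as methods, which is why entries such as `token` or `session_id` can show up when it is applied to a real client.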

## Setup client and login




**Configuration and Credentials** 

You need to provide a server URL in the standard OpenCGA configuration format, either as a dict or in a JSON file.

Regarding credentials, if you don't pass the password, you will be asked for it interactively, without echo.


In [23]:
# server host
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'

# user credentials
user = "demouser"
passwd = "demouser" ## you can skip this, see below.

# the demo user accesses projects owned by user opencga
prj_owner = "demo"


#### Creating the ClientConfiguration dict for the server connection

In [24]:
# Creating ClientConfiguration dict
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'

config_dict = {"rest": {
                       "host": host 
                    }
               }

print("Config information:\n",config_dict)


Config information:
 {'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'}}
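
Since the configuration can also live in a JSON file, here is a minimal sketch of producing such a file from the same dictionary using only the standard library. The file name `client-configuration.json` is an assumption chosen for illustration, and how the file is then handed to `ClientConfiguration` should be checked against the pyopencga documentation.

```python
import json
import os
import tempfile

host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'
config_dict = {"rest": {"host": host}}

# Hypothetical file name, written to a temporary directory for this example
config_path = os.path.join(tempfile.mkdtemp(), "client-configuration.json")

with open(config_path, "w") as handle:
    json.dump(config_dict, handle, indent=2)

# Read the file back to confirm it round-trips to the same dictionary
with open(config_path) as handle:
    loaded_config = json.load(handle)

print(loaded_config == config_dict)
```

Keeping the configuration in a file can be convenient when several notebooks need to point at the same server.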


#### Initialize the client configuration

You can pass a dictionary to the ClientConfiguration 


In [25]:
config = ClientConfiguration(config_dict)
oc = OpencgaClient(config)


#### Make the login

In [11]:
# here we put only the user in order to be asked for the password interactively
# oc.login(user)

In [59]:
# or you can pass the user and passwd
oc.login(user, passwd)

**You are now connected to OpenCGA**

## Working with RestResponse results

All OpenCGA client libraries, including pyopencga, implement a **RestResponse** wrapper object to make it even easier to work with REST web service responses. REST responses include metadata, and OpenCGA 2.0.0 has been designed to work in a federation mode; together these can make it a bit difficult to start working with the responses. Please read this brief documentation: http://docs.opencb.org/display/opencga/RESTful+Web+Services#RESTfulWebServices-OpenCGA2.x

Let's see a quick example of how to use the RestResponse wrapper in pyopencga. You can get some extra information here: http://docs.opencb.org/display/opencga/Python#Python-WorkingwiththeRestResponse. Let's execute a first simple query to fetch all projects for the user **demouser** (already logged in):

In [70]:
## Let's fetch the available projects.
## First let's get the project client and execute the search() function
project_client = oc.projects
projects = project_client.search()

## Uncomment this line to view the JSON response.
## NOTE: it includes study information so this can be big
##pprint(projects.get_responses())

Although you can iterate through all the different projects with the following loop, this is **not** the recommended way.

In [67]:
## Loop through all different projects
for project in projects.get_responses()[0]['results']:
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38
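
To make the nesting in the previous loop concrete, the sketch below builds a plain dictionary shaped like the part of the response that the loop touches (a `responses` list whose entries each hold a `results` list) and flattens it by hand. The dictionary is a hand-written stand-in, not actual server output, and `collect_results` is a hypothetical helper rather than a pyopencga function; it only illustrates the kind of bookkeeping that the RestResponse wrapper is meant to save you from.

```python
# Hand-written stand-in for a REST response; only the keys used in this
# notebook are included, and the values are copied from the output above.
mock_response = {
    "responses": [
        {
            "results": [
                {"id": "family", "name": "Family Studies GRCh37"},
                {"id": "population", "name": "Population Studies GRCh38"},
            ]
        }
    ]
}


def collect_results(response):
    """Hypothetical helper: gather the results of every entry in 'responses'."""
    collected = []
    for entry in response["responses"]:
        collected.extend(entry["results"])
    return collected


project_ids = [project["id"] for project in collect_results(mock_response)]
print(project_ids)
```

Indexing `responses[0]` directly, as in the loop above, would silently skip any further entries in the `responses` list, which is one reason to avoid it.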


### Using RestResponse object makes things much easier!

You can use `get_results` function to iterate over all results, in this case the projects:

In [73]:
## Loop through all different projects
for project in projects.get_results():
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38


You can also iterate over the results; this is especially useful when fetching many results from the server:

In [74]:
## Iterate through all different projects
for project in projects.result_iterator():
   print(project['id'], project['name'])

family Family Studies GRCh37
population Population Studies GRCh38


**IMPORTANT**: The RestResponse object implements a very powerful custom function to print results :-)

In [79]:
## This function iterates over all the results, it can be configured to exclude metadata, change separator or even select the fields!
projects.print_results()

#Time: 61
#Num matches: -1
#Num results: 2
#Num inserted: 0
#Num updated: 0
#Num deleted: 0
#id	name	uuid	fqn	creationDate	modificationDate	description	organism	currentRelease	studies	internal	attributes
family	Family Studies GRCh37	eba0e1c7-0172-0001-0001-c7af712652b2	demo@family	20200625131808	20200625131808		{'scientificName': 'Homo sapiens', 'commonName': '', 'assembly': 'GRCh37'}	1	.	{'datastores': {}, 'status': {'name': 'READY', 'date': '20200625131808', 'description': ''}}	{}
population	Population Studies GRCh38	25f2842a-0173-0001-0001-e7bcbedc77ff	demo@population	20200706210517	20200706210517	Some population reference studies for GRCh38	{'scientificName': 'Homo sapiens', 'commonName': '', 'assembly': 'GRCh38'}	1	.	{'datastores': {}, 'status': {'name': 'READY', 'date': '20200706210517', 'description': ''}}	{}


With `print_results` you can even print nested fields in an array:

In [88]:
## Let's exclude metadata and print only a few fields; use dot notation for nested fields
## (this uses the `projects` response fetched earlier in this section)
projects.print_results(fields='id,name,organism.scientificName,organism.assembly',metadata=False)

## You can change separator
print()
print('With a different separator:\n')
projects.print_results(fields='id,name,organism.scientificName,organism.assembly', separator=',', metadata=False)


#id	name	organism.scientificName	organism.assembly
family	Family Studies GRCh37	Homo sapiens	GRCh37
population	Population Studies GRCh38	Homo sapiens	GRCh38

With a different separator:

#id,name,organism.scientificName,organism.assembly
family,Family Studies GRCh37,Homo sapiens,GRCh37
population,Population Studies GRCh38,Homo sapiens,GRCh38
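
The dot notation used in `fields` can be pictured as a walk through nested dictionaries. The following is a minimal sketch of that idea, not the library's implementation: `resolve_field` and `format_rows` are hypothetical helpers, the `"."` placeholder for missing values is an assumption, and the sample record is copied from the output above.

```python
def resolve_field(record, dotted_name):
    """Follow a dotted path such as 'organism.assembly' through nested dicts."""
    value = record
    for key in dotted_name.split("."):
        if not isinstance(value, dict) or key not in value:
            return "."  # placeholder for a missing value (an assumption)
        value = value[key]
    return value


def format_rows(records, fields, separator="\t"):
    """Build a header line plus one line per record for the selected fields."""
    names = fields.split(",")
    lines = ["#" + separator.join(names)]
    for record in records:
        lines.append(separator.join(str(resolve_field(record, n)) for n in names))
    return lines


sample = [{
    "id": "family",
    "name": "Family Studies GRCh37",
    "organism": {"scientificName": "Homo sapiens", "assembly": "GRCh37"},
}]

rows = format_rows(sample, "id,name,organism.assembly", separator=",")
for line in rows:
    print(line)
```

Returning a list of lines instead of printing directly keeps the sketch easy to inspect; the real `print_results` writes its output straight to the console.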


## Working with Users

In [13]:
# Listing available methods for the user client object
user_client = oc.users

# showing all methods (except the ones starting with "_", as they are private for the API)
# get_not_private_methods(user_client)

In [27]:
## getting user information
## [NOTE] User needs the query_id string directly --> (user)
uc_info = user_client.info(user).responses[0]['results'][0]

print("user info:")
print("name: {}\towned_projects: {}".format(uc_info["name"], len(uc_info["projects"])))


user info:
name: OpenCGA Demo User	owned_projects: 0


The demo user has **no** projects of its own, but has access to some projects from the `opencga` user.

Let's see how to find it out.

We need to list the user's projects using the **project client** `search()` function.

And remember that OpenCGA REST objects encapsulate the result inside the responses property, so we need to access the first element of the responses array. 


In [38]:
## Getting user projects
## [NOTE] Client specific methods have the query_id as a key:value (i.e (user=user_id)) 
project_client = oc.projects
projects_info = project_client.search().responses[0]["results"]

for project in projects_info:
    print("Name: {}\tfull_id: {}".format(project["name"], project["fqn"]))


projects_info = project_client.search()
for res in projects_info.get_results():
    print(res['id'])

projects_info.print_results(fields='id,name,organism.scientificName,organism.assembly', separator=',', metadata=False)


Name: Family Studies GRCh37	full_id: demo@family
Name: Population Studies GRCh38	full_id: demo@population
family
population
#id,name,organism.scientificName,organism.assembly
family,Family Studies GRCh37,Homo sapiens,GRCh37
population,Population Studies GRCh38,Homo sapiens,GRCh38


**User demo has access to one project called opencga@exomes_grch37**

Note: in OpenCGA, projects and studies have a `fully qualified name (fqn)` with the format [owner]@[project]:[study]
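
The [owner]@[project]:[study] format is simple enough to take apart with plain string operations. A minimal sketch follows; `parse_fqn` is a hypothetical helper written for this notebook, not a pyopencga function, and it assumes the owner and project parts contain neither `@` nor `:`.

```python
def parse_fqn(fqn):
    """Split '[owner]@[project]:[study]' into its parts; study may be absent."""
    owner, _, remainder = fqn.partition("@")
    project, _, study = remainder.partition(":")
    # A project-level fqn has no ':study' suffix, so report the study as None
    return owner, project, study or None


study_parts = parse_fqn("opencga@exomes_grch37:corpasome")
project_parts = parse_fqn("demo@family")
print(study_parts)
print(project_parts)
```

The same function therefore handles both project-level names such as `demo@family` and study-level names such as `opencga@exomes_grch37:corpasome`.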

## Working with Projects

In [32]:
project_client = oc.projects

get_not_private_methods(project_client)

['aggregation_stats',
 'auto_refresh',
 'create',
 'delete',
 'increment_release',
 'info',
 'login_handler',
 'on_retry',
 'search',
 'session_id',
 'studies',
 'token',
 'update']

In [38]:
## Getting all projects from logged in user
project_client = oc.projects
projects_list = project_client.search().responses[0]["results"]

for project in projects_list:
    print("Name: {}\tfull_id: {}".format(project["name"], project["fqn"]))

Name: Exomes GRCh37	full_id: opencga@exomes_grch37


In [56]:
## Getting information from a specific project
project_name = 'exomes_grch37'
project_info = project_client.info(project_name).responses[0]['results'][0]

#show the studies
for study in project_info['studies']:
    print("project:{}\nstudy:{}\ttype:{}".format(project_name, study['name'], study['type'] ))
    print('--')

project:exomes_grch37
study:Corpasome	type:CASE_CONTROL
--
project:exomes_grch37
study:CEPH Trio	type:CASE_CONTROL
--


In [58]:
## Fetching the studies from a project using the studies method
results = project_client.studies(project_name).responses[0]['results']
for result in results:
    pprint(result)

{'attributes': {},
 'cipher': 'none',
 'cohorts': [],
 'creationDate': '20190604154741',
 'dataStores': {},
 'datasets': [],
 'description': '',
 'experiments': [],
 'files': [],
 'fqn': 'opencga@exomes_grch37:corpasome',
 'groups': [{'id': '@members',
             'name': '@members',
             'userIds': ['opencga', 'demo']},
            {'id': '@admins', 'name': '@admins', 'userIds': []}],
 'id': 'corpasome',
 'individuals': [],
 'jobs': [],
 'lastModified': '20190604154741',
 'modificationDate': '20190604154741',
 'name': 'Corpasome',
 'panels': [],
 'permissionRules': {},
 'release': 1,
 'samples': [],
 'size': 0,
 'stats': {},
 'status': {'date': '20190604154741', 'message': '', 'name': 'READY'},
 'type': 'CASE_CONTROL',
 'uri': 'file:///mnt/data/opencga-demo/sessions/users/opencga/projects/1/2/',
 'uuid': 'Iyy1cwFrAAIAAViPXu86gw',
 'variableSets': []}
{'attributes': {},
 'cipher': 'none',
 'cohorts': [],
 'creationDate': '20190617155526',
 'dataStores': {},
 'datasets': [],
 '