# *pyopencga* Catalog: Clinical Data and Other Metadata

------
**[NOTE]** The server methods used by the *pyopencga* client are defined at the following Swagger URL:
- http://bioinfodev.hpc.cam.ac.uk/opencga-test/webservices

**[NOTE]** The currently implemented methods are listed in the following spreadsheet:
- https://docs.google.com/spreadsheets/d/1QpU9yl3UTneqwRqFX_WAqCiCfZBk5eU-4E3K-WVvuoc/edit?usp=sharing

This notebook provides guidance on querying an OpenCGA server through *pyopencga* to explore the studies the user has access to, the clinical data provided in a study (samples, individuals, genotypes, etc.) and other types of metadata, such as permissions.

A good first step when starting to work with OpenCGA is to retrieve information about our user: which projects and studies are we allowed to see? 
Then, before exploring the variants, it is good practice to get a feel for the clinical data in the study: How many samples and individuals does the study have? Are there any defined cohorts? Can we get some statistics about the genotypes of the samples in the study?

The use cases addressed here constitute a high-level introduction intended to give users a basis for their own explorations. This example can be adapted to each individual use case.

For guidance on how to log in and get started with *opencga*, you can refer to: ---ADD NOTEBOOK-001---
 

## 1. Set up the Client and Log in to *pyopencga*

**Configuration and Credentials** 

Let's assume *pyopencga* is already installed in our Python setup (all the steps are described in [001-pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)).

You need to provide **at least** a host server URL in the standard configuration format for OpenCGA, as a Python dictionary or in a JSON file.
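
As a minimal sketch of the JSON-file option, the snippet below writes the same dictionary structure used in this notebook to a file and reads it back with the standard `json` module. The file name `opencga_client_config.json` is hypothetical, and loading the file into a dictionary first is just one possible approach; the loaded dictionary can then be passed to `ClientConfiguration` as in the cell further down.

```python
import json
import os
import tempfile

# Same structure as the configuration dictionary used in this notebook.
config_dict = {"rest": {"host": "http://bioinfo.hpc.cam.ac.uk/opencga-prod"}}

# Write the configuration to a JSON file (hypothetical file name, temporary folder).
config_path = os.path.join(tempfile.mkdtemp(), "opencga_client_config.json")
with open(config_path, "w") as handle:
    json.dump(config_dict, handle, indent=2)

# Read the file back into a plain Python dictionary.
with open(config_path) as handle:
    loaded_config = json.load(handle)

print(loaded_config["rest"]["host"])
```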

Regarding credentials, you can set both user and password as two variables in the script.<br>If you prefer not to show the password, it will be requested interactively without echo.
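
To illustrate what a prompt "without echo" means, here is a small sketch based on the standard `getpass` module. The helper `ask_password` is hypothetical and is not part of *pyopencga*; whether the client uses `getpass` internally is an assumption.

```python
import getpass


def ask_password(user, prompt=getpass.getpass):
    """Return the password for `user`, read without echoing the typed characters.

    `prompt` defaults to getpass.getpass; it can be swapped for another
    callable, which is convenient when no terminal is available.
    """
    return prompt("Password for {}: ".format(user))


# Interactive use (left commented out so that the cell does not block):
# passwd = ask_password("demouser")
```

With this pattern the password never appears in the notebook source or in its saved output.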

In [1]:
from pyopencga.opencga_config import ClientConfiguration # import configuration module
from pyopencga.opencga_client import OpencgaClient # import client module
from pprint import pprint
import json

# Server host
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'

# User credentials
user = "demouser"
passwd = "demouser" ## you can skip this, see below.

# The user "demouser" accesses projects owned by the user "demo"
prj_owner = "demo"

# Creating ClientConfiguration dict

config_dict = {"rest": {
                       "host": host 
                    }
               }
print("Config information:\n",config_dict)

# Pass the config_dict dictionary to the ClientConfiguration method
config = ClientConfiguration(config_dict)

# Create the client
oc = OpencgaClient(config)

# Pass the credentials to the client

# (here we put only the user in order to be asked for the password interactively)
# oc.login(user)

# or you can pass the user and passwd
oc.login(user, passwd)



Config information:
 {'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'}}


## 2. Working with our User: see permissions for Projects and studies

### 2.1 General user information

First, we can get the list of available methods for the user client object:

In [12]:
# Get the user client object (inspect its methods with help(user_client) or dir(user_client))
user_client = oc.users
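
One generic way to list the methods of any Python object is to filter the output of `dir()`. The sketch below uses a hypothetical stand-in class, `FakeUserClient`, so that it runs without a server; in this notebook you would pass `oc.users` instead. The method names of the stand-in are illustrative only and do not describe the real client API.

```python
def public_methods(obj):
    """Return the sorted names of the callable, non-private attributes of obj."""
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


class FakeUserClient:
    """Hypothetical stand-in for a client object such as oc.users."""

    def info(self, user):
        ...

    def hypothetical_method(self):
        ...

    def _private_helper(self):
        ...


print(public_methods(FakeUserClient()))
```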



Depending on the permissions granted, a user can be the owner of a study or just have access to some studies owned by other users.<br>We can retrieve information about our user and its permissions with:

In [13]:
## Getting user information
## [NOTE] The user client takes the query_id string directly --> (user)
user_info = user_client.info(user).get_result(0)

print("user info:")
print("name: {}\towned_projects: {}".format(user_info["name"], len(user_info["projects"])))

### TODO: also show the creation date and other user information


user info:
name: OpenCGA Demo User	owned_projects: 0


We can see that our user (demouser) has **no** projects of its own, but has been granted access to some projects owned by the `demo` user. Let's see how to find this out.

### 2.2 Exploring the projects for our user

We can list our user's projects using the **project client** `search()` function.

In [16]:
## Getting user projects
## [NOTE] Client-specific methods take the query_id as a key=value argument (e.g. user=user_id)
project_client = oc.projects
projects_info = project_client.search()

projects_info.print_results(fields='id,name,organism.scientificName,organism.assembly', metadata=False)

#id	name	organism.scientificName	organism.assembly
family	Family Studies GRCh37	Homo sapiens	GRCh37
population	Population Studies GRCh38	Homo sapiens	GRCh38


Our user (demouser) has access to 2 different projects:
- Project: **family**
- Project: **population**
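
The `fields` argument above uses dotted names such as `organism.scientificName` to reach nested values. A minimal sketch of how such a dotted path can be resolved is shown here; the nested layout of the `project` dictionary is an assumption inferred from the field names and the printed output, not a verified copy of the server response, and `get_field` is a hypothetical helper.

```python
def get_field(record, dotted_path):
    """Follow a dotted path (e.g. 'organism.assembly') through nested dictionaries.

    Returns None when any key along the path is missing.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Assumed shape of one project record, using the values printed above.
project = {
    "id": "family",
    "name": "Family Studies GRCh37",
    "organism": {"scientificName": "Homo sapiens", "assembly": "GRCh37"},
}

fields = "id,name,organism.scientificName,organism.assembly".split(",")
row = [str(get_field(project, field)) for field in fields]
print("\t".join(row))
```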

### 2.3 Exploring the Studies for our user

All user permissions in OpenCGA are established at the study level. One project may contain several studies.<br>
It is also very important to understand that in OpenCGA, projects and studies have a fully qualified name (**fqn**) with the format `[owner]@[project]:[study]`.
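
The fqn format can be handled with plain string operations. The helpers `parse_fqn` and `build_fqn` below are hypothetical (they are not part of *pyopencga*), and the study name `example_study` is made up for illustration; only the owner `demo` and the project `family` come from this notebook.

```python
from typing import NamedTuple


class StudyFqn(NamedTuple):
    owner: str
    project: str
    study: str


def parse_fqn(fqn):
    """Split '[owner]@[project]:[study]' into its three parts."""
    owner, at_sign, rest = fqn.partition("@")
    project, colon, study = rest.partition(":")
    if not (at_sign and colon and owner and project and study):
        raise ValueError("Expected [owner]@[project]:[study], got: {!r}".format(fqn))
    return StudyFqn(owner, project, study)


def build_fqn(owner, project, study):
    """Assemble a fully qualified study name from its parts."""
    return "{}@{}:{}".format(owner, project, study)


# 'example_study' is a hypothetical study name.
example = build_fqn("demo", "family", "example_study")
print(example)
print(parse_fqn(example))
```

Keeping the three parts separate makes it easy to tell which user owns a study we have merely been granted access to.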