# Overview
------
This notebook provides guidance on performing common advanced administrator actions required to maintain an OpenCGA installation, from granting permissions to other users to updating the metadata for samples or individuals in the database.

It is highly recommended to check the documentation about [Sharing and Permissions](https://app.gitbook.com/@opencb/s/opencga/~/drafts/-Magktiifb08PPiPnoYk/manual/data-management/sharing-and-permissions) in OpenCGA before trying to run this notebook.

**[NOTES]** 
- For guidance on how to log in and get started with *opencga*, you can refer to: [pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)<br>

- A good first step when starting to work with OpenCGA is to explore **Catalog**, the OpenCGA component that holds information about our user, the projects and studies our user has permission to access, and the clinical data from the studies. For guidance, you can refer to: [pyopencga_catalog.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)<br>

- The server methods used by *pyopencga* client are defined in the following swagger URL: https://ws.opencb.org/opencga-prod/webservices/


## Table of Contents:

This Notebook is organised in the following sections:

* __Grant/edit permissions to groups or individual users__
* __Assign samples to individuals__
* __Define phenotypes or diseases for individuals__
* __Define, add, edit and remove variable sets__
* __Use Cases__

## Set up the Client and Log in to *pyopencga*

**Configuration and Credentials** 

Let's assume we already have *pyopencga* installed in our Python setup (all the steps are described in [pyopencga_first_steps.ipynb](https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks/user-training)).

You need to provide **at least** a host server URL in the standard configuration format for OpenCGA, either as a Python dictionary or in a JSON file.
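As a rough sketch of the JSON file option, the same configuration can be written to a file and read back. The file name `opencga_config.json` is hypothetical, and passing the file path to `ClientConfiguration` in place of the dictionary is an assumption based on the sentence above, so check it against the pyopencga version you have installed.

```python
import json
import os
import tempfile

# Same structure as the dictionary used in this notebook
config_dict = {'rest': {'host': 'https://ws.opencb.org/opencga-prod'}}

# Hypothetical file name, written to a temporary directory for the example
config_path = os.path.join(tempfile.mkdtemp(), 'opencga_config.json')
with open(config_path, 'w') as handle:
    json.dump(config_dict, handle, indent=2)

# Read the file back to confirm it holds the expected host
with open(config_path) as handle:
    loaded = json.load(handle)
print(loaded['rest']['host'])

# Assumption: the file path can then be used instead of the dictionary, e.g.
# config = ClientConfiguration(config_path)
```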


In [20]:
## Step 1. Import pyopencga dependencies
from pyopencga.opencga_config import ClientConfiguration # import configuration module
from pyopencga.opencga_client import OpencgaClient # import client module
from pprint import pprint
from IPython.display import JSON
# import json
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

## Step 2. OpenCGA host
host = 'https://ws.opencb.org/opencga-prod'

## Step 3. Create the ClientConfiguration dict
config_dict = {'rest': {
                       'host': host 
                    }
               }

## Step 4. Create the ClientConfiguration and OpenCGA client
config = ClientConfiguration(config_dict)
oc = OpencgaClient(config)


For the purpose of the training, we will work with a user that belongs to the `@admins` group. This means the user has admin privileges.

__[NOTE]__ Working with an admin user is required to follow along with the queries used in this notebook.

In [21]:
## Step 5. Define admin user credentials
admin_user = 'demo-admin'
password = 'quBocgIvQ2r83SG'
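Outside a shared demo account, you may prefer not to keep the password in the notebook. The following is a minimal sketch, assuming two hypothetical environment variables, `OPENCGA_USER` and `OPENCGA_PASSWORD` (names chosen for this example, not defined by pyopencga), with an interactive prompt as the fallback.

```python
import os
from getpass import getpass


def read_credentials(env=os.environ, prompt=getpass):
    """Return (user, password), preferring environment variables.

    OPENCGA_USER and OPENCGA_PASSWORD are hypothetical variable names
    chosen for this example; they are not defined by pyopencga.
    """
    user = env.get('OPENCGA_USER', 'demo-admin')
    password = env.get('OPENCGA_PASSWORD')
    if password is None:
        # Fall back to an interactive prompt so the password never
        # appears in the notebook
        password = prompt('Password for {}: '.format(user))
    return user, password


# Example with an explicit mapping instead of the real environment
example_user, example_password = read_credentials(
    env={'OPENCGA_USER': 'demo-admin', 'OPENCGA_PASSWORD': 'not-a-real-password'}
)
print(example_user)
```

The result can be passed to the login call used in this notebook, for example `oc.login(*read_credentials())`.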

In [22]:
## Step 6. Login to OpenCGA using the OpenCGA client 
# Pass the credentials to the client
# (here we put only the user in order to be asked for the password interactively)
#oc.login(admin_user)

# or you can pass the user and passwd
oc.login(admin_user, password)

print('Logged in successfully to {}, your token is: {} well done!'.format(host, oc.token))

Logged in successfully to https://ws.opencb.org/opencga-prod, your token is: eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJkZW1vLWFkbWluIiwiYXVkIjoiT3BlbkNHQSB1c2VycyIsImlhdCI6MTYyMjExOTAwNywiZXhwIjoxNjIyMTIyNjA3fQ.pc5XF6cAR75zq2945rGQfYHRGBwfuL5tL34YjRYDELI well done!
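The token printed above has the three dot-separated segments of a JSON Web Token (JWT). Assuming that is what it is, the middle segment can be decoded to see who the token belongs to and when it expires. This sketch only reads the claims; it does not verify the signature, and the helper name `decode_token_claims` is made up for this example.

```python
import base64
import json
from datetime import datetime, timezone


def decode_token_claims(token):
    """Decode the payload (second segment) of a JWT without verifying it."""
    payload_segment = token.split('.')[1]
    # JWT segments are base64url-encoded with the padding stripped
    padded = payload_segment + '=' * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Token printed by the login cell above (already expired)
token = ('eyJhbGciOiJIUzI1NiJ9.'
         'eyJzdWIiOiJkZW1vLWFkbWluIiwiYXVkIjoiT3BlbkNHQSB1c2VycyIsImlhdCI6MTYyMjExOTAwNywiZXhwIjoxNjIyMTIyNjA3fQ.'
         'pc5XF6cAR75zq2945rGQfYHRGBwfuL5tL34YjRYDELI')

claims = decode_token_claims(token)
issued = datetime.fromtimestamp(claims['iat'], tz=timezone.utc)
expires = datetime.fromtimestamp(claims['exp'], tz=timezone.utc)
print(claims['sub'], issued, expires)
```

For this token the subject is `demo-admin` and the difference between `exp` and `iat` is 3600 seconds, so a long-running session would need to log in again after about an hour.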


## Setup OpenCGA Variables

Once we have defined a variable with the client configuration and credentials, we can access all the methods defined for the client. These methods implement calls to query different data models in *OpenCGA*.

- For this training we have created a test project and study. As an administrator, we assume that you're familiar with the concept of the `fqn`. If not, please check the documentation [here]().


In [7]:
## Define the study we will work with
study = 'demo@training:admin'

## Define the user ids of some non-admin demo users
user1 = 'trainee1'
user2 = 'trainee2'
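The study value used above follows the `fqn` layout. As a small sketch, assuming the `owner@project:study` form that `demo@training:admin` suggests (the helper `parse_study_fqn` is hypothetical and not part of pyopencga), the parts can be separated like this:

```python
def parse_study_fqn(fqn):
    """Split 'owner@project:study' into its three parts."""
    owner, at_sign, rest = fqn.partition('@')
    project, colon, study_id = rest.partition(':')
    if not (at_sign and colon and owner and project and study_id):
        raise ValueError('Expected owner@project:study, got {!r}'.format(fqn))
    return {'owner': owner, 'project': project, 'study': study_id}


parts = parse_study_fqn('demo@training:admin')
print(parts)
```

Here `demo` is the owner, `training` the project and `admin` the study id.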


# Permission Management
------

In OpenCGA, all permissions are established at the study level.

Every study comes with two built-in administrative groups: `@admins` and `@members`.
All the users added to a new study belong to the `@members` group. This is useful for keeping track of the users that have access to that specific study. However, `@members` doesn't have any permission defined by default.

Users belonging to the `@admins` group, on the other hand, have administrator privileges. Admins can add users to a study and grant them permissions. However, an admin can't grant admin privileges to other users (only the `owner` of the study is able to do so).

# 1. Add/remove users from a Study

## Add users

First, we're going to add the users `trainee1` and `trainee2` to the study. Internally, this means that we are adding those users to the `@members` group of the study.

In [14]:
## Add users to study
oc.studies.update_users(study=study, group='members', data={"users": ["trainee1","trainee2"]})

<pyopencga.rest_response.RestResponse at 0x7f996ef562e0>

## Remove users

Conversely, we can remove the users `trainee1` and `trainee2` from the study. Internally, this means that we are removing those users from the `@members` group of the study.

In [18]:
## Delete users from study
oc.studies.update_users(study=study, group='members', action='REMOVE', data={"users": ["trainee1","trainee2"]})

<pyopencga.rest_response.RestResponse at 0x7f996f10bc70>
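The two calls above differ only in the `action` argument, so they can be wrapped in one helper. The sketch below mirrors those calls exactly; the helper name `update_study_members` and the recording stand-in classes are hypothetical and exist only so the example runs without a server. With a real session you would pass `oc` as the client.

```python
def update_study_members(client, study, users, remove=False):
    """Add users to (or remove them from) the @members group of a study."""
    options = {'study': study, 'group': 'members', 'data': {'users': list(users)}}
    if remove:
        # Same argument as the removal cell above; adding relies on the default
        options['action'] = 'REMOVE'
    return client.studies.update_users(**options)


class RecordingStudies:
    """Stand-in for oc.studies that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def update_users(self, **options):
        self.calls.append(options)
        return options


class RecordingClient:
    """Stand-in for the OpenCGA client, exposing only a 'studies' attribute."""

    def __init__(self):
        self.studies = RecordingStudies()


demo_client = RecordingClient()
update_study_members(demo_client, 'demo@training:admin', ['trainee1', 'trainee2'])
update_study_members(demo_client, 'demo@training:admin', ['trainee1'], remove=True)
for call in demo_client.studies.calls:
    print(call)
```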

## Check users in the Study

In any case, we can always check the current status of the `@members` group (i.e. check which users have access to the study), as well as that of any other group defined for the study.

### Hands-on exercise: 
Run the next cell after adding users `trainee1` and `trainee2`. Then remove them from the `@members` group and run it again.

Do you notice any difference in the output?

In [27]:
## Check Study groups
groups = oc.studies.groups(study=study)
groups.print_results(fields='', metadata=False, title='Groups in Study {}'.format(study))


Groups in Study demo@training:admin
----------------------------------------
#id	userIds
@members	pfurio,wspooner,demo-admin,demo,jcoll,llopez,imedina
@admins	pfurio,wspooner,demo-admin,jcoll,llopez,imedina
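To see at a glance which users have access to the study without being administrators, the two groups can be compared. This is a minimal sketch that works on plain dictionaries with the `id` and `userIds` fields shown in the output above; obtaining such records from the response (for instance through a results accessor on the `groups` object) is an assumption to verify against your pyopencga version.

```python
def non_admin_members(groups):
    """Return users that are in @members but not in @admins, keeping order."""
    by_id = {group['id']: group['userIds'] for group in groups}
    admins = set(by_id.get('@admins', []))
    return [user for user in by_id.get('@members', []) if user not in admins]


# Group records copied from the output above
example_groups = [
    {'id': '@members',
     'userIds': ['pfurio', 'wspooner', 'demo-admin', 'demo', 'jcoll', 'llopez', 'imedina']},
    {'id': '@admins',
     'userIds': ['pfurio', 'wspooner', 'demo-admin', 'jcoll', 'llopez', 'imedina']},
]
regular_users = non_admin_members(example_groups)
print(regular_users)
```

With the groups listed above, only `demo` is a member without admin privileges.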


# Assign samples to individuals
-------


OpenCGA Catalog allows you to create sample entities. Then, you might need to create an individual entity associated with each sample.

**[Important Note:]** When VCFs are ingested into OpenCGA, the pipeline reads the header of each VCF. If sample names are present in the header, the samples contained in the VCF are created automatically.
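For reference, sample names in a VCF live on the `#CHROM` header line, in the columns that follow `FORMAT`. The sketch below illustrates that file convention on a tiny in-memory example; it is not OpenCGA's loader, and the helper name is made up for this notebook.

```python
def sample_names_from_vcf_header(lines):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith('#CHROM'):
            columns = line.rstrip('\n').split('\t')
            # The first nine columns are fixed (CHROM ... FORMAT);
            # any remaining columns are sample names
            return columns[9:]
        if not line.startswith('#'):
            break  # reached the data lines without finding the header line
    return []


example_vcf = [
    '##fileformat=VCFv4.2\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2.1\n',
    '1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n',
]
vcf_samples = sample_names_from_vcf_header(example_vcf)
print(vcf_samples)
```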

## 1. Create samples

First, let's create 3 samples: `sample1`, `sample2.1` and `sample2.2`. The most straightforward way to create samples is using a loop, as shown in the next cell:

In [39]:
## Create 3 samples
sample_ids = ['sample1', 'sample2.1', 'sample2.2']

for sample in sample_ids:
    sample_data = {
      "id": sample,
      "description": "germinal sample",
      "processing": {
        "preparationMethod": "Illumina",
        "extractionMethod": "Paraffin Embedded"
      },
      "collection": {
        "tissue": "skin",
        "organ": "skin",
        "method": "biopsy"
      },
      "somatic": False  # boolean rather than the string 'false'
    }

    oc.samples.create(study=study, data=sample_data)
    # Use a separate name for the response so the loop variable keeps the sample id
    sample_response = oc.samples.search(study=study, id=sample)
    sample_response.print_results(fields='id,collection,description,somatic', metadata=False, title='Info about sample {}'.format(sample))
    print('\n')


Info about sample sample1
------------------------------
#id	collection	description	somatic
sample1	{'tissue': 'skin', 'organ': 'skin', 'method': 'biopsy'}	germinal sample	False


Info about sample sample2.1
--------------------------------
#id	collection	description	somatic
sample2.1	{'tissue': 'skin', 'organ': 'skin', 'method': 'biopsy'}	germinal sample	False


Info about sample sample2.2
--------------------------------
#id	collection	description	somatic
sample2.2	{'tissue': 'skin', 'organ': 'skin', 'method': 'biopsy'}	germinal sample	False




In [38]:
## Optional: delete the samples (skip this cell if you want to continue with the next section)
oc.samples.delete(study=study, samples='sample1,sample2.1,sample2.2')

<pyopencga.rest_response.RestResponse at 0x7f996f080ee0>

## 2. Create individuals associated with the samples

Remember that the OpenCGA `individual` data model allows multiple samples to be assigned to the same individual. A typical use case can be found in cancer genetic screening, where two samples are usually taken from the same individual: one from the tumour (somatic sample) and one germinal.

Bearing that in mind, we can create some individual entities in the database. In this case, let's assume that `sample1` corresponds to `individual1`, whilst `sample2.1` and `sample2.2` correspond to `individual2`.

In [None]:
## Create Individuals


# Define phenotypes or diseases for individuals
------

# Define, add, edit and remove variable sets
----------

# Use Cases
------