**Title**: Uploading and Editing Metadata with Flywheel SDK

**Date**:  July 23rd 2020

**Description**:  

Topics that we will be covering in this webinar:
1. Creating the Project to host our data.
2. Creating the hierarchy of Subject/Session/Acquisition matching our data input.
3. Uploading the DICOM archive to each Acquisition.
4. Showing how to update metadata of a container.


<div class="alert alert-block alert-danger"><b>DISCLAIMER:</b> This notebook assumes that you are running Flywheel version 12 or later.</div>

***

# Setup Before Webinar 

1. Requirements 
2. Download some test data

## 1. Requirements

Before getting started, we want to make sure that you have the right permissions to create a new project on your instance. Below, we will call the `check_user_permission` function to validate whether you meet the `min_reqs` defined further down.

## Install and import dependencies

In [None]:
# Install specific packages required for this notebook
!pip install flywheel-sdk 

In [None]:
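from getpass import getpass # To prompt for the API key without echoing it
import os # To read the FW_KEY environment variable as a fallback
import flywheel
from permission import check_user_permission # To check user permission on Flywheel Instance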

## Flywheel API Key and Client

To find your API key, first click on the circle in the top right corner of Flywheel, then click on `Profile` as shown below: 

![profile_location](https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/finding_things_in_fw/assets/profile_location.png)

On your profile page, your API key will be located under the `Your API Key` as below:

![api_key_location](https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/finding_things_in_fw/assets/api_key_location.png)

Copy this key, run the cell below, paste the key into the `Enter API_KEY here:` box and press return. Note: you must paste the full contents of the `Your API Key` field, including the site address before the colon. For example, `ss.ce.flywheel.io:123456ABCDEF789zyxw`.

<div class="alert alert-block alert-info" style="color:black"><b>TIP: </b> While you can initialize your client as <code>fw</code> with <code>fw = flywheel.Client('your-api-key')</code>, it is essential to keep credentials out of your code, especially when sharing it with others or committing to a shared repository.</div>

<div class="alert alert-block alert-danger"><b>WARNING:</b> Do NOT share your API key with anyone for any reason - it is the same as sharing your password and constitutes a HIPAA violation.</div>
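
One way to keep the key out of the notebook is to read it from an environment variable and only prompt when it is missing. The sketch below illustrates that idea under a few assumptions: `resolve_api_key` is a hypothetical helper (it is not part of the Flywheel SDK), the `FW_KEY` variable name simply mirrors the fallback used in the next cell, and the `<site address>:<secret>` shape check is inferred from the example key shown above.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="FW_KEY", prompt="Enter API_KEY here: "):
    """Return the API key from the environment, prompting only if it is unset.

    Hypothetical helper for illustration; not part of the Flywheel SDK.
    """
    key = os.environ.get(env_var) or getpass(prompt)
    key = key.strip()
    # Assumption: keys look like "<site address>:<secret>", as in the example above.
    host, sep, secret = key.partition(":")
    if not (host and sep and secret):
        raise ValueError("API key should look like '<site address>:<secret>'")
    return key
```

With `FW_KEY` exported in your shell, `flywheel.Client(resolve_api_key())` would then run without showing a prompt.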


In [None]:
# Password prompt (good security practice)
API_KEY = getpass('Enter API_KEY here: ')

# Initialize the client
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))

# Clean up the API_KEY
del API_KEY

## Check Permission

In [None]:
# Minimum requirements that you will need to create a new project
min_reqs = {
"site": "user",
"group": "admin"
}

Next, provide the Group ID that you plan to use for this tutorial. The screenshot below shows where to find the Group ID on the Flywheel Instance.

![find-group-id](https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/find-group-id.png)


In [None]:
CHECK_GROUP_ID = input('Please enter the Group ID that you will be using to create the new project: ')

If you have the right permissions on the Site and Group containers, the function below will return `True`; otherwise, it will display a message listing the compatible Group(s) for which you have the right permissions.


Here is an example of what the message looks like.

![compatible-list](https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/expected-output-compatible-list.png)


In [None]:
check_user_permission(fw, min_reqs, group = CHECK_GROUP_ID)

## 2. Download some test data

First, we will be uploading images to a Flywheel Instance.  
To get started, you first need to download the test dataset that will be used throughout this webinar.

On mybinder.org or any Mac/Linux system, the following commands will download a zip archive and unzip the data into a folder called `data-upload-notebook` in your current directory:

In [None]:
!curl -L -o data.zip "https://drive.google.com/a/umn.edu/uc?export=download&id=1UhGymg0UgoKdigGEmHbN3EcWbG5KGQ1Q"
!unzip -qo data.zip -d data-upload-notebook

If the previous commands return an error, download the file directly using the link passed to the `curl` command
above and extract the archive into a folder named `data-upload-notebook` in the current working directory.
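
If the `unzip` command is not available on your system, the extraction step can also be done from Python with the standard-library `zipfile` module. This is only a sketch: `extract_archive` is a hypothetical helper name, and it assumes `data.zip` has already been downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="data-upload-notebook"):
    """Extract every member of zip_path into dest and return dest as a Path.

    Hypothetical helper for illustration.
    """
    dest = Path(dest)
    # Create the target folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```

After a manual download, `extract_archive('data.zip')` would unpack the archive into `data-upload-notebook` in the current working directory.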

The file tree of `data-upload-notebook` should look like this:
```
data-upload-notebook
├── anx_s1
│   └── anx_s1_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_0
│           └── 6879_3_1_t1.dcm.zip
├── anx_s2
│   └── anx_s2_anx_ses1_protA
│       └── T1\ high-res\ inplane\ FSPGR\ BRAVO_0
│           └── 4784_3_1_t1.dcm.zip
├── anx_s3
│   └── anx_s3_anx_ses1_protA
│       ├── T1_high-res_inplane_Ret_knk_0
│       │   └── 6879_3_1_t1.dcm.zip
│       └── fMRI\ Loc\ Word\ Face\ Obj
│           └── 4784_5_1_fmri.dcm.zip
├── anx_s4
│   └── anx_s4_anx_ses2_protB
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
├── anx_s5
│   └── anx_s5_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
└── participants.csv

```

# SETUP COMPLETE!

***

<div class="alert alert-block alert-warning" style="color:black"><b>NOTES:</b> Run the <code>Install and Import Dependencies</code> and <code>Flywheel API Key and Client</code> sections at the beginning of the webinar.</div>

# Install and Import Dependencies

In [None]:
# Install specific packages required for this notebook
!pip install flywheel-sdk pandas

In [None]:
# Import packages
from getpass import getpass # Handle sensitive information securely eg: API key or password
import logging # To write status messages
import os # To interact with the operating system
from pathlib import Path # Object-oriented access to files on your local machine
import re # Regular expressions
import time # To pause between polling attempts
import pandas as pd # To read and query the CSV metadata
import pprint # To pretty-print nested dictionaries

import flywheel
from permission import check_user_permission # To check user permission on Flywheel Instance

In [None]:
# Instantiate a logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')

# Flywheel API Key and Client

Get your API_KEY. More on this in the Flywheel SDK documentation [here](https://flywheel-io.gitlab.io/product/backend/sdk/branches/master/python/getting_started.html#api-key).

In [None]:
API_KEY = getpass('Enter API_KEY here: ')

Instantiate the Flywheel API client

In [None]:
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))

Show Flywheel logging information

In [None]:
log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

***

# *The webinar will begin from here*

# Initialize a few values

Now, we will be uploading data to a Project. The label of the Project will be defined by the `PROJECT_LABEL` variable defined below. 
Here we set it up to be `AnxietyStudy01` but feel free to change it to something that makes more sense to you. 

In [None]:
PROJECT_LABEL = 'AnxietyStudy01'

In Flywheel, each Project belongs to a Group. The ID of the Group in which the Project will be created is defined by the `GROUP_ID` variable below.


Specify the Group you have admin permission on and where the Project will be created:

In [None]:
GROUP_ID = input("Enter the Group ID here: ")

We also define a variable that points to the root directory where the data was downloaded.<br>If you followed the steps above to download your data, you should have all the data in a folder called `data-upload-notebook`. If that's not the case, edit the variable below accordingly.

In [None]:
# Define Path to your data with Path()
PATH_TO_DATA = Path('data-upload-notebook')

# Add a New Project

In this section, we will create a new Project with the label `PROJECT_LABEL` in the Group identified by `GROUP_ID`.


First, we will be getting the Group container using the `fw.lookup()` method. 

In [None]:
my_group = fw.lookup(GROUP_ID)

Before creating a new project, it is a good practice to check if the Project you are trying to create exists in the Flywheel instance or not. We can do this by checking if a Project with label=PROJECT_LABEL exists in the Group you have specified:

In [None]:
project = my_group.projects.find_first(f'label={PROJECT_LABEL}')

In [None]:
while project:
    log.info(f'Project {GROUP_ID}/{PROJECT_LABEL} already exists. Please update your PROJECT_LABEL variable.')
    PROJECT_LABEL = input('Please enter a new label for your new project: ')
    project = my_group.projects.find_first(f'label={PROJECT_LABEL}')

    
log.info(f'Project {GROUP_ID}/{PROJECT_LABEL} does not exist. Looks good.') 


If no matching Project exists, `find_first()` returns `None` and we can create it.

In [None]:
if not project:
    project = my_group.add_project(label=PROJECT_LABEL)
    log.info(f'Project {PROJECT_LABEL} was successfully added to the group {GROUP_ID}.')

***

# Create Subjects, Sessions and Acquisitions and upload files

Now that we have a Project, we can create all the containers that are required to host our dataset.

## What's the plan?

Following the Flywheel Hierarchy, we will loop through each subject folder and create the Subject containers. We will repeat the same step to create the Session and Acquisition containers. Once we get down to the Acquisition container, we will upload the corresponding DICOM archive to it.

## Processing

In this notebook we will parse the Subject, Session and Acquisition labels from the folders and subfolder path directly. 

If we wanted to do more, we could apply a regular expression (aka regex) to the path.

Example of how to use regular expressions in Python:

```python
# To match a few date strings
regex = r"[a-zA-Z]+ \d+"
matches = re.findall(regex, "June 24, August 9, Dec 12")
for match in matches:
    # This will print:
    #   June 24
    #   August 9
    #   Dec 12
    print("Full match: %s" % (match))
```
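
Applied to this dataset, the same idea could split a session folder name into its parts. The sketch below assumes the naming scheme `<subject>_anx_<session>_<protocol>`, which is inferred from the folder tree shown earlier; `SESSION_PATTERN` and `parse_session_folder` are hypothetical names used only for illustration.

```python
import re

# Assumed naming scheme, inferred from the folder tree of the test dataset:
#   <subject>_anx_<session>_<protocol>, e.g. anx_s1_anx_ses1_protA
SESSION_PATTERN = re.compile(
    r"^(?P<subject>anx_s\d+)_anx_(?P<session>ses\d+)_(?P<protocol>prot[A-Z])$"
)


def parse_session_folder(name):
    """Split a session folder name into its parts, or return None if it does not match."""
    match = SESSION_PATTERN.match(name)
    return match.groupdict() if match else None


# Prints a dict with the keys 'subject', 'session' and 'protocol'.
print(parse_session_folder("anx_s4_anx_ses2_protB"))
```

Returning `None` for names that do not match makes it easy to skip unexpected folders instead of creating mislabeled containers.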


<div class="alert alert-block alert-info" style="color: black"><b>Tip:</b> Use <a href="https://regex101.com/" style="color:white">Regex101</a>, an online regex tester and debugger, to write and test your pattern on example inputs before putting it in your code.</div>

<div class="alert alert-block alert-info" style="color: black"><b>Tip: </b><code>Path.glob(pattern)</code> is a method of <code>pathlib.Path</code>, part of the Python standard library, that yields every file or folder matching the given pattern relative to that path. See the documentation <a href="https://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob" style="color:white">here</a> to learn more.</div>
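
Before touching the Flywheel instance, it can help to dry-run the folder walk locally and see which files the glob patterns would pick up. The sketch below builds a small throwaway tree and lists what it finds; `plan_uploads` is a hypothetical helper, and the patterns (`anx*`, `T1*`, `*.dcm.zip`) are the ones used in this notebook.

```python
import tempfile
from pathlib import Path


def plan_uploads(root):
    """Return (subject, session, acquisition, file name) tuples found under root."""
    plan = []
    for subj in sorted(Path(root).glob("anx*")):
        if not subj.is_dir():
            continue  # ignore stray files at the top level
        for ses in sorted(subj.glob("anx*")):
            for acq in sorted(ses.glob("T1*")):
                for file in sorted(acq.glob("*.dcm.zip")):
                    plan.append((subj.name, ses.name, acq.name, file.name))
    return plan


# Build a tiny throwaway tree that mimics one subject of the test dataset.
with tempfile.TemporaryDirectory() as tmp:
    acq_dir = Path(tmp) / "anx_s1" / "anx_s1_anx_ses1_protA" / "T1_high-res_inplane_Ret_knk_0"
    acq_dir.mkdir(parents=True)
    (acq_dir / "6879_3_1_t1.dcm.zip").touch()
    print(plan_uploads(tmp))
```

Note that the `T1*` pattern skips any acquisition folder whose name does not start with `T1`, such as the fMRI folder of `anx_s3`.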

We are now ready to walk our folders, create the containers accordingly and upload the DICOM zip archive to the Acquisition container.

In [None]:
log.info('Starting upload...')

# Subject folders: names starting with `anx`
for subj in PATH_TO_DATA.glob('anx*'):
    log.info('Processing subject %s', str(subj))
    subject = project.add_subject(label=subj.name)
    
    # Session folders: names starting with `anx`
    for ses in subj.glob('anx*'):
        log.info('Processing session %s', str(ses))
        session = subject.add_session(label=ses.name)
        
        # Acquisition folders: names starting with `T1`
        for acq in ses.glob('T1*'):
            log.info('Processing acquisition %s', str(acq))
            acquisition = session.add_acquisition(label=acq.name)
            # Upload each DICOM archive into the Acquisition container
            for file in acq.glob('*.dcm.zip'):
                acquisition.upload_file(file)

log.info('DONE')

Once the upload is done, you should have all your data available in your Flywheel Project, which should look like this:  

<img src="https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/python/assets/anxiety_project_session_view.png" align="center"/>

# Update Subject Metadata

## Overview

For the sake of example, let's demonstrate how we can update the metadata for Subject `anx_s1`.

Let's first see what the Subject container looks like on the Flywheel Instance. 

<img src="https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/subject-container-ui.png" align="center"/>

## Getting Started 

As you can see from the screenshot above, a Human subject has 6 basic metadata fields: `First Name`, `Last Name`, `Sex`, `Cohort`, `Race` and `Ethnicity`.
This list will vary depending on which subject type is selected.

Now, we will find that specific Subject by calling `flywheel.finder.find_first()`.

In [None]:
anx_s1 = project.subjects.find_first('label=anx_s1').reload()

<div class="alert alert-block alert-info" style="color: black" >
    <b>Tip:</b> Using <code>reload()</code> is <b>necessary</b> to load the entire container.
</div>

We are going to update the firstname, lastname and the sex of this Subject in this tutorial. Let's check what we have currently:

In [None]:
print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')

We can update it with the `update` method of the container:

In [None]:
anx_s1.update(
            firstname='John',
            lastname='Doe',
            sex='male',
)    

Let's reload the subject from the database to make sure the update went through:

In [None]:
anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')

## Custom Information

### Overview

Each container also contains a field called `info` (aka `Custom Information` on the Flywheel Instance), which can be used to store unstructured information in a dictionary.

There are 5 different data types that you can use in the `info` section. 

Here is what they look like on the Flywheel Instance:

<img src="https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/custom-info-ui.png" align="center"/>

Here is what they look like in Python:

<img src="https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/example-data-type-python.png" align="center"/>


### To demonstrate, we will be using the `object` data type. 

In [None]:
complicated_nested_dict = {'a_complicated_nested_dict': {'key1': [1, 2, 3, 4], 
                                                        'key2': [{'an': 'other', 'list': 'with'}, 
                                                                {'dictionaries': ['in', 'it']}]
                                                        }
                            }

In [None]:
anx_s1.update_info(complicated_nested_dict)

In [None]:
anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print('Info field:')
pprint.pprint(anx_s1.info)

<b>What to expect on the Flywheel Instance:</b>

<img src="https://gitlab.com/flywheel-io/public/flywheel-tutorials/-/raw/master/webinars/upload_data_modify_metadata_w_fw/assets/complicated-nested-dict-ui.png" align="center"/>
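
Because `info` holds nested dictionaries and lists, a quick local check that a payload only contains JSON-friendly values can catch mistakes before calling `update_info`. The sketch below assumes the payload is sent to the server as JSON, which this notebook does not state explicitly; `is_json_compatible` is a hypothetical helper and not part of the Flywheel SDK.

```python
import json


def is_json_compatible(value):
    """Return True if value can be serialized to JSON without custom encoders."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


candidate = {"a_complicated_nested_dict": {"key1": [1, 2, 3, 4]}}
print(is_json_compatible(candidate))         # plain dicts, lists, numbers, strings
print(is_json_compatible({"tags": {1, 2}}))  # a set is not a JSON type
```

Values such as sets or `Path` objects would need to be converted (for example to lists or strings) before being stored.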

# Update all Subject Metadata with a CSV file

Subject metadata and info can also be updated by parsing a CSV or TSV file. With this method, you can modify the metadata of every Subject at once. 

In this example, you will need the `participants.csv` file, which can be found in the .zip archive you downloaded earlier. 
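
To get a feel for the shape of such a file without any third-party library, the standard-library `csv` module can index the rows by subject label; switching the delimiter handles TSV files too. This is only a sketch: `rows_by_participant` is a hypothetical helper, the column names are the ones this notebook reads from `participants.csv`, and the sample values are made up.

```python
import csv
import io


def rows_by_participant(stream, delimiter=","):
    """Index the rows of a CSV/TSV stream by their participant_id column."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    return {row["participant_id"]: row for row in reader}


# Hypothetical one-row sample; the real participants.csv may hold other values.
sample_tsv = "participant_id\tage\tsex\ttreatment\nanx_s1\t31\tfemale\tplacebo\n"
rows = rows_by_participant(io.StringIO(sample_tsv), delimiter="\t")
print(rows["anx_s1"]["treatment"])
```

Keep in mind that `csv.DictReader` returns every value as a string, so numeric columns such as `age` would need an explicit conversion.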

## First, you will need to read the CSV file with `pandas` (imported as `pd`).

<div class="alert alert-block alert-info" style="color: black"><b>INFO: </b><code>pandas</code> is a Python library that takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a data frame, that looks very similar to a table in Excel, SPSS or R. It is useful for data manipulation and data analysis.</div>

In [None]:
metadata = pd.read_csv(PATH_TO_DATA/'participants.csv')

In [None]:
# View the data in the csv file 
display(metadata)

We are going to loop through each Subject in the Flywheel Project and check whether there is any metadata for it in the `metadata` dataframe.

<div class="alert alert-block alert-info" style="color: black"><b>INFO: </b>The <code>any()</code> function looks through the elements of an iterable and returns True if any item is true; otherwise it returns False.</div>

If the Subject is in the `metadata` dataframe, we will add the `age` and `treatment` information into the Subject container and update the `sex` metadata for each Subject. 

<div class="alert alert-block alert-info" style="color: black"><b>INFO: </b>The <code>dataframe.loc[]</code> method is used here to retrieve the row for a specific subject from the dataframe, while <code>dataframe.iloc[]</code> selects rows by integer position.</div>
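
The lookup described above can be tried on a small in-memory table before running it against Flywheel. The sketch below is illustrative only: the sample values are made up, `subject_payload` is a hypothetical helper, and the column names match the ones this notebook reads from `participants.csv`.

```python
import io

import pandas as pd

# Hypothetical sample; the real participants.csv may hold other values.
csv_text = """participant_id,age,sex,treatment
anx_s1,31,female,placebo
anx_s2,45,male,drug_a
"""
metadata = pd.read_csv(io.StringIO(csv_text))


def subject_payload(metadata, label):
    """Return the sex and info values for one subject, or None if it is not listed."""
    mask = metadata["participant_id"] == label
    if not mask.any():  # no row matches this label
        return None
    row = metadata.loc[mask]  # boolean-mask selection of the matching row(s)
    info = row[["age", "treatment"]].to_dict("list")  # column name -> list of values
    sex = row.iloc[0]["sex"]  # first matching row, selected by position
    return {"sex": sex, "info": info}


print(subject_payload(metadata, "anx_s1"))
print(subject_payload(metadata, "anx_s9"))
```

With the `list` orientation, each column maps to a list of values, so a subject with a single row ends up with one-element lists in its `info`.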

In [None]:
# Iterate through each Subject in the Project
for subj in project.subjects.iter():
    
    # Check whether subj.label appears in the participant_id column
    if (metadata["participant_id"] == subj.label).any():
        # Get the row(s) for this subject from `metadata`
        tmp_info = metadata.loc[(metadata["participant_id"] == subj.label)]
        # Get the age and treatment for the subject
        # Convert them to a dictionary whose values are stored in lists
        other_metadata = tmp_info[['age', 'treatment']].to_dict('list')
        # Update the metadata stored in the subject container
        sex = tmp_info.iloc[0]['sex']
        subj.update(type='human', sex=sex)
        subj.update_info(other_metadata)
        
    else:
        print(subj.label + ' does not have metadata stored in the CSV file.')

View the updated metadata in the Subject container

In [None]:
for subj in project.subjects.iter():
    subj = subj.reload()
    print(f'Subject Label: {subj.label}, Sex: {subj.sex}, Info: {subj.info}')

***

# Appendix 

Below are a few helpful functions that you can include in your script to streamline the process of getting or creating the Subject/Session/Acquisition containers. 


## Helpful Functions

In [None]:
def get_or_create_subject(project, label, update=True, **kwargs):
    """Get the Subject container if it exists, else create a new Subject container.
    
    Args:
        project (flywheel.Project): A Flywheel Project.
        label (str): The subject label.
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of subject you would like to update.

    Returns:
        (flywheel.Subject): A Flywheel Subject container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    subject = project.subjects.find_first(f'label={label}')
    if not subject:
        subject = project.add_subject(label=label)
        
    if update and kwargs:
        subject.update(**kwargs)

    if subject:
        subject = subject.reload()

    return subject

In [None]:
def get_or_create_session(subject, label, update=True, **kwargs):
    """Get the Session container if it exists, else create a new Session container.
    
    Args:
        subject (flywheel.Subject): A Flywheel Subject.
        label (str): The session label.
        update (bool): If true, update container with key/value passed as kwargs.        
        kwargs (dict): Any key/value properties of Session you would like to update.

    Returns:
        (flywheel.Session): A flywheel Session container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    session = subject.sessions.find_first(f'label={label}')
    if not session:
        session = subject.add_session(label=label)
        
    if update and kwargs:
        session.update(**kwargs)

    if session:
        session = session.reload()

    return session

In [None]:
def get_or_create_acquisition(session, label, update=True, **kwargs):
    """Get the Acquisition container if it exists, else create a new Acquisition container.
    
    Args:
        session (flywheel.Session): A Flywheel Session.
        label (str): The Acquisition label.
        update (bool): If true, update container with key/value passed as kwargs.        
        kwargs (dict): Any key/value properties of Acquisition you would like to update.

    Returns:
        (flywheel.Acquisition): A Flywheel Acquisition container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    acquisition = session.acquisitions.find_first(f'label={label}')
    if not acquisition:
        acquisition = session.add_acquisition(label=label)
        
    if update and kwargs:
        acquisition.update(**kwargs)

    if acquisition:
        acquisition = acquisition.reload()

    return acquisition

In [None]:
def upload_file_to_acquistion(acquisition, fp, update=True, **kwargs):
    """Upload file to Acquisition container and update info if `update=True`
    
    Args:
        acquisition (flywheel.Acquisition): A Flywheel Acquisition
        fp (Path-like): Path to file to upload
        update (bool): If true, update the uploaded file with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of the file you would like to update.
    """
    basename = os.path.basename(fp)
    if not os.path.isfile(fp):
        raise ValueError(f'{fp} is not a file.')
        
    if acquisition.get_file(basename):
        log.info(f'File {basename} already exists in container. Skipping.')
        return
    else:
        log.info(f'Uploading {fp} to acquisition {acquisition.id}')
        acquisition.upload_file(fp)
        while not acquisition.get_file(basename):   # to make sure the file is available before performing an update
            acquisition = acquisition.reload()
            time.sleep(1)
            
    if update and kwargs:
        f = acquisition.get_file(basename)
        f.update(**kwargs)
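
The polling loop in `upload_file_to_acquistion` keeps waiting until the file shows up, so it would never return if the upload silently failed. A bounded variant is sketched below; `wait_until` is a hypothetical, generic helper (not part of the Flywheel SDK), and the default timeout and interval are arbitrary choices.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Poll predicate() until it returns a truthy value or timeout seconds elapse.

    Returns the truthy value, or raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

Inside the upload function, the `while` loop could then be replaced by something like `wait_until(lambda: acquisition.reload().get_file(basename))`, which raises an error instead of hanging forever.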

Following the Flywheel Hierarchy, you can loop through each subject folder and either get the Subject if it already exists in the Project or create it if not (this can be done by using the `get_or_create_subject` function above). You can do the same to get or create the Session and Acquisition containers. Once you get down to the Acquisition container, you can upload the corresponding DICOM archive to it by using the `upload_file_to_acquistion` function above.

Below is an example of how you can use the functions above to upload data into your Project container.

*Feel free to modify the code below to match your dataset*

In [None]:
log.info('Starting upload...')

# Subject folders: names starting with `anx`
for subj in PATH_TO_DATA.glob('anx*'):
    log.info('Processing subject %s', str(subj))
    subject = get_or_create_subject(project, subj.name, update=True, type='human', sex='female')  
    # Session folders: names starting with `anx`
    for ses in subj.glob('anx*'):
        log.info('Processing session %s', str(ses))
        session = get_or_create_session(subject, ses.name)
        # Acquisition folders: names starting with `T1` only
        for acq in ses.glob('T1*'):
            log.info('Processing acquisition %s', str(acq))
            acquisition = get_or_create_acquisition(session, acq.name)
            # Upload each DICOM archive found in the acquisition folder
            for file in acq.glob('*.dcm.zip'):
                upload_file_to_acquistion(acquisition, file)
log.info('DONE')