# **Title**: Finding Your Stuff in Flywheel with the Flywheel Python SDK - Part 1
## **Date**:  May 12th, 2020
### **Description**:  
* This notebook is intended to accompany a two-part webinar series on finding and accessing containers in Flywheel using the Flywheel Python SDK.
* The code herein will not change any metadata in Flywheel - only "read" actions will be performed.

### **Requirements**:
1. Read access to a Flywheel project that contains a DICOM acquisition file
2. A Flywheel API key
3. An environment in which to execute the notebook (Binder/Colab/JupyterLab/etc.)

---

# 1. Install and import dependencies

In [2]:
# Install specific packages required for this notebook
!pip install flywheel-sdk



Resource: [Flywheel Python SDK Documentation](https://flywheel-io.gitlab.io/product/backend/sdk/branches/master/python/getting_started.html)

In [3]:
# Import packages
import logging
import os
import pprint
import re
from getpass import getpass
from IPython.display import Markdown as md
from IPython.display import display, HTML

import flywheel

In [4]:
# Instantiate a logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')

---

# 2. Flywheel client initialization with an API key

Any time you use the Flywheel SDK, you will need to initialize a client with your API key. 

To find your API key, first click on the circle in the top right corner of Flywheel, then click on `Profile` as shown below: 

![profile_location](assets/profile_location.png)

On your profile page, your API key is located under `Your API Key`, as shown below:

![api_key_location](assets/api_key_location.png)

Copy this key, run the cell below, paste the key into the `Enter API_KEY here:` box, and press return. Note: you must copy the full contents of the `Your API Key` field, including the site address before the colon. For example, `ss.ce.flywheel.io:123456ABCDEF789zyxw`.

*TIP*: Although you can initialize your client as `fw` with `fw = flywheel.Client('<your api key>')`, it is essential to keep credentials out of your code, especially when sharing it with others or committing to a shared repository.

**WARNING: Do NOT share your API key with anyone for any reason - it is the same as sharing your password and constitutes a HIPAA violation.**
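
Before passing a pasted key to the client, it can help to check that it has the `<site>:<secret>` shape described above, and to log only a masked form of it. The sketch below is illustrative: `mask_api_key` is a hypothetical helper, not part of the Flywheel SDK, and it assumes the secret is everything after the last colon.

```python
def mask_api_key(api_key, visible=4):
    """Return a log-safe version of an API key (hypothetical helper, not SDK code).

    Assumes the key has the form '<site>:<secret>', as in the example above.
    """
    site, sep, secret = api_key.rpartition(':')
    if not sep or not site or not secret:
        raise ValueError('Expected an API key of the form <site>:<secret>')
    # Show the last few characters only when the secret is longer than that
    shown = secret[-visible:] if len(secret) > visible else ''
    hidden = '*' * (len(secret) - len(shown))
    return f'{site}:{hidden}{shown}'


# The example key from the text above, not a real credential
print(mask_api_key('ss.ce.flywheel.io:123456ABCDEF789zyxw'))
```

A masked key like this is safe to include in log messages, whereas the raw key never should be.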

In [6]:
# Password prompt (good security practice)
API_KEY = getpass('Enter API_KEY here: ')

# Initialize the client
fw = flywheel.Client(API_KEY or os.environ.get('FW_KEY'))

# Clean up the API_KEY
del API_KEY

# Log information about the user and site
log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

Enter API_KEY here:  ····································


2020-05-11 14:38:17,631 INFO You are now logged in as kalebfischer@flywheel.io to https://ss.ce.flywheel.io:443/api


--------

# 3. Using `lookup` to find items in Flywheel

If you know the group **id** and container (project/subject/session/acquisition) labels, `fw.lookup()` can be used to return container and file objects via the SDK. In order to use lookup, we need to know how to format the **resolver path**, the Flywheel equivalent to a local file path.

Let's look at how to locate the components of a resolver path and some examples!

*HINT*: Data in Flywheel is organized into a hierarchy of containers. Resource: [Flywheel Hierarchy](https://docs.flywheel.io/hc/en-us/articles/360008411054-Getting-Started-Flywheel-Hierarchy)

## Locating group id and project label

It is important to distinguish between group ID, group label, and project label. This is illustrated in the image below:

![group_id_label](assets/group_id_label.png)


In the above image:
* group id is `scien`
* group label is `Scientific Solutions`
* project label is `MIDAS_veggies`

To retrieve the project via lookup, we can use the group id and project label, for example:

```python
fw.lookup('scien/MIDAS_veggies')
```

## Locating subject, session, and acquisition labels

![labels_file_name](assets/labels_file_name.png)

To retrieve subject `dragon_fruit` via lookup:

```python
fw.lookup('scien/MIDAS_veggies/dragon_fruit')
```

To retrieve subject `dragon_fruit`'s `session_1` via lookup:

```python
fw.lookup('scien/MIDAS_veggies/dragon_fruit/session_1')
```

To retrieve `acq1` from `dragon_fruit`'s `session_1` via lookup:

```python
fw.lookup('scien/MIDAS_veggies/dragon_fruit/session_1/acq1')
```

## Locating file name

To retrieve file `IM_0001` from `acq1` we need to add `files` to the path as below:

```python
fw.lookup('scien/MIDAS_veggies/dragon_fruit/session_1/acq1/files/IM_0001')
```

Files for any container can be retrieved by adding `/files/<your file name here>` to the container path.
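
The path rules above can be collected into one small helper. This is a minimal sketch: `build_resolver_path` is a hypothetical name, not an SDK function, and it assumes that none of the labels contain a `/`.

```python
def build_resolver_path(group_id, *labels, file_name=None):
    """Join a group id, container labels, and an optional file name into a resolver path."""
    parts = [group_id, *labels]
    if file_name is not None:
        # Files are addressed by adding 'files/<file name>' to the container path
        parts += ['files', file_name]
    return '/'.join(parts)


print(build_resolver_path('scien', 'MIDAS_veggies', 'dragon_fruit'))
print(build_resolver_path('scien', 'MIDAS_veggies', 'dragon_fruit',
                          'session_1', 'acq1', file_name='IM_0001'))
```

The returned string can be passed directly to `fw.lookup()`.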

## Putting it together

Now that we know how to format a **resolver path**, we can create a function that takes a **resolver path**, connects to Flywheel, and retrieves information about a container's label, type, and id.

In [None]:
def lookup_demo(fw, resolver_path, log):
    """ Lookup a container at the specified resolver path using Flywheel Client
    
    Args:
        fw (flywheel.Client): an instance of the Flywheel client
        resolver_path (str): the resolver path to a Flywheel container
        log (logging.Logger): the logger to use for storing output
        
    Returns:
        A Flywheel container object (the result of fw.lookup(resolver_path)),
        or None if the lookup fails

    """
    log.info(f'resolver_path:\t {resolver_path}')
    try:
        container = fw.lookup(resolver_path)
        label = container.get('label') or container.get('name')
        print_str = (f'{label} is a {container.container_type} with id {container.id}\n')
        log.info(print_str)
        return container
    
    except flywheel.rest.ApiException as e:
        log.error(
            'An exception was raised when trying to resolve the container at resolver path %s', 
            resolver_path,
            exc_info=True
        )
        return None

## TRY: 
In the code cells below, substitute the inputs for group id, container labels (project, subject, session, & acquisition), and file name variables with those from your data.

In [None]:
# Set group id and container labels (add your labels within the single quotes)
group_id = 'scien'
project_label = 'MIDAS_veggies'
subject_label = 'dragon_fruit'
session_label = 'session_1'
acquisition_label = 'acq1'
file_name = 'IM_0001'

In [None]:
# Lookup group
resolver_path = group_id
group = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup project
resolver_path = '/'.join([group_id, project_label])
project = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup subject
resolver_path = '/'.join([group_id, project_label, subject_label])
subject = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup session
resolver_path = '/'.join([group_id, project_label, subject_label, session_label])
session = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup acquisition
resolver_path = '/'.join([group_id, project_label, subject_label, session_label, acquisition_label])
acquisition = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup acquisition file 
resolver_path = '/'.join([group_id, project_label, subject_label, session_label, acquisition_label, 'files', file_name])
acquisition_file = lookup_demo(fw, resolver_path, log)

In [None]:
# Lookup non-existent resolver - we expect a traceback here since it does not exist!
resolver_path = 'group/does/not/exist'
none_result = lookup_demo(fw, resolver_path, log)
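
When a lookup fails, it is often useful to know which segment of the path is at fault. The sketch below tries successively longer prefixes of a container path and reports the last one that resolved. `deepest_resolvable` is a hypothetical helper; it accepts any callable that behaves like `fw.lookup` (returns a container or raises), and it is meant for container paths only, since a path ending in a bare `files` segment may not resolve on its own.

```python
def deepest_resolvable(lookup, resolver_path):
    """Return (container, path) for the longest prefix of resolver_path that resolves.

    `lookup` is any callable that takes a resolver path and raises on failure,
    for example `fw.lookup`. Returns (None, None) if not even the group resolves.
    """
    parts = resolver_path.split('/')
    found, found_path = None, None
    for end in range(1, len(parts) + 1):
        candidate = '/'.join(parts[:end])
        try:
            found, found_path = lookup(candidate), candidate
        except Exception:  # broad on purpose; narrow to flywheel.rest.ApiException in real use
            break
    return found, found_path


# Offline stand-in for fw.lookup so the sketch runs without a connection
known = {'scien': 'group', 'scien/MIDAS_veggies': 'project'}


def fake_lookup(path):
    return known[path]  # raises KeyError for unknown paths


print(deepest_resolvable(fake_lookup, 'scien/MIDAS_veggies/no_such_subject'))
```

With a real client, `deepest_resolvable(fw.lookup, resolver_path)` makes one request per path segment, so it is better suited to debugging than to routine use.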

##### Pitstop: Questions? Comments?

----

# 4. Using `get` to retrieve containers

`fw.lookup()` works well when the resolver path uniquely identifies the file or container in question. However, if multiple sessions, acquisitions, or files share the same label (and therefore the same resolver path), finding the right one with `lookup` can be time consuming. Typically, `get` is faster than `lookup`, especially for large projects. The `get_<container type>` methods (for example, `get_session`) are typically faster still and can be used when you know the type of container you are retrieving.

## Locating container id to pass to `get` methods

In order to retrieve a container via `get`, we need to obtain its **id**, a value that is assigned by Flywheel to uniquely identify containers.

### Project id and session id

Within the sessions view, the URL contains the project id to the right of `projects/` and the session id to the right of `sessions/`:

![project_session_id](assets/project_session_id.png)

In this example, we could use either of the following methods to retrieve the project container:

```python
fw.get('5e90d7a5a3803400a8e63b13')
fw.get_project('5e90d7a5a3803400a8e63b13')
```

And to retrieve the session container:

```python
fw.get('5e948782a38034010ce63ac7')
fw.get_session('5e948782a38034010ce63ac7')
```
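
Rather than copying ids out of the address bar by hand, they can be pulled from a pasted URL with a regular expression. This is a sketch under two assumptions: that ids are 24 hexadecimal characters (as in the examples above), and that the URL follows the `projects/<id>/sessions/<id>` layout described here. The URL below is constructed for illustration and `ids_from_url` is a hypothetical helper.

```python
import re


def ids_from_url(url):
    """Extract container ids from a Flywheel UI URL (illustrative sketch).

    Assumes ids are 24 hexadecimal characters, as in the examples above.
    """
    pattern = r'(projects|subjects|sessions)/([0-9a-f]{24})'
    # Map 'projects' -> 'project', 'sessions' -> 'session', and so on
    return {kind.rstrip('s'): container_id
            for kind, container_id in re.findall(pattern, url)}


url = ('https://ss.ce.flywheel.io/#/projects/5e90d7a5a3803400a8e63b13'
       '/sessions/5e948782a38034010ce63ac7')
print(ids_from_url(url))
```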

### Subject id

To obtain the subject id from Flywheel, click on the person icon to enter the subject view, then click on the row with the subject label. The URL will now have `subjects/`, followed by the subject id:

![subject_id](assets/subject_id.png)

To retrieve the subject container in this example, we would use either of the following:

```python
fw.get('5e948782a38034010ce63ac6')
fw.get_subject('5e948782a38034010ce63ac6')
```

### Acquisition id

To obtain the acquisition id from Flywheel, we take the following three steps:

1. On an acquisition, right-click + "Inspect" - an inspection window is displayed (as shown in Step 3):

    ![inspect](assets/inspect.png)
    

2. Click the "kebab" menu to the right of the acquisition label, then select information:

    ![kebab_info](assets/kebab_info.png)
    
    
3. Select the "Network" tab within the inspection window and locate acquisition id:
    
    ![acquisition_id_network](assets/acquisition_id_network.png)
    
    
To retrieve the acquisition container in this example, we would use either of the following methods:

```python
fw.get('5e948799a380340100e63acb')
fw.get_acquisition('5e948799a380340100e63acb')
```
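
When the container type is only known at run time, the type-specific method can be selected by name. The sketch below is one possible approach and relies only on the `get_project`, `get_subject`, `get_session`, and `get_acquisition` methods shown above. `get_typed` is a hypothetical helper, and `FakeClient` is a stand-in so the example runs without a connection.

```python
def get_typed(client, container_type, container_id):
    """Call client.get_<container_type>(container_id) (illustrative helper)."""
    supported = {'project', 'subject', 'session', 'acquisition'}
    if container_type not in supported:
        raise ValueError(f'Unsupported container type: {container_type}')
    getter = getattr(client, f'get_{container_type}')
    return getter(container_id)


class FakeClient:
    """Stand-in for flywheel.Client so the sketch runs offline."""

    def get_session(self, container_id):
        return f'session {container_id}'


print(get_typed(FakeClient(), 'session', '5e948782a38034010ce63ac7'))
```

With a real client, the call would be `get_typed(fw, 'session', '5e948782a38034010ce63ac7')`.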

### Getting a file

Files cannot be obtained via `get` methods. Instead, you must get the parent container and use the `get_file` method to return the file object.

To illustrate, the following code would be used to retrieve acq1's IM_0001 file:

```python
acq = fw.get_acquisition('5e948799a380340100e63acb')

acq.get_file('IM_0001')
```
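
If the file name is not known in advance, one option is to inspect the files attached to the parent container first. The sketch below assumes the container exposes a `files` list whose entries have `name` and `type` attributes. It uses stand-in objects so it runs offline, and `file_names_of_type` is a hypothetical helper.

```python
from types import SimpleNamespace


def file_names_of_type(container, file_type):
    """Return the names of files on a container whose type matches file_type."""
    return [f.name for f in (container.files or []) if f.type == file_type]


# Stand-in for an acquisition so the sketch runs without a connection
fake_acq = SimpleNamespace(files=[
    SimpleNamespace(name='IM_0001', type='dicom'),
    SimpleNamespace(name='notes.txt', type='text'),
])
print(file_names_of_type(fake_acq, 'dicom'))
```

With a real acquisition, each returned name could then be passed to `acq.get_file(name)`.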

### TIP: Python object attributes

* When examining a Python object, it is useful to list its attributes with `dir`

```python
acq = fw.get_acquisition('5e948799a380340100e63acb')
print(dir(acq))
```
    
* Notebooks will display information about a function/method when its name is followed by a `?`; for example, `dir?`
    * You can also print the docstring in other IDEs with `print(dir.__doc__)`
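
The output of `dir` includes many names that start with an underscore, which are rarely of interest when exploring a container. A small filter makes the list easier to read. `public_attributes` and the `Example` class are illustrative names only.

```python
def public_attributes(obj):
    """List attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


class Example:
    """Tiny stand-in object with one public and one private attribute."""
    label = 'acq1'
    _hidden = True


print(public_attributes(Example()))
```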

## Putting it together

Now that we know how to find a container id, we can use this to retrieve information about a container's label, type, and id.

## TRY: 
In the code cells below, replace the value of the `container_id` variable with a container id from your data.

In [None]:
# Add your container id between the single quotes
container_id = '5e948799a380340100e63acb'
container = fw.get(container_id)
print(f'{container.label} is a {container.container_type} with id: {container.id}\n')

In [None]:
# What attributes are available?
print('dir(container):\n')
pprint.pprint(dir(container))
print('\n')

In [None]:
# Print out the container JSON
pprint.pprint(container.to_dict())

##### Pitstop: Questions? Comments?

---

# 5. Generate the URL for a project, subject, or session

In [11]:
def get_url_prefix(client_config_site_api_url):
    """Removes /api and port (i.e., :443) from client_config_site_api_url.
    
    Args:
        client_config_site_api_url (str): the value for client.get_config().site.api_url

    Returns:
        return_prefix (str): the url without /api or port information
        
    """
    remove_regex = r'(:[\d]+)?/api'
    return_prefix = re.sub(remove_regex, '', client_config_site_api_url)
    
    return return_prefix


def create_container_link(fw, container):
    """Create a link to a specified container in Flywheel.
    
    Args:
        fw (flywheel.Client): an instance of the Flywheel client
        container: a Flywheel container object (project, subject, or session)
        
    Returns:
        A URL link to the Flywheel container
    
    """
    api_url = fw.get_config().site.api_url
    prefix = get_url_prefix(api_url)
    
    if container.container_type == 'project':
        return_url = '/'.join([prefix, '#', 'projects', container.id])
        
    elif container.container_type == 'session':
        return_url = '/'.join([prefix, '#', 'projects', container.project, 'sessions', container.id])
        
    elif container.container_type == 'subject':
        return_url = '/'.join([prefix, '#', 'projects', container.project, 'subjects', container.id])
        
    else:
        log.error('Link creation is not supported for %s containers', container.container_type)
        return_url = None
        
    return return_url


def display_link(link_address, link_description):
    """Formats HTML to display a link to link_address with link_description"""
    link_t = f'<a href=\'{link_address}\'> {link_description} </a>'
    html = HTML(link_t)
    display(html)
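
One caveat about `get_url_prefix`: its pattern is not anchored to the end of the string, so on a host whose name begins with `api` it would also remove the `/api` formed by the second slash of `//` and the start of the hostname. The variant below anchors the match with `$`. It is a sketch, `strip_api_suffix` is a hypothetical name, and `https://api.example.com/api` is a made-up URL used only to show the difference.

```python
import re


def strip_api_suffix(api_url):
    """Remove a trailing '/api' (and an optional port before it) from an API URL."""
    return re.sub(r'(:\d+)?/api/?$', '', api_url)


print(strip_api_suffix('https://ss.ce.flywheel.io:443/api'))
print(strip_api_suffix('https://api.example.com/api'))
```

For the site URL used in this notebook, both versions give the same result.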

In [12]:
# Retrieve a project container link
project = fw.get_project('5e90d7a5a3803400a8e63b13')
project_link = create_container_link(fw, project)
display_link(project_link, 'Project Link')

In [None]:
# Retrieve a subject container link
subject = fw.get_subject('5e948782a38034010ce63ac6')
subject_link = create_container_link(fw, subject)
display_link(subject_link, 'Subject Link')

In [15]:
# Retrieve a session container link
session = fw.get_session('5e948782a38034010ce63ac7')
session_link = create_container_link(fw, session)
display_link(session_link, 'Session Link')

##### Pitstop: Questions? Comments?

---

# 6. Constructing advanced search queries
Advanced Search streamlines searching for files/containers that meet search criteria at multiple levels of the container hierarchy. 

## Query Interface
![adv_search_query](assets/adv_search_query.png)


## Query Results
![adv_search_results](assets/adv_search_results.png)

## Features
* suggestions for field(s) and value(s)
* search terms can be combined with nested AND/OR logic
* variety of comparators available:
    * IS/IS NOT
    * CONTAINS/DOES NOT CONTAIN
    * EXISTS/DOES NOT EXIST
* supports:
    * Sessions
    * Acquisitions
    * Files
    * Analyses
* all data option for site admins

## Limitations
* metadata must first be indexed in order to appear in advanced search - depending on site scale and rate of upload, this may not be immediate
* results are not paginated, so advanced search is not suited for locating all files meeting general criteria (e.g., all files of type DICOM)

## Code Example
To run a query formatted with the UI tool:

```python
query = 'file.type = dicom AND subject.label = dragon_fruit'
fw.search({'return_type': 'acquisition', 'structured_query': query})
```
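
Query strings like the one above can also be assembled from a dictionary, which keeps the fields and values easy to edit. This is a minimal sketch that only covers `=` comparisons joined with `AND`. `build_structured_query` is a hypothetical helper, and values containing spaces or special characters may need quoting that it does not handle.

```python
def build_structured_query(conditions):
    """Join {field: value} pairs into an 'AND' query string (illustrative helper)."""
    return ' AND '.join(f'{field} = {value}' for field, value in conditions.items())


query = build_structured_query({'file.type': 'dicom', 'subject.label': 'dragon_fruit'})
print(query)
```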



In [13]:
# Get advanced search link from client
api_url = fw.get_config().site.api_url
prefix = get_url_prefix(api_url)
advanced_search_link = prefix + '/#/search/advanced'
display_link(advanced_search_link, 'Advanced Search Link')

## TRY:
Format your search query in the UI by following the link above, then copy the resulting search string and paste it as the value of `query` in the cell below:

In [21]:
# Replace with your query between the single quotes
query = 'file.type = dicom AND subject.label = dragon_fruit'
# Set return_type to 'acquisition', 'session', 'file', or 'analysis' 
return_type = 'file'
fw.search({'return_type': return_type, 'structured_query': query})

[{'acquisition': None,
  'analysis': None,
  'collection': None,
  'file': {'classification': {},
           'created': datetime.datetime(2020, 4, 23, 14, 44, 21, 747000, tzinfo=tzutc()),
           'name': 'IM_0001',
           'size': 96454778,
           'type': 'dicom'},
  'group': {'id': 'scien', 'label': 'Scientific Solutions'},
  'parent': {'analyses': None,
             'collections': None,
             'created': None,
             'files': None,
             'id': '5ea1a9c02971c8017cf876c5',
             'info': None,
             'info_exists': None,
             'label': None,
             'modified': None,
             'notes': None,
             'parents': None,
             'permissions': None,
             'revision': None,
             'session': None,
             'tags': None,
             'timestamp': None,
             'timezone': None,
             'uid': None},
  'permissions': [{'access': 'admin', 'id': 'michaelperry@flywheel.io'},
                  {'access': '
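
Search results are verbose, so it can help to reduce them to the few fields of interest. The sketch below assumes each result is available as a plain dictionary with the `file` and `parent` keys shown in the output above. Objects returned by the SDK may need converting first (for example with `to_dict()`, as used earlier for containers), which is an assumption here. `summarize_file_results` is a hypothetical helper.

```python
def summarize_file_results(results):
    """Return (file name, parent id) pairs from search results given as dicts."""
    summary = []
    for result in results:
        # Either key may be missing or None, depending on the return type
        file_info = result.get('file') or {}
        parent = result.get('parent') or {}
        summary.append((file_info.get('name'), parent.get('id')))
    return summary


# Sample shaped like the output above, trimmed to the relevant keys
sample = [{'file': {'name': 'IM_0001', 'type': 'dicom'},
           'parent': {'id': '5ea1a9c02971c8017cf876c5'}}]
print(summarize_file_results(sample))
```

The parent id in each pair can be passed to `fw.get()` to retrieve the container that holds the file.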

##### Any remaining questions or comments?

# Stay tuned for Part 2!