**Title**: Dataviews - basic create, read, update, and delete operations  
**Date**:  10-Jun-2022  
**Description**:  
* This notebook shows the basic CRUD operations for a Dataview, which is a specification for retrieving data from Flywheel.
* Also shown are the create and read functions for a Dataview Execution, which represents a run of a dataview.


# Requirements
- Access to a Flywheel instance V16.8 or greater where you can create and run dataviews
- A project with data (Subjects, Sessions, Acquisitions, and Files)

The next 2 sections ([Setup](#Setup) and [Flywheel API Key and Client](#Flywheel-API-Key-and-Client)) should be found in any notebook and kept as consistent as possible.

# <a id='setup'>Setup</a>

Packages required to run this notebook should be installed with `pip` (using the `!` Jupyter operator to run shell commands). This keeps the notebook standalone and avoids issues with undefined package requirements. It also allows the notebook to run out of the box on third-party Jupyter platforms such as [Google Colab](https://colab.research.google.com/) or [mybinder.org](https://mybinder.org/).

In [None]:
# Here is an example to install the flywheel SDK
!pip install flywheel-sdk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting flywheel-sdk
  Downloading flywheel_sdk-16.8.2-py2.py3-none-any.whl (782 kB)
     |████████████████████████████████| 782 kB 10.0 MB/s
Collecting requests-toolbelt
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 1.0 MB/s
Installing collected packages: requests-toolbelt, flywheel-sdk
Successfully installed flywheel-sdk-16.8.2 requests-toolbelt-0.9.1


Once installed, the packages are imported. Imports should list Python standard library packages first, followed by third-party packages.

In [None]:
# Python standard library packages come first
from getpass import getpass
import logging
import os

# Third party packages come second
import flywheel


If useful, a logger can be instantiated to display information during notebook execution (e.g., to keep track of runtime).

In [None]:
# Instantiate a logger
logging.basicConfig(level=logging.INFO)
log = logging.getLogger('root')
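
As a minimal sketch of how the logger could be used to keep track of runtime, the block below wraps a step in a context manager that logs the elapsed time. The helper name `log_runtime` is hypothetical and not part of the Flywheel SDK or of this notebook.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('root')


@contextmanager
def log_runtime(step_name):
    """Log how long the wrapped block took (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info('%s finished in %.3f s', step_name, elapsed)


# Any notebook step can be wrapped; a trivial computation stands in here.
with log_runtime('example step'):
    total = sum(range(1000))
```

Any later cell, such as a dataview execution, could be wrapped the same way.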

# Flywheel API Key and Client

Tutorials based on Jupyter notebooks aim at illustrating interactions with a Flywheel instance using the Flywheel SDK.  
To communicate with a Flywheel instance, you first need to authenticate with the Flywheel API, which requires an API_KEY for your account. You can get your API_KEY by following the steps described in the Flywheel SDK documentation [here](https://flywheel-io.gitlab.io/product/backend/sdk/branches/master/python/getting_started.html#api-key).

<div class="alert alert-block alert-danger">
<b>DANGER:</b> 
    Do NOT share your API key with anyone for any reason - it is the same as sharing your password and constitutes a HIPAA violation. ALWAYS obscure credentials from your code, especially when sharing with others or committing to a shared repository.
</div>

In [None]:
API_KEY = getpass('Enter API_KEY here: ')

Enter API_KEY here: ··········


Instantiate the Flywheel API client either using the API_KEY provided by the user input above or by reading it from the environment variable `FW_KEY`.

In [None]:
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))
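
Note that once the `getpass` cell has run, `'API_KEY' in locals()` is true even if nothing was typed, so an empty entry would not fall back to `FW_KEY`. The sketch below shows one possible way to make the fallback explicit. The helper `resolve_api_key` and the placeholder keys are assumptions for illustration only; the function takes the environment as a parameter so it can be tried without touching real credentials.

```python
import os


def resolve_api_key(entered_key=None, env_var='FW_KEY', environ=None):
    """Return the entered key if non-empty, else the value of env_var."""
    environ = os.environ if environ is None else environ
    key = entered_key or environ.get(env_var)
    if not key:
        raise ValueError(f'No API key entered and {env_var} is not set.')
    return key


# Demonstration with a stand-in environment and placeholder keys.
demo_env = {'FW_KEY': 'placeholder-env-key'}
print(resolve_api_key('placeholder-typed-key', environ=demo_env))
print(resolve_api_key('', environ=demo_env))
```

The returned value could then be passed to `flywheel.Client(...)` in place of the inline expression.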

You can check which Flywheel instance you have been authenticated against with the following:

In [None]:
print('You are now logged in as %s to %s' % (fw.get_current_user()['email'], fw.get_config()['site']['api_url']))

You are now logged in as filipmulier@flywheel.io to https://rc.qa.flywheel.io/api


# Dataview Example

## Constants

This notebook requires access to an existing project with data, in which it will create and execute some dataviews. The project is specified by its path in the form `group/project`, where `group` is the group ID and `project` is the project label.




In [None]:
PROJECT_PATH = 'prod/Alzheimers'
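
To illustrate the `group/project` form, here is a small sketch that splits a path into its two parts and rejects malformed values before any lookup is attempted. The helper `split_project_path` is hypothetical and is not an SDK function.

```python
def split_project_path(path):
    """Split 'group/project' into (group_id, project_label)."""
    group_id, separator, project_label = path.partition('/')
    if not separator or not group_id or not project_label:
        raise ValueError(f"Expected 'group/project', got {path!r}")
    return group_id, project_label


group_id, project_label = split_project_path('prod/Alzheimers')
print(group_id)
print(project_label)
```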

## Dataview CRUD

This section covers create, read, update, and delete operations for the Dataview.  Each of these starts with specifying the project.

### Create a Dataview

In [None]:
# select the project
project = fw.lookup(PROJECT_PATH)

In [None]:
#pick the columns
columns = ['subject.label',
           'subject.mlset',
           'session.info.age_years',
           'acquisition.id',
           'file.file_id',
           'file.name' ] 

In [None]:
# Specify the dataview
builder = flywheel.ViewBuilder(label='SDK Data View', #label of the dataview as shown in Flywheel
                              columns = columns,
                              container='acquisition', #Needed for file metadata at acquisition level
                              filename='*.*', # Needed for file metadata
                              match='all',
                              process_files=False,
                              include_ids=False,
                              include_labels=False,
                              sort=False,
                              )

In [None]:
# Create the dataview specification
sdk_dataview = builder.build()

In [None]:
# Create the Dataview in Flywheel
view_id = fw.add_view(project.id, sdk_dataview)

You should now see a dataview called 'SDK Data View' in the list of project dataviews.

### Read Dataviews

In [None]:
# select the project
project = fw.lookup(PROJECT_PATH)

In [None]:
# get all the data views in the project
view_list = fw.get_views(project.id)

# Build a dictionary to look up a dataview ID from its label
# Note: this assumes the dataview labels are unique
dv_to_id = {v['label']: v['_id'] for v in view_list}
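
The lookup above silently keeps only the last dataview when two share a label. As a sketch of how the uniqueness assumption could be checked, the block below builds the same kind of index but raises on duplicates. The helper `build_label_index` is hypothetical, and plain dictionaries with made-up IDs stand in for the objects returned by `fw.get_views`.

```python
def build_label_index(views):
    """Map each view label to its ID, raising if a label repeats."""
    index = {}
    duplicates = set()
    for view in views:
        label = view['label']
        if label in index:
            duplicates.add(label)
        index[label] = view['_id']
    if duplicates:
        raise ValueError(f'Duplicate dataview labels: {sorted(duplicates)}')
    return index


# Stand-in records with made-up IDs.
sample_views = [
    {'label': 'SDK Data View', '_id': 'id-1'},
    {'label': 'Other View', '_id': 'id-2'},
]
print(build_label_index(sample_views))
```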

In [None]:
view = fw.get_view(dv_to_id['SDK Data View'])
print(view)

{'columns': [{'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'subject.label',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'subject.mlset',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'session.info.age_years',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'acquisition.id',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'file.file_id',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'file.name',
              'type': None}],
 'description': None,
 'error_column': True,
 'file_

### Update the Dataview

In [None]:
# Here are the keys we can update
view.keys()

dict_keys(['parent', 'label', 'description', 'columns', 'groupBy', 'filter', 'fileSpec', 'includeIds', 'includeLabels', 'errorColumn', 'missingDataStrategy', 'sort', '_id', 'origin'])

In [None]:
# change the description
changes = {'description':"This is a test DV"}
fw.modify_view(view.id, changes)

{'modified': 1}

In [None]:
# Check that it has changed
view = fw.get_view(dv_to_id['SDK Data View'])
view.description

'This is a test DV'

In [None]:
# Let's change a column next
new_cols = {}
# Isolate all the columns
new_cols['columns'] = view.columns

In [None]:
# Show the second column
new_cols['columns'][1]

{'accumulator': None,
 'dst': None,
 'expr': None,
 'src': 'subject.mlset',
 'type': None}

In [None]:
# Modify the column
new_cols['columns'][1]['src'] = "subject.cohort"
fw.modify_view(view.id, new_cols)

{'modified': 1}
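
In the cells above, `new_cols['columns']` refers to the same list as `view.columns`, so editing it most likely also changes the local `view` object before the server is updated. A minimal sketch of an alternative, assuming the columns can be deep-copied, builds the update payload from a copy instead. The helper `with_column_source` is hypothetical, and plain dictionaries stand in for the SDK column objects.

```python
import copy


def with_column_source(columns, index, new_src):
    """Return an update payload with one column's 'src' replaced."""
    updated = copy.deepcopy(columns)  # leave the original list untouched
    updated[index]['src'] = new_src
    return {'columns': updated}


# Stand-in columns shaped like the output shown above.
original_columns = [
    {'accumulator': None, 'dst': None, 'expr': None, 'src': 'subject.label', 'type': None},
    {'accumulator': None, 'dst': None, 'expr': None, 'src': 'subject.mlset', 'type': None},
]
changes = with_column_source(original_columns, 1, 'subject.cohort')
print(changes['columns'][1]['src'])
print(original_columns[1]['src'])
```

A payload built this way could be passed to `fw.modify_view` in the same manner as `new_cols`.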

In [None]:
# Check that it has changed
view = fw.get_view(dv_to_id['SDK Data View'])
view

{'columns': [{'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'subject.label',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'subject.cohort',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'session.info.age_years',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'acquisition.id',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'file.file_id',
              'type': None},
             {'accumulator': None,
              'dst': None,
              'expr': None,
              'src': 'file.name',
              'type': None}],
 'description': 'This is a test DV',
 'error_column

### Delete a Dataview

In [None]:
# We just need the id to delete a dataview
fw.delete_view(dv_to_id['SDK Data View'])

{'deleted': 1}

Before moving on, confirm that the project at `PROJECT_PATH` can still be looked up; a message reports whether it was found.

In [None]:
try:
    # fw.lookup raises an API error when the path does not resolve
    project = fw.lookup(PROJECT_PATH)
    print(f'Project ID is: {project.id}.')
except flywheel.ApiException:
    print(f'No project found at path {PROJECT_PATH}.')

## Dataview Executions - Create, Read
Beginning with Flywheel V16.8, Dataview execution is managed in a queue. This section covers creating a Dataview, executing it, reading the data into a dataframe, and reading the queue.

### Create a Dataview Execution
The operations in this section cover creating a Dataview and executing it.

In [None]:
# select the project
project = fw.lookup(PROJECT_PATH)

In [None]:
#pick the columns
columns = ['subject.label',
           'subject.mlset',
           'session.info.age_years',
           'acquisition.id',
           'file.file_id',
           'file.name' ] 

In [None]:
# Specify the dataview
builder = flywheel.ViewBuilder(label='SDK Data View', #label of the dataview as shown in Flywheel
                              columns = columns,
                              container='acquisition', #Needed for file metadata at acquisition level
                              filename='*.*', # Needed for file metadata
                              match='all',
                              process_files=False,
                              include_ids=False,
                              include_labels=False,
                              sort=False,
                              )

In [None]:
# Create the dataview specification
sdk_dataview = builder.build()

In [None]:
# Create the Dataview in Flywheel, execute it, and wait for return.
df = fw.read_view_dataframe(sdk_dataview, project.id)

In [None]:
display(df)

| | subject.label | subject.mlset | session.info.age_years | acquisition.id | file.file_id | file.name | errors |
|---|---|---|---|---|---|---|---|
| 0 | 1 | | | 626c368f9d2d9e35cd43b882 | 626c44a2844917499743b502 | AAHScout.dcm.zip | |
| 1 | 1 | | | 626c368f9d2d9e35cd43b882 | 626c37d7db43ed5a0243a842 | AAHScout.nii.gz | |
| 2 | 1 | | | 626c368f7a26f3654cd160ef | 626c54aec345a39433d1592f | BIAS_64CH.dcm.zip | |
| 3 | 1 | | | 626c368f7a26f3654cd160ef | 626c3c12db43ed5a0243aa01 | BIAS_64CH.nii.gz | |
| 4 | 1 | | | 626c368f9d2d9e35cd43b883 | 626c3f7d7a26f3654cd164e0 | BIAS_BC.dcm.zip | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1803 | 10 | | | 626c36aa221ec26437d16874 | 626c3bc9d5aa2a45dbd1aaf0 | rfMRI_REST_PA_SBRef.nii.gz | |
| 1804 | 10 | | | 626c36aa9d2d9e35cd43b915 | 626c3e58d5aa2a45dbd1abf4 | rfMRI_REST_PA.dcm.zip | |
| 1805 | 10 | | | 626c36aa9d2d9e35cd43b915 | 626c52717a26f3654cd169ac | rfMRI_REST_PA.nii.gz | |
| 1806 | 10 | | | 626c36aa221ec26437d16875 | 626c51c19d2d9e35cd43bfdc | rfMRI_REST_PA_SBRef.dcm.zip | |


### Dataview Execution Read

In [None]:
# Get the list of all dataview executions on the project
executions_list = fw.get_all_data_view_executions()

In [None]:
# Here is the last one on the list
last_execution = executions_list[-1]
last_execution

{'created': datetime.datetime(2022, 6, 10, 20, 1, 47, 199000, tzinfo=tzlocal()),
 'data_view_id': None,
 'expires_on': datetime.datetime(2022, 7, 10, 20, 1, 47, 199000, tzinfo=tzlocal()),
 'id': '62a3a32b21b90b162ac27752',
 'modified': datetime.datetime(2022, 6, 10, 20, 1, 59, 42000, tzinfo=tzlocal()),
 'project_id': '626c3557221ec26437d1675e',
 'revision': 4,
 'state': 'completed',
 'storage_file_id': '62a3a337300bdd1b3235277b',
 'task_id': '42e08468-684e-4032-b358-9ead2faa136e',
 'timestamp_ran': datetime.datetime(2022, 6, 10, 20, 1, 47, 248000, tzinfo=tzlocal()),
 'user_id': 'filipmulier@flywheel.io'}
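
Taking `executions_list[-1]` relies on the order of the returned list, which is not stated here. As a sketch of a more explicit selection, the block below filters records by `project_id` and `state` and picks the newest by `created`, using the field names visible in the output above. The helper `latest_completed` is hypothetical, and the sample records, their IDs, and the `'failed'` state are made up for illustration.

```python
from datetime import datetime


def latest_completed(executions, project_id):
    """Return the newest completed execution for a project, or None."""
    matches = [
        e for e in executions
        if e['project_id'] == project_id and e['state'] == 'completed'
    ]
    return max(matches, key=lambda e: e['created'], default=None)


# Hypothetical records; only the field names come from the output above.
sample_executions = [
    {'id': 'exec-a', 'project_id': 'proj-1', 'state': 'completed',
     'created': datetime(2022, 6, 9, 12, 0)},
    {'id': 'exec-b', 'project_id': 'proj-1', 'state': 'failed',
     'created': datetime(2022, 6, 10, 9, 0)},
    {'id': 'exec-c', 'project_id': 'proj-1', 'state': 'completed',
     'created': datetime(2022, 6, 10, 20, 1)},
    {'id': 'exec-d', 'project_id': 'proj-2', 'state': 'completed',
     'created': datetime(2022, 6, 11, 8, 0)},
]
newest = latest_completed(sample_executions, 'proj-1')
print(newest['id'])
```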