# Batch file upload via API

This document outlines how to upload batch form data as CSV files to NACC using the Flywheel API.

In [1]:
from getpass import getpass
import logging
import os

from flywheel import Client, FileSpec

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')

## Configuring API Access to Flywheel

To use the Flywheel API with the Python SDK you will need an API key.

### Figure out where you are going to store your API Key

An API key should be kept secret.
Treat it like you would a password to access data at your center.

Be aware that a common way for secrets to be exposed is by hardcoding them into software or configuration files. 
Once the code is checked into a version control repository, the secrets remain in the history even after they are removed from the current version of the code.
So, before you start using your key in your scripts, please be sure you have a good secret management strategy in place.

Here, we simply prompt the user for the key.
This avoids storing the key, but it is not practical for automated transfers, which will need a different strategy.
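
For automated transfers, one common approach is to read the key from an environment variable that is set outside of version control. The following is a minimal sketch, assuming the variable is named `FW_KEY` (the name used as a fallback when connecting later in this document). The helper `load_api_key` is hypothetical and is not part of the Flywheel SDK.

```python
import os
from getpass import getpass
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "FW_KEY",
                 interactive: bool = True) -> str:
    """Hypothetical helper: return the API key from the environment.

    Prompts only if the variable is unset and prompting is allowed.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if key:
        return key
    if interactive:
        # Fall back to prompting, as the demonstration cell does.
        return getpass(f"Enter {var_name} here: ")
    raise RuntimeError(f"{var_name} is not set")
```

In a scheduled job you would likely call `load_api_key(interactive=False)` so that a missing key fails loudly instead of waiting for input.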

### Finding your API key

Each API key is associated with a particular user. 
To get the API key, log in to the NACC Flywheel instance as that user.

1. Find the "avatar" in the upper right corner (generally a circle with your initials).
2. Click the avatar dropdown, and select "Profile".
3. Under "Flywheel Access" at the bottom of the resulting page, click "Generate API Key".
4. Choose a key name relevant to upload, set the expiration date, and create the API Key.
5. Copy the API key, since you won't be able to access the value later.
6. Keep the key secret.

### Load secret key in script

In this demonstration script, we just prompt for the key:

In [2]:
API_KEY = getpass('Enter API_KEY here: ')

## Connecting to Flywheel

Once you have the API Key loaded, it can be used to connect to Flywheel.
With the Python SDK we create an SDK client that we will use throughout the rest of the notebook.

In [3]:
fw = Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))

log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

2024-04-07 22:54:41,141 INFO You are now logged in as bjkeller@washington.edu to https://naccdata.flywheel.io/api


## Figuring out where to upload

### Identify group for center

Each center is associated with a Flywheel group.

Historically, NACC has used an ADCID to represent centers, but in Flywheel a group has a symbolic ID.
You can find this ID with a lookup keyed by the ADCID.


Group information is stored as metadata in a Flywheel project `fw://nacc/metadata`.
The following function performs the lookup for the ADCID.

In [4]:
from typing import Optional


def get_center_id(fw: Client, adcid: int) -> Optional[str]:
    """
    Looks up the center group ID for a given ADCID.
    
    Args:
        fw (Client): The Flywheel SDK client.
        adcid (int): The ADCID of the center.
    
    Returns:
        Optional[str]: The group ID of the center, or None if not found.
    """
    metadata = fw.lookup("nacc/metadata")
    if not metadata:
        log.error("Failed to find nacc/metadata project")
        return None
    # Reload so the project's info metadata is populated
    metadata = metadata.reload()
    if 'centers' not in metadata.info:
        log.error("No 'centers' key in nacc/metadata")
        return None
    
    if str(adcid) not in metadata.info['centers']:
        log.error("No center with ADCID %s in nacc/metadata", adcid)
        return None

    return metadata.info['centers'][str(adcid)]['group']

Then we would use this function to get the group ID for an ADCID.
For instance, to get the ID for the Sample Center (ADCID 0):

In [5]:
group_id = get_center_id(fw=fw, adcid=0)
log.info("Group ID for ADCID 0 is %s", group_id)

2024-04-07 22:54:50,699 INFO Group ID for ADCID 0 is sample-center
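
The lookup above relies on the shape of the project's `info` metadata: a `centers` mapping keyed by the ADCID as a string, where each entry holds a `group` value. The standalone sketch below mimics that shape with a plain dictionary so the logic can be tried without a Flywheel connection. The sample entry and the helper name `find_group_id` are made up for illustration. Only the `centers` and `group` keys come from the function above.

```python
from typing import Any, Mapping, Optional


def find_group_id(info: Mapping[str, Any], adcid: int) -> Optional[str]:
    """Return the group ID for an ADCID, or None if it is not listed."""
    centers = info.get("centers", {})
    # ADCIDs are stored as string keys, so convert before the lookup.
    center = centers.get(str(adcid))
    if center is None:
        return None
    return center.get("group")


# Illustrative metadata only; real entries live in fw://nacc/metadata.
sample_info = {"centers": {"0": {"group": "sample-center"}}}

print(find_group_id(sample_info, 0))    # sample-center
print(find_group_id(sample_info, 999))  # None
```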


### Identify project for upload

The following function will get the appropriate project for uploading data depending on the datatype, pipeline type, and study ID.
The defaults for the function will return the project for submitting form data to the sandbox pipeline for the ADRC Program (study ID `uds`).
The sandbox pipeline is used for test submissions, and the data will not be included in the NACC-released dataset.
To submit data for inclusion in the released data set, the pipeline type should be set to `'ingest'`.

In [6]:
from typing import Literal


def get_project(fw: Client,
                group_id: str, datatype: Literal['form','dicom'] = 'form', 
                pipeline_type: Literal['ingest', 'sandbox'] = 'sandbox', 
                study_id: str = 'uds'):
    """
    Looks up the project for a given center, study, and datatype.
    
    Args:
        fw (Client): The Flywheel SDK client.
        group_id (str): The group ID of the center.
        datatype (str): The datatype to look up.
        pipeline_type (str): The type of the pipeline.
        study_id (str): The study ID for the project.
    Returns:
        Project: The project for the given center, study, and datatype.
    """
    suffix = f"-{study_id}" if study_id != 'uds' else ''
    project_label = f"{pipeline_type}-{datatype}{suffix}"
    project = fw.lookup(f"{group_id}/{project_label}")
    if not project:
        log.error("Failed to find project %s", project_label)

    return project

Here we want to upload form data to the sandbox pipeline for the center identified earlier.
To do this, we just pass the center `group_id` and use the defaults:

In [7]:
upload_project = get_project(fw=fw, group_id=group_id)
log.info("Using project %s/%s", upload_project.group, upload_project.label)

2024-04-07 22:54:58,517 INFO Using project sample-center/sandbox-form
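
To see which project labels the naming rule in `get_project` produces, the rule can be pulled out into a small pure function. This is only a sketch of the string logic shown above. The helper name `build_project_label` and the study ID `example-study` are placeholders, not real NACC identifiers.

```python
def build_project_label(datatype: str = "form",
                        pipeline_type: str = "sandbox",
                        study_id: str = "uds") -> str:
    """Mirror the label rule in get_project: no suffix for 'uds'."""
    suffix = f"-{study_id}" if study_id != "uds" else ""
    return f"{pipeline_type}-{datatype}{suffix}"


print(build_project_label())                        # sandbox-form
print(build_project_label(pipeline_type="ingest"))  # ingest-form
# 'example-study' is a placeholder study ID used only for illustration.
print(build_project_label(datatype="dicom",
                          pipeline_type="ingest",
                          study_id="example-study"))  # ingest-dicom-example-study
```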


## Upload file

For data files that have data for many participants, such as forms, the data is submitted as a CSV where each line is a data record for a participant.
These files are attached to the ingest (or sandbox) project as shown here.

> Data files that have a one-to-one relationship with a participant, such as images, are uploaded differently.


### For files on disk

If you have a file on disk, you can upload it directly to the project using code like the following.

In [8]:
filename = "form-data.csv"
file_path = f"../data/{filename}"
file_type = 'text/csv'

if upload_project:
    upload_project.upload_file(file_path)

> If you end up wanting to write a program that simply uploads files from disk, you might consider instead using the [Flywheel CLI utility](https://flywheel-io.gitlab.io/tools/app/cli/fw-beta/). A guide for using this tool to upload to NACC's Flywheel instance is provided elsewhere.

### For files in memory

If, on the other hand, you generate the file contents in memory, create a `flywheel.FileSpec` object that references the contents and then upload the file.

In [27]:
with open("../data/form-data.csv", "r") as f:
    contents = f.read()

filename = "form-data.csv"
file_type = 'text/csv'
file_spec = FileSpec(filename, contents=contents, content_type=file_type)
if upload_project:
    upload_project.upload_file(file_spec)
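
The cell above reads the contents from disk for convenience. If the records really are produced in memory, one way to build the CSV text is with the standard library `csv` module and an `io.StringIO` buffer. This is a minimal sketch: the helper `records_to_csv` and the column names are hypothetical and do not reflect the actual NACC form fields.

```python
import csv
import io
from typing import Dict, List


def records_to_csv(records: List[Dict[str, str]],
                   fieldnames: List[str]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical column names and values, for illustration only.
records = [
    {"participant": "0001", "visit": "1"},
    {"participant": "0002", "visit": "1"},
]
contents = records_to_csv(records, fieldnames=["participant", "visit"])
print(contents)
```

The resulting `contents` string can then be passed to `FileSpec(filename, contents=contents, content_type=file_type)` in the same way as in the cell above.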