# Cloud Data Transfer Speeds Benchmarking Workflow

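This notebook sets up a workflow for benchmarking data transfer speeds between cloud resources and cloud object storage. You specify the resources and storage locations to use, optionally define randomly generated files to write, and then run the workflow setup script.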

## Step 0: Load Required Setup Packages & Classes

Installs the packages required for workflow setup and runs the UI generation script. If any of these packages are missing from your `base` environment, they will be installed automatically. Note that if installation is required, this cell will take a few minutes to finish executing.

In [1]:
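# Retry the imports after installing any missing packages
while True:
    try:
        import ipywidgets as widgets
        from varname import nameof
        from jupyter_ui_poll import ui_events
        break
    except ImportError:
        # Install the required packages, then loop back and retry the imports
        ! conda install -y -c conda-forge ipywidgets
        ! conda install -y -c conda-forge varname
        ! pip install jupyter-ui-poll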
import time
import os
import subprocess
import json
%run -i input_ui.py
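
To preview which packages the cell above would need to install, a check like the following may help. This is a minimal sketch and not part of the workflow: `missing_packages` is a hypothetical helper, and it only tests whether each module can be located, without importing or installing anything.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names used in Step 0. The pip package `jupyter-ui-poll`
# is imported under the module name `jupyter_ui_poll`.
required = ["ipywidgets", "varname", "jupyter_ui_poll"]
print("Missing packages:", missing_packages(required))
```

An empty list means the Step 0 cell should complete without running any installs.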

## Step 1: Define Workflow Inputs

**NOTE: Do not attempt to rerun the cells in Step 1 while `In[*]:` is visible in the top left corner next to the cell. If you do this, the kernel will need to be restarted.**

### Cloud Resource & Storage Information

Run the following cell to generate input fields for resource names and cloud object store URIs. The resource(s) you name will write data to and read data from the cloud object storage location(s) you provide. If you are using the PW cloud storage feature, you may also enter an absolute path specifying the mount point of your cloud object storage.

In [2]:
# Call resource input UI
print('1. Input desired resource name(s):')
placeholder = 'Input resource name'
description = 'Resource'
resources = getStrings.run_ui(placeholder, description)
print('\033[1m' + 'Saved resource name(s):', resources, '\n\n' + '\033[0m')

# Call storage location input UI
print('2. Input desired cloud object store URI(s) or mount point(s):')
placeholder = 'Input URI or absolute path'
description = 'Storage'
storage = getStrings.run_ui(placeholder, description)
print('\033[1m' + 'Saved cloud object store locations:', storage, '\n\n' + '\033[0m')
print('Execution complete. Rerun cell to reset inputs.')

1. Input desired resource name(s):


Box(children=(Button(description='Add field', style=ButtonStyle()), Button(description='Remove field', style=B…

VBox(children=(Text(value='', description='Resource 1:', placeholder='Input resource name'),))

Saved resource name(s): ('gcptestnew',)

2. Input desired cloud object store URI(s) or mount point(s):


Box(children=(Button(description='Add field', style=ButtonStyle()), Button(description='Remove field', style=B…

VBox(children=(Text(value='', description='Storage 1:', placeholder='Input URI or absolute path'),))

Saved cloud object store locations: ('/home/jgreen/clouddatabenchmarks',)

Execution complete. Rerun cell to reset inputs.
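
Because the storage field accepts either an object store URI or an absolute mount path, it can be useful to tell the two apart before using them. The sketch below is one possible way to do that with the standard library. `classify_storage` is a hypothetical helper that does not appear in the workflow, and `gs://example-bucket/benchmarks` is a made-up URI used only for illustration.

```python
from urllib.parse import urlparse


def classify_storage(location):
    """Label a storage input as an object store URI or a mount point (hypothetical helper)."""
    if urlparse(location).scheme:
        # Anything with a scheme, such as gs:// or s3://, is treated as a URI
        return "uri"
    if location.startswith("/"):
        # No scheme and a leading slash: treat as an absolute mount path
        return "mount point"
    raise ValueError(f"Expected a URI or an absolute path, got: {location!r}")


for location in ("gs://example-bucket/benchmarks", "/home/jgreen/clouddatabenchmarks"):
    print(location, "->", classify_storage(location))
```

This check assumes POSIX-style paths. A Windows path such as `C:\data` would be read as having the scheme `c`, so it would need separate handling.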


### Randomly Generated File Options

This workflow is designed to work with user-supplied files in CSV, NetCDF, and/or binary formats, but it also includes the option to define randomly generated data sets of each type. Run the following cell to generate input fields for defining these data sets.

**NOTE: This cell must be run and submitted for each execution of the workflow, even if you do not wish to use randomly generated files. In future versions, there will be a unified UI for user input.**

In [23]:
# Call randomly generated file options UI
print('3. Randomly Generated File Options:')
rand_options = randOpts.run_ui(resources)

# Populate variables to be exported
rand_resource = rand_options[0]
rand_filetype = rand_options[1]
rand_filesize = rand_options[2]

# Print confirmation messages
print('\033[1m' + 'Will use', rand_resource, 'to write:\n')
for i in range(len(rand_filetype)):
    message = str(rand_filesize[i]) + ' GB ' + rand_filetype[i]
    print('- ' + message)
print('\nfiles to provided cloud object stores.' + '\033[0m')
print('\n\nExecution complete. Rerun cell to reset randomly generated file option inputs.')

3. Randomly Generated File Options:


Dropdown(description='Resource to write files with:', options=('gcptestnew',), value='gcptestnew')

HBox(children=(Checkbox(value=False, description='CSV'), Checkbox(value=False, description='NetCDF4'), Checkbo…

HBox(children=(FloatText(value=0.0), FloatText(value=0.0), FloatText(value=0.0)))

Button(description='Submit', style=ButtonStyle())

Will use ('gcptestnew',) to write:

- 0.1 GB CSV

files to provided cloud object stores.


Execution complete. Rerun cell to reset randomly generated file option inputs.
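
The generator the workflow actually uses is not shown in this notebook. As an illustration only, a random CSV of a requested size could be produced along these lines. `write_random_csv`, its parameters, and the column layout are all assumptions made for this sketch, and the demo uses a tiny target so it runs quickly.

```python
import os
import random
import tempfile


def write_random_csv(path, size_gb, columns=8, seed=None):
    """Write rows of random floats until roughly size_gb gigabytes are written.

    Hypothetical helper: returns the number of bytes written.
    """
    target_bytes = int(size_gb * 1e9)
    rng = random.Random(seed)
    written = 0
    # newline="" keeps "\n" as-is, so character counts match bytes on disk
    with open(path, "w", newline="") as f:
        header = ",".join(f"col{i}" for i in range(columns)) + "\n"
        f.write(header)
        written += len(header)
        while written < target_bytes:
            row = ",".join(f"{rng.random():.6f}" for _ in range(columns)) + "\n"
            f.write(row)
            written += len(row)
    return written


# Demo with a target of about 10 KB rather than the 0.1 GB used above
demo_path = os.path.join(tempfile.mkdtemp(), "random_demo.csv")
demo_bytes = write_random_csv(demo_path, size_gb=1e-5, seed=0)
print(f"Wrote {demo_bytes} bytes to {demo_path}")
```

The file ends on the first complete row that reaches the target, so the final size can exceed the request by up to one row.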


## Step 2: Notebook Setup

Run the cell below when you are finished with all input. 

**IF YOU NEED TO CHANGE ANY INPUT, DO SO BEFORE RUNNING THIS CELL. THIS IS THE POINT OF NO RETURN.**

In [28]:
print('Setting up workflow...')

# Gather all user input

# Set cloud resource & storage environment variables
os.environ["resources"] = json.dumps(resources)
os.environ["benchmark_storage"] = json.dumps(storage)

# Set randomly generated file option environment variables
os.environ["randgen_files"] = json.dumps(rand_filetype)
os.environ["randgen_sizes"] = json.dumps(rand_filesize)
os.environ["randgen_resource"] = json.dumps(rand_resource)

# Variables set through os.environ are inherited by child processes,
# including the setup script below, so no separate `export` step is needed.

# Execute setup script

! chmod u+x workflow_notebook_setup.sh
! ./workflow_notebook_setup.sh

print('Workflow setup complete.')

Setting up workflow...
Generating random files...
Shredding run_rand_files.sh
Workflow setup complete.
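
The setup cell passes inputs to the setup script as JSON-encoded environment variables. How `workflow_notebook_setup.sh` reads them is not shown here, so the following is only a sketch of the general mechanism: a child process inherits the values set in `os.environ` and can decode them with `json.loads`. The child below is a short Python one-liner standing in for the real script, and the values mirror the example inputs above.

```python
import json
import os
import subprocess
import sys

# Encode the inputs the same way the setup cell does
os.environ["randgen_files"] = json.dumps(["CSV"])
os.environ["randgen_sizes"] = json.dumps([0.1])

# Stand-in for the setup script: decode the inherited variables and pair them up
child_code = (
    "import json, os; "
    "files = json.loads(os.environ['randgen_files']); "
    "sizes = json.loads(os.environ['randgen_sizes']); "
    "print(list(zip(files, sizes)))"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # [('CSV', 0.1)]
```

Note that JSON has no tuple type, so a tuple such as `('gcptestnew',)` is written as the array `["gcptestnew"]` and decodes back to a list.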
