# Cloud Data Transfer Speeds Benchmarking Workflow

Add overview of workflow

## Step 1: Define Workflow Inputs

### Download required UI dependencies

In [None]:
! conda install -y -c conda-forge ipywidgets
! pip install jupyter-ui-poll

### Import required libraries and set up UI

In [None]:
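import json
import os
import subprocess
import time

import ipywidgets as widgets
from IPython.display import display
from jupyter_ui_poll import ui_events

clicked = False  # Set to True by the Submit button callback

def confirmation_click(btn):
    """Record that the Submit button was pressed."""
    global clicked
    clicked = True

class get_info:
    """Manage a growable list of text fields for entering names."""

    def __init__(self):
        self.nameWidget = []

    def add_names(self, btn):
        """Append and display a new, empty name field."""
        self.nameWidget.append(widgets.Text(placeholder='Input name'))
        display(self.nameWidget[-1])

    def del_names(self, btn):
        """Remove the last name field, always keeping at least one."""
        if len(self.nameWidget) > 1:
            self.nameWidget[-1].close()
            self.nameWidget.pop()

    def first_name(self):
        """Reset the field list and display a single empty name field."""
        self.nameWidget = [widgets.Text(placeholder='Input name')]
        display(self.nameWidget[0])

getInfo = get_info()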

### Cloud Resource & Storage Information

Run the following cell to generate interactive widgets that allow you to choose the number and names of resources and cloud storage locations you would like to use in the benchmarks. **You should only rerun this cell if you entered the incorrect number of resources or cloud object stores. The names of these respective inputs can be changed without rerunning.**

In [None]:
clicked = False
button_purpose = ['Add field', 'Remove field', 'Submit']
buttons = []
for i in range(len(button_purpose)):
    buttons.append(widgets.Button(description=button_purpose[i]))
    display(buttons[i])
    
buttons[0].on_click(getInfo.add_names)
buttons[1].on_click(getInfo.del_names)
buttons[2].on_click(confirmation_click)

getInfo.first_name()

with ui_events() as poll:
    while not clicked:
        poll(10)
        time.sleep(0.1)
print('Resource names accepted and recorded. Rerun cell to reset.')

### Randomly Generated File Options

Any randomly generated files you request will be written to every cloud storage location specified in the **Cloud Resource & Storage Information** section. This ensures that all the cloud object stores under test hold identical randomly generated files, so the comparison is fair.

- `<filetype> (dict) = {'Create': <bool>, 'SizeGB': <int/float>}`

   1. `<filetype>` is one of three formats: CSV, NetCDF, or a general binary file.
   
   2. `'Create': <bool>` - **True** creates the file, **False** does not.
   
   3. `'SizeGB': <int/float>` - *Optional* argument that sets the size of the randomly generated file in GB. If it is omitted and `'Create'` is set to **True**, `SizeGB` defaults to **1**.
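
The defaulting rule above can be sketched as a small helper. This is a minimal illustration, not part of the workflow itself: the function name `normalize_file_options` and its `default_size_gb` parameter are hypothetical, and it assumes that only the two documented keys are valid.

```python
def normalize_file_options(options, default_size_gb=1):
    """Return a copy of a file-options dict with 'SizeGB' filled in.

    Hypothetical helper: assumes only 'Create' and 'SizeGB' are valid keys.
    """
    allowed = {'Create', 'SizeGB'}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"Unexpected keys: {sorted(unknown)}")
    if 'Create' not in options:
        raise ValueError("'Create' is required")
    # 'SizeGB' is optional; fall back to the documented default of 1 GB.
    size_gb = options.get('SizeGB', default_size_gb)
    return {'Create': bool(options['Create']), 'SizeGB': size_gb}


print(normalize_file_options({'Create': True}))
# → {'Create': True, 'SizeGB': 1}
```

Validating the keys up front means a misspelled option such as `'Size'` fails loudly instead of silently falling back to the default size.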

In [None]:
csv = {'Create': True, 'SizeGB': 1} # Will generate a 1 GB .csv file
netcdf = {'Create': False, 'SizeGB': 0}
binary = {'Create': False}

## Step 2: Notebook Setup

In [None]:
print('Setting up workflow...')

# Set cloud resource & storage environment variables.
# NOTE: `resources` and `storage` are assumed to be the lists of Text widgets
# collected in Step 1; they are not defined in the cells shown above.
bench_resources = [w.value for w in resources]
bench_storage = [w.value for w in storage]
os.environ["resources"] = json.dumps(bench_resources)
os.environ["benchmark_storage"] = json.dumps(bench_storage)

# Set randomly generated file option environment variables
rfiles = {'csv': csv, 'netcdf': netcdf, 'binary': binary}
for name, opts in rfiles.items():
    if len(opts) > 2:
        print('Too many arguments in', name)

# 'SizeGB' is optional and defaults to 1 when omitted
rand_filetype = [str(opts['Create']) for opts in rfiles.values()]
rand_filesize = [str(opts.get('SizeGB', 1)) for opts in rfiles.values()]
os.environ["randgen_files"] = json.dumps(rand_filetype)
os.environ["randgen_sizes"] = json.dumps(rand_filesize)

# Execute setup script. Variables set through os.environ are inherited by the
# shell commands below, so no separate `export` call is needed (an `export`
# run via subprocess would only affect that short-lived child shell).

! chmod u+x workflow_notebook_setup.sh
! ./workflow_notebook_setup.sh

print('Workflow setup complete.')
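
The setup cell passes its lists to the setup script as JSON strings in environment variables. How `workflow_notebook_setup.sh` consumes them is not shown here, so the following is only a sketch of the general mechanism: a child process inherits the variable and decodes it with `json.loads`. The resource names `cluster-a` and `cluster-b` are made-up placeholders.

```python
import json
import os
import subprocess
import sys

# Copy the current environment and add a JSON-encoded list to it.
env = dict(os.environ)
env["resources"] = json.dumps(["cluster-a", "cluster-b"])  # hypothetical names

# The child process reads the variable and decodes it back into a list.
child_code = (
    "import json, os; "
    "print(json.loads(os.environ['resources'])[1])"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
decoded = result.stdout.strip()
print(decoded)
# → cluster-b
```

Encoding each list as JSON keeps it in a single environment variable and preserves names that contain spaces or commas.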