# FABRIC Storage: Reserve and Benchmark

FABRIC has three types of reservable storage:
 - Local Disk (VM)
 - NVMe (dedicated PCI device)
 - Network Storage -- Coming Soon

This example notebook demonstrates how to reserve and use each type of storage that is currently available. It also benchmarks each type of storage to show its performance.

## Configure the Environment

In [None]:
import os
from getpass import getpass

os.environ['FABRIC_CREDMGR_HOST']='cm.fabric-testbed.net'
os.environ['FABRIC_ORCHESTRATOR_HOST']='orchestrator.fabric-testbed.net'
os.environ['FABRIC_TOKEN_LOCATION']=os.environ['HOME']+'/work/fabric_token.json'

os.environ['FABRIC_BASTION_USERNAME']='pruth'  # replace with your own FABRIC bastion username
os.environ['FABRIC_BASTION_KEY_LOCATION']=os.environ['HOME']+'/work/.ssh/id_rsa_fabric'

os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE']=os.environ['HOME']+'/.ssh/id_rsa'
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE']=os.environ['HOME']+'/.ssh/id_rsa.pub'
print('Please input private key passphrase. Press enter for no passphrase.')
os.environ['FABRIC_SLICE_PRIVATE_KEY_PASSPHRASE']=getpass()

os.environ['FABRIC_BASTION_HOST'] = 'bastion-1.fabric-testbed.net'
os.environ['FABRIC_BASTION_HOST_PRIVATE_IPV4'] = '192.168.11.226'
os.environ['FABRIC_BASTION_HOST_PRIVATE_IPV6'] = '2600:2701:5000:a902::c'

## Setup the Experiment

#### Import FABRIC API

In [None]:
import json
import traceback

from fabrictestbed_extensions.fablib.fablib import fablib

## Create a Node

The cell below creates a slice that contains a single node with a 1TB NVMe device. 


### Set the Slice Name and FABRIC Site

In [None]:
from fabrictestbed.slice_editor import ComponentModelType

slice_name="MySliceNVME"
site="MAX"
node_name='Node1'
username='centos'
image = 'default_centos_8'
image_type = 'qcow2'
cores = 2
ram = 8
disk = 100

nvme_name='nvme1'

In [None]:
try:
    #Create Slice
    slice = fablib.new_slice(name=slice_name)

    # Add node
    node = slice.add_node(name=node_name, site=site)
    node.set_capacities(cores=cores, ram=ram, disk=disk)
    node.set_image(image, username)
    
    #Add an NVME Drive
    node.add_component(model='NVME_P4510', name=nvme_name)

    #Submit Slice Request
    slice.submit(wait_progress=True)
except Exception as e:
    print(f"Slice Fail: {e}")
    traceback.print_exc()

## Get the Slice

In [None]:
try:
    slice = fablib.get_slice(name=slice_name)
    print(f"Slice: {slice.get_name()}")
except Exception as e:
    print(f"Get Slice Error: {e}")

## Get the Node

Retrieve the node and print its management IP address, SSH command, and NVMe component details.


In [None]:
try:
    node = slice.get_node(node_name) 
    print(f"Node Name        : {node.get_name()}")
    print(f"Management IP    : {node.get_management_ip()}")
    print(f"SSH Command      : {node.get_ssh_command()}")
    print()

    nvme1 = node.get_component(nvme_name)
    print(f"NVMe Name        : {nvme1.get_name()}")
    print(f"Details          : {nvme1.get_details()}")
    print(f"Disk (G)         : {nvme1.get_disk()}")
    print(f"Units            : {nvme1.get_unit()}")
    print(f"PCI Address      : {nvme1.get_pci_addr()}")
    print(f"Model            : {nvme1.get_model()}")
    print(f"Type             : {nvme1.get_type()}")  
except Exception as e:
    print(f"Error: {e}")

## Configure the NVMe PCI Device

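NVMe storage is provided as bare PCI block devices, which will likely need to be partitioned, formatted, and mounted before use. The fablib `configure_nvme()` call below prepares the device; the benchmarks later in this notebook write to `/mnt/nvme_mount`.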

In [None]:
try:
    nvme1.configure_nvme()
except Exception as e:
    print(f"Error: {e}")
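To see what such a configuration involves, or to do it by hand, the steps can be issued as ordinary shell commands through the node, using the same command-execution call as the other cells. The sketch below only builds the command list. The helper name `nvme_setup_commands`, the device path `/dev/nvme0n1`, the single GPT partition, and the ext4 file system are all assumptions, not necessarily what `configure_nvme()` does; only the mount point `/mnt/nvme_mount` is taken from the benchmark cells. Running these commands erases the device.

```python
def nvme_setup_commands(device: str = "/dev/nvme0n1",
                        mount_point: str = "/mnt/nvme_mount") -> list:
    """Build (but do not run) shell commands to partition, format, and mount an NVMe device.

    Hypothetical helper: the device path and ext4 are assumptions for illustration.
    """
    partition = f"{device}p1"  # NVMe partitions use a "p" suffix, e.g. nvme0n1p1
    return [
        # New GPT label with one partition spanning the whole device.
        f"sudo parted -s {device} mklabel gpt",
        f"sudo parted -s -a optimal {device} mkpart primary ext4 0% 100%",
        # Create the file system and mount it.
        f"sudo mkfs.ext4 -F {partition}",
        f"sudo mkdir -p {mount_point}",
        f"sudo mount {partition} {mount_point}",
    ]


for cmd in nvme_setup_commands():
    print(cmd)
```

Check the actual device name on the node (for example with `lsblk`) before using commands like these.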

## Configure and Benchmark the Storage Devices

We will use `dd` to perform a simple benchmark of the different storage devices.

Note that this is not a complete evaluation of FABRIC storage devices; it is meant more as an exercise for learning about using FABRIC storage and its performance.

### Local Disk

The local disk (`/dev/vda1`) is mounted at `/`, so we can benchmark it by reading from and writing to the `/tmp` directory.

We can verify that `/tmp` is on `/dev/vda1` by issuing the command below.

In [None]:
command = 'df /tmp'
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")
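The `df` output returned in `stdout` can also be checked programmatically. A minimal sketch, assuming the standard `df` column layout (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on); the helper `df_device` is hypothetical and the sample output is made up for illustration.

```python
from typing import Tuple


def df_device(df_stdout: str) -> Tuple[str, str]:
    """Return (filesystem, mount point) from the output of `df <path>`."""
    lines = [line for line in df_stdout.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and at least one data line")
    # Line 0 is the header. The filesystem name starts the first data line;
    # if df wrapped a long name, the remaining columns are on the next line.
    filesystem = lines[1].split()[0]
    mount_point = lines[-1].split()[-1]
    return filesystem, mount_point


# Made-up sample in the shape that `df /tmp` prints.
sample = (
    "Filesystem     1K-blocks    Used Available Use% Mounted on\n"
    "/dev/vda1      104845292 2345678 102499614   3% /\n"
)
fs, mnt = df_device(sample)
print(fs, mnt)
```

On the node's real output you would call `df_device(stdout)`. Mount points containing spaces are not handled; `df -P` keeps each file system on one line.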

Now let's try writing a modest 1G file to the local disk using a simple `dd` command. We will use a 1G block size to simulate a full 1G file being written in one piece. To simulate many smaller files instead, reduce `bs` to the size of each file and increase `count` to the number of files.
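The relationship between `bs`, `count`, and the total amount written can be captured in a small command builder. This is a sketch with a hypothetical helper name, `dd_write_command`; it only formats strings using the `dd` operands already used in this notebook. Note that all blocks go into one output file, so a small `bs` with a large `count` approximates many small writes rather than literally creating many files.

```python
def dd_write_command(path: str, block_size: str, count: int, direct: bool = False) -> str:
    """Build a dd command that writes `count` blocks of `block_size` zeros to `path`."""
    parts = ["dd", "if=/dev/zero", f"of={path}", f"bs={block_size}", f"count={count}"]
    if direct:
        # Ask dd to open the output with O_DIRECT, bypassing the guest page cache.
        parts.append("oflag=direct")
    return " ".join(parts)


# One 1G block: the same command the next cell runs.
print(dd_write_command("/tmp/output", "1G", 1))

# Many small writes: 1000 blocks of 1M each (about 1G total) into the same file.
print(dd_write_command("/tmp/output", "1M", 1000))
```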

In [None]:
command='dd if=/dev/zero of=/tmp/output bs=1G count=1'
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")
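`dd` prints its summary, including the write rate, to `stderr`. To compare runs without reading each summary by eye, the rate can be extracted with a regular expression. A sketch, assuming the GNU coreutils summary format (`... copied, <seconds> s, <rate> <unit>/s`) with decimal units; the helper `dd_rate_bytes_per_s` is hypothetical and the sample text is illustrative, not a measured result.

```python
import re
from typing import Optional

# GNU dd ends its summary with the rate, e.g. "... copied, 0.632 s, 1.7 GB/s".
_RATE_RE = re.compile(r"([\d.,]+)\s*([kMGT]?B)/s")

# dd reports decimal (SI) units: kB = 1000 bytes, MB = 10**6 bytes, etc.
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def dd_rate_bytes_per_s(dd_stderr: str) -> Optional[float]:
    """Return the write rate from dd's summary in bytes/s, or None if absent."""
    matches = _RATE_RE.findall(dd_stderr)
    if not matches:
        return None
    # Use the last match in case stderr contains several summaries.
    value, unit = matches[-1]
    # Some locales print a decimal comma ("1,7 GB/s").
    return float(value.replace(",", ".")) * _UNITS[unit]


sample = (
    "1+0 records in\n"
    "1+0 records out\n"
    "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.632 s, 1.7 GB/s\n"
)
print(dd_rate_bytes_per_s(sample) / 1e6, "MB/s")
```

After any benchmark cell, `dd_rate_bytes_per_s(stderr)` gives a number that can be stored and compared across the local disk and NVMe tests.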

The 1G file was probably written at over 1 GB/s, which is much faster than the local disk can actually sustain. How did this happen? The OS and the VM hypervisor both keep buffers in memory that let bursts of file writes go quickly to memory and, later, be transferred to disk. This optimization helps many applications, but it is limited by the memory available for disk caching.

Typically, storage benchmarking aims to test the performance of the storage, not the OS buffering system. Some `dd` options can help with this. Let's try again with the `oflag=direct` option, which skips the file system buffer used by the VM's OS.

In [None]:
command='dd if=/dev/zero of=/tmp/output bs=1G count=1 oflag=direct'
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

What did you find? This often produces a counterintuitive result: it is not unusual to see speeds over 2.5 GB/s. What is happening here?

The increased performance is due to the hypervisor's virtual block device cache. Although this cache was also used in the previous step, it seems that it can enable higher bandwidth on its own than when used in combination with the VM's file system cache. 

Skipping this cache and writing directly to the physical block device is not possible without reconfiguring the hypervisor, so you cannot perform this test on FABRIC. You can, however, write a file large enough that the cache fills early in the write and the amortized performance approaches the block device's write performance.
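The amortization argument can be made concrete with an idealized two-rate model: the first part of a write lands in the cache at a high rate, and everything after that goes at the disk's rate. This is a simplification (real caches drain while filling), and the cache size and both rates below are hypothetical numbers, not FABRIC measurements. The helper name `effective_rate` is likewise illustrative.

```python
def effective_rate(total_gb: float, cache_gb: float,
                   cache_rate_gbps: float, disk_rate_gbps: float) -> float:
    """Average write rate (GB/s) in a two-rate model: the first `cache_gb`
    are absorbed at `cache_rate_gbps`, everything after at `disk_rate_gbps`."""
    cached = min(total_gb, cache_gb)
    uncached = total_gb - cached
    seconds = cached / cache_rate_gbps + uncached / disk_rate_gbps
    return total_gb / seconds


# Hypothetical parameters: a 4 GB cache at 2.5 GB/s in front of a 0.2 GB/s disk.
for size_gb in (1, 10, 20, 50):
    print(f"{size_gb:>3} GB -> {effective_rate(size_gb, 4, 2.5, 0.2):.3f} GB/s")
```

A write that fits in the cache reports the cache rate, while the average for larger writes falls steadily toward the disk rate, which is why the tests below use progressively larger files.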

Try writing larger files. Note that these tests could take tens of minutes, so be patient.

Write a 10G file:

In [None]:
command='dd if=/dev/zero of=/tmp/output bs=1G count=10 oflag=direct'  # 10 x 1G = 10G
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

Write a 20G file:

In [None]:
command='dd if=/dev/zero of=/tmp/output bs=1G count=20 oflag=direct'  # 20 x 1G = 20G
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

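The maximum value of `bs` is limited by the VM's memory, because `dd` allocates a buffer of `bs` bytes. On Linux, a single `read()` or `write()` call also transfers at most about 2 GiB, so a larger `bs` produces a partial record rather than a larger block. Very large write tests should therefore keep `bs` at a reasonable value and increase `count`.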
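The rule of thumb above can be turned into a small planner that picks a safe `bs` and the matching `count`. A sketch under stated assumptions: the helper `plan_dd` and its default margin of 25% of VM RAM for the buffer are hypothetical choices, not FABRIC or `dd` requirements; the cap of `0x7ffff000` bytes is the Linux per-call read/write limit.

```python
import math

MIB = 1024 ** 2
# Linux transfers at most 0x7ffff000 bytes (just under 2 GiB) per read()/write().
MAX_RW_MIB = 0x7FFFF000 // MIB  # 2047


def plan_dd(total_gib: float, ram_gib: float, wanted_bs_mib: int = 1024,
            ram_fraction: float = 0.25):
    """Return (bs_mib, count) so the dd buffer fits in memory and in one syscall."""
    # Hypothetical safety margin: keep the buffer to a fraction of VM RAM.
    mem_cap = int(ram_gib * 1024 * ram_fraction)
    bs_mib = max(1, min(wanted_bs_mib, mem_cap, MAX_RW_MIB))
    # Round up so at least `total_gib` is written.
    count = math.ceil(total_gib * 1024 / bs_mib)
    return bs_mib, count


# A 50 GiB test on an 8 GB VM, asking for an oversized 25G block.
bs, count = plan_dd(total_gib=50, ram_gib=8, wanted_bs_mib=25 * 1024)
print(f"dd if=/dev/zero of=/tmp/output bs={bs}M count={count} oflag=direct")
```

With the default 1024 MiB block, `plan_dd(50, 8)` returns `(1024, 50)`, i.e. `bs=1G count=50`.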

Try larger write tests. These tests may take some time. Be patient. 

In [None]:
command='dd if=/dev/zero of=/tmp/output bs=1G count=50 oflag=direct'  # 50G; keep within the 100G local disk
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

If your tests were large enough, you will approach the write bandwidth of the disk. In our tests, this was a bit under 200 MB/s.

### NVMe Storage

Now we can use `dd` to benchmark the NVMe drive just as we did the local disk. Unlike the local disk, our VMs have direct control of the NVMe PCI devices, so there is no hypervisor cache in the path. Using `oflag=direct` will therefore get closest to the actual performance of the NVMe block device.
 
Try a 1G file without `oflag=direct`:

In [None]:
command='sudo dd if=/dev/zero of=/mnt/nvme_mount/output bs=1G count=1'
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

Try a 1G file with `oflag=direct`:

In [None]:
command='sudo dd if=/dev/zero of=/mnt/nvme_mount/output bs=1G count=1 oflag=direct'
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

Try a 5G file with `oflag=direct`:

In [None]:
command='sudo dd if=/dev/zero of=/mnt/nvme_mount/output bs=1G count=5 oflag=direct'  # 5 x 1G = 5G
try:
    stdout, stderr = node.execute_script(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

Try a larger file (100G) with `oflag=direct`:

In [None]:
command='sudo dd if=/dev/zero of=/mnt/nvme_mount/output bs=1G count=100 oflag=direct'
try:
    stdout, stderr = node.execute(command)
    print(f"stdout: {stdout}")
    print(f"stderr: {stderr}")
except Exception as e:
    print(f"Fail: {e}")

You should be seeing much higher bandwidths than with the local disk.

## Cleanup Your Experiment

In [None]:
try:
    slice = fablib.get_slice(name=slice_name)
    slice.delete()
except Exception as e:
    print(f"Fail: {e}")
    traceback.print_exc()