# Functional Test 6.1.2 - Flash ESnet code into an FPGA persistent flash or RAM and validate

This Jupyter notebook will allow you to flash experimenter FPGA code based on [ONS shell](https://github.com/Xilinx/open-nic-shell) into the FPGA persistent flash or RAM. If the persistent flash is used, the end result is an FPGA that even after a cold reboot of the server retains its programming with a standard Xilinx XRT shell. If the program is flashed into RAM, a warm reboot of the server will activate it. 

This procedure can be used to reset the FPGA at a given site after experiments or initialize a newly installed device. It generally follows the procedures described by [ESnet](https://github.com/esnet/esnet-smartnic-fw/blob/main/sn-stack/README.INSTALL.md#running-the-firmware) for U280 devices.

It is assumed you are operating as part of the FABRIC Maintenance project and have access to the persistent volume named `fpga-tools` created on EDC where relevant tools are downloaded.

This notebook is broken up into steps and does not have to be executed in sequence.

- Step 2 has multiple cells that initialize fablib and variables and create a new slice
  - The cells initializing the state __must always be executed__.
  - Creating a slice is optional and can be skipped if the slice already exists
- Step 3 is a re-entrant cell and can be used any time you want to refresh information about the existing slice
- Step 4 builds 3 docker images and saves them to the Storage VM:
  - Xilinx Labtools
  - DPDK
  - esnet-smartnic-fw based on experimenter-provided bitfile artifact
  - Only the last one needs to be rebuilt often (for each user artifact); the rest are already saved on the Storage VM
- Step 5 retrieves existing docker images built in Step 4 at a prior time. This is useful if you are reflashing multiple FPGAs with the same code: you can execute Step 4 once and then skip it for all further VMs, simply retrieving the artifacts from the Storage VM in this step.
- Step 6 can be done after Step 4 or Step 5. It is the actual flashing and it involves identifying and rebooting the underlying server.
- Step 7 is an optional CMAC check that is run on a new slice after the server has been cold-rebooted.
- Step 8 is an optional 'extend the slice' step that can be executed if Steps 2 and 3 have been executed.
- Step 9 is 'delete the slice' that can be executed any time after Steps 2 and 3 have been executed.
 

## Step 0: Re-create a VM attached to fpga-tools volume on EDC

In order to have access to necessary tools execute the notebook to [re-create a Storage VM attached](../../fablib_api/fabric_fpgas/fpga_tools_storage.ipynb) to the `fpga-tools` persistent storage. You must execute it as a member of FABRIC Staff project. 

## Step 1: Identify and isolate the worker node

Unless the whole site is already in maintenance, using administrator tools identify the worker node with FPGA and put it in maintenance making sure it does not have experimenter VMs on it. You can check the [aggregate ads in JSON](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON) to make sure you are targeting the right worker.

## Step 2: Provision a VM on the desired worker with attached FPGA and FABNetv4 connection

Create a slice with a VM attached to the FPGA on the desired site and a FABNetv4 interface to reach the Storage VM in Step 0.

### Initialize fablib and variables

In [None]:
# Initialize FABlib

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

Define slice parameters - re-execute as needed to run any of the steps in this notebook.

In [None]:
# the user artifact should be deposited on the Storage VM under /mnt/fpga_tools/static/artifacts/<owner username>/<version>, since names of artifacts may be similar or the same.
artifact_owner_username = 'msada'
artifact_version = 'v2'

# edit the name of the user-provided artifact and labtools and DPDK docker images stored in Storage VM as needed
artifact = 'artifacts.au280.p4_only.0.zip'
dpdk_image = 'smartnic-dpdk-docker.tar.gz'
labtools_image = 'xilinx-labtools-docker-2023.2_1013_2256.tar.gz'

# the image built from artifact usually has the same name regardless of the artifact
artifact_image = 'esnet-smartnic-fw.tar.gz'
sn_stack_tar = 'sn-stack.tar.gz'

# Xilinx labtools package in Storage VM
labtools_package = 'Vivado/Vivado_Lab_Lin_2023.2_1013_2256.tar.gz'
sc_package = 'sc-fw-downloads/loadsc_v2.3.zip'
sc_fw_package = 'sc-fw-downloads/SC_U280_4_3_31.zip'

# setup site name
site='CLEM'
node_name='fpga-node'

# FABNetv4 of storage VM - consult the Storage VM slice for this FABNetv4 IP address
storage_vm_ip = "10.132.129.2"
# username and password used in storage VM
nginx_user = "fpga_tools"
nginx_password = "secret-password"

#
# should not need to edit below
FPGA_CHOICE='FPGA_Xilinx_U280'

# name the slice and the node 
slice_name=f'Persistent esnet-smartnic slice with {FPGA_CHOICE} on {site}'
print(f'Will create slice "{slice_name}" with node "{node_name}"')

# don't edit - convert from FPGA type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "FPGA_Xilinx_U280": "fpga_u280_available",
}

column_name = choice_to_column.get(FPGA_CHOICE, "Unknown")

#fablib.get_image_names()

### Create a new slice 

Create a slice with FPGA component on selected site and access to FABNetv4 network.

__NOTE:__ It is important to use a Docker-enabled image so that Docker can properly build docker images on IPv6-enabled sites.

In [None]:
# Create Slice. Note that by default submit() call will poll for 360 seconds every 10-20 seconds
# waiting for slice to come up. Normal expected time is around 2 minutes. 
slice = fablib.new_slice(name=slice_name)
image = 'docker_ubuntu_20'

# Add node with a 200G drive and 8 CPU cores using the Docker-enabled Ubuntu 20 image
node = slice.add_node(name=node_name, site=site, cores=8, disk=200, image=image)
node.add_component(model=FPGA_CHOICE, name='fpga1')
# be sure to add FABNetv4 so we can communicate with the slice that has the tools
node.add_fabnet()

# use the postboot script from docker examples
node.add_post_boot_upload_directory('../../fablib_api/docker_containers/node_tools','.')
node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node.add_post_boot_upload_directory('node_config','.')
node.add_post_boot_execute(f'chmod a+x node_config/ipv6-and-docker-plugins.sh && node_config/ipv6-and-docker-plugins.sh')

# Submit Slice Request
slice.submit();

Add storage VM into /etc/hosts for convenience. __Consult the storage slice for the FABNetv4 IPv4 address of that VM.__

In [None]:
slice = fablib.get_slice(slice_name)
node = slice.get_node(name=node_name)   

commands = list()
commands.append(f"echo {storage_vm_ip} fpga-tools-host | sudo tee -a /etc/hosts")
commands.append(f"echo 127.0.0.1 {node_name} | sudo tee -a /etc/hosts")

for command in commands:
    stdout, stderr = node.execute(command)
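
Because the cells in this step may be re-executed, `tee -a` appends a duplicate line to `/etc/hosts` on every run. A minimal sketch of one way to avoid that follows. The helper name `hosts_entry_command` is hypothetical (it is not part of fablib); it only builds a shell command string that guards the append with `grep`, and the resulting string would be passed to `node.execute()` as in the cell above.

```python
import shlex


def hosts_entry_command(ip: str, hostname: str) -> str:
    """Build a shell command that adds 'ip hostname' to /etc/hosts only once.

    Hypothetical helper for illustration; not part of fablib.
    """
    entry = shlex.quote(f"{ip} {hostname}")
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    # The append on the right of '||' runs only when the line is absent.
    return f"grep -qxF {entry} /etc/hosts || echo {entry} | sudo tee -a /etc/hosts"


print(hosts_entry_command("10.132.129.2", "fpga-tools-host"))
```

The guard matches the exact line only, so an older entry for the same host name with a different IP address is not replaced and would need to be removed by hand.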

## Step 3: Inspect the slice
Note that nat64 configuration is done at boot time.

In [None]:
slice = fablib.get_slice(slice_name)

node = slice.get_node(name=node_name)              

node_addr = node.get_interface(network_name=f'FABNET_IPv4_{node.get_site()}').get_ip_addr()

slice.show()
slice.list_nodes()
slice.list_networks()
print(f'Node FABNetV4 IP Address is {node_addr}')

## Step 4: Build dockers

In this step you can freshly build three docker images - Xilinx Linux Labtools, smartnic DPDK and smartnic-esnet-fw (this last one is based on the experimenter-provided artifact). Normally the labtools and dpdk should not be rebuilt even for each new artifact - they are static save for the possible changes in the way ESnet workflow works. 

The only artifact that needs to be regularly built is the one based on the user artifact. 

In each step docker images are produced and all are saved back to Storage VM for efficiency. 

### Build Xilinx Linux Labtools docker

It is built once and stored on the Storage VM. You should first skip to Step 5 and see if the image already exists. Rebuild it only if it does not.

In [None]:
# checkout the repo
tools_docker_repo = 'https://github.com/esnet/xilinx-labtools-docker.git'

stdout, stderr = node.execute(f'git clone {tools_docker_repo}')

Fetch the Linux tools from Xilinx and place into appropriate location

In [None]:
import os.path

tools_location = '~/xilinx-labtools-docker/vivado-installer'
sc_location = '~/xilinx-labtools-docker/sc-fw-downloads'

commands = [f'mkdir -p {tools_location} {sc_location}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{labtools_package}  > {tools_location}/{os.path.basename(labtools_package)}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{sc_package} > {sc_location}/{os.path.basename(sc_package)}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{sc_fw_package} > {sc_location}/{os.path.basename(sc_fw_package)}']

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
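
The downloads above redirect `curl` output into a file, so an HTTP error (for example a wrong password or a missing package) silently leaves an error page where the archive should be. The sketch below shows one possible way to build the same commands with curl's `--fail` and `-o` options. The helper name `download_command` is an assumption made for illustration; the URL layout is taken from the cell above.

```python
import os.path
import shlex


def download_command(user, password, package, dest_dir, host="fpga-tools-host"):
    """Build a curl command that fetches `package` from the Storage VM into dest_dir.

    Hypothetical helper for illustration; not part of fablib.
    """
    url = f"https://{host}/fpga-tools/{package}"
    # dest_dir is left unquoted on purpose so the remote shell still expands '~'
    target = f"{dest_dir}/{os.path.basename(package)}"
    credentials = shlex.quote(f"{user}:{password}")
    # --fail: exit non-zero on HTTP errors instead of saving the error page
    # -o: write to the target file rather than relying on shell redirection
    return f"curl -k --fail -u {credentials} -o {target} {shlex.quote(url)}"


print(download_command("fpga_tools", "secret-password",
                       "sc-fw-downloads/loadsc_v2.3.zip",
                       "~/xilinx-labtools-docker/sc-fw-downloads"))
```

Quoting the credentials with `shlex.quote` also keeps passwords containing spaces or shell metacharacters from breaking the command.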

Build the Xilinx tools docker.

__NOTE:__ this takes a long time and produces a large Docker image with all logs going to `labtools_docker_build.log` in the notebook directory. 

In [None]:
command = 'cd ~/xilinx-labtools-docker/ && docker build --pull -t xilinx-labtools-docker:${USER}-dev . && docker image ls'

print('For the log of the build check labtools_docker_build.log file in the notebook folder')

# this will produce a huge amount of output, so best not to log it to the notebook, but to a file
node_thread = node.execute_thread(command, output_file='labtools_docker_build.log')

stdout, stderr = node_thread.result()

Save the docker image into a tar file, and compress it.
__This takes a long time too__.

In [None]:
# parse package name to get version
import re

# raw string avoids invalid-escape warnings; literal dots are escaped
pattern = re.compile(r'(Xilinx_)?Vivado_Lab_Lin_(\d{4}\.\d_\d{4}_\d{4})\.tar\.gz')
m = pattern.match(os.path.basename(labtools_package))
if m:
    version = m[2]
else:
    version = 'Unknown'

# save the image
commands = list()
commands.append(f'docker save -o xilinx-labtools-docker-{version}.tar xilinx-labtools-docker')
commands.append(f'gzip -f9 xilinx-labtools-docker-{version}.tar')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

In [None]:
# use WebDAV on Storage VM

commands = list()
commands.append(f"curl -k -u {nginx_user}:{nginx_password} -T xilinx-labtools-docker-{version}.tar.gz https://fpga-tools-host/fpga-tools/smartnic-docker-images/")

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Building DPDK docker image

This builds a DPDK docker image which normally should only be done once and stored on the Storage VM. You should first skip to Step 5 and see if the image already exists. Rebuild it only if it does not.

In [None]:
# checkout the repo
dpdk_docker_repo = 'https://github.com/esnet/smartnic-dpdk-docker.git'

commands = list()
commands.append(f'git clone {dpdk_docker_repo}')
commands.append(f'cd smartnic-dpdk-docker && git submodule update --init --recursive')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Build the container logging into `dpdk_docker_build.log` file.

In [None]:
commands = list()

command = 'cd smartnic-dpdk-docker && docker build --pull -t smartnic-dpdk-docker:${USER}-dev .'

# this will produce a huge amount of output, so best not to log it to the notebook, but to a file
node_thread = node.execute_thread(command, output_file='dpdk_docker_build.log')

stdout, stderr = node_thread.result()

Save the image to file, compress. 

In [None]:
# save the image
commands = list()
commands.append(f'docker save -o smartnic-dpdk-docker.tar smartnic-dpdk-docker')
commands.append(f'gzip -f9 smartnic-dpdk-docker.tar')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Ship it to storage VM assuming standard location.

In [None]:
# Use WebDAV storage VM
commands = list()
commands.append(f"curl -k -u {nginx_user}:{nginx_password} -T smartnic-dpdk-docker.tar.gz https://fpga-tools-host/fpga-tools/smartnic-docker-images/")

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Building the firmware image using the hardware artifact provided by experimenter

Here we build the firmware image based on the user-provided hardware artifact, following the process [here](https://github.com/esnet/esnet-smartnic-fw/tree/main). This generally needs to be done once for each version of the artifact provided by the experimenter. Then the same image can be reused on multiple sites. This section assumes you have already put the experimenter hardware artifact into Storage VM on `/mnt/fpga_tools/static/artifacts/<artifact owner username>/<artifact version>/`. 

In [None]:
# checkout the repo and submodules
tools_docker_repo = 'https://github.com/esnet/esnet-smartnic-fw.git'

commands = list()
commands.append(f'git clone {tools_docker_repo}')
commands.append(f'cd esnet-smartnic-fw && git checkout c064d4ac775ed1a4c50ec72dea3615f9c644433e && git submodule init && git submodule update')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Next transfer the artifact from Storage VM into a folder in this repo. This file will be called `artifacts.<board>.<app_name>.0.zip` and should be placed in the `sn-hw/` directory in git source tree before starting the firmware build.

In [None]:
commands = list()

commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact}  > esnet-smartnic-fw/sn-hw/{artifact}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Provide the `.env` file to build, then build. 

In [None]:
# if the artifact file is called artifacts.au280.p4_only.0.zip then it translates into
# the following environment parameters
env_file = """
SN_HW_VER=0
SN_HW_BOARD=au280
SN_HW_APP_NAME=p4_only
"""

# transfer the .env to the node
command = f"echo '{env_file}' | sudo tee ~/esnet-smartnic-fw/.env"
stdout, stderr = node.execute(command)
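
The `.env` values above are typed by hand and must match the artifact file name. As a minimal sketch, they could instead be derived from the `artifact` variable. The helper `env_from_artifact` is hypothetical, and it assumes the name has exactly the form `artifacts.<board>.<app_name>.<version>.zip` with no dots inside the board or application name.

```python
def env_from_artifact(artifact_name: str) -> str:
    """Derive the build .env content from an artifact file name.

    Hypothetical helper; assumes 'artifacts.<board>.<app_name>.<version>.zip'.
    """
    parts = artifact_name.split(".")
    if len(parts) != 5 or parts[0] != "artifacts" or parts[-1] != "zip":
        raise ValueError(f"unexpected artifact name: {artifact_name}")
    _, board, app_name, version, _ = parts
    return (
        f"SN_HW_VER={version}\n"
        f"SN_HW_BOARD={board}\n"
        f"SN_HW_APP_NAME={app_name}\n"
    )


print(env_from_artifact("artifacts.au280.p4_only.0.zip"))
```

Raising on an unexpected name is a deliberate choice here: a wrong `.env` would otherwise only surface later as a failed firmware build.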


Now we are ready to build the docker with bitfiles (esnet-smartnic-fw). The log will go into `esnet-smartnic-fw-docker.log`.

In [None]:
command = "cd esnet-smartnic-fw && ./build.sh"

# this will produce a huge amount of output, so best not to log it to the notebook, but to a file
node_thread = node.execute_thread(command, output_file='esnet-smartnic-fw-docker.log')

stdout, stderr = node_thread.result()

Save the image, compress. 

In [None]:
# save the image
commands = list()
commands.append(f'docker save -o esnet-smartnic-fw.tar esnet-smartnic-fw')
commands.append(f'gzip -f9 esnet-smartnic-fw.tar')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Update a .env file for deployment under `sn-stack/` folder. Note in FABRIC all FPGAs are currently __always__ listed under BDF `0000:00:1f` in the VM.

In [None]:
commands = list()
# append FPGA location to .env that should've been created by the build step
commands.append(f'echo "FPGA_PCIE_DEV=0000:00:1f" >> esnet-smartnic-fw/sn-stack/.env')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Save the `sn-stack/` folder with docker compose files.

In [None]:
commands = list()

commands.append(f'tar -zcf {sn_stack_tar} esnet-smartnic-fw/sn-stack/')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Ship the image to Storage VM along with meta-data directory that contains the docker compose files. __These are stored under /mnt/fpga_tools/static/artifacts/username/version/ - make sure this directory is world or nginx writable__.

In [None]:
# use WebDAV to move the docker image file into Storage VM into same folder where the artifact came from

commands = list()
commands.append(f"curl -k -u {nginx_user}:{nginx_password} -T {artifact_image} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/")
commands.append(f"curl -k -u {nginx_user}:{nginx_password} -T {sn_stack_tar} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/")

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

## Step 5: Retrieve docker images from Storage VM and install locally

If you prebuilt the images and are just flashing a new FPGA, cells in this step will help you retrieve the previously built images from Storage VM and then install them into Docker on this VM.

Most of the time you don't want to rebuild the DPDK and the Labtools images. You may need to build the image from user artifact, but if it is needed on multiple sites, it should be reused. You can select steps below as needed.

### Retrieve and install DPDK Docker image

Retrieve previously built DPDK docker image and install it. Image names are set in the second cell of this notebook.

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/smartnic-docker-images/{dpdk_image}  > {dpdk_image}')
commands.append(f'docker load < {dpdk_image}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Retrieve and install Xilinx Labtools docker image

Retrieve previously built labtools docker image and install it. Image names are set in the second cell of this notebook.

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/smartnic-docker-images/{labtools_image}  > {labtools_image}')
commands.append(f'docker load < {labtools_image}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Retrieve and install firmware image built from experimenter artifact
Retrieve previously built firmware image and install it. 

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact_image}  > {artifact_image}')
commands.append(f'docker load < {artifact_image}')
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{sn_stack_tar}  > {sn_stack_tar}')
commands.append(f'tar -zxf {sn_stack_tar}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Check loaded docker images
Prior to flashing/loading bitfile into RAM there should be three docker images which you either built directly or retrieved from Storage VM - __xilinx-labtools-docker, smartnic-dpdk-docker, and esnet-smartnic-fw__. They are always named this way so the docker compose in the flashing step can find them.

In [None]:
command = 'docker image ls'

stdout, stderr = node.execute(command)
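
Rather than reading the listing by eye, the check can be scripted. The sketch below parses the text printed by `docker image ls` and reports which of the three required repositories are absent. The helper `missing_images` is hypothetical, and the sample listing is illustrative only (the tags, image IDs and sizes are made up).

```python
REQUIRED_IMAGES = ("xilinx-labtools-docker", "smartnic-dpdk-docker", "esnet-smartnic-fw")


def missing_images(image_ls_output: str) -> list:
    """Return the required repositories that do not appear in `docker image ls` output.

    Hypothetical helper for illustration; not part of fablib.
    """
    # The first whitespace-separated column of each row is the repository name.
    present = {line.split()[0] for line in image_ls_output.splitlines() if line.split()}
    return [name for name in REQUIRED_IMAGES if name not in present]


# Illustrative listing only; real tags, IDs and sizes will differ.
sample = """REPOSITORY               TAG          IMAGE ID       CREATED       SIZE
xilinx-labtools-docker   ubuntu-dev   0123456789ab   2 hours ago   20GB
smartnic-dpdk-docker     ubuntu-dev   123456789abc   2 hours ago   1GB
"""
print(missing_images(sample))
```

In the notebook, the `stdout` returned by `node.execute('docker image ls')` would be passed in place of `sample`; an empty list means all three images are loaded.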

## Step 6: Flash the card (into persistent flash or RAM)

Here we continue to follow the instructions [here](https://github.com/esnet/esnet-smartnic-fw/tree/main) and [here](https://github.com/esnet/esnet-smartnic-fw/blob/main/sn-stack/README.INSTALL.md#running-the-firmware).

__Note:__ Prior to proceeding please be sure to follow these steps to open the iDrac interface of the server you are working on as it will need to be __cold rebooted__ (in the case of a persistent reflash) or __warm rebooted__ (in the case of a RAM reflash):

1. Login to FABRIC VPN
2. Login to the head node of the cluster on which this VM is created
3. Login to the specific worker from the head node
    - Use Step 3 of this notebook to locate the worker using `Host` column of the sliver table
4. Run the following command to retrieve the asset tag: `cat /sys/class/dmi/id/chassis_serial` 
5. Login to the iDrac by using the [following formula](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/1490812935/iDRAC+Setup): `192.168.<`[site index](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/168624158/List+of+Hanks+FABRIC+Site+Information)`>.<100 + worker index>`.
    - indi-w2.fabric-testbed.net -> 192.168.36.102 
6. In the 'System Information' pane of iDrac compare the 'Service Tag' to the output of 4 above. This is to make sure we are rebooting the right server.
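
The address formula in item 5 can be written as a small helper. This is a sketch only: the function name `idrac_address` is hypothetical, and the site index must still be looked up in the linked site information page.

```python
import ipaddress


def idrac_address(site_index: int, worker_index: int) -> str:
    """Compute the iDRAC address as 192.168.<site index>.<100 + worker index>."""
    addr = f"192.168.{site_index}.{100 + worker_index}"
    # Raises ValueError if either octet falls outside 0-255.
    ipaddress.IPv4Address(addr)
    return addr


# Example from the list above: worker 2 at the site with index 36
print(idrac_address(36, 2))  # 192.168.36.102
```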

Now we flash the card - write to __persistent flash__, which is generally what you want (takes 20 minutes)

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose --profile smartnic-flash run --rm smartnic-flash-rescue && docker compose down -v --remove-orphans'

stdout, stderr = node.execute(command)

__Alternatively we flash the RAM__.

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose up'

stdout, stderr = node.execute(command)

__A PCI rescan can be carried out with the new `rescan_pci` API.__
If the rescan does not show the required number of PCI devices, use the reboot option instead.

In [None]:
# List PCI devices
stdout, stderr = node.execute("lspci")

In [None]:
# Rescan PCI devices
node.rescan_pci(component_name="fpga1")

In [None]:
# List PCI devices
stdout, stderr = node.execute("lspci")

After the process completes, shut off the VM and __cold-reboot__ the worker node if you wrote to persistent flash, or __warm-reboot__ it if you loaded the bitfile into RAM. 

In [None]:
command = 'sudo /sbin/halt'

stdout, stderr = node.execute(command)

1. Use the iDrac console to perform a reboot (cold or warm)
2. After the reboot completes delete the slice using Step 9
3. Login to the worker node as root and validate that Xilinx programming has been activated. Generally you should see something like:
```
ubuntu@fpga-node:~$ lspci | grep Xilinx
25:00.0 Network controller: Xilinx Corporation Device 903f
25:00.1 Network controller: Xilinx Corporation Device 913f
```
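
The validation in item 3 can also be scripted against the text printed by `lspci`. The following sketch uses a hypothetical helper, `xilinx_functions`, that returns the PCI addresses of all lines mentioning Xilinx; the sample text is the reference output shown above.

```python
def xilinx_functions(lspci_output: str) -> list:
    """Return the PCI addresses (first column) of lspci lines that mention Xilinx."""
    return [line.split()[0] for line in lspci_output.splitlines() if "Xilinx" in line]


sample = """25:00.0 Network controller: Xilinx Corporation Device 903f
25:00.1 Network controller: Xilinx Corporation Device 913f
"""
print(xilinx_functions(sample))  # ['25:00.0', '25:00.1']
```

Two entries, as in the reference output, indicate that both physical functions are visible; the bus address itself depends on the server.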


## Step 7: Check CMAC (Optional)

Follow the steps in this section to check and verify CMAC/PHY status. 
These steps can be used after the server is cold-rebooted. 
1. Flash the FPGA, delete the slice.
2. Cold-reboot the server.
3. Create a new slice and follow Steps 1,2,3,5 (exclude Step 6 for flashing).
4. Check CMAC status.

Configure the environment for the profile and start the containers

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && echo "COMPOSE_PROFILES=smartnic-mgr-vfio-unlock" >> .env && docker compose up -d'

stdout, stderr = node.execute(command)

List the containers to confirm they are running

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose ps -a'

stdout, stderr = node.execute(command)

Enable CMAC

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose exec smartnic-fw sn-cli cmac enable'

stdout, stderr = node.execute(command)

Check CMAC 

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose exec smartnic-fw sn-cli cmac status'

stdout, stderr = node.execute(command)

Reference output is below. If a "DOWN" status is reported, first re-flash the card; if the status is still "DOWN" afterwards, physically check the cables.
```
CMAC0
  Tx (MAC Enabled/RS-FEC Off/PHY UP -> UP)  
  Rx (MAC Enabled/RS-FEC Off/PHY UP -> UP)  

CMAC1
  Tx (MAC Enabled/RS-FEC Off/PHY UP -> UP)  
  Rx (MAC Enabled/RS-FEC Off/PHY UP -> UP)
```
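
To check the status programmatically, the output can be scanned for any direction whose final state is not `UP`. This is a sketch under assumptions: the helper `cmac_down_links` is hypothetical, and it assumes that a failing link is printed in the same layout as the reference output, with a different word after the arrow.

```python
import re


def cmac_down_links(status_text: str) -> list:
    """Return entries such as 'CMAC1 Rx' whose final state is not UP.

    Hypothetical helper; assumes the layout of the reference output above.
    """
    down = []
    port = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"CMAC\d+", stripped):
            port = stripped  # remember which port the following lines belong to
            continue
        match = re.match(r"(Tx|Rx) \(.*-> (\w+)\)", stripped)
        if match and port and match.group(2) != "UP":
            down.append(f"{port} {match.group(1)}")
    return down


reference = """CMAC0
  Tx (MAC Enabled/RS-FEC Off/PHY UP -> UP)
  Rx (MAC Enabled/RS-FEC Off/PHY UP -> UP)

CMAC1
  Tx (MAC Enabled/RS-FEC Off/PHY UP -> UP)
  Rx (MAC Enabled/RS-FEC Off/PHY UP -> UP)
"""
print(cmac_down_links(reference))  # []
```

An empty list corresponds to the healthy reference output; any entry in the list identifies the port and direction to investigate.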


Stop containers

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose down'

stdout, stderr = node.execute(command)

## Step 8: Extend Slice

Get slice details and extend the slice. This cell is optional and can be executed as-needed.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

Renew by 14 days

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")

## Step 9: Delete the slice

Delete the slice after completing the programming.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.delete()