# Functional Test 6.1.3 - Flash NEU/OCT code into an FPGA persistent flash or RAM and validate

This Jupyter notebook will allow you to flash experimenter FPGA code based on [ONS shell](https://github.com/Xilinx/open-nic-shell) into the FPGA persistent flash or RAM. If the persistent flash is used, the FPGA retains the new programming even after a cold reboot of the server. If the program is flashed into RAM, a warm reboot of the server will activate it.

This procedure can be used to reset the FPGA at a given site after experiments or initialize a newly installed device. It generally follows the procedures described by [ESnet](https://github.com/esnet/esnet-smartnic-fw/blob/main/sn-stack/README.INSTALL.md#running-the-firmware) for U280 devices.

It is assumed you are operating as part of the FABRIC Maintenance project and have access to the persistent volume named `fpga-tools` created on EDC, where the relevant tools are downloaded.

This notebook is broken up into steps and does not have to be executed in sequence.

- Step 2 has multiple cells that initialize fablib and variables and create a new slice
  - The cells initializing the state __must always be executed__.
  - Creating a slice is optional and can be skipped if the slice already exists
- Step 3 is a re-entrant cell and can be used any time you want to refresh information about the existing slice
- Step 4 fetches the OCT-FPGA and pcimem repositories, the Xilinx Vivado Lab package and the experimenter-provided artifact from the Storage VM, installs Vivado Lab and the JTAG drivers, reboots the VM and retrieves the FPGA JTAG ID
- Step 5 is the actual flashing; it involves identifying and rebooting the underlying server
- Step 6 is an optional 'extend the slice' step that can be executed if Steps 2 and 3 have been executed.
- Step 7 is 'delete the slice' and can be executed any time after Steps 2 and 3 have been executed.
 

## Step 0: Re-create a VM attached to fpga-tools volume on EDC

In order to have access to the necessary tools, execute the notebook to [re-create a Storage VM attached](../../fablib_api/fabric_fpgas/fpga_tools_storage.ipynb) to the `fpga-tools` persistent storage. You must execute it as a member of the FABRIC Staff project.

## Step 1: Identify and isolate the worker node

Unless the whole site is already in maintenance, use administrator tools to identify the worker node with the FPGA and put it in maintenance, making sure it does not have experimenter VMs on it. You can check the [aggregate ads in JSON](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON) to make sure you are targeting the right worker.

## Step 2: Provision a VM on the desired worker with attached FPGA and FABNetv4 connection

Create a slice with a VM attached to the FPGA on the desired site and a FABNetv4 interface to reach the Storage VM created in Step 0.

### Initialize fablib and variables

In [None]:
# Initialize FABlib

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

Define slice parameters - re-execute as needed to run any of the steps in this notebook.

In [None]:
# The user artifact should be deposited on the Storage VM into /mnt/fpga_tools/static/artifacts/<owner username>/<version>, since artifact names from different users may be similar or identical.
artifact_owner_username = 'han_zha'
artifact_version = 'v1'

# edit the names of the user-provided artifact and the Vivado Lab package stored on the Storage VM as needed
artifact = 'open_nic_shell.mcs'

# Xilinx labtools package in Storage VM
labtools_package = 'Vivado/Vivado_Lab_Lin_2023.2_1013_2256.tar.gz'

# set up the site and node names
site='CLEM'
node_name='fpga-node'

# FABNetv4 of storage VM - consult the Storage VM slice for this FABNetv4 IP address
storage_vm_ip = "10.132.129.2"
# username and password used in storage VM
nginx_user = "fpga_tools"
nginx_password = "secret-password"

#
# should not need to edit below
FPGA_CHOICE='FPGA_Xilinx_U280'

# name the slice and the node 
slice_name=f'Persistent esnet-smartnic slice with {FPGA_CHOICE} on {site}'
print(f'Will create slice "{slice_name}" with node "{node_name}"')

# don't edit - convert from FPGA type to a resource column name
# (for use in a site-filtering lambda function; not used elsewhere in this notebook)
choice_to_column = {
    "FPGA_Xilinx_U280": "fpga_u280_available",
}

column_name = choice_to_column.get(FPGA_CHOICE, "Unknown")

#fablib.get_image_names()
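
The artifact's deposit location on the Storage VM (from the comment above) and the URL the node later downloads it from are both derived from `artifact_owner_username`, `artifact_version` and `artifact`. The sketch below builds both strings. The helper names are hypothetical. The assumption that nginx serves `/mnt/fpga_tools/static` under the `/fpga-tools/` URL prefix is inferred from the paths in this notebook, not documented by it.

```python
from pathlib import PurePosixPath

# Assumption: nginx on the Storage VM serves /mnt/fpga_tools/static under the
# /fpga-tools/ URL prefix (inferred from the paths used in this notebook).
STORAGE_ROOT = PurePosixPath("/mnt/fpga_tools/static")
URL_PREFIX = "https://fpga-tools-host/fpga-tools"


def artifact_storage_path(owner: str, version: str, artifact: str) -> str:
    """Where the artifact should be deposited on the Storage VM."""
    return str(STORAGE_ROOT / "artifacts" / owner / version / artifact)


def artifact_url(owner: str, version: str, artifact: str) -> str:
    """URL the node downloads the same artifact from (via the /etc/hosts alias)."""
    return f"{URL_PREFIX}/artifacts/{owner}/{version}/{artifact}"


print(artifact_storage_path("han_zha", "v1", "open_nic_shell.mcs"))
print(artifact_url("han_zha", "v1", "open_nic_shell.mcs"))
```

For example, `artifact_url(artifact_owner_username, artifact_version, artifact)` yields the same URL that the artifact download cell in Step 4 builds inline.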

### Create a new slice 

Create a slice with FPGA component on selected site and access to FABNetv4 network.

__NOTE:__ It is important to use a Docker-enabled image so that Docker can properly build docker images on IPv6-enabled sites.

In [None]:
# Create Slice. Note that by default the submit() call will poll for 360 seconds every 10-20 seconds
# waiting for the slice to come up. Normal expected time is around 2 minutes.
slice = fablib.new_slice(name=slice_name)
image = 'docker_ubuntu_20'

# Add node with a 200G drive and 8 CPU cores using the Docker-enabled Ubuntu 20 image
node = slice.add_node(name=node_name, site=site, cores=8, disk=200, image=image)
node.add_component(model=FPGA_CHOICE, name='fpga1')
# be sure to add FABNetv4 so we can communicate with the slice that has the tools
node.add_fabnet()

# use the postboot script from docker examples
node.add_post_boot_upload_directory('../../fablib_api/docker_containers/node_tools','.')
node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node.add_post_boot_upload_directory('node_config','.')
node.add_post_boot_execute(f'chmod a+x node_config/ipv6-and-docker-plugins.sh && node_config/ipv6-and-docker-plugins.sh')

# Submit Slice Request
slice.submit();

Add the Storage VM to /etc/hosts for convenience. __Consult the storage slice for the FABNetv4 IPv4 address of that VM.__

In [None]:
slice = fablib.get_slice(slice_name)
node = slice.get_node(name=node_name)   

commands = list()
commands.append(f"echo {storage_vm_ip} fpga-tools-host | sudo tee -a /etc/hosts")
commands.append(f"echo 127.0.0.1 {node_name} | sudo tee -a /etc/hosts")

for command in commands:
    stdout, stderr = node.execute(command)
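
Because `tee -a` always appends, re-running the cell above adds duplicate lines to `/etc/hosts`. Below is a minimal sketch of a guarded alternative, assuming GNU `grep` on the node. The hypothetical helper `hosts_entry_command` appends the entry only when that exact line is not already present.

```python
import shlex


def hosts_entry_command(ip: str, hostname: str) -> str:
    """Build a shell command that appends 'ip hostname' to /etc/hosts only if
    that exact line is not already present, so re-running the cell is harmless."""
    line = shlex.quote(f"{ip} {hostname}")
    # grep -qxF: quiet, whole-line, fixed-string match; tee runs only if grep fails
    return f"grep -qxF {line} /etc/hosts || echo {line} | sudo tee -a /etc/hosts"


print(hosts_entry_command("10.132.129.2", "fpga-tools-host"))
```

It could replace the first `commands.append(...)` line, e.g. `commands.append(hosts_entry_command(storage_vm_ip, 'fpga-tools-host'))`. The exact-line match does not detect an entry for the same host written with tabs or with a different IP.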

## Step 3: Inspect the slice
Note that the NAT64 configuration is done at boot time.

In [None]:
slice = fablib.get_slice(slice_name)

node = slice.get_node(name=node_name)              

node_addr = node.get_interface(network_name=f'FABNET_IPv4_{node.get_site()}').get_ip_addr()

slice.show()
slice.list_nodes()
slice.list_networks()
print(f'Node FABNetV4 IP Address is {node_addr}')

## Step 4: Fetch Tools and Install Vivado Lab

Fetch the required tools (the OCT-FPGA and pcimem repositories and the Xilinx Vivado Lab package) and place them in the appropriate locations.

Clone OCT-FPGA repo

In [None]:
# checkout the repo
tools_neu_repo = 'https://github.com/OCT-FPGA/P4OpenNIC_Public.git'

commands = list()
commands.append(f'[ ! -d ~/fpga-tools ] && mkdir -p ~/fpga-tools')
commands.append(f'cd fpga-tools && git clone {tools_neu_repo}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Clone pcimem repo

In [None]:
# checkout the repo
pcimem_repo = 'https://github.com/billfarrow/pcimem.git'

commands = list()
commands.append(f'[ ! -d ~/fpga-tools ] && mkdir -p ~/fpga-tools')
commands.append(f'cd fpga-tools && git clone {pcimem_repo}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')
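
Both clone cells above fail when re-run, because `git clone` refuses to clone into an existing non-empty directory. Below is a sketch of a re-entrant variant. `repo_dir_name` and `clone_or_pull_command` are hypothetical helpers. Updating with `pull --ff-only` on re-runs is a design choice, not something the OCT-FPGA instructions prescribe.

```python
import posixpath


def repo_dir_name(repo_url: str) -> str:
    """Directory name git clone creates by default, e.g. 'pcimem' for '.../pcimem.git'."""
    name = posixpath.basename(repo_url.rstrip("/"))
    return name[:-4] if name.endswith(".git") else name


def clone_or_pull_command(repo_url: str, parent_dir: str = "~/fpga-tools") -> str:
    """Clone into parent_dir on the first run; fast-forward the checkout on re-runs.

    Paths are left unquoted so the remote shell expands the leading '~'.
    """
    target = f"{parent_dir}/{repo_dir_name(repo_url)}"
    return (f"mkdir -p {parent_dir} && "
            f"if [ -d {target}/.git ]; then git -C {target} pull --ff-only; "
            f"else git clone {repo_url} {target}; fi")


print(clone_or_pull_command("https://github.com/OCT-FPGA/P4OpenNIC_Public.git"))
```

For example, `node.execute(clone_or_pull_command(tools_neu_repo))` produces the same `~/fpga-tools/P4OpenNIC_Public` checkout as the first clone cell.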

Next, transfer the Xilinx Vivado Lab package from the Storage VM.

In [None]:
import os.path

tools_location = '~/xilinx-labtools/vivado-installer'

commands = [f'[ ! -d {tools_location} ] && mkdir -p {tools_location}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{labtools_package}  > {tools_location}/{os.path.basename(labtools_package)}']

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

Next, transfer the artifact from the Storage VM into the `~/fpga-bitfile` directory on the node. The artifact is expected to be an `.mcs` file whose name is set by the `artifact` variable in Step 2 (for example `open_nic_shell.mcs`).

In [None]:
bitfile_location = '~/fpga-bitfile'

commands = [f'[ ! -d {bitfile_location} ] && mkdir -p {bitfile_location}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact}  > {bitfile_location}/{artifact}']


for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')
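
Both download cells redirect `curl` output straight into the destination file. An HTTP error (for example a 401 from a wrong `nginx_password` or a 404 from a mistyped artifact path) therefore silently leaves an error page where the package or bitfile should be. Below is a minimal sketch of a stricter command using the standard `curl` flags `--fail`, `--silent`, `--show-error` and `-o`. The helper name `curl_fetch_command` is hypothetical.

```python
import shlex


def curl_fetch_command(user: str, password: str, url: str, dest: str) -> str:
    """Download url to dest over HTTPS with basic auth.

    --fail makes curl exit non-zero on HTTP errors (e.g. 401/404) instead of
    saving the server's error page as if it were the package or bitfile;
    --silent --show-error hides the progress meter but still reports errors.
    -k skips certificate verification, as the cells above do.
    """
    auth = shlex.quote(f"{user}:{password}")
    # dest is left unquoted on purpose so the remote shell can still expand '~'
    return f"curl --fail --silent --show-error -k -u {auth} -o {dest} {shlex.quote(url)}"


print(curl_fetch_command(
    "fpga_tools", "secret-password",
    "https://fpga-tools-host/fpga-tools/artifacts/han_zha/v1/open_nic_shell.mcs",
    "~/fpga-bitfile/open_nic_shell.mcs"))
```

The credentials and URL are shell-quoted. The destination is not, so a `~`-prefixed path such as `{bitfile_location}/{artifact}` still expands on the node.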

### Install
#### Install Vivado Lab Tools

Extract and install the Xilinx Vivado Lab package

In [None]:
# parse package name to get version
import re

pattern = re.compile(r'(Xilinx_)?Vivado_Lab_Lin_(\d{4}\.\d_\d{4}_\d{4})\.tar\.gz')
m = pattern.match(os.path.basename(labtools_package))
if m:
    version = m[2]
else:
    version = 'Unknown'


vivado_install_dir = "/tools/Xilinx"
 
commands = list()
commands.append(f'tar xf {tools_location}/{os.path.basename(labtools_package)} -C {tools_location}')
commands.append(f'sudo {tools_location}/Vivado_Lab_Lin_{version}/xsetup --agree 3rdPartyEULA,XilinxEULA --batch Install --edition "Vivado Lab Edition (Standalone)" --location {vivado_install_dir}')


for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)

print('Done')

Install necessary libs

In [None]:
commands = list()
commands.append(f'sudo apt install -y lsb &> /tmp/fpga-apt-install.out')
commands.append(f'sudo {tools_location}/Vivado_Lab_Lin_{version}/installLibs.sh &> /tmp/fpga-installLibs.out')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)

print('Done')

Install JTAG drivers

In [None]:
import re

pattern = re.compile(r'(\d{4}\.\d)_(\d{4}_\d{4})')
m = pattern.match(version)
if m:
    version_major = m[1]
else:
    version_major = 'Unknown'


commands = list()
commands.append(f'sudo {vivado_install_dir}/Vivado_Lab/{version_major}/data/xicom/cable_drivers/lin64/install_script/install_drivers/install_drivers')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')
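
The install and JTAG-driver cells derive `version` and `version_major` with two separate regular expressions. Both fall back to the string `'Unknown'`, which then produces paths that do not exist. Below is a sketch that parses the package name once and raises an error on an unexpected name. The function name and the returned dictionary keys are hypothetical. The paths mirror those built in the cells above.

```python
import posixpath
import re

# Package name shape assumed by the install cells:
# [Xilinx_]Vivado_Lab_Lin_<YYYY.N>_<NNNN>_<NNNN>.tar.gz
PACKAGE_RE = re.compile(r"(?:Xilinx_)?Vivado_Lab_Lin_((\d{4}\.\d)_\d{4}_\d{4})\.tar\.gz")


def vivado_lab_paths(labtools_package: str,
                     tools_location: str = "~/xilinx-labtools/vivado-installer",
                     install_dir: str = "/tools/Xilinx") -> dict:
    m = PACKAGE_RE.fullmatch(posixpath.basename(labtools_package))
    if m is None:
        # fail loudly instead of building paths that contain 'Unknown'
        raise ValueError(f"unrecognised Vivado Lab package name: {labtools_package}")
    version, version_major = m[1], m[2]
    return {
        "version": version,
        "version_major": version_major,
        "xsetup": f"{tools_location}/Vivado_Lab_Lin_{version}/xsetup",
        "install_drivers": (f"{install_dir}/Vivado_Lab/{version_major}/data/xicom/"
                            "cable_drivers/lin64/install_script/install_drivers/install_drivers"),
        "settings": f"{install_dir}/Vivado_Lab/{version_major}/settings64.sh",
    }


paths = vivado_lab_paths("Vivado/Vivado_Lab_Lin_2023.2_1013_2256.tar.gz")
print(paths["version"], paths["version_major"])
```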

#### Reboot

In [None]:
reboot = 'sudo reboot'
try:
    print(reboot)
    node.execute(reboot)
    
    slice.wait_ssh(timeout=360,interval=10,progress=True)

    print("Now testing SSH connectivity to reconnect...",end="")
    slice.update()
    slice.test_ssh()
    print("Reconnected!")

except Exception as e:
    print(f"Fail: {e}")  

# FIXME: the interface name enp6s0 is hardcoded; look it up via the fablib API instead
node.execute("sudo ip link set dev enp6s0 up")
node.config()
print("Done")    
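
Instead of hardcoding `enp6s0`, the dataplane interface could be identified by its MAC address. The sketch below parses the brief output of `ip -br link` (columns: name, state, MAC address, flags) and returns the name of the interface with a given MAC. The helper name is hypothetical. Obtaining the MAC of the FABNetv4 interface from fablib (e.g. from the slice's interface listing) is an assumption.

```python
from typing import Optional


def interface_for_mac(ip_br_link_output: str, mac: str) -> Optional[str]:
    """Return the name of the interface whose MAC matches, or None.

    Expects the output of `ip -br link`: name, state, MAC, flags per line.
    """
    wanted = mac.lower()
    for line in ip_br_link_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].lower() == wanted:
            # veth-style names look like 'eth0@if5'; keep the part before '@'
            return fields[0].split("@", 1)[0]
    return None


# Hypothetical sample output, for illustration only
sample = (
    "lo        UNKNOWN  00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>\n"
    "enp3s0    UP       fa:16:3e:aa:bb:cc <BROADCAST,MULTICAST,UP,LOWER_UP>\n"
    "enp6s0    DOWN     02:11:22:33:44:55 <BROADCAST,MULTICAST>\n"
)
print(interface_for_mac(sample, "02:11:22:33:44:55"))
```

To use it, run `ip -br link` with `node.execute`, pass its stdout and the MAC in, and use the returned name in `sudo ip link set dev <name> up`.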

#### Get the JTAG ID


In [None]:
commands = list()
commands.append(f'source {vivado_install_dir}/Vivado_Lab/{version_major}/settings64.sh && vivado_lab -mode batch -source ~/fpga-tools/P4OpenNIC_Public/FABRIC-P4/scripts/get_jtag.tcl | grep -o "Xilinx/[^[:space:]]*" | cut -d/ -f2 > /tmp/fpga-jtag.id')
# the PCI address (BDF) of the FPGA inside the VM is hardcoded; confirm it with `lspci -D | grep Xilinx`
commands.append(f'echo "0000:1f:00.0" > /tmp/fpga-bdf.id')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)

print('Done')    
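
The JTAG cell extracts the target ID with `grep -o "Xilinx/[^[:space:]]*" | cut -d/ -f2`. If more than one Xilinx cable is visible, `/tmp/fpga-jtag.id` ends up with several lines. The rough Python equivalent below makes that case explicit by returning every match. The function name and the sample target string are hypothetical and only meant to resemble a hardware-server target path.

```python
import re

# Roughly: grep -o "Xilinx/[^[:space:]]*" | cut -d/ -f2
# i.e. the text between "Xilinx/" and the next '/' or whitespace.
_JTAG_RE = re.compile(r"Xilinx/([^\s/]*)")


def jtag_ids(vivado_output: str) -> list:
    """Return every non-empty JTAG target ID found in the vivado_lab output."""
    return [m for m in _JTAG_RE.findall(vivado_output) if m]


# Hypothetical sample line; real output depends on get_jtag.tcl
sample = "INFO: target localhost:3121/xilinx_tcf/Xilinx/21770297400LA"
ids = jtag_ids(sample)
if len(ids) != 1:
    print(f"expected exactly one JTAG target, found {len(ids)}: {ids}")
else:
    print(f"JTAG ID: {ids[0]}")
```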

## Step 5: Flash the card 

Here we continue to follow the [P4OpenNIC_Public FABRIC-P4 instructions](https://github.com/OCT-FPGA/P4OpenNIC_Public/tree/main/FABRIC-P4).

__Note:__ Before proceeding, follow these steps to open the iDRAC interface of the server you are working on, as it will need to be __cold rebooted__ (in the case of a persistent reflash) or __warm rebooted__ (in the case of a RAM reflash):

1. Log in to the FABRIC VPN
2. Log in to the head node of the cluster on which this VM is created
3. Log in to the specific worker from the head node
    - Use Step 3 of this notebook to locate the worker using the `Host` column of the sliver table
4. Run the following command to retrieve the service tag: `cat /sys/class/dmi/id/chassis_serial`
5. Log in to the iDRAC by using the [following formula](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/1490812935/iDRAC+Setup): `192.168.<`[site index](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/168624158/List+of+Hanks+FABRIC+Site+Information)`>.<100 + worker index>`.
    - For example, indi-w2.fabric-testbed.net -> 192.168.36.102
6. In the 'System Information' pane of the iDRAC, compare the 'Service Tag' to the output of step 4 above. This makes sure we are rebooting the right server.
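
The iDRAC address formula in step 5 can be checked with a few lines of Python. This sketch assumes worker hostnames follow the `<site>-w<N>.fabric-testbed.net` pattern of the example. The site index still has to be looked up on the site-information wiki page. The function name is hypothetical.

```python
import re


def idrac_ip(site_index: int, worker_hostname: str) -> str:
    """Apply 192.168.<site index>.<100 + worker index> to a worker hostname."""
    # Assumed hostname shape: <site>-w<N>.<domain>, e.g. indi-w2.fabric-testbed.net
    m = re.match(r"[^.]+-w(\d+)\.", worker_hostname)
    if m is None:
        raise ValueError(f"cannot find a worker index in {worker_hostname!r}")
    return f"192.168.{site_index}.{100 + int(m[1])}"


print(idrac_ip(36, "indi-w2.fabric-testbed.net"))
```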

Now flash the card by writing to __persistent flash__, which is generally what you want (this takes about 20 minutes).

After the process completes, shut off the VM, then __cold-reboot__ the worker node if you flashed persistent flash or __warm-reboot__ it if you flashed into RAM.

In [None]:
command = 'sudo /sbin/halt'

stdout, stderr = node.execute(command)

1. Use the iDRAC console to perform the reboot (cold or warm, as above)
2. After the reboot completes, delete the slice using Step 7
3. Log in to the worker node as root and validate that the Xilinx programming has been activated. Generally you should see something like:
```
ubuntu@fpga-node:~$ lspci | grep Xilinx
25:00.0 Network controller: Xilinx Corporation Device 903f
25:00.1 Network controller: Xilinx Corporation Device 913f
```
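
The check in step 3 can also be scripted against captured `lspci` output. In the minimal sketch below, the device IDs `903f` and `913f` are taken from the example output above and may differ for other designs. The helper name is hypothetical. Because the parser also accepts the domain-prefixed addresses printed by `lspci -D`, it can also confirm the BDF that is hardcoded in the JTAG cell of Step 4, when run inside the VM.

```python
import re

# Matches lines such as "25:00.0 Network controller: Xilinx Corporation Device 903f",
# optionally with a leading PCI domain as printed by `lspci -D`.
_LSPCI_RE = re.compile(
    r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s.*Xilinx Corporation Device ([0-9a-f]{4})\b",
    re.IGNORECASE,
)

EXPECTED_IDS = {"903f", "913f"}  # from the example output above


def xilinx_functions(lspci_output: str) -> dict:
    """Map PCI address -> Xilinx device ID for every Xilinx line in lspci output."""
    found = {}
    for line in lspci_output.splitlines():
        m = _LSPCI_RE.match(line.strip())
        if m:
            found[m[1]] = m[2].lower()
    return found


sample = """25:00.0 Network controller: Xilinx Corporation Device 903f
25:00.1 Network controller: Xilinx Corporation Device 913f"""
devices = xilinx_functions(sample)
print(devices)
print("validated" if EXPECTED_IDS <= set(devices.values()) else "missing device IDs")
```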


## Step 6: Extend Slice

Get slice details and extend the slice. This cell is optional and can be executed as-needed.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

Renew for 14 days

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set the end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")

## Step 7: Delete the slice

Delete the slice after completing the programming.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.delete()