# Functional Test 6.1.1 - Return the FPGA back to XRT golden image using ESnet tools

This Jupyter notebook flashes the FPGA from a previously flashed ONS-based state back to the XRT-based state. It relies on the ESnet workflow Docker images to do so. It should not matter which experimenter artifact the images are based on, so long as it is valid. After the flashing completes, a cold boot of the host server is required to return the FPGA to the XRT state.

This procedure can be used to reset the FPGA at a given site after experiments, or to initialize a newly installed device if the next user wants to use XRT rather than the ESnet workflow. It generally follows the procedure described by [ESnet](https://github.com/esnet/esnet-smartnic-fw/blob/main/sn-stack/README.INSTALL.md#optional-remove-the-esnet-smartnic-flash-image-from-the-fpga-card-to-revert-to-factory-image) for U280 devices. Note that ESnet refers to this as reverting to the 'Factory Image', which for the U280 is XRT.

It is assumed you are operating as part of the FABRIC Maintenance project and have access to the persistent volume named `fpga-tools` created on EDC, where the relevant tools are downloaded.

This notebook is broken up into steps and does not have to be executed in sequence. Note that unlike the [6.1.2 notebook](<../Acceptance Tests 6.1.2/fpga_flash_esnet.ipynb>) this one does not offer the ability to build ESnet docker images - if you need to do that, execute the appropriate steps in that other notebook first. 

- Step 2 has multiple cells that initialize fablib and variables and create a new slice.
  - The cells initializing the state __must always be executed__.
  - Creating a slice is optional and can be skipped if the slice already exists.
- Step 3 is a re-entrant cell and can be used any time you want to refresh information about the existing slice.
- Step 4 retrieves Docker images that were built at a prior time and stored on the Storage VM. This is useful if you are reflashing multiple FPGAs with the same code: build the images once, then simply retrieve them from the Storage VM in this step for every further VM.
- Step 5 is the actual flashing; it involves identifying and rebooting the underlying server.
- Step 7 is an optional 'extend the slice' step that can be executed if Steps 2 and 3 have been executed.
- Step 8 is 'delete the slice' and can be executed any time after Steps 2 and 3 have been executed.
 

## Step 0: Re-create a VM attached to fpga-tools volume on EDC

To have access to the necessary tools, execute the notebook that [re-creates a Storage VM attached](../../fablib_api/fabric_fpgas/fpga_tools_storage.ipynb) to the `fpga-tools` persistent storage. You must execute it as a member of FABRIC Staff project. 

## Step 1: Identify and isolate the worker node

Unless the whole site is already in maintenance, use administrator tools to identify the worker node with the FPGA and put it in maintenance, making sure it does not have experimenter VMs on it. You can check the [aggregate ads in JSON](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON) to make sure you are targeting the right worker.

## Step 2: Provision a VM on the desired worker with attached FPGA and FABNetv4 connection

Create a slice with a VM attached to the FPGA on the desired site and a FABNetv4 interface to reach the Storage VM in Step 0.

### Initialize fablib and variables

In [None]:
# Initialize FABlib

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

Define slice parameters - re-execute as needed to run any of the steps in this notebook.

In [None]:
# the user artifact should be deposited on the Storage VM in /mnt/fpga_tools/static/artifacts/<owner username>/<version>, since names of artifacts may be similar or the same.
artifact_owner_username = 'msada'
artifact_version = 'v2'

# edit the name of the user-provided artifact and labtools and DPDK docker images stored in Storage VM as needed
artifact = 'artifacts.au280.p4_only.0.zip'
dpdk_image = 'smartnic-dpdk-docker.tar.gz'
labtools_image = 'xilinx-labtools-docker-2023.1_0507_1903.tar.gz'

# the image built from artifact usually has the same name regardless of the artifact
artifact_image = 'esnet-smartnic-fw.tar.gz'
sn_stack_tar = 'sn-stack.tar.gz'

# Xilinx labtools package in Storage VM
labtools_package = 'Xilinx_Vivado_Lab_Lin_2023.1_0507_1903.tar.gz'

# setup site name
site='SRI'
node_name='fpga-node'

# FABNetv4 of storage VM - consult the Storage VM slice for this FABNetv4 IP address
storage_vm_ip = "10.132.142.2"
# username and password used in storage VM
nginx_user = "fpga_tools"
nginx_password = "secret-password"

#
# should not need to edit below
FPGA_CHOICE='FPGA_Xilinx_U280'

# name the slice and the node 
slice_name=f'Persistent esnet-smartnic slice with {FPGA_CHOICE} on {site}'
print(f'Will create slice "{slice_name}" with node "{node_name}"')

# don't edit - convert from FPGA type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "FPGA_Xilinx_U280": "fpga_u280_available",
}

column_name = choice_to_column.get(FPGA_CHOICE, "Unknown")

#fablib.get_image_names()
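
A typo in these parameters only surfaces much later, for example when the Storage VM cannot be reached in Step 4. The sketch below shows one way to catch obvious mistakes early. `check_parameters` is a hypothetical helper and not part of FABlib; it only inspects the values defined in the cell above.

```python
import ipaddress

# Hypothetical helper (not part of FABlib): sanity-check the slice
# parameters before any slice is created.
def check_parameters(storage_vm_ip, site, fpga_choice, choice_to_column):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    try:
        # FABNetv4 addresses are IPv4, so anything else is a typo
        ipaddress.IPv4Address(storage_vm_ip)
    except ipaddress.AddressValueError:
        problems.append(f"storage_vm_ip {storage_vm_ip!r} is not a valid IPv4 address")
    if not site:
        problems.append("site must not be empty")
    if fpga_choice not in choice_to_column:
        problems.append(f"unknown FPGA model {fpga_choice!r}")
    return problems

# Values copied from the parameter cell above
print(check_parameters("10.132.142.2", "SRI", "FPGA_Xilinx_U280",
                       {"FPGA_Xilinx_U280": "fpga_u280_available"}))
```

In the notebook you would pass the variables `storage_vm_ip`, `site`, `FPGA_CHOICE` and `choice_to_column` directly and stop if the returned list is not empty.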

### Create a new slice 

Create a slice with FPGA component on selected site and access to FABNetv4 network.

__NOTE:__ It is important to use a Docker-enabled image so that Docker can properly build docker images on IPv6-enabled sites.

In [None]:
# Create Slice. Note that by default submit() call will poll for 360 seconds every 10-20 seconds
# waiting for slice to come up. Normal expected time is around 2 minutes. 
slice = fablib.new_slice(name=slice_name)
image = 'docker_ubuntu_20'

# Add a node with a 200G drive and 8 CPU cores using the Ubuntu 20 Docker-enabled image
node = slice.add_node(name=node_name, site=site, cores=8, disk=200, image=image)
node.add_component(model=FPGA_CHOICE, name='fpga1')
# be sure to add FABNetv4 so we can communicate with the slice that has the tools
node.add_fabnet()

# use the postboot script from docker examples
node.add_post_boot_upload_directory('../../fablib_api/docker_containers/node_tools','.')
node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node.add_post_boot_upload_directory('node_config','.')
node.add_post_boot_execute(f'chmod a+x node_config/ipv6-and-docker-plugins.sh && node_config/ipv6-and-docker-plugins.sh')

# Submit Slice Request
slice.submit();

Add storage VM into /etc/hosts for convenience. __Consult the storage slice for the FABNetv4 IPv4 address of that VM.__

In [None]:
slice = fablib.get_slice(slice_name)
node = slice.get_node(name=node_name)   

commands = list()
commands.append(f"echo {storage_vm_ip} fpga-tools-host | sudo tee -a /etc/hosts")
commands.append(f"echo 127.0.0.1 {node_name} | sudo tee -a /etc/hosts")

for command in commands:
    stdout, stderr = node.execute(command)
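
Because `tee -a` always appends, re-running the cell above adds duplicate lines to `/etc/hosts`. A minimal sketch of an idempotent variant follows; `hosts_entry_command` is a hypothetical helper that only builds the shell command string, which would still be run with `node.execute()` as above.

```python
import shlex

# Hypothetical helper: build a shell command that appends an /etc/hosts
# entry only if that exact line is not already present.
def hosts_entry_command(ip, hostname, hosts_file="/etc/hosts"):
    entry = shlex.quote(f"{ip} {hostname}")
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    return (f"grep -qxF {entry} {hosts_file} || "
            f"echo {entry} | sudo tee -a {hosts_file}")

print(hosts_entry_command("10.132.142.2", "fpga-tools-host"))
```

The shell parses `A || B | C` as `A || (B | C)`, so the append runs only when `grep` does not find the line.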

## Step 3: Inspect the slice
Note that nat64 configuration is done at boot time.

In [None]:
slice = fablib.get_slice(slice_name)

node = slice.get_node(name=node_name)              

node_addr = node.get_interface(network_name=f'FABNET_IPv4_{node.get_site()}').get_ip_addr()

slice.show()
slice.list_nodes()
slice.list_networks()
print(f'Node FABNetV4 IP Address is {node_addr}')

## Step 4: Retrieve docker images from Storage VM and install locally

If you prebuilt the images and are just flashing a new FPGA, cells in this step will help you retrieve the previously built images from Storage VM and then install them into Docker on this VM.

Most of the time you don't want to rebuild the DPDK and the Labtools images. You may need to build the image from user artifact, but if it is needed on multiple sites, it should be reused. You can select steps below as needed.

### Retrieve and install DPDK Docker image

Retrieve previously built DPDK docker image and install it. Image names are set in the second cell of this notebook.

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/smartnic-docker-images/{dpdk_image}  > {dpdk_image}')
commands.append(f'docker load < {dpdk_image}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Retrieve and install Xilinx Labtools docker image

Retrieve previously built labtools docker image and install it. Image names are set in the second cell of this notebook.

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/smartnic-docker-images/{labtools_image}  > {labtools_image}')
commands.append(f'docker load < {labtools_image}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')

### Retrieve and install firmware image built from experimenter artifact
Retrieve previously built firmware image and install it. 

In [None]:
# Retrieve the image and install it

commands = list()
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact_image}  > {artifact_image}')
commands.append(f'docker load < {artifact_image}')
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{sn_stack_tar}  > {sn_stack_tar}')
commands.append(f'tar -zxf {sn_stack_tar}')

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')
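
The three retrieval cells above repeat the same `curl` pattern. By default `curl` exits successfully on an HTTP error and writes the server's error page to the output, so a wrong password or path would leave an error page in the file and only fail later in `docker load`. The sketch below shows one way to build the download commands with `--fail` and shell quoting. `fetch_command` is a hypothetical helper; the base URL is the one used in the cells above.

```python
import shlex

# Base URL of the nginx server on the Storage VM, as used in the cells above
BASE_URL = "https://fpga-tools-host/fpga-tools"

# Hypothetical helper: build a curl command that fails on HTTP errors
# instead of saving the server's error page as the image file.
def fetch_command(user, password, url_path, filename):
    url = f"{BASE_URL}/{url_path}/{filename}"
    credentials = shlex.quote(f"{user}:{password}")
    # --fail: non-zero exit on HTTP errors, -k: accept a self-signed certificate
    return (f"curl --fail -k -u {credentials} "
            f"-o {shlex.quote(filename)} {shlex.quote(url)}")

print(fetch_command("fpga_tools", "secret-password",
                    "smartnic-docker-images", "smartnic-dpdk-docker.tar.gz"))
```

For the firmware image the path argument would be `f'artifacts/{artifact_owner_username}/{artifact_version}'`, matching the URLs in the cell above.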

### Check loaded docker images
Prior to flashing/loading bitfile into RAM there should be three docker images which you either built directly or retrieved from Storage VM - __xilinx-labtools-docker, smartnic-dpdk-docker, and esnet-smartnic-fw__. They are always named this way so the docker compose in the flashing step can find them.

In [None]:
command = 'docker image ls'

stdout, stderr = node.execute(command)
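
Rather than reading the listing by eye, the check for the three required images can be scripted. The following is a minimal sketch that assumes the repository name is the first column of the `docker image ls` output; `missing_images` is a hypothetical helper and the sample listing is made up for illustration.

```python
# Image names required by the flashing step, as listed above
REQUIRED_IMAGES = ("xilinx-labtools-docker", "smartnic-dpdk-docker", "esnet-smartnic-fw")

# Hypothetical helper: report which required images are absent from
# the text printed by `docker image ls`.
def missing_images(docker_image_ls_output, required=REQUIRED_IMAGES):
    present = set()
    for line in docker_image_ls_output.splitlines():
        fields = line.split()
        if fields:
            # First column is the repository name; the header row
            # contributes "REPOSITORY", which is harmless.
            present.add(fields[0])
    return [name for name in required if name not in present]

# Illustrative listing only, not real output
sample = """REPOSITORY               TAG      IMAGE ID       CREATED      SIZE
xilinx-labtools-docker   latest   0123456789ab   2 days ago   10GB
smartnic-dpdk-docker     latest   123456789abc   2 days ago   1GB
"""
print(missing_images(sample))  # ['esnet-smartnic-fw']
```

In the notebook, pass the `stdout` returned by `node.execute('docker image ls')` and do not proceed to Step 5 unless the returned list is empty.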

## Step 5: Flash the card (revert to factory XRT image)

Here we continue to follow the [ESnet instructions](https://github.com/esnet/esnet-smartnic-fw/blob/main/sn-stack/README.INSTALL.md#optional-remove-the-esnet-smartnic-flash-image-from-the-fpga-card-to-revert-to-factory-image).

__Note:__ Prior to proceeding please be sure to follow these steps to open the iDrac interface of the server you are working on as it will need to be __cold rebooted__:

1. Login to FABRIC VPN
2. Login to the head node of the cluster on which this VM is created
3. Login to the specific worker from the head node
    - Use Step 3 of this notebook to locate the worker using `Host` column of the sliver table
4. Run the following command to retrieve the service tag (chassis serial number): `cat /sys/class/dmi/id/chassis_serial` 
5. Login to the iDrac by using the [following formula](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/1490812935/iDRAC+Setup): `192.168.<`[site index](https://fabric-testbed.atlassian.net/wiki/spaces/FP/pages/168624158/List+of+Hanks+FABRIC+Site+Information)`>.<100 + worker index>`.
    - indi-w2.fabric-testbed.net -> 192.168.36.102 
6. In the 'System Information' pane of iDrac, compare the 'Service Tag' to the output of step 4 above. This makes sure we are rebooting the right server.
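
The address formula in step 5 can be sketched as a small function. Both helpers below are hypothetical, and the hostname pattern `<site>-w<N>` is an assumption taken from the single example above; the site index must still be looked up in the linked site information page.

```python
import re

# Hypothetical helper: extract N from a worker hostname such as
# 'indi-w2.fabric-testbed.net' (naming pattern assumed from the example).
def worker_index(hostname):
    match = re.match(r"^[a-z0-9]+-w(\d+)(\.|$)", hostname.lower())
    if match is None:
        raise ValueError(f"cannot find a worker index in {hostname!r}")
    return int(match.group(1))

# Hypothetical helper: apply the formula 192.168.<site index>.<100 + worker index>
def idrac_address(site_index, worker_idx):
    last_octet = 100 + worker_idx
    if not 0 <= site_index <= 255 or not 100 < last_octet <= 254:
        raise ValueError("index out of range for an IPv4 octet")
    return f"192.168.{site_index}.{last_octet}"

# Site index 36 comes from the INDI example above
print(idrac_address(36, worker_index("indi-w2.fabric-testbed.net")))  # 192.168.36.102
```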

Verify the card is flashed with ONS - you should see two devices identified as 'Xilinx Corporation Device' 903f and 913f.

In [None]:
command = 'lspci | grep Xilinx'
stdout, stderr = node.execute(command)
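
The check for the two ONS devices can also be done programmatically, so that the flashing cell is not run by accident on a card that is already in the XRT state. This is a minimal sketch: `ons_is_present` is a hypothetical helper, and the sample lines only illustrate the expected shape of the `lspci` output.

```python
# PCI device IDs that identify the ONS-flashed card, as stated above
ONS_DEVICE_IDS = ("903f", "913f")

# Hypothetical helper: True only if every ONS device ID appears on a
# Xilinx line of the lspci output.
def ons_is_present(lspci_output):
    xilinx_lines = [line for line in lspci_output.splitlines() if "Xilinx" in line]
    return all(
        any(f"Device {dev_id}" in line for line in xilinx_lines)
        for dev_id in ONS_DEVICE_IDS
    )

# Illustrative samples only; bus addresses and device class will differ
ons_sample = """25:00.0 Memory controller: Xilinx Corporation Device 903f
25:00.1 Memory controller: Xilinx Corporation Device 913f
"""
xrt_sample = "25:00.0 Processing accelerators: Xilinx Corporation Alveo U280 Golden Image\n"

print(ons_is_present(ons_sample), ons_is_present(xrt_sample))  # True False
```

In the notebook, call `ons_is_present(stdout)` on the output of the cell above and continue with the flashing only if it returns `True`.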

__Only flash the card back to XRT if ONS is present; otherwise abort by deleting the slice.__

In [None]:
command = 'cd esnet-smartnic-fw/sn-stack/ && docker compose --profile smartnic-flash run --rm smartnic-flash-remove && docker compose down -v --remove-orphans'

stdout, stderr = node.execute(command)

After the process completes, shut off the VM and __cold-reboot__ the worker node. 

In [None]:
command = 'sudo /sbin/halt'

stdout, stderr = node.execute(command)

1. Use the iDrac console to perform a cold reboot of the worker
2. After the reboot completes, delete the slice using Step 8
3. Check on the worker node (no need to create a new VM). Generally you should see something like the following (Alveo Golden Image === XRT):
```
$ lspci | grep Xilinx
25:00.0 Processing accelerators: Xilinx Corporation Alveo U280 Golden Image
```

## Step 7: Extend Slice

Get slice details and extend the slice. This cell is optional and can be executed as-needed.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

Renew by 14 days

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set the lease end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")
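
If you need a different extension period, the end-date computation can be factored out. A minimal sketch follows; `lease_end` is a hypothetical helper that produces the same string format as the cell above, with an optional `now` argument so the result can be checked against a fixed date.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: format a lease end date `days` from now (UTC),
# in the same format the renew cell above passes to slice.renew().
def lease_end(days, now=None):
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S %z")

# Fixed start date so the output is reproducible
print(lease_end(14, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 2024-01-15 00:00:00 +0000
```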

## Step 8: Delete the slice

Delete the slice after completing the programming.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.delete()