# Functional Test 6.1.1 - Flash XRT into an FPGA and validate

**This procedure works only after switching to the Q35 chipset definition in OpenStack.**

This Jupyter notebook flashes the XRT shell into the FPGA's persistent flash. The end result is an FPGA that retains its programming with a standard Xilinx XRT shell even after a cold reboot of the server. This procedure can be used to reset the FPGA at a given site after experiments or to initialize a newly installed device.

It generally follows the procedures described in Xilinx [UG1301](https://docs.xilinx.com/r/en-US/ug1301-getting-started-guide-alveo-accelerator-cards/Introduction) for U280 devices.

It is assumed that you are operating as part of the FABRIC Maintenance project and have access to the persistent volume named `fpga-tools`, created on EDC, where XRT and other relevant tools are downloaded.

## Step 0: Re-create a VM attached to fpga-tools volume on EDC

To get access to the necessary tools, execute the notebook to [re-create a VM attached](../../fablib_api/fabric_fpgas/fpga_tools_storage.ipynb) to the `fpga-tools` persistent storage. You must execute it as a member of the FABRIC Staff project.

## Step 1: Identify and isolate the worker node

Unless the whole site is already in maintenance, use the administrator tools to identify the worker node with the FPGA and put it in maintenance, making sure it has no experimenter VMs on it. You can check the [aggregate ads in JSON](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON) to make sure you are targeting the right worker.

## Step 2: Provision a VM on the desired worker with attached FPGA

Create another slice with a VM attached to the FPGA on the desired site and a FABNetv4 interface to reach the tools VM from Step 0.

In [None]:
# Initialize FABlib

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

Define slice parameters - re-execute as needed to run any of the steps in this notebook.

In [None]:
# setup parameters including site name
site='INDI'
FPGA_CHOICE='FPGA_Xilinx_U280'

# name the slice and the node 
slice_name=f'Persistent XRT FPGA Slice with {FPGA_CHOICE} on {site}'
node_name='fpga-node'

# username and password used in storage VM
nginx_user = "fpga_tools"
nginx_password = "vewyvweysecret"

# should not need to edit below
print(f'Will create slice "{slice_name}" with node "{node_name}"')

# don't edit - convert from FPGA type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "FPGA_Xilinx_U280": "fpga_u280_available",
}

column_name = choice_to_column.get(FPGA_CHOICE, "Unknown")

fablib.get_image_names()
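
`choice_to_column.get(FPGA_CHOICE, "Unknown")` silently falls back to `"Unknown"` when the FPGA type is mistyped. Below is a minimal sketch of a stricter lookup, together with the kind of filter predicate the comment in the cell above refers to. The helper names `resolve_column` and `site_has_fpga` are hypothetical, and the assumption that a site record behaves like a dict keyed by resource column name should be checked against your FABlib version.

```python
def resolve_column(choice, mapping):
    """Return the resource column for an FPGA type, failing loudly on a typo."""
    try:
        return mapping[choice]
    except KeyError:
        known = ", ".join(sorted(mapping))
        raise ValueError(f"Unknown FPGA type {choice!r}; known types: {known}") from None


def site_has_fpga(site_record, column):
    """Assumed filter predicate: True if the record reports at least one free FPGA."""
    return int(site_record.get(column, 0) or 0) > 0


choice_to_column = {"FPGA_Xilinx_U280": "fpga_u280_available"}
column_name = resolve_column("FPGA_Xilinx_U280", choice_to_column)
```

Failing at this point is cheaper than discovering the typo after the slice request has been submitted.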

Create a slice with FPGA component on selected site and access to FABNetv4 network

In [None]:
# Create slice. By default, submit() polls every 10-20 seconds for up to 360 seconds,
# waiting for the slice to come up. The normal expected time is around 2 minutes.
slice = fablib.new_slice(name=slice_name)

# Add a node with a 200G drive and 8 CPU cores using the Ubuntu 20 image
node = slice.add_node(name=node_name, site=site, cores=8, disk=200, image='default_ubuntu_20')
node.add_component(model=FPGA_CHOICE, name='fpga1')
# be sure to add FABNetv4 so we can communicate with the slice that has the tools
node.add_fabnet()

#Submit Slice Request
slice.submit();

## Inspect the slice and optionally add IPv4/IPv6 NAT64 and /etc/hosts entry for storage VM

In [None]:
slice = fablib.get_slice(slice_name)

node = slice.get_node(name=node_name)              

node_addr = node.get_interface(network_name=f'FABNET_IPv4_{node.get_site()}').get_ip_addr()

slice.show()
slice.list_nodes()
slice.list_networks()
print(f'Node FABNetV4 IP Address is {node_addr}')

To be able to reach GitHub and other IPv4 resources, you should execute this once when creating the slice. You may see `sudo: unable to resolve host fpga-node: Temporary failure in name resolution` - ignore it.

In [None]:
from ipaddress import ip_address, IPv6Address    

isipv6_site = False
# If the node is an IPv6 Node then configure NAT64
if type(ip_address(node.get_management_ip())) is IPv6Address:
    isipv6_site = True
    print(f'Node {node.get_name()} has an IPv6 management address, will update DNS configuration')

# this code will be executed if the node uses an IPv6 site. See the notebook 
# 'Access non-IPv6 services (i.e. GitHub) from IPv6 FABRIC nodes' for more details

if isipv6_site:
    node.upload_file('../../fablib_api/accessing_ipv4_services_from_ipv6_nodes/nat64.sh', 'nat64.sh')
    stdout, stderr = node.execute('chmod +x nat64.sh && ./nat64.sh')
    print(f'Uploaded and executed NAT64 DNS setup script to node {node.get_name()}')
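
The IPv6 check in the cell above compares types directly. An equivalent sketch that relies on the `version` attribute of the standard-library address objects is shown here; `is_ipv6` is a hypothetical helper name, and it assumes the management IP is available as a string or address object.

```python
from ipaddress import ip_address


def is_ipv6(addr):
    """True if addr parses as an IPv6 address, False for IPv4.

    Raises ValueError if addr is not a valid IP address at all,
    for example when the node has no management IP yet.
    """
    return ip_address(str(addr)).version == 6
```

With this helper the cell could set `isipv6_site = is_ipv6(node.get_management_ip())`.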

Add the storage VM to /etc/hosts for convenience. Consult the storage slice for the FABNetv4 IPv4 address of that VM.

In [None]:
storage_vm_ip = "10.132.129.2"

commands = list()
commands.append(f"echo {storage_vm_ip} fpga-tools-host | sudo tee -a /etc/hosts")
commands.append(f"echo 127.0.0.1 {node_name} | sudo tee -a /etc/hosts")

for command in commands:
    stdout, stderr = node.execute(command)
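
Because the cell above appends with `tee -a`, re-running it adds duplicate lines to /etc/hosts. A minimal sketch of a command builder that appends only when the host name is not already present follows. The helper name `hosts_entry_command` is hypothetical, and the `grep -qwF` guard is a simple whole-word match, not a full parser of the hosts file.

```python
import shlex
from ipaddress import ip_address


def hosts_entry_command(ip, hostname):
    """Build a shell command that adds 'ip hostname' to /etc/hosts once."""
    ip_address(ip)  # raises ValueError on a malformed address
    entry = shlex.quote(f"{ip} {hostname}")
    name = shlex.quote(hostname)
    # The pipeline after || runs only when grep finds no existing entry.
    return f"grep -qwF -- {name} /etc/hosts || echo {entry} | sudo tee -a /etc/hosts"
```

The resulting strings can be appended to `commands` in place of the plain `echo ... | sudo tee -a` lines.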

## Fetch XRT software from the storage VM onto the node and install it

Fill in the appropriate version. This notebook is written for Ubuntu 20.04; the package file names below correspond to XRT 2022.2.

In [None]:
xilinx_packages = ['xrt_202220.2.14.354_20.04-amd64-xrt.deb', 
                   'xilinx-u280-gen3x16-xdma_2022.2_2022_1015_0317-all.deb.tar.gz']

commands = list()
for package in xilinx_packages:
    command = f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{package}  > {package}'
    commands.append(command)
    
print('Fetching Xilinx packages')
for command in commands:
    stdout, stderr = node.execute(command)
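
With a plain `curl ... > file`, an HTTP error such as a wrong password still leaves a file on disk, containing the error page instead of the package. The sketch below builds the same download with `--fail`, so curl exits non-zero on HTTP errors, and with `-o` instead of shell redirection. The helper name `fetch_command` is hypothetical; the base URL default mirrors the one used in the cell above.

```python
import shlex


def fetch_command(package, user, password,
                  base_url="https://fpga-tools-host/fpga-tools"):
    """Build a curl command that fails on HTTP errors instead of saving them."""
    creds = shlex.quote(f"{user}:{password}")
    url = shlex.quote(f"{base_url}/{package}")
    out = shlex.quote(package)
    # -k is kept from the original cell (the storage VM certificate is not verified).
    return f"curl --fail -sS -k -u {creds} -o {out} {url}"
```

Quoting the credentials also keeps passwords with shell metacharacters from breaking the command.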

Next we install the packages (as a reminder, we are following the [Xilinx documentation](https://docs.xilinx.com/r/en-US/ug1301-getting-started-guide-alveo-accelerator-cards/Installing-the-Deployment-Software)).

In [None]:
commands = list()

print('Update DEBs')
commands.append('sudo apt update -y')

for command in commands:
    stdout, stderr = node.execute(command)

Reboot

In [None]:
reboot = 'sudo reboot'

print(reboot)
node.execute(reboot)

slice.wait_ssh(timeout=360,interval=10,progress=True)

print("Now testing SSH abilities to reconnect...",end="")
slice.update()
slice.test_ssh()
print("Reconnected! Resetting network configuration")

node.execute("sudo ip link set dev ens8 up")
node.config()
print("Done")

Continue installing

In [None]:
commands = list()
commands.append('sudo apt install -y linux-headers-`uname -r`')

print('Installing Kernel headers')
for command in commands:
    stdout, stderr = node.execute(command)

Reboot again if needed

In [None]:
reboot = 'sudo reboot'

print(reboot)
node.execute(reboot)

slice.wait_ssh(timeout=360,interval=10,progress=True)

print("Now testing SSH abilities to reconnect...",end="")
slice.update()
slice.test_ssh()
print("Reconnected! Resetting network configuration")

node.execute("sudo ip link set dev ens8 up")
node.config()
print("Done")

Install Xilinx packages (XRT also builds and installs two kernel modules - xocl and xclmgmt)

In [None]:
deploy_dir = 'xrt_deploy'

commands = list()
commands.append('sudo apt install -y pciutils usbutils')
commands.append('sudo apt install -y ./xrt*.deb')
commands.append(f'mkdir -p {deploy_dir} && tar -zxf {xilinx_packages[1]} -C {deploy_dir}')
commands.append(f'cd {deploy_dir} && sudo apt install -y ./*.deb')

print('Installing Xilinx packages')
for command in commands:
    stdout, stderr = node.execute(command)
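
The install loops in this notebook discard `stdout` and `stderr`, so a failed `apt install` goes unnoticed until a later step breaks. One possible way to stop at the first failure is sketched below. It assumes that `node.execute` returns a `(stdout, stderr)` pair of strings, as the cells above suggest; the helper name `run_checked` and the `__rc=` marker are hypothetical.

```python
def run_checked(node, command, marker="__rc="):
    """Run a command on the node and raise if its exit status is non-zero."""
    # Run the command in a subshell and echo its exit status as the last line.
    stdout, stderr = node.execute(f'( {command} ); echo "{marker}$?"')
    head, sep, tail = stdout.rstrip().rpartition(marker)
    if not sep or not tail.strip().isdigit():
        raise RuntimeError(f"No exit status reported for: {command}")
    status = int(tail.strip())
    if status != 0:
        raise RuntimeError(f"Command failed ({status}): {command}\n{stderr}")
    return head  # command output without the status line
```

Replacing `node.execute(command)` with `run_checked(node, command)` in the loops would abort the cell as soon as one step fails.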

## Flash the card

Continue with the Xilinx instructions to flash the card with the XRT shell, then cold reboot the server node (the flash may trigger the reboot itself).
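
This section contains no code, so the sketch below only illustrates how the flash command might be assembled; it does not run anything. It assumes the card shows up under the Xilinx PCI vendor ID `10ee` in `lspci -D -d 10ee:` output and that the base shell is flashed with `xbmgmt program --base --device <BDF>`, which is my reading of UG1301 for this XRT generation and should be verified against the guide. The helper names `xilinx_bdfs` and `flash_base_command` are hypothetical.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:3b:00.0
_BDF = r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]"
_BDF_LINE = re.compile(rf"^({_BDF})\s", re.MULTILINE)


def xilinx_bdfs(lspci_output):
    """Extract PCI addresses from the output of 'lspci -D -d 10ee:'."""
    return _BDF_LINE.findall(lspci_output)


def flash_base_command(bdf, xrt_bin="/opt/xilinx/xrt/bin"):
    """Build (not run) the assumed xbmgmt command that flashes the base shell."""
    if not re.fullmatch(_BDF, bdf):
        raise ValueError(f"Not a PCI address: {bdf!r}")
    return f"sudo {xrt_bin}/xbmgmt program --base --device {bdf}"
```

The output of `node.execute('lspci -D -d 10ee:')` could be passed to `xilinx_bdfs` to choose the device address before building the command.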

## Extend Slice

Get slice details

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

Renew by 14 days

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")
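
The end-date computation can be isolated so that the renewal period is a parameter and the format is easy to check. This is a small sketch using the same `strftime` format as the cell above; the helper name `renewal_end_date` and its `now` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renewal_end_date(days=14, now=None):
    """Return the lease end date as 'YYYY-MM-DD HH:MM:SS +0000' in UTC."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S %z")
```

The renewal call then becomes `slice.renew(renewal_end_date(14))`.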

## Delete the slice

Delete when no longer needed.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.delete()