# Functional Test 6.1.4 - Flash XDMA shell into an FPGA persistent flash or RAM and validate

This Jupyter notebook flashes an FPGA with the XDMA shell (TBD info/link). If the persistent flash is used, the FPGA retains its programming with a standard [Xilinx XDMA shell](https://github.com/Xilinx/dma_ip_drivers/tree/master/XDMA) even after a cold reboot of the server. If the program is flashed into RAM, a warm reboot of the server activates it.

It is assumed that you are operating as part of the FABRIC Maintenance project and have access to the persistent volume named `fpga-tools`, created on EDC, where the relevant tools are downloaded.

Preparing the FPGA for XDMA shell experiments follows the steps below.
See also [FIP-1522](https://fabric-testbed.atlassian.net/browse/FIP-1522?focusedCommentId=25987).

1. Create a slice, reset the FPGA with `revert_to_golden.mcs`, then delete the slice

   - Use the notebook and procedure in 6.1.3 to program the FPGA with the `revert_to_golden.mcs` file
   - Relevant information can be found in the Xilinx Knowledge Base article [71757 - Alveo Data Center Accelerator Card - Reverting Card to Factory image](https://adaptivesupport.amd.com/s/article/71757?language=en_US)

2. Cold-reboot the server

3. Create a slice, install XRT and deployment target platform - flash the FPGA

   - Xilinx Runtime (XRT) xrt_202310.2.15.225_22.04-amd64-xrt.deb
   - Deployment Target Platform (Packages in xilinx-u280-gen3x16-xdma_2023.2_2023_1014_0238-all.deb.tar.gz)
   - Flash the FPGA (/opt/xilinx/xrt/bin/xbmgmt program --base -d 1f:00.0)

4. Delete the slice

5. Cold-reboot the server

6. Create a slice, install XRT, and run the application



## Step 0: Re-create a VM attached to fpga-tools volume on EDC

To access the necessary tools, execute the notebook that [re-creates a Storage VM attached](../../fablib_api/fabric_fpgas/fpga_tools_storage.ipynb) to the `fpga-tools` persistent storage. You must execute it as a member of the FABRIC Staff project.

## Step 1: Identify and isolate the worker node

Unless the whole site is already in maintenance, use the administrator tools to identify the worker node with the FPGA and put it in maintenance, making sure it hosts no experimenter VMs. You can check the [aggregate ads in JSON](https://github.com/fabric-testbed/aggregate-ads/tree/main/JSON) to make sure you are targeting the right worker.

## Step 2: Provision a VM on the desired worker with attached FPGA and FABNetv4 connection

Create a slice with a VM attached to the FPGA on the desired site and a FABNetv4 interface to reach the Storage VM in Step 0.

### Initialize fablib and variables

In [None]:
# Initialize FABlib

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

Define slice parameters - re-execute as needed to run any of the steps in this notebook.

In [75]:
# The user artifact should be deposited on the Storage VM in /mnt/fpga_tools/static/artifacts/<owner username>/<version>, since artifact names may be similar or identical.
artifact_owner_username = 'han_zha'
artifact_version = 'v3'

# Edit the names of the user-provided artifacts stored in the Storage VM as needed

artifact = 'hello_world'
artifact_xclbin = 'vadd.xclbin'

artifact_golden = 'revert_to_golden.mcs'


# Xilinx xrt package in Storage VM
xrt_package = 'xrt/xrt_202310.2.15.225_22.04-amd64-xrt.deb'


# Xilinx deployment target package in Storage VM
deployment_tgt_package = 'alveo-packages/xilinx-u280-gen3x16-xdma_2023.2_2023_1014_0238-all.deb.tar.gz'

# Xilinx xbflash2 package in Storage VM
xbflash2_package = 'alveo-packages/xrt_202210.2.13.466_20.04-amd64-xbflash2.deb'


# FABNetv4 of storage VM - consult the Storage VM slice for this FABNetv4 IP address
storage_vm_ip = "10.132.136.2"
# username and password used in storage VM
nginx_user = "fpga_tools"
nginx_password = "secret-password"

#
# should not need to edit below
FPGA_CHOICE='FPGA_Xilinx_U280'


# don't edit - convert from FPGA type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "FPGA_Xilinx_U280": "fpga_u280_available",
}

column_name = choice_to_column.get(FPGA_CHOICE, "Unknown")

fpga_bdf = "0000:1f:00.0"

#fablib.get_image_names()
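
This notebook writes the FPGA address both in short form (`1f:00.0`) and in full form (`0000:1f:00.0`). The following is a minimal sketch of a hypothetical helper, `normalize_bdf`, that converts either spelling to the full `domain:bus:device.function` form. It assumes PCI domain `0000` when the domain is omitted, which matches the values used here but is not guaranteed on every host.

```python
import re

# Matches "bus:device.function" with an optional 4-hex-digit PCI domain prefix.
# The device field is limited to 00-1f and the function field to 0-7.
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[01][0-9a-fA-F])\."
    r"(?P<fn>[0-7])$"
)


def normalize_bdf(bdf: str) -> str:
    """Return the BDF in lowercase 'dddd:bb:dd.f' form (hypothetical helper)."""
    match = _BDF_RE.match(bdf.strip())
    if match is None:
        raise ValueError(f"not a PCI BDF: {bdf!r}")
    # Assumption: a missing domain means domain 0000.
    domain = match.group("domain") or "0000"
    full = f"{domain}:{match.group('bus')}:{match.group('dev')}.{match.group('fn')}"
    return full.lower()


print(normalize_bdf("1f:00.0"))
print(normalize_bdf("0000:1F:00.0"))
```

Both calls print `0000:1f:00.0`, so a single `fpga_bdf` value can be reused in every command that takes a device argument.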

In [76]:
# setup site name
site='MICH'
node_name='fpga-node'

# name the slice and the node 
slice_name=f'Persistent slice with {FPGA_CHOICE} on {site}'
print(f'Will create slice "{slice_name}"')

Will create slice "Persistent slice with FPGA_Xilinx_U280 on MICH"


### Create a new slice 

Create a slice with FPGA component on selected site and access to FABNetv4 network.

__NOTE:__ It is important to use a Docker-enabled image so that Docker can properly build docker images on IPv6-enabled sites.

In [None]:
# Create Slice. Note that by default submit() call will poll for 360 seconds every 10-20 seconds
# waiting for slice to come up. Normal expected time is around 2 minutes. 
slice = fablib.new_slice(name=slice_name)
image = 'default_ubuntu_22'
###image = 'docker_ubuntu_20'

# Add a node with a 200 GB disk and 8 CPU cores using the image selected above
node = slice.add_node(name=node_name, site=site, cores=8, disk=200, image=image)
node.add_component(model=FPGA_CHOICE, name='fpga1')
# be sure to add FABNetv4 so we can communicate with the slice that has the tools
node.add_fabnet()

# use the postboot script from docker examples
node.add_post_boot_upload_directory('../../fablib_api/docker_containers/node_tools','.')
node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node.add_post_boot_upload_directory('node_config','.')
node.add_post_boot_execute(f'chmod a+x node_config/ipv6-and-docker-plugins.sh && node_config/ipv6-and-docker-plugins.sh')

# Submit Slice Request
slice.submit();

Add storage VM into /etc/hosts for convenience. __Consult the storage slice for the FABNetv4 IPv4 address of that VM.__

In [None]:
slice = fablib.get_slice(slice_name)
node = slice.get_node(name=node_name)   

commands = list()
commands.append(f"echo {storage_vm_ip} fpga-tools-host | sudo tee -a /etc/hosts")
commands.append(f"echo 127.0.0.1 {node_name} | sudo tee -a /etc/hosts")

for command in commands:
    stdout, stderr = node.execute(command)
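
Because `tee -a` appends unconditionally, re-running the cell above adds duplicate lines to `/etc/hosts`. The following is a minimal sketch of a hypothetical helper, `hosts_entry_command`, that builds a shell command which appends an entry only when the exact line is not already present. The helper name and the `grep -qxF` guard are illustrative and not part of the original notebook.

```python
import shlex


def hosts_entry_command(ip: str, hostname: str, hosts_file: str = "/etc/hosts") -> str:
    """Build a shell command that adds 'ip hostname' to hosts_file once (hypothetical helper)."""
    entry = shlex.quote(f"{ip} {hostname}")
    target = shlex.quote(hosts_file)
    # grep -q: no output, -x: match the whole line, -F: treat the pattern as a fixed string.
    # The append on the right-hand side of || runs only when grep finds no match.
    return f"grep -qxF {entry} {target} || echo {entry} | sudo tee -a {target} > /dev/null"


print(hosts_entry_command("10.132.136.2", "fpga-tools-host"))
```

The resulting string can be passed to `node.execute(...)` in place of the plain `echo ... | sudo tee -a` commands.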

## Step 3: Inspect the slice
Note that nat64 configuration is done at boot time.

In [None]:
slice = fablib.get_slice(slice_name)

node = slice.get_node(name=node_name)              

node_addr = node.get_interface(network_name=f'FABNET_IPv4_{node.get_site()}').get_ip_addr()

slice.show()
slice.list_nodes()
slice.list_networks()
print(f'Node FABNetV4 IP Address is {node_addr}')
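
Before fetching tools, it can help to confirm that the node reaches the Storage VM over FABNetv4. The following is a minimal sketch that validates the address with the standard `ipaddress` module and builds a `ping` command. The helper name `ping_command` and the default count and timeout are assumptions chosen for illustration.

```python
import ipaddress


def ping_command(target_ip: str, count: int = 3, timeout_s: int = 2) -> str:
    """Build a ping command for an IPv4 target (hypothetical helper)."""
    # ip_address() raises ValueError for malformed input such as "10.132.136".
    address = ipaddress.ip_address(target_ip)
    if address.version != 4:
        raise ValueError(f"expected an IPv4 address, got {target_ip!r}")
    # -c: number of echo requests, -W: seconds to wait for each reply (Linux iputils ping).
    return f"ping -c {count} -W {timeout_s} {address}"


print(ping_command("10.132.136.2"))
```

For example, `stdout, stderr = node.execute(ping_command(storage_vm_ip))` runs the check from the FPGA node.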

## Step 4: Fetch Tools



### 4.1 Artifacts

In [None]:
bitfile_location = '~/fpga-bitfile'

commands = [f'[ ! -d {bitfile_location} ] && mkdir -p {bitfile_location}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact}  > {bitfile_location}/{artifact}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/artifacts/{artifact_owner_username}/{artifact_version}/{artifact_xclbin}  > {bitfile_location}/{artifact_xclbin}',
            f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/xrt/{artifact_golden}  > {bitfile_location}/{artifact_golden}']


for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')
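
With shell redirection (`>`), an HTTP error such as a wrong password or a missing artifact still leaves a file behind, which can then be mistaken for a valid download. The following is a minimal sketch of a hypothetical helper, `fetch_command`, that builds the same `curl` call with `--fail`, so that HTTP errors produce a non-zero exit status, and with `-o` instead of redirection. The helper name and argument layout are illustrative.

```python
import shlex


def fetch_command(base_url: str, remote_path: str, dest_dir: str, user: str, password: str) -> str:
    """Build a curl command that saves remote_path into dest_dir (hypothetical helper)."""
    filename = remote_path.rsplit("/", 1)[-1]
    url = f"{base_url.rstrip('/')}/{remote_path.lstrip('/')}"
    credentials = shlex.quote(f"{user}:{password}")
    # dest_dir is deliberately left unquoted so that a leading "~" still expands
    # on the remote shell; do not pass untrusted values here.
    # -k: skip TLS verification (as in the original commands), --fail: error on HTTP >= 400,
    # -sS: hide the progress meter but still show errors.
    return f"curl -k --fail -sS -u {credentials} -o {dest_dir}/{filename} {shlex.quote(url)}"


print(fetch_command(
    "https://fpga-tools-host/fpga-tools",
    "xrt/revert_to_golden.mcs",
    "~/fpga-bitfile",
    "fpga_tools",
    "secret-password",
))
```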

### 4.2 Alveo Packages
The packages are available from the [Alveo U280 download page](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/alveo/u280.html).

1. Xilinx Runtime

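   The Xilinx runtime (XRT) is a low-level communication layer (APIs and drivers) between the host and the card. Note that this notebook installs `xrt_202310.2.15.225_22.04-amd64-xrt.deb` from the Storage VM (see `xrt_package`), which is a different release from the one linked here.
   - [xrt_202320.2.16.204_20.04-amd64-xrt.deb](https://www.xilinx.com/bin/public/openDownload?filename=xrt_202320.2.16.204_20.04-amd64-xrt.deb)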

 
2. Deployment Target Platform

   The deployment target platform is the communication layer physically implemented and flashed into the card.
   - [xilinx-u280-gen3x16-xdma_2023.2_2023_1014_0238-all.deb.tar.gz](https://www.xilinx.com/bin/public/openDownload?filename=xilinx-u280-gen3x16-xdma_2023.2_2023_1014_0238-all.deb.tar.gz)


#### 4.2.1 Download - Xilinx Runtime, Deployment Target Platform, xbflash2

In [None]:
import os.path

tools_location = '~/xilinx-labtools/alveo-packages'

commands = list()
commands.append(f'[ ! -d {tools_location} ] && mkdir -p {tools_location}')
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{xbflash2_package}  > {tools_location}/{os.path.basename(xbflash2_package)}')
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{xrt_package}  > {tools_location}/{os.path.basename(xrt_package)}')
commands.append(f'curl -k -u {nginx_user}:{nginx_password} https://fpga-tools-host/fpga-tools/{deployment_tgt_package}  > {tools_location}/{os.path.basename(deployment_tgt_package)}')

for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')
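
A truncated or failed download only surfaces later, when `apt` or `tar` rejects the file. The following is a minimal sketch of a hypothetical helper, `verify_command`, that picks a quick integrity check by file extension: `dpkg-deb --info` for `.deb` packages, `tar -tzf` for gzipped tarballs, and `test -s` (file exists and is non-empty) for anything else. The choice of checks is an assumption, not part of the original procedure.

```python
def verify_command(path: str) -> str:
    """Return a shell command that exits non-zero if the file looks invalid (hypothetical helper)."""
    if path.endswith(".deb"):
        # Reads the package control information; fails on a corrupt archive.
        return f"dpkg-deb --info {path} > /dev/null"
    if path.endswith((".tar.gz", ".tgz")):
        # Lists the archive members; fails on a truncated or non-gzip file.
        return f"tar -tzf {path} > /dev/null"
    # Fallback: the file must exist and have a size greater than zero.
    return f"test -s {path}"


for name in (
    "xrt_202310.2.15.225_22.04-amd64-xrt.deb",
    "xilinx-u280-gen3x16-xdma_2023.2_2023_1014_0238-all.deb.tar.gz",
    "revert_to_golden.mcs",
):
    print(verify_command(f"~/xilinx-labtools/alveo-packages/{name}"))
```

Appending `&& echo OK || echo FAILED` to each command makes the result visible in the `stdout` returned by `node.execute`.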

#### 4.2.2 Install - xbflash2 -> (Not Tested)

In [2]:
#tools_location = '~/xilinx-labtools/alveo-packages'

#commands = list()
#commands.append(f'sudo apt-get install -y {tools_location}/xrt_202210.2.13.466_20.04-amd64-xbflash2.deb &> /tmp/fpga-apt-xbflash2.log')

#for command in commands:
#   print(f'--- Node {node_name}: Executing command: {command}')
#   stdout, stderr = node.execute(command)
#print('--- Done')

#### 4.2.3 Install - Xilinx Runtime

In [None]:
import os.path

tools_location = '~/xilinx-labtools/alveo-packages'

commands = list()
commands.append(f'(sudo apt -y update) &> /tmp/fpga-apt-update.log')
commands.append(f'cd {tools_location} && sudo apt install -y ./{os.path.basename(xrt_package)} &> /tmp/fpga-apt-xrt.log')


for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')
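
The XRT package file names used in this notebook appear to encode the release, the XRT version, and the target Ubuntu release. The following is a minimal sketch of a hypothetical parser, `parse_xrt_deb_name`, for that naming pattern. The pattern is inferred only from the two file names in this notebook (`xrt_package` and `xbflash2_package`) and may not hold for other releases.

```python
import os.path
import re

# Assumed layout: xrt_<release>.<version>_<ubuntu release>-<arch>-<component>.deb
_XRT_DEB = re.compile(
    r"^xrt_(?P<release>\d{6})\.(?P<version>\d+\.\d+\.\d+)_"
    r"(?P<os_release>\d+\.\d+)-(?P<arch>[^-]+)-(?P<component>[^.]+)\.deb$"
)


def parse_xrt_deb_name(path: str) -> dict:
    """Split an XRT .deb file name into its fields (hypothetical helper)."""
    match = _XRT_DEB.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected XRT package name: {path!r}")
    return match.groupdict()


info = parse_xrt_deb_name("xrt/xrt_202310.2.15.225_22.04-amd64-xrt.deb")
print(info["version"], info["os_release"])
```

The `version` field (`2.15.225` here) can be compared with the XRT version reported by `xbmgmt examine`, and `os_release` with the Ubuntu release of the slice image. For instance, the `xbflash2` package in this notebook targets `20.04`, while the slice uses `default_ubuntu_22`.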

#### 4.2.4 Install - Deployment Target Platform 

In [None]:
import os.path

tools_location = '~/xilinx-labtools/alveo-packages'

commands = list()
commands.append(f'(sudo apt -y update) &> /tmp/fpga-apt-update.log')
commands.append(f'[ ! -d {tools_location}/xrt_platform ] && mkdir -p {tools_location}/xrt_platform')
commands.append(f'sudo tar -zxvf {tools_location}/{os.path.basename(deployment_tgt_package)} -C {tools_location}/xrt_platform')
commands.append(f'cd {tools_location}/xrt_platform && sudo apt-get install -y ./*.deb &> /tmp/fpga-apt-xrt-deployment.log')


for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')


## Step 5: Flash the Card 

<div class="alert alert-block alert-info">
<b>Pre-requisite:</b> The FPGA is already flashed with the golden image (revert_to_golden.mcs) and the server has been cold-rebooted
</div>

<div class="alert alert-block alert-info">
<b>Stage-1:</b> Execute 5.1 > delete slice > cold-reboot the server
</div>

<div class="alert alert-block alert-info">
<b>Stage-2:</b> After cold-rebooting the server, create a slice > Execute 5.2 > Execute 5.3
</div>

### 5.1 Program the base partition with [xbmgmt](https://xilinx.github.io/XRT/master/html/xbmgmt.html#xbmgmt-program)

In [None]:
import os.path

tools_location = '~/xilinx-labtools/alveo-packages'

commands = list()
# fpga_bdf is defined in the slice parameters cell
commands.append(f'source /opt/xilinx/xrt/setup.sh && sudo /opt/xilinx/xrt/bin/xbmgmt program --base -d {fpga_bdf}')


for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')


<div class="alert alert-block alert-success">
<b>5.1 Desired Output</b> We should see an output similar to the following
</div>

```
root@fpga-node:~# sudo /opt/xilinx/xrt/bin/xbmgmt program --base -d 1f:00.0
----------------------------------------------------
Device : [0000:1f:00.0]

Current Configuration
  Platform             : xilinx_u280_GOLDEN_8
  SC Version           : INACTIVE
  Platform ID          : N/A


Incoming Configuration
  Deployment File      : partition.xsabin
  Deployment Directory : /lib/firmware/xilinx/283bab8f654d8674968f4da57f7fa5d7
  Size                 : 135,050,024 bytes
  Timestamp            : Sat Feb 22 04:42:05 2025

  Platform             : xilinx_u280_gen3x16_xdma_base_1
  SC Version           : 4.3.28
  Platform UUID        : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
----------------------------------------------------
Actions to perform:
  [0000:1f:00.0] : Program base (FLASH) image
----------------------------------------------------
Are you sure you wish to proceed? [Y/n]: Y

[0000:1f:00.0] : Updating base (e.g., shell) flash image...
Bitstream guard installed on flash @0x1002000
Persisted 716128 bytes of meta data to flash 0 @0x7f51274
Extracting bitstream from MCS data:
...............................................
Extracted 48844316 bytes from bitstream @0x1002000
Writing bitstream to flash 0:
...............................................
Bitstream guard removed from flash
INFO     : Base flash image has been programmed successfully.
----------------------------------------------------
Report
  [0000:1f:00.0] : Factory or Recovery image detected. Reflash the device after the reboot to update the SC firmware.
  [0000:1f:00.0] : Successfully flashed the base (e.g., shell) image

Device flashed successfully.
****************************************************
Cold reboot machine to load the new image on device.
****************************************************
```
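
Since the cell in 5.1 captures the tool output in `stdout`, the result can also be checked programmatically. The following is a minimal sketch that looks for two marker strings copied from the sample output above. The helper name `summarize_flash` is hypothetical, and the exact wording of the messages may differ between XRT versions.

```python
# Marker strings taken from the sample xbmgmt output in this notebook.
SUCCESS_MARKER = "Device flashed successfully."
REBOOT_MARKER = "Cold reboot machine to load the new image on device."


def summarize_flash(output: str) -> dict:
    """Report whether the flash succeeded and whether a cold reboot is requested (hypothetical helper)."""
    return {
        "flashed": SUCCESS_MARKER in output,
        "cold_reboot_required": REBOOT_MARKER in output,
    }


sample = """Report
  [0000:1f:00.0] : Successfully flashed the base (e.g., shell) image

Device flashed successfully.
****************************************************
Cold reboot machine to load the new image on device.
****************************************************
"""
print(summarize_flash(sample))
```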

### 5.2 Check the FPGA

In [None]:
commands = list()

commands.append(f'sudo lsmod | grep xocl')
commands.append(f'sudo lsmod | grep xcl')
commands.append(f'sudo /opt/xilinx/xrt/bin/xbmgmt examine')
commands.append(f'sudo /opt/xilinx/xrt/bin/xbmgmt examine -r platform --device {fpga_bdf}')

for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')

<div class="alert alert-block alert-success">
<b>5.2 Desired Output</b> We should see an output similar to the following
</div>

```
--- Node fpga-node: Executing command: sudo lsmod | grep xocl
xocl                 1945600  0
libcrc32c              16384  8 nf_conntrack,nf_nat,xclmgmt,xocl,openvswitch,btrfs,nf_tables,raid456
drm                   622592  4 drm_kms_helper,xocl,virtio_gpu
--- Node fpga-node: Executing command: sudo lsmod | grep xcl
xclmgmt              1126400  0
libcrc32c              16384  8 nf_conntrack,nf_nat,xclmgmt,xocl,openvswitch,btrfs,nf_tables,raid456
--- Node fpga-node: Executing command: sudo /opt/xilinx/xrt/bin/xbmgmt examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-127-generic
  Version              : #137-Ubuntu SMP Fri Nov 8 15:21:01 UTC 2024
  Machine              : x86_64
  CPU Cores            : 8
  Memory               : 7936 MB
  Distribution         : Ubuntu 22.04.5 LTS
  GLIBC                : 2.35
  Model                : OpenStack Compute

XRT
  Version              : 2.15.225
  Branch               : 2023.1
  Hash                 : adf27adb3cfadc6e4c41d6db814159f1329b24f3
  Hash Date            : 2023-05-03 10:13:19
  XOCL                 : 2.15.225, adf27adb3cfadc6e4c41d6db814159f1329b24f3
  XCLMGMT              : 2.15.225, adf27adb3cfadc6e4c41d6db814159f1329b24f3

Devices present
BDF             :  Shell                            Platform UUID                         Device ID        Device Ready*
--------------------------------------------------------------------------------------------------------------------------
[0000:1f:00.0]  :  xilinx_u280_gen3x16_xdma_base_1  283BAB8F-654D-8674-968F-4DA57F7FA5D7  mgmt(inst=7936)  Yes


* Devices that are not ready will have reduced functionality when using XRT tools
--- Node fpga-node: Executing command: sudo /opt/xilinx/xrt/bin/xbmgmt examine -r platform --device 0000:1f:00.0

-------------------------------------------------
[0000:1f:00.0] : xilinx_u280_gen3x16_xdma_base_1
-------------------------------------------------
Flash properties
  Type                 : spi
  Serial Number        : 21770329D011

Device properties
  Type                 : u280
  Name                 : ALVEO U280 PQ
  Config Mode          : 0x7
  Max Power            : 225W

Flashable partitions running on FPGA
  Platform             : xilinx_u280_gen3x16_xdma_base_1
  SC Version           : 4.3.28
  Platform UUID        : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
  Interface UUID       : FB2B2C5A-19ED-6359-3FEA-95F51FBC8EB9

Flashable partitions installed in system
  <none found>


  Mac Address          : 00:0A:35:0E:26:30
                       : 00:0A:35:0E:26:31

WARNING  : No shell is installed on the system.

--- Done
```
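
To check the result without reading the output by eye, the device table printed by `xbmgmt examine` can be parsed. The following is a minimal sketch based only on the row layout in the sample output above (BDF, shell, platform UUID, device ID, ready flag). The helper name `parse_devices` is hypothetical, `No` as the not-ready value is an assumption, and rows without a platform UUID are not matched.

```python
import re

# One table row, e.g.:
# [0000:1f:00.0]  :  xilinx_u280_gen3x16_xdma_base_1  283BAB8F-...  mgmt(inst=7936)  Yes
_DEVICE_ROW = re.compile(
    r"^\[(?P<bdf>[0-9a-fA-F:.]+)\][ \t]+:[ \t]+"
    r"(?P<shell>\S+)[ \t]+"
    r"(?P<uuid>[0-9a-fA-F-]{36})[ \t]+"
    r"(?P<device_id>\S+)[ \t]+"
    r"(?P<ready>Yes|No)[ \t]*$",
    re.MULTILINE,
)


def parse_devices(examine_output: str) -> list:
    """Return one dict per device row found in the examine output (hypothetical helper)."""
    return [match.groupdict() for match in _DEVICE_ROW.finditer(examine_output)]


sample = (
    "Devices present\n"
    "BDF             :  Shell                            Platform UUID    Device ID    Device Ready*\n"
    "[0000:1f:00.0]  :  xilinx_u280_gen3x16_xdma_base_1  "
    "283BAB8F-654D-8674-968F-4DA57F7FA5D7  mgmt(inst=7936)  Yes\n"
)
for device in parse_devices(sample):
    print(device["bdf"], device["shell"], device["ready"])
```

A check such as `device["shell"] == "xilinx_u280_gen3x16_xdma_base_1" and device["ready"] == "Yes"` then confirms that the XDMA shell is active after the cold reboot.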

### 5.3 Run Test Application

In [None]:
bitfile_location = '~/fpga-bitfile'

commands = list()

commands.append(f'cd {bitfile_location} && chmod +x {artifact}')
# run the host application against the xclbin; both names come from the slice parameters cell
commands.append(f'source /opt/xilinx/xrt/setup.sh && {bitfile_location}/{artifact} {bitfile_location}/{artifact_xclbin}')

for command in commands:
   print(f'--- Node {node_name}: Executing command: {command}')
   stdout, stderr = node.execute(command)
print('--- Done')

<div class="alert alert-block alert-success">
<b>5.3 Desired Output</b> We should see an output similar to the following
</div>

```
--- Node fpga-node: Executing command: cd ~/fpga-bitfile && chmod +x hello_world
--- Node fpga-node: Executing command:
--- Node fpga-node: Executing command: source /opt/xilinx/xrt/setup.sh && ~/fpga-bitfile/hello_world ~/fpga-bitfile/vadd.xclbin
Autocomplete enabled for the xbutil command
Autocomplete enabled for the xbmgmt command
XILINX_XRT        : /opt/xilinx/xrt
PATH              : /opt/xilinx/xrt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH   : /opt/xilinx/xrt/lib
PYTHONPATH        : /opt/xilinx/xrt/python
Found Platform
Platform Name: Xilinx
INFO: Reading /home/ubuntu/fpga-bitfile/vadd.xclbin
Loading: '/home/ubuntu/fpga-bitfile/vadd.xclbin'
Trying to program device[0]: xilinx_u280_gen3x16_xdma_base_1
Device[0]: program successful!
TEST PASSED
--- Done
```
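
The pass/fail decision for this functional test rests on the `TEST PASSED` line in the sample output above. The following is a minimal sketch of a hypothetical helper, `run_and_expect`, that runs a command through any callable returning `(stdout, stderr)`, as `node.execute` does in this notebook, and reports whether a marker string appears. The stand-in `fake_execute` exists only to make the example self-contained.

```python
from typing import Callable, Tuple


def run_and_expect(execute: Callable[[str], Tuple[str, str]], command: str, marker: str) -> bool:
    """Run command via execute() and return True if marker is in stdout (hypothetical helper)."""
    stdout, _stderr = execute(command)
    return marker in stdout


def fake_execute(command: str) -> Tuple[str, str]:
    """Stand-in for node.execute, used only for this illustration."""
    return ("Device[0]: program successful!\nTEST PASSED\n", "")


print(run_and_expect(fake_execute, "~/fpga-bitfile/hello_world ~/fpga-bitfile/vadd.xclbin", "TEST PASSED"))
```

In the notebook, `run_and_expect(node.execute, command, "TEST PASSED")` would apply the same check to the real run.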

## Step 6: Extend Slice

Get the slice details and extend the slice. This step is optional and can be executed as needed.

In [None]:
slice = fablib.get_slice(name=slice_name)
slice.show();

Extend the slice lease by 14 days.

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set the lease end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")
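
The end date passed to `slice.renew` above is a UTC timestamp string in the format `%Y-%m-%d %H:%M:%S %z`. The following is a minimal sketch of a hypothetical helper, `lease_end`, that produces the same string for an arbitrary number of days. The optional `now` argument is an addition for illustration, so the result can be reproduced with a fixed starting time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lease_end(days: int, now: Optional[datetime] = None) -> str:
    """Return 'now + days' formatted like the end_date used above (hypothetical helper)."""
    start = now if now is not None else datetime.now(timezone.utc)
    return (start + timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S %z")


fixed_start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(lease_end(14, now=fixed_start))  # prints: 2025-01-15 12:00:00 +0000
```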

## Step 7: Delete the slice

Delete the slice after completing the programming.

In [None]:
try:
    slice = fablib.get_slice(name=slice_name)
    slice.delete()
except Exception as e:
    print(f"Exception: {e}")