# Exercise 2: Optimize _hls4ml_ Conversion
Optimizing the conversion of a Neural Network into firmware is always a trade-off between how many hardware resources we can occupy, how fast the algorithm must run, and how accurate its output needs to be.

In this exercise we will explore a few options that control how _hls4ml_ translates the Neural Network into firmware.

## Objectives
In this exercise, you will:
1. Explore different ways of optimizing the _hls4ml_ conversion
2. Compare the optimized models

## Instructions
Complete the code cells marked with `# TODO` comments. Follow the hints provided.

## Part 1: Environment Setup and Data Loading
Run the following cells to set up the environment (no changes needed).

In [None]:
# Load libraries and import packages (no changes needed)

# General imports
import os

# Numpy and plotting
import numpy as np
import matplotlib.pyplot as plt

# TimeSeries to hold our data
from gwpy.timeseries import TimeSeries

# Keras
from keras.models import load_model

# Local file with some useful methods
from utils import *

# hls4ml
import hls4ml
from hls4ml.model.profiling import numerical, get_ymodel_keras

# Set the correct libraries path for hls4ml
os.environ['XILINX_HLS']    = '/opt/tools/Xilinx/Vitis_HLS/2023.2'
os.environ['XILINX_VIVADO'] = '/opt/tools/Xilinx/Vivado/2023.2'
os.environ['XILINX_VITIS']  = '/opt/tools/Xilinx/Vitis/2023.2'
os.environ['PATH'] = os.environ["PATH"] + ":" \
                   + os.environ['XILINX_HLS'] + "/bin:" \
                   + os.environ['XILINX_VIVADO'] + "/bin:" \
                   + os.environ['XILINX_VITIS'] + "/bin:"

print("Environment setup correctly!")

Load the model and the test data

In [None]:
# TODO: load the "small encoder" you trained in the previous exercise

# Load the encoder model previously trained
# YOUR CODE HERE
model = load_model(''' YOUR CODE HERE ''')  # Update with correct name

# Load the test data previously saved
X_test = np.load("X_test.npy")
print(f"Loaded {X_test.shape[0]} test samples with shape {X_test[0].shape}")

# Make sure the output directory to store plots and results exists:
os.makedirs('inference', exist_ok=True)

## Part 2: Optimization

### Task 2.1: Precision
You can control how the model parameters are implemented (bit-representation) via the `Precision` parameter in the `hls_config`.
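
To get a feeling for what a given precision implies, the sketch below emulates a signed fixed-point type `ap_fixed<W,I>` (`W` total bits, `I` integer bits including the sign) in plain NumPy. The helper names `fixed_point_range` and `quantize` are made up for this illustration and are not part of _hls4ml_. The emulation assumes the default `ap_fixed` behaviour (truncation and wrap-around); it is only meant to build intuition, not to reproduce the HLS arithmetic bit by bit.

```python
import numpy as np

def fixed_point_range(width, integer):
    """Return (lowest, highest, step) of a signed fixed-point type with
    `width` total bits, `integer` of which are integer bits (sign included)."""
    step = 2.0 ** -(width - integer)
    lowest = -(2.0 ** (integer - 1))
    highest = 2.0 ** (integer - 1) - step
    return lowest, highest, step

def quantize(x, width, integer):
    """Emulate truncation (round towards minus infinity) and wrap-around on overflow."""
    _, _, step = fixed_point_range(width, integer)
    half = 2.0 ** (width - 1)
    raw = np.floor(np.asarray(x, dtype=float) / step)   # truncate to a multiple of step
    raw = (raw + half) % (2.0 * half) - half             # wrap values outside the range
    return raw * step

print(fixed_point_range(16, 6))        # range and resolution of ap_fixed<16,6>
values = np.array([0.1234, 3.14159, 40.0])
print(quantize(values, 16, 6))         # 40.0 is out of range and wraps around
print(quantize(values, 8, 3))          # fewer bits: coarser steps, smaller range
```

With `ap_fixed<16,6>` the representable range is roughly [-32, 32) in steps of 2^-10, so the value `40.0` overflows and wraps to `-24.0`. This is the kind of effect to look for when a narrow precision degrades the predictions.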

In [None]:
# TODO: define a config with a custom precision and explore its performance

# Define a default hls config
precision_config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# YOUR CODE HERE: change the Model precision
# Hint: the config is basically a python dictionary, how do you update its keys?

In [None]:
# TODO: define the HLS model

# Main config
main_cfg = hls4ml.converters.create_config(
    board = 'alveo-u55c',            # Target board
    part = 'xcu55c-fsvh2892-2L-e',   # Target FPGA part
    clock_period = 3,                # Clock period in ns -> 3 ns = ~333 MHz
    backend = 'VivadoAccelerator'    # Backend to convert NN -> firmware
)

# Few more customizations
# YOUR CODE HERE
main_cfg['HLSConfig'] = None                                                      # Use the updated config
main_cfg['IOType'] = 'io_parallel'
main_cfg['AcceleratorConfig']['Platform'] = 'xilinx_u55c_gen3x16_xdma_3_202210_1'
main_cfg['KerasModel'] = None                                                     # Use the loaded model
main_cfg['OutputDir'] = None                                                      # Change the output name

print("="*20, "Main Config", "="*20)
plotting.print_dict(main_cfg)
print("="*50)

# Define the hls model
# YOUR CODE HERE
precision_model = None   # Replace with correct code

# Compile it
precision_model.compile()

You can now run inference with this updated HLS model

In [None]:
# Run inference

# YOUR CODE HERE
y_precision = None # Replace with correct code

# Save predictions for later comparisons
np.save('y_precision.npy', y_precision)
print("Precision config predictions saved as 'y_precision.npy'.")

and compare its performance with the Keras model:
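
As a reminder of what the four metrics in the next cell measure, here is a minimal sketch on two hand-made toy arrays. The arrays and the `toy_` variable names are invented for this illustration and have nothing to do with the real predictions.

```python
import numpy as np

# Toy stand-ins for two sets of predictions: 2 samples with 3 values each
reference = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
candidate = np.array([[1.0, 2.0, 5.0], [1.0, -1.0, 0.0]])

diff = reference - candidate

toy_mse_per_sample = np.mean(diff ** 2, axis=1)   # one value per sample
toy_overall_mse = np.mean(toy_mse_per_sample)     # average over all samples
toy_mae = np.mean(np.abs(diff))                   # mean absolute error
toy_rmse = np.sqrt(toy_overall_mse)               # same units as the data

print(toy_mse_per_sample, toy_overall_mse, toy_mae, toy_rmse)
```

The MSE weights large deviations more strongly than the MAE, so a few badly reconstructed samples show up in the MSE first.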

In [None]:
# TODO: Calculate and visualize comparison metrics
# 1. MSE per sample: mean of (y_cpu - y_hls)^2 along axis 1
# 2. Overall MSE: mean of all MSE per sample
# 3. MAE: mean of absolute differences
# 4. RMSE: square root of overall MSE

# Load the previously saved inferences
y_cpu = np.load("y_cpu.npy")

# YOUR CODE HERE
mse_per_sample = None  # Calculate MSE for each sample
overall_mse = None     # Calculate overall MSE
mae = None             # Calculate MAE
rmse = None            # Calculate RMSE

# Print the metrics
print(f"\n=== Software vs Precision Reconstruction Metrics ===")
print(f"Overall MSE           : {overall_mse:.6f}")
print(f"Average MSE per sample: {np.mean(mse_per_sample):.6f}")
print(f"Min MSE               : {np.min(mse_per_sample):.6f}")
print(f"Max MSE               : {np.max(mse_per_sample):.6f}")
print(f"Mean Absolute Error   : {mae:.6f}")
print(f"RMSE                  : {rmse:.6f}")

# Visualize comparison for first 3 samples (no changes needed)
n_examples = 3
fig, axes = plt.subplots(n_examples, 3, figsize=(15, 3*n_examples))

for i in range(n_examples):
    # CPU predictions
    axes[i, 0].plot(y_cpu[i])
    axes[i, 0].set_title(f'CPU Prediction {i}')
    axes[i, 0].set_ylabel('Amplitude')
    axes[i, 0].grid(True)
    
    # HLS predictions
    axes[i, 1].plot(y_precision[i])
    axes[i, 1].set_title(f'Precision Prediction {i}')
    axes[i, 1].grid(True)
    
    # Difference
    axes[i, 2].plot(y_cpu[i] - y_precision[i])
    axes[i, 2].set_title(f'Error (MSE: {mse_per_sample[i]:.4f})')
    axes[i, 2].grid(True)

plt.tight_layout()
plt.savefig('cpu_hls_precision_comparison.png')
print("\nComparison plot saved as 'cpu_hls_precision_comparison.png'")

# Visualize error distribution (no changes needed)
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.hist(mse_per_sample, bins=20, edgecolor='black')
plt.xlabel('MSE per Sample')
plt.ylabel('Frequency')
plt.title('Distribution of Prediction Errors')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.boxplot(mse_per_sample)
plt.ylabel('MSE')
plt.title('MSE Distribution (Boxplot)')
plt.grid(True)

plt.tight_layout()
plt.savefig('precision_error_distribution.png')
print("Error distribution plot saved as 'precision_error_distribution.png'")

**Questions**
1. How does the new model compare with the Keras model?
2. How does the new model compare with the previous HLS config?
3. Is this what you expected based on the Precision chosen?

### Task 2.2: Profiling
When the `granularity` parameter of the config is set to `name`, _hls4ml_ lets you control the bit representation of the outputs, biases and weights of each layer individually.

By default all values are set to `auto`, a deliberately conservative choice meant to avoid overflow and truncation issues.
We can use the _hls4ml_ "Profiling" method to guide a manual adjustment of the configuration and set the widths explicitly.
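
As a rough numerical counterpart to reading the widths off a plot, the sketch below computes the smallest number of integer bits that covers a set of values and builds a matching type string. The helpers `min_integer_bits` and `suggest_type` are hypothetical names for this illustration, and the number of fractional bits is left as a free choice; this is not how _hls4ml_ picks types internally.

```python
import numpy as np

def min_integer_bits(values):
    """Smallest number of integer bits (sign included, at least 1) of a signed
    fixed-point type whose range [-2**(I-1), 2**(I-1)) contains all values."""
    values = np.asarray(values, dtype=float)
    integer = 1
    while values.max() >= 2.0 ** (integer - 1) or values.min() < -(2.0 ** (integer - 1)):
        integer += 1
    return integer

def suggest_type(values, fractional_bits):
    """Build an ap_fixed type string that covers `values` without overflow."""
    integer = min_integer_bits(values)
    return f"ap_fixed<{integer + fractional_bits},{integer}>"

toy_weights = np.array([-2.5, 0.01, 1.2])        # made-up weights
print(min_integer_bits(toy_weights))
print(suggest_type(toy_weights, fractional_bits=6))
```

The integer bits protect against overflow of the largest values, while the fractional bits decide how small a value can still be resolved, which is the part that needs trial and error.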

Let's define and explore a config with `name` granularity:

In [None]:
# Define an hls config with "name" granularity (no changes needed)
profiling_config = hls4ml.utils.config_from_keras_model(model, granularity='name')

print("="*20, "Profiling Config", "="*20)
plotting.print_dict(profiling_config)
print("="*50)

We can now use the "Profiling" tool to explore how well our hls model bit-representation actually "covers" the model parameters:

> This method plots the distribution of the weights (and biases) as a box and whisker plot. The grey boxes show the values which can be represented with the data types used in the hls_model. Generally, you need the box to overlap completely with the whisker ‘to the right’ (large values) otherwise you’ll get saturation & wrap-around issues. It can be okay for the box not to overlap completely ‘to the left’ (small values), but finding how small you can go is a matter of trial-and-error.

In [None]:
# TODO: Profiling

# Define a minimal model to explore the profiling method
# YOUR CODE HERE
profiling_model = hls4ml.converters.convert_from_keras_model(
    None,            # Your Keras model
    hls_config=None  # Your Profiling config
)

# Run the numerical profiling
# YOUR CODE HERE
prof = numerical(model=None, hls_model=None) # Your Keras model + Your Profiling model

# Bonus:
# To also see the bit representation of the layers' activation functions:
#  - activate tracing in the layers
#  - re-create the profiling model with the updated config
#  - pass the input data to the numerical() method

#for layer in profiling_config['LayerName'].keys():
#    profiling_config['LayerName'][layer]['Trace'] = True

#prof = numerical(model=model, hls_model=profiling_model, X=X_test)

**Exercise:**
- Adjust the precision of the weights and biases of the layers
- Re-run the profiling to check if the new definition is correct
- Repeat iteratively until you are satisfied with the config

Hint: to update the layer you have to modify the profiling config, which is still a python dictionary...
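
A minimal sketch of such an update, on a hand-written stand-in dictionary rather than on the real config. The `'LayerName'` and `'Trace'` keys appear in this notebook; the `'weight'`, `'bias'` and `'result'` keys, the layer name and the type strings are assumptions made for illustration, so check the printout of `profiling_config` above for the actual layout.

```python
# Hand-written stand-in that mimics a "name"-granularity config (assumed layout)
toy_config = {
    'LayerName': {
        'encoder1': {
            'Precision': {'weight': 'auto', 'bias': 'auto', 'result': 'auto'},
            'Trace': False,
        },
    },
}

# Update single entries exactly as with any nested python dictionary
toy_config['LayerName']['encoder1']['Precision']['weight'] = 'ap_fixed<8,2>'
toy_config['LayerName']['encoder1']['Precision']['bias'] = 'ap_fixed<8,2>'

# Inspect the result, layer by layer
for name, layer_cfg in toy_config['LayerName'].items():
    print(name, layer_cfg['Precision'])
```

Entries that are not touched keep their previous value, so the config can be refined one layer at a time between profiling runs.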

**Questions:**
1. Do you remember the `ap_fixed` notation?
2. Will the optimized model use more or fewer resources than the "default config"?

In [None]:
# TODO: compare performances with previous models

# Main config
main_cfg = hls4ml.converters.create_config(
    board = 'alveo-u55c',            # Target board
    part = 'xcu55c-fsvh2892-2L-e',   # Target FPGA part
    clock_period = 3,                # Clock period in ns -> 3 ns = ~333 MHz
    backend = 'VivadoAccelerator'    # Backend to convert NN -> firmware
)

# Few more customizations
# YOUR CODE HERE
main_cfg['HLSConfig'] = None                                                      # Use the updated config
main_cfg['IOType'] = 'io_parallel'
main_cfg['AcceleratorConfig']['Platform'] = 'xilinx_u55c_gen3x16_xdma_3_202210_1'
main_cfg['KerasModel'] = None                                                     # Use the loaded model
main_cfg['OutputDir'] = None                                                      # Change the output name

print("="*20, "Main Config", "="*20)
plotting.print_dict(main_cfg)
print("="*50)

# Define the hls model
# YOUR CODE HERE
profiled_model = None   # Replace with correct code

# Compile it
profiled_model.compile()

# Run inference
# YOUR CODE HERE
y_profiled = None   # Replace with correct code

# Save predictions for later comparisons
np.save('y_profiled.npy', y_profiled)
print("Profiled config predictions saved as 'y_profiled.npy'.")

In [None]:
# TODO: Calculate and visualize comparison metrics

# YOUR CODE HERE
mse_per_sample = None  # Calculate MSE for each sample
overall_mse = None     # Calculate overall MSE
mae = None             # Calculate MAE
rmse = None            # Calculate RMSE

# Print the metrics
print(f"\n=== Software vs Profiled Reconstruction Metrics ===")
print(f"Overall MSE           : {overall_mse:.6f}")
print(f"Average MSE per sample: {np.mean(mse_per_sample):.6f}")
print(f"Min MSE               : {np.min(mse_per_sample):.6f}")
print(f"Max MSE               : {np.max(mse_per_sample):.6f}")
print(f"Mean Absolute Error   : {mae:.6f}")
print(f"RMSE                  : {rmse:.6f}")

# Visualize comparison for first 3 samples (no changes needed)
n_examples = 3
fig, axes = plt.subplots(n_examples, 3, figsize=(15, 3*n_examples))

for i in range(n_examples):
    # CPU predictions
    axes[i, 0].plot(y_cpu[i])
    axes[i, 0].set_title(f'CPU Prediction {i}')
    axes[i, 0].set_ylabel('Amplitude')
    axes[i, 0].grid(True)
    
    # HLS predictions
    axes[i, 1].plot(y_profiled[i])
    axes[i, 1].set_title(f'Profiled Prediction {i}')
    axes[i, 1].grid(True)
    
    # Difference
    axes[i, 2].plot(y_cpu[i] - y_profiled[i])
    axes[i, 2].set_title(f'Error (MSE: {mse_per_sample[i]:.4f})')
    axes[i, 2].grid(True)

plt.tight_layout()
plt.savefig('cpu_hls_profiled_comparison.png')
print("\nComparison plot saved as 'cpu_hls_profiled_comparison.png'")

# Visualize error distribution (no changes needed)
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.hist(mse_per_sample, bins=20, edgecolor='black')
plt.xlabel('MSE per Sample')
plt.ylabel('Frequency')
plt.title('Distribution of Prediction Errors')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.boxplot(mse_per_sample)
plt.ylabel('MSE')
plt.title('MSE Distribution (Boxplot)')
plt.grid(True)

plt.tight_layout()
plt.savefig('profiled_error_distribution.png')
print("Error distribution plot saved as 'profiled_error_distribution.png'")

### Bonus - Tracing
Use the "Trace" method to collect the model outputs at each layer and identify if and where an overly aggressive bit representation was applied:
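
Before running the cell, it may help to see how two traces can be compared layer by layer instead of by eye. The sketch below assumes both traces are dictionaries keyed by layer name that hold arrays of the same shape, and it uses synthetic stand-ins with made-up layer names; `layer_errors` is a hypothetical helper, not an _hls4ml_ function.

```python
import numpy as np

def layer_errors(reference_trace, candidate_trace):
    """Mean absolute difference per layer, for layers present in both traces."""
    return {
        name: float(np.mean(np.abs(reference_trace[name] - candidate_trace[name])))
        for name in reference_trace
        if name in candidate_trace
    }

# Synthetic stand-ins for two traces (layer names and values are made up)
ref = {'layer_a': np.array([[0.5, 1.0]]), 'layer_b': np.array([[2.0, 4.0]])}
cand = {'layer_a': np.array([[0.5, 1.0]]), 'layer_b': np.array([[2.0, 1.0]])}

errors = layer_errors(ref, cand)
worst = max(errors, key=errors.get)   # layer with the largest disagreement
print(errors, worst)
```

The first layer where the error jumps is usually the place to revisit the chosen precision, since later layers only inherit the damage.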

In [None]:
# Get HLS predictions and trace
hls4ml_pred, hls4ml_trace = profiled_model.trace(X_test[:1000])

# Get trace from keras using get_ymodel_keras
# YOUR CODE HERE
keras_trace = get_ymodel_keras('''YOUR CODE HERE''', X_test[:1000]) # Fix name of the loaded keras model

# Print and compare
print("Keras layer 'encoder1', first sample:")
print(keras_trace['encoder1'][0])
print("hls4ml layer 'encoder1', first sample:")
print(hls4ml_trace['encoder1'][0])

### Task 2.3: Other Model Optimizations
There are other "Model-Level" optimization parameters that you can explore:
- `Strategy`: selects the implementation of the core matrix-vector multiplication routine, which can be latency-oriented, resource-saving oriented, or specialized.
  - Possible values: `"Latency"`, `"Resource"`
- `ReuseFactor`: defines the pipeline interval (initiation interval), i.e. how many times each resource is reused in the implementation
  - The `ReuseFactor` can also be defined on a per-layer basis
  - Possible values: powers of `2`
- `BramFactor`: controls which layers will be implemented as BRAM elements
  - Example: with `BramFactor=100`, only layers with more than 100 weights will be exposed as external BRAM.

All these parameters are usually customized together to get to the desired firmware implementation.
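
To see why the `ReuseFactor` trades resources for speed, the sketch below counts multipliers for an idealized fully connected layer in which every multiplier is time-shared across `ReuseFactor` multiplications. The layer size (64 inputs, 16 outputs) and the helper name `multipliers_needed` are made up for illustration; real resource usage depends on the synthesis tool and on the chosen `Strategy`.

```python
def multipliers_needed(n_in, n_out, reuse_factor):
    """Idealized multiplier count for a dense layer with n_in * n_out products,
    where each multiplier is reused `reuse_factor` times."""
    total_products = n_in * n_out
    return -(-total_products // reuse_factor)   # ceiling division

for rf in (1, 2, 4, 8):
    print(f"ReuseFactor={rf}: {multipliers_needed(64, 16, rf)} multipliers")
```

In this idealized picture, doubling the `ReuseFactor` halves the number of multipliers but makes each inference occupy the layer for about twice as many clock cycles.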

In [None]:
# TODO: explore the Strategy/ReuseFactor/BramFactor parameters

# Hint: Follow the same flow we used so far:

# 1. Define a new config
# YOUR CODE HERE
optimal_config = None  # Use correct code here

# Hint: remember we need 1 hls config + 1 main config!

# 2. Update one (or more) parameter
# YOUR CODE HERE

# 3. Define a new HLS model
# YOUR CODE HERE
optimal_model = None  # Use correct code here

# 4. Compile the model and run inference
optimal_model.compile()
y_optimal = optimal_model.predict(X_test)

# 5. Compare with previous results (metrics, plots...)

## Part 3: Check Resources
So far we have only checked the accuracy of the translated model, but a crucial part of optimizing the HLS config is ensuring that the model fits into the FPGA resources and runs fast enough!

**Exercise**
Choose a few of the models you explored above and run the synthesis.
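
When reading the report, latencies are typically quoted in clock cycles. A small sketch for converting them into time, using the 3 ns target clock period set in `create_config` above; the cycle count used here is a made-up number and the helper names are hypothetical. The clock actually achieved after synthesis may differ from the target.

```python
CLOCK_PERIOD_NS = 3.0   # target clock period used in create_config above

def cycles_to_microseconds(cycles, clock_period_ns=CLOCK_PERIOD_NS):
    """Convert a latency in clock cycles to microseconds."""
    return cycles * clock_period_ns / 1000.0

def clock_frequency_mhz(clock_period_ns=CLOCK_PERIOD_NS):
    """Clock frequency in MHz corresponding to a period in ns."""
    return 1000.0 / clock_period_ns

print(f"{clock_frequency_mhz():.0f} MHz")
print(f"{cycles_to_microseconds(200):.2f} us for a hypothetical 200-cycle latency")
```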

In [None]:
# TODO: synthesize the model and check the report

# Hints and suggestions:
# - Remember to define a different output directory for each model
# - Remember that each synthesis will take 5-10 minutes, so choose carefully!

# Run the synthesis
# YOUR CODE HERE
my_model.build(csim=False) # -- Fix the model to be synthesized

# Print the reports
# YOUR CODE HERE
print("Resource usage and latency:")
print_report(''' YOUR CODE HERE ''') # -- use the chosen model's output directory