
FIFO depth optimization #509

Merged
merged 16 commits into from Jul 21, 2022

Conversation

@nicologhielmetti (Contributor) commented Mar 21, 2022

This PR optimizes the depth of the FIFOs in io_stream-based models by estimating their actual usage through cosimulation. For more details, see this presentation: https://docs.google.com/presentation/d/1ItM-8EAlNRdjRk4Cu1LY7gVvCavD90wrE95SsH2wQQw/edit?usp=sharing

How to use it:
https://gist.github.com/nicologhielmetti/3a268be32755448920e9f7d5c78a76d8
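A condensed usage sketch, paraphrasing the gist and the snippets discussed later in this thread (the model, part number, and output directory are placeholders; running it requires an hls4ml/Vivado installation):

```python
import hls4ml

# `model` is any Keras model suitable for io_stream conversion.
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# Enable the FIFO depth optimization flow instead of the default flow.
config['Flows'] = ['vivado:fifo_depth_optimization']

# Oversize the FIFOs for profiling; cosimulation then records actual occupancy.
hls4ml.model.optimizer.get_optimizer(
    'vivado:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)

hls_model = hls4ml.converters.convert_from_keras_model(
    model, io_type='io_stream', hls_config=config,
    output_dir='my-hls-project',       # placeholder
    part='xc7z020clg400-1',            # placeholder
    backend='Vivado')

# Cosimulation is required so the FIFO occupancies can be measured.
hls_model.build(csim=True, synth=True, cosim=True)
```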

@jmduarte (Member) commented:

Hi @nicologhielmetti, thanks for putting together this important PR!

Can you edit your first comment to describe the purpose of the PR and/or link to a presentation on it that you've already given? This will help others who may not be familiar with this development.

Also, recall the general contributing guidelines: https://github.com/fastmachinelearning/hls4ml/blob/master/CONTRIBUTING.md

Thanks!

Review comments (now outdated and resolved) on hls4ml/model/graph.py and hls4ml/backends/vivado/passes/fifo_depth_optimization.py (4 threads).
@thesps (Contributor) commented Mar 22, 2022

Right now this is an optimizer pass in the vivado:extras flow (simply because it is not part of any other flow). This means it's in the default flow for the Vivado and VivadoAccelerator backends, but I don't think we want that. I think the right approach is to make this pass part of a new flow that requires the vivado:ip flow and is not part of the default flow.
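Setting hls4ml's actual flow API aside, the dependency structure being proposed can be sketched generically. All names below are illustrative, not hls4ml's real API: a flow declares the flows it requires, and resolving a flow runs its requirements first, so the optimization flow only ever runs on top of a completed vivado:ip flow and is never pulled in by default.

```python
# Minimal sketch of a flow registry with dependencies (illustrative only;
# not hls4ml's actual API). A flow lists the flows it requires, and
# resolving a flow schedules its requirements before its own passes.
flows = {}

def register_flow(name, passes, requires=()):
    flows[name] = {'passes': list(passes), 'requires': list(requires)}

def resolve(name, seen=None):
    """Return the ordered list of passes to run for `name`,
    visiting each required flow exactly once, before the flow itself."""
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)
    ordered = []
    for req in flows[name]['requires']:
        ordered += resolve(req, seen)
    ordered += flows[name]['passes']
    return ordered

# The optimization flow requires vivado:ip but is NOT registered as part
# of any default flow, so it only runs when explicitly requested.
register_flow('vivado:ip', ['write_hls'])
register_flow('vivado:fifo_depth_optimization',
              ['profile_fifos', 'resize_fifos'], requires=['vivado:ip'])
```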

@thesps (Contributor) left a review comment:

Making one optimizer pass for each backend is a good solution, but now there is code duplicated across the two, which is not good for maintainability. The code in the passes should be factored out a bit more so that there is less duplication (ideally none). It may even help to split the work into multiple passes that make up this flow.

Review comment (now outdated and resolved) on hls4ml/templates/vivado/build_prj.tcl.
@thesps (Contributor) commented Apr 4, 2022

To document the status of this: the optimizer pass is in a good state now. We're holding off on merging since the integration as a 'flow' is not working as expected and may require some changes to the framework. After that, we'll see if anything in this PR has to change, but it would likely be limited to the way it's registered as a flow and its flow requirements.

@jmduarte jmduarte mentioned this pull request Apr 29, 2022
7 tasks
@vloncar vloncar mentioned this pull request Jul 3, 2022
6 tasks
- Update FIFO opt optimizers to depend on the write flow
- Don't call write from the opt, instead apply the writer flow after
Review comment (now outdated and resolved) on hls4ml/writer/vivado_writer.py.
@thesps thesps merged commit 5485c95 into fastmachinelearning:main Jul 21, 2022
@vandenBergArthur commented:
Hi,
I was testing the code from the gist, but I ran into some issues.

When running this code:

```python
import numpy as np
from qkeras import QDense, quantized_bits, quantized_relu, QActivation
from tensorflow.python.keras.layers import Activation
from tensorflow.python.keras.regularizers import l1

seed = 0
np.random.seed(seed)
import tensorflow as tf

tf.random.set_seed(seed)

from tensorflow.keras.models import Sequential

model = Sequential()
model.add(QDense(64, input_shape=(16,), name='fc1',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu1'))
model.add(QDense(32, name='fc2',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu2'))
model.add(QDense(32, name='fc3',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu3'))
model.add(QDense(5, name='output',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='softmax', name='softmax'))

import hls4ml

output_dir = 'test-vivado'
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Flows'] = ['vivado:fifo_depth_optimization']

hls_model = hls4ml.converters.convert_from_keras_model(model, io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       # board='pynq-z2',
                                                       part='xc7z020clg400-1',
                                                       backend='Vivado'
                                                       )

hls_model.build(reset=False, csim=True, synth=True, cosim=True, validation=True, export=True, vsynth=True)
```

I get a NotImplementedError:

```
Layer ModuleWrapper was created by passing
non-serializable argument values in `__init__()`,
and therefore the layer must override `get_config()` in
order to be serializable. Please implement `get_config()`.
```

Example:

```python
class CustomLayer(keras.layers.Layer):
    def __init__(self, arg1, arg2, **kwargs):
        super().__init__(**kwargs)
        self.arg1 = arg1
        self.arg2 = arg2

    def get_config(self):
        config = super().get_config()
        config.update({
            "arg1": self.arg1,
            "arg2": self.arg2,
        })
        return config
```

However, if I change the model to the one used in the documentation:

```python
model = Sequential()
model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
model.add(Dense(32, name='fc2', activation='relu'))
model.add(Dense(32, name='fc3', activation='relu'))
model.add(Dense(5, name='out', activation='softmax'))
model.summary()
```

The error is gone, and I can run the optimization using the Vivado backend.

But when running the code with the VivadoAccelerator backend:

```python
output_dir = 'test-vivadoaccel'
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Flows'] = ['vivadoaccelerator:fifo_depth_optimization']
print("-----------------------------------")

hls4ml.model.optimizer.get_optimizer('vivado:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)

hls_model = hls4ml.converters.convert_from_keras_model(model, io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       board='pynq-z2',
                                                       # part='xc7z020clg400-1',
                                                       backend='VivadoAccelerator'
                                                       )

hls_model.build(reset=False, csim=False, synth=True, cosim=False, validation=False, export=True, vsynth=True, bitfile=True)
```

I get error messages related to resource over-utilization (the full log file is attached below):

```
ERROR: [DRC UTLZ-1] Resource utilization: CARRY4 over-utilized in Top Level Design (This design requires more CARRY4 cells than are available in the target device. This design requires 79047 of such cell types but only 13300 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: FDRE over-utilized in Top Level Design (This design requires more FDRE cells than are available in the target device. This design requires 166407 of such cell types but only 106775 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: LUT as Logic over-utilized in Top Level Design (This design requires more LUT as Logic cells than are available in the target device. This design requires 264856 of such cell types but only 53200 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)
ERROR: [DRC UTLZ-1] Resource utilization: LUT2 over-utilized in Top Level Design (This design requires more LUT2 cells than are available in the target device. This design requires 128948 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: Register as Flip Flop over-utilized in Top Level Design (This design requires more Register as Flip Flop cells than are available in the target device. This design requires 167341 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: Slice LUTs over-utilized in Top Level Design (This design requires more Slice LUTs cells than are available in the target device. This design requires 265364 of such cell types but only 53200 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)
ERROR: [DRC UTLZ-1] Resource utilization: Slice Registers over-utilized in Top Level Design (This design requires more Slice Registers cells than are available in the target device. This design requires 167341 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
```

vivado.log

I am using the latest stable release, v0.7.0 (delphinium), which should support the FIFO depth optimization.

Could someone look into this?

Thanks in advance!

@vloncar (Contributor) commented May 2, 2023

Please don't hijack old pull requests and other issues with unrelated problems. Your issue comes from TensorFlow, not hls4ml. Compare what you do differently in the two snippets you shared. Hint: it's in the very first few lines.
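For readers following the hint: the failing snippet imports `Activation` and `l1` from the private `tensorflow.python.keras` path while `Sequential` comes from the public `tensorflow.keras` path, and mixing the two internal/public Keras module trees is what produces the non-serializable `ModuleWrapper` layers. A likely fix (an assumption based on the hint, not confirmed in this thread) is to use the public path throughout:

```python
# Use the public Keras API consistently; importing from the private
# tensorflow.python.keras path alongside tensorflow.keras is what
# wraps layers in ModuleWrapper and triggers the serialization error.
from tensorflow.keras.layers import Activation
from tensorflow.keras.regularizers import l1
from tensorflow.keras.models import Sequential
```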

@fastmachinelearning fastmachinelearning locked as off-topic and limited conversation to collaborators May 2, 2023
5 participants