
FIFO depth optimization #509

Merged
merged 16 commits into from Jul 21, 2022

Conversation

@nicologhielmetti (Contributor) commented Mar 21, 2022

This PR optimizes the depth of the FIFOs in io_stream-based models by estimating their actual usage through cosimulation. For more details, see this presentation: https://docs.google.com/presentation/d/1ItM-8EAlNRdjRk4Cu1LY7gVvCavD90wrE95SsH2wQQw/edit?usp=sharing

How to use it:
https://gist.github.com/nicologhielmetti/3a268be32755448920e9f7d5c78a76d8
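A condensed usage sketch, paraphrasing the gist and the snippets discussed later in this thread (the model, part number, and output directory are placeholders; running it requires an hls4ml/Vivado installation):

```python
import hls4ml

# `model` is any Keras model suitable for io_stream conversion.
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# Enable the FIFO depth optimization flow instead of the default flow.
config['Flows'] = ['vivado:fifo_depth_optimization']

# Oversize the FIFOs for profiling; cosimulation then records actual occupancy.
hls4ml.model.optimizer.get_optimizer(
    'vivado:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)

hls_model = hls4ml.converters.convert_from_keras_model(
    model, io_type='io_stream', hls_config=config,
    output_dir='my-hls-project',       # placeholder
    part='xc7z020clg400-1',            # placeholder
    backend='Vivado')

# Cosimulation is required so the FIFO occupancies can be measured.
hls_model.build(csim=True, synth=True, cosim=True)
```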

@jmduarte (Member) commented:

Hi @nicologhielmetti, thanks for putting together this important PR!

Can you edit your first comment to describe the purpose of the PR and/or link to a presentation on it that you've already given? This will help others who may not be familiar with this development.

Also, recall the general contributing guidelines: https://github.com/fastmachinelearning/hls4ml/blob/master/CONTRIBUTING.md

Thanks!

Review comments (now outdated and resolved) on hls4ml/model/graph.py and hls4ml/backends/vivado/passes/fifo_depth_optimization.py (4 threads).
@thesps (Contributor) commented Mar 22, 2022

Right now this is an optimizer pass in the vivado:extras flow (simply because it is not part of any other flow). This means it's in the default flow for the Vivado and VivadoAccelerator backends, but I don't think we want that. I think the right approach is to make this pass part of a new flow that requires the vivado:ip flow and is not part of the default flow.
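Setting hls4ml's actual flow API aside, the dependency structure being proposed can be sketched generically. All names below are illustrative, not hls4ml's real API: a flow declares the flows it requires, and resolving a flow runs its requirements first, so the optimization flow only ever runs on top of a completed vivado:ip flow and is never pulled in by default.

```python
# Minimal sketch of a flow registry with dependencies (illustrative only;
# not hls4ml's actual API). A flow lists the flows it requires, and
# resolving a flow schedules its requirements before its own passes.
flows = {}

def register_flow(name, passes, requires=()):
    flows[name] = {'passes': list(passes), 'requires': list(requires)}

def resolve(name, seen=None):
    """Return the ordered list of passes to run for `name`,
    visiting each required flow exactly once, before the flow itself."""
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)
    ordered = []
    for req in flows[name]['requires']:
        ordered += resolve(req, seen)
    ordered += flows[name]['passes']
    return ordered

# The optimization flow requires vivado:ip but is NOT registered as part
# of any default flow, so it only runs when explicitly requested.
register_flow('vivado:ip', ['write_hls'])
register_flow('vivado:fifo_depth_optimization',
              ['profile_fifos', 'resize_fifos'], requires=['vivado:ip'])
```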

@thesps (Contributor) left a review comment:

Making one optimizer pass for each backend is a good solution, but now there is code duplicated across the two, which is not good for maintainability. The code in the passes should be factored out a bit more so that there is less duplication (ideally none). It may even help to split the work into multiple passes that make up this flow.

Review comment (now outdated and resolved) on hls4ml/templates/vivado/build_prj.tcl.
@thesps (Contributor) commented Apr 4, 2022

To document the status of this: the optimizer pass is in a good state now. We're holding off on merging since the integration as a 'flow' is not working as expected and may require some changes to the framework. After that, we'll see if anything in this PR has to change, but it would likely be limited to the way it's registered as a flow and its flow requirements.

@jmduarte jmduarte mentioned this pull request Apr 29, 2022
7 tasks
@vloncar vloncar mentioned this pull request Jul 3, 2022
6 tasks
- Update FIFO opt optimizers to depend on the write flow
- Don't call write from the opt, instead apply the writer flow after
Review comment (now outdated and resolved) on hls4ml/writer/vivado_writer.py.
@thesps thesps merged commit 5485c95 into fastmachinelearning:main Jul 21, 2022
@vandenBergArthur commented:
Hi,
I was testing the code from the gist, but I ran into some issues.

When running this code:

```python
import numpy as np
from qkeras import QDense, quantized_bits, quantized_relu, QActivation
from tensorflow.python.keras.layers import Activation
from tensorflow.python.keras.regularizers import l1

seed = 0
np.random.seed(seed)
import tensorflow as tf

tf.random.set_seed(seed)

from tensorflow.keras.models import Sequential

model = Sequential()
model.add(QDense(64, input_shape=(16,), name='fc1',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu1'))
model.add(QDense(32, name='fc2',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu2'))
model.add(QDense(32, name='fc3',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu3'))
model.add(QDense(5, name='output',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='softmax', name='softmax'))

import hls4ml

output_dir = 'test-vivado'
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Flows'] = ['vivado:fifo_depth_optimization']

hls_model = hls4ml.converters.convert_from_keras_model(model, io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       # board='pynq-z2',
                                                       part='xc7z020clg400-1',
                                                       backend='Vivado'
                                                       )

hls_model.build(reset=False, csim=True, synth=True, cosim=True, validation=True, export=True, vsynth=True)
```

I get a NotImplementedError:

```
Layer ModuleWrapper was created by passing
non-serializable argument values in `__init__()`,
and therefore the layer must override `get_config()` in
order to be serializable. Please implement `get_config()`.
```

Example:

```python
class CustomLayer(keras.layers.Layer):
    def __init__(self, arg1, arg2, **kwargs):
        super().__init__(**kwargs)
        self.arg1 = arg1
        self.arg2 = arg2

    def get_config(self):
        config = super().get_config()
        config.update({
            "arg1": self.arg1,
            "arg2": self.arg2,
        })
        return config
```

However, if I change the model to the one used in the documentation:

```python
model = Sequential()
model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
model.add(Dense(32, name='fc2', activation='relu'))
model.add(Dense(32, name='fc3', activation='relu'))
model.add(Dense(5, name='out', activation='softmax'))
model.summary()
```

The error is gone, and I can run the optimization using the Vivado backend.

But when running the code with the VivadoAccelerator backend:

```python
output_dir = 'test-vivadoaccel'
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Flows'] = ['vivadoaccelerator:fifo_depth_optimization']
print("-----------------------------------")

hls4ml.model.optimizer.get_optimizer('vivado:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)

hls_model = hls4ml.converters.convert_from_keras_model(model, io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       board='pynq-z2',
                                                       # part='xc7z020clg400-1',
                                                       backend='VivadoAccelerator'
                                                       )

hls_model.build(reset=False, csim=False, synth=True, cosim=False, validation=False, export=True, vsynth=True, bitfile=True)
```

I get error messages related to resource over-utilization (the full log file is attached below):

```
ERROR: [DRC UTLZ-1] Resource utilization: CARRY4 over-utilized in Top Level Design (This design requires more CARRY4 cells than are available in the target device. This design requires 79047 of such cell types but only 13300 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: FDRE over-utilized in Top Level Design (This design requires more FDRE cells than are available in the target device. This design requires 166407 of such cell types but only 106775 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: LUT as Logic over-utilized in Top Level Design (This design requires more LUT as Logic cells than are available in the target device. This design requires 264856 of such cell types but only 53200 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)
ERROR: [DRC UTLZ-1] Resource utilization: LUT2 over-utilized in Top Level Design (This design requires more LUT2 cells than are available in the target device. This design requires 128948 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: Register as Flip Flop over-utilized in Top Level Design (This design requires more Register as Flip Flop cells than are available in the target device. This design requires 167341 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: Slice LUTs over-utilized in Top Level Design (This design requires more Slice LUTs cells than are available in the target device. This design requires 265364 of such cell types but only 53200 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)
ERROR: [DRC UTLZ-1] Resource utilization: Slice Registers over-utilized in Top Level Design (This design requires more Slice Registers cells than are available in the target device. This design requires 167341 of such cell types but only 106400 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
```

vivado.log

I am using the latest stable release, v0.7.0 (delphinium), which should support the FIFO depth optimization.

Could someone look into this?

Thanks in advance!

@vloncar (Contributor) commented May 2, 2023

Please don't hijack old pull requests and other issues with unrelated problems. Your issue comes from TensorFlow, not hls4ml. Compare what you do differently in the two snippets you shared. Hint: it's in the very first few lines.
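For readers following the hint: the failing snippet imports `Activation` and `l1` from the private `tensorflow.python.keras` path while `Sequential` comes from the public `tensorflow.keras` path, and mixing the two internal/public Keras module trees is what produces the non-serializable `ModuleWrapper` layers. A likely fix (an assumption based on the hint, not confirmed in this thread) is to use the public path throughout:

```python
# Use the public Keras API consistently; importing from the private
# tensorflow.python.keras path alongside tensorflow.keras is what
# wraps layers in ModuleWrapper and triggers the serialization error.
from tensorflow.keras.layers import Activation
from tensorflow.keras.regularizers import l1
from tensorflow.keras.models import Sequential
```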

@fastmachinelearning fastmachinelearning locked as off-topic and limited conversation to collaborators May 2, 2023
5 participants