Conversation

@jsmonson (Contributor)

Due to a mistake in updating main, we had to reset develop, remove a few commits, and remerge main into develop. Now we need to re-add Jakoba's PR.

@jsmonson jsmonson requested a review from a team as a code owner August 29, 2025 17:17
@jsmonson jsmonson merged commit b2e4c44 into develop Aug 29, 2025
2 checks passed
jsmonson added a commit that referenced this pull request Aug 29, 2025
* Created branch, added codeowners

* Initial migration from internal repo (#5)

* Initial commit

* finn flow: pass absolute path names to finn

* Added scripts for roofline analysis

* Making the output save in the current directory

* release v0.2.0

Enable 4 bits

* Bringing up a branch that is just the plugin framework for the BERT ops that have been added

* Initial cleanup script. Performs some simplification and does some surgery to remove the Dropout layer. For some reason the IdentityOps are not being removed

* Added a simple input arg

* Moving to bert_build

* Added a transformation to reorder the inputs so that the RemoveIdentityOps transformation is effective.
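A pass like this would follow QONNX's Transformation interface, where apply() returns the model plus a graph-modified flag. A hedged sketch; the class name and the matching rule are assumptions, not the repo's actual code:

```python
from qonnx.transformation.base import Transformation

class ReorderIdentityInputs(Transformation):
    """Hypothetical sketch: for commutative elementwise ops, move the
    constant operand into the second input slot so that a downstream
    remove-identity pass (which expects that layout) can match and
    remove no-op Add(0)/Mul(1) nodes."""

    def apply(self, model):
        graph_modified = False
        for node in model.graph.node:
            if node.op_type in ("Add", "Mul") and len(node.input) == 2:
                # initializer first + dynamic tensor second -> swap them
                if (model.get_initializer(node.input[0]) is not None
                        and model.get_initializer(node.input[1]) is None):
                    node.input[0], node.input[1] = node.input[1], node.input[0]
                    graph_modified = True
        return (model, graph_modified)
```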

* Initial cut and laying the groundwork for plugin-based shuffle convert_to_hw operator

* Getting stubs up for shuffle op and starting to populate some

* Cleanup and some more asserts to check permutation list and shapes match up

* Initial helper functions for shuffle work

* Adding the input_generator for the cases where the inner dimension is not migrating.

* Adding latest version of the onnx model and combining cleanup and bringup scripts into a single build script with multiple steps.
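For reference, a multi-step build script of this shape can be expressed with FINN's builder API. A minimal sketch, with the step bodies, filenames, and part number as assumptions:

```python
from finn.builder.build_dataflow import build_dataflow_cfg
from finn.builder.build_dataflow_config import DataflowBuildConfig

def step_cleanup(model, cfg):
    # assumed custom step: simplification plus Dropout/Identity surgery
    return model

def step_bringup(model, cfg):
    # assumed custom step: infer hardware ops (e.g. QuantSoftMax, Shuffle)
    return model

cfg = DataflowBuildConfig(
    output_dir="build_bert",                # assumed
    synth_clk_period_ns=5.0,                # assumed
    fpga_part="xcv80-lsva4737-2MHP-e-S",    # assumed V80 part, per later commits
    steps=[step_cleanup, step_bringup],
    generate_outputs=[],
)
build_dataflow_cfg("bert.onnx", cfg)        # assumed model filename
```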

* Added the infer-QuantSoftMax step to the pipecleaner build script; renamed the brevitas script

* First cut at shuffle specialise layer

* Registering Shuffle_hls

* Added convert step that is currently skipped

* Added a step that attempts to specialise layers on the pipecleaner model

* Using fpgapart from the config instead

* fixed model

* adding some streamlining steps to the build flow which are passing through on the modified input model

* Initial commit

* finnbrainsmith integration

* Added a simple README for now

* fixing typo, thanks @auphelia

* Initial build shuffle tests up

* populating member functions for getting the dtype and instream/outstream width for HLS generation

* Adding the loop_coeffs to the attribute types dict

* Needed to give nodes unique names to start generating hardware

* Adding a custom HLSBackend where the tcl generation is overridden so that we can include the hlsextension directory

* Fixing some portname issues in the generated HLS code

* IP successfully building

* Added cppsim support, passed suspiciously easily

* Added some temporary stop-gaps with a brainsmith_templates so that we can support vector inputs before they appear in finn/dev

* Fixing loop bound/coefficient zipping ordering

* Reshaping now happening properly and avoiding cppsim segfault

* removing IPgen step... for now...

* Adding testing from pytorch for the shuffles

* cppsim from pytorch to hw is passing

* Ramping up testing for all the shuffle types

* Removing redundant reshape in testing

* First cut at rtlsim support for shuffles

* First shuffle RTLSim tests passing

* cleaning up the test a little

* Cleaning up the InferShuffle transformation

* shuffle cppsim codegen cleanup

* fixing bug with shape of output when a reshape was present

* Needed to increase liveness threshold to get all the rtlsims to pass

* Bigger bump needed?

* [BugFix] Fixed issue with using old Brevitas API for quant_act_scale.

* Was including the file from the location

* Using the plugin's template now

* Removing test that doesn't make sense anymore

* Removing INT16 for now, focusing testing on INT8 for the EoY goal

* Adding the latest Brevitas bert build script and starting work on the cleanup scripts

* Datatype name fix

* cppsim integration

* Fixing issues with the decapitation step

* Added model tail removal custom step

* Cleaning up the cleanup script

* Removing redundant cleanup step

* Adding an endtoend script and updating the README

* Ensuring hashes and branches are consistent in the README

* Added a minimal initial endtoend test

* test fixed

* Added a switch to end2end test to attempt IP generation (this is currently failing)

* Extended the test to track how many ops have been successfully specialised and what percentage

* Have the end2end test export a json dashboard file instead for tracking progress.
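A hedged sketch of what such a dashboard export might look like; the helper name and the `_hls`/`_rtl` specialisation check are assumptions:

```python
import json

def dump_dashboard(dashboard_path, model):
    """Hypothetical sketch: count specialised (HLS/RTL backend) nodes
    and export the numbers as a json dashboard for progress tracking."""
    nodes = list(model.graph.node)
    specialised = [n for n in nodes if n.op_type.endswith(("_hls", "_rtl"))]
    data = {
        "num_ops": len(nodes),
        "num_specialised": len(specialised),
        "pct_specialised": 100.0 * len(specialised) / max(len(nodes), 1),
    }
    with open(dashboard_path, "w") as f:
        json.dump(data, f, indent=2)
```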

* refactoring the endtoend test a bit to use fixtures and track progress through the build process

* Updated testing to track various bits

* RTLSim for QuantSoftMax

* Removing prepare_rtlsim stub

* QuantSoftMax RTLSim bugfixes (working now)

* fix issue of passing datatypes instead of datatype strings

* Adding template types to the treereduction operation

* cppsim compiling; for the half type it required some casting that I was not quite sure about.

* ensure that the context array is np.float32

* Getting stuff working with the latest changes

* Clean up remove head and add streamlining steps

* Add streamlining steps for softmax

* add gather to crop

* Fixing linker library paths and include directories for 2024.2 compatibility

* Cleanup

* tracking individual steps now with fixture dependencies; also added the ability to dump data to the dashboard json file

* Refactored testing so that each step in the build flow is a separate pytest fixture. If we want to add a test at any point in the build flow, we can just pass the step fixture in as an argument and the cached build at that specific point will be picked up.
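A minimal sketch of this pattern; `run_step` and the fixture names are stand-ins, not the repo's actual helpers:

```python
import pytest

def run_step(step_name, model):
    # stand-in for the real build-step dispatch (an assumption)
    return model

@pytest.fixture(scope="module")
def initial_model():
    return object()  # stand-in for the loaded ONNX model

@pytest.fixture(scope="module")
def model_after_cleanup(initial_model):
    # module-scoped, so the build up to this step runs once and is cached
    return run_step("cleanup", initial_model)

@pytest.fixture(scope="module")
def model_after_convert_to_hw(model_after_cleanup):
    # chains on the previous step's cached result
    return run_step("convert_to_hw", model_after_cleanup)

def test_at_convert_to_hw(model_after_convert_to_hw):
    # depending on the step fixture picks up the cached build at that point
    assert model_after_convert_to_hw is not None
```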

* Starting to bring in the default steps

* Generate a test for each step added automatically

* Trying as much of the default flow as possible

* removing tests that don't make sense right now

* fixing the custom steps

* Remove call to default convert_to_hw

* Reverting back to old specialise layers

* need dataflow partition, comment out for now

* Removing duplication of the custom steps for BERT and duplicated scripts

* updating endtoend script to include some of the default steps

* commenting out the last few steps for now

* Add a check at the end to see if hls synth went okay

* dashboard json data update

* Cleaning up the custom steps

* Docstring explanations of the custom_steps required for BERT; also cleaned up the flow a bit

* bringing up validation testing of some of the steps

* Adding python execution model for the shuffle

* Added a small function for validation that when a test fails will examine the contexts and show what is the same and what differs

* Silly mistake with the shuffle execute, it was not writing the result back into the context but was returning it
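For context, FINN custom ops run in simulation through an execute_node(context, graph) method, where `context` maps tensor names to numpy arrays. A hedged sketch of the fixed version; the `perm` nodeattr and the transpose body are assumptions for illustration:

```python
import numpy as np

# Sketch of a Shuffle execute_node method after the fix.
def execute_node(self, context, graph):
    node = self.onnx_node
    inp = context[node.input[0]]
    perm = self.get_nodeattr("perm")   # nodeattr name is an assumption
    out = np.transpose(inp, axes=perm)
    # The bug: `return out` left the execution context untouched.
    # The fix: store the result under the node's output tensor name.
    context[node.output[0]] = out
```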

* Elemwise integration

* Adding UINT8 testcase which is the same as the BERT model

* Increasing the timeout on softmax tests

* Changing paths to match new 2024.2 directory structure

* keep things float32 for now

* Fixing case issue on SIMD attribute allowed the compilation to go further

* boilerplate prepare_rtlsim is okay now, removing overridden version

* Input int8, 2024.2 update

* FuncLayerNorm bugfix and FLOAT32 testcase

* "exec_mode" fix and code cleanup

* Merge feature/plugin/layernorm_stf

* support multiple lines

* Added template parameter to enable/disable the quant stage at the end of the softmax

* Adjusting the nodeattr for shuffle so that it is compatible with the set_target_fps transformation

* QuantSoftMax nodeattr compatibility with set_fps_target transformation

* Adding nodeattr so that layernorm is compatible with set_target_fps transformations

* simd to SIMD

* Non Quant softmax passing cppsim

* Validation is having a lot more success with HWSoftMax than with QuantSoftMax

* reintroducing some essential streamlining steps, validation looking a lot better

* Endtoend up without fps_target yet

* integer cycles to stop issue in set_fifo_depths

* Using the v80 part number for the softmax tests

* Fix for the issue causing the stitched rtl sim stall

* Setting reasonable fps target for initial pipecleaning

* Fix for inferring the datatypes in the shuffle node, thanks @auphelia

* Adding some configuration files for the bert end2end flow

* Added some expected input and output npy files

* Removing start step

* Adding correct expected output

* Adding an RTLSim node-by-node test to the pytests. Adjusting the configuration for a default build flow.

* Adding more rtlsim based testing to the end2end pytests

* Saving the context of the node-by-node runs under a different dir name

* generate a reference IO each time due to randomly generated weights in brevitas script

* Adding a custom step that generates the reference IO for each run for validation

* SIMD parameter for shuffles in testing is now properly being set; some tests are now failing cppsim and need fixing

* Not every loop coeff should be divided by simd

* Fixed the shuffle SIMD issue
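A hedged sketch of the corrected folding arithmetic; function and argument names are assumptions. Only the streamed inner dimension's loop bound is divided by SIMD, while the remaining bounds run in full:

```python
from functools import reduce
from operator import mul

def shuffle_exp_cycles(loop_bounds, simd, stream_dim=-1):
    """Hypothetical sketch: expected cycles for a shuffle. Only the loop
    bound of the streamed (inner) dimension is folded by SIMD."""
    bounds = list(loop_bounds)
    bounds[stream_dim] = bounds[stream_dim] // simd
    return reduce(mul, bounds, 1)

# e.g. shuffle_exp_cycles([128, 12, 64], simd=4) -> 128 * 12 * 16 = 24576
```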

* Making more command line arguments available for the parameter sweeping for the bert_build demo scripts

* Whoops, left in a note

* Removing the custom debugging steps from the build flow

* Adding an example bash script to sweep over some parameters.

* Added a simple script to print the results of param sweep

* Cleaning up to remove c++17 warning

* Tidying up comments / warnings for demos

* Using board instead of fpga_part

* Making the output look a bit neater

* Removing unused validation steps

* fix param sweep

* Slight tweak to example param sweep script

* Adding a makefile and configs for some single layer and three layer configurations.

* We have some large fifos in these builds that need to be split.

* Updating the Brevitas model as per @nfraser's suggestion

* Fix circular make dependency

* Works using later qonnx changes

* New FIFO depth configurations for the three layers, folding configuration might not match the main plugin version though.

* Added new preconfigured designs for latest brevitas changes.

* Adding license file headers

* updating to correct link in setup instructions

* Tidying up QuantSoftMax/SoftMax

* Cleaning up utils and testing

* Cleaning up endtoend pytesting

* Adding back in the bitwidth option for the parameter sweep with the new model generation

* Added a parameter for changing the sequence length

* Skipping LN test for now

* Changed the artifact naming convention a little

* Remove extraneous implementation of QuantizeLayerNormalization

* Added a script to generate a config (pre FIFO depth sizing) for a particular folding configuration as we explore the DSE side of the Bert build

* Added a makefile recipe for a maximum folding three layer design for passing to RW team

* Adjusting number of layers on the design

* Manually control the fifo depth stage instead of setting it if a param file is present

* Need to come up with better arg naming for parameters, maybe just enforce longargs?

* Makefile recipes use the generation script for various SIMD/PE configurations rather than prebaking them

---------

Co-authored-by: aziz bahri <azizb@amd.com>
Co-authored-by: azizb-xlnx <48930381+azizb-xlnx@users.noreply.github.com>
Co-authored-by: root <root@TAFK>
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: jsmonson <jsmonson@gmail.com>

* BERT builder flow arguments for fifosim n_inferences (#6)

* Added extra arguments to reflect the latest change in finn/custom/transformer, which enables overriding the number of inferences that the fifo depth sizing stage performs.

* Fixing the recipes and simplifying

* [SoftMax] New Improved SoftMax  (#11)

* Improvements to SoftMax hardware efficiency and also adding support for ap_float<W,I> datatypes.

* Fixes and compiler integration for new SoftMax

* fixing license header

* [BugFix] Issues with incorrect configuration of SIMD for ShuffleB nodes on three layer designs (#9)

* Adding a check to make sure that we don't accidentally set SIMD for shuffleB yet; also updated the config generation so that we do not accidentally set the wrong shuffle in later layers

* Cleaning up the build scripts a little thanks @auphelia

* Moving the constraining of shuffle parameters and pumpedCompute to temporary custom transformations so that they are more reliable

* Removing the temporary check and relying on the custom pass for now until the parallel transpose op comes online

* Fixed the return type of the custom transformations

* Adding cycle testing to custom op test scripts  (#7)

* Added cycle testing to softmax test script
Implemented cycle testing code, which compares the layer's rtlsim cycles with its expected cycles (found using QONNX's ModelWrapper.analysis).
Copied from https://github.com/Xilinx/finn/blob/00bf8279f2ed20500f3046b395b24c08c8c82325/tests/fpgadataflow/test_fpgadataflow_fmpadding.py
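Condensed from the linked FINN test, the check compares the cycles_rtlsim nodeattr recorded during simulation against FINN's analytical estimate:

```python
import numpy as np
from qonnx.custom_op.registry import getCustomOp
from finn.analysis.fpgadataflow.exp_cycles_per_layer import exp_cycles_per_layer

def check_rtlsim_cycles(model, node, atol=10):
    """Compare cycles recorded during rtlsim with the analytical estimate
    from exp_cycles_per_layer, as in the linked FINN fmpadding test."""
    inst = getCustomOp(node)
    cycles_rtlsim = inst.get_nodeattr("cycles_rtlsim")
    exp_cycles = model.analysis(exp_cycles_per_layer)[node.name]
    assert np.isclose(exp_cycles, cycles_rtlsim, atol=atol)
    assert exp_cycles != 0
```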

* Updated cycles test op type, imported exp_cycles_per_layer
- The rtlsim cycles test for the softmax custom op was failing due to the incorrect op type string being used ("FMPadding" instead of "HWSoftmax").
- The FINN method, exp_cycles_per_layer, was not imported, causing the test to fail.

* Implemented cycles test for Shuffle custom op
- Implemented a test in test_fpgadataflow_shuffle.py which compares the Shuffle node's expected cycles with the rtlsim's reported cycles.
- Ran this test; it currently fails. The expected cycles (12288) do not fall within a tolerance of 10 of the rtlsim cycles (23475).

* Implemented alternate LayerNorm test script
- The existing LayerNorm test is incomplete, and doesn't execute. To bridge the gap in testing, a new test was written based on other custom operations tests.
- The new test, test_fpga_dataflow_layernorm_hw_custom_op(), is in the same file as the old test.
- The cppsim version of the test currently passes. The rtlsim version fails due to the expected cycles (456) not matching the simulated cycles (63516). Testing was done using the [ifm_dim0-rtlsim-INT9-simd4-hls] configuration.

* Removed rtlsim_trace from LayerNorm, updated comments
Implemented reviewer suggested changes:
- Removed rtlsim_trace attribute from the test's LayerNorm node.
- Updated comments:
  - In construct_onnx_model()'s header comment, changed "Finn" -> "FINN", added info about the LayerNorm's Scale and Bias tensors.
  - In test_fpga_dataflow_layernorm_hw_custom_op()'s header comment, explained that this test is missing the inferred eltwise operations.

* Added a custom step that extracts metadata for the shell integration flow (#14)

* [TinyBERT] Removing accidentally included start_step in the endtoend flow (#15)

* Removing the accidentally included startstep in the endtoend flow

* Restoring the default to 8 for bitwidth

* Removing rtlsim_backend after pyverilator deprecation (#16)

* Name stylize BrainSmith --> Brainsmith (#17)

Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>

* [TinyBERT] Add ref IO to stitched_ip as part of metadata handover (#18)

* Include the reference IO as part of the metadata handover

* typo fix

* [Testing] Created OpTest class for abstracting CustomOp tests (#19)

* Added cycle testing to softmax test script
Implemented cycle testing code, which compares the layer's rtlsim cycles with its expected cycles (found using QONNX's ModelWrapper.analysis).
Copied from https://github.com/Xilinx/finn/blob/00bf8279f2ed20500f3046b395b24c08c8c82325/tests/fpgadataflow/test_fpgadataflow_fmpadding.py

* Updated cycles test op type, imported exp_cycles_per_layer
- The rtlsim cycles test for the softmax custom op was failing due to the incorrect op type string being used ("FMPadding" instead of "HWSoftmax").
- The FINN method, exp_cycles_per_layer, was not imported, causing the test to fail.

* Implemented cycles test for Shuffle custom op
- Implemented a test in test_fpgadataflow_shuffle.py which compares the Shuffle node's expected cycles with the rtlsim's reported cycles.
- Ran this test; it currently fails. The expected cycles (12288) do not fall within a tolerance of 10 of the rtlsim cycles (23475).

* Implemented alternate LayerNorm test script
- The existing LayerNorm test is incomplete, and doesn't execute. To bridge the gap in testing, a new test was written based on other custom operations tests.
- The new test, test_fpga_dataflow_layernorm_hw_custom_op(), is in the same file as the old test.
- The cppsim version of the test currently passes. The rtlsim version fails due to the expected cycles (456) not matching the simulated cycles (63516). Testing was done using the [ifm_dim0-rtlsim-INT9-simd4-hls] configuration.

* Removed rtlsim_trace from LayerNorm, updated comments
Implemented reviewer suggested changes:
- Removed rtlsim_trace attribute from the test's LayerNorm node.
- Updated comments:
  - In construct_onnx_model()'s header comment, changed "Finn" -> "FINN", added info about the LayerNorm's Scale and Bias tensors.
  - In test_fpga_dataflow_layernorm_hw_custom_op()'s header comment, explained that this test is missing the inferred eltwise operations.

* Created OpTest class for abstracting CustomOp tests
- This class helps reduce shared boilerplate code between tests for custom FINN ops.
- The OpTest class is designed to be inherited by custom test classes. These custom test classes will inherit pre-written commonly used tests, and helper functions to make writing tests easier.
- An example of a test designed using OpTest can be found at the end of `./test/fpgadataflow/test_fpgadataflow_layernorm.py`.
- While functional, the class is still a work in progress, and more functionality will be added in alignment with the needs of the engineers who use it.

* Applied linting
- Applied linting using black's default settings.

* Created target_fpga fixture, removed prints, added SIMD ids
- Target FPGA, as used by the model_specialise fixture, is now a fixture, which can be overridden by a test class.
- Removed print statements in op_test.py that were used for debugging
- Added IDs to TestLayerNorm's SIMD parameters. Pytest now displays SIMD1, SIMD2, SIMD4, instead of 1, 2, 4. More human-readable!

* Implemented reviewer suggestions, new 'target_node' fixture, improved typing
- Implemented @STFleming's suggestions:
  - The `exec_mode` comparisons at lines 65 and 68 now use `==` instead of `is`.
  - The reference to `LayerNorm` in the comment at line 173 has been removed.
  - `apply_transforms()` no longer uses an `assert`, instead it raises a `RuntimeError`.
- Implemented a new fixture, `target_node()`. This fixture returns an integer specifying the index in the model of the node we're testing. This means a model can contain nodes/layers other than the one we want to test.
- Improved typing consistency throughout 'op_test.py': `input_tensors()` and `apply_transforms()` were missing parameter type hints.
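Putting the pieces above together, a hedged sketch of the OpTest shape being described; names and defaults are assumptions where not stated in the log:

```python
import pytest

class OpTest:
    # Shared fixtures that inheriting op-specific test classes can override.
    @pytest.fixture
    def target_fpga(self):
        return "xcv80-lsva4737-2MHP-e-S"  # assumed default part

    @pytest.fixture
    def target_node(self):
        # index in the model of the node under test, so models may
        # contain nodes other than the one being tested
        return 0

    def apply_transforms(self, model, transforms):
        # raises instead of asserting, per the review feedback above
        for tf in transforms:
            model = model.transform(tf)
            if model is None:
                raise RuntimeError(f"transform {tf} returned no model")
        return model

class TestLayerNorm(OpTest):
    @pytest.fixture(params=[1, 2, 4], ids=["SIMD1", "SIMD2", "SIMD4"])
    def simd(self, request):
        return request.param

    def test_simd_is_positive(self, simd, target_node):
        assert simd > 0 and target_node >= 0
```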

* Initial repository structure (#20)

* Formatting bert_build as a job

* Further iteration/brainstorming

* Initial FINN docker transplant

* Adding deps to git ignore

* [Deps] Restructure python github repo installs (#8)

Co-authored-by: auphelia <jakobapk@web.de>

* Initial docker structuring for BrainSmith

* entrypoint path bugfix

* [Docker] Enable interactive mode for docker container (#10)

* Added model profiling scripts

* Hotpatch to remove pyverilator

* Normalize line endings in SUPPORT.md

* finnbrainsmith --> brainsmith/finnlib paths

* Tools folder restructure

* Fix gen_bert paths & name in expand_norms

* Custom QONNX branch to fix is_finn

* Removed old QuantLayerNorm func

* Initial job runner structuring

* Job structure v0, structure for profiling improvements

* Updated readme

* Template path fix

* Unused import and formatting cleanup

* FP IP import fix

* Docker updates for pyxsi

* Pyxsi path fix

* Onnx path + linting fixes

* Removed finnlib, moving up sub folders

* Moved run_job to core for consistency

* Linting cleanup

* Updated README

* Added RTL placeholder

* Typo & gitignore fixes

* Updated finnlib to brainsmith in tests

* bert_steps path fix in tests

* Fix punctuation in README instructions.

* Update LICENSE: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update LICENSE: Brainsmith name fix 2

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update README.md - typo fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update brainsmith/tools/README.md: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update docker/entrypoint.sh: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update docker/entrypoint.sh: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Removed exec from fetch_repos

* Copyright typo fix

---------

Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Add Custom ONNXSCRIPT repository to BrainSmith (#21)

* add custom onnxscript branch

* Add TODO for reconciling onnxscript dependencies

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: Thomas Keller <tkeller787@gmail.com>

* Revert "Add Custom ONNXSCRIPT repository to BrainSmith (#21)" (#22)

This reverts commit 15fb647.

* [CustomOps] Update brainsmith custom ops with changes on finn side (#25)

* Initial continuous integration tests (#24)

* Initial attempt at docker build action

* Added branch name to action

* PR & weekly tests for dev/ci-actions

* Added self-hosted runner

* Adjusted runs-on label

* path fix

* Added debug to orient pwd

* Added pytest keyword through run-docker.sh

* Fixed license path

* Updated upload-artifacts to v4

* Reorganize bert demo for github action

* Updated run-docker CLI args

* Added e2e test to actions

* Removed build artifacts

* Fix ci.yml run-docker statement

* Removed "push" trigger

* Merge with develop changes and add num workers env variable

* Re-added push trigger for testing

* Fix merge

* Temporarily disabled docker and pytest for e2e validation

* Fix BSMITH_BUILD_DIR env variable

* Remove push trigger, since PR trigger is sufficient

* Remove testing branches and triggers for PR

* Remove auto-gen docs

* Delete demos/bert/configs/l1_simd12_pe8.json

Removed extraneous config from test

---------

Co-authored-by: Ubuntu <azureuser@brainsmith-dev2.woh15gx5mv0exiu0m5xe0hjytg.dx.internal.cloudapp.net>

* Revert onnxscript add Revert (#26)

* add custom onnxscript branch

* fix torch error

* readd todo

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* Fix Dynamic Matmul Initial Config For BERT-Large (#28)

* fix formatting with copilot

* fix dynamic matmul config when sizing is not divisible by 3

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* fix argparse arg that could never be false (#30)

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* Patch Pull Request #30: Update args variable to match new argument name (#31)

* fix argparse arg that could never be false
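For context, the classic form of this bug is an argparse flag declared with type=bool, where bool("False") is still truthy. A sketch of the bug and one common fix; the flag name is borrowed from the next commit and is an assumption:

```python
import argparse

# The bug: type=bool calls bool("False"), which is True, so the option
# can never actually become false from the command line.
bad = argparse.ArgumentParser()
bad.add_argument("--fifosizing", type=bool, default=True)
print(bad.parse_args(["--fifosizing", "False"]).fifosizing)  # True -- bug

# One common fix: an explicit on/off pair of flags sharing one dest.
good = argparse.ArgumentParser()
good.add_argument("--fifosizing", dest="fifosizing",
                  action="store_true", default=True)
good.add_argument("--no-fifosizing", dest="fifosizing",
                  action="store_false")
print(good.parse_args(["--no-fifosizing"]).fifosizing)  # False
```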

* update fifosizing arg in hw compiler to match new argument name

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* update pytorch to 2.7 (#34)

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* [Hotfix] Cleanup CI runner artifacts (#33)

* Added cleanup steps and job

* Made num_default_worker env variable

* update brevitas commit hash (#36)

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* Set onnxscript to a fixed commit id (#37)

* set to a fixed commit #

* moved up to previous latest commit

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* Hardware Kernel Generator: RTL Parser & wrapper generation (#32)

* Debugging ckpt 0

* Fucntional parser

* Organized docs

* Fix interface docs name

* Functional interface, broke parser, debugging

* Debug ckpt 0

* Debug ckpt 1 -- functional width parsing

* Debug ckpt 2

* rtl_parser test suite passing

* All pytests passing

* parser.py audit

* Refactoring parser.py

* Removed old tests

* Organized docs & logs

* Cleanup interface files

* Added license header to tests

* Updated readme

* Improved docstrings, combined interface-types+data

* Updated readme

* Add md type to convo log

* Initial RTL template generation

* HKG test passes

* Improve AXI detection resiliency

* Debug ckpt 0

* Functional RTL Template generation

* Initial structure

* Initial debug ckpt

* Cleanup & streamlining pragma & interface code

* test_rtl_parser core

* Partial interface refactor

* rtl_parser test suite fully passing

* Begin HWCOp implementation

* Fix onnxscript dependencies

* Removed test artifacts

* RTL parser readme & comment cleanup, initial layout detector

* Test file cleanup

* RTL parser test suite clean-up & refactor

* Cleaned up placeholders

* Consolidated LLM artifacts to docs/rtl_parser

* Cleaned up old examples

* Removed duplicate, outdated test

* Removed layout files, fixed license headers

* Added HKG readme

* Add BERT-Large CI Test (#40)

* set to a fixed commit #

* add bert large single layer test

* moved up to previous latest commit

* reduce folding config

* update folding parameter to account for absence of pTranspose

* add bi-weekly bert-large single layer ci test.

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

* Docker workflow modernization (#38)

Comprehensive modernization of Brainsmith's Docker workflow and GitHub Actions CI system, introducing persistent container management and modular action architecture that reduces code duplication by 75% while achieving 73% performance improvements for multi-command workflows.

* Update FINN (#41)

* Convert pyxsi commands to finnxsi

* [bert-folding] Adjust folding script to correctly set SIMD and PE for dynamic MVAUs

* [Deps] Reset qonnx url to main repo

* Reset finn commit to custom/transformer

* Core DSE & Plugin Library (#44)

  Complete architectural overhaul introducing:
  - Plugin system for extensible kernels, transforms, and build steps
  - Blueprint YAML interface for declarative design space configuration
  - Segment-based execution tree for efficient DSE with computation reuse
  - Unified `smithy` CLI replacing run-docker.sh with improved container management
  - Reorganized module structure: custom_op → kernels, transformation → transforms
  - New core modules for design parsing, DSE runners, and framework adapters
  - Modular GitHub Actions workflows replacing monolithic CI

  Breaking changes:
  - Module paths changed (e.g., brainsmith.custom_op → brainsmith.kernels)
  - Docker workflow now uses ./smithy instead of ./run-docker.sh
  - Configuration format migrated from Python to YAML blueprints
  - Renamed demos/ to examples/ following standard conventions

  This refactor establishes the foundation for planned features including
  multi-layer offload, parallelized tree execution, and automated kernel
  integration while maintaining backward compatibility for the BERT demo.
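A migration sketch for the module renames above; the imported symbols are illustrative, taken from ops mentioned earlier in this log:

```python
# Migration sketch for the #44 renames (symbol names are assumptions):
#
#   before: from brainsmith.custom_op import Shuffle
#   after:  from brainsmith.kernels import Shuffle
#
#   before: from brainsmith.transformation import InferShuffle
#   after:  from brainsmith.transforms import InferShuffle
```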

* [Deps] Update and fix finn and qonnx deps (#54)

Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>

---------

Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: Shane Fleming <shane.fleming@amd.com>
Co-authored-by: aziz bahri <azizb@amd.com>
Co-authored-by: azizb-xlnx <48930381+azizb-xlnx@users.noreply.github.com>
Co-authored-by: root <root@TAFK>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: Daniel Penrose <Daniel.Penrose@amd.com>
Co-authored-by: Thomas Keller <tkeller787@gmail.com>
Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
Co-authored-by: Ubuntu <azureuser@brainsmith-dev2.woh15gx5mv0exiu0m5xe0hjytg.dx.internal.cloudapp.net>
tafk7 added a commit that referenced this pull request Sep 27, 2025
tafk7 added a commit that referenced this pull request Sep 27, 2025
tafk7 pushed a commit that referenced this pull request Sep 27, 2025
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>