Add Asynchronous Inference option #21

LinusJungemann · 2025-07-14T07:59:30Z

No description provided.

Copilot

Pull Request Overview

This PR introduces support for asynchronous inference by adding dynamic queue implementations and callback/drain mechanisms, and updates tests and benchmarks accordingly.

Added DynamicSPSCQueue and replaced static queue usage in asynchronous buffer wrappers.
Extended DeviceHandler, Accelerator, and BaseDriver APIs with registerCallback and drain methods.
Updated integration tests and benchmarks to remove fixed sleeps and rely on blocking or callback-driven result retrieval.

Reviewed Changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| unittests/utils/SPSCQueueTest.cpp | Replaced range-insert with manual push_back loop |
| src/FINNCppDriver/utils/FPGAReset.hpp | Added unit-test bypass for reset and included chrono literals in header |
| src/FINNCppDriver/core/BaseDriver.hpp | Exposed `registerCallback` and `drain` APIs for async inference |
| integrationtest/AsyncInference.cpp | Removed sleeps and switched to blocking `getResults()` |

Comments suppressed due to low confidence (1)

unittests/utils/SPSCQueueTest.cpp:514

[nitpick] Instead of manually looping and calling `push_back`, you can use `all_consumed.insert(all_consumed.end(), results.begin(), results.begin() + dequeued)` for clearer intent and potentially better performance.

                for (size_t i = 0; i < dequeued; i++) {
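The two forms in this nitpick can be compared side by side. The following is a minimal sketch, not the test's actual code: `appendWithLoop` and `appendWithInsert` are hypothetical helper names, the element type `int` is an assumption, and both helpers assume `dequeued <= results.size()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends the first `dequeued` results one element at a time.
std::vector<int> appendWithLoop(std::vector<int> dst, const std::vector<int>& results, std::size_t dequeued) {
    for (std::size_t i = 0; i < dequeued; i++) {
        dst.push_back(results[i]);
    }
    return dst;
}

// Hypothetical helper: appends the same elements with a single range insert.
// Precondition (assumed): dequeued <= results.size().
std::vector<int> appendWithInsert(std::vector<int> dst, const std::vector<int>& results, std::size_t dequeued) {
    dst.insert(dst.end(), results.begin(), results.begin() + static_cast<std::ptrdiff_t>(dequeued));
    return dst;
}
```

Both helpers produce the same contents; the range insert lets the vector grow once for the whole range instead of possibly reallocating several times during the loop.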

Copilot · 2025-07-14T12:32:39Z

src/FINNCppDriver/utils/FPGAReset.hpp

+#include <thread>
+#include <vector>
+
+using namespace std::chrono_literals;


[nitpick] Avoid `using namespace` directives at namespace scope in headers: they take effect in every translation unit that includes the header. Consider qualifying the chrono durations where needed, or moving the using directive into a function body or the implementation file.

Suggested change

using namespace std::chrono_literals;

// Removed `using namespace std::chrono_literals;` to avoid namespace pollution.
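Two ways of keeping the literals out of the header's namespace scope are sketched below. The function names and the 200 ms value are purely illustrative assumptions and are not taken from `FPGAReset.hpp`.

```cpp
#include <chrono>

// Hypothetical helper: the using directive is limited to this function body,
// so including the header does not bring the literals into the caller's scope.
inline std::chrono::milliseconds settleTimeWithLiteral() {
    using namespace std::chrono_literals;
    return 200ms;
}

// Hypothetical helper: same duration, spelled without any literal operator.
inline std::chrono::milliseconds settleTimeExplicit() {
    return std::chrono::milliseconds(200);
}
```

Either form keeps the header self-contained; the explicit constructor avoids the directive entirely at the cost of slightly longer code.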

Copilot · 2025-07-14T12:32:39Z

src/FINNCppDriver/core/BaseDriver.hpp

+        /**
+         * @brief Register a callback function to be called when the inference of a batch is finished
+         */
+        template<typename = std::enable_if<!SynchronousInference>>


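[nitpick] `std::enable_if<...>` without `::type` always names a valid type, so this default template argument never disables the method when the condition is false. To constrain these methods, use `std::enable_if_t` or C++20 concepts (`requires`). If `SynchronousInference` is a parameter of the enclosing class template, the condition should also depend on a parameter of the member template itself, so that a false condition is a substitution failure rather than a possible hard error when the class is instantiated.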

Suggested change

template<typename = std::enable_if<!SynchronousInference>>

template<typename = std::enable_if_t<!SynchronousInference>>
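The difference can be checked at compile time. This is a minimal sketch under the assumption that `SynchronousInference` is a template parameter of the enclosing class, which the snippet above does not show; `Driver`, `looselyGuarded`, and the two detection traits are hypothetical names. The constrained method adds a defaulted `bool Sync` parameter so the condition depends on the member template itself, which is one common way to make the check SFINAE-friendly.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a driver class parameterised on the inference mode.
template<bool SynchronousInference>
struct Driver {
    // std::enable_if<false> is still a complete, valid type, so this method
    // exists for both modes.
    template<typename = std::enable_if<!SynchronousInference>>
    int looselyGuarded() { return 1; }

    // The condition depends on the member template's own parameter `Sync`,
    // so a false condition removes the method via substitution failure.
    template<bool Sync = SynchronousInference, typename = std::enable_if_t<!Sync>>
    int registerCallback() { return 2; }
};

// Detection traits (hypothetical): true when the call expression is well-formed.
template<typename D, typename = void>
struct HasLooselyGuarded : std::false_type {};
template<typename D>
struct HasLooselyGuarded<D, std::void_t<decltype(std::declval<D&>().looselyGuarded())>> : std::true_type {};

template<typename D, typename = void>
struct HasRegisterCallback : std::false_type {};
template<typename D>
struct HasRegisterCallback<D, std::void_t<decltype(std::declval<D&>().registerCallback())>> : std::true_type {};

// The unconstrained form is callable in synchronous mode as well.
static_assert(HasLooselyGuarded<Driver<true>>::value, "enable_if without ::type does not disable");
// The constrained form is only callable in asynchronous mode.
static_assert(HasRegisterCallback<Driver<false>>::value, "available when asynchronous");
static_assert(!HasRegisterCallback<Driver<true>>::value, "removed when synchronous");
```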

Copilot · 2025-07-14T12:32:40Z

src/FINNCppDriver/core/DeviceBuffer/AsyncDeviceBuffers.hpp

-            FINN_LOG(loglevel::info) << "Destructing Asynchronous input buffer";
-            workerThread.request_stop();  // Joining will be handled automatically by destruction
-        };
+        ~AsyncDeviceInputBuffer() override { FINN_LOG(loglevel::info) << "Destructing Asynchronous input buffer" << std::endl; };


The destructor no longer calls queue.shutdown(), so the worker thread may attempt to enqueue after it's signaled to stop, causing potential data races or hangs. Ensure the queue is shut down before thread teardown (e.g., call queue.shutdown() before logging).

Suggested change

~AsyncDeviceInputBuffer() override { FINN_LOG(loglevel::info) << "Destructing Asynchronous input buffer" << std::endl; };

~AsyncDeviceInputBuffer() override {
    prepareForShutdown();
    FINN_LOG(loglevel::info) << "Destructing Asynchronous input buffer" << std::endl;
};
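The ordering concern in this comment can be illustrated with a small stand-alone sketch. It is not the driver's implementation: `ShutdownQueue` and `AsyncInputBuffer` are hypothetical classes, the semantics of `shutdown()` are assumed from the comment, and a plain `std::thread` is used in place of the driver's worker thread.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical blocking queue with a shutdown flag.
class ShutdownQueue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<int> items;
    bool closed = false;

public:
    // Returns false once the queue has been shut down.
    bool push(int value) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            if (closed) return false;
            items.push_back(value);
        }
        cv.notify_one();
        return true;
    }

    // Blocks until an item arrives or the queue is shut down.
    // Returns false only when the queue is closed and empty.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return closed || !items.empty(); });
        if (items.empty()) return false;
        out = items.front();
        items.pop_front();
        return true;
    }

    // Wakes every blocked consumer so it can observe the closed state.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            closed = true;
        }
        cv.notify_all();
    }
};

// Hypothetical buffer whose worker blocks on the queue.
class AsyncInputBuffer {
    ShutdownQueue queue;
    std::atomic<int> processed{0};
    std::thread worker;  // declared last so the queue exists before the thread starts

public:
    AsyncInputBuffer() : worker([this] {
        int value = 0;
        while (queue.pop(value)) ++processed;
    }) {}

    bool store(int value) { return queue.push(value); }

    ~AsyncInputBuffer() {
        queue.shutdown();  // wake the worker first; otherwise join() could wait forever
        worker.join();
    }
};
```

In this sketch, skipping `queue.shutdown()` would leave the worker blocked in `pop()` and the destructor stuck in `join()`, which is the kind of hang the comment warns about.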

Copilot · 2025-07-14T12:32:40Z

integrationtest/AsyncInference.cpp

    driver.input(data.begin(), data.end());
-    std::this_thread::sleep_for(200ms);
-    auto results = driver.getResults();
+    auto results = driver.getResults();  // This should block until the results are available

-    Finn::vector<uint16_t> expectedResults = { 98, 50, 65476, 65493, 27 };
+    Finn::vector<uint16_t> expectedResults = {98, 50, 65476, 65493, 27};


Removing the sleep assumes getResults() will block until data is ready, but if the async pipeline hasn't produced anything yet, this may hang or fail. Add an explicit wait or callback notification to ensure data availability before calling getResults().

* Remove hardcoded batch size from kernel execution * Implement setBatchSize for complete Stack * Remove RingBuffer from Synchronous Inference and add full batch mapping * Deduplicate batchsize in basedriver & fix unittests * Fix integrationtests * Change input kernel code to run concurrrently to output kernel code * Optimize inference of lower batch sizes * Increase packing performance * Further optimize OpenMP * Optimize Utils * Some small changes * Add example data * Small Amounts of cleanup * Change Driver to run without XRT managed kernels * Add more efficient version of execute method * Hotfix FPGA bricking * Simplify inference interface to speed up inference * Update unittest * Simplify code * Update CMake * Fix Release Build CMakeLists * Fix wrong old variable names in CMake * Fix formatting * Change format target * Add changes to paper version * Add final paper changes * Add basic host mem functionality * Add switch for Host Memory Access and fix unittests for User Managed Kernels support * Revert timing changes for paper * Formatting changes * Remove unneccesary benchmark * Small changes * Clean up and update dependencies * Merge dev into paperVersion * Fix setting of Host Mem Var and update cppcheck config * Update CI definition * Fix typo in CI * Remove hardcoded path from examples * Fix linting for json files * Expand integrationTests * Update FPGA PCIe signatures * Increase timelimits of jobs * Switch CI partition to HACC for testing * Bump Graphviz version * Optimize CI * Fix integrationtest path * Update CI and add performance benchmark * Fix paths * Change logger and add exptected performance results to synchronous inference benchmark * Update expected results * Add missing path change * Add regression tests * Add test condition to regression test * Fix broken bash script in CI * Fix broken bash script in CI * Update dependencies in CI pipeline * Fix missing boost lib * Fix missing libs * Change number of processors to be correct and simplify 
regression tests * Fix typo in ci * Fix floating point comparison * Add debug print to CI * Add debug print to CI * Filter colored output * Filter colored output * Update .gitlab-ci.yml * Update .gitlab-ci.yml * Update .gitlab-ci.yml * Update remove shipped boost version and merge main to dev (#18) * Merge dev into main for v1.2 release (#13) * Remove hardcoded batch size from kernel execution * Implement setBatchSize for complete Stack * Remove RingBuffer from Synchronous Inference and add full batch mapping * Deduplicate batchsize in basedriver & fix unittests * Fix integrationtests * Change input kernel code to run concurrrently to output kernel code * Optimize inference of lower batch sizes * Increase packing performance * Further optimize OpenMP * Optimize Utils * Some small changes * Add example data * Small Amounts of cleanup * Change Driver to run without XRT managed kernels * Add more efficient version of execute method * Hotfix FPGA bricking * Simplify inference interface to speed up inference * Update unittest * Simplify code * Update CMake * Fix Release Build CMakeLists * Fix wrong old variable names in CMake * Fix formatting * Change format target * Add changes to paper version * Add final paper changes * Add basic host mem functionality * Add switch for Host Memory Access and fix unittests for User Managed Kernels support * Revert timing changes for paper * Formatting changes * Remove unneccesary benchmark * Small changes * Clean up and update dependencies * Merge dev into paperVersion * Fix setting of Host Mem Var and update cppcheck config * Update CI definition * Fix typo in CI * Remove hardcoded path from examples * Fix linting for json files * Expand integrationTests * Update FPGA PCIe signatures * Increase timelimits of jobs * Switch CI partition to HACC for testing * Bump Graphviz version * Optimize CI * Fix integrationtest path * Update CI and add performance benchmark * Fix paths * Change logger and add exptected performance results to 
synchronous inference benchmark * Update expected results * Add missing path change * Add regression tests * Add test condition to regression test * Fix broken bash script in CI * Fix broken bash script in CI * Update dependencies in CI pipeline * Fix missing boost lib * Fix missing libs * Change number of processors to be correct and simplify regression tests * Fix typo in ci * Fix floating point comparison * Add debug print to CI * Add debug print to CI * Filter colored output * Filter colored output * Update .gitlab-ci.yml * Update .gitlab-ci.yml * Update .gitlab-ci.yml * Pending changes exported from your codespace * Remove boost form being shipped with the driver * Update CI * Refactor build configuration: remove mdspan submodule, update CMakeLists for output directories, and enhance FINNDriver with static configuration check * update README.md * Format FinnDatatypes.hpp * Fix linting * Update src/FINNCppDriver/FINNDriver.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update src/FINNCppDriver/core/DeviceBuffer/DeviceBuffer.hpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update src/FINNCppDriver/FINNDriver.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update readme * Remove boost from the driver (#20) * Remove boost logging and program options * Add SPSCQueue * Remove Ringbuffer * Remove boost completely * Update CI * Fix linting * Add missing includes * Fix Integrationtests * Fix ODR * Fix bug where batch size is not set correctly * Fix linting * Update dependencies * Bump module versions * Update include files * Fix Async Inference * Add Asynchronous Inference option (#21) * Disable FPGA resets when running unittests * Finish async implementation * Update CI * Add missing header * Fix aligned alloc * Increase timeout * Linting and CI timeout * Increase CI timeout * Fix warnings * Add unused annotion 
* Add missing doxygen documentation * Merge branch 'main' into dev * Fix small details * Merge main into dev * Update README.md --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

LinusJungemann added 8 commits July 2, 2025 11:39

Disable FPGA resets when running unittests

129f248

Finish async implementation

c1a9e87

Update CI

ed1f0a2

Add missing header

7af8565

Fix aligned alloc

66fcbcd

Increase timeout

c40b958

Linting and CI timeout

a2c5bd7

Increase CI timeout

7ede73a

LinusJungemann requested a review from bwintermann July 14, 2025 07:59

bwintermann approved these changes Jul 14, 2025

LinusJungemann requested a review from Copilot July 14, 2025 12:25

Copilot AI reviewed Jul 14, 2025

LinusJungemann added 3 commits July 15, 2025 09:42

Fix warnings

b8de1d2

Add unused annotion

73626c2

Add missing doxygen documentation

5de5138

LinusJungemann merged commit 6c1f0f4 into dev Jul 17, 2025
1 check passed
