AMD-SHARK-AI Release v3.9.0

Important Note

The project repositories have been renamed to reflect AMD branding:

shark-ai → amd-shark-ai
sharktank → amd-sharktank
sharktuner → amd-sharktuner

Please update your repository references, import statements, and dependencies accordingly.

Breaking Changes: None expected.

Sharktuner

BOO Tuner has been integrated into Sharktuner, enabling automated performance tuning of AI workloads (#2433).
Completed the Python binding architecture for constructing transform dialect (TD) specs, now used for contraction, convolution, and attention ops (#2543, #2584, #2596).

Change Log


What's Changed

[Sharktank] Remove bias and beta from fp4 gemm asm kernel by @jinchen62 in #2513
[FusilliPlugin] TheRock RFC revisions by @sjain-stanford in #2546
[Tuner] Add KnobAssignment to track z3 feature vars in candidate generation by @RattataKing in #2434
[Sharktank] Add missing methods for InferenceTensor by @Alex-Vasile in #2547
[Sharktank] Replace torch.cat usages with ops.cat by @Alex-Vasile in #2551
[Sharktank] Add ops.abs and ops.log by @Alex-Vasile in #2554
[tuner] use python binding to build td specs for contraction by @bangtianliu in #2543
[Sharktank] Add ops.zeros, ops.ones, ops.ones_like by @Alex-Vasile in #2549
[Sharktank] Use .to() instead of .bool(), .float(), or .int(). Remove .bool() from InferenceTensors by @Alex-Vasile in #2550
[Fusilli] Added arithmetic pointwise ops - Mul, Sub, Div by @a-sidorova in #2545
Bump version to 3.9.0 after 3.8.0 release. by @sa-faizal in #2509
[Fusilli] Create FUSILLI_REQUIRE_OK macro by @AaronStGeorge in #2556
[FusilliPlugin] Cleanup draft RFC by @sjain-stanford in #2558
[FusilliPlugin] FetchContent hipDNN dependency by @AaronStGeorge in #2563
[sharktank] Refactor mask generation to utils from ops by @archana-ramalingam in #2496
[Fusilli] Replace remaining REQUIRE(isOk(...)) uses with new macro in recently added sample by @sjain-stanford in #2566
[Fusilli] Split pointwise tests into two: unary and binary ops by @a-sidorova in #2569
[sharktank] Refactor and cleanup unused imports in sharktank by @archana-ramalingam in #2561
[sharktank] Remove chunked prefill test by @archana-ramalingam in #2562
[CI] Remove Continue On Error In E2E Tests by @yash-amd in #2530
[Sharktank] Add ops.full by @Alex-Vasile in #2565
[Sharktank] Update Rotary Embedding and tests to use ops instead of torch by @Alex-Vasile in #2559
[Sharktank] Add ops.where by @Alex-Vasile in #2567
[Sharktank] Add ops.tensor by @Alex-Vasile in #2568
[Sharktank] Add ops.logical_or by @Alex-Vasile in #2570
[Sharktank] Use ops implementation instead of torch where important by @Alex-Vasile in #2573
[Fusilli] Add support for ConvWGrad by @IanWood1 in #2542
[Fusilli] Use allocateBufferOfType helper by @IanWood1 in #2580
[Fusilli] Pick compiler fixes and change dispatch count checks for conv3d by @sjain-stanford in #2560
[tuner] use python binding to build td specs for convolution by @bangtianliu in #2584
[sharktank] adding weight generation in util for gpt-oss toy model by @oyazdanb in #2574
[Fusilli,FusilliPlugin] Async execution & device selection by @AaronStGeorge in #2487
[sharktank] gpt-oss_theta_block by @oyazdanb in #2583
[Fusilli] Check for consistency in ConvWGradNode's input/output shapes by @IanWood1 in #2585
[Fusilli] Added Grouped Conv support by @a-sidorova in #2581
[CI] Update Gold Time For Benchmark by @yash-amd in #2588
[Fusilli] [NFC] Claude suggested fixes by @sjain-stanford in #2593
[Fusilli] Add ConvDGrad by @IanWood1 in #2582
Upload artifacts for codecov by @zeeshanhaque21 in #2598
Update tokenizers and transformers package versions by @zeeshanhaque21 in #2603
[CI] Gpt-oss e2e in CI by @oyazdanb in #2591
[tuner] use python binding to build td specs for attention by @bangtianliu in #2596
Simplify dispatch and testing logic by redispatching after unpacking tensors by @KyleHerndon in #2464
[Fusilli] Add .clang-tidy by @IanWood1 in #2595
[CI] Match Exact Flags Used For Compile As Used By IREE by @yash-amd in #2602
[CI] Fix For Llama 405B CI by @yash-amd in #2610
[Fusilli] [NFC] Consistent benchmark name by @sjain-stanford in #2611
[Fusilli] Disable iree-llvmcpu-target-cpu flag to mitigate ILLEGAL instruction errors on CPU runners by @sjain-stanford in #2612
[tuner] support phase-aware candidate pruning by @bangtianliu in #2592
[Fusilli] Add clang-tidy checks to CI by @IanWood1 in #2607
[Fusilli] Use clang-tidy (20.0.0) from PATH by @sjain-stanford in #2615
[tuner] set prefetch shared memory option based on layout matching for attention ops by @bangtianliu in #2613
[Fusilli] Header cleanup based on misc-include-cleaner from VSCode/Cursor clang-tidy extension by @sjain-stanford in #2617
[tuner][nfc]: merge imports and update the readme by @bangtianliu in #2621
[Fusilli] Added Grouped ConvDGrad support by @a-sidorova in #2600
[Fusilli] Added Grouped ConvWGrad support by @a-sidorova in #2599
[Fusilli] Re-enable a few of the previously skipped clang-tidy checks by @sjain-stanford in #2618
[Fusilli] Pick IREE fixes for ConvDGrad dispatch count and Illegal instruction flakes by @sjain-stanford in #2625
Fix for TOM CI by @yash-amd in #2622
[Fusilli] Run sharkfuser workflow on all PRs by @IanWood1 in #2634
Ignore BOO driver status code for IREE timeout issues by @prosenjitdhole in #2638
[Fusilli] Fix issue link on lit test, switch UNSUPPORTED -> XFAIL, minor nits by @sjain-stanford in #2639
[Fusilli] Add support for ConvWGrad and ConvDGrad to benchmark driver by @raayandhar in #2624
[Fusilli] Set cache dir for benchmark tests as well by @sjain-stanford in #2642
[tuner] Create dedicated IREE requirements file for sharktuner by @bangtianliu in #2641
[tuner] exits gracefully for unsupported cases like mat-vec operations by @bangtianliu in #2627
[Fusilli] Add matmul attributes 1/4 by @IanWood1 in #2631
[Tuner] Support candidate reordering by @RattataKing in #2555
[Fusilli] Added lit and sample tests for grouped strided ConvWGrad by @a-sidorova in #2629
Update ci_boo_conv.yml by @pdhirajkumarprasad in #2647
[Fusilli] Cleanup LINALG-CHECK from certain lit tests by @sjain-stanford in #2646
Add 405b pp2 and pp8 To E2E Tests by @yash-amd in #2645
[Fusilli] batch profiling of commands to automate data collection by @raayandhar in #2590
[CI] Add 405b CI To E2E workflow by @yash-amd in #2571
[Fusilli] Benchmark runner improvements by @sjain-stanford in #2650
Fix Version Issue For The Logs by @yash-amd in #2651
Bump IREE requirement pins to 3.9.0rc20251105 by @shark-pr-automator[bot] in #2498
[sharktank] Change LLM scheduler to allow for more broad uses of chunked prefill by @sogartar in #2506
[sharktank] Materialize indices in PrefillTask to ease branching logic in downstream code by @sogartar in #2507
[tuner][nfc] clean up the code by @bangtianliu in #2656
[sharktank] Make LLM utils support chunked prefill for included Llama 8b eager quick evaluation tests by @sogartar in #2499
[Tuner] Replace hardcoded SIMD=4 with value from IREE target info by @RattataKing in #2649
[Fusilli] Generalize physical tensor layout support by @IanWood1 in #2653
[Fusilli] Bump docker & re-enable 3d grouped conv test by @sjain-stanford in #2664
[sharktuner] Add BOO tuner by @rkayaith in #2433
[Fusilli] Add additional checks to Buffer::allocate by @IanWood1 in #2665
[boo tuner] Switch to newer BOO backend by @rkayaith in #2670
[boo tuner] Reset compilation and allocation caches for each config by @rkayaith in #2671
Bump IREE requirement pins to 3.9.0rc20251112 by @shark-pr-automator[bot] in #2658
[Tuner] Expand tuning candidate logging feature by @RattataKing in #2666
[Tuner] Add and fix codecov test in CI by @RattataKing in #2672
migration to amdshark by @pdhirajkumarprasad in #2677
[boo tuner] Fix error when tuning a config multiple times by @rkayaith in #2681
[Fusilli] Remove Fusilli by @AaronStGeorge in #2682
[tuner] use igemm bindings by @bangtianliu in #2683
Correct Repository, Token Names by @yash-amd in #2685
Fix for migration by @pdhirajkumarprasad in #2690
Update ci-amdsharktank.yml by @pdhirajkumarprasad in #2695
Migration to AMD-Shark by @pdhirajkumarprasad in #2694
Fix for AMD-Shark migration by @pdhirajkumarprasad in #2699
Bump IREE requirement pins to 3.10.0rc20251130 by @shark-pr-automator[bot] in #2703
fixes HuggingFace repo name from amd-amdshark to amd-shark by @amd-vivekag in #2707
[tuner] update the calculation of shared memory usage by @bangtianliu in #2698
Bump IREE requirement pins to 3.10.0rc20251201 by @shark-pr-automator[bot] in #2705
updates docs for SDXL by @amd-vivekag in #2708
[tuner] Sync Padding for TileAndFuse with IREE changes by @bangtianliu in #2692
Update IRPA files for Llm e2e tests by @vivekkhandelwal1 in #2712
Enable Llama 8b Perplexity Test by @vivekkhandelwal1 in #2710
Bump IREE requirement pins to 3.10.0rc20251203 by @shark-pr-automator[bot] in #2709
[tuner]: sync the change of using prefetch_num_stages to replace prefetch_shared_memory by @bangtianliu in #2713
Fix CI Irpa Issues by @yash-amd in #2716
Bump IREE requirement pins to 3.10.0rc20251205 by @shark-pr-automator[bot] in #2714
[Tuner] Polish process util functions by @RattataKing in #2718
[Tuner] Use python library shlex to convert cmd str by @RattataKing in #2719
[Tuner] Resolve the pytest warnings caused by newly added process utils and expand test content by @RattataKing in #2720
Fix Migration Issues by @yash-amd in #2722
Change Sync Resource Path by @yash-amd in #2723
Bump IREE requirement pins to 3.10.0rc20251210 by @shark-pr-automator[bot] in #2724
Clear engineer from IREE bump duty rotation by @KyleHerndon in #2728
[tuner] add padding_conv attribute along with IGEMM support for conv by @bangtianliu in #2701
[Tuner] Modify CI pytest cmd to convert warnings to errors by @RattataKing in #2721
[CI] Fix irpa and inputs path issue by @yash-amd in #2729
Fix Input Issue For Benchmarking Tests by @yash-amd in #2731
AMD-Shark migration by @pdhirajkumarprasad in #2730
Add gpt oss in E2E Tests by @yash-amd in #2732

New Contributors

@a-sidorova made their first contribution in #2545
@prosenjitdhole made their first contribution in #2638
@raayandhar made their first contribution in #2624

Full Changelog: v3.8.1...v3.9.0
