AMD-SHARK-AI Release v3.9.0
Important Note
The project repositories have been renamed to reflect AMD branding:
- shark-ai → amd-shark-ai
- sharktank → amd-sharktank
- sharktuner → amd-sharktuner
Please update your repository references, import statements, and dependencies accordingly.
Breaking Changes: None expected.
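As a rough aid for updating references, the rename table above can be applied mechanically to requirements files or repository URLs. The sketch below is only an illustration: the function name `rename_references` and the sample inputs are hypothetical, and it deliberately does not touch Python `import` statements, since this note does not state what the new module names are.

```python
import re

# Old -> new names, copied from the rename list in this release note.
RENAMES = {
    "shark-ai": "amd-shark-ai",
    "sharktank": "amd-sharktank",
    "sharktuner": "amd-sharktuner",
}

# Match an old name only when it stands alone: not preceded or followed by a
# word character or hyphen. This skips names that already carry the "amd-"
# prefix, so running the rewrite twice does not produce "amd-amd-...".
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, RENAMES)) + r")(?![\w-])"
)


def rename_references(text: str) -> str:
    """Rewrite old package/repository names in `text` to the new names.

    Hypothetical helper intended for requirements files and URLs, not for
    Python source code.
    """
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


if __name__ == "__main__":
    # Illustrative input only; adapt to your own dependency files.
    sample = "sharktank\nsharktuner\namd-sharktank\n"
    print(rename_references(sample))
```

Review the rewritten files by hand before committing, because a plain-text substitution cannot tell a package reference from an unrelated use of the same word.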
Sharktuner
- BOO Tuner has been integrated into the project, enabling automated performance tuning of AI workloads (#2433).
- Completed Python Binding Architecture for Transform Dialect Spec Construction (#2543, #2584, #2596)
Change Log
Git History
What's Changed
- [Sharktank] Remove bias and beta from fp4 gemm asm kernel by @jinchen62 in #2513
- [FusilliPlugin] TheRock RFC revisions by @sjain-stanford in #2546
- [Tuner] Add `KnobAssignment` to track z3 feature vars in candidate generation by @RattataKing in #2434
- [Sharktank] Add missing methods for InferenceTensor by @Alex-Vasile in #2547
- [Sharktank] Replace torch.cat usages with ops.cat by @Alex-Vasile in #2551
- [Sharktank] Add ops.abs and ops.log by @Alex-Vasile in #2554
- [tuner] use python binding to build td specs for contraction by @bangtianliu in #2543
- [Sharktank] Add ops.zeros, ops.ones, ops.ones_like by @Alex-Vasile in #2549
- [Sharktank] Use .to() instead of .bool(), .float(), or .int(). Remove .bool() from InferenceTensors by @Alex-Vasile in #2550
- [Fusilli] Added arithmetic pointwise ops - Mul, Sub, Div by @a-sidorova in #2545
- Bump version to 3.9.0 after 3.8.0 release. by @sa-faizal in #2509
- [Fusilli] Create FUSILLI_REQUIRE_OK macro by @AaronStGeorge in #2556
- [FusilliPlugin] Cleanup draft RFC by @sjain-stanford in #2558
- [FusilliPlugin] `FetchContent` `hipDNN` dependency by @AaronStGeorge in #2563
- [sharktank] Refactor mask generation to utils from ops by @archana-ramalingam in #2496
- [Fusilli] Replace remaining REQUIRE(isOk(...)) uses with new macro in recently added sample by @sjain-stanford in #2566
- [Fusilli] Split pointwise tests into two: unary and binary ops by @a-sidorova in #2569
- [sharktank] Refactor and cleanup unused imports in sharktank by @archana-ramalingam in #2561
- [sharktank] Remove chunked prefill test by @archana-ramalingam in #2562
- [CI] Remove Continue On Error In E2E Tests by @yash-amd in #2530
- [Sharktank] Add ops.full by @Alex-Vasile in #2565
- [Sharktank] Update Rotary Embedding and tests to use ops instead of torch by @Alex-Vasile in #2559
- [Sharktank] Add ops.where by @Alex-Vasile in #2567
- [Sharktank] Add ops.tensor by @Alex-Vasile in #2568
- [Sharktank] Add ops.logical_or by @Alex-Vasile in #2570
- [Sharktank] Use ops implementation instead of torch where important by @Alex-Vasile in #2573
- [Fusilli] Add support for ConvWGrad by @IanWood1 in #2542
- [Fusilli] Use allocateBufferOfType helper by @IanWood1 in #2580
- [Fusilli] Pick compiler fixes and change dispatch count checks for conv3d by @sjain-stanford in #2560
- [tuner] use python binding to build td specs for convolution by @bangtianliu in #2584
- [sharktank] adding weight generation in util for gpt-oss toy model by @oyazdanb in #2574
- [Fusilli,FusilliPlugin] Async execution & device selection by @AaronStGeorge in #2487
- [sharktank] gpt-oss_theta_block by @oyazdanb in #2583
- [Fusilli] Check for consistency in ConvWGradNode's input/output shapes by @IanWood1 in #2585
- [Fusilli] Added Grouped Conv support by @a-sidorova in #2581
- [CI] Update Gold Time For Benchmark by @yash-amd in #2588
- [Fusilli] [NFC] Claude suggested fixes by @sjain-stanford in #2593
- [Fusilli] Add ConvDGrad by @IanWood1 in #2582
- Upload artifacts for codecov by @zeeshanhaque21 in #2598
- Update tokenizers and transformers package versions by @zeeshanhaque21 in #2603
- [CI] Gpt-oss e2e in CI by @oyazdanb in #2591
- [tuner] use python binding to build td specs for attention by @bangtianliu in #2596
- Simplify dispatch and testing logic by redispatching after unpacking tensors by @KyleHerndon in #2464
- [Fusilli] Add .clang-tidy by @IanWood1 in #2595
- [CI] Match Exact Flags Used For Compile As By Used Iree by @yash-amd in #2602
- [CI] Fix For Llama 405B CI by @yash-amd in #2610
- [Fusilli] [NFC] Consistent benchmark name by @sjain-stanford in #2611
- [Fusilli] Disable `iree-llvmcpu-target-cpu` flag to mitigate ILLEGAL instruction errors on CPU runners by @sjain-stanford in #2612
- [tuner] support phase-aware candidate pruning by @bangtianliu in #2592
- [Fusilli] Add clang-tidy checks to CI by @IanWood1 in #2607
- [Fusilli] Use clang-tidy (20.0.0) from PATH by @sjain-stanford in #2615
- [tuner] set prefetch shared memory option based on layout matching for attention ops by @bangtianliu in #2613
- [Fusilli] Header cleanup based on misc-include-cleaner from VSCode/Cursor clang-tidy extension by @sjain-stanford in #2617
- [tuner][nfc]: merge imports and update the readme by @bangtianliu in #2621
- [Fusilli] Added Grouped ConvDGrad support by @a-sidorova in #2600
- [Fusilli] Added Grouped ConvWGrad support by @a-sidorova in #2599
- [Fusilli] Re-enable a few of the previously skipped clang-tidy checks by @sjain-stanford in #2618
- [Fusilli] Pick IREE fixes for ConvDGrad dispatch count and Illegal instruction flakes by @sjain-stanford in #2625
- Fix for TOM CI by @yash-amd in #2622
- [Fusilli] Run sharkfuser workflow on all PRs by @IanWood1 in #2634
- Ignore boo driver status code for iree time out issues by @prosenjitdhole in #2638
- [Fusilli] Fix issue link on lit test, switch UNSUPPORTED -> XFAIL, minor nits by @sjain-stanford in #2639
- [Fusilli] Add support for ConvWGrad and ConvDGrad to benchmark driver by @raayandhar in #2624
- [Fusilli] Set cache dir for benchmark tests as well by @sjain-stanford in #2642
- [tuner] Create dedicated IREE requirements file for sharktuner by @bangtianliu in #2641
- [tuner] exits gracefully for unsupported cases like mat-vec operations by @bangtianliu in #2627
- [Fusilli] Add matmul attributes 1/4 by @IanWood1 in #2631
- [Tuner] Support candidate reordering by @RattataKing in #2555
- [Fusilli] Added lit and sample tests for grouped strided ConvWGrad by @a-sidorova in #2629
- Update ci_boo_conv.yml by @pdhirajkumarprasad in #2647
- [Fusilli] Cleanup LINALG-CHECK from certain lit tests by @sjain-stanford in #2646
- Add 405b pp2 and pp8 To E2E Tests by @yash-amd in #2645
- [Fusilli] batch profiling of commands to automate data collection by @raayandhar in #2590
- [CI] Add 405b CI To E2E workflow by @yash-amd in #2571
- [Fusilli] Benchmark runner improvements by @sjain-stanford in #2650
- Fix Version Issue For The Logs by @yash-amd in #2651
- Bump IREE requirement pins to 3.9.0rc20251105 by @shark-pr-automator[bot] in #2498
- [sharktank] Change LLM scheduler to allow for more broad uses of chunked prefill by @sogartar in #2506
- [sharktank] Materialize indices in PrefillTask to ease branching logic in downstream code by @sogartar in #2507
- [tuner][nfc] clean up the code by @bangtianliu in #2656
- [sharktank] Make LLM utils support chunked prefill for included Llama 8b eager quick evaluation tests by @sogartar in #2499
- [Tuner] Replace hardcoded SIMD=4 with value from IREE target info by @RattataKing in #2649
- [Fusilli] Generalize physical tensor layout support by @IanWood1 in #2653
- [Fusilli] Bump docker & re-enable 3d grouped conv test by @sjain-stanford in #2664
- [sharktuner] Add BOO tuner by @rkayaith in #2433
- [Fusilli] Add additional checks to Buffer::allocate by @IanWood1 in #2665
- [boo tuner] Switch to newer BOO backend by @rkayaith in #2670
- [boo tuner] Reset compilation and allocation caches for each config by @rkayaith in #2671
- Bump IREE requirement pins to 3.9.0rc20251112 by @shark-pr-automator[bot] in #2658
- [Tuner] Expand tuning candidate logging feature by @RattataKing in #2666
- [Tuner] Add and fix codecov test in CI by @RattataKing in #2672
- migration to amdshark by @pdhirajkumarprasad in #2677
- [boo tuner] Fix error when tuning a config multiple times by @rkayaith in #2681
- [Fusilli] Remove Fusilli by @AaronStGeorge in #2682
- [tuner] use igemm bindings by @bangtianliu in #2683
- Correct Repository, Token Names by @yash-amd in #2685
- Fix for migration by @pdhirajkumarprasad in #2690
- Update ci-amdsharktank.yml by @pdhirajkumarprasad in #2695
- Migration to AMD-Shark by @pdhirajkumarprasad in #2694
- Fix for AMD-Shark migration by @pdhirajkumarprasad in #2699
- Bump IREE requirement pins to 3.10.0rc20251130 by @shark-pr-automator[bot] in #2703
- fixes HuggingFace repo name from amd-amdshark to amd-shark by @amd-vivekag in #2707
- [tuner] update the calculation of shared memory usage by @bangtianliu in #2698
- Bump IREE requirement pins to 3.10.0rc20251201 by @shark-pr-automator[bot] in #2705
- updates docs for SDXL by @amd-vivekag in #2708
- [tuner] Sync Padding for TileAndFuse with IREE changes by @bangtianliu in #2692
- Update IRPA files for Llm e2e tests by @vivekkhandelwal1 in #2712
- Enable Llama 8b Perplexity Test by @vivekkhandelwal1 in #2710
- Bump IREE requirement pins to 3.10.0rc20251203 by @shark-pr-automator[bot] in #2709
- [tuner]: sync the change of using prefetch_num_stages to replace prefetch_shared_memory by @bangtianliu in #2713
- Fix CI Irpa Issues by @yash-amd in #2716
- Bump IREE requirement pins to 3.10.0rc20251205 by @shark-pr-automator[bot] in #2714
- [Tuner] Polish process util functions by @RattataKing in #2718
- [Tuner] Use python library shlex to convert cmd str by @RattataKing in #2719
- [Tuner] Resolve the pytest warnings caused by new added process utils and expand test content by @RattataKing in #2720
- Fix Migration Issues by @yash-amd in #2722
- Change Sync Resource Path by @yash-amd in #2723
- Bump IREE requirement pins to 3.10.0rc20251210 by @shark-pr-automator[bot] in #2724
- Clear engineer from IREE bump duty rotation by @KyleHerndon in #2728
- [tuner] add padding_conv attribute along IGEMM support for conv by @bangtianliu in #2701
- [Tuner] Modify CI pytest cmd to convert warnings to errors by @RattataKing in #2721
- [CI] Fix irpa and inputs path issue by @yash-amd in #2729
- Fix Input Issue For Benchmarking Tests by @yash-amd in #2731
- AMD-Shark migration by @pdhirajkumarprasad in #2730
- Add gpt oss in E2E Tests by @yash-amd in #2732
New Contributors
- @a-sidorova made their first contribution in #2545
- @prosenjitdhole made their first contribution in #2638
- @raayandhar made their first contribution in #2624
Full Changelog: v3.8.1...v3.9.0