Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(deps): update dependency com_github_nvidia_cutlass to v3.5.0 #649

Merged
merged 1 commit into from
Apr 15, 2024

Conversation

renovate[bot]
Copy link
Contributor

@renovate renovate bot commented Apr 12, 2024

Mend Renovate

This PR contains the following updates:

Package Type Update Change
com_github_nvidia_cutlass http_archive minor v3.4.1 -> v3.5.0

Release Notes

NVIDIA/cutlass (com_github_nvidia_cutlass)

v3.5.0: CUTLASS 3.5.0

Compare Source

  • Implicit GEMM Convolutions targeting Hopper SM90A via WGMMA + TMA im2col.
    • Native implementation in CUTLASS 3.x using CuTe, mirroring the same design hierarchy as that of GEMMs.
    • Support for 1D, 2D, and 3D convolutions in a rank-agnostic fashion.
    • Support for Fprop, Dgrad, and Wgrad algorithms.
    • CUTLASS profiler support for 2D and 3D convolutions implemented via the 3.x API.
    • NOTE: this is a beta release. Further updates to CUTLASS will include major performance improvements, feature enablement, and possible breaking changes to the API until 3.7 release. Your feedback is welcome on the design!
  • Support for Ada (SM89) FP8 tensor cores via the 2.x API. Requires CUDA 12.4 or newer.
  • Ampere gather/scatter convolution example in CuTe and CUTLASS 3.x.
    • Showcasing how custom kernels can be written and optimized using CUTLASS 3.x and CuTe and the general strategy for implementing convolutions as specializations of GETTs.
    • Implementation of a coarse grained sparse gather/scatter kernel achieving peak performance on Ampere class tensor cores.
  • 32x and 16x tile sizes are added to CUTLASS 2.x to improve the performance of narrow-tall and wide-short matrices.
  • Updates to CuTe documentation for cute::Tensor<>, MMA atoms, and an overhauled CuTe GEMM tutorial series.
  • Extensions to CuTe to support L2 prefetching and TMA store+reductions.
  • Remove C++11 requirement on a few CUTLASS 2.x API header files. All CUTLASS files now require C++17.
  • Fixes to greatly reduce build warnings.
  • Updates and bugfixes from the community (thanks!)

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot requested a review from a team April 12, 2024 04:10
@anakinxc anakinxc merged commit b121511 into main Apr 15, 2024
11 of 13 checks passed
@anakinxc anakinxc deleted the renovate/com_github_nvidia_cutlass-3.x branch April 15, 2024 02:36
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant