Tune MP GEMM kernel #30

Merged
jmanning-stackav merged 1 commit into main from feature/jmanning/cleanup-mp-gemm on Jun 18, 2025

@jmanning-stackav
Collaborator

Description

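This PR improves the tuning of the mixed-precision GEMM kernel on A10, H100, and MI300X. On the benchmark below (M=4096, K=8192, N=4096, fp16 inputs, uint4b8 weights), median latency drops by roughly 20% on A10, 15% on H100, and 13% on MI300X.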

Testing

Please select all that apply.

  • Existing unit tests
  • Unit tests added by this PR
  • Other (please explain)
  • This PR is not tested

Test instructions

pytest tests/mixed_precision_gemm_test.py

Performance

python benchmarks/mixed_precision_gemm_benchmark.py

A10

Before

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=1072, min=10.466 ms, max=13.039 ms, mean=10.898 ms, median=10.880 ms

After

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=1255, min=8.296 ms, max=10.368 ms, mean=8.770 ms, median=8.748 ms

H100

Before

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=5626, min=1.532 ms, max=1.753 ms, mean=1.649 ms, median=1.650 ms

After

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=6524, min=1.355 ms, max=1.437 ms, mean=1.396 ms, median=1.399 ms

MI300X

Before

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=7538, min=1.136 ms, max=13.295 ms, mean=1.186 ms, median=1.181 ms

After

Results matched :)
Parameters: {'m_dim': 4096, 'k_dim': 8192, 'n_dim': 4096, 'input_dtype': 'fp16', 'weight_dtype': 'uint4b8'}
Conch: num_iterations=8593, min=0.976 ms, max=12.882 ms, mean=1.032 ms, median=1.028 ms
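
The before/after numbers above can be condensed into a per-GPU summary. The following is a minimal sketch, not part of this PR or of the benchmark script: `parse_stats` and `summarize` are hypothetical helpers, and the log lines are copied verbatim from the results above. It compares medians rather than means because the MI300X runs report a maximum more than ten times the median, which suggests a few outlier iterations. The throughput figure assumes the nominal 2·M·N·K FLOP count for a GEMM and ignores any work spent dequantizing the weights.

```python
import re

# (before, after) benchmark lines, copied verbatim from the results above.
RESULTS = {
    "A10": (
        "Conch: num_iterations=1072, min=10.466 ms, max=13.039 ms, mean=10.898 ms, median=10.880 ms",
        "Conch: num_iterations=1255, min=8.296 ms, max=10.368 ms, mean=8.770 ms, median=8.748 ms",
    ),
    "H100": (
        "Conch: num_iterations=5626, min=1.532 ms, max=1.753 ms, mean=1.649 ms, median=1.650 ms",
        "Conch: num_iterations=6524, min=1.355 ms, max=1.437 ms, mean=1.396 ms, median=1.399 ms",
    ),
    "MI300X": (
        "Conch: num_iterations=7538, min=1.136 ms, max=13.295 ms, mean=1.186 ms, median=1.181 ms",
        "Conch: num_iterations=8593, min=0.976 ms, max=12.882 ms, mean=1.032 ms, median=1.028 ms",
    ),
}

# Matches the four latency fields, e.g. "median=10.880 ms".
STAT_RE = re.compile(r"(min|max|mean|median)=([\d.]+) ms")


def parse_stats(line):
    """Return the latency statistics (in ms) found in one benchmark line."""
    return {name: float(value) for name, value in STAT_RE.findall(line)}


def summarize(before_line, after_line, m=4096, k=8192, n=4096):
    """Compare median latencies; the default shape is the one benchmarked above."""
    before = parse_stats(before_line)["median"]
    after = parse_stats(after_line)["median"]
    # Assumption: nominal GEMM cost of one multiply and one add per (m, n, k).
    flops = 2 * m * n * k
    return {
        "speedup": before / after,
        "reduction_pct": 100.0 * (before - after) / before,
        "tflops_after": flops / (after * 1e-3) / 1e12,
    }


if __name__ == "__main__":
    for gpu, (before_line, after_line) in RESULTS.items():
        s = summarize(before_line, after_line)
        print(
            f"{gpu}: {s['speedup']:.2f}x speedup, "
            f"{s['reduction_pct']:.1f}% lower median latency, "
            f"{s['tflops_after']:.1f} effective TFLOP/s after"
        )
```

Running the script prints one summary line per GPU. The effective TFLOP/s value is only a normalized way to compare runs of the same shape, not a measurement of hardware utilization.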

Platforms

Please select all hardware platforms that this PR was tested on.

  • Nvidia GPU
  • AMD GPU
  • Other (please explain)

@jmanning-stackav merged commit 9e0d798 into main Jun 18, 2025
@jmanning-stackav deleted the feature/jmanning/cleanup-mp-gemm branch June 18, 2025 20:42