@LeiWang1999
Contributor

Introduce an efficient (though not yet optimal) int8 SIMT matmul schedule for T4 cards.

This pull request makes several changes to bitblas/gpu/matmul.py, introducing new scheduling rules and optimizations for GPU operators, and also updates bitblas/gpu/matmul_analysis.py and integration/BitNet/eval_correctness.py. The main changes are a new scheduling method for dequantization, type and import adjustments, and a version check for compatibility.

New Scheduling Method:

  • Added sch_dequantize_in_register_with_config method to handle dequantization scheduling without shared memory prefetch for devices lacking async copy. (bitblas/gpu/matmul.py)

Type and Import Adjustments:

  • Updated imports to include List and suppress, and added get_coalesced_veclen from ..base.analysis. (bitblas/gpu/matmul.py)
  • Added _collect_producers to the list of imports from matmul_analysis. (bitblas/gpu/matmul.py)

Configuration and Typo Fixes:

  • Fixed a typo in the calculation of thread_row_tiles by using config.thread[0] instead of config.thread[1]. (bitblas/gpu/matmul.py)
  • Added a check for dequantize_info in the apply_config method to call the new dequantization schedule if present. (bitblas/gpu/matmul.py)
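The dispatch described in the second bullet can be pictured roughly as follows. This is a minimal, self-contained sketch and not the code in bitblas/gpu/matmul.py: the string stand-ins for schedules, the function signatures, and the fallback name `sch_default_with_config` are hypothetical. Only the names `apply_config`, `dequantize_info`, and `sch_dequantize_in_register_with_config` come from the description above.

```python
from typing import Any, Dict, Optional


def sch_dequantize_in_register_with_config(func: str, config: Dict[str, Any]) -> str:
    # Stand-in for the new schedule; the real method would build a TIR schedule.
    return f"dequantize-in-register schedule for {func}"


def sch_default_with_config(func: str, config: Dict[str, Any]) -> str:
    # Hypothetical name for whichever schedule is used when no dequantization is involved.
    return f"default schedule for {func}"


def apply_config(
    func: str,
    config: Dict[str, Any],
    dequantize_info: Optional[Dict[str, Any]] = None,
) -> str:
    # Take the dequantization path only when dequantize info is present and non-empty.
    if dequantize_info:
        return sch_dequantize_in_register_with_config(func, config)
    return sch_default_with_config(func, config)
```

In this sketch the dequantize info is passed in explicitly; how the real `apply_config` obtains it is not stated in the description.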

Analysis Updates:

  • Modified analysis_tensorcore_tags to return Union[bool, Dict] and added a check for Tensor Core support based on the SM version. (bitblas/gpu/matmul_analysis.py)
  • Added a minimum tensorization threshold based on the Tensor Core bit width. (bitblas/gpu/matmul_analysis.py)
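To illustrate the shape of such a check, here is a hedged sketch of a function that returns either `False` or a dictionary of tags. It is not the implementation in bitblas/gpu/matmul_analysis.py: the function names, the returned keys, the minimum SM version of 70 (Volta, the first Tensor Core generation), and the per-bit-width threshold values are all assumptions chosen for illustration.

```python
import re
from typing import Dict, Union

# Assumption: treat sm_70 (Volta) as the first architecture with Tensor Cores.
MIN_TENSORCORE_SM = 70

# Placeholder thresholds keyed by input bit width; not the values used in BitBLAS.
MINIMAL_TENSORIZE_THRESHOLD = {16: 16, 8: 32}


def parse_sm_version(arch: str) -> int:
    # "sm_75" -> 75; returns -1 when the string carries no sm_XX number.
    match = re.search(r"sm_(\d+)", arch)
    return int(match.group(1)) if match else -1


def tensorcore_tags_sketch(
    arch: str, in_bits: int, m: int, n: int, k: int
) -> Union[bool, Dict[str, int]]:
    # Reject architectures without Tensor Core support.
    if parse_sm_version(arch) < MIN_TENSORCORE_SM:
        return False
    # Reject unsupported bit widths and problems too small to tensorize.
    threshold = MINIMAL_TENSORIZE_THRESHOLD.get(in_bits)
    if threshold is None or min(m, n, k) < threshold:
        return False
    # Hypothetical tag dictionary; the real function returns richer information.
    return {"in_bits": in_bits, "threshold": threshold}
```

A caller can then fall back to a SIMT schedule whenever the result is `False`, which is the situation the `Union[bool, Dict]` return type makes explicit.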

Version Check:

  • Added a version check for the transformers library to ensure compatibility, asserting the version is <= 4.40.0. (integration/BitNet/eval_correctness.py)
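A version guard of this kind could look like the sketch below. It uses only the standard library and compares the leading numeric components of the version string. The helper names are hypothetical, and the actual check in integration/BitNet/eval_correctness.py may be written differently.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    # Keep the leading purely numeric components: "4.40.0.dev0" -> (4, 40, 0).
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_most(installed: str, maximum: str) -> bool:
    # Tuple comparison is lexicographic, so (4, 39, 3) <= (4, 40, 0).
    return release_tuple(installed) <= release_tuple(maximum)


def assert_transformers_compatible(installed: str, maximum: str = "4.40.0") -> None:
    # Fail early with a readable message when the installed version is too new.
    assert is_at_most(installed, maximum), (
        f"transformers {installed} is not supported; expected <= {maximum}"
    )
```

In practice the installed version string could be obtained with `importlib.metadata.version("transformers")`, and a dedicated parser such as `packaging.version.Version` handles pre-release and post-release suffixes more rigorously than this simplified comparison.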

@LeiWang1999 LeiWang1999 merged commit 4106902 into microsoft:main Sep 20, 2024