Conversation

@yiakwy-xpu-ml-framework-team (Contributor) commented on Sep 5, 2025

Pull Request Template

Description

Fix the version info.

(Screenshot: 2025-09-05 17:28:45)

Update pyproject.toml.

pyproject.toml should declare the build tooling and the torch requirement; otherwise an exception is thrown during the build.
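A minimal sketch of the kind of declaration being described (the exact package names and version pins are assumptions, not the actual diff):

```toml
[build-system]
# Build backend plus torch, so the CUDA extension can be compiled at build time.
requires = ["setuptools>=64", "wheel", "torch>=2.8"]
build-backend = "setuptools.build_meta"
```

Without a `[build-system]` table, builders fall back to defaults that do not include torch, which is one way to hit an exception during the build step.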

Type of Change

Please check the relevant option(s):

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance optimization
  • CUDA kernel improvement
  • Code refactoring

Related Issues

Please link any related issues:

  • Fixes #(issue number)
  • Related to #(issue number)

Changes Made

Please describe the changes you made:

Code Changes

  • Modified Python API
  • Updated CUDA kernels
  • Changed build system
  • Updated dependencies

Documentation

  • Updated README
  • Updated API documentation
  • Added examples
  • Updated benchmarks

Testing

Please describe the tests you ran to verify your changes:

  • Existing tests pass: python -m pytest tests/ -v
  • Added new tests for new functionality
  • Benchmarks show no performance regression
  • Tested on multiple GPU architectures (if applicable)

Test Configuration

  • OS: Ubuntu 22.04
  • Python: 3.12
  • PyTorch: 2.8
  • CUDA: 12.8
  • GPU: H800

Performance Impact

If this change affects performance, please provide benchmarks:

Before

# Benchmark results before your changes

After

# Benchmark results after your changes

Breaking Changes

If this PR introduces breaking changes, please describe:

  • What breaks
  • How users can migrate their code
  • Why the breaking change is necessary

Checklist

Please check all that apply:

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

CUDA-specific (if applicable)

  • CUDA kernels compile without warnings
  • Tested on SM 8.0+ architectures
  • Tested on SM 9.0+ architectures
  • Memory usage has been profiled
  • No memory leaks detected

Additional Notes

Any additional information that reviewers should know:

Relaxed the Hopper tests to a 95% accuracy threshold, but the test still fails to pass:

Final results:

Test command:

python benchmarks/forward_equivalence.py 2>&1 | tee accuracy.log
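The kind of element-wise equivalence check behind such a threshold can be sketched as follows (the function name and tolerances are illustrative, not the actual contents of `benchmarks/forward_equivalence.py`):

```python
import numpy as np

def accuracy_pass_rate(candidate, reference, rtol=1e-3, atol=1e-3):
    """Fraction of elements where candidate matches reference within tolerance."""
    close = np.isclose(candidate, reference, rtol=rtol, atol=atol)
    return close.mean()

# Illustrative check: require at least 95% of elements to match.
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 128))
out = ref + rng.normal(scale=1e-5, size=ref.shape)  # small numerical noise
rate = accuracy_pass_rate(out, ref)
assert rate >= 0.95, f"accuracy {rate:.2%} below 95% threshold"
```

A per-element pass rate like this is more forgiving than a single `allclose` on the whole tensor, which is one way a "95% accuracy" criterion can be defined.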

Screenshots (if applicable)

If your changes include visual elements or performance improvements, please add screenshots or graphs.

Tests pass under the relaxed 95% accuracy threshold.

@yiakwy-xpu-ml-framework-team (Contributor, Author) commented:

@LoserCheems could you have a look at it? By the way, the accuracy test on the Hopper platform failed.

@LoserCheems (Collaborator) commented:

I'm very sorry @yiakwy-xpu-ml-framework-team, this was my mistake: I wrongly wrote `;` instead of `,`. 😵
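For context, in PEP 508 dependency strings `;` only introduces an environment marker, while entries in a pyproject.toml array are separated by `,` — a hypothetical illustration of the fix (not the actual diff):

```toml
[project]
# Wrong: a stray ';' between array entries breaks parsing.
# dependencies = ["torch>=2.8"; "numpy"]
# Right: array entries are comma-separated; ';' belongs inside a string,
# before an environment marker, e.g. "triton; platform_system == 'Linux'".
dependencies = ["torch>=2.8", "numpy"]
```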

@LoserCheems (Collaborator) commented:

Let's merge.

@LoserCheems LoserCheems merged commit c432cc0 into flash-algo:main Sep 5, 2025
@LoserCheems LoserCheems mentioned this pull request Sep 6, 2025