Tutorial 09-experimental-block-pointer with enabled block pointer fails with LTS driver (803.45) #1159

Closed
pbchekin opened this issue May 20, 2024 · 2 comments · Fixed by #1296
Labels: bug, lts, tests: tutorials

Comments

@pbchekin
Contributor

pbchekin commented May 20, 2024

cd python/tutorials
export TRITON_INTEL_ENABLE_BLOCK_PTR=1
python 09-experimental-block-pointer.py
triton_output=tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [        nan, -1.7147e+38,  2.8685e-42,  ..., -4.2444e+01,
❌ Triton and Torch differ
         -5.5285e+01, -3.6599e+01],
        [        nan, -1.7147e+38,  2.8685e-42,  ...,  4.4992e+00,
         -3.3115e+01, -5.1707e+00],
        ...,
        [ 3.8032e-01, -3.9251e-07, -1.9164e-01,  ...,  1.0298e-02,
          2.1799e-01,  1.3573e-05],
        [-1.4565e-04,  6.1115e-08, -1.3391e-06,  ...,  2.8589e-02,
          5.9731e-03,  1.8974e-03],
        [-2.1044e-06, -5.5068e-09, -1.7565e-04,  ..., -2.9086e-16,
         -6.5389e-10, -1.5254e-06]], device='xpu:0')
torch_output=tensor([[ 12.5781,  14.3203,  -2.5781,  ...,   1.4473,  13.2656, -21.2656],
        [ 17.9688, -57.6562,   9.9688,  ...,  -6.9219,  20.2656,  24.0781],
        [-29.9844, -11.1094,  -9.3438,  ..., -42.4375, -55.2812, -36.5938],
        ...,
        [ -9.6406,  22.7344,   5.8633,  ...,   7.1055,  22.2656,  11.6406],
        [-24.3125,   2.1738, -17.9844,  ..., -17.2344,  15.8203,  -7.0195],
        [ 18.5156, -15.4766, -24.4062,  ...,   7.1484, -20.0156, -10.5859]],
       device='xpu:0')

Workflow run: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9164197371/job/25194970748#step:17:232
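
To triage a mismatch like the one above, it can help to count NaNs and find the largest deviation before digging into the kernel. The following is a minimal sketch in plain Python, not the tutorial's own check: `summarize_mismatch` is a hypothetical helper, the `atol=1e-2` default is an assumed tolerance, and the sample values are the first three elements of the second row of each tensor printed above.

```python
import math


def summarize_mismatch(actual, expected, atol=1e-2):
    """Summarize how two equal-length flat sequences of floats differ.

    Hypothetical helper for illustration; atol=1e-2 is an assumed tolerance.
    Any element whose difference is NaN is counted as a mismatch.
    """
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same length")

    nan_count = 0        # NaNs found in `actual`
    mismatches = 0       # elements outside the tolerance (including NaNs)
    max_abs_diff = 0.0   # largest finite absolute difference

    for a, e in zip(actual, expected):
        if math.isnan(a):
            nan_count += 1
        diff = abs(a - e)
        if math.isnan(diff):
            mismatches += 1
            continue
        if diff > atol:
            mismatches += 1
        if math.isfinite(diff):
            max_abs_diff = max(max_abs_diff, diff)

    return {
        "nan_count": nan_count,
        "mismatches": mismatches,
        "max_abs_diff": max_abs_diff,
    }


if __name__ == "__main__":
    # First three values of the second row from the report above.
    triton_row = [float("nan"), -1.7147e+38, 2.8685e-42]
    torch_row = [17.9688, -57.6562, 9.9688]
    print(summarize_mismatch(triton_row, torch_row))
```

For these three sample values the helper reports one NaN and three mismatches, with a largest finite difference on the order of 1e38. With real tensors, the flat inputs could be obtained with something like `tensor.flatten().tolist()` after moving the data to the CPU.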

@pbchekin
Contributor Author

pbchekin commented May 23, 2024

Potentially related to the Triton cache. With Agama 881.12, the following sequence causes a segmentation fault:

rm -rf ~/.triton/cache
python 09-experimental-block-pointer.py
TRITON_INTEL_ENABLE_BLOCK_PTR=1 python 09-experimental-block-pointer.py

Update: this was a different issue fixed by #1183.
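
For repeated runs, the sequence above (clear the cache, run without block pointers, run with them) can be scripted. This is a rough sketch using only the standard library; the helper names `make_env`, `clear_cache`, and `run_tutorial` are hypothetical, and the script assumes it is started from `python/tutorials`.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path

TUTORIAL = "09-experimental-block-pointer.py"
BLOCK_PTR_VAR = "TRITON_INTEL_ENABLE_BLOCK_PTR"


def make_env(enable_block_ptr, base=None):
    """Return a copy of the environment with the block pointer flag set or removed."""
    env = dict(os.environ if base is None else base)
    if enable_block_ptr:
        env[BLOCK_PTR_VAR] = "1"
    else:
        env.pop(BLOCK_PTR_VAR, None)
    return env


def clear_cache(cache_dir):
    """Delete the cache directory; a missing directory is not an error."""
    shutil.rmtree(cache_dir, ignore_errors=True)


def run_tutorial(enable_block_ptr, cwd="."):
    """Run the tutorial in a child process and return its exit code."""
    result = subprocess.run(
        [sys.executable, TUTORIAL],
        cwd=cwd,
        env=make_env(enable_block_ptr),
    )
    return result.returncode


if __name__ == "__main__":
    # Same order as the manual steps: clear the cache, then run twice.
    clear_cache(Path.home() / ".triton" / "cache")
    for flag in (False, True):
        code = run_tutorial(flag)
        state = "1" if flag else "unset"
        print(f"{BLOCK_PTR_VAR}={state} -> exit code {code}")
```

On POSIX systems, `subprocess` reports a child killed by a signal as a negative return code, so a segmentation fault would normally show up as `-11` rather than as a Python exception.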

@pbchekin
Contributor Author

The tutorial also fails with the latest LTS driver, 803.58.

etiotto added a commit that referenced this issue Jun 7, 2024
Signed-off-by: Ettore Tiotto <ettore.tiotto@intel.com>
etiotto linked a pull request Jun 7, 2024 that will close this issue