DISABLED test_check_inplace_nn_CELU_mps_float32 (__main__.TestModuleMPS) #111449

Closed
huydhn opened this issue Oct 18, 2023 · 3 comments
Labels
module: macos Mac OS related issues module: mps Related to Apple Metal Performance Shaders framework skipped Denotes a (flaky) test currently skipped in CI. triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

huydhn commented Oct 18, 2023

Platforms: mac, macos

This test was disabled because it is failing on main branch (recent examples).

This test has been failing on macOS x86 for a while; see https://hud.pytorch.org/pytorch/pytorch/commit/973c87b320b5e7489f18b719d5b1c57a2051ae10. The error is:

RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

@huydhn huydhn added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework labels Oct 18, 2023
@pytorch-bot pytorch-bot bot added the skipped Denotes a (flaky) test currently skipped in CI. label Oct 18, 2023

pytorch-bot bot commented Oct 18, 2023

Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below:
  • Test name: test_check_inplace_nn_CELU_mps_float32 (__main__.TestModuleMPS)
  • Platforms for which to skip the test: mac, macos
  • Disabled by huydhn

Within ~15 minutes, test_check_inplace_nn_CELU_mps_float32 (__main__.TestModuleMPS) will be disabled in PyTorch CI for these platforms: mac, macos. Please verify that your test name looks correct, e.g., test_cuda_assert_async (__main__.TestCuda).

To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified.

Platforms: case-insensitive, list, of, platforms

We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows.
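
The platform rule described in the bot comment above can be sketched in a few lines of Python. This is only an illustration of the stated behavior (a case-insensitive, comma-separated `Platforms:` line, with no line meaning "all platforms"); the function name `parse_platforms` and the choice to drop unsupported names silently are assumptions, not the bot's actual implementation.

```python
import re

# Platform names copied from the bot comment above.
SUPPORTED_PLATFORMS = {
    "asan", "dynamo", "inductor", "linux", "mac",
    "macos", "rocm", "slow", "win", "windows",
}

# Hypothetical helper: matches a line such as "Platforms: mac, macos".
_PLATFORMS_LINE = re.compile(
    r"^[ \t]*platforms:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE
)


def parse_platforms(issue_body):
    """Return the set of platforms to skip; an empty set means all platforms."""
    match = _PLATFORMS_LINE.search(issue_body)
    if match is None:
        return set()  # no Platforms line: the test is disabled everywhere
    names = {name.strip().lower() for name in match.group(1).split(",")}
    # Assumption: names outside the supported list are ignored.
    return names & SUPPORTED_PLATFORMS


body = "Platforms: mac, macos\n\nThis test was disabled because it is failing."
print(sorted(parse_platforms(body)))
```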

@pytorch-bot pytorch-bot bot added module: macos Mac OS related issues skipped Denotes a (flaky) test currently skipped in CI. and removed skipped Denotes a (flaky) test currently skipped in CI. labels Oct 18, 2023

malfet commented Oct 18, 2023

Hmm, this is weird. We should not be running any MPS tests on GitHub Actions runners, as they do not have access to MPS hardware.

@malfet malfet self-assigned this Oct 18, 2023
malfet added a commit that referenced this issue Oct 19, 2023
Skip devices that do not support `MTLGPUFamilyMac2`, for example something called "Apple Paravirtual device", which started to appear in GitHub CI; see https://github.com/malfet/deleteme/actions/runs/6577012044/job/17867739464#step:3:18
```
Found device Apple Paravirtual device isLowPower false supports Metal false
```

This is needed because the first attempt to allocate memory on such a device fails with:
```
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```

Fixes #111449
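
For CI logs, a rough way to tell this failure apart from a genuine out-of-memory condition is to check whether the error reports zero bytes already allocated, since the commit message says the very first allocation fails on such a device. The sketch below only parses the message text quoted in this issue; the function name `failed_on_first_allocation` is hypothetical, and the heuristic itself is an assumption rather than anything PyTorch provides.

```python
import re

# Matches the start of the error text quoted in this issue and captures the
# two "already allocated" figures.
_MPS_OOM = re.compile(
    r"MPS backend out of memory \(MPS allocated: (\d+(?:\.\d+)?) \w+, "
    r"other allocations: (\d+(?:\.\d+)?) \w+"
)


def failed_on_first_allocation(message):
    """True if an MPS OOM error was raised before anything was allocated.

    Heuristic (an assumption): zero bytes allocated at failure time suggests
    an unusable device rather than real memory pressure.
    """
    match = _MPS_OOM.search(message)
    if match is None:
        return False  # not an MPS out-of-memory message at all
    mps_allocated, other_allocated = (float(g) for g in match.groups())
    return mps_allocated == 0 and other_allocated == 0


error = (
    "RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, "
    "other allocations: 0 bytes, max allowed: 1.70 GB). "
    "Tried to allocate 0 bytes on private pool."
)
print(failed_on_first_allocation(error))
```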
huydhn pushed a commit that referenced this issue Oct 27, 2023

Pull Request resolved: #111576
Approved by: https://github.com/atalman, https://github.com/clee2000, https://github.com/huydhn
huydhn added a commit that referenced this issue Oct 28, 2023
* check in (#111875)

check in impl

address comments, skip test on rocm

unused

* [MPS] Skip virtualized devices (#111576)

Pull Request resolved: #111576
Approved by: https://github.com/atalman, https://github.com/clee2000, https://github.com/huydhn

* Revert "check in (#111875)"

This reverts commit 2f502cc.

---------

Co-authored-by: eqy <eddiey@nvidia.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
xuhancn pushed a commit to xuhancn/pytorch that referenced this issue Nov 7, 2023
Pull Request resolved: pytorch#111576
Approved by: https://github.com/atalman, https://github.com/clee2000, https://github.com/huydhn
Halmoni100 pushed a commit to Halmoni100/pytorch that referenced this issue Nov 25, 2023
janosh added a commit to CederGroupHub/chgnet that referenced this issue Feb 28, 2024
…E` env var (#131)

* use MPS backend if available and use_device is None

(prev would default to CPU in that case)
also fix type errors

* revert torch.det for volume to torch.dot and torch.cross (which have MPS support)

* try PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in CI to avoid OOM error

E   RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 512 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

* add support for CHGNET_DEVICE environment variable

* test CHGNET_DEVICE in test_model_load_version_params()

update deprecated ruff lint config

* set CHGNET_DEVICE=cpu in test.yml since no MPS hardware available even on macos-14 runners

pytorch/pytorch#111449 (comment)

* fix setting CHGNET_DEVICE env var on windows
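
The device-selection behavior listed in this commit might look roughly like the following. It is a minimal sketch, not CHGNet's code: the function `resolve_device` and its keyword arguments are hypothetical, availability is passed in as plain booleans so the example runs without PyTorch, and the precedence (explicit argument, then `CHGNET_DEVICE`, then CUDA, MPS, CPU) is an assumption the commit message does not spell out.

```python
import os


def resolve_device(use_device=None, *, cuda_available=False,
                   mps_available=False, env=None):
    """Pick a device name string.

    Assumed precedence: explicit argument, then the CHGNET_DEVICE
    environment variable, then CUDA, then MPS, then CPU.
    """
    env = os.environ if env is None else env
    requested = use_device or env.get("CHGNET_DEVICE")
    if requested:
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # previously this case fell back to CPU
    return "cpu"


# On a CI runner without usable MPS hardware, CHGNET_DEVICE=cpu wins even
# if the backend reports itself as available.
print(resolve_device(mps_available=True, env={"CHGNET_DEVICE": "cpu"}))
```

In real code the two availability flags would presumably come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.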