Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for MPS regression in #122016 and #123178 #123234

Closed
wants to merge 5 commits into from

Conversation

jhavukainen
Copy link
Collaborator

@jhavukainen jhavukainen commented Apr 3, 2024

Fixes #122016 and #123178. This regression is related to an OS side change that requires a slight adjustment from us on PyTorch side to restore the previous behavior. Additionally we cleared out pre-MacOS13 related workarounds.

Before the fix on MacOS 14.4:

python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 3., 3.], device='mps:0')

After the fix:

python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 1., 3.], device='mps:0')

This also fixes complex number initialization and as such makes nn.functional.rms_norm pass on MacOS-14+

Copy link

pytorch-bot bot commented Apr 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123234

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a3a2acf with merge base 74b3a79 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: mps Release notes category label Apr 3, 2024
@jhavukainen
Copy link
Collaborator Author

@kulinseth @DenisVieriu97

@malfet malfet added the ciflow/mps Run MPS tests (subset of trunk) label Apr 3, 2024
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it passes CI, then looks good to me

@malfet
Copy link
Contributor

malfet commented Apr 3, 2024

@pytorchbot rebase

@malfet
Copy link
Contributor

malfet commented Apr 3, 2024

(Lint failures are likely due to a broken base, but let's rebase to be on the safe ground)

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased dev/joona/Issue122016 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dev/joona/Issue122016 && git pull --rebase)

@malfet
Copy link
Contributor

malfet commented Apr 3, 2024

@pytorchbot r -f "Lint and MPS tests are green"

Copy link

pytorch-bot bot commented Apr 3, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'r' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@malfet
Copy link
Contributor

malfet commented Apr 3, 2024

@pytorchbot merge -f "Lint and MPS tests are green"

@malfet
Copy link
Contributor

malfet commented Apr 3, 2024

@pytorchbot merge -f "Lint and MPS tests are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@malfet
Copy link
Contributor

malfet commented Apr 4, 2024

@jhavukainen have you run any benchmarks to measure perf implication of this change? I.e. it used to be a purely async operation but now it will pause as scalar needs to be copied to MPS before it can continue, wouldn't it?

@malfet
Copy link
Contributor

malfet commented Apr 4, 2024

@pytorchbot cherry-pick --onto release/2.3 --fixes foo -c criticalfix

Copy link

pytorch-bot bot commented Apr 4, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: argument -c/--classification: invalid choice: 'criticalfix' (choose from 'regression', 'critical', 'fixnewfeature', 'docs', 'release')

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.

@malfet
Copy link
Contributor

malfet commented Apr 4, 2024

@pytorchbot cherry-pick --onto release/2.3 --fixes foo -c critical

pytorchbot pushed a commit that referenced this pull request Apr 4, 2024
Fixes #122016 and #123178. This regression is related to an OS side change that requires a slight adjustment from us on PyTorch side to restore the previous behavior. Additionally we cleared out pre-MacOS13 related workarounds.

Before the fix on MacOS 14.4:

```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 3., 3.], device='mps:0')
```

After the fix:
```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 1., 3.], device='mps:0')
```

This also fixes complex number initialization and as such makes `nn.functional.rms_norm` pass on MacOS-14+

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #123234
Approved by: https://github.com/malfet, https://github.com/kulinseth

(cherry picked from commit 05289a2)
@pytorchbot
Copy link
Collaborator

Cherry picking #123234

The cherry pick PR is at #123385 and it is linked with issue foo

Details for Dev Infra team Raised by workflow job

pytorchmergebot pushed a commit that referenced this pull request Apr 5, 2024
Fixes #122016 and #123178. This regression is related to an OS side change that requires a slight adjustment from us on PyTorch side to restore the previous behavior. Additionally we cleared out pre-MacOS13 related workarounds.

Before the fix on MacOS 14.4:

```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 3., 3.], device='mps:0')
```

After the fix:
```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 1., 3.], device='mps:0')
```

This also fixes complex number initialization and as such makes `nn.functional.rms_norm` pass on MacOS-14+

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #123234
Approved by: https://github.com/malfet, https://github.com/kulinseth

(cherry picked from commit 05289a2)
atalman pushed a commit that referenced this pull request Apr 5, 2024
Fixes #122016 and #123178. This regression is related to an OS side change that requires a slight adjustment from us on PyTorch side to restore the previous behavior. Additionally we cleared out pre-MacOS13 related workarounds.

Before the fix on MacOS 14.4:

```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 3., 3.], device='mps:0')
```

After the fix:
```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 1., 3.], device='mps:0')
```

This also fixes complex number initialization and as such makes `nn.functional.rms_norm` pass on MacOS-14+

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #123234
Approved by: https://github.com/malfet, https://github.com/kulinseth

(cherry picked from commit 05289a2)

Co-authored-by: Joona Havukainen <jhavukainen@apple.com>
malfet added a commit to pytorch/builder that referenced this pull request Apr 5, 2024
I.e. if issue are mention in fixed PR as `#\d+` script should still look
for the cherry pick in the branch

Before that change pytorch/pytorch#123234 were missed in release/2.3 branch
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
…123234)

Fixes pytorch#122016 and pytorch#123178. This regression is related to an OS side change that requires a slight adjustment from us on PyTorch side to restore the previous behavior. Additionally we cleared out pre-MacOS13 related workarounds.

Before the fix on MacOS 14.4:

```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 3., 3.], device='mps:0')
```

After the fix:
```
python -c "import torch;x=torch.zeros(3, device='mps');x[1] = 1; x[2] = 3; print(x)"
tensor([0., 1., 3.], device='mps:0')
```

This also fixes complex number initialization and as such makes `nn.functional.rms_norm` pass on MacOS-14+

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: pytorch#123234
Approved by: https://github.com/malfet, https://github.com/kulinseth
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/mps Run MPS tests (subset of trunk) Merged open source release notes: mps Release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MPS] Regression from macOS 14.3 to 14.4 in PyTorch 2.2.0/2.2.1
5 participants