Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid save/restoring AMX registers to avoid a SPR erratum #15168

Merged
merged 1 commit into from Aug 26, 2023

Conversation

rincebrain
Copy link
Contributor

@rincebrain rincebrain commented Aug 12, 2023

Motivation and Context

#14989

Description

Intel SPR erratum SPR4 says that if you trip into a vmexit while doing FPU save/restore, your AMX register state might misbehave... and by misbehave, I mean (apparently) save all zeroes incorrectly, leading to explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid this is to just not save/restore those when we do anything, since we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any way to mitigate this, on Linux...but I am not an expert.

How Has This Been Tested?

Reporter of #14989 has been running the same workaround for weeks without breaking.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Fixes: openzfs#14989

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Aug 26, 2023
@behlendorf behlendorf merged commit 277f2e5 into openzfs:master Aug 26, 2023
19 checks passed
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 26, 2023
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#14989
Closes openzfs#15168
behlendorf pushed a commit that referenced this pull request Aug 27, 2023
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14989
Closes #15168
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#14989
Closes openzfs#15168
rincebrain added a commit to rincebrain/zfs that referenced this pull request Mar 6, 2024
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#14989
Closes openzfs#15168

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
behlendorf pushed a commit that referenced this pull request Apr 3, 2024
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14989
Closes #15168

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants