-
-
Notifications
You must be signed in to change notification settings - Fork 14.4k
Handle transient ENOMEM in statx probing #151641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
r? @ibraheemdev rustbot has assigned @ibraheemdev. Use |
54e20c5 to
d9404ad
Compare
| // | ||
| // See: https://github.com/rust-lang/rust/issues/65662 | ||
| // | ||
| // FIXME what about transient conditions like `ENOMEM`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure there are no other transient coditions (other than ENOMEM) that we should worry about here?
As the comment says "like" which indicates to me as an uninformed reader, that there may be more transient conditions to consider other than ENOMEM.
If so, removing this comment is not correct IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya true, deleting the comment was premature...
One concern I have is that caching STATX_STATE::Unavailable feels risky if the failure is transient, since that would effectively disable statx for the lifetime of the process.
- Should Unavailable be cached only for ENOSYS, or also cases like EPERM (seccomp)?
- For other errors that aren’t clearly “missing” or transient, is it better to avoid caching and just fall back to legacy stat for that call?
- Any other transient errors you’d want handled explicitly?
If this needs a bit more discussion, I could open a GitHub issue first, please suggest, thanks mate! @terrarier2111 @ibraheemdev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to discuss here since this is where we would see the changes anyway.
One concern I have is that caching STATX_STATE::Unavailable feels risky if the failure is transient, since that would effectively disable statx for the lifetime of the process.
There is nothing "risky" about using the stat fallback.
- For other errors that aren’t clearly “missing” or transient, is it better to avoid caching and just fall back to legacy stat for that call?
- Any other transient errors you’d want handled explicitly?
It's your PR, could you propose a reasonable solution here? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the worst thing to allow a retry on ENOMEM, but this code changes more than just the ENOMEM handling.
As an alternative we could keep a retry counter and fall back if it fails a number of times, but things are going rather poorly if you get a repeated kernel OOM from an empty statx. I don't really think it's worth it.
@tbu- it looks like you've touched this code most recently, do you have any thoughts?
| match cvt(statx( | ||
| 0, | ||
| ptr::null(), | ||
| 0, | ||
| libc::STATX_BASIC_STATS | libc::STATX_BTIME, | ||
| ptr::null_mut(), | ||
| )) | ||
| .err() | ||
| .and_then(|e| e.raw_os_error()) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the err2 binding and then match on that, it reads significantly better
| // See: https://github.com/rust-lang/rust/issues/65662 | ||
| // | ||
| // FIXME what about transient conditions like `ENOMEM`? | ||
| let err2 = cvt(statx(0, ptr::null(), 0, libc::STATX_BASIC_STATS | libc::STATX_BTIME, ptr::null_mut())) | ||
| .err() | ||
| .and_then(|e| e.raw_os_error()); | ||
| if err2 == Some(libc::EFAULT) { | ||
| STATX_SAVED_STATE.store(STATX_STATE::Present as u8, Ordering::Relaxed); | ||
| return Some(Err(err)); | ||
| } else { | ||
| STATX_SAVED_STATE.store(STATX_STATE::Unavailable as u8, Ordering::Relaxed); | ||
| return None; | ||
|
|
||
|
|
||
| match cvt(statx( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean up this spacing; the comment was previously attached to the the next line of code. Now there is an empty // line at the end of the comment block and two newlines before the line of code.
| { | ||
| Some(libc::EFAULT) => { | ||
| STATX_SAVED_STATE.store(STATX_STATE::Present as u8, Ordering::Relaxed); | ||
| return Some(Err(err)); | ||
| } | ||
| Some(libc::ENOSYS | libc::EPERM | libc::EACCES) => { | ||
| STATX_SAVED_STATE.store(STATX_STATE::Unavailable as u8, Ordering::Relaxed); | ||
| } | ||
| _ => {} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, allowing a retry on any error condition aside from the few listed is not robust. Compare to the original implementation.
| STATX_SAVED_STATE.store(STATX_STATE::Present as u8, Ordering::Relaxed); | ||
| return Some(Err(err)); | ||
| } | ||
| Some(libc::ENOSYS | libc::EPERM | libc::EACCES) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this selection come from?
|
Reminder, once the PR becomes ready for a review, use |
Fixes a FIXME in
sys/fs/unix.rs.Currently, if the
statxavailability probe fails withENOMEM,statxis permanently disabled for the process. SinceENOMEMis transient, this change allows falling back tostatfor the current call without updatingSTATX_SAVED_STATEtoUnavailable, permittingstatxto be retried later.