-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: remove unnecessary trait bounds requirements for array #104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense to do this, but there are two somewhat signifciatn problems with the impl:
- this makes the trait technically unsafe. I think that's easy enough to solve though:
diff --git a/borsh/src/de/mod.rs b/borsh/src/de/mod.rs
index 0ca2e79f..be6dfb4b 100644
--- a/borsh/src/de/mod.rs
+++ b/borsh/src/de/mod.rs
@@ -54,10 +54,9 @@ pub trait BorshDeserialize: Sized {
#[inline]
#[doc(hidden)]
- fn copy_from_bytes(buf: &mut &[u8], out: &mut [MaybeUninit<Self>]) -> Result<bool> {
+ fn array_from_bytes<const N: usize>(buf: &mut &[u8]) -> Result<Option<[Self; N]>> {
let _ = buf;
- let _ = out;
- Ok(false)
+ Ok(None)
}
}
@@ -92,19 +91,18 @@ impl BorshDeserialize for u8 {
#[inline]
#[doc(hidden)]
- fn copy_from_bytes(buf: &mut &[u8], out: &mut [MaybeUninit<Self>]) -> Result<bool> {
- if buf.len() < out.len() {
+ #[cfg(feature = "const_generics")]
+ fn array_from_bytes<const N: usize>(buf: &mut &[u8]) -> Result<Option<[Self; N]>> {
+ if buf.len() < N {
return Err(Error::new(
ErrorKind::InvalidInput,
ERROR_UNEXPECTED_LENGTH_OF_INPUT,
));
}
- let (front, rest) = buf.split_at(out.len());
- for (o, f) in out.iter_mut().zip(front.iter()) {
- o.write(*f);
- }
+ let (front, rest) = buf.split_at(N);
*buf = rest;
- Ok(true)
+ let front: &[u8; N] = (&*front).try_into().unwrap();
+ Ok(Some(*front))
}
}
@@ -582,21 +580,22 @@ where
{
#[inline]
fn deserialize(buf: &mut &[u8]) -> Result<Self> {
- let mut result: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
-
- if !T::copy_from_bytes(buf, &mut result)? {
+ if let Some(result) = T::array_from_bytes::<N>(buf)? {
+ return Ok(result);
+ } else {
+ let mut result: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
for elem in result.iter_mut() {
elem.write(T::deserialize(buf)?);
}
+ //* SAFETY: This cast is required because `mem::transmute` does not work with const generics
+ //* https://github.com/rust-lang/rust/issues/61956. This array is guaranteed to be
+ //* initialized by this point
+ let res: Self = unsafe {
+ (&*(&MaybeUninit::new(result) as *const _ as *const MaybeUninit<_>))
+ .assume_init_read()
+ };
+ Ok(res)
}
-
- //* SAFETY: This cast is required because `mem::transmute` does not work with const generics
- //* https://github.com/rust-lang/rust/issues/61956. This array is guaranteed to be
- //* initialized by this point
- let res: Self = unsafe {
- (&*(&MaybeUninit::new(result) as *const _ as *const MaybeUninit<_>)).assume_init_read()
- };
- Ok(res)
}
}
- impl for arrays leaks memory. I think that should be fixable by vendoring enough of stdlib in, but that's going to be a large chunk of code.
borsh/src/de/mod.rs
Outdated
for i in 0..$len { | ||
result[i] = T::deserialize(buf)?; | ||
for elem in &mut result { | ||
elem.write(T::deserialize(buf)?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will leak T
if deserialize reteruns an error. This feels like a big bug: if the server expects [String; 4]
, I can OOM it by sending shorter-than-necessary buffer.
array::from_fn
is sadly unstable as of yet, so we'd have to vendor that it (and the impl is a bit non-trivial. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to say array::try_from_fn
? Yeah, it seems to have its roots pretty deep. Unsure if it's worth.
Very valid on the leak, would it be too janky to try deserializing, and on failure loop through and drop all T that have been placed? Just trying to avoid vendoring a ton of code
Edit: 5f2585f
(#104) is that change. Is it (or something similar) completely unreasonable? :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that I think is broadly reasonable. I'd say it needs two chages:
- put the logic into some sort of a drop gurard, such that panic in deseralize also triggers dropping of inited stuff
- the pedantic logic would also try to drop the rest of the elements even if dropping one of them panicked. I think you get get that for free by calling
ptr::drop_in_place(&mut [T])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick workaround fce4759
(#104) but going to clean up tomorrow. I'll open this PR back up for review when it's ready :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking PR as ready now. I'm still not 100% confident in the current impl, but it's hard for me to come up with an argument against it.
The main change, aside from switching to use drop_in_place
, is to change wrapping the arrays in a MaybeUninit
and then assuming init to just cast as a pointer to [T; N]
before copying (in the case of initializing) or to *mut [T]
for drop_in_place
(d26078e). I can't pinpoint what the intermediate type is and why it's safe to cast the pointer like this, and if this is actually safe to make the assumption that the memory layout of [T]
will always be the same as [MaybeUninit<T>]
. Do you have an argument against or a suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't pinpoint what the intermediate type is
Not sure I understand the Q, but maybe this: init_range as *mut [MaybeUninit<T>] as *mut [T]
?
why it's safe to cast the pointer like this
Pointer casting is always safe, only dereferencing the result can be UB
that the memory layout of [T] will always be the same as [MaybeUninit]
Yeah, I think [T] == [MaybeUnint] is guaranteed, though not-necessary spelled out. This is in contrast to Option vs Option<MaybeUninit> which doesn't hold due to niches.
Also another hidden issue with this PR is that it bumps the MSRV |
692cd8a
to
fce4759
Compare
let serialized = arr.try_to_vec().unwrap(); | ||
let deserialized: [String; 3] = BorshDeserialize::try_from_slice(&serialized).unwrap(); | ||
assert_eq!(arr, deserialized); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests for ZSTs? [(); 92]
and such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is now! a52b287
(#104)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now!
So this is technically a breaking change, but only to the
copy_from_bytes
function, which is hidden from docs. Unclear if that is acceptable breakage. Unclear the exact implications of this yet, but it seems like it would be more optimized since it doesn't require callingDefault
and then copying the value to initialize the value, but might lose some performance switching fromcopy_from_slice
to iterating over the buffer and copying each u8 individually.I will look more in-depth when I'm on better internet to check the safety of this cast for the transmute workaround. I have low confidence about that unsafe code currently.
closes #103