Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove unnecessary trait bounds requirements for array #104

Merged
merged 19 commits into from
Oct 10, 2022

Conversation

austinabell
Copy link
Contributor

So this is technically a breaking change, but only to the copy_from_bytes function, which is hidden from docs. Unclear if that is acceptable breakage. Unclear the exact implications of this yet, but it seems like it would be more optimized since it doesn't require calling Default and then copying the value to initialize the value, but might lose some performance switching from copy_from_slice to iterating over the buffer and copying each u8 individually.

I will look more in-depth when I'm on better internet to check the safety of this cast for the transmute workaround. I have low confidence about that unsafe code currently.

closes #103

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense to do this, but there are two somewhat signifciatn problems with the impl:

  • this makes the trait technically unsafe. I think that's easy enough to solve though:
diff --git a/borsh/src/de/mod.rs b/borsh/src/de/mod.rs
index 0ca2e79f..be6dfb4b 100644
--- a/borsh/src/de/mod.rs
+++ b/borsh/src/de/mod.rs
@@ -54,10 +54,9 @@ pub trait BorshDeserialize: Sized {
 
     #[inline]
     #[doc(hidden)]
-    fn copy_from_bytes(buf: &mut &[u8], out: &mut [MaybeUninit<Self>]) -> Result<bool> {
+    fn array_from_bytes<const N: usize>(buf: &mut &[u8]) -> Result<Option<[Self; N]>> {
         let _ = buf;
-        let _ = out;
-        Ok(false)
+        Ok(None)
     }
 }
 
@@ -92,19 +91,18 @@ impl BorshDeserialize for u8 {
 
     #[inline]
     #[doc(hidden)]
-    fn copy_from_bytes(buf: &mut &[u8], out: &mut [MaybeUninit<Self>]) -> Result<bool> {
-        if buf.len() < out.len() {
+    #[cfg(feature = "const_generics")]
+    fn array_from_bytes<const N: usize>(buf: &mut &[u8]) -> Result<Option<[Self; N]>> {
+        if buf.len() < N {
             return Err(Error::new(
                 ErrorKind::InvalidInput,
                 ERROR_UNEXPECTED_LENGTH_OF_INPUT,
             ));
         }
-        let (front, rest) = buf.split_at(out.len());
-        for (o, f) in out.iter_mut().zip(front.iter()) {
-            o.write(*f);
-        }
+        let (front, rest) = buf.split_at(N);
         *buf = rest;
-        Ok(true)
+        let front: &[u8; N] = (&*front).try_into().unwrap();
+        Ok(Some(*front))
     }
 }
 
@@ -582,21 +580,22 @@ where
 {
     #[inline]
     fn deserialize(buf: &mut &[u8]) -> Result<Self> {
-        let mut result: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
-
-        if !T::copy_from_bytes(buf, &mut result)? {
+        if let Some(result) = T::array_from_bytes::<N>(buf)? {
+            return Ok(result);
+        } else {
+            let mut result: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
             for elem in result.iter_mut() {
                 elem.write(T::deserialize(buf)?);
             }
+            //* SAFETY: This cast is required because `mem::transmute` does not work with const generics
+            //*         https://github.com/rust-lang/rust/issues/61956. This array is guaranteed to be
+            //*         initialized by this point
+            let res: Self = unsafe {
+                (&*(&MaybeUninit::new(result) as *const _ as *const MaybeUninit<_>))
+                    .assume_init_read()
+            };
+            Ok(res)
         }
-
-        //* SAFETY: This cast is required because `mem::transmute` does not work with const generics
-        //*         https://github.com/rust-lang/rust/issues/61956. This array is guaranteed to be
-        //*         initialized by this point
-        let res: Self = unsafe {
-            (&*(&MaybeUninit::new(result) as *const _ as *const MaybeUninit<_>)).assume_init_read()
-        };
-        Ok(res)
     }
 }
  • impl for arrays leaks memory. I think that should be fixable by vendoring enough of stdlib in, but that's going to be a large chunk of code.

borsh/src/de/mod.rs Outdated Show resolved Hide resolved
borsh/src/de/mod.rs Outdated Show resolved Hide resolved
for i in 0..$len {
result[i] = T::deserialize(buf)?;
for elem in &mut result {
elem.write(T::deserialize(buf)?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will leak T if deserialize reteruns an error. This feels like a big bug: if the server expects [String; 4], I can OOM it by sending shorter-than-necessary buffer.

array::from_fn is sadly unstable as of yet, so we'd have to vendor that it (and the impl is a bit non-trivial. )

Copy link
Contributor Author

@austinabell austinabell Oct 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to say array::try_from_fn? Yeah, it seems to have its roots pretty deep. Unsure if it's worth.

Very valid on the leak, would it be too janky to try deserializing, and on failure loop through and drop all T that have been placed? Just trying to avoid vendoring a ton of code

Edit: 5f2585f (#104) is that change. Is it (or something similar) completely unreasonable? :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that I think is broadly reasonable. I'd say it needs two chages:

  • put the logic into some sort of a drop gurard, such that panic in deseralize also triggers dropping of inited stuff
  • the pedantic logic would also try to drop the rest of the elements even if dropping one of them panicked. I think you get get that for free by calling ptr::drop_in_place(&mut [T])

Copy link
Contributor Author

@austinabell austinabell Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick workaround fce4759 (#104) but going to clean up tomorrow. I'll open this PR back up for review when it's ready :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking PR as ready now. I'm still not 100% confident in the current impl, but it's hard for me to come up with an argument against it.

The main change, aside from switching to use drop_in_place, is to change wrapping the arrays in a MaybeUninit and then assuming init to just cast as a pointer to [T; N] before copying (in the case of initializing) or to *mut [T] for drop_in_place (d26078e). I can't pinpoint what the intermediate type is and why it's safe to cast the pointer like this, and if this is actually safe to make the assumption that the memory layout of [T] will always be the same as [MaybeUninit<T>]. Do you have an argument against or a suggestion here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't pinpoint what the intermediate type is

Not sure I understand the Q, but maybe this: init_range as *mut [MaybeUninit<T>] as *mut [T]?

why it's safe to cast the pointer like this

Pointer casting is always safe, only dereferencing the result can be UB

that the memory layout of [T] will always be the same as [MaybeUninit]

Yeah, I think [T] == [MaybeUnint] is guaranteed, though not-necessary spelled out. This is in contrast to Option vs Option<MaybeUninit> which doesn't hold due to niches.

@austinabell
Copy link
Contributor Author

austinabell commented Sep 27, 2022

Also another hidden issue with this PR is that it bumps the MSRV 1.55.0 -> 1.60.0 (using the assume_init_read call). We don't have an MSRV check that would have shown this. Was looking into it since the patch you suggest requires const generics and wasn't sure if that would increase MSRV

@austinabell austinabell mentioned this pull request Sep 27, 2022
@austinabell austinabell marked this pull request as ready for review October 6, 2022 01:07
let serialized = arr.try_to_vec().unwrap();
let deserialized: [String; 3] = BorshDeserialize::try_from_slice(&serialized).unwrap();
assert_eq!(arr, deserialized);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for ZSTs? [(); 92] and such?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is now! a52b287 (#104)

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now!

@austinabell austinabell merged commit aec5a4e into master Oct 10, 2022
@austinabell austinabell deleted the austin/array_bounds_fix branch October 10, 2022 16:48
This was referenced May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BorshDeserialize shouldn't require Copy
2 participants