Skip to content

Conversation

@jbangelo
Copy link
Contributor

This is a speculative PR that takes advantage of const generics available in the nightly rust compiler. We might want to consider this (or a similar) change once const generics are stable. This PR is split into two levels of changes with the second completely reworking the first. I'd suggest reviewing the changes in isolation treating the first change as a conservative modification and the second and a more intrusive refactor. We can drop the second set of changes if they are not of interest.

The first change modifies the rust code generator to use fixed size arrays in the message types when the message definition includes a known size. It also changes the existing functions to read fixed size arrays to be generic over array sizes instead of taking array size at runtime. Due to the complexity with our string handling code this change does not modify handling of fixed length strings.

The second change is a much wider refactor of our type parsing code in an attempt to reduce the amount of code that the SBP generator needs to generate. It adds a generic trait called SbpParse for parsing types out of something. We implement this trait for all the basic types and for each message, reading out of a &[u8]. We also have a generic implementation for parsing a Vec<T> or [T; N] for any T which can be parsed. Due to the trait definition we need to have separate types for fixed length strings and unbounded length strings so there can be two different parsing implementations. With the introduction of two string types a trait is added to give them a common interface. All of this has the end result of having to only generate a single parse function for each message type and remove the parse_array and parse_array_limit functions from the generated code.

Comment on lines +97 to +98
#[derive(Debug, Clone)]
pub struct BoundedSbpString<const SIZE: usize>([u8; SIZE]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only intentional change in functionality, when the SBP message specifies a fixed length string we store the entire string instead of up to the maximum length.

vec.push(self.parse()?);
}
vec.try_into().map_err(|_| crate::Error::ParseError)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is a practical difference, but we could skip the intermediate vec here with something like

impl<'a, T, const SIZE: usize> SbpParse<[T; SIZE]> for &'a [u8]
where
    &'a [u8]: SbpParse<T>,
{
    fn parse(&mut self) -> Result<[T; SIZE]> {
        let mut data: [std::mem::MaybeUninit<T>; SIZE] =
            unsafe { std::mem::MaybeUninit::uninit().assume_init() };

        for elem in &mut data[..] {
            unsafe {
                std::ptr::write(elem.as_mut_ptr(), self.parse()?);
            }
        }

        let ptr = &mut data as *mut _ as *mut [T; SIZE];
        let res = unsafe { ptr.read() };
        core::mem::forget(data);

        Ok(res)
    }
}

Honestly not sure what the last few lines do, I wanted to write

impl<'a, T, const SIZE: usize> SbpParse<[T; SIZE]> for &'a [u8]
where
    &'a [u8]: SbpParse<T>,
{
    fn parse(&mut self) -> Result<[T; SIZE]> {
        let mut data: [std::mem::MaybeUninit<T>; SIZE] =
            unsafe { std::mem::MaybeUninit::uninit().assume_init() };

        for elem in &mut data[..] {
            unsafe {
                std::ptr::write(elem.as_mut_ptr(), self.parse()?);
            }
        }

        unsafe { Ok(std::mem::transmute::<_, [T; SIZE]>(data)) }
    }
}

But apparently you can't transmute with const generics atm. We could probably also avoid the unsafe by adding a Default trait bound so we could init the array like let mut data = [Default::default(), SIZE];

Not sure this is worth the unsafe, but seems like it would avoid putting some stuff on the heap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to figure out how to avoid the heap allocation, but couldn't figure out how to use MaybeUninit without using even more unstable nightly APIs. I think we should consider something like what you've recommended, all of the fixed size arrays are pretty small so a heap allocation seems overly expensive if it can be avoided. Thanks for the starting point! I'll try and work it in.

Copy link
Contributor

@notoriaga notoriaga Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have linked this earlier but for reference - https://doc.rust-lang.org/nomicon/unchecked-uninit.html
Explains why MaybeUninit::uninit().assume_init() can work (I think)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also based on that, I think the loop is better off as -

for i in 0..SIZE {
    data[i] = std::mem::MaybeUninit::new(self.parse()?);
}

Copy link
Contributor

@notoriaga notoriaga Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I have will leak memory if the array fills up partially then errors.MaybeUninit stops drop from getting called. I think that's fine if T is just a scalar - are there messages with nested arrays/vecs? If so I think we need to manually drop whatever got created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SBP can only nest fixed length arrays (and strings), an unbounded array can't be followed by anything else and has to be made up of fields with a known size. If we change the string type to also used fixed size arrays then we should be fine. That being said it seems like we're asking for trouble by not calling drop on objects when we should.

Copy link
Contributor

@silverjam silverjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in principle, I'm curious if there's a measurable impact on speed or memory efficiency? From the PR description I'm not sure I understand the benefit/motivation of using const generics over the current approach.

@jbangelo
Copy link
Contributor Author

jbangelo commented Feb 4, 2021

@silverjam The original argument I had, and what motivated me to propose this change, was to make the Rust API more representative of the SBP specification. When then spec says that there are exactly X elements in an array it seems better and more honest for the Rust types representing those messages to mirror that fact. [T; X] is self documenting about the length while a Vec<T> could be any length and you need to read through the protocol documentation to know if it's truly variable length or will always be a fixed length. I acknowledge this is a more philosophical argument, and it might not be worth introducing a breaking API change for this.

Performance impact is a good question. In theory I would expect these changes to give a small boost since we are potentially avoiding a system call to allocate memory (assuming that I address @notoriaga's comment about removing the use of Vec in the array parsing function). On the other hand there might be more CPU used for copying when moving an array as compared to moving a vector. In reality I don't think there would be a noticeable change since these are such small arrays and the messages they are in are relatively small. Though we should test/benchmark this assumption to make sure I haven't introduced some slow down.

To be honest this is a fair amount of change for tweaking about 15 messages by my count.

@jbangelo
Copy link
Contributor Author

I'm going to close this. It doesn't seem like a good trade off at the moment, but we should keep this in mind for any future large scale refactors we do of the Rust parser code.

@jbangelo jbangelo closed this May 17, 2021
@jbangelo jbangelo deleted the jbangelo/fixed-size-rust-arrays branch May 17, 2021 18:36
@notoriaga notoriaga mentioned this pull request Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants