
Support big arrays #631

Closed
dtolnay opened this issue Nov 28, 2016 · 15 comments

Comments

@dtolnay
Member

dtolnay commented Nov 28, 2016

Servo does this to support big arrays in one of their proc macros:

https://github.com/servo/heapsize/blob/44e86d6d48a09c9cbc30a122bc8725b188d017b2/derive/lib.rs#L36-L41

Let's do the same but only if the size of the array exceeds our biggest builtin impl.

Thanks @nox.

@dtolnay
Member Author

dtolnay commented Feb 13, 2017

One relatively easy workaround for serialization is coercing to a slice:

struct S {
    #[serde(serialize_with = "<[_]>::serialize")]
    arr: [u8; 256],
}

Deserialization is still annoying I think.

@dtolnay dtolnay modified the milestone: v1.0 Apr 8, 2017
@sdleffler

Hey folks, this feature is important to me: I'd like to serialize a 512-bit hash (64 bytes), but since the serde impls only go up to [u8; 32], I cannot serialize a [u8; 64].

As workarounds I'm considering [[u8; 32]; 2], GenericArray, or just lazily using a Box<[u8]>. I'm intrigued by the workaround shown above. @dtolnay, did you ever find a deserialization workaround?

Would it be okay to add impls up to 64? Or is there some reason that hasn't been done?

@clarfonthey
Contributor

clarfonthey commented Aug 15, 2017

In the meantime, perhaps we should add impls for the sizes that arrayvec provides?

impl<T> Array for [T; 40]
impl<T> Array for [T; 48]
impl<T> Array for [T; 50]
impl<T> Array for [T; 56]
impl<T> Array for [T; 64]
impl<T> Array for [T; 72]
impl<T> Array for [T; 96]
impl<T> Array for [T; 100]
impl<T> Array for [T; 128]
impl<T> Array for [T; 160]
impl<T> Array for [T; 192]
impl<T> Array for [T; 200]
impl<T> Array for [T; 224]
impl<T> Array for [T; 256]
impl<T> Array for [T; 384]
impl<T> Array for [T; 512]
impl<T> Array for [T; 768]
impl<T> Array for [T; 1024]
impl<T> Array for [T; 2048]
impl<T> Array for [T; 4096]
impl<T> Array for [T; 8192]
impl<T> Array for [T; 16384]
impl<T> Array for [T; 32768]
impl<T> Array for [T; 65536]

@dtolnay
Member Author

dtolnay commented Aug 16, 2017

@clarcharr I would prefer to stick with what the standard library does, which is 0 to 32 (inclusive).

@dtolnay
Member Author

dtolnay commented Aug 16, 2017

Here is a workaround for deserializing.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::fmt;
use std::marker::PhantomData;
use serde::ser::{Serialize, Serializer, SerializeTuple};
use serde::de::{Deserialize, Deserializer, Visitor, SeqAccess, Error};

trait BigArray<'de>: Sized {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>;
}

macro_rules! big_array {
    ($($len:expr,)+) => {
        $(
            impl<'de, T> BigArray<'de> for [T; $len]
                where T: Default + Copy + Serialize + Deserialize<'de>
            {
                fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
                    where S: Serializer
                {
                    let mut seq = serializer.serialize_tuple(self.len())?;
                    for elem in &self[..] {
                        seq.serialize_element(elem)?;
                    }
                    seq.end()
                }

                fn deserialize<D>(deserializer: D) -> Result<[T; $len], D::Error>
                    where D: Deserializer<'de>
                {
                    struct ArrayVisitor<T> {
                        element: PhantomData<T>,
                    }

                    impl<'de, T> Visitor<'de> for ArrayVisitor<T>
                        where T: Default + Copy + Deserialize<'de>
                    {
                        type Value = [T; $len];

                        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                            formatter.write_str(concat!("an array of length ", $len))
                        }

                        fn visit_seq<A>(self, mut seq: A) -> Result<[T; $len], A::Error>
                            where A: SeqAccess<'de>
                        {
                            let mut arr = [T::default(); $len];
                            for i in 0..$len {
                                arr[i] = seq.next_element()?
                                    .ok_or_else(|| Error::invalid_length(i, &self))?;
                            }
                            Ok(arr)
                        }
                    }

                    let visitor = ArrayVisitor { element: PhantomData };
                    deserializer.deserialize_tuple($len, visitor)
                }
            }
        )+
    }
}

big_array! {
    40, 48, 50, 56, 64, 72, 96, 100, 128, 160, 192, 200, 224, 256, 384, 512,
    768, 1024, 2048, 4096, 8192, 16384, 32768, 65536,
}

#[derive(Serialize, Deserialize)]
struct S {
    #[serde(with = "BigArray")]
    arr: [u8; 64],
}

fn main() {
    let s = S { arr: [1; 64] };
    let j = serde_json::to_string(&s).unwrap();
    println!("{}", j);
    serde_json::from_str::<S>(&j).unwrap();
}

@Binero
Contributor

Binero commented Aug 16, 2017

As long as you're not working with primes:

#[derive(Serialize, Deserialize, Debug)]
struct MyStruct {
    data: [[u8; 32]; 16],
}

impl MyStruct {
    fn data(&self) -> &[u8; 512] {
        use std::mem::transmute;
        // Sound because [[u8; 32]; 16] and [u8; 512] have identical size,
        // alignment, and layout.
        unsafe { transmute(&self.data) }
    }
}

This is a pretty neat workaround for when you never expect a human to read the serialised form (e.g. bincode), since it produces a nested array. Added bonus: it also works for Debug, PartialEq, etc.

@Boscop

Boscop commented May 3, 2018

FWIW, I use this:

use serde::{Serialize, Serializer};

pub fn serialize_array<S, T>(array: &[T], serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer, T: Serialize {
	array.serialize(serializer)
}

#[macro_export]
macro_rules! serde_array { ($m:ident, $n:expr) => {
	pub mod $m {
		use serde::{Deserialize, Deserializer, de};
		use std::convert::TryInto;
		pub use $crate::serialize_array as serialize;
		use super::*;

		pub fn deserialize<'de, D, T>(deserializer: D) -> Result<[T; $n], D::Error>
		where D: Deserializer<'de>, T: Deserialize<'de> + 'de {
			let vec: Vec<T> = Deserialize::deserialize(deserializer)?;
			// TryFrom<Vec<T>> for [T; N] (Rust 1.48+) checks the length,
			// avoiding the unsound mem::uninitialized of older approaches.
			vec.try_into()
				.map_err(|_| de::Error::custom("input vector has wrong length"))
		}
	}
}}

serde_array!(a64, 64);
serde_array!(a120, 120);
serde_array!(a128, 128);
serde_array!(a384, 384);

And then

struct Foo {
	#[serde(with = "a128")]
	bar: [f32; 128],
}

@dtolnay dtolnay self-assigned this May 7, 2018
@dtolnay dtolnay removed their assignment Jun 3, 2018
@dtolnay
Member Author

dtolnay commented Jun 3, 2018

I do not plan to implement the workaround from heapsize_derive. I would prefer to see something like #631 (comment) provided in a crate.

@est31
Contributor

est31 commented Dec 9, 2018

@dtolnay do I have your permission to publish this in a crate? You will be credited as co-author.

@dtolnay
Member Author

dtolnay commented Dec 9, 2018

Yes go for it! Thanks.

@est31
Contributor

est31 commented Dec 9, 2018

Thanks! Published: https://github.com/est31/serde-big-array | https://crates.io/crates/serde-big-array

@est31
Contributor

est31 commented Dec 9, 2018

@dtolnay what do you think, does moving it into the serde-rs org make sense?

@trrichard

trrichard commented Dec 10, 2018

+1 on moving this into serde-rs.

The ability to serialize/deserialize arrays longer than 32 elements should be a core feature. I'd use it for sure.

@dtolnay I do think we should consider changing the derive macro to support it instead. I'd rather have it work out of the box if possible.

@dtolnay
Member Author

dtolnay commented Jan 22, 2019

I posted a request for implementation of a slightly different approach: dtolnay/request-for-implementation#17.

@est31
Contributor

est31 commented May 18, 2019

To all the people in this thread hoping that const generics will resolve this: when trying to port serde to const generics, I came across the problem that Serialize and Deserialize are implemented for arrays of size 0 for all element types, without requiring Serialize or Deserialize on the element type itself. See commit 6388019, which introduced it. This is a major hurdle, as serde is of course no longer in the business of making breaking changes. So we'll have to wait either for a language improvement that allows something like impl<T, const N: usize> Serialize for [T; N] where N > 0, or for specialization, before this can be fixed in serde proper.

This prolongs the lifetime of the serde-big-array crate until such a fix lands in the stable language, which could be well into the next decade. I'm also currently researching whether serde-big-array can at least avoid requiring you to specify array sizes: est31/serde-big-array#3.

@serde-rs serde-rs deleted a comment from mikevoronov Mar 30, 2020
@serde-rs serde-rs locked and limited conversation to collaborators Mar 30, 2020