Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to use bincode when serializing without macros. Get SyntaxError. #46

Closed
rigdent opened this issue Nov 4, 2015 · 20 comments
Closed

Unable to use bincode when serializing without macros. Get SyntaxError. #46

rigdent opened this issue Nov 4, 2015 · 20 comments

Comments

@rigdent
Copy link

@rigdent rigdent commented Nov 4, 2015

I'm not sure if this is the right place for this, but

I am unable to use bincode to serialize and deserialize structs where the serialize and deserialize traits are implemented instead of derived.
For example, I copied the Point struct, and the serialize and deserialize implementations used in the "Serliazation without macros" section of the readme here, https://github.com/serde-rs/serde#serialization-without-macros and attempted to use bincode to serialize and deserialize a Point variable. I was unable to do so and got this error

thread '

' panicked at 'called Result::unwrap() on an Err value: SyntaxError',
../src/libcore/result.rs:738
Process didn't exit successfully: target/debug/tcpLocal (exit code: 101)

If I instead derive the serialize and deserialize traits, I can serialize and deserialize the Point variable fine. Using the copied implementations of serialize and deserialize, I can serialize and deserialize from JSON just fine.

Is this expected? Did I miss something where only serializing with macros is supported in bincode?

  • Thanks
@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 4, 2015

You should be able to use bincode with custom implementations, provided that the custom implementations are perfect. Bincode cares a lot about the order that items are serialized and deserialized, so things will break if you aren't very careful.

Could you post your custom impls so I can take a look at them?

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 4, 2015

Thanks for the quick response! Here's what I have for my custom IpTuple struct. I also wasn't able to get it to serialize and deserialize the Point Struct that the serde readme implemented those traits for here (it fails when deserializing).

This is also what I followed to create my implementation below. Sorry for the long code and thanks!

extern crate serde;
extern crate bincode;

use serde::ser;
use serde::de;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};
use bincode::serde::{deserialize, serialize, DeserializeError};

pub struct IpTuple {
    pub src: IpAddr,
    pub dst: IpAddr,
    pub proto: IpProto,
}

pub struct IpTupleMapVisitor<'a> {
    value: &'a IpTuple,
    state: u8,
}

impl <'a> ser::MapVisitor for IpTupleMapVisitor<'a> {
    fn visit<S>(&mut self, serializer: &mut S) -> Result<Option<()>, S::Error>
        where S: serde::Serializer
    {
        match self.state {
            0 => {
                self.state += 1;
                Ok(Some(try!(serializer.visit_struct_elt("src_ip", convert_ip_addr_to_string(&self.value.src)))))
            },
            1 => {
                self.state += 1;
                Ok(Some(try!(serializer.visit_struct_elt("dst_ip", convert_ip_addr_to_string(&self.value.dst)))))
            },
            2 => {
                self.state += 1;
                match self.value.proto {
                    IpProto::TCP{src_port, dst_port} => {
                        try!(serializer.visit_struct_elt("src_port", src_port));
                        try!(serializer.visit_struct_elt("dst_port", dst_port));
                        Ok(Some(try!(serializer.visit_struct_elt("protocol", 6))))
                    },
                    IpProto::UDP{src_port, dst_port} => {
                        try!(serializer.visit_struct_elt("src_port", src_port));
                        try!(serializer.visit_struct_elt("dst_port", dst_port));
                        Ok(Some(try!(serializer.visit_struct_elt("protocol", 17))))
                    },
                    IpProto::ICMP{icmp_type, icmp_code} => {
                        try!(serializer.visit_struct_elt("icmp_type", icmp_type));
                        try!(serializer.visit_struct_elt("icmp_code", icmp_code));
                        Ok(Some(try!(serializer.visit_struct_elt("protocol", 1))))
                    },
                    IpProto::ICMP6{icmp_type, icmp_code} => {
                        try!(serializer.visit_struct_elt("icmp_type", icmp_type));
                        try!(serializer.visit_struct_elt("icmp_code", icmp_code));
                        Ok(Some(try!(serializer.visit_struct_elt("protocol", 58))))
                    },
                    IpProto::Other(pro) => {
                        Ok(Some(try!(serializer.visit_struct_elt("protocol", pro))))
                    }
                }
            },
            _ => {
                Ok(None)
            }
        }
    }
}

impl ser::Serialize for IpTuple {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: ser::Serializer
    {
        serializer.visit_struct("IpTuple", IpTupleMapVisitor {
            value: self,
            state: 0,
        })
    }
}

fn convert_ip_addr_to_string(ip_addr: &IpAddr) -> String {
    match *ip_addr {
        IpAddr::V4(ipv4_address) => {
            let ipv4_array = ipv4_address.octets();
            let ip_str = &format!("{}.{}.{}.{}",
                                  ipv4_array[0],
                                  ipv4_array[1],
                                  ipv4_array[2],
                                  ipv4_array[3])[..];
            ip_str.to_string()
        }
        IpAddr::V6(ipv6_address) => {
            let ipv6_array = ipv6_address.segments();
            // Convert ip address to string of hex numbers.
            let ip_str = &format!("{:X}:{:X}:{:X}:{:X}:{:X}:{:X}:{:X}:{:X}",
                                  ipv6_array[0],
                                  ipv6_array[1],
                                  ipv6_array[2],
                                  ipv6_array[3],
                                  ipv6_array[4],
                                  ipv6_array[5],
                                  ipv6_array[6],
                                  ipv6_array[7])[..];
            ip_str.to_string()
        }
    }
}
// implement deserialize
#[allow(non_camel_case_types)]
enum IpTupleField {
    src_ip,
    dst_ip,
    src_port,
    dst_port,
    icmp_type,
    icmp_code,
    protocol,
}

impl de::Deserialize for IpTupleField {
    fn deserialize<D>(deserializer: &mut D) -> Result<IpTupleField, D::Error>
        where D: de::Deserializer
    {
        struct IpTupleFieldVisitor;

        impl de::Visitor for IpTupleFieldVisitor {
            type Value = IpTupleField;

            fn visit_str<E>(&mut self, value: &str) -> Result<IpTupleField, E>
                where E: de::Error
            {
                match value {
                    "src_ip" => Ok(IpTupleField::src_ip),
                    "dst_ip" => Ok(IpTupleField::dst_ip),
                    "src_port" => Ok(IpTupleField::src_port),
                    "dst_port" => Ok(IpTupleField::dst_port),
                    "icmp_type" => Ok(IpTupleField::icmp_type),
                    "icmp_code" => Ok(IpTupleField::icmp_code),
                    "protocol" => Ok(IpTupleField::protocol),
                    _ => Err(de::Error::syntax("Unexpected field")),
                }
            }
        }
        deserializer.visit(IpTupleFieldVisitor)
    }
}

impl de::Deserialize for IpTuple {
    fn deserialize<D>(deserializer: &mut D) -> Result<IpTuple, D::Error>
        where D: de::Deserializer
    {
        static FIELDS: &'static [&'static str] = &["src_ip", "dst_ip", "src_port", "dst_port", "icmp_type", "icmp_code", "protocol"];
        deserializer.visit_struct("IpTuple", FIELDS, IpTupleVisitor)
    }
}


struct IpTupleVisitor;

impl de::Visitor for IpTupleVisitor {
    type Value = IpTuple;
    fn visit_map<V> (&mut self, mut visitor: V) -> Result<IpTuple, V::Error>
        where V: de::MapVisitor
    {
        let mut src_ip = None;
        let mut dst_ip = None;
        let mut src_port = None;
        let mut dst_port = None;
        let mut icmp_type = None;
        let mut icmp_code = None;
        let mut protocol = None;
        loop {
            match try!(visitor.visit_key()) {
                Some(IpTupleField::src_ip) => { src_ip = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::dst_ip) => { dst_ip = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::src_port) => { src_port = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::dst_port) => { dst_port = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::icmp_type) => { icmp_type = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::icmp_code) => { icmp_code = Some(try!(visitor.visit_value())); },
                Some(IpTupleField::protocol) => { protocol = Some(try!(visitor.visit_value())); },
                None => { break; }
            }
        }


        // Unwrap the fields that are guaranteed to be in each serialization. i.e. don't want to
        // unwrap the icmp_type field if we're dealing with a TCP connection.
        let src_ip:String = match src_ip {
            Some(src_ip) => src_ip,
            None => try!(visitor.missing_field("src_ip")),
        };

        let dst_ip:String = match dst_ip {
            Some(dst_ip) => dst_ip,
            None => try!(visitor.missing_field("dst_ip")),
        };

        let protocol = match protocol {
            Some(protocol) => protocol,
            None => try!(visitor.missing_field("protocol")),
        };

        try!(visitor.end());

        Ok(IpTuple{
            src: IpAddr::from_str(&src_ip[..]).unwrap(),
            dst: IpAddr::from_str(&dst_ip[..]).unwrap(),
            proto: {
                match protocol {
                    1 => IpProto::ICMP{icmp_code: icmp_code.unwrap(), icmp_type: icmp_type.unwrap()},
                    6 => IpProto::TCP{src_port: src_port.unwrap(), dst_port: dst_port.unwrap()},
                    17 => IpProto::UDP{src_port: src_port.unwrap(), dst_port: dst_port.unwrap()},
                    58 => IpProto::ICMP6{icmp_code: icmp_code.unwrap(), icmp_type: icmp_type.unwrap()},
                    other => IpProto::Other(other),
                }
            },
        })

    }
}
@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 4, 2015

This is your problem right here:

deserializer.visit(IpTupleFieldVisitor)

Bincode doesn't support visit. I filed a bug to improve the error message.

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 4, 2015

What is the alternative to visit, if there is any?

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 4, 2015

I don't know actually... The author of Serde contributed the Serde support to bincode.

I'd honestly stick with the compiler-generated versions. If you are ok with nightly, use Serde's plugin. If not, use rustc_serialize.

You could actually have you system set up to switch between them automatically depending on if you are compiling on nightly or not.

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

I actually am already on Rust Nightly, since I use IpAddr. The reason I can't (or at least don't I can) use the compiler-generated version where I just derive the traits is because IpAddr isn't supported.

I've already made an issue on the serde board, but they've yet to respond.

I think I might just stick with rustc-serialize for binary encoding and serde for json for now.

If you have some more time to spare, I have an unrelated problem I was hoping you could answer.

I'm trying to send buffered data through a tcp stream and am following the below code

use std::fs::File;
use std::io::{BufWriter, BufReader};

fn main() {
    let world = World { entities: vec![Entity {x: 0.0, y: 4.0}, Entity {x: 10.0, y: 20.5}] };

    {
        let mut writer = BufWriter::new(File::create("world.bin").unwrap());
        bincode::encode_into(&world, &mut writer, bincode::SizeLimit::Infinite).unwrap();
    }

    let mut reader = BufReader::new(File::open("world.bin").unwrap());
    let decoded: World = bincode::decode_from(&mut reader, bincode::SizeLimit::Infinite).unwrap();

    assert!(world == decoded);
}

where instead of a file, I put in a TCP stream instance for the writer and TCP Listener for the reader.

If I write two objects into the stream using encode_into and then try to read using the TCP listener using decode_from, I only get back the first object I sent. If I try calling decode_from again, it fails with this error

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidEncoding(InvalidEncoding { desc: "error while decoding utf8 string", detail: Some("Decoding error: invalid utf-8: invalid byte near index 60") })', ../src/libcore/result.rs:738

I know that the reader is reading both objects because if I call read_to_end (read just reads 0 bytes) on the buffered reader instead of using decode_from, I read double the amount of bytes if I had just sent one object using encode_into

BTW. The listener and writer are running in different programs.
If you have any thoughts you'd be willing to share, it would be greatly appreciated. Thanks for all your time!

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

What happens if you read_to_end into a vec, and try to decode from that vec?

Make sure you are flushing your TCP stream too!

If flushing doesn't work, could you send a small, reproducible example so I can play with it? I did a lot of TCP work with Bincode in my Wire and never ran into this issue.

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

I think I implemented my Encodable and Decodable traits incorrectly. When I tried to replicate the issue on a Point struct where I derive the traits, it worked as expected.
When I try running decode_from on my struct that I implemented the Encodable and Decodable traits, I get the above error.
It's weird because I can Encode and Decode individual objects fine. Also, if I use read_to_end to read both objects at once and call decode on a slice of the read bytes (I know how large each of the encoded objects are), I can decode them fine. But calling decode_from doesn't seem to work for me. Any thoughts? Would you like to see my implementation for Encodable and Decodable/ or a small copy of what I'm running? Thanks

btw. I don't think flushing is the issue since the sender program closes after sending, so everything should be flushed then.

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

A small copy of what you are running would be great.

I'm 90% sure that this is an issue with your TCP sockets, because decode (the one that operates on a vec) is actually implemented using decode_from. So it seems unlikely that this is a decoder issue.

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

Here is a link to my example.
https://drive.google.com/file/d/0By6EOjuEPL5IZFR2OHYxcHNUS3M/view?usp=sharing

The sender and receiver are in different programs. Just run the receiver before the sender, but I'm sure you know that already though.
And thanks!

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

Actually, your "read them all into a buffer" code doesn't work either.
Replace the "working code" with this. (notice that I don't use slicing)

       // The below works. The above doesn't.
        let x = buf_stream_reader.read_to_end(&mut data).unwrap();
        println!("Bytes read: {}", x);
        let mut cursor = &data[..];

        let decoded1: IpTuple = decode_from(&mut cursor, bincode::SizeLimit::Infinite).unwrap();
        let decoded2: IpTuple = decode_from(&mut cursor, bincode::SizeLimit::Infinite).unwrap();

        println!("decoded start: {:?}", decoded1.src);
        println!("decoded start2: {:?}", decoded2.src);

    }

You were manually slicing it to where you think it was supposed to start and stop, but your encoding and decoding implementation is wrong. You thought they were 45 bytes long, but they are actually 42.

To be clear, you have a bug in your encoder/decoder pair. It's just that neither bincode strategy should have ever worked.

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

Weird. The "read them all into a buffer" code works fine for me. For me it worked by doing slices of 45 bytes (half of the bytes read, which was 90 for me). It's weird that it didn't work for you.

I guess I'll look into seeing what could be wrong with my implementation of the encodable/decodable trait. If anything popped out to you when you glanced at it, I would love to hear it.

Thanks.

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

The # of bytes read was 84 for me.

Your ipProto code was the part that was broken. Here's a working copy of your code using the derived serialization: https://dl.dropboxusercontent.com/u/14007801/example%204.zip

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

Your solution worked. But now I'm kind of stuck because I wanted the json output to be the specific way I had it.

Oh well. This is a problem I have to figure out on my own for now.

Thanks for all your help! You've been a great help!

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

Final (I promise) question. Do you know if in the benchmarking test done here
http://erickt.github.io/blog/2015/08/07/serde-0-dot-5-0-many-many-changes/

is library: bincode and format: raw output using rustc-serialize?
You might now know this. Thanks!

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

No problem; I'm always happy to help!

Are you talking about the table here?

Rust    bincode (serde) Raw 310
Rust    bincode         Raw 142 291

If so, then yes. The one that isn't labled Serde is using rustc-serialize as an encoder.

@TyOverby TyOverby closed this Nov 5, 2015
@rigdent
Copy link
Author

@rigdent rigdent commented Nov 5, 2015

Cool thanks! If you're curious, I think I'm just going to use both serde_json and bincode using rustc-serialize (i.e. use rustc-serialize for binary and serde_json for json; it seems they're both better at that anyway, and I can keep my json output format).

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 5, 2015

Sounds like that'll work! Shame that getting them unified didn't work out :(

@rigdent
Copy link
Author

@rigdent rigdent commented Nov 10, 2015

Actually, if you're still curious, I was able to fix my decodable implementation to work with decode_from (Turns out it was broken with decode as well. Decode completed without error but the decoded struct had incorrect values). I had needed to change the order in which I encoded the struct. I encoded the struct with some fields that changed depending on field P (protocol in my case), but I had put field P last. But when I decoded, it would always decode P first, then decode whatever that P needed. It worked fine with json because it searched for the key-value pair for P then do the other things, but for binary, it would always read the vector in order, so when it looked for field P in the binary encoded data, it would attempt to read some other field. I fixed this by putting P before all the things that depended on P. It changed my json order a bit, but no biggie.

I think you might have known this already, but maybe someone else can benefit from this. I can now use rustc-serialize for json and binary serialization!

@TyOverby
Copy link
Collaborator

@TyOverby TyOverby commented Nov 10, 2015

Nice! And yeah, I expected that something was up with order or something similar, because it should have worked. Congrats on getting everything together though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.