Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hash and Eq on protobuf messages #211

Closed
thijsc opened this issue Mar 26, 2017 · 13 comments
Closed

Implement Hash and Eq on protobuf messages #211

thijsc opened this issue Mar 26, 2017 · 13 comments

Comments

@thijsc
Copy link
Contributor

thijsc commented Mar 26, 2017

For our use case it'd be really convenient to have the Hash and Ord trait implemented on protobuf messages. Mainly to be able to do some caching using protobuf keys in a HashMap or BTreeMap.

I see a way to do this by generating these traits using only the actual protobuf fields, and not the specials fields like unknown_fields.

Is this something you'd be interested in merging?

@thijsc
Copy link
Contributor Author

thijsc commented Mar 28, 2017

@stepancheg any thoughts on this?

@stepancheg
Copy link
Owner

stepancheg commented Mar 28, 2017

Yes, I have thoughts, but I don't know the right answer. My thoughts are:

  • I'd prefer #[derive(Hash)] to manually generated impls (for generated code size). However it is seems to not easily possible, because it won't work for messages with map fields
  • why would you want to ignore unknown_fields? I think they should be part of hash
  • Probably the best way to achieve this at first would be making an instruction in .proto file to enable generation of Hash or Ord, but extensions are not yet implemented in rust-protobuf

@thijsc
Copy link
Contributor Author

thijsc commented Apr 2, 2017

why would you want to ignore unknown_fields? I think they should be part of hash

You're right, that should be in there.

Probably the best way to achieve this at first would be making an instruction in .proto file to enable generation of Hash or Ord, but extensions are not yet implemented in rust-protobuf

Do you think they should not be generated by default?

@stepancheg
Copy link
Owner

Do you think they should not be generated by default?

Rust doesn't implement Hash for HashMap or f32, so if any field of message has map or float field, simple #[derive(Hash)] won't work.

@ghost
Copy link

ghost commented Sep 21, 2018

Hi @stepancheg, I would like to work on this particular issue.

For one, I have been trying to add the "Hash" trait derivation to the protobuf-codegen folder. Here's what I have done so far:

Added the following helper functino to protobuf-gen/src/message.rs

fn hash_derive_enabled(&self) -> bool {                                                                           
        self.customize.hash_derive.unwrap_or(false)                                                                   
}

Then updated the logic adding "Hash" to the derive vector and how to handle unknown_fields and cached_size. Currently, I have simply added [#skip(Hash)] above unknown_flied (since the Hash trait is not implemented). cached_size needs no special treatment.

What I do not understand is how to have hash_derive_enabled evaluate to true. I added a member on the customize struct:

pub hash_derive: Option<bool>,   

and then added logic to the update_with function in the implementation of Customize.

But what I do not understand is how to add hash_derive to rustproto, i.e. in customize.rs I call:

let hash_derive = rustproto::exts::hash_derive.get(source);

And in rustproto, I added hash_derive as a field of value 17031 like this:

pub const hash_derive: ::protobuf::ext::ExtFieldOptional<::protobuf::descriptor::MessageOptions, ::protobuf::types::ProtobufTypeBool> = ::protobuf::ext::ExtFieldOptional { field_number: 17031, phantom: ::std::marker::PhantomData };  
                                                                                                                  
pub const hash_derive_all: ::protobuf::ext::ExtFieldOptional<::protobuf::descriptor::FileOptions, ::protobuf::types::ProtobufTypeBool> = ::protobuf::ext::ExtFieldOptional { field_number: 17031, phantom: ::std::marker::PhantomData };

However, I have no idea how to format my .proto files such that hash_derive_enabled() = true. I believe I need to set the hash_derive field in the .proto file, but that hasn't worked yet. Is this because I have to modify the file_descriptor_proto_data??? Have I chosen the wrong field number?

Also, what modifications would I need to make to the protobuf directory in order to use protobuf instead of protobuf-codegen-pure to generate my rust files from .proto files?

Thanks for your help. I can add a PR if you would prefer

@stepancheg
Copy link
Owner

@trustyrust what is [#skip(Hash)]? I have not found it anywhere.

BTW, I think that if a message has float fields, then Hash should be implemented manually instead of Derive (i. e. by bitcasting them to integers).

I have no idea how to format my .proto files such that hash_derive_enabled() = true

import "rustproto.proto";

option (rustproto.xxxxx) = true;

If I understood the question correctly.

However, this options can be set programmatically when protoc-rust or protobuf-codegen-pure crates are used.

@ghost
Copy link

ghost commented Sep 21, 2018

Hi @stepancheg,

Thanks for the help.

Regarding skip(Hash), sorry I was mistaken. I wanted to implement something like here:

w.write_line("#[serde(skip)]");

Otherwise I could implement the Hash trait on unknown_fields. Which would you advise is better?

I'm not sure I understand the comment on how to format .proto files:

  1. Where is rustproto.proto? Is that simply generated by protobuf or protobuf-codegen?
  2. Do I need to modify the file_descriptor_proto_data at all? that's where serde_derive and serde_derive_all is located, so I assumed I would need to make a similar entry in there.
  3. so if I have a message.proto, I can do something like:
     syntax = "proto3";
     import "rustproto.proto";
     message Message {
         uint32 count = 1;
         option(rustproto.hash_derive) = true;
     }

If I wanted to have the same functionality in /protobuf instead of /protobuf-codegen, how would I go about doing this? As I understand it, /protobuf is a library that uses the bindings to protoc in protoc-rust, is that correct? Would I make these modifications there?

To set the options programmatically, do I just set the options in the build.rs when I execute the run command?

@stepancheg
Copy link
Owner

stepancheg commented Sep 22, 2018

@trustyrust

Regarding skip(Hash), sorry I was mistaken. I wanted to implement something like here:
w.write_line("#[serde(skip)]");

Serde skip does not apply to #[Derive(Hash)].

Otherwise I could implement the Hash trait on unknown_fields. Which would you advise is better?

Hash for UnknownFields and for map fields can be implemented the same way it is implemented in Java: by computing a sum of hashes of entries. It won't be as good hash as for BTreeMap for example, but probably good enough for practical use cases.

impl Hash for UnknownFields {
    fn hash<H : Hasher>(&self, state: &mut H) {
        if let Some(fields) = self.fields {
            let mut sum = 0;
            for entry in &*fields {
                let mut entry_hasher = DefaultHasher::new();
                entry.hash(entry_hasher);
                sum.wrapping_add(entry_hasher.finish());
            }
            state.write_u64(sum);
        }
    }
}

Where is rustproto.proto? Is that simply generated by protobuf or protobuf-codegen?

Here

It is updated manually, and should be in sync with Customize.

Note that options can be specified via rustproto.proto or via Customize. These are alternative APIs.

Do I need to modify the file_descriptor_proto_data at all? that's where serde_derive and serde_derive_all is located, so I assumed I would need to make a similar entry in there.

file_descriptor_proto_data is simply a binary representation of original .proto file. It is generated automatically.

so if I have a message.proto, I can do something like:
...
If I wanted to have the same functionality in /protobuf instead of /protobuf-codegen, how would I go about doing this? As I understand it, /protobuf is a library that uses the bindings to protoc in protoc-rust, is that correct?

protobuf library is a runtime which is used in generated code
protobuf-codegen is a crate which contains code-generator

Would I make these modifications there?

To make this change, modifications are needed in:

  • protobuf crate (Hash for UnknownFields, and probably other runtime support for hashing float and map fields)
  • protobuf-codegen crate Customize struct
  • rustproto.proto (make it in sync with Customize)
  • protobuf-codegen which changes generated code
  • finally, later, protobuf contains generated code, which probably needs to be regenerated (so if user protos reference builtin protos, hash work), but could be probably done later

To set the options programmatically, do I just set the options in the build.rs when I execute the run command?

For development, the easiest set up is writing protos and tests in protobuf-test crate and simply execute cargo test.

E. g. look at this pair:

test_oneof_expose_pb.proto proto
test_oneof_expose.rs test

And one last thought

rust-protobuf has a reflection implementation. Lots of things can be done in runtime without invoking code generator (json and text format are implemented that way).

When performance is not very critical, probably hashing can be implemented via reflection.

This is probably not acceptable, but still need to be considered.

And yet another one last thought

Probably rust-protobuf should provide hooks for code generation.

E. g. protobuf-codegen could have a callback parameter like:

trait CodegenCallbacks {
    fn code_to_insert_before_message(&self, message_type: &str) -> Option<String>;
}

So #[derive(Hash)] for simple messages can be inserted using that hook and rust-protobuf won't have to deal with all those complexities (except for implementation of Hash for UnknownFields, but that's not a big deal).

@stepancheg
Copy link
Owner

Task about callbacks: #338

jamesray1 added a commit to jamesray1/rust-libp2p that referenced this issue Jan 2, 2019
@jamesray1
Copy link
Contributor

jamesray1 commented Jan 2, 2019

Shall I assume that nobody is working on this and #338? I will have a go at solving them.

@jamesray1
Copy link
Contributor

Looks like Hash has already been implemented for UnknownFields in 113babc.

@mverleg
Copy link

mverleg commented Jun 7, 2019

I'm not sure there should be Hash and Eq implementations for messages that have floats or maps as proto fields, at least not by default. Floats aren't Eq because Eq is for total ordering, and floats don't have total ordering. Seems weird to go against the language standard.

UnknownFields could be an exception as it's internal and we can judge if and how it makes sense in this particular case. But that's not the case for fields generated from proto messages.

Just my two cents. I'm all in favour of implementing them for any message where they can be derived!

@stepancheg
Copy link
Owner

rust-protobuf=3 allows user-provided callbacks, so users can insert #[derive(Hash)] for the generated messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants