Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hash computation for protobuf messages. #2304

Closed
wants to merge 1 commit into from
Closed

Support hash computation for protobuf messages. #2304

wants to merge 1 commit into from

Conversation

slavanap
Copy link

This pull request relates to following issue: #2066

It adds hash computation support for ::google::protobuf::Message class and for all generated classes for messages in C++ code.

This pull request is a draft because I need advise or help about how to

  1. write tests for added code (unfortunately, I'm not familiar with google test framework and have not written much tests in my life),
  2. change or regenerate .pb.h files within protobuf repository according changes in these files' generator.

And of course I need a code review.

@xfxyjwf
We can move discussion from issue ticket here, if you want.
And sorry for the delay. Life is hard sometimes.

@grpc-kokoro
Copy link

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

1 similar comment
@grpc-kokoro
Copy link

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@xfxyjwf
Copy link
Contributor

xfxyjwf commented Oct 28, 2016

Sorry for not getting back to you early...

I see that you have added a generated function for each message, but we probably can't afford that for code size concerns. Right now, some Google server binary contain so many generated protobuf code that they are either compiled very slowly or even on the brink of not able to be compiled... We are trying hard to reduce the code size so another generated function would not be acceptable... Can you switch to a pure reflection-based implementation instead?

As to the public API, I would prefer an interface similar to the existing MessageDifferencer class. For example, MessageHasher::HashCode(const Message& message). I can foresee some users might want to skip fields/treat fields differently in the future just like what MessageDifferencer already supports. Could you update this pull request in that direction?

@slavanap
Copy link
Author

slavanap commented Oct 29, 2016

/cc @xfxyjwf

I would still consider rethinking this kind of architecture (similar to MessageDifferencer). I started to use hashes because I wanted to use messages as a key in unordered_map.

Computing hashes via reflection for all messages raises performance conserns. As for MessageDifferencer I can understand that reflection usage is needed because there might be unknown fields, but still if developer does not need to compare them, explicit specialization of std::equal_to<T> would be nice. Additionally, C# code generator generates GetHashCode function the same way, I proposed to use in C++.

If you concerned about generated code size, that's possible to add an option whether or not to generate this kind of specializations. And for all integrated protobuf message types add #ifdef section in the same or separate file with specialized templates for std::hash<T> and std::equal_to<T>. Of course, that option will be off by default.

I've already provided hash<::google::protobuf::Message> template specialization that uses reflections within pull request (see here: interface, implementation). Of course, that's possible to add filtration there, but that will make code too complicated to understand for future developers, as happen with MessageDifferencer for me. For instance, take a look at this block of code in protobuf/master branch. Was it intended to call CompareRequestedFieldsUsingSettings even if unknown_compare_result == false to change parent_fields list inside this method or this is just a mistake that influences code performance?

Unfortunately, it looks for me that if I implement MessageHasher as you propose, I probably am not going to use it in my project that uses protobuf, because of performance concerns. In my initial proposal I just wanted protobuf generated message classes to be more integrated with stdlib.

@xfxyjwf
Copy link
Contributor

xfxyjwf commented Nov 2, 2016

@slavanap I agree that the performance of a reflection-based implementation will be a concern, though we just can't afford the code size increase. Adding an option is also unlikely to be accepted as we generally don't allow new options unless necessary.

The good news is that we are currently experimenting a light-weight reflection implementation that are designed to replace most of the generated methods and it offers a much faster speed than what the existing reflection API offers. When that's ready, we should be able to implement the hash function based on that and get both benefits (i.e., fast hash without generated code). Until then, we can only accept reflection-based hash implementation to the protobuf library.

You can still implement a generated hash function using a protocol compiler plugin though, i.e., instead of making it a part of protobuf library, use a standalone compiler plugin that will generate the hash<> functions separately (or injected into the protoc genreated files).

@slavanap
Copy link
Author

slavanap commented Nov 19, 2016

@xfxyjwf

Okay. I've cut generator code and other unrelated code from my pull request.

Still there's a few things I want you to know about.

  1. Please pay attention at this line of code https://github.com/google/protobuf/blob/master/src/google/protobuf/util/message_differencer.cc#L508-L511
    Can you explain why unknown_compare_result is checked after the call to CompareRequestedFieldsUsingSettings function?
  2. In my pull request changes, consider to rename ::google::protobuf::hash<::google::protobuf::Message> implementation or move it to std::hash<::google::protobuf::Message> -- this is not for me to decide.
  3. Did I get it right, that if I write such compiler plugin, I should not provide it as a pull request and just use it on my own?

I welcome to any changes to this pull request.

The good news is that we are currently experimenting a light-weight reflection implementation <...>

Will the new reflection interface be similar to one we already have here?

Note, I squashed my both commits into new one. It still can be merged with current master with no issues. Original commits available here.

  * Reverting back compiler and other unrelated changes.
  * Support hash computation for protobuf messages.
@xfxyjwf xfxyjwf self-assigned this Nov 21, 2016
@xfxyjwf
Copy link
Contributor

xfxyjwf commented Mar 20, 2018

Closing old issues/prs. Feel free to reopen.

@xfxyjwf xfxyjwf closed this Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants