Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to expand MessageDereferencer implementation with hash computation of the message #2066

Open
slavanap opened this issue Sep 3, 2016 · 5 comments

Comments

@slavanap
Copy link

slavanap commented Sep 3, 2016

Assume I have some generated protobuf message class MyMessage in C++.
Assume I want to use std::unordered_map<MyMessage, int> map;
I can not use it, because I need to provide std::hash<MyMessage> implementation.

As for std::equal_to<MyMessage>, it can be easily supported via MessageDereferencer static method

#include <google/protobuf/util/message_differencer.h>
inline bool operator==(const google::protobuf::Message& a, const google::protobuf::Message& b) {
  return google::protobuf::util::MessageDifferencer::Equals(a, b);
}

But new utility MessageDereferencer class lacks API for hash computation of the message. Thus std::hash<google::protobuf::Message> still has to use descriptor and reflection of the message to (recursively) compute hash of the message.

Thus this issue is a request for such functionality that can be widely used across C++ or request for suggestions how to properly instantiate unordered_set<MyMessage>.

@xfxyjwf
Copy link
Contributor

xfxyjwf commented Sep 6, 2016

It's not a very good idea to use proto message as keys because they are not true value types. With the presence of unknown fields, two equal messages in one binary might be treated as different by another, and therefore two different binaries might get two different unordered_map<> even given the exactly same input data (if the hash function is implemented using MessageDifferencer). It's probably better to write your own hash function that has your desired behavior rather than replying on a generic one.

@slavanap
Copy link
Author

slavanap commented Sep 8, 2016

@xfxyjwf
Well I do understand that messages are not true value types. I use simple messages as keys. And actually I don't care about unknown message fields in my app algorithms, because I don't plan to use them.

One concern I have before implementing that std::hash<Message> template myself is that only protobuf developers know how to implement it performance-effective for Message type. Moreover, talking about std::hash it doesn't necessary to be different for all different messages. Only operator==(Message&, Message&) is required to have that behavior.

About the feature, before implementing it myself, would you look through pull request with this feature if I provide one, or should I instead fork protobuf repo to integrate this feature?

As for the feature's usage scenario consider this simplified subscription model below (I use similar one in my app). If you have any ideas how to effectively implement this model in different way (without std::unordered_map<Message, some_data>) then I'll gladly review it and integrate it, if it fits my app.

message SymbolDescriptor {
  required string name = 1;
  optional string class_ = 2;
  optional group Period = 3 {
    required google.protobuf.Timestamp begin = 1;
    required google.protobuf.Timestamp end = 2;
  }
}

message Data {
  // some data message related to the Symbol
}


class Processor {
public:
  using Subscriber = std::string;

  // called by data consumer
  void OnSubscriptionRequest(const Subscriber& fromwhom, const SymbolDescriptor& forwhat) {
    bool first_symbol_subscriber = (_subscribersMap.find(forwhat) == _subscribersMap.end());
    auto ret = _subscribersMap.insert(decltype(_subscribersMap)::value_type(forwhat, fromwhom));
    if (ret.second)
      throw std::runtime_error("You already have subscribed on this symbol");
    if (first_symbol_subscriber) {
      // perform additional actions because it is first subscriber for the symbol
    }
  }

  // called by data consumer
  void OnUnsubscribeRequest(const Subscriber& fromwhom, const SymbolDescriptor& forwhat) {
    auto range = _subscribersMap.equal_range(forwhat);
    auto it = std::find_if(range.first, range.second,
      [&fromwhom](const decltype(_subscribersMap)::value_type& pair) {
        return pair.second == fromwhom;
      }
    );
    if (it == range.second) {
      throw std::runtime_error("You have not subscribed to this symbol");
    }
    _subscribersMap.erase(it);
    if (_subscribersMap.find(forwhat) == _subscribersMap.end()) {
      // perform additional actions because it is last subscriber for the symbol
    }
  }

  // called by data producer
  void BroadcastSubscriptionData(const SymbolDescriptor& subscription, const std::shared_ptr<const Data>& data) const {
    auto rng = _subscribersMap.equal_range(subscription);
    for (auto it = rng.first; it != rng.second; ++it) {
      SendTo(it->second, data);
    }
  }

  void SendTo(const Subscriber&, const std::shared_ptr<const Data>& data) const;

private:
  std::unordered_map<SymbolDescriptor, Subscriber> _subscribersMap;

};

@xfxyjwf
Copy link
Contributor

xfxyjwf commented Sep 8, 2016

@slavanap Thanks for the detailed explanation. If the proto message is simple enough (doesn't involve unknown fields etc), a generic hash function should work fine.

If you would like to contribute an implementation, can you first create one with only the public interface? We can start there and once we agree upon the public interface of this feature you can add the actual implementation.

@slavanap
Copy link
Author

slavanap commented Sep 17, 2016

@xfxyjwf sorry for the delay, lots of work.

After a more deep consideration I suggest the interface to look something like this.

#include <google/protobuf/message.h>
#include <google/protobuf/stubs/hash.h>

GOOGLE_PROTOBUF_HASH_NAMESPACE_DECLARATION_START

template<>
struct hash<::google::protobuf::Message> {
    size_t operator()(const ::google::protobuf::Message& m) const;
};

GOOGLE_PROTOBUF_HASH_NAMESPACE_DECLARATION_END

This code may be placed either in google/protobuf/util/message_differencer.h or in a separate file.

In protobuf, you guys already have some hash implementations (here for example), so for hash computation we just need to iterate through message fields (probably only 1-2 highest levels) and call already implemented hash operators for them and combine the results.

ADDED

Interestingly to notice, you already implemented the same functionality, but for C# here.
Probably the best solution might be to add the template<> struct hash<GENERATED_MESSAGE> specialization in protoc compiler, not only in protobuf library.

@xfxyjwf xfxyjwf self-assigned this Sep 17, 2016
@slavanap
Copy link
Author

@xfxyjwf I don't mind to implement this feature by myself, but my pull request code will need a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants