Added C++ benchmark. #1525

Merged
merged 8 commits into protocolbuffers:master Sep 23, 2016

4 participants
@haberman
Contributor

haberman commented May 12, 2016

Here are initial benchmark results:

Run on (12 X 3201 MHz CPU s)
2016-05-11 17:49:45
Benchmark                                     Time           CPU Iterations
---------------------------------------------------------------------------
google_message1_proto2_parse_noarena        274 ns        274 ns    2773101   792.218MB/s
google_message1_proto2_parse_arena          996 ns        993 ns     707578   218.903MB/s
google_message1_proto2_serialize            155 ns        156 ns    4489021   1.36459GB/s
google_message1_proto3_parse_noarena        520 ns        519 ns    1268185   419.151MB/s
google_message1_proto3_parse_arena         1204 ns       1205 ns     604370   180.504MB/s
google_message1_proto3_serialize            293 ns        292 ns    2403943   722.365MB/s
google_message2_parse_noarena            125942 ns     126397 ns       5557   638.088MB/s
google_message2_parse_arena              284564 ns     285310 ns       2464   282.683MB/s
google_message2_serialize                 94871 ns      94737 ns      10123   851.324MB/s

@googlebot googlebot added the cla: yes label May 12, 2016

@haberman
Contributor

haberman commented May 12, 2016

Review to @xfxyjwf.

@haberman
Contributor

haberman commented May 12, 2016

retest this please

benchmarks/cpp_benchmark.cc
while (state.KeepRunning()) {
  const std::string& payload = payloads_[i.Next()];
  total += payload.size();
  m->ParseFromString(payload);


@xfxyjwf

xfxyjwf May 18, 2016

Contributor

In the ArenaParseFixture, a new message is created on every parsing iteration; in this NoArenaParseFixture, however, the same message is reused. A fairer comparison would probably recreate the message in this parsing loop as well.

If we want to benchmark the case where a message is reused, I guess we can change the ArenaParseFixture to something like:

Arena arena;
Message* m = Arena::CreateMessage<T>(&arena);
while (state.KeepRunning()) {
  const std::string& payload = payloads_[i.Next()];
  total += payload.size();
  m->ParseFromString(payload);
  if (counter++ % kArenaThreshold == 0) {
    arena.Reset();  // frees everything allocated on the arena
    m = Arena::CreateMessage<T>(&arena);
  }
}

@haberman

haberman Sep 22, 2016

Contributor

To make this more fair, I split the "NoArena" case into two: one that creates a message from scratch (parse_new) and one that reuses an existing message (parse_reuse).

I'm not sure it makes sense to allocate multiple top-level messages in a single arena, but reset it periodically. Does anybody use arenas this way?

If you can point me to some real-world uses of arena that work this way, I'll update the benchmark (or maybe add a new one for that pattern).


benchmarks/cpp_benchmark.cc
std::vector<Message*> messages;
for (size_t i = 0; i < payloads_.size(); i++) {
  messages.push_back(prototype_->New());
  messages.back()->ParseFromString(payloads_[i]);


@xfxyjwf

xfxyjwf May 18, 2016

Contributor

Move this out of the benchmark method?


@haberman

haberman Sep 22, 2016

Contributor

Done.


benchmarks/cpp_benchmark.cc
while (state.KeepRunning()) {
  str.clear();
  messages[i.Next()]->SerializeToString(&str);


@xfxyjwf

xfxyjwf May 18, 2016

Contributor

How about we just allocate a large enough char array beforehand? (to exclude the cost of allocating strings from the benchmark).


@haberman

haberman Sep 22, 2016

Contributor

Allocation should only happen the first time through the loop; str.clear() won't release the memory.


@haberman
Contributor

haberman commented Sep 23, 2016

Ping @xfxyjwf, cc @gerben-s.

@haberman
Contributor

haberman commented Sep 23, 2016

Results on my desktop:

Run on (12 X 3201 MHz CPU s)
2016-09-23 10:49:28
Benchmark                                      Time           CPU Iterations
----------------------------------------------------------------------------
google_message1_proto2_parse_new             602 ns        604 ns    1150294    359.99MB/s
google_message1_proto2_parse_reuse           255 ns        254 ns    2665418   855.155MB/s
google_message1_proto2_parse_newarena        926 ns        926 ns     769950   234.796MB/s
google_message1_proto2_serialize             165 ns        165 ns    4214354   1.28456GB/s
google_message1_proto3_parse_new             828 ns        825 ns     855202   263.405MB/s
google_message1_proto3_parse_reuse           471 ns        470 ns    1476628   462.167MB/s
google_message1_proto3_parse_newarena       1046 ns       1049 ns     659339   207.186MB/s
google_message1_proto3_serialize             231 ns        232 ns    2993231   909.373MB/s
google_message2_parse_new                 318212 ns     317106 ns       2223   254.338MB/s
google_message2_parse_reuse               113398 ns     113764 ns       6129   708.942MB/s
google_message2_parse_newarena            252076 ns     252894 ns       2802   318.918MB/s
google_message2_serialize                  65855 ns      65689 ns      10722   1.19901GB/s
@xfxyjwf
Contributor

xfxyjwf commented Sep 23, 2016

LGTM

Please squash the commits before merging.

WrappingCounter(size_t limit) : value_(0), limit_(limit) {}
size_t Next() {
  size_t ret = value_;


@gerben-s

gerben-s Sep 23, 2016

Contributor

(value + 1) % limit


@haberman

haberman Sep 23, 2016

Contributor

I think what I currently have is much faster. "limit" isn't a compile-time constant, so % will turn into a real idiv instruction, which is very slow. Mine is a single extremely predictable branch.


@gerben-s

gerben-s Sep 23, 2016

Contributor

I consider this the wrong abstraction of the above one-liner.

If you want to do this wrapping as an abstraction, then just abstract the whole payload.

const string& NextPayload() { ...}


@haberman

haberman Sep 23, 2016

Contributor

I disagree. I think what I have is simpler. It doesn't need to know anything about the type or storage or lifetime of the things being iterated over. It is just a simple wrapping counter.


@gerben-s

LGTM overall, minor comment.

@haberman haberman merged commit a289d43 into protocolbuffers:master Sep 23, 2016

2 of 4 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
default Build finished.
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed