Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new, lower level API #22

Merged
merged 24 commits into from Mar 21, 2018
Merged

Add new, lower level API #22

merged 24 commits into from Mar 21, 2018

Conversation

springmeyer
Copy link
Contributor

@springmeyer springmeyer commented Dec 21, 2017

Context

Applications using gzip-hpp like node-cpp-skel (and apps based on it) need to work in as zero-copy way as possible. The common usecase we have is:

  • create a std::unique_ptr<std::string>
  • write a gzip encoded data to that std::string inside that ptr
  • pass the std::string ownership to node.js

This is described in detail at mapbox/node-cpp-skel#69. And mapbox/node-cpp-skel#67 also relates.

Problem

The current gzip API in master was designed by @GretaCB and @springmeyer to be simple and easy to use. However it does allow you to write to memory that is owned elsewhere. It only has the ability to create a new std::string.

Proposed Solution

So I think the best solution is what is proposed in this PR, which:

  • Keeps the existing API working without changes
  • Adds a new, lower level API that can be used by clients with high performance or zero copy needs.

With the low level API it is now possible to:

  • write (both in compress and decompress) to an existing std::string by reference, which allows the caller to control the memory and allocation of this memory. An advantage here is that the caller may want to reuse this buffer as an arena or might want to pre-allocate lots of memory with reserve to ensure writing to this buffer does not require re-allocation.
  • resize the buffer only if needed

Dane Springmeyer added 4 commits January 9, 2018 10:43
 - confirms that arena constantly grows in size if the blocks of memory are bigger
 - all tests use low level api internally so all tests are testing it is working overall
… within them. Removed threading from benchmarks. Made lower level api safe to use by putting resize inside decompress method
@@ -90,8 +91,7 @@ std::string decompress(const char * data, std::size_t size)
{
Decompressor decomp;
std::string output;
std::size_t uncompressed_size = decomp.decompress(output,data,size);
output.resize(uncompressed_size);
decomp.decompress(output,data,size);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resize is done here rather than in the lower level API to avoid the cost of the resize in situations where you can pass the buffer along to an API that accepts a separate size argument. So, for a downstream usage of the buffer than only accepts a string, you'd obviously need to resize, but for an api that accepts const char *, std::size_t you don't need to resize and can use the buffer as it, and pass size_uncompressed to avoid the resize (and its cost). /cc @flippmoke

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@springmeyer resize is required otherwise the length of the string will be too long. It will be an invalid decompression. By default resize() will fill new characters in the loop prior to the default value for new char(). This actually extends the size of the buffer in the start of our decompression loop, which is required as the data must be preallocated and assigned in a std::string so it is not writing to potential areas larger then size(). However, after we are done we may not have used all the space that was allocated and assigned. This will leave an extended set of data on our buffer (set to new char()) that is no a valid part of our decompressed data. This must be trimmed down with our final resize().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

bench/run.cpp Outdated
// Run once prior to pre-allocate
decomp.decompress(output, buffer.data(), buffer.size());

while (state.KeepRunning())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flippmoke should we be migrating to for (auto _ : state) {?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not unless we upgrade from mason_use(benchmark VERSION 1.2.0) to mason_use(benchmark VERSION 1.3.0).

@GretaCB
Copy link
Contributor

GretaCB commented Jan 9, 2018

@flippmoke per chat, should we just remove the unneeded DoNotOptimize usage as part of this PR as well?

@flippmoke
Copy link
Member

@GretaCB I should have already done this

@GretaCB
Copy link
Contributor

GretaCB commented Jan 9, 2018

@flippmoke ah 👍 perhaps I was looking at an older commit.

@GretaCB
Copy link
Contributor

GretaCB commented Jan 9, 2018

@flippmoke Like this for example

benchmark::DoNotOptimize(value.data());

@springmeyer springmeyer added this to the v1.x milestone Jan 22, 2018
@springmeyer springmeyer self-assigned this Jan 22, 2018
inline std::string decompress(std::string const& input)
{
return decompress(input.data(), input.size());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flippmoke going to revert this addition since it was removed intentionally per #1 (comment) /cc @GretaCB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in a217d04

@springmeyer
Copy link
Contributor Author

Okay, finally this is ready to go, merging. Also fixes #21.

@springmeyer springmeyer merged commit bb80aac into master Mar 21, 2018
@springmeyer springmeyer deleted the low-level-api branch March 21, 2018 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants