Add new, lower level API #22
- confirms that arena constantly grows in size if the blocks of memory are bigger - all tests use low level api internally so all tests are testing it is working overall
e214c2c to b12adc9
… within them. Removed threading from benchmarks. Made lower level api safe to use by putting resize inside decompress method
include/gzip/decompress.hpp
Outdated
@@ -90,8 +91,7 @@ std::string decompress(const char * data, std::size_t size)
 {
     Decompressor decomp;
     std::string output;
-    std::size_t uncompressed_size = decomp.decompress(output,data,size);
-    output.resize(uncompressed_size);
+    decomp.decompress(output,data,size);
The resize is done here rather than in the lower level API to avoid the cost of the resize in situations where you can pass the buffer along to an API that accepts a separate size argument. So, for a downstream usage of the buffer that only accepts a string, you'd obviously need to resize, but for an API that accepts `const char *, std::size_t` you don't need to resize: you can use the buffer as is and pass `size_uncompressed` to avoid the resize (and its cost). /cc @flippmoke
@springmeyer resize is required, otherwise the length of the string will be too long and the decompression will be invalid. By default, each `resize()` in the loop fills the new characters with the default value, `char()`. This extends the size of the buffer at the start of our decompression loop, which is required because the data must be preallocated and assigned in a std::string so that we never write beyond `size()`. However, after we are done we may not have used all the space that was allocated and assigned. This leaves extra data at the end of our buffer (set to `char()`) that is not a valid part of our decompressed data. It must be trimmed off with our final `resize()`.
👌
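To make the grow-then-trim argument above concrete, here is a minimal sketch of the pattern. It does not call zlib: `fill_chunk` is a hypothetical stand-in for one inflate step, `grow_then_trim` is an illustrative name rather than anything in gzip-hpp, and the chunk size of 16 is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for one inflate() step: copies up to `capacity`
// bytes from `src` into `dst` and returns how many bytes were written.
inline std::size_t fill_chunk(char* dst, std::size_t capacity,
                              const char* src, std::size_t remaining)
{
    std::size_t n = std::min(capacity, remaining);
    std::memcpy(dst, src, n);
    return n;
}

// Grows `output` in fixed-size steps, writes into the newly assigned region,
// then trims the unused tail. Returns the number of valid bytes.
inline std::size_t grow_then_trim(std::string& output,
                                  const char* src, std::size_t size,
                                  std::size_t chunk = 16)
{
    std::size_t written = 0;
    std::size_t consumed = 0;
    output.clear();
    do
    {
        // resize() value-initializes the new chars, so the write below
        // stays inside [0, size()).
        output.resize(written + chunk);
        std::size_t n = fill_chunk(&output[written], chunk,
                                   src + consumed, size - consumed);
        written += n;
        consumed += n;
    } while (consumed < size);

    // Without this final trim the string would keep trailing '\0' padding
    // from the last, partially used chunk.
    output.resize(written);
    return written;
}
```

A caller that only needs `const char *, std::size_t` could in principle skip the final trim and use the returned length instead, which is the trade-off discussed in this thread.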
bench/run.cpp
Outdated
// Run once prior to pre-allocate
decomp.decompress(output, buffer.data(), buffer.size());

while (state.KeepRunning())
@flippmoke should we be migrating to `for (auto _ : state) {`?
We cannot unless we upgrade from `mason_use(benchmark VERSION 1.2.0)` to `mason_use(benchmark VERSION 1.3.0)`.
@flippmoke per chat, should we just remove the unneeded
@GretaCB I should have already done this
@flippmoke ah 👍 perhaps I was looking at an older commit.
@flippmoke Like this for example: Line 24 in 7aa6d89
…implified some tests
include/gzip/decompress.hpp
Outdated
inline std::string decompress(std::string const& input)
{
    return decompress(input.data(), input.size());
}
@flippmoke going to revert this addition since it was removed intentionally per #1 (comment) /cc @GretaCB
done in a217d04
Okay, finally this is ready to go, merging. Also fixes #21.
Context
Applications using gzip-hpp like node-cpp-skel (and apps based on it) need to work in as zero-copy a way as possible. The common use case we have is:
- … `std::unique_ptr<std::string>`
- … `std::string` inside that ptr
- … `std::string` ownership to node.js

This is described in detail at mapbox/node-cpp-skel#69. And mapbox/node-cpp-skel#67 also relates.
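As a rough sketch of the ownership flow this use case relies on, the following shows a `std::unique_ptr<std::string>` being filled and then handed off without copying the bytes. `Sink` is a hypothetical stand-in for the node.js side (no Node or NAN API is used here), and `produce_and_hand_off` is an illustrative name, not part of gzip-hpp.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical consumer standing in for the node.js side: it simply takes
// ownership of the string it is given.
struct Sink
{
    std::unique_ptr<std::string> held;
    void take(std::unique_ptr<std::string> p) { held = std::move(p); }
};

// Fills a heap-allocated string, then transfers ownership to the sink.
// Returns the address of the bytes before the hand-off so a caller can
// check that the hand-off did not copy them.
inline const char* produce_and_hand_off(Sink& sink, const char* data, std::size_t size)
{
    std::unique_ptr<std::string> owned(new std::string());
    owned->assign(data, size);          // the only copy: input -> owned buffer
    const char* addr = owned->data();
    sink.take(std::move(owned));        // moves the pointer, not the bytes
    return addr;
}
```

Moving the `unique_ptr` transfers only the pointer, so the string object and its buffer stay where they are.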
Problem
The current gzip API in master was designed by @GretaCB and @springmeyer to be simple and easy to use. However, it does not allow you to write to memory that is owned elsewhere. It only has the ability to create a new `std::string`.
Proposed Solution
So I think the best solution is what is proposed in this PR, which:
With the low level API it is now possible to:
- … `std::string` by reference, which allows the caller to control the memory and allocation of this memory. An advantage here is that the caller may want to reuse this buffer as an arena or might want to pre-allocate lots of memory with `reserve` to ensure writing to this buffer does not require re-allocation.
- resize the buffer only if needed
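The buffer-reuse idea can be sketched as follows. This is an illustration under assumptions, not gzip-hpp's actual API: `write_into` is a hypothetical stand-in for a low level call that writes into a caller-owned `std::string`, and `process_all` is an illustrative helper showing one pre-reserved buffer reused across several inputs.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a low level call: writes `size` bytes into the
// caller-owned buffer and returns the number of valid bytes.
inline std::size_t write_into(std::string& output, const char* data, std::size_t size)
{
    // In common implementations this does not reallocate while
    // size <= output.capacity().
    output.resize(size);
    if (size > 0)
    {
        std::memcpy(&output[0], data, size);
    }
    return size;
}

// Reuses one buffer, reserved up front, for every input.
inline std::size_t process_all(std::vector<std::string> const& inputs,
                               std::size_t reserve_bytes)
{
    std::string buffer;
    buffer.reserve(reserve_bytes);   // caller controls the allocation
    std::size_t total = 0;
    for (auto const& in : inputs)
    {
        total += write_into(buffer, in.data(), in.size());
    }
    return total;
}
```

Because the caller owns `buffer`, the allocation decision (how much to reserve, when to release) stays with the caller rather than inside the library call.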