Add new, lower level API #22

springmeyer · 2017-12-21T00:47:07Z

Context

Applications using gzip-hpp, like node-cpp-skel (and apps based on it), need to work in as zero-copy a way as possible. The common use case we have is:

create a std::unique_ptr<std::string>
write gzip-encoded data to the std::string inside that pointer
pass ownership of the std::string to Node.js

This is described in detail at mapbox/node-cpp-skel#69. And mapbox/node-cpp-skel#67 also relates.
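To make the use case above concrete, here is a minimal C++ sketch of the ownership handoff. It assumes a runtime that can adopt an external buffer together with a finalizer callback. `ExternalBuffer`, `hand_off` and `encode_into` are hypothetical names used only for illustration: `encode_into` just appends bytes where a real program would run gzip compression, and `ExternalBuffer` only models the idea of Node.js taking over the memory. It is not the Node.js or Nan API.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the gzip step: a real implementation would run the compressor
// and write its output into `out` (assumption; no zlib is used here).
inline void encode_into(std::string& out, const std::string& input) {
    out.append(input);
}

// Hypothetical owner of adopted memory, modelling a runtime that wraps an
// external buffer and calls a finalizer when it is done with it.
class ExternalBuffer {
  public:
    using Finalizer = void (*)(char* data, void* hint);

    ExternalBuffer(char* data, std::size_t size, Finalizer fin, void* hint)
        : data_(data), size_(size), fin_(fin), hint_(hint) {}
    ExternalBuffer(const ExternalBuffer&) = delete;
    ExternalBuffer& operator=(const ExternalBuffer&) = delete;
    ~ExternalBuffer() { fin_(data_, hint_); }  // hand memory back for cleanup

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

  private:
    char* data_;
    std::size_t size_;
    Finalizer fin_;
    void* hint_;
};

// Transfers ownership of the string's bytes without copying them: the string
// object itself becomes the finalizer's `hint` and is deleted by it.
inline std::unique_ptr<ExternalBuffer> hand_off(std::unique_ptr<std::string> payload) {
    char* data = &(*payload)[0];
    std::size_t size = payload->size();
    std::unique_ptr<ExternalBuffer> buffer(new ExternalBuffer(
        data, size, [](char*, void* hint) { delete static_cast<std::string*>(hint); },
        payload.get()));
    static_cast<void>(payload.release());  // ownership now lives in the finalizer
    return buffer;
}
```

The bytes written into the std::string are never copied; only the pointer, the length, and the responsibility for deleting the string move to the consumer.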

Problem

The current gzip API in master was designed by @GretaCB and @springmeyer to be simple and easy to use. However, it does not allow you to write to memory that is owned elsewhere: it can only create a new std::string.

Proposed Solution

So I think the best solution is what is proposed in this PR, which:

Keeps the existing API working without changes
Adds a new, lower-level API that can be used by clients with high-performance or zero-copy needs.

With the low level API it is now possible to:

write (in both compress and decompress) to an existing std::string passed by reference, which lets the caller control that memory and how it is allocated. For example, the caller may want to reuse this buffer as an arena, or pre-allocate plenty of memory with reserve() so that writing to the buffer does not require reallocation.
~~resize the buffer only if needed~~
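A minimal sketch of the shape this split could take, assuming a decoder object whose method writes into a caller-owned std::string, plus a convenience function layered on top that returns a fresh string. `RleDecoder` and `decode` are hypothetical, and a toy run-length format (a count byte followed by a value byte) stands in for gzip so the example builds without zlib. This is not gzip-hpp's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical low-level decoder (toy run-length format standing in for gzip).
class RleDecoder {
  public:
    // Writes the decoded bytes into `output`, which the caller owns.
    // clear() drops the contents but typically keeps the capacity, so a
    // reused or reserve()d buffer is not reallocated if it is large enough.
    void decode(std::string& output, const char* data, std::size_t size) const {
        output.clear();
        for (std::size_t i = 0; i + 1 < size; i += 2) {
            std::size_t count = static_cast<unsigned char>(data[i]);
            output.append(count, data[i + 1]);
        }
    }
};

// High-level convenience API built on the low-level one: it allocates and
// returns a new string, like the existing simple API.
inline std::string decode(const char* data, std::size_t size) {
    RleDecoder decoder;
    std::string output;
    decoder.decode(output, data, size);
    return output;
}
```

Because the low-level method only calls `clear()` and `append()`, a buffer the caller has reserved typically keeps its storage across calls, which is what makes arena-style reuse possible, while the convenience function keeps the simple allocate-a-new-string behavior.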

springmeyer · 2018-01-09T19:14:24Z

include/gzip/decompress.hpp

```diff
@@ -90,8 +91,7 @@ std::string decompress(const char * data, std::size_t size)
 {
     Decompressor decomp;
     std::string output;
-    std::size_t uncompressed_size = decomp.decompress(output,data,size);
-    output.resize(uncompressed_size);
+    decomp.decompress(output,data,size);
```


The resize is done here, rather than in the lower-level API, to avoid its cost in situations where you can pass the buffer along to an API that accepts a separate size argument. A downstream consumer that only accepts a string obviously needs the resize, but an API that accepts const char *, std::size_t can use the buffer as is, with uncompressed_size passed alongside it, and skip the resize (and its cost). /cc @flippmoke
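A minimal sketch of the trade-off described in this comment, assuming the low-level call returns the number of valid bytes instead of trimming the buffer. `fill_without_trim` and `count_bytes` are hypothetical names, and a plain `memcpy` stands in for decompression.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical "return the size" variant: the buffer may stay longer than the
// payload, and the return value says how many leading bytes are valid.
inline std::size_t fill_without_trim(std::string& buffer, const char* data, std::size_t size) {
    if (buffer.size() < size) {
        buffer.resize(size);  // grow only when needed, never shrink
    }
    std::memcpy(&buffer[0], data, size);  // stand-in for decompression
    return size;
}

// A downstream API taking pointer + length never looks past `size`,
// so the stale tail of the buffer does not need to be trimmed.
inline std::size_t count_bytes(const char* data, std::size_t size, char c) {
    return static_cast<std::size_t>(std::count(data, data + size, c));
}
```

A consumer that used `size()` instead of the returned length would read the stale tail, which is why a string-only consumer still has to call `resize()`.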

flippmoke · 2018-01-09

@springmeyer the resize is required; otherwise the string will be longer than the decompressed data, which makes it an invalid decompression. By default, resize() fills any new characters with the default value char(). At the start of our decompression loop we use it to extend the buffer, which is required because the space must already be allocated and part of the std::string so that we never write past size(). However, when we are done we may not have used all of the space that was allocated. That leaves trailing characters (set to char()) on the buffer that are not part of the decompressed data, and they must be trimmed with the final resize().
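The grow-then-trim pattern described here can be sketched as follows, assuming a producer that, like one step of zlib's inflate loop, writes at most a given number of bytes and reports how many it actually wrote. `ChunkSource` and `drain_into` are hypothetical names, and the 4-byte chunk only makes the trimming easy to follow; real code would use a much larger chunk.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical producer standing in for inflate(): copies at most `avail`
// bytes into `dest` and returns how many it wrote (0 once exhausted).
class ChunkSource {
  public:
    ChunkSource(const char* data, std::size_t size) : data_(data), size_(size), pos_(0) {}
    std::size_t produce(char* dest, std::size_t avail) {
        std::size_t n = std::min(avail, size_ - pos_);
        std::memcpy(dest, data_ + pos_, n);
        pos_ += n;
        return n;
    }

  private:
    const char* data_;
    std::size_t size_;
    std::size_t pos_;
};

inline void drain_into(std::string& output, ChunkSource& source, std::size_t chunk = 4) {
    output.clear();
    std::size_t used = 0;
    for (;;) {
        // Grow first: resize() fills the new chars with char(), so the
        // producer only ever writes inside [0, size()).
        output.resize(used + chunk);
        std::size_t n = source.produce(&output[used], chunk);
        used += n;
        if (n < chunk) {
            break;  // source exhausted; the last chunk may be partly unused
        }
    }
    // Trim the unused '\0' padding so size() matches the payload.
    output.resize(used);
}
```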

GretaCB · 2018-01-09T21:07:06Z

bench/run.cpp

```diff
+    // Run once prior to pre-allocate
+    decomp.decompress(output, buffer.data(), buffer.size());
+
+    while (state.KeepRunning())
```


@flippmoke should we be migrating to `for (auto _ : state) {`?

flippmoke · 2018-01-09

We cannot do that unless we upgrade from mason_use(benchmark VERSION 1.2.0) to mason_use(benchmark VERSION 1.3.0).
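The "run once prior to pre-allocate" step in the benchmark diff above can be illustrated without Google Benchmark, assuming the low-level call reuses the capacity of the buffer it is given. `fake_decompress`, `timed_loop` and `LoopResult` are hypothetical: the warm-up call sizes the buffer outside the timed region, and the loop counts how often the buffer's storage moves during the timed iterations.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Stand-in for a low-level decompress into a caller-owned buffer: it writes
// `repeat` copies of the input. Only the buffer handling matters here.
inline void fake_decompress(std::string& out, const std::string& in, std::size_t repeat) {
    out.clear();  // typically keeps capacity
    for (std::size_t i = 0; i < repeat; ++i) {
        out += in;
    }
}

struct LoopResult {
    std::chrono::nanoseconds elapsed;
    std::size_t reallocations;
};

inline LoopResult timed_loop(const std::string& in, std::size_t iterations) {
    std::string output;
    fake_decompress(output, in, 8);  // warm-up: sizes the buffer, not timed
    const char* storage = output.data();
    std::size_t reallocations = 0;

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fake_decompress(output, in, 8);
        if (output.data() != storage) {  // storage moved: a reallocation happened
            ++reallocations;
            storage = output.data();
        }
    }
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    return {elapsed, reallocations};
}
```

With the warm-up outside the timed region, the loop measures only the steady state in which the buffer is already large enough.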

GretaCB · 2018-01-09T21:08:50Z

@flippmoke per chat, should we just remove the unneeded DoNotOptimize usage as part of this PR as well?

flippmoke · 2018-01-09T21:19:37Z

@GretaCB I believe I have already done this.

GretaCB · 2018-01-09T21:23:06Z

@flippmoke ah 👍 perhaps I was looking at an older commit.

GretaCB · 2018-01-09T21:31:33Z

@flippmoke Like this, for example, at line 24 of gzip-hpp/bench/run.cpp in 7aa6d89:

```cpp
benchmark::DoNotOptimize(value.data());
```

springmeyer · 2018-03-16T15:28:32Z

include/gzip/decompress.hpp

```diff
+inline std::string decompress(std::string const& input)
+{
+    return decompress(input.data(), input.size());
+}
```


@flippmoke going to revert this addition since it was removed intentionally per #1 (comment) /cc @GretaCB

done in a217d04
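With the `std::string const&` overload removed, a caller holding a string can pass `data()` and `size()` to the pointer-and-length API. Commit 3fecd8d in this PR templates the API "to avoid dep on std::string"; a minimal sketch of that idea follows, assuming any output container with `resize()` and `operator[]` is acceptable. `reverse_into` is a hypothetical function, and reversing the bytes merely stands in for decompression.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical templated low-level API: the output container is a template
// parameter, so the header does not need a std::string-specific overload.
template <typename OutputType>
void reverse_into(OutputType& output, const char* data, std::size_t size) {
    output.resize(size);  // size the caller's buffer exactly to the result
    for (std::size_t i = 0; i < size; ++i) {
        output[i] = data[size - 1 - i];  // stand-in for decompressed bytes
    }
}
```

A caller with a std::string input simply writes `reverse_into(out, input.data(), input.size())`, and the same function works with a `std::vector<char>` output.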

springmeyer · 2018-03-21T04:13:28Z

Okay, finally this is ready to go, merging. Also fixes #21.

springmeyer mentioned this pull request Dec 21, 2017

External dep example mapbox/node-cpp-skel#88

Closed

springmeyer mentioned this pull request Jan 8, 2018

Windows Bits Clarity #23

Merged

Dane Springmeyer added 4 commits January 9, 2018 10:43

add low level API

b601f7b

add tests for low level API

6f8024d

- confirms that arena constantly grows in size if the blocks of memory are bigger - all tests use low level api internally so all tests are testing it is working overall

also test capacity size says at reserve in decompression

a8280cd

benchmark both high and low level api

b12adc9

flippmoke force-pushed the low-level-api branch from e214c2c to b12adc9 on January 9, 2018 16:44

flippmoke added 2 commits January 9, 2018 11:50

Updated to use limit during decompression.

fef368b

Updated benchmarks to more common benchmark style and fixed some bugs…

a49202f

… within them. Removed threading from benchmarks. Made lower level api safe to use by putting resize inside decompress method

springmeyer mentioned this pull request Jan 9, 2018

implement gzip decompression in vtvalidate mapbox/vtvalidate#8

Closed

springmeyer commented Jan 9, 2018

Added previous no reallocation benchmark styles

7aa6d89

GretaCB reviewed Jan 9, 2018

flippmoke added 6 commits January 10, 2018 09:01

Updated to benchmark 1.3.0

17a2ba8

Fixed resize to be within lower level api, removed returning sizes

ba26398

Added ability to use strings as well as pointers with sizes to API, s…

aedf8fb

…implified some tests

Fix formatting

6893fe8

Small change to use already calculated size

f3c0e39

Remove pedantic semi colons

acd8c39

springmeyer added this to the v1.x milestone Jan 22, 2018

springmeyer self-assigned this Jan 22, 2018

Dane Springmeyer added 2 commits January 25, 2018 18:25

add sudo:required for sanitizer build - mapbox/node-cpp-skel#93

82dfdb3

template to avoid dep on std::string + ignore old world cast inside z…

3fecd8d

…lib macros

springmeyer commented Mar 16, 2018

Dane Springmeyer added 2 commits March 20, 2018 23:36

remove std::string API - refs #22 (comment)

a217d04

remove strategy to simplify API

f06b9b5

Dane Springmeyer added 6 commits March 20, 2018 23:46

apply clang-format

765490b

add to clean and distclean targets

f6b0304

ignore asm usage in catch.hpp

82245ec

use std::uint64_t (thanks clang-tidy)

83681f7

upgrade to llvm 5.0.1 + clang-format

674359b

create multiple tests to trigger #21

ff65c47

springmeyer mentioned this pull request Mar 21, 2018

Multiply-defined symbols linker error #21

Closed

clang-format fix

ffe7147

springmeyer merged commit bb80aac into master Mar 21, 2018

springmeyer deleted the low-level-api branch March 21, 2018 04:13
