frame-api: add function to insert uncomressed data #1094

alexmohr · 2022-06-09T11:20:20Z

A new function, uncompressed_update, allows inserting blocks into the LZ4 stream without compressing them.
Its usage is documented in the frameCompress example.

This could be a solution for #814
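For context on what such a block looks like on the wire: per the LZ4 Frame format specification, every data block starts with a 4-byte little-endian Block Size field. When its highest bit is set, the block data that follows is stored uncompressed, and a field value of 0 is the EndMark that closes the block sequence. The sketch below decodes that field only. `block_header_t` and `read_block_header` are hypothetical names chosen for illustration, not liblz4 symbols, and optional block checksums are ignored.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 4-byte LZ4 Frame block header. */
typedef struct {
    uint32_t size;         /* number of data bytes following the header */
    int      uncompressed; /* 1 if the highest bit is set: data is stored raw */
    int      end_mark;     /* 1 if the whole field is 0: no more blocks */
} block_header_t;

/* Decode the Block Size field from 4 little-endian bytes. */
static block_header_t read_block_header(const unsigned char p[4])
{
    uint32_t raw = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16)
                 | ((uint32_t)p[3] << 24);
    block_header_t h;
    h.end_mark     = (raw == 0);
    h.uncompressed = (raw & 0x80000000u) != 0;
    h.size         = raw & 0x7FFFFFFFu;
    return h;
}
```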

Alexander Mohr, alexander.m.mohr@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH, imprint

Signed-off-by: Alexander Mohr alexander.m.mohr@mercedes-benz.com

t-mat · 2022-06-09T14:23:28Z

Hi, @alexmohr
If you have difficulty passing our compatibility tests, please run make c_standards in your local terminal.
It checks compatibility with C90, C99 and C11.

new method `uncompressed_update` allows to insert blocks without compression into the lz4 stream. The usage is documented in the frameCompress example Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

alexmohr · 2022-06-09T15:11:32Z

@t-mat Thanks, I think the compatibility tests are passing now. I also had a segfault because I forgot to add a null check for when no uncompressed file is passed.

Cyan4973 · 2022-06-09T16:41:18Z

lib/lz4.h

@@ -346,6 +346,8 @@ LZ4LIB_API int LZ4_loadDict (LZ4_stream_t* streamPtr, const char* dictionary, in
 */
 LZ4LIB_API int LZ4_compress_fast_continue (LZ4_stream_t* streamPtr, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);

+LZ4LIB_API int LZ4_DictSize (LZ4_stream_t* LZ4_dict, int dictSize);


A few conventions:

* Function names start in lowercase, excluding the prefix.

* New functions shall be documented. What does it do? Set a new dictSize? Get the current dictSize? What are the limitations? What is the parameter for? What happens in case of error?

* Generally, function names start with a verb/action, to better qualify the effect, for example LZ4_setDictSize() or LZ4_reduceDictSize().

* New symbols do not start their life directly in the "stable" area. They first have to spend some time in the "staging" area below, to prove their worth and collect user feedback. As a consequence, the qualifier changes to LZ4LIB_STATIC_API.

I think I have addressed your comments. I also added a fuzzing test to make sure the changes are working properly.

> Also added a fuzzing test to make sure the changes are working properly

Great!

Cyan4973 · 2022-06-09T16:44:53Z

lib/lz4hc.h

@@ -173,6 +173,8 @@ LZ4LIB_API int LZ4_compress_HC_continue_destSize(LZ4_streamHC_t* LZ4_streamHCPtr
                                           const char* src, char* dst,
                                                 int* srcSizePtr, int targetDstSize);

+LZ4LIB_API int LZ4_DictHCSize(LZ4_streamHC_t* LZ4_streamHCPtr, int dictSize);


Same comment regarding the new function's symbol name.

Cyan4973 · 2022-06-09T16:46:21Z

lib/lz4frame.h

@@ -160,6 +160,11 @@ typedef enum {
    LZ4F_OBSOLETE_ENUM(skippableFrame)
 } LZ4F_frameType_t;

+typedef enum {


Is this enum used or manipulated by the user?
If not, it doesn't need to be part of the public API,
and can remain private inside lz4frame.c.

Moved it into lz4frame.c, as it's not supposed to be used by the user.

This commit fixes the review findings Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

Cyan4973 · 2022-06-10T23:22:48Z

lib/lz4.h

@@ -509,6 +509,17 @@ LZ4LIB_STATIC_API int LZ4_compress_fast_extState_fastReset (void* state, const c
 */
 LZ4LIB_STATIC_API void LZ4_attach_dictionary(LZ4_stream_t* workingStream, const LZ4_stream_t* dictionaryStream);

+/*! LZ4_getDictSize():


Thanks for adding this documentation. It makes the intention clearer.
Yet, I had a hard time connecting the function's name with its intended objective:

> This can be used for adding data without compression to the LZ4 archive.
> If linked blocked mode is used the memory of the dictionary is kept free.

I suspect that's because the documentation blends multiple layers of responsibilities in this paragraph.

At this place in the API, LZ4_getDictSize() seems to be just about knowing the current dictionary size of the active LZ4_stream_t* state. And it's likely implied that this is not to be used in a concurrent access scenario.

That this function is then employed in the context of LZ4Frame for a specific mode adding uncompressed data can be interesting information, but it does not define what this function is doing. The size information it provides could be employed for any other usage, so it matters that it's cleanly defined.

This leads me to a few simple questions:

What is the @dictSize argument for?
All it does is cap the reported dictSize, without changing anything in the underlying situation?
If the point of this function is to return the dictionary size, maybe it should do just that?
And if the calling site has a reason to cap the value, maybe the capping should be done there?

Getters generally do not mutate the state they inspect. Assuming this is the case here too, the state could be const LZ4_stream_t* instead, which makes it clear that this function has no side effect.
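The separation suggested above can be sketched as a side-effect-free getter on a `const` state, with any capping left to the caller. This is an illustration only: `stream_state_t`, `get_dict_size` and `usable_dict_size` are hypothetical stand-ins rather than liblz4 internals, and the 64 KB bound mirrors the LZ4 window size discussed in this thread.

```c
#define WINDOW_SIZE (64 * 1024)  /* LZ4 window: 64 KB */

/* Hypothetical stand-in for the internal stream state. */
typedef struct {
    int dictSize;   /* bytes of history currently referenced */
} stream_state_t;

/* Pure getter: the const qualifier documents that nothing is modified. */
static int get_dict_size(const stream_state_t* s)
{
    return s->dictSize;
}

/* Caller-side policy: cap to the window and to the caller's own limit. */
static int usable_dict_size(const stream_state_t* s, int requested)
{
    int n = get_dict_size(s);
    if (n > WINDOW_SIZE) n = WINDOW_SIZE;
    if (n > requested)   n = requested;
    return n;
}
```

Keeping the getter free of policy means the same size information can serve other callers that need a different cap, or none.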

This code has just been moved out of the LZ4_saveDict function (where it's also used now).

I passed @dictSize to be consistent with the previous implementation. I'm fine with removing the dictSize parameter and capping it to 64 KB locally in this function. But then we would either have to change the signature of LZ4_saveDict to remove the dictSize parameter, or restore the old code of LZ4_saveDict to calculate the dict size there, which would lead to duplicated code.

As for your second point: I made both the parameter and the dictionary const to make clear that they are not modified.

As I already wrote in the other thread, all of this was only added so that the dictionary is not modified when adding uncompressed data.
If you think modifying the dictionary is okay when adding uncompressed data, I'd remove the if from https://github.com/lz4/lz4/pull/1094/files#diff-16e71ed5519d7ce479c3a3c3158b3e5b121fd300b78497bb477a6695b6d08b50R969 and restore the old way the dict was calculated here.

If we keep the get_dictSize function, I'll update the documentation again to make its intent a bit clearer.

Cyan4973 · 2022-06-10T23:26:30Z

lib/lz4frame.c

-            int const realDictSize = LZ4F_localSaveDict(cctxPtr);
-            assert(0 <= realDictSize && realDictSize <= 64 KB);
-            cctxPtr->tmpIn = cctxPtr->tmpBuff + realDictSize;
+          /* only keep the space of the dictionary, so dict data is kept for the next compressedUpdate


This portion of the code confuses me.
What is the objective ?

The idea was not to write the dictionary if an uncompressed block is written. That's why I've added the get dict size functions. They are used to keep the space of the dictionary free without putting any new data in.
The alternative would be to remove this and always update the dictionary even if we are writing an uncompressed block.
It would make the dictionary a bit worse but probably simplify this.

I'm concerned that this might not be conformant to the frame specification (though I'm unsure if I do understand the details).

Let's quickly state that independent blocks are unaffected; this part is clear.

For linked blocks though, it's specified that each block uses the previous block(s) as a dictionary.

> If this flag is set to “0”, each block depends on previous ones (up to LZ4 window size, which is 64 KB). In such case, it’s necessary to decode all blocks in sequence.

Note that each block depends on previous ones, not on previous compressed blocks. This means that, if a block is uncompressed, it's still part of the dictionary for the following block.
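A minimal model of that point, assuming a decoder that keeps the last 64 KB of decoded output as history for linked blocks: each block's content is appended to the window in the same way whether it was stored compressed or raw, which is why an uncompressed block still becomes dictionary material for the next one. Only the bookkeeping is modeled (no LZ4 decoding), and `history_t` / `history_append` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define WINDOW_SIZE (64 * 1024)

/* Hypothetical sliding window holding the newest WINDOW_SIZE decoded bytes. */
typedef struct {
    unsigned char buf[WINDOW_SIZE];
    size_t len;                  /* valid bytes in buf, <= WINDOW_SIZE */
} history_t;

/* Append one block's decoded content, regardless of whether the block was
 * compressed or stored raw, keeping only the newest WINDOW_SIZE bytes. */
static void history_append(history_t* h, const unsigned char* data, size_t n)
{
    if (n >= WINDOW_SIZE) {              /* this block alone fills the window */
        memcpy(h->buf, data + (n - WINDOW_SIZE), WINDOW_SIZE);
        h->len = WINDOW_SIZE;
        return;
    }
    if (h->len + n > WINDOW_SIZE) {      /* drop the oldest bytes first */
        size_t drop = h->len + n - WINDOW_SIZE;
        memmove(h->buf, h->buf + drop, h->len - drop);
        h->len -= drop;
    }
    memcpy(h->buf + h->len, data, n);
    h->len += n;
}
```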

I'm not sure how this plays out here.

That's why we keep the dictionary but without modifications. The uncompressed block still contains the dictionary but it's not updated with new data.

lib/lz4frame.c#975

    realDictSize = LZ4F_localDictSize(cctxPtr);
}
assert(0 <= realDictSize && realDictSize <= 64 KB);
cctxPtr->tmpIn = cctxPtr->tmpBuff + realDictSize;

As realDictSize is now set to the last size of the dictionary, cctxPtr->tmpIn starts right after the dictionary data, and the dictionary's memory is not modified. When the block is written, that memory still contains the dictionary data.

I probably should update the fuzzing test to make sure it works with both dependent (linked) and independent blocks.

Fuzzing test is updated to include both blocked modes.

add a fuzzing test for uncompressed frame api Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

change the context to const to make clear that the context is not modified

add static dependency to examples

fuzzing test now tests linked and independent blocks Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

alexmohr · 2022-06-24T08:21:34Z

Hi @Cyan4973 do you have further change requests?

Cyan4973 · 2022-06-27T17:32:15Z

> Hi @Cyan4973 do you have further change requests?

Hi @alexmohr ,
there is nothing more you need to add to this PR.
You did a good job, it's a well done PR, it comes with good comments and good tests.

I'm just a bit uneasy about what happens to the dictionary after inserting an uncompressed block in the frame.
Nothing obvious, it's just that I don't fully understand everything.
Also this scenario is apparently tested in the fuzzer, so it should have caught something if there was any obvious flaw.

Essentially I just need to find some time to properly validate this PR.

Cyan4973 · 2022-07-01T17:27:57Z

I made a few tests with the new LZ4F_uncompressedUpdate() method in this PR;
unfortunately, they all fail,
resulting in various errors on both the compression and decompression sides.

It's unclear if I did something wrong or if there is a problem with the new entry point.
I need to spend more time to understand the errors, and why they were not found by existing tests.

Cyan4973 · 2022-07-01T17:42:02Z

When trying to compile the ossfuzz test provided in this PR, I'm getting:

round_trip_frame_uncompressed_fuzzer.c:81: undefined reference to `LZ4F_uncompressedUpdate'

Probably some Include path issue.

alexmohr · 2022-07-01T20:47:01Z

> I made a few tests with the new LZ4F_uncompressedUpdate() method in this PR; unfortunately, they all fail, resulting in various errors on both the compression and decompression sides.
>
> It's unclear if I did something wrong or if there is a problem with the new entry point. I need to spend more time to understand the errors, and why they were not found by existing tests.

Can you share the tests? I suspect there is still an issue with my implementation

alexmohr · 2022-07-01T21:02:14Z

> When trying to compile the ossfuzz test provided in this PR, I'm getting:
> round_trip_frame_uncompressed_fuzzer.c:81: undefined reference to `LZ4F_uncompressedUpdate'
> Probably some Include path issue.

I'm probably doing something different here, because compiling works just fine for me:

[22:58:46] # mohalex @ bob in /tmp 
$ git clone https://github.com/alexmohr/lz4 -b add-uncompressed-api 
Cloning into 'lz4'...
remote: Enumerating objects: 13123, done.
remote: Counting objects: 100% (72/72), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 13123 (delta 40), reused 67 (delta 40), pack-reused 13051
Receiving objects: 100% (13123/13123), 5.92 MiB | 3.63 MiB/s, done.
Resolving deltas: 100% (9122/9122), done.

[22:58:51] # mohalex @ bob in /tmp 
$ cd lz4 

[22:58:53] # mohalex @ bob in /tmp/lz4 on git:add-uncompressed-api o 
$ make -C ossfuzz
make: Entering directory '/tmp/lz4/ossfuzz'
make -C ../lib CFLAGS=" -g -DLZ4_DEBUG=1 " liblz4.a
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION compress_fuzzer.c -o compress_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION lz4_helpers.c -o lz4_helpers.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION fuzz_data_producer.c -o fuzz_data_producer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION standaloneengine.c -o standaloneengine.o
make[1]: Entering directory '/tmp/lz4/lib'
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION decompress_fuzzer.c -o decompress_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION round_trip_fuzzer.c -o round_trip_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION round_trip_stream_fuzzer.c -o round_trip_stream_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION compress_hc_fuzzer.c -o compress_hc_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION round_trip_hc_fuzzer.c -o round_trip_hc_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION compress_frame_fuzzer.c -o compress_frame_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION round_trip_frame_fuzzer.c -o round_trip_frame_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION round_trip_frame_uncompressed_fuzzer.c -o round_trip_frame_uncompressed_fuzzer.o
cc -c  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION decompress_frame_fuzzer.c -o decompress_frame_fuzzer.o
compiling static library
make[1]: Leaving directory '/tmp/lz4/lib'
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   compress_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o compress_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   decompress_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o decompress_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   round_trip_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o round_trip_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   round_trip_stream_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o round_trip_stream_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   compress_hc_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o compress_hc_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   round_trip_hc_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o round_trip_hc_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   compress_frame_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o compress_frame_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   round_trip_frame_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o round_trip_frame_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   round_trip_frame_uncompressed_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o round_trip_frame_uncompressed_fuzzer
g++  -g -DLZ4_DEBUG=1   -I../lib -DXXH_NAMESPACE=LZ4_ -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION   decompress_frame_fuzzer.o lz4_helpers.o fuzz_data_producer.o ../lib/liblz4.a standaloneengine.o -o decompress_frame_fuzzer
rm compress_frame_fuzzer.o decompress_frame_fuzzer.o decompress_fuzzer.o round_trip_hc_fuzzer.o compress_fuzzer.o round_trip_frame_uncompressed_fuzzer.o standaloneengine.o round_trip_stream_fuzzer.o fuzz_data_producer.o round_trip_frame_fuzzer.o round_trip_fuzzer.o compress_hc_fuzzer.o lz4_helpers.o
make: Leaving directory '/tmp/lz4/ossfuzz'

If you can post the commands you're using, I'll try to find out what's different.

Cyan4973 · 2022-07-01T21:42:38Z

I think I narrowed down the issue to situations where the fuzzer generates a lot of data to pass via the new LZ4F_uncompressedUpdate method.

Cyan4973 · 2022-07-01T21:46:36Z

OK, so this happens specifically when the amount of data to pass via the new LZ4F_uncompressedUpdate is larger than a block size.

Cyan4973 · 2022-07-01T21:48:48Z

> Can you share the tests? I suspect there is still an issue with my implementation

I'm using a modified variant of frametest, which is a sort of "poor man's fuzzer" implementation.
In this updated variant, uncompressed blocks are randomly added to the frame using the new LZ4F_uncompressedUpdate entry point.

I could create a feature branch to publish the modified test if you want.

Cyan4973 · 2022-07-01T21:52:25Z

> If you can post the commands you're using, I'll try to find out what's different.

I was doing something equivalent on my side when link stage failed.
Just to be on the safe side, I decided to copy/paste your proposed list of commands exactly, and it worked.
It seems to show that the build recipe is correct. I guess the issue is on my system.

Anyway, I'm not currently using this tool for testing, but rather a modified variant of frametest, so it did not block me.

Cyan4973 · 2022-07-01T21:54:04Z

Modified variant of frametest posted in feature branch pr1094_frametest

Cyan4973 · 2022-07-02T00:36:19Z

I also now realize that we have been jumping into the implementation details without even talking about the use case.

The initial message mentions #814 as a reason to propose this PR,
but #814 is actually a very different scenario (a niche one, with unspecified out-of-band capabilities, focused on the decompression side), which this PR doesn't address, not even partially.

So the question is:
In which scenario is it desirable to send raw uncompressed blocks inside an LZ4 Frame?

Asking because:

* we may already have existing ways to provide a solution for the target scenario.
* any added code means more maintenance and more attack vectors to protect against, so it should be justified by a reasonable scenario to serve.

alexmohr · 2022-07-04T05:44:06Z

> I also now realize that we have been jumping into the implementation details without even talking about the use case.
>
> The initial message mentions #814 as a reason to propose this PR, but #814 is actually a very different scenario (a niche one, with unspecified out-of-band capabilities, focused on the decompression side), which this PR doesn't address, not even partially.
>
> So the question is: In which scenario is it desirable to send raw uncompressed blocks inside an LZ4 Frame?
>
> Asking because:
>
> * we may already have existing ways to provide a solution for the target scenario.
> * any added code means more maintenance and more attack vectors to protect against, so it should be justified by a reasonable scenario to serve.

Regarding #814, I was referring to this part of the description:

> Alternatively, the user could prepend a fake LZ4F block header to the uncompressed data, and pass that to the normal decompression function. This works with the current LZ4 version.

The use case for me is that we're streaming an LZ4-compressed tar archive from memory to disk. Tar does not support streaming out of the box, so we have to patch the header by setting the correct file size once we're done streaming a file.
As our output is LZ4-compressed, we have to write the header uncompressed so that we can seek back in the file and correct the data on disk.
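A minimal sketch of the patching step described above, assuming the tar header was written as the payload of an uncompressed LZ4 block (so its 512 bytes appear verbatim in the output file, right after that block's 4-byte Block Size field) and that block and content checksums are disabled in the frame preferences, since patching would otherwise invalidate them. In the ustar layout the size field is 12 bytes of octal text at offset 124, and the header checksum occupies 8 bytes at offset 148 and is computed with its own field treated as spaces, so changing the size also requires recomputing the checksum. `tar_patch_size` is a hypothetical helper, not part of any library.

```c
#include <stdio.h>
#include <string.h>

#define TAR_HDR_SIZE    512
#define TAR_SIZE_OFF    124  /* size: 11 octal digits + NUL (12 bytes) */
#define TAR_CHKSUM_OFF  148  /* chksum: 6 octal digits + NUL + space (8 bytes) */

/* Rewrite the size field of a ustar header in place and recompute the header
 * checksum. Returns 0 on success, -1 if size does not fit in 11 octal digits. */
static int tar_patch_size(unsigned char hdr[TAR_HDR_SIZE], unsigned long long size)
{
    char field[13];
    unsigned long sum = 0;
    int i;

    if (size > 077777777777ULL) return -1;
    snprintf(field, sizeof field, "%011llo", size);
    memcpy(hdr + TAR_SIZE_OFF, field, 12);      /* 11 digits + NUL */

    /* The checksum is computed with its own field filled with spaces. */
    memset(hdr + TAR_CHKSUM_OFF, ' ', 8);
    for (i = 0; i < TAR_HDR_SIZE; i++) sum += hdr[i];
    snprintf(field, sizeof field, "%06lo", sum);
    memcpy(hdr + TAR_CHKSUM_OFF, field, 7);     /* 6 digits + NUL; last byte stays ' ' */
    return 0;
}
```

Under these assumptions, the application would seek to the file offset of the uncompressed block's Block Size field plus 4, read the 512 header bytes back, patch them, and write them in place.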

When the block mode changes, a flush is executed to prevent mixing compressed and uncompressed data. Prior to this commit, dstStart, dstPtr and dstCapacity were not updated to include the offset from bytesWritten. For inputs > blockSize, this meant the flushed data was overwritten. Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>
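The class of bug described in this commit message can be illustrated with a standalone sketch: once some output has already been written (for example by a flush), every later write must start at `dst + bytesWritten`, with the remaining capacity reduced accordingly. The function below emits an input of any size as a sequence of uncompressed blocks no larger than `blockSize`. It is not liblz4's implementation; `write_raw_blocks` is a hypothetical helper, and block checksums are assumed to be disabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order. */
static void write_le32(unsigned char* p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Emit src as uncompressed LZ4 Frame blocks of at most blockSize bytes each.
 * Returns the number of bytes written, or 0 if blockSize is invalid,
 * srcSize is 0, or dstCapacity is too small. */
static size_t write_raw_blocks(unsigned char* dst, size_t dstCapacity,
                               const unsigned char* src, size_t srcSize,
                               size_t blockSize)
{
    size_t bytesWritten = 0;   /* invariant: bytesWritten <= dstCapacity */
    if (blockSize == 0 || blockSize > 0x7FFFFFFFu) return 0;
    while (srcSize > 0) {
        size_t chunk = srcSize < blockSize ? srcSize : blockSize;
        if (dstCapacity - bytesWritten < 4 + chunk) return 0;
        /* Always write relative to dst + bytesWritten, so that blocks
         * emitted earlier are never overwritten. */
        write_le32(dst + bytesWritten, (uint32_t)chunk | 0x80000000u);
        memcpy(dst + bytesWritten + 4, src, chunk);
        bytesWritten += 4 + chunk;
        src += chunk;
        srcSize -= chunk;
    }
    return bytesWritten;
}
```

With 10 input bytes and a block size of 4, this sketch produces three blocks (4, 4 and 2 bytes of data), each preceded by its own 4-byte header, for 22 bytes in total.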

alexmohr · 2022-07-04T06:57:14Z

> Modified variant of frametest posted in feature branch pr1094_frametest

Thanks, I found the issue using your test and pushed a new commit. Should I cherry-pick your commit on pr1094_frametest into this PR?

Cyan4973 · 2022-07-04T20:12:07Z

> The use case for me is that we're streaming an LZ4-compressed tar archive from memory to disk. Tar does not support streaming out of the box, so we have to patch the header by setting the correct file size once we're done streaming a file.
> As our output is LZ4-compressed, we have to write the header uncompressed so that we can seek back in the file and correct the data on disk.

OK, thanks for the explanation, that's an important starting point.

It looks to me like your use case needs more than just sending some data uncompressed. In order to modify this data later,
it also needs this segment to be excluded from history, so that no future block can be based on past data that will be modified afterwards, which would result in corruption.
This condition is automatically met when using Block Independence mode, but is more troublesome when blocks are linked: then history must be actively tampered with.

While I understand the use case, there is a balance to find between serving it, and making the general library more complex to maintain and understand for everybody. Sometimes, for very niche use cases, it's acceptable to create a fork to serve it, and keep the "general" library free.

Here are a few proposals that could be employed to serve this use case:

The LZ4 Frame format is designed to handle multiple concatenated frames as if they were a single content. Therefore, one approach could be to generate a "fake" frame with uncompressed content for the tar header, followed by a normal frame for the file's content. This approach can be employed multiple times; the concatenation of all these frames would still be decompressed as if it were a single content.
If, for some reason, the receiving system is unable to deal with multiple frames, an intermediate idea would be to allow the creation of uncompressed data blocks, but only if the block mode is set to Independent. This way, it naturally solves the issue of making the content of the uncompressed block "disappear" from history, with no extra complexity.

I find the second idea attractive because it's likely going to reduce complexity significantly, and if it becomes "simple enough", then there is less "weight" in supporting it in the general library. I also suspect independent blocks are what you had in mind to begin with, so making them a prerequisite for this capability is not going to hurt your use case.
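The first proposal (a standalone "fake" frame followed by normal frames) needs nothing beyond the frame format itself. As a rough sketch, assuming version 01, independent blocks, a 64 KB maximum block size, and no content size, dictionary ID or checksums, a frame holding one raw block could be produced as follows. Per the LZ4 Frame specification, the header checksum byte is the second byte of XXH32 (seed 0) computed over the frame descriptor. `write_raw_frame`, `frame_header_checksum` and `xxh32_small` are hypothetical helpers; `xxh32_small` implements XXH32 only for inputs shorter than 16 bytes, which covers any frame descriptor, and is no replacement for the real xxHash library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P1 0x9E3779B1u
#define P2 0x85EBCA77u
#define P3 0xC2B2AE3Du
#define P4 0x27D4EB2Fu
#define P5 0x165667B1u

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* XXH32 with seed 0, valid only for len < 16 (skips the 16-byte stripe loop). */
static uint32_t xxh32_small(const unsigned char* p, size_t len)
{
    uint32_t h = P5 + (uint32_t)len;
    while (len >= 4) {
        uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        h = rotl32(h + w * P3, 17) * P4;
        p += 4; len -= 4;
    }
    while (len > 0) {
        h = rotl32(h + (uint32_t)(*p) * P5, 11) * P1;
        p++; len--;
    }
    h ^= h >> 15; h *= P2;     /* final avalanche */
    h ^= h >> 13; h *= P3;
    h ^= h >> 16;
    return h;
}

/* Header checksum byte HC for a frame descriptor (FLG, BD, ...). */
static unsigned char frame_header_checksum(const unsigned char* desc, size_t len)
{
    return (unsigned char)((xxh32_small(desc, len) >> 8) & 0xFF);
}

/* Write magic number, descriptor, HC, one raw block and the EndMark.
 * Returns bytes written, or 0 if srcSize is 0 or > 64 KB, or dst is too small. */
static size_t write_raw_frame(unsigned char* dst, size_t cap,
                              const unsigned char* src, size_t srcSize)
{
    static const unsigned char magic[4] = { 0x04, 0x22, 0x4D, 0x18 };
    const unsigned char desc[2] = { 0x60,   /* FLG: version 01, block independence */
                                    0x40 }; /* BD: 64 KB maximum block size */
    size_t total = 4 + 2 + 1 + 4 + srcSize + 4;
    uint32_t bsize = (uint32_t)srcSize | 0x80000000u;  /* high bit: uncompressed */
    unsigned char* p = dst;

    if (srcSize == 0 || srcSize > 64 * 1024 || cap < total) return 0;
    memcpy(p, magic, 4); p += 4;
    memcpy(p, desc, 2);  p += 2;
    *p++ = frame_header_checksum(desc, 2);
    p[0] = (unsigned char)bsize;         p[1] = (unsigned char)(bsize >> 8);
    p[2] = (unsigned char)(bsize >> 16); p[3] = (unsigned char)(bsize >> 24);
    p += 4;
    memcpy(p, src, srcSize); p += srcSize;
    memset(p, 0, 4);                     /* EndMark */
    return total;
}
```

As a quick check of the checksum helper, the descriptor bytes 64 40 (the same settings plus the content-checksum flag) yield a header checksum of A7.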

> Regarding #814, I was referring to this part of the description:
>
> > Alternatively, the user could prepend a fake LZ4F block header to the uncompressed data, and pass that to the normal decompression function. This works with the current LZ4 version.

This is actually a very different scenario. Here, data is presumed to be sent "out of band", in order to avoid any additional bytes, not even a small header. It is then re-inserted into the LZ4F history on the decompression side.
That's a very niche use case, and it's unclear if it's the right move to have dedicated code to support it directly within the general liblz4 library, or if a fork would be more appropriate for that. The proposed solution doesn't need direct contribution from the library, and is likely what is being used currently, since it would work "as is".

Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

alexmohr · 2022-07-05T11:14:16Z

> If, for some reason, the receiving system is unable to deal with multiple frames, an intermediate idea would be to allow the creation of uncompressed data blocks, but only if the block mode is set to Independent. This way, it naturally solves the issue of making the content of the uncompressed block "disappear" from history, with no extra complexity.

I quite like the idea of making this only available for independent blocks. It means we can remove the special dictionary handling you commented on. It makes everything much simpler and works just fine for my use case. I changed the PR accordingly, ran your modified frametest again, and everything seems to be in working order.

Cyan4973 · 2022-07-05T17:35:29Z

lib/lz4hc.h

@@ -405,6 +405,18 @@ LZ4LIB_STATIC_API void LZ4_attach_HC_dictionary(
          LZ4_streamHC_t *working_stream,
    const LZ4_streamHC_t *dictionary_stream);

+/*! LZ4_getDictHCSize():


I presume this entry point is not needed anymore

Sorry I forgot to remove that.

Cyan4973 · 2022-07-05T17:36:42Z

lib/lz4frame.c

+                               void* dstBuffer, size_t dstCapacity,
+                         const void* srcBuffer, size_t srcSize,
+                         const LZ4F_compressOptions_t* compressOptionsPtr) {
+    assert(cctxPtr->prefs.frameInfo.blockMode == LZ4F_blockIndependent);


Here, I would prefer an actual test, followed by an error if the condition is not respected.
Wrong block mode is an easy mistake to make.

replaced with RETURN_ERROR_IF(cctxPtr->prefs.frameInfo.blockMode != LZ4F_blockIndependent, blockMode_invalid);
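The reasoning behind preferring a real test over an assert: `assert()` is compiled out when `NDEBUG` is defined, as in typical release builds, so a wrong block mode would go unnoticed there. A minimal sketch of the runtime-check pattern follows; the enum values and `check_uncompressed_allowed` are hypothetical stand-ins for the `LZ4F_blockIndependent` and `blockMode_invalid` names quoted above, not liblz4 definitions.

```c
/* Hypothetical stand-ins for the frame preferences and error codes. */
typedef enum { BLOCK_LINKED = 0, BLOCK_INDEPENDENT = 1 } block_mode_t;
typedef enum { STATUS_OK = 0, ERROR_BLOCK_MODE_INVALID = 1 } status_t;

/* Unlike assert(), which is compiled out when NDEBUG is defined,
 * this check runs in every build and reports the error to the caller. */
static status_t check_uncompressed_allowed(block_mode_t mode)
{
    if (mode != BLOCK_INDEPENDENT) return ERROR_BLOCK_MODE_INVALID;
    return STATUS_OK;
}
```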

Cyan4973 · 2022-07-05T17:39:32Z

contrib/meson/meson/examples/meson.build

@@ -26,7 +26,7 @@ foreach e, src : examples
  executable(
    e,
    lz4_source_root / 'examples' / src,
-    dependencies: liblz4_dep,
+    dependencies: [liblz4_dep, liblz4_internal_dep],


What is liblz4_internal_dep ?

I removed liblz4_dep as liblz4_internal_dep is necessary for static linkage of lz4. Having both is redundant.
It's defined here as static_library(...); for example, contrib/meson/meson/programs/meson.build uses static linkage as well.

Cyan4973 · 2022-07-05T19:13:31Z

contrib/meson/meson/lib/meson.build

@@ -43,7 +43,7 @@ liblz4_dep = declare_dependency(
  include_directories: include_directories(lz4_source_root / 'lib')
 )

-if get_option('tests') or get_option('programs')
+if get_option('tests') or get_option('programs') or get_option('programs')


get_option('programs') seems repeated twice

I committed it too early. I didn't think you were going to review it right away, but I guess you got a few emails :/

* replace assert with a test in LZ4F_uncompressedUpdate
* update documentation to include the correct docstring
* remove unnecessary entry point
* remove compress_linked_block_mode from fuzzing test

Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>

alexmohr force-pushed the add-uncompressed-api branch 2 times, most recently from 9e24dac to c2e0230 (June 9, 2022 14:18)

frame-api: add method to insert uncomressed data

4aeb502

alexmohr force-pushed the add-uncompressed-api branch from c2e0230 to 4aeb502 (June 9, 2022 15:09)

Cyan4973 reviewed Jun 9, 2022

alexmohr added 2 commits June 10, 2022 06:00

review: Fix review findings

62f6cef

review: Fix review findings

5c73827

alexmohr force-pushed the add-uncompressed-api branch from 70c8abb to d0a67df (June 10, 2022 22:59)

Cyan4973 reviewed Jun 10, 2022

alexmohr and others added 2 commits June 11, 2022 22:47

fuzz-test: add fuzz test for uncompressed api

1738b50

lz4frame: fix different linkage error

3c57d2f

alexmohr force-pushed the add-uncompressed-api branch from d0a67df to 3c57d2f (June 11, 2022 21:07)

alexmohr changed the title ~~frame-api: add method to insert uncomressed data~~ frame-api: add function to insert uncomressed data Jun 11, 2022

dict-size: make lz4 context const

9a42a9d

alexmohr force-pushed the add-uncompressed-api branch from c3b5ef7 to 9a42a9d (June 11, 2022 21:58)

alexmohr and others added 2 commits June 12, 2022 00:41

meson: fix meson build

af447b2

ossfuzz: extend fuzzing test to include linked blocks

5065080

uncompressed-api: allow uncompressed_update only for independent blocks

42eb47d

alexmohr force-pushed the add-uncompressed-api branch from 6610426 to 42eb47d (July 5, 2022 09:56)

Cyan4973 reviewed Jul 5, 2022

alexmohr force-pushed the add-uncompressed-api branch 3 times, most recently from ab8c4ee to 25feb4f (July 5, 2022 19:12)

Cyan4973 reviewed Jul 5, 2022

review: fix findings

0ac3c74

alexmohr force-pushed the add-uncompressed-api branch from 25feb4f to 0ac3c74 (July 5, 2022 19:14)

Cyan4973 approved these changes Jul 5, 2022

Cyan4973 merged commit 4da5c4d into lz4:dev Jul 5, 2022

Cyan4973 mentioned this pull request Jul 5, 2022

Add a fuzzer test for LZ4F_uncompressedUpdate() within frametest #1099

Merged
