Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linux-kernel] Prepare zstd for the Linux Kernel #2289

Merged
merged 10 commits into from
Sep 9, 2020

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Aug 27, 2020

  • Replace all variables named current with curr because it collides with a kernel macro.
  • Add ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics, since the headers aren't available in the kernel.
  • Fix ZSTD_initCStream_advanced() with no dictionary and static allocation
  • Reduce zstd stack usage by 1KB. Now about 800 bytes for decompression and 2KB for compression. I test that it is < 2.5 KB, so we don't regress.
  • Add support to freestanding.py to hardwire preprocessor macros.
  • Add the freestanding linux-kernel translation script. Add a test that runs the script and then builds and tests the generated library. Run that test in CI.

TODO:
Document the newly added build macros to lib/README.md

@terrelln
Copy link
Contributor Author

The problem was just that decodecorpus wasn't checking the return code of FSE_buildCTable_wksp(), which errored because the workspace was too small. It then continued to use the uninitialized table causing untold mayhem.

@terrelln terrelln force-pushed the zstd-kernel-2 branch 4 times, most recently from 17b85e8 to a57fb69 Compare August 28, 2020 00:25
@terrelln
Copy link
Contributor Author

The PR is now complete, except any CI problems or problems that show up in further testing.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Sep 9, 2020

regarding freestanding.py :
this file seems highly coupled to the source code, and makes a few assumptions about #ifdef construction.
How reliable is it, in the case of a source change ?
How is it tested ?

@terrelln
Copy link
Contributor Author

terrelln commented Sep 9, 2020

regarding freestanding.py :
this file seems highly coupled to the source code, and makes a few assumptions about #ifdef construction.
How reliable is it, in the case of a source change ?

Yeah, the ifdef parsing only handles simple cases. If there are extra parenthesizes for example, it doesn't handle it. But, it will only do the transform when it can prove it is correct.

I tried to use a full pre-processor engine. But, it ended up way too complex, so I went back to regexes. The regexes handle all the variants of the preprocessor that show up in the zstd codebase.

How is it tested ?

This script gets run by Github Actions, and then the generated code is tested.

I also manually ran tests like grep -r _MSC_VER and made sure it doesn't show up. I will add an automated test that checks to make sure a few of the macros that I define or undefine get removed to CI.

Additionally, the changes the script makes get printed out like a git diff showing the removed and added lines. When releasing a new kernel, one should review these diffs to make sure they all look correct. I'm going to add instructions for releasing a new kernel version to the contrib/linux-kernel/README.md once I know all the steps. An example of the output printed is in https://github.com/facebook/zstd/pull/2289/checks?check_run_id=1039278145.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Sep 9, 2020

This PR introduces a substantial impact on the public API zstd.h,
due to the optional invocation of ZSTD_NO_MALLOC compilation directive,
which can make some symbols unavailable.
The API's scope becomes conditional.

A potential alternative could have been to keep the symbols present and published in zstd.h,
and make them return an error code whenever they are invoked with ZSTD_NO_MALLOC
(typically ZSTD_error_memory_allocation).

The advantage is that ZSTD_NO_MALLOC is now confined to the implementation side, with no impact on interface publication.
Also, the impact can be centralized quite efficiently, by substituting ZSTD_malloc() and co. by variants which return NULL and trigger a ZSTD_error_memory_allocation error code, thus dramatically reducing the nb of #ifdef ZSTD_NO_MALLOC sections to maintain. Also, in the context of future code evolution, the creation or update of new functions which relying on the availability of a malloc() function would be automatically taken care of, thus avoiding to rely on code review to properly setup ZSTD_NO_MALLOC exceptions.

On the downside, absence of a symbol can provide a compile-time signal in a way that a runtime error code cannot.

While I acknowledge that none of these solutions looks perfect,
I'm concerned about the risk of making the public API more difficult to read and the code more complex to update
for the benefit of an important yet relatively niche scenario.

@terrelln
Copy link
Contributor Author

terrelln commented Sep 9, 2020

Let's just drop that commit then.

The goal is exclusively on the API side. I didn't want to have functions that simply don't work in the kernel's API. But, if you think that the cost for all other uses is too high, then let's skip it. The kernel implementation of ZSTD_malloc() and friends always return NULL.

@terrelln
Copy link
Contributor Author

terrelln commented Sep 9, 2020

That commit is dropped, there are no longer any changes to lib/zstd.h.

@terrelln terrelln merged commit 5118e30 into facebook:dev Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants