Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add zstd compression #134

Closed
ivmaks opened this issue May 19, 2018 · 7 comments · Fixed by #336 or #444
Closed

add zstd compression #134

ivmaks opened this issue May 19, 2018 · 7 comments · Fixed by #336 or #444

Comments

@ivmaks
Copy link

ivmaks commented May 19, 2018

https://github.com/facebook/zstd

@caffeinejolt
Copy link

this is a big feature since for really large tables with big text fields - compression speed is the bottleneck - zstd would be super helpful with this

or better yet might be an option that allows users to choose their own program to pipe SQL through to perform whatever alterations they choose - which could include zstd

@peterdk
Copy link

peterdk commented Feb 2, 2020

It's quite easy to implement actually. I have it working with zstd 1.4.4 by adding the zstd zlib wrapper. You only have to replace the include for zlib with the wrapper, and include the wrapper files and some headers. Only thing needed then is to replace all .gz occurences with .zst in the code.

I am now testing it with a 1.9TB database and zstd compression level 11, to see if it works properly. However, I am in no way a really experienced C dev or anything related to mysql. So I will upload it to a repo later, for those who want to build it themselves. Since I don't change anything in the actual code of mydumper I expect it to just work, but I cannot guarantee it in any way.

Update
zstd multithreaded compression now also added. Now dumping the one big table using 4 cores for zstd instead of 1.

@twotwotwo
Copy link

@peterdk I'd love to see what you did to make the zlib zstd wrapper work, even in an unpolished state. Seem to be gzip-bound dumping TBs, and it didn't end up trivial for me use the the wrapper, probably because of my unfamiliarity with cmake.

@peterdk
Copy link

peterdk commented Aug 15, 2020 via email

@twotwotwo
Copy link

twotwotwo commented Aug 15, 2020

OK, I managed to do this, but not in the best way or even close. I just worked around cmake because I don't know it and it seems like a lot to learn.

I needed to link against libzstd.a from a local zstd build, not libzstd.so, because the zlib wrapper wants to use ZSTD_free and ZSTD_malloc but they don't seem to be exported by the .so. I may've done something silly, but in case I didn't, I opened an issue on zstd to potentially clarify the instructions if others need to do that too.

In case it helps anybody, the steps were:

  • Building libzstd. Again, system libzstd.so won't do.
  • Replacing #include <zlib.h> with #include "zstd_zlibWrapper.h"
  • Copying gz*.{c,h} and zstd_zlibwrapper.{c,h} over from zstd/zlibWrapper
  • perl -pi -e "s/\.gz/.zst/g" my{dump,load}er.c
  • Turning the output of VERBOSE=1 make into a single build command
    • Adding -DZWRAP_USE_ZSTD=1
    • Adding gz*.c and zstd_zlibwrapper.c
    • Adding libzstd.a from the zstd build
    • Adding lib/common from the checkout of zstd to the include path

So I ended up building mydumper with: /usr/bin/clang -O3 -g -DZWRAP_USE_ZSTD=1 -I/usr/include/{mysql/,glib-2.0/} -I../zstd/lib/common/ -I/usr/lib64/glib-2.0/include -m64 mydumper.c server_detect.c gz*.c ./zstd_zlibwrapper.c ../zstd/lib/libzstd.a getPassword.c connection.c /usr/lib64/mysql/libmysqlclient.so -lpthread -lz -lm -lrt -lssl -lcrypto -ldl -lglib-2.0 -lgthread-2.0 -lpcre -lz -lstdc++ -lm -lrt -lssl -lcrypto -ldl -lglib-2.0 -lgthread-2.0 -lpcre -lstdc++ -o mydumper

Just substituting in 'loader' gets you myloader.

(Edited to add: slightly easier to make MOREFLAGS=-DZWRAP_USE_ZSTD=1 in zlibWrapper dir then link zstd/zlibWrapper/*.o. Then you don't need to add the header files or all the .c files to your mydumper build.)

@davidducos davidducos added this to the Release 0.10.7 milestone Mar 31, 2021
@twotwotwo
Copy link

(FYI, my comments about zlibwrapper requiring you to link in libzstd.a are outdated -- since facebook/zstd#2273 was fixed zlibwrapper uses only public libzstd symbols.)

@davidducos davidducos linked a pull request May 7, 2021 that will close this issue
2 tasks
@davidducos
Copy link
Member

As you can see in #336, I had implemented zstd compression. I'm still working on that, but it will be hopefully released in v0.11.3, but we have to decide what is the best path to complete the implementation of this feature. I'm open to hear any comments, ideas, suggestion, anything! meanwhile, this is the plan that I thought:

  • Merge Adding support for ZSTD #336 in v0.11.3 where user can compile their own version using WITH_ZSTD to enable zstd, as default behavior is OFF.
  • We are going to release the binary without zstd, as we are considering experimental.
  • When we consider that it is working properly, we are going to release both binaries supporting zlib and zstd.
  • In the future, on what I think is going to be v0.12.1, we are going to set zstd as default and stop releasing zlib

So, what we are going to do, is replace zlib with zstd, as I was not able to make both live together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants