Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New (tiny) feature - print summary stats even when --quiet is enabled #1158

Closed
wants to merge 1 commit into from

Conversation

qth
Copy link

@qth qth commented May 31, 2018

The title kind of says it all - I wanted zstd to print out the compression summary upon completion even when quiet mode is enabled.

I'm not great at coding, so if this pull request is bad please consider it a feature request instead and just send my code to /dev/null. I figured it'd be a more genuine way to ask :-)

@facebook-github-bot
Copy link

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@Cyan4973
Copy link
Contributor

Cyan4973 commented May 31, 2018

Thanks for the suggestion @qth .

The patch is mostly fine with regards to its stated objectives.
There are a couple of minor things we can detail should we decide to move it forward.

At this stage, I'm mostly interested by the user side.

You basically need an intermediate display mode, between 2 ("progression notification" + "end summary" + "overwrite prompt") and 1 (no output, except error), which would only feature "end summary".

Your suggestion is to add a new mode, triggered through a new long command --sum.

There are a few higher level questions associated to this suggested feature.

  1. How useful is it ? Who needs it (beside you) ?

Point is, adding any new functionality is a burden for all users not using it,
in multiple little ways, be it bigger documentation, more code to debug (for maintainers), etc.
So it better be worth "something".
Considering the code added is not that large, the bar is not high.
But still, it should solve "some pain" for at least some use cases.

  1. Presuming yes, is adding a command the better way to trigger it ?

For example, could it be served by the introduction of a new intermediate display level ?

  1. Presuming a new command is the better way, which command better reflects the intention, and blend well with existing commands ?

@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@qth
Copy link
Author

qth commented Jun 1, 2018

Hi @Cyan4973,

I'll try to answer your questions as best I can:

  1. I can't speak for everyone else, but my specific use case it that I've embedded zstd into a powershell script that performs pg_dumpall from a PostgreSQL database on Windows. The status update output doesn't come through well and ends up spamming the terminal because of the formatting. I thought it'd be good to just have status displayed upon completion for three reasons: [1] I'm lazy and don't want to learn powershell or why it can't display the progress updates cleanly, [2] having a summary makes it easy to tell how much data came through the pipe from pg_dumpall (size of DB that came from STDOUT to STDIN), [3] if I schedule this as an automated task, I only want to know the end result anyway. E.g., I could email the interested parties that, "Hey, today we backed up your 30GiB DB into a 7.5GiB file in about 5 minutes." The automated task doesn't need to capture the progress along the way.

So, I hope that folks would find it useful, but I'll let others judge if I'm just feature greedy and contributing to the carry cost of the software :-)

  1. I'm not sure what the best way to trigger the summary display would be, but (2.5) yes, it's essentially a display level in between the existing default and quiet modes.

  2. I'm more than happy to accept & follow advice regarding the best way to implement the feature, if it passes the value test (1) and can move ahead. Either way, thank you for considering the feature and articulating the situation clearly and cordially!

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jun 1, 2018

Regarding your use case :
what about asking the filesystem about the size of the produced compressed file ?
Is there a reason this wouldn't be convenient ?

@qth
Copy link
Author

qth commented Jun 1, 2018

Right, it's easy to get the size of the generated file / output when the process is complete, but it doesn't tell me the uncompressed size / size of the input stream. The input stream size is unknown when it comes from STDIN. So, I also just tried to do 'zstd -l' to see if that could satisfy my use case, but got an interesting result instead.

Here's the status line from the compression operation:

/*stdin*\            : 22.51%   (31724227672 => 7141369398 bytes, d:\pg_backup\20180531-psql.zst)

Here is what happened when I queried the file with 'zstd -l', and what happened when I decompressed:

PS D:\pg_backup> zstd -l .\20180531-psql.zst
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
Error: could not skip to end of block
An error occurred while getting file info
     0      0  6810.54 MB                        None  .\20180531-psql.zst
PS D:\pg_backup> zstd -v -d .\20180531-psql.zst
*** zstd command line interface 64-bits v1.3.4, by Yann Collet ***
.\20180531-psql.zst : 31724227672 bytes
PS D:\pg_backup> dir 20180531*


    Directory: D:\pg_backup


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         5/31/2018   6:18 PM 3172422767 20180531-psql
                                           2
-a---         5/31/2018   6:18 PM 7141369398 20180531-psql.zst

(I used the stock 1.3.4 build from https://github.com/facebook/zstd/releases/download/v1.3.4/zstd-v1.3.4-win64.zip for all operations)

I also can't use the PostgreSQL data directory as an indicator of uncompressed file size since it's smaller than the pg_dumpall output (data directory is only 25GiB vs. 29.5GiB)

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jun 1, 2018

Thanks, it seems your test triggered an error in the --list function,
likely related to the use of fseek() on some large file (> 4 GB).
That's certainly something we need to fix.

Anyway, your use case is slightly more complex than I expected,
since you need both uncompressed and compressed size,
and cannot probe the uncompressed size before compression.

Note that, even with a fix, -l won't give you the original uncompressed size.
Since src data is piped, the compressor cannot know its full size at the beginning,
hence cannot write it in the header.

There is another way to report the amount of data transferred into zstd : use dd :
dd < FILE | zstd > FILE.zst
dd will write its report to stderr by default. Example :

414033+0 records in
414033+0 records out
211984896 bytes transferred in 2.058185 secs (102996031 bytes/sec)

Not sure if this solution would be suitable for your use case.

If not, then indeed, I see no other way than adding a new command.

@qth
Copy link
Author

qth commented Jun 1, 2018

I wish I were doing all of this on a BSD or Linux installation, but alas, I'm stuck on Windows 2012 server for this particular project. I'll try getting a port of dd and seeing how it performs.

@rmg
Copy link

rmg commented Jul 12, 2018

This is what I was expecting to see after reading the problem description:

$ zstd --help
...
--[no-]progress : Enable progress reporting (default: enabled if TTY)
...

@Cyan4973
Copy link
Contributor

I will close this thread, since we will not merge this patch "as is".
However, I'll keep tracking the feature request, in a dedicated issue thread.

@Cyan4973 Cyan4973 mentioned this pull request Oct 12, 2018
@Cyan4973 Cyan4973 closed this Oct 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants