Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gzip in tidy instead of bzip2 for compatibility #45

Merged
merged 1 commit into from
Oct 18, 2017
Merged

Use gzip in tidy instead of bzip2 for compatibility #45

merged 1 commit into from
Oct 18, 2017

Conversation

danielparks
Copy link
Contributor

bzip2 isn't installed by default on CentOS 7. (!)

This also removes the error suppresion on tar, which improves debuggability.

Also, "!" is a shell special character. This switches to "-not", though it probably doesn't matter since it's non interactive.

@npwalker
Copy link
Owner

Adam made the script to start with and I think he used bzip due to increased compression over gzip. But hopefully he'll chime in.

@abottchen
Copy link

Thanks for pointing this out. I did not know there were any cases where bzip2 was not available.

Nick is correct, I used bzip2 due to its improved compression. It looks as though only the CentOS 7 minimal install does not include it. I do not know how common it is for a customer to both use the minimal installation option and never install bzip2 on their own.

So the question becomes, which is more important. A 5% improvement in compression ratios on all systems, or covering for the (potentially) rare case where bzip2 is not installed.

I think, ideally, we would put a conditional in there checking for bzip2. Use it if present and fall back to gzip if not. With that method, we could even use xz and gain another 3% compression.

@danielparks
Copy link
Contributor Author

We have a pretty standard CentOS installation, so I think the case where bzip2 isn't installed must be fairly common.

I have a hard time seeing the extra complexity being worth a 5 percentage point improvement in compression, even ignoring the CPU hit (generally 5× or more). Especially since the compressed output is less than 1 MB per day on our system.

@danielparks
Copy link
Contributor Author

Also, as a sysadmin, I very rarely use bzip2 any more. The ops team doesn't install it on any of our servers, and I stopped installing it on my personal servers a long time ago.

My network connections are fast enough and my disks big enough that it's just noise. gzip is better supported and it's always there.

@npwalker
Copy link
Owner

@abottchen any further thoughts?

@abottchen
Copy link

Even if a customer has 1 GB of data, we are only talking about a savings of 50MB with bzip2. There isn't enough gain to justify the risk regardless of the percentage of affected systems. I agree with Daniel, let's fall back to gzip.

xargs -a "$DIR.tmp" tar -jcf "$DIR/$METRICS_TYPE-$(date +%Y.%m.%d.%H.%M.%S).tar.bz2" 2>/dev/null;
xargs -a "$DIR.tmp" rm ;
rm "$DIR.tmp";
find . -type f -not -name "*.gz" > "$DIR.tmp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this, but I suspect this line will need something along the lines of -o -name "*.bz2". Otherwise customers that have this already in place with bzip2 will end up compressing their existing bz2 files into their new gzip files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I'm on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Turns out it has to be -a -not

@npwalker
Copy link
Owner

@jarretlavallee @abottchen I'm about to merge this. Can someone update the metrics viewer to deal with unzipping the gzips instead of bzips?

xargs -a "$DIR.tmp" tar -jcf "$DIR/$METRICS_TYPE-$(date +%Y.%m.%d.%H.%M.%S).tar.bz2" 2>/dev/null;
xargs -a "$DIR.tmp" rm ;
rm "$DIR.tmp";
find manifests -type f -not -name "*.gz" -a -not -name "*.bz2" > "$DIR.tmp"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did this find get changed from . to manifests? @danielparks

In my testing it still needs to be .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's bizarre. There's nothing called manifests. Must have changed it accidentally somehow? I did test this…

@abottchen
Copy link

@npwalker puppetlabs/puppet-metrics-viewer#19 should take care of it

abottchen added a commit to abottchen/puppet-metrics-viewer that referenced this pull request Oct 18, 2017
abottchen added a commit to abottchen/puppet-metrics-viewer that referenced this pull request Oct 18, 2017
bzip2 isn't installed by default on CentOS 7. (!)

This also removes the error suppresion on tar, which improves
debuggability.

Finally, "!" is a shell special character. This switches to "-not",
though it's academic here since it's non interactive.
@danielparks
Copy link
Contributor Author

Fixed the weird manifests thing. It was correct in the branch I'm using in production. I think I must have been testing locally in whatever directory I was already in.

@npwalker
Copy link
Owner

No worries, that's why we test things :)

@npwalker npwalker merged commit 71475b9 into npwalker:master Oct 18, 2017
@danielparks danielparks deleted the tar-fix branch October 19, 2017 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants