Reduce the size of files on disk #61

Merged
4 commits merged into puppetlabs:master on Jul 15, 2020

Conversation

jarretlavallee (Contributor) commented:

This PR reduces the file size on disk by doing the following:

  • Fixes a bug where the tarballs would contain 90 days' worth of files instead of 1
  • Ensures the system metrics can be tidied up
  • Stops pretty-printing the system metrics

Jarret Lavallee added 3 commits on June 29, 2020 at 16:33:
Prior to this commit, the system metrics were pretty-printed, which
consumed additional disk space. json2timeseries expects each incoming
JSON document on STDIN to be a single line. This commit changes the
system metrics to log minified JSON.
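For illustration, here is a minimal sketch of the minification this commit aims for. The `jq` invocation is a stand-in for demonstration; the collector's own scripts may produce compact JSON differently:

```bash
# Pretty-printed JSON spans multiple lines, wasting disk space and breaking
# json2timeseries, which reads one JSON document per line from STDIN.
# Compacting each document to a single line (-c) addresses both:
echo '{ "servers": { "active": 3 } }' | jq -c '.'
# => {"servers":{"active":3}}
```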
Prior to this commit, the system metrics tidy was not functional. It
would error out due to restrictions in the `metrics_tidy` script. This
commit adds the system directories to the tidy script.
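A rough sketch of what running the tidy pass over the system directories could look like, following the retention pattern visible in the diff below. The directory names and base path here are hypothetical, not taken from the actual script:

```bash
#!/bin/bash
# Hypothetical sketch: apply the same retention cleanup to system metric
# directories as to the service directories. Names are illustrative only.
retention_days=1
base_directory="/opt/puppetlabs/puppet-metrics-collector"

for metrics_type in puppetserver puppetdb system_cpu system_memory; do
  metrics_directory="${base_directory}/${metrics_type}"
  [ -d "$metrics_directory" ] || continue

  # Delete files older than the retention window before archiving.
  find "$metrics_directory" -type f -ctime +"$retention_days" -delete
done
```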
Prior to this commit, the JSON files were being backed up but only
cleaned up every 90 days. This resulted in the same files being
duplicated in each tarball for 90 days. This commit cleans up the JSON
files to ensure that the tarballs only contain a single day's worth of
data.
m0dular previously approved these changes Jun 30, 2020
@@ -55,3 +55,6 @@ find "$metrics_directory" -type f -ctime +"$retention_days" -delete
# The return code of a pipeline is the rightmost command, which means we trigger our trap if tar fails
find "$metrics_directory" -type f -name "*json" | \
tar --create --gzip --file "${metrics_directory}/${metrics_type}-$(date +%Y.%m.%d.%H.%M.%S).tar.gz" --files-from -

# Cleanup the backed up json files so that we do not duplicate files in the tarballs.
find "$metrics_directory" -type f -name "*json" -delete
@Sharpie (Member) commented on Jul 15, 2020:
I'm still not convinced running a separate find is the best way to do this since it can result in files being deleted before they are archived depending on how the cron jobs line up.

The old way of running find, saving the results to a file and then having tar and rm both operate on the list in that file seemed to work well. Is there a reason we can't return to using that?

jarretlavallee (Contributor) replied:

I agree, my concerns about xargs are probably unfounded in this case. Pushed a commit to this branch.

Revert to the behavior of storing the output of `find` in a temp file.
This guarantees `tar` and `rm` operate on the same set of files to
prevent data loss.
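A sketch of the temp-file pattern this commit restores (the variable names follow the diff above; the temp-file handling itself is illustrative):

```bash
# Write the file list once so that tar and rm operate on the identical set
# of files; a JSON file created between the two commands can no longer be
# deleted before it has been archived.
file_list=$(mktemp)
trap 'rm -f "$file_list"' EXIT

find "$metrics_directory" -type f -name '*json' > "$file_list"

tar --create --gzip \
    --file "${metrics_directory}/${metrics_type}-$(date +%Y.%m.%d.%H.%M.%S).tar.gz" \
    --files-from "$file_list"

# Delete exactly the files that were archived.
xargs rm -f < "$file_list"
```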
@Sharpie (Member) left a comment:
Looks good to me 👍

Sharpie merged commit 2386c44 into puppetlabs:master on Jul 15, 2020