Reduce the size of files on disk #61
Conversation
Prior to this commit, the system metrics were pretty-printed, which consumed additional disk space. `json2timeseries` expects each record arriving on STDIN to be a single line. This commit changes the system metrics to log minified JSON.
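As a sketch of the change (the record below is hypothetical, and `python3` stands in for whatever actually serializes the metrics; the collector code is not shown in this PR):

```shell
# Hypothetical pretty-printed record: spans multiple lines on disk.
pretty='{
  "cpu": 0.5,
  "mem": 1024
}'

# Minified form: one record per line, as json2timeseries expects on STDIN,
# and with the indentation whitespace no longer taking up disk space.
compact=$(printf '%s' "$pretty" | python3 -c 'import json, sys; print(json.dumps(json.load(sys.stdin), separators=(",", ":")))')
echo "$compact"
```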
Prior to this commit, the system metrics tidy was not functional: it would error out due to restrictions in the `metrics_tidy` script. This commit adds the system directories to the tidy script.
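The kind of restriction described might look like an allow-list check in `metrics_tidy` (a sketch only; the directory names and variable names here are hypothetical, and the script's actual validation logic is not shown in this excerpt):

```shell
# Hypothetical allow-list: before this change, the system metric
# directories were missing from the list, so tidying them errored out.
valid_dirs="puppetserver puppetdb system_cpu system_memory system_processes"
metrics_type="system_cpu"

case " $valid_dirs " in
  *" $metrics_type "*) echo "tidying $metrics_type" ;;
  *) echo "unknown metrics type: $metrics_type" >&2; exit 1 ;;
esac
```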
Prior to this commit, the JSON files were backed up but only cleaned up every 90 days. This resulted in the same files being duplicated in each tarball for 90 days. This commit cleans up the JSON files after archiving so that each tarball contains only a single day's worth of data.
Force-pushed from 7876575 to 859beb0.
files/metrics_tidy (outdated diff):

```diff
@@ -55,3 +55,6 @@ find "$metrics_directory" -type f -ctime +"$retention_days" -delete
 # The return code of a pipeline is the rightmost command, which means we trigger our trap if tar fails
 find "$metrics_directory" -type f -name "*json" | \
   tar --create --gzip --file "${metrics_directory}/${metrics_type}-$(date +%Y.%m.%d.%H.%M.%S).tar.gz" --files-from -
+
+# Cleanup the backed up json files so that we do not duplicate files in the tarballs.
+find "$metrics_directory" -type f -name "*json" -delete
```
I'm still not convinced running a separate `find` is the best way to do this, since it can result in files being deleted before they are archived, depending on how the cron jobs line up. The old way of running `find`, saving the results to a file, and then having `tar` and `rm` both operate on the list in that file seemed to work well. Is there a reason we can't return to using that?
I agree, my concerns about `xargs` are probably unfounded in this case. Pushed a commit to this branch.
Revert to the behavior of storing the output of `find` in a temp file. This guarantees `tar` and `rm` operate on the same set of files to prevent data loss.
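The restored behavior could look roughly like this (a sketch only, reusing the variable names from the diff above; the actual commit is not shown on this page):

```shell
# Build the file list once, so tar and rm operate on exactly the same set
# of files even if new metrics land while the archive is being written.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT

find "$metrics_directory" -type f -name "*json" > "$tmpfile"
tar --create --gzip \
  --file "${metrics_directory}/${metrics_type}-$(date +%Y.%m.%d.%H.%M.%S).tar.gz" \
  --files-from "$tmpfile"
xargs rm -f < "$tmpfile"
```

Because both `tar` and `rm` read the same snapshot in `$tmpfile`, a file created between the two commands is archived in the next run instead of being deleted unarchived.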
Looks good to me 👍
This PR reduces the size of files on disk through the three changes described above.