Add script with steps to normalize GCS archives #16
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)
adhoc/normalize_gcs_archive.sh, line 15 at r1 (raw file):
# * rsync again to confirm all data is copied (faster)
# * remove data from old locations (slower)
set -e, to avoid delete section if sync section fails.
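The effect of the suggested set -e can be sketched in miniature. The two-step script below is a hypothetical stand-in, not the contents of normalize_gcs_archive.sh: `false` plays the role of a failing sync, and the echo plays the role of the remove section.

```shell
# Hypothetical miniature of a sync-then-remove script.
script='
set -e
false                      # stand-in for a failing rsync
echo "REMOVE section ran"  # must not be reached
'

# Run in a child bash so the failure does not terminate the calling shell.
output=$(bash -c "${script}") && status=0 || status=$?
echo "exit status: ${status}, output: '${output}'"
```

With set -e in place the child shell exits at the first failing command, so the remove step never runs and the exit status is nonzero.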
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)
adhoc/normalize_gcs_archive.sh, line 15 at r1 (raw file):
Previously, gfr10598 (Gregory Russell) wrote…
set -e, to avoid delete section if sync section fails.
Done.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)
adhoc/normalize_gcs_archive.sh, line 38 at r1 (raw file):
# Switch
for year in 2016 2017 2018 2019 ; do
Just checking - doesn't need 2020?
To be super sure, should we do a gsutil du -s ... and compare the results before moving to the REMOVE section?
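A size comparison along these lines might look like the sketch below. The helper names total_bytes and same_size are hypothetical and do not come from the PR; the sketch assumes gsutil is installed and authenticated, and relies on gsutil du -s printing the total byte count as the first field of its output.

```shell
# total_bytes prints the total size, in bytes, that gsutil reports for a prefix.
total_bytes() {
  gsutil du -s "$1" | awk '{print $1}'
}

# same_size succeeds only when both prefixes report the same non-empty total.
same_size() {
  local src_bytes dst_bytes
  src_bytes=$(total_bytes "$1")
  dst_bytes=$(total_bytes "$2")
  [[ -n "${src_bytes}" && "${src_bytes}" == "${dst_bytes}" ]]
}
```

Equal totals are only a coarse signal: they do not prove that the same objects exist on both sides, and a destination that already holds other data will legitimately report a larger size.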
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)
adhoc/normalize_gcs_archive.sh, line 38 at r1 (raw file):
Previously, gfr10598 (Gregory Russell) wrote…
Just checking - doesn't need 2020?
Correct. We migrated 100% of nodes to the new platform in Nov 2019. In 2020 there are no legacy files for any datatype.
after one more suggestion, at your discretion.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)
Rather than du, how about listing all files in the src, then verifying that 100% of them are found in the dst after the rsync? If that's true, then proceed to the remove section.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)
I've implemented a simple function to run a "safe_rsync" that:
- lists files in src
- runs rsync
- lists files in dst
- verifies that src files are found in dst.
That may take a little while but it should be safe.
PTAL?
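The four steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the function from the PR: the helper list_relative is hypothetical, both arguments are assumed to be gs:// prefixes ending in a slash, and gsutil is assumed to be installed and authenticated.

```shell
# list_relative prints every object under a prefix, relative to that prefix, sorted.
list_relative() {
  local prefix="$1"
  gsutil ls "${prefix}**" \
    | awk -v p="${prefix}" 'index($0, p) == 1 { print substr($0, length(p) + 1) }' \
    | LC_ALL=C sort
}

# safe_rsync copies src to dst, then fails unless every src object exists in dst.
safe_rsync() {
  local src="$1" dst="$2"
  local src_list dst_list missing status=0
  src_list=$(mktemp)
  dst_list=$(mktemp)

  list_relative "${src}" > "${src_list}"
  if gsutil -m rsync -r "${src}" "${dst}"; then
    list_relative "${dst}" > "${dst_list}"
    # comm -23 keeps lines found only in the first file: src objects absent from dst.
    missing=$(LC_ALL=C comm -23 "${src_list}" "${dst_list}")
    if [[ -n "${missing}" ]]; then
      echo "ERROR: objects missing from ${dst}:" >&2
      echo "${missing}" >&2
      status=1
    fi
  else
    status=1
  fi

  rm -f "${src_list}" "${dst_list}"
  return "${status}"
}
```

Combined with set -e, a nonzero return from a call such as safe_rsync gs://${archive}/sidestream/${year}/ gs://${archive}/host/sidestream/${year}/ would stop the script before any remove step. Note that an empty source listing passes the check trivially.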
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @gfr10598)
adhoc/normalize_gcs_archive.sh, line 43 at r2 (raw file):
}
function safe_rsync() {
Very nice.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @gfr10598)
adhoc/normalize_gcs_archive.sh, line 70 at r3 (raw file):
# Sidestream web100
for year in 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 ; do
  safe_rsync gs://${archive}/sidestream/${year}/ gs://${archive}/host/sidestream/${year}/
The last change after feedback from @critzo & @yachang was to change vserver -> host for sidestream & paris-traceroute.
This change adds an adhoc script that will be run twice.
<exp>/<datatype>/YYYY/...
names. These steps can optionally be automated using Cloud Build; let me know if you'd prefer that.
FYI: @pboothe @yachang @critzo
Part of https://github.com/m-lab/dev-tracker/issues/548