@stephen-soltesz stephen-soltesz commented Feb 13, 2020

This change adds an ad hoc script, adhoc/normalize_gcs_archive.sh, that will be run twice:

  1. to rsync all legacy, parsed datatypes to their canonical <exp>/<datatype>/YYYY/... names.
  2. to repeat the same rsync, then remove the legacy names.
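The two runs described above could be driven by one script with a mode argument. This is a minimal sketch, assuming gsutil is installed and authenticated; the run_pass function, the sync/remove modes, and the "src dst" pair format are hypothetical and are not taken from the PR's adhoc/normalize_gcs_archive.sh.

```bash
#!/bin/bash
# Hypothetical two-pass driver: "sync" copies legacy prefixes to canonical
# prefixes; "remove" repeats the copy and then deletes the legacy prefixes.
set -euo pipefail

# run_pass MODE PAIR... where each PAIR is "legacy_prefix canonical_prefix".
run_pass() {
  local mode=$1
  shift
  local pair src dst
  # Copy every pair first. Under set -e a failed rsync exits the script here,
  # before any removal (unless run_pass is called from an if, &&, or || test).
  for pair in "$@"; do
    read -r src dst <<< "${pair}"
    gsutil -m rsync -r "${src}" "${dst}"
  done
  if [[ "${mode}" == "remove" ]]; then
    for pair in "$@"; do
      read -r src _ <<< "${pair}"
      gsutil -m rm -r "${src}"
    done
  fi
}
```

For example, the first run might be `run_pass sync "gs://${archive}/sidestream/2019/ gs://${archive}/host/sidestream/2019/"` and the second the same call with `remove`. Copying all pairs before removing any keeps a partial failure from deleting data that was never copied.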

These steps can optionally be automated using Cloud Build; let me know if you'd prefer that.

FYI: @pboothe @yachang @critzo

Part of https://github.com/m-lab/dev-tracker/issues/548

coveralls commented Feb 13, 2020

Pull Request Test Coverage Report for Build 88

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals
Change from base Build 77: 0.0%
Covered Lines: 332
Relevant Lines: 332

💛 - Coveralls

@gfr10598 gfr10598 left a comment

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)


adhoc/normalize_gcs_archive.sh, line 15 at r1 (raw file):

# * rsync again to confirm all data is copied (faster)
# * remove data from old locations (slower)

Use set -e, so that the delete section does not run if the sync section fails.
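A small, self-contained illustration of why set -e guards the delete section, plus one caveat worth knowing: errexit is ignored inside a function that is called as an if, &&, or || condition. The sync_step, remove_step, and sync_all names are hypothetical stand-ins, and each demo runs in its own bash process so that its exit does not stop the caller.

```bash
#!/bin/bash
# Demo 1: with set -e, a failed sync step stops the script before removal.
demo='
sync_step()   { echo "sync"; return 1; }  # simulate a failed rsync
remove_step() { echo "remove"; }
set -e
sync_step
remove_step
'
status=0
output=$(bash -c "${demo}") || status=$?
echo "demo 1 output: ${output} (exit ${status})"

# Demo 2: errexit is ignored inside a function used as an if condition, so the
# failing "false" does not stop sync_all, and the remove branch still runs.
caveat='
set -e
sync_all() { false; echo "kept going"; }
if sync_all; then echo "remove"; fi
'
caveat_output=$(bash -c "${caveat}")
echo "demo 2 output: ${caveat_output}"
```

In practice this means calling the sync section as a plain command rather than as an if condition, or checking each command's status explicitly inside the function.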

@stephen-soltesz stephen-soltesz left a comment

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)


adhoc/normalize_gcs_archive.sh, line 15 at r1 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Use set -e, so that the delete section does not run if the sync section fails.

Done.

@gfr10598 gfr10598 left a comment

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)


adhoc/normalize_gcs_archive.sh, line 38 at r1 (raw file):

# Switch
for year in 2016 2017 2018 2019 ; do

Just checking: doesn't this need 2020?

@gfr10598 gfr10598 left a comment

To be super sure, should we do a gsutil du -s ... and compare the results before moving to the REMOVE section?
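A minimal sketch of the du -s comparison suggested above, assuming gsutil du -s prints the total byte count as the first whitespace-separated field of its output; bytes_under and compare_du are hypothetical names. Matching totals are necessary but not sufficient: a missing object could be masked by an unrelated extra object already present in the destination.

```bash
#!/bin/bash
# bytes_under PREFIX: print the total size in bytes of all objects under PREFIX.
bytes_under() {
  local out
  out=$(gsutil du -s "$1") || return 1
  # Output looks like "<bytes>  <url>"; keep only the byte count.
  awk '{print $1}' <<< "${out}"
}

# compare_du SRC DST: succeed only if both prefixes report the same total size.
compare_du() {
  local src_bytes dst_bytes
  src_bytes=$(bytes_under "$1") || return 1
  dst_bytes=$(bytes_under "$2") || return 1
  if [[ "${src_bytes}" != "${dst_bytes}" ]]; then
    echo "compare_du: size mismatch: $1=${src_bytes} $2=${dst_bytes}" >&2
    return 1
  fi
  return 0
}
```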

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598 and @stephen-soltesz)

@stephen-soltesz stephen-soltesz left a comment

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)


adhoc/normalize_gcs_archive.sh, line 38 at r1 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Just checking: doesn't this need 2020?

Correct. We migrated 100% of nodes to the new platform in Nov 2019. In 2020 there are no legacy files for any datatype.

@gfr10598 gfr10598 left a comment

LGTM after one more suggestion, at your discretion.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)

@stephen-soltesz stephen-soltesz left a comment

Rather than du, how about listing all files in src and then verifying, after the rsync, that 100% of them are found in dst? If so, proceed to the remove section.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)

@stephen-soltesz stephen-soltesz left a comment

I've implemented a simple function, safe_rsync, that:

  • lists files in src
  • runs rsync
  • lists files in dst
  • verifies that src files are found in dst.

That may take a little while, but it should be safe.
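The four steps listed above can be sketched as follows. This is a minimal sketch, assuming gsutil is installed and authenticated and that the ** wildcard matches objects at any depth below a prefix; list_objects is a hypothetical helper, and this safe_rsync is an illustration rather than the version merged in this PR. It compares object names only, not contents.

```bash
#!/bin/bash
# list_objects PREFIX: print every object under PREFIX relative to PREFIX,
# sorted in byte order so that comm can compare two listings.
list_objects() {
  local prefix=$1
  local listing
  # "**" makes gsutil ls match objects at any depth below PREFIX.
  listing=$(gsutil ls "${prefix}**") || return 1
  printf '%s\n' "${listing}" | sed "s|^${prefix}||" | LC_ALL=C sort
}

# safe_rsync SRC DST: list SRC, rsync SRC to DST, list DST, and fail if any
# object name from SRC is missing in DST.
safe_rsync() {
  local src=$1 dst=$2
  local src_objs dst_objs missing
  src_objs=$(list_objects "${src}") || return 1
  gsutil -m rsync -r "${src}" "${dst}" || return 1
  dst_objs=$(list_objects "${dst}") || return 1
  # comm -23 prints names present only in the source listing.
  missing=$(LC_ALL=C comm -23 <(printf '%s\n' "${src_objs}") \
                              <(printf '%s\n' "${dst_objs}"))
  if [[ -n "${missing}" ]]; then
    echo "safe_rsync: objects missing from ${dst}:" >&2
    printf '%s\n' "${missing}" >&2
    return 1
  fi
  return 0
}
```

It would be called the same way as in the sidestream loop, e.g. `safe_rsync "gs://${archive}/sidestream/${year}/" "gs://${archive}/host/sidestream/${year}/"`. Because gsutil ls fails when a wildcard matches no objects, this sketch also returns non-zero for a year with no legacy files.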

PTAL?

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @gfr10598)

@gfr10598 gfr10598 left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @gfr10598)


adhoc/normalize_gcs_archive.sh, line 43 at r2 (raw file):

}

function safe_rsync() {

Very nice.

@stephen-soltesz stephen-soltesz left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @gfr10598)


adhoc/normalize_gcs_archive.sh, line 70 at r3 (raw file):

# Sidestream web100
for year in 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 ; do
  safe_rsync gs://${archive}/sidestream/${year}/ gs://${archive}/host/sidestream/${year}/

The last change, made after feedback from @critzo and @yachang, was to rename the experiment prefix from vserver to host for sidestream and paris-traceroute.
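The sidestream loop above maps a legacy prefix, gs://<archive>/<datatype>/<year>/, to a canonical <exp>/<datatype>/YYYY/ prefix with exp set to host. A minimal sketch of that mapping; the legacy_prefix, canonical_prefix, and print_sidestream_pairs helpers and the example bucket name are hypothetical, and the legacy layout is assumed only for sidestream, as shown in the loop.

```bash
#!/bin/bash
# legacy_prefix ARCHIVE DATATYPE YEAR: legacy layout used in the sidestream loop.
legacy_prefix() {
  printf 'gs://%s/%s/%s/\n' "$1" "$2" "$3"
}

# canonical_prefix ARCHIVE EXP DATATYPE YEAR: the <exp>/<datatype>/YYYY/ layout.
canonical_prefix() {
  printf 'gs://%s/%s/%s/%s/\n' "$1" "$2" "$3" "$4"
}

# Print "legacy canonical" pairs for sidestream, 2009 through 2019, with exp=host.
print_sidestream_pairs() {
  local archive=$1 year
  for year in $(seq 2009 2019); do
    echo "$(legacy_prefix "${archive}" sidestream "${year}") $(canonical_prefix "${archive}" host sidestream "${year}")"
  done
}
```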

@stephen-soltesz stephen-soltesz merged commit 889e78c into master Feb 14, 2020
@stephen-soltesz stephen-soltesz deleted the sandbox-soltesz-adhoc branch August 12, 2022 16:44