This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

Yacht 361 #106

Merged
6 commits merged into master from YACHT-361 on Jan 4, 2019

Conversation

marek-tabor
Contributor

No description provided.

Orphaned Backup GC(garbage collector) service + handler.
Cron job with custom schedule invoked once a month.
… to not provide projectId to external users. Added endpoint configuration to app.yaml
@coveralls commented Dec 21, 2018

Pull Request Test Coverage Report for Build 987

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 83.349%

Totals (Coverage Status):
  • Change from base Build 983: 0.0%
  • Covered Lines: 2578
  • Relevant Lines: 3093

💛 - Coveralls

config/cron.yaml Outdated
@@ -14,3 +14,7 @@ cron:
   schedule: every 6 hours
   retry_parameters:
     job_retry_limit: 5
+- description: execute cleanup of orphaned backups from Big Query
+  url: /cleanup/orphaned/backup
+  schedule: 1 of month 23:00
Contributor

I would use UTC (it's open source)
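As a side note, App Engine's cron.yaml supports an explicit `timezone` field, and schedules are interpreted as UTC when it is omitted. A minimal sketch of making that explicit for the new job (based on the diff above):

```yaml
cron:
- description: execute cleanup of orphaned backups from Big Query
  url: /cleanup/orphaned/backup
  schedule: 1 of month 23:00
  timezone: UTC   # optional: App Engine defaults to UTC when no timezone is given
```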

@@ -346,6 +346,11 @@ def disable_partition_expiration(self, project_id, dataset_id, table_id):
             tableId=table_id,
             body=table_data).execute()

@staticmethod
Contributor

This query is not common; it is only about orphaned backups. It shouldn't be in the common big query class.

query = <<EOF
SELECT tableId, datasetId, projectId
FROM
[${var.gcp_census_project}:bigquery_view.table_metadata_deduplicated_aggregated]
Contributor

It's dangerous. Let's consider this situation:

  1. A datastore_export is made (it's a cron job, e.g. every 6 hours) and has data from 0:00.
  2. A backup for a new table is made (e.g. at 0:15).
  3. Census takes a snapshot which contains the newly created table (e.g. at 0:30), so census has data about the new table in the table_metadata_deduplicated_aggregated view.

If this query is executed, it will return the newly created table as an orphaned backup (and we will delete it). That's a bug: it produces a situation where we think we have a backup (the entry is in Datastore, which will be exported later), but the backup table itself is gone.

Contributor

Returning only tables created earlier than e.g. 1 day ago could probably solve this problem (assuming the datastore export is not older than 6h, and that the export works properly).
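The proposed safety margin can be sketched in Python (the names here are hypothetical, not from the PR): a backup table only becomes a GC candidate once it is older than the margin, so the 6-hourly Datastore export has had time to record it.

```python
from datetime import datetime, timedelta

# Hypothetical safety margin: one day, comfortably longer than the
# 6-hour Datastore export interval mentioned in the review.
SAFETY_MARGIN = timedelta(days=1)

def is_safe_to_collect(table_creation_time, now=None):
    """Treat a table as an orphaned-backup GC candidate only if it was
    created earlier than the safety margin; newer tables may simply not
    be visible in the latest Datastore export yet."""
    now = now or datetime.utcnow()
    return table_creation_time < now - SAFETY_MARGIN
```

The same cutoff could instead be pushed into the SQL query's WHERE clause; the point is only that the comparison must leave a window wider than the export cadence.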

Contributor Author

done

…n 7 days.

Move query from commons big_query. Cron job set up for once a week, 9:00 UTC.
@przemyslaw-jasinski przemyslaw-jasinski merged commit 0b06612 into master Jan 4, 2019
@przemyslaw-jasinski przemyslaw-jasinski deleted the YACHT-361 branch January 4, 2019 13:31

4 participants