Conversation
Orphaned Backup GC (garbage collector) service + handler. Cron job with a custom schedule, invoked once a month.
… to not provide projectId to external users. Added endpoint configuration to app.yaml
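Not shown in this conversation, but roughly how the handler side could be wired up on App Engine standard. This is only a minimal sketch assuming webapp2; the handler and service names (OrphanedBackupCleanupHandler, OrphanedBackupGC) are placeholders, not the actual code in the PR:

    import webapp2


    class OrphanedBackupCleanupHandler(webapp2.RequestHandler):
        """Hypothetical handler behind the /cleanup/orphaned/backup cron URL."""

        def get(self):
            # App Engine cron requests arrive as GET with this header set.
            if self.request.headers.get('X-Appengine-Cron') != 'true':
                self.abort(403)
            # Delegate the actual cleanup to the GC service (assumed name):
            # OrphanedBackupGC().cleanup()
            self.response.write('ok')


    app = webapp2.WSGIApplication([
        ('/cleanup/orphaned/backup', OrphanedBackupCleanupHandler),
    ])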
config/cron.yaml
Outdated
@@ -14,3 +14,7 @@ cron:
  schedule: every 6 hours
  retry_parameters:
    job_retry_limit: 5
- description: execute cleanup of orphaned backups from Big Query
  url: /cleanup/orphaned/backup
  schedule: 1 of month 23:00
I would use UTC (it's open source)
src/commons/big_query/big_query.py
Outdated
@@ -346,6 +346,11 @@ def disable_partition_expiration(self, project_id, dataset_id, table_id):
            tableId=table_id,
            body=table_data).execute()

    @staticmethod
This query is not common, it is only about orphaned backups. It shouldn't be in the common BigQuery class.
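One way to keep it out of the shared wrapper is to put the query next to the GC service and have it only call a generic "run query" entry point on the BigQuery class. A minimal sketch; the module location, class name and execute_query method are assumptions, not the actual API:

    # e.g. a dedicated module for the orphaned-backup GC (hypothetical path)


    class OrphanedBackupsQuery(object):
        # Legacy-SQL query that only this feature needs, so it does not
        # belong in the generic BigQuery wrapper.
        QUERY_TEMPLATE = (
            "SELECT tableId, datasetId, projectId "
            "FROM [{census_project}:bigquery_view."
            "table_metadata_deduplicated_aggregated]"
        )

        def __init__(self, big_query, census_project):
            self.big_query = big_query
            self.census_project = census_project

        def list_backup_candidates(self):
            query = self.QUERY_TEMPLATE.format(census_project=self.census_project)
            # Assumes the shared wrapper exposes a plain query-execution method.
            return self.big_query.execute_query(query)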
query = <<EOF
SELECT tableId, datasetId, projectId
FROM
  [${var.gcp_census_project}:bigquery_view.table_metadata_deduplicated_aggregated]
It's dangerous.
Let's consider this situation:
- a datastore_export is made (it's a cron, e.g. every 6 hours) and has data from 0:00
- a backup for a new table is made (e.g. at 0:15)
- census takes a snapshot which contains the newly created table (e.g. at 0:30), so census has data about the newly created table in the table_metadata_deduplicated_aggregated view
If this query is executed now, it returns the newly created table as an orphaned backup (and we will end up deleting it).
It's a bug: it produces a situation where we think we have a backup (it is in Datastore, which will be exported later), but the backup table no longer exists.
Returning only tables created earlier than e.g. 1 day ago could probably solve this problem (assuming that the datastore export is never older than 6h and that the export works properly).
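A minimal sketch of what such a filter could look like in the query, assuming the census view exposes a creationTime TIMESTAMP column and using the 1-day margin suggested above (the column name and threshold are assumptions, not the final code):

    # Skip tables newer than 1 day, so a backup created after the last
    # datastore export cannot be misclassified as orphaned and deleted.
    QUERY_TEMPLATE = (
        "SELECT tableId, datasetId, projectId "
        "FROM [{census_project}:bigquery_view."
        "table_metadata_deduplicated_aggregated] "
        "WHERE creationTime < DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')"
    )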
done
…n 7 days. Move query out of commons big_query. Cron job set up to run once a week at 9:00 in the UTC timezone.