Attempt to provide a way for the purge_ttl script to
complete.
* Adds arguments (ENV VARS):
--instance_id (INSTANCE_ID) Spanner instance id
--database_id (DATABASE_ID) Spanner database id
--sync_database_url (SYNC_DATABASE_URL) Spanner DSN
`spanner://instance/database`
--collection_ids (COLLECTION_IDS)
JSON formatted list of collections to limit deletions
e.g. `--collection_ids=123` limits to just collection 123
`--collection_ids=[123,456]` limits to both 123 & 456
default is all collections
Issue #631
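A minimal sketch of how the arguments listed above could be wired up, with each flag falling back to its environment variable. This is not the code from `purge_ttl.py`: the helper names `parse_collection_ids` and `get_args`, and the choice of `None` as the fallback for the id flags, are assumptions made for illustration.

```python
import argparse
import json
import os


def parse_collection_ids(raw):
    """Turn a JSON scalar or JSON list into a list of ints.

    An empty or missing value means "all collections" and yields [].
    """
    if not raw:
        return []
    value = json.loads(raw)
    if not isinstance(value, list):
        # `--collection_ids=123` is a bare JSON number, so wrap it.
        value = [value]
    return [int(v) for v in value]


def get_args(argv=None, env=None):
    """Parse CLI flags, using environment variables as defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Purge expired rows")
    parser.add_argument("--instance_id", default=env.get("INSTANCE_ID"))
    parser.add_argument("--database_id", default=env.get("DATABASE_ID"))
    parser.add_argument("--sync_database_url",
                        default=env.get("SYNC_DATABASE_URL"))
    parser.add_argument("--collection_ids",
                        default=env.get("COLLECTION_IDS", ""))
    args = parser.parse_args(argv)
    args.collection_ids = parse_collection_ids(args.collection_ids)
    return args
```

On a shell, the list form needs quoting (for example `--collection_ids="[123,456]"`) so the brackets reach the script unchanged.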
TBC, the syntax for the env var and providing
Yep. It uses the same logic. Let me know if that's a problem and I'll come up with something different. I tried it out locally on bash and it works, provided the value is quoted like you're doing.
tools/spanner/purge_ttl.py
Outdated
if args.sync_database_url and not (
        args.instance_id and args.database_id):
    (args.instance_id,
     args.collection_id) = from_env(args.sync_database_url)
- args.collection_id) = from_env(args.sync_database_url)
+ args.database_id) = from_env(args.sync_database_url)
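For context, here is a rough sketch of what splitting the DSN into an `(instance_id, database_id)` pair might look like, assuming the `spanner://instance/database` form given in the PR description. The function name `parse_spanner_dsn` is hypothetical; the script's own `from_env` helper is not shown in this thread and may behave differently.

```python
from urllib.parse import urlparse


def parse_spanner_dsn(url):
    """Split `spanner://instance/database` into (instance_id, database_id)."""
    parsed = urlparse(url)
    if parsed.scheme != "spanner":
        raise ValueError("expected a spanner:// URL, got {!r}".format(url))
    instance_id = parsed.netloc
    database_id = parsed.path.lstrip("/")
    # Reject anything that is not exactly two path components.
    if not instance_id or not database_id or "/" in database_id:
        raise ValueError("malformed Spanner DSN: {!r}".format(url))
    return instance_id, database_id
```

Unpacking the result into `args.instance_id` and `args.database_id` is what the suggested change restores; assigning the second element to `args.collection_id` would leave the database id unset.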
tools/spanner/purge_ttl.py
Outdated
    collections = [collections]
args.collection_ids = collections
if args.sync_database_url and not (
        args.instance_id and args.database_id):
INSTANCE/DATABASE_ID have a default so they'll pretty much always override a SYNC_DATABASE_URL here. I think we can kill these defaults (or maybe these args entirely unless @erkolson or anyone else uses them, I don't), the cron job uses SYNC_DATABASE_URL.
I have always used the SYNC_DATABASE_URL and was unaware of the other options.
Heh, I use the others because I can never remember what the spanner DSN is. I've reworked things a bit to be smarter and more clear what's going on. Thanks!
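The override problem raised in this thread can be reproduced in a few lines. The sketch below is an illustration only: the `build_parser` and `resolve` helpers and the placeholder default values are assumptions, not the script's real defaults. It shows that non-empty defaults make the `not (instance_id and database_id)` check always false, so the DSN is never consulted.

```python
import argparse


def build_parser(with_defaults):
    """Build a parser whose id flags either have placeholder defaults or None."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--instance_id",
        default="placeholder-instance" if with_defaults else None)
    parser.add_argument(
        "--database_id",
        default="placeholder-database" if with_defaults else None)
    parser.add_argument("--sync_database_url", default=None)
    return parser


def resolve(args):
    """Use explicit ids when both are set, else fall back to the DSN."""
    prefix = "spanner://"
    url = args.sync_database_url
    if url and not (args.instance_id and args.database_id):
        instance, _, database = url[len(prefix):].partition("/")
        return instance, database
    return args.instance_id, args.database_id


argv = ["--sync_database_url=spanner://real-instance/real-db"]
# With defaults, the DSN is silently ignored.
ignored = resolve(build_parser(with_defaults=True).parse_args(argv))
# Without defaults, the DSN is used as intended.
honored = resolve(build_parser(with_defaults=False).parse_args(argv))
```

Dropping the defaults keeps both styles of invocation working: explicit ids still win when both are given, and the DSN applies otherwise.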
tools/spanner/purge_ttl.py
Outdated
logging.info("{}: removed {} rows, batches_duration: {}".format(
    name, result, end - start))
- logging.info("{}: removed {} rows, batches_duration: {}".format(
-     name, result, end - start))
+ logging.info("{}: removed {} rows, {}_duration: {}".format(
+     name, result, name, end - start))
    query += " = {:d}".format(args.collection_ids[0])
else:
    query += " in ({})".format(
        ', '.join(map(str, args.collection_ids)))
I always prefer bind params :) but I guess this'll do for now
Heh, so do I, but it would have made things really complicated. Hopefully we trust @erkolson not to insert ";drop tables;"
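For comparison, a bind-parameter version of the collection filter might look roughly like this. It is a sketch under assumptions: the column name `collection_id`, the parameter name `collection_ids`, and the helper `collection_filter` are illustrative. It only builds the SQL fragment and the parameter dict; with the `google-cloud-spanner` client these would typically be passed as `params` (plus a matching `param_types` entry declaring an INT64 array) to a call such as `database.execute_partitioned_dml`.

```python
def collection_filter(collection_ids):
    """Return (sql_fragment, params) restricting a query to some collections.

    An empty list means "all collections": no fragment and no params.
    """
    if not collection_ids:
        return "", {}
    # int() rejects anything that is not a plain integer.
    ids = [int(c) for c in collection_ids]
    # Spanner accepts an array parameter via IN UNNEST(@name), so one
    # fragment covers both the single-id and the multi-id case.
    fragment = " AND collection_id IN UNNEST(@collection_ids)"
    return fragment, {"collection_ids": ids}
```

Because the ids travel as a typed array rather than as formatted text, the query string stays constant regardless of how many collections are selected.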
@jrconlin , what is the behavior when this is passed
tools/spanner/purge_ttl.py
Outdated
database = instance.database(args.database_id)

logging.info("For {}:{}".format(args.instance_id, args.database_id))
batch_query = add_conditions(
Oh sorry, almost forgot: let's not apply this to batches.
They hopefully won't ever have timeout problems, their expiry index is sane: a "global" index or whatever spanner calls it, not interleaved in user_collections. They're always going to have much less expired data to purge and their expiry logic is also a little different: all batches expire (and rather quickly) regardless of collection_id.
The batch delete job currently takes less than 10 min on prod.
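The request above amounts to applying the collection filter to the BSO delete only and leaving the batch delete untouched. A minimal sketch, assuming hypothetical table names (`batches`, `bsos`), a hypothetical `collection_id` column, and an `add_conditions` signature that may not match the one in the script:

```python
def add_conditions(query, collection_ids):
    """Append a collection filter to `query`; no ids means no filter."""
    if not collection_ids:
        return query
    query += " AND collection_id"
    if len(collection_ids) == 1:
        query += " = {:d}".format(collection_ids[0])
    else:
        query += " in ({})".format(', '.join(map(str, collection_ids)))
    return query


def build_queries(collection_ids):
    """Return (batch_query, bso_query); only the BSO query is filtered."""
    # Batches all expire regardless of collection, so no filter here.
    batch_query = "DELETE FROM batches WHERE expiry < CURRENT_TIMESTAMP()"
    bso_query = add_conditions(
        "DELETE FROM bsos WHERE expiry < CURRENT_TIMESTAMP()",
        collection_ids)
    return batch_query, bso_query
```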
Description
Attempt to provide a way for the purge_ttl script to complete.
Adds arguments (ENV VARS):
--instance_id (INSTANCE_ID) Spanner instance id
--database_id (DATABASE_ID) Spanner database id
--sync_database_url (SYNC_DATABASE_URL) Spanner DSN
spanner://instance/database
--collection_ids (COLLECTION_IDS)
JSON formatted list of collections to limit deletions
e.g.
--collection_ids=123 limits to just collection 123
--collection_ids=[123,456] limits to both 123 & 456
default is all collections
Testing
May be used with the stage spanner database instance
Issue(s)
Issue #631