Skip to content
This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

NON-JIRA: Cleaning unused old datastore entities during retention pro… #149

Conversation

radkomateusz
Copy link
Contributor

…cess.

Process for removing entities about tables not existing for a long time and their backup entities.
After one month after removing last backup by retention process, the table entity and backup entities connected with them are deleted.

Hardcoded constants for managing retention are now moved to configuration.
Documentation and tests updated. Renaming filter name to not contain hardcoded value in class name.

@coveralls
Copy link

coveralls commented May 17, 2019

Pull Request Test Coverage Report for Build 1196

  • 45 of 47 (95.74%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 84.445%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/retention/should_perform_datastore_cleanup_predicate.py 17 18 94.44%
src/retention/table_retention.py 17 18 94.44%
Totals Coverage Status
Change from base Build 1188: 0.2%
Covered Lines: 2443
Relevant Lines: 2893

💛 - Coveralls

@radkomateusz radkomateusz force-pushed the cleanup_not_used_datastore_entities_during_retention_process branch from d02b2bd to df069fe Compare May 17, 2019 10:35
self.assertTrue(Backup.get_by_key(backup2.key) is not None)

@freeze_time("2019-05-16")
def test_should_not_delete_table_and_backups_entities_if_table_not_exist_for_more_than_threshold_value_and_backups_are_not_deleted_yet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing test case when only 1 backup is deleted from the group

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no situation like removing one entity but leaving another. Everything stays, or everything is deleted (if all backups entities are marked as deleted and source table last seen is above grace deletion period)

backup1.put()
backup2 = Backup(
parent=table.key,
last_modified=datetime(2017, 02, 1, 16, 30),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

02 -> 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…cess.

Process for removing entities about tables not existing for a long time and their backup entities.
After one month after removing last backup by retention process, the table entity and backup entities connected with them are deleted.

Hardcoded constants for managing retention are now moved to configuration.
Documentation and tests updated. Renaming filter name to not contain hardcoded value in class name.
@radkomateusz radkomateusz force-pushed the cleanup_not_used_datastore_entities_during_retention_process branch from df069fe to b98996d Compare May 17, 2019 11:34

retention_settings:

# young_old_generation_threshold_in_months - number of months from above retention process will left only one backup older than
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"number of months from above retention process will left only one backup older than" -> "for all backups older than this number of months, retention process will delete all backups except the most recent one. This doesn't affect backups younger than this threshold"

@@ -46,6 +46,13 @@ def default_restoration_project_id(self):
def projects_to_skip(self):
return self.__project_config['backup_settings'].get('projects_to_skip')

@property
def grace_period_after_source_table_deletion_in_months(self):
return self.__project_config['retention_settings'].get('grace_period_after_source_table_deletion_in_months') \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add some validation for integers larger than one?

return self.__project_config['retention_settings'].get('grace_period_after_source_table_deletion_in_months') \

@property
def young_old_generation_threshold_in_months(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also print all configuration during start up

table_key = ndb.Key(urlsafe=url_safe_key)
backups_entities_to_delete = Backup.get_all_backups_sorted(table_key)
logging.info("Removing table and backup entites for table:'%s'", table_reference)
for backup in backups_entities_to_delete:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can delete entities in bulk: https://cloud.google.com/appengine/docs/standard/python/ndb/creating-entities#Python_Deleting_entities_in_bulk
After successful deletion of backup entities, remove table entity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also try to delete all of them within transaction (but there might be some limits for number of entities)

def test(url_safe_key, existing_backups):
if existing_backups:
return False
if ShouldPerformDatastoreCleanupPredicate.\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just return the value from method

@@ -164,3 +165,89 @@ def test_should_not_perform_retention_if_no_backups(self, delete_table):

# then
delete_table.assert_not_called()

@freeze_time("2019-05-16")
def test_should_delete_both_table_and_backups_entities_if_table_not_exist_for_a_long_time_and_any_not_deleted_backup_exists(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_should_delete_both_table_and_backups_entities_if_table_not_exist_for_a_long_time_and_any_not_deleted_backup_exists => test_should_delete_both_table_and_backups_entities_if_table_does_not_exist_for_a_long_time_and_there_is_no_not_deleted_backup

self.assertTrue(Backup.get_by_key(backup2.key) is None)

@freeze_time("2019-05-16")
def test_should_not_delete_table_and_backups_entities_if_table_not_exist_for_less_than_threshold_value_and_backups_are_already_deleted(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_should_not_delete_table_and_backups_entities_if_table_not_exist_for_less_than_threshold_value_and_backups_are_already_deleted => test_should_not_delete_table_and_backup_entities_if_table_does_not_exist_for_less_than_threshold_value_and_backups_are_already_deleted

@radkomateusz
Copy link
Contributor Author

We decided to chose more safety option of optimisation, which doesn't require deleting anything.

@radkomateusz radkomateusz deleted the cleanup_not_used_datastore_entities_during_retention_process branch May 31, 2019 14:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants