Skip to content

Commit

Permalink
Cleanup missing worker entries after seven days
Browse files Browse the repository at this point in the history
Missing worker entries are now kept in the database for seven days
before being cleaned up. This gives time for post-mortem analysis.

It also switches the cleanup of those records to be a bulk operation.

closes #8988
  • Loading branch information
bmbouter committed Nov 15, 2021
1 parent d3c5ae6 commit f23c318
Show file tree
Hide file tree
Showing 3 changed files with 13 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGES/8988.bugfix
@@ -0,0 +1,2 @@
Missing worker records are now kept in the database for seven days allowing time for post-mortem
analysis of them. The user-facing data in the status API remains unmodified.
15 changes: 9 additions & 6 deletions pulpcore/app/models/task.py
Expand Up @@ -42,20 +42,23 @@ def online_workers(self):

return self.filter(last_heartbeat__gte=age_threshold)

def missing_workers(self):
def missing_workers(self, age=timedelta(seconds=settings.WORKER_TTL)):
"""
Returns a queryset of workers meeting the criteria to be considered 'missing'
To be considered missing, a worker must have a stale timestamp. Stale is defined here as
"beyond the pulp process timeout interval".
To be considered missing, a worker must have a stale timestamp. By default, stale is
defined here as longer than the ``settings.WORKER_TTL``, or you can specify age as a
timedelta.
Args:
age (datetime.timedelta): Workers who have heartbeats older than this time interval are
considered missing.
Returns:
:class:`django.db.models.query.QuerySet`: A query set of the Worker objects which
are considered by Pulp to be 'missing'.
"""
now = timezone.now()
age_threshold = now - timedelta(seconds=settings.WORKER_TTL)

age_threshold = timezone.now() - age
return self.filter(last_heartbeat__lt=age_threshold)

def resource_managers(self):
Expand Down
4 changes: 2 additions & 2 deletions pulpcore/tasking/pulpcore_worker.py
Expand Up @@ -110,11 +110,11 @@ def shutdown(self):
_logger.info(_("Worker %s was shut down."), self.name)

def worker_cleanup(self):
qs = Worker.objects.missing_workers()
qs = Worker.objects.missing_workers(age=timedelta(days=7))
if qs:
for worker in qs:
_logger.info(_("Clean missing worker %s."), worker.name)
worker.delete()
qs.delete()

def beat(self):
if self.worker.last_heartbeat < timezone.now() - timedelta(seconds=self.heartbeat_period):
Expand Down

0 comments on commit f23c318

Please sign in to comment.