Simple task to finish inactive builds #3312

Merged
merged 4 commits into master from humitos/inactive-build-cleanup on Dec 14, 2017

Conversation

@humitos
Member

humitos commented Nov 23, 2017

This is an initial idea to solve this kind of issue and avoid user confusion:

  • I found that we are not running celery beat for periodic tasks, so we need to define how this will be executed. Maybe it could be done as a Django command run by cron (depending on the current setup of the RTD servers)
  • We should consider the custom time limit of some projects, since if it's more than 45 minutes this task will mark their builds as finished
  • Define how we want to show this custom error

Here is an example of how this will look:

[screenshot: captura de pantalla_2017-11-23_17-45-22]

@agjohnson

Contributor

agjohnson commented Nov 27, 2017

Ah yes, this could be updated in our docs probably.

We run celery beat just as we run celery:

celery -A readthedocs worker -B

So this would be another command to run for local development. You want to run beat on a separate worker, as it should only have 1 concurrent process. That is, the following will result in multiple scheduled task runs, because it has a concurrency of 4:

celery -A readthedocs worker -Q default,celery,web,builder -B

I'm fine not supporting this when running celery as ALWAYS_EAGER -- maybe throwing a warning through a Django system check?
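
For local development that usually means either giving beat its own single-process worker, or running beat as a standalone process next to the normal workers. A sketch of those two usual options (not necessarily how the RTD servers are set up):

# beat on a dedicated worker with a single process
celery -A readthedocs worker -B -c 1

# or: beat as its own process, normal workers kept separate
celery -A readthedocs beat
celery -A readthedocs worker -Q default,celery,web,builder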

Custom time limit should be 150-200% of the project build time. The project build time is the container_time_limit or the default docker time limit. The celery task time limit is 120% of the project build time, I believe. So 150-200% should extend past the timeout on celery and docker.
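
As rough arithmetic (the 900-second default below is only an assumed placeholder, not the real RTD default):

build_limit = project.container_time_limit or 900  # seconds; 900 is an assumed default
celery_limit = int(build_limit * 1.2)              # ~120%: celery task time limit
inactive_cutoff = int(build_limit * 2)             # ~200%: safely past celery and docker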

Also, thinking more on this, it might be helpful to include more information in the error message. Adding something similar to our general error reporting would help -- "If you continue to encounter this error, file a support request with and reference this build id (#)"

@agjohnson

I think this is in the right direction. I added some feedback on the questions you've had so far, so we'll probably need another review when those changes are added.

for build in Build.objects.filter(query):
    build.success = False
    build.state = BUILD_STATE_FINISHED
    build.error = 'This build was terminated due to inactivity.'

@agjohnson

agjohnson Nov 27, 2017

Contributor

On my PR, I'm accumulating these error messages in one spot, a central projects.exceptions file, which might make sense here too. This could be implemented as an exception, but I'm not sure if the exception will be raised otherwise -- we could combine the message for this and the timeout kill though.
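
A rough sketch of what that centralization could look like (only the module path comes from the comment; the class name is hypothetical):

# readthedocs/projects/exceptions.py -- sketch; the class name is hypothetical
class BuildTimeoutError(Exception):
    message = 'This build was terminated due to inactivity.'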

query = (~Q(state=BUILD_STATE_FINISHED) &
         Q(date__lte=datetime.datetime.now() - datetime.timedelta(minutes=45)))
# TODO: consider ``project.container_time_limit`` since it could be bigger than 45 minutes
for build in Build.objects.filter(query):

@agjohnson

agjohnson Nov 27, 2017

Contributor

This could be build.update() instead.
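
That is, a single UPDATE query on the queryset instead of saving each row; a sketch of that suggestion:

Build.objects.filter(query).update(
    success=False,
    state=BUILD_STATE_FINISHED,
    error='This build was terminated due to inactivity.',
)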

@humitos

humitos Dec 5, 2017

Member

If we are going to use update, we can't put the build id # in the message.

So, for now, I'm adding this message:

        error=('This build was terminated due to inactivity. If you continue '
               'to encounter this error, file a support request.')

@humitos

humitos Dec 5, 2017

Member

Also, I think I will limit the query to the first 25 or 50 elements, because otherwise the first time this task is run it will pick up tons of existing builds from all time...

@humitos

humitos Dec 5, 2017

Member

Besides, this is not valid...

AssertionError: Cannot update a query once a slice has been taken.

So maybe going back to my approach is better.
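
For reference, the usual workaround for that error is to materialize the sliced primary keys first, e.g.:

pks = Build.objects.filter(query).values_list('pk', flat=True)[:50]
Build.objects.filter(pk__in=list(pks)).update(
    success=False,
    state=BUILD_STATE_FINISHED,
)

but that still can't interpolate the per-build pk into the message, so looping is indeed simpler here.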

@safwanrahman

Member

safwanrahman commented Nov 27, 2017

I think we should use celery beat for this. Otherwise this task does not have any benefit.

@agjohnson

Contributor

agjohnson commented Nov 28, 2017

Celery beat is going to be used. The task above will be a scheduled task.

@humitos

Member

humitos commented Dec 5, 2017

I just updated this PR with:

  • do not use update, since I want a slice of the filter and we also want the build.pk inside the message
  • 1:30 hs as a grace period for the build to finish (I think it's enough) -- I tried to make use of project__container_time_limit but it was complicated and I wasn't able to end up with the correct query
  • add a log.info with information about how many builds were marked as "Terminated..."

Let me know if this is OK for you and also how to configure this as a scheduled task, since I can't find the celery beat setting.
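
(For reference, with the Celery versions in use at the time, scheduled tasks were typically declared through the CELERYBEAT_SCHEDULE setting in the Django settings. A sketch, where the task path and interval are assumptions:)

# settings sketch -- task path and interval are assumptions
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'finish-inactive-builds': {
        'task': 'readthedocs.projects.tasks.finish_inactive_builds',
        'schedule': timedelta(minutes=15),
    },
}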

@humitos

Member

humitos commented Dec 5, 2017

We can always improve this task but it should be enough (for some definition) at this moment. It could be a good starting point.

@agjohnson

So this method is close. Perhaps my suggestion is the best compromise between complexity and the most correct UX.

@app.task()
def finish_inactive_builds():
    delta = datetime.timedelta(hours=1, minutes=30)

@agjohnson

agjohnson Dec 7, 2017

Contributor

Let's use a timedelta of our base task timeout. For an example of how we add overhead to this, see:
https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/utils/__init__.py#L118-L128

There's a second part to this though, see below.
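
Roughly, the linked code derives the task limit from the project limit plus some overhead; a simplified sketch (the DOCKER_LIMITS name and the 20% figure are assumptions here, not copied verbatim from core/utils):

time_limit = DOCKER_LIMITS['time']             # default docker build limit, in seconds
if project.container_time_limit:
    time_limit = int(project.container_time_limit)
time_limit = int(time_limit * 1.2)             # overhead so the docker limit hits first
delta = datetime.timedelta(seconds=time_limit)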

builds = Build.objects.filter(query)[:50]
for build in builds:
    build.success = False

@agjohnson

agjohnson Dec 7, 2017

Contributor

Let's check for the timeout date here, i.e. if build.date + build.project.container_time_limit > now: continue.

So the query will surface extra builds, but the custom timeout builds will be an exception to this case. They keep running through this loop until the condition above is true.

Does this seem reasonable?
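
A sketch of that check inside the loop (assuming container_time_limit is in seconds, as noted further down; this is not the final merged code):

for build in Build.objects.filter(query)[:50]:
    if build.project.container_time_limit:
        custom_delta = datetime.timedelta(
            seconds=int(build.project.container_time_limit))
        if build.date + custom_delta > datetime.datetime.now():
            # still inside this project's custom limit; re-check on the next run
            continue
    build.success = False
    build.state = BUILD_STATE_FINISHED
    build.error = 'This build was terminated due to inactivity.'
    build.save()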

@humitos

humitos Dec 7, 2017

Member

Yes. I think this is the easiest and most reasonable thing to do.

"I was overengineering it," as Eric said :)

'This build was terminated due to inactivity. If you '
'continue to encounter this error, file a support '
'request with and reference this build id ({0}).'.format(build.pk)
)

@agjohnson

agjohnson Dec 7, 2017

Contributor

This should be translated, though I'm not sure if we're equipped to actually translate this yet.
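
For a Django message built at task run time, that would mean something along the lines of (a sketch, not the merged change):

from django.utils.translation import ugettext

build.error = ugettext(
    'This build was terminated due to inactivity. If you '
    'continue to encounter this error, file a support '
    'request with and reference this build id ({0}).'
).format(build.pk)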

@humitos

humitos Dec 7, 2017

Member

Equipped? In what sense?

I don't understand how/where the translations work yet.

@humitos

Member

humitos commented Dec 7, 2017

This PR should be ready.

Do I need to do something in particular to enable this task on celery beat?

@agjohnson

Looks good!

I think we'll want at least 3 test cases out of this:

  • Legitimate build isn't marked finished early
  • Build with a custom time limit isn't marked finished early
  • Build without custom time limit is marked finished
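
A very rough sketch of what the third case might look like (the fixture helper, field values, and task import path are assumptions based on the snippets above, not the merged tests):

import datetime

from django.test import TestCase
from django_dynamic_fixture import get

from readthedocs.builds.constants import BUILD_STATE_FINISHED
from readthedocs.builds.models import Build
from readthedocs.projects.models import Project
from readthedocs.projects.tasks import finish_inactive_builds  # assumed module path


class FinishInactiveBuildsTests(TestCase):
    def test_build_without_custom_limit_is_finished(self):
        project = get(Project, container_time_limit=None)
        build = get(Build, project=project, state='building', success=True)
        # Build.date is auto-populated on creation, so push it past the cutoff explicitly
        Build.objects.filter(pk=build.pk).update(
            date=datetime.datetime.now() - datetime.timedelta(hours=2))

        finish_inactive_builds()

        build.refresh_from_db()
        self.assertEqual(build.state, BUILD_STATE_FINISHED)
        self.assertFalse(build.success)
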
if build.project.container_time_limit:
    custom_delta = datetime.timedelta(
        minutes=int(build.project.container_time_limit))

@agjohnson

agjohnson Dec 11, 2017

Contributor

container_time_limit is in seconds, so delta will need to be changed here
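
i.e., presumably something like:

custom_delta = datetime.timedelta(
    seconds=int(build.project.container_time_limit))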

@humitos

Member

humitos commented Dec 12, 2017

I added a test case for this scenario. I thought it was better to have 3 builds at the same time instead of one build per test. Let me know if you agree or want me to split them.

I think it's easier to understand and also closer to reality.

@agjohnson

Contributor

agjohnson commented Dec 14, 2017

Giddyup. The tests look great; I don't think it's important to split them up at all.

@agjohnson agjohnson merged commit a700bbd into master Dec 14, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@agjohnson agjohnson deleted the humitos/inactive-build-cleanup branch Dec 14, 2017

@humitos

Member

humitos commented Dec 23, 2017

@agjohnson @ericholscher what do we need to do to enable this task to be run by celery beat?
