-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Airflow task SLA and execution_timeout #2134
Conversation
sla=datetime.timedelta(minutes=5), | ||
execution_timeout=datetime.timedelta(minutes=10), | ||
email_on_retry=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only changing the bqetl_public_data_json
DAG here for testing. If everything works out we can update the remaining DAGs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this plan.
sla: Optional[str] = attr.ib(None) # todo: set to 3h | ||
execution_timeout: Optional[str] = attr.ib(None) # todo: set to 24h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the testing is successful and we decide to go for setting sla
and execution_timeout
for the other tasks, then we should set them to the default values in the comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'm very interested to see how this behaves.
@@ -85,6 +91,12 @@ with DAG('{{ name }}', default_args=default_args{%+ if schedule_interval != None | |||
{%+ if task.email_on_retry != None -%} | |||
email_on_retry={{ task.email_on_retry }}, | |||
{%+ endif -%} | |||
{%+ if task.sla != None -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{%+ if task.sla != None -%} | |
{%+ if task.sla is not none -%} |
Jinja has a none
test: https://jinja.palletsprojects.com/en/2.11.x/templates/#none
My intuition says we should be using that rather than != None
, but I don't fully understand the implications in the Jinja context.
sla=datetime.timedelta(minutes=5), | ||
execution_timeout=datetime.timedelta(minutes=10), | ||
email_on_retry=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this plan.
Based on proposal https://docs.google.com/document/d/1V8FwVH_oZonh-SLKRbghkzglofRlec7S51QwIIsLZbQ/edit#
I think it makes sense to set an
sla
for tasks but also ensure they time out after a while, otherwise we might end up with a lot of tasks retrying taking up all the slots if not detected during triage.