Some users getting started with Kestra are confused as to why a list of tasks specified within the EachParallel task is not executed sequentially, one after the other, for each value. Many find it confusing that they need to wrap these tasks in a Sequential task so that the entire group of tasks runs in order for each value while the groups themselves run in parallel.
At the same time, the EachSequential task seems to be functionally equivalent to EachParallel with the concurrent property set to 1.
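For illustration, the workaround described above might look roughly like the following. This is a minimal sketch rather than a recommended pattern: the flow id, task ids, and sample values are hypothetical, and it assumes the current EachParallel and Sequential tasks from `io.kestra.core.tasks.flows`, with the looped value read from the parent task run inside the nested group.

```yaml
id: eachParallelWorkaround # hypothetical flow id
namespace: blueprint

tasks:
  - id: processValues
    type: io.kestra.core.tasks.flows.EachParallel
    value: ["file_1.parquet", "file_2.parquet"] # sample values, singular property name
    concurrent: 2
    tasks:
      # The Sequential wrapper keeps log_start and log_end in order for each value.
      - id: perValue
        type: io.kestra.core.tasks.flows.Sequential
        tasks:
          - id: log_start
            type: io.kestra.core.tasks.log.Log
            message: started processing {{ parent.taskrun.value }}

          - id: log_end
            type: io.kestra.core.tasks.log.Log
            message: finished processing {{ parent.taskrun.value }}
```

The extra nesting level and the `parent.taskrun.value` lookup are the parts that newcomers tend to trip over.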
Lastly, the naming of tasks and properties is confusing:
- The task is named EachParallel, yet its property is called concurrent rather than parallel. To be precise, either the task should be named EachConcurrent or the property should be renamed to parallel.
- Both tasks accept a list of values. However, the property is named value (singular), which is confusing given that it accepts a list rather than a single value.
- The task performs a for loop. Users who search autocompletion for "for" may not find it. The name ForEach would be easy to find for users searching for either the "each" or the "for" keyword.
Solution
Before the 1.0 release, it's important to unify both tasks into a single ForEach task that:
- Allows sequential or concurrent execution of tasks simply by setting the concurrent task property, rather than by choosing a dedicated task
- Uses a plural property name, either values or items, the latter to align with ForEachItem
Proposed syntax:
```yaml
id: pythonPartitionsMetrics
namespace: blueprint
description: Process partitions in parallel

tasks:
  - id: getPartitions
    type: io.kestra.plugin.scripts.python.Script
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    script: |
      from kestra import Kestra
      partitions = [f"file_{nr}.parquet" for nr in range(1, 1000)]
      Kestra.outputs({'partitions': partitions})

  - id: processPartitions
    type: io.kestra.core.tasks.flows.ForEach
    items: '{{outputs.getPartitions.vars.partitions}}'
    concurrent: 10 # set to 1 to get EachSequential-like processing
    tasks:
      - id: log_start
        type: io.kestra.core.tasks.log.Log
        message: starting processing {{ taskrun.item }}

      - id: partition
        type: io.kestra.plugin.scripts.python.Script
        runner: DOCKER
        docker:
          image: ghcr.io/kestra-io/pydata:latest
        script: |
          import random
          import time
          from kestra import Kestra

          filename = '{{ taskrun.item }}'
          print(f"Reading and processing partition {filename}")
          nr_rows = random.randint(1, 1000)
          processing_time = random.randint(1, 20)
          time.sleep(processing_time)
          Kestra.counter('nr_rows', nr_rows, {'partition': filename})
          Kestra.timer('processing_time', processing_time, {'partition': filename})

      - id: log_end
        type: io.kestra.core.tasks.log.Log
        message: finished processing {{ taskrun.item }}
```
Allowing failure of parallel child tasks
This new task should make it possible for all child tasks to run to completion in a non-blocking way even if some child task fails. We were considering a continueOnError boolean property. However, if allowFailure is added to the core tasks, there seems to be no need for an extra continueOnError property; see #2248.
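As a rough sketch of the allowFailure option, the fragment below marks a child task as allowed to fail so that the remaining iterations can continue. Everything here is an assumption for illustration: the ForEach task and its items and concurrent properties are the proposal from this issue, the task ids and sample values are hypothetical, and whether allowFailure ends up on the child task or on ForEach itself depends on the outcome of #2248.

```yaml
- id: processPartitions
  type: io.kestra.core.tasks.flows.ForEach # proposed task, does not exist yet
  items: ["file_1.parquet", "file_2.parquet", "file_3.parquet"] # sample values
  concurrent: 3
  tasks:
    - id: partition
      type: io.kestra.plugin.scripts.python.Script
      allowFailure: true # assumption: a failure here does not stop the other iterations
      script: |
        filename = '{{ taskrun.item }}'
        # Simulate a failure for a single partition.
        if filename == 'file_2.parquet':
            raise ValueError(f"Cannot process {filename}")
        print(f"Processed {filename}")
```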