[BUG] map_task downloads all pickled outputs from previous map_task #5180
Comments
Thank you for opening your first issue here! 🛠
Hey there, @tomtou-bspace! Great to see you back diving into the intricacies of workflow orchestration. Hope you're ready for another round of troubleshooting and insights. Let's get to it! Based on the information you've provided, it seems like the issue you're encountering is due to the way outputs are being passed between
For more information, you can refer to the Flyte documentation on map tasks.
Regarding the structure and type of the output from
I hope these suggestions help you resolve the issue. If you need further assistance, please provide more details about your implementation, and I'll be glad to help.
@tomtou-bspace, this is a known limitation of the current implementation of map tasks. Unfortunately, we're not going to revisit this decision in the short or medium term, as the team is focused on other projects right now.
Describe the bug
Consecutive map_tasks download all outputs of the entire previous map task, instead of only the outputs of the corresponding sub-task of the previous map task.
As a result, the subsequent map task runs for an hour instead of 5 minutes.
Expected behavior
If map_task B's inputs are map_task A's outputs, then:
the i-th sub-task of map task B should download only the outputs of the i-th sub-task of map task A.
Additional context to reproduce
Run map_task A, where each sub-task returns a value of type Dict[str, Dict[str, np.ndarray]].
map_task B then receives the output of A, and each sub-task of B downloads the full list of Dict[str, Dict[str, np.ndarray]] objects instead of a single object.
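
To make the cost difference concrete, here is a minimal pure-Python model of the two behaviors. It is only a sketch under stated assumptions, not Flyte's actual implementation: `CountingStore`, `run_map_b_observed`, `run_map_b_expected`, and `count_downloads` are hypothetical names made up for illustration, and plain lists stand in for `np.ndarray` so the snippet runs without extra dependencies.

```python
from typing import Callable, Dict, List

# Stand-in for Dict[str, Dict[str, np.ndarray]]; lists replace arrays here.
Output = Dict[str, Dict[str, List[float]]]


class CountingStore:
    """Hypothetical blob store holding one output per sub-task of map task A."""

    def __init__(self, outputs: List[Output]) -> None:
        self._outputs = outputs
        self.downloads = 0  # number of individual objects fetched

    def fetch(self, index: int) -> Output:
        self.downloads += 1
        return self._outputs[index]

    def fetch_all(self) -> List[Output]:
        return [self.fetch(i) for i in range(len(self._outputs))]


def run_map_b_observed(store: CountingStore, n: int) -> List[Output]:
    # Reported behavior: every sub-task of B fetches the whole list,
    # then uses only its own element.
    return [store.fetch_all()[i] for i in range(n)]


def run_map_b_expected(store: CountingStore, n: int) -> List[Output]:
    # Expected behavior: the i-th sub-task of B fetches only the i-th output.
    return [store.fetch(i) for i in range(n)]


def count_downloads(n: int, runner: Callable[[CountingStore, int], List[Output]]) -> int:
    outputs: List[Output] = [{"part": {"values": [float(i)]}} for i in range(n)]
    store = CountingStore(outputs)
    results = runner(store, n)
    assert results == outputs  # both strategies yield the same inputs for B
    return store.downloads


if __name__ == "__main__":
    print(count_downloads(100, run_map_b_observed))
    print(count_downloads(100, run_map_b_expected))
```

In this model the number of fetched objects grows quadratically with the number of sub-tasks for the reported behavior and linearly for the expected one, which is consistent with the large gap in run time described above.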
Screenshots
The map_task of the task "save_as_parquet_to_s3" takes more than an hour instead of 5 minutes.