Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only transform the literal that matches the map task index #2484

Merged
merged 13 commits into from
Jun 17, 2024

Conversation

EngHabu
Copy link
Collaborator

@EngHabu EngHabu commented Jun 17, 2024

Tracking issue

fixes flyteorg/flyte#5483

Why are the changes needed?

Existing code only trims map task inputs before invoking the python subtask. This is incorrect as it leads to flytekit unnecessarily downloading all inputs for the entire maptask (potentially 10s of thousands) for each subtask.

What changes were proposed in this pull request?

For local execution, continue with the current behavior of handling all subtasks inputs when the map task is invoked.
For remote execution, index into the inputs list using the maptask index and only transform this input.

How was this patch tested?

  • Ran tests locally
  • Stepped through the code to make sure it correctly computes the interface, trims inputs... etc.
  • Ran a 3k map task job remotely and looked into logs before and after to confirm it no longer attempts to download all inputs

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

EngHabu and others added 8 commits June 17, 2024 01:02
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
…/flytekit into array-node-input-transformation
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Copy link

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 13.63636% with 19 lines in your changes missing coverage. Please review.

Project coverage is 50.98%. Comparing base (8562fd9) to head (188a7bf).
Report is 4 commits behind head on master.

Current head 188a7bf differs from pull request most recent head 1fbc324

Please upload reports for the commit 1fbc324 to get more accurate results.

Files Patch % Lines
flytekit/core/array_node_map_task.py 13.63% 19 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2484       +/-   ##
===========================================
- Coverage   73.97%   50.98%   -23.00%     
===========================================
  Files         250      185       -65     
  Lines       21592    18677     -2915     
  Branches     3644     3645        +1     
===========================================
- Hits        15973     9522     -6451     
- Misses       4973     8674     +3701     
+ Partials      646      481      -165     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@EngHabu EngHabu marked this pull request as ready for review June 17, 2024 18:12
pingsutw and others added 5 commits June 17, 2024 13:03
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@EngHabu EngHabu merged commit e85ab34 into master Jun 17, 2024
44 of 46 checks passed
bgedik pushed a commit to bgedik/flytekit that referenced this pull request Jul 3, 2024
…2484)

* Only transform the literal that matches the map task index

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* kevin's update

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* make it work locally

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* add a test

Signed-off-by: Kevin Su <pingsutw@apache.org>

* check (flyteorg#2486)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Only fail for unbound batch pickled inputs for map tasks

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* Unit tests

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: bugra.gedik <bugra.gedik@predera.ai>
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
* Only transform the literal that matches the map task index

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* kevin's update

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* make it work locally

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* add a test

Signed-off-by: Kevin Su <pingsutw@apache.org>

* check (#2486)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Only fail for unbound batch pickled inputs for map tasks

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* Unit tests

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>

* lint

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Jan Fiedler <jan@union.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] ArrayNodes downloads all inputs for every subtasks
2 participants