-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] fix split read output blocks #41070
Conversation
7ef3852
to
26ea5b0
Compare
if cur_additional_split_factor: | ||
size_based_splits *= cur_additional_split_factor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i feel like we still need to keep this logic and cur_additional_split_factor
as a parameter in this function. @stephanie-wang as original author to confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this seems still useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I'll add it back
@@ -921,6 +921,28 @@ def yield_five(block_iter: Iterable[Block], ctx) -> Iterable[Block]: | |||
# 100 inputs -> 100 / 10 = 10 tasks -> 10 * 5 = 50 output blocks | |||
assert op._estimated_output_blocks == 50 | |||
|
|||
# Test read output splitting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make it a standalone test?
if cur_additional_split_factor: | ||
size_based_splits *= cur_additional_split_factor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this seems still useful.
Signed-off-by: Andrew Xue <andewzxue@gmail.com>
4b6eb29
to
cbe94c1
Compare
I'm not sure why we were calling `_set_num_output_blocks` on the upstream input operator when we were splitting blocks for the downstream read map operator. The split happens in the read map operator, which shouldn't affect the input operator. Also adds a test for estimating output blocks with additional split factor. --------- Signed-off-by: Andrew Xue <andewzxue@gmail.com>
Why are these changes needed?
I'm not sure why we were calling
_set_num_output_blocks
on the upstream input operator when we were splitting blocks for the downstream read map operator. The split happens in the read map operator, which shouldn't affect the input operator.Also adds a test for estimating output blocks with additional split factor.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.