-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Release streaming generator task metadata based on ref counter #43584
Conversation
// caller. The executor side should just assume everything is consumed if it | ||
// is -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Let me finish review asap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also remove this in this PR?
It's a data-level workaround for this issue
It would be better if we can keep the changes separate. This PR is risky as is. Also, if there is a workaround in Data, should we still mark this as a release blocker? It's not complete yet and the amount of refactoring needed is going to be significant. |
…lock Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Btw, premrege seems to fail. Should I review before it passes CI? |
ASSERT_TRUE(retry_signal_called); | ||
CompletePendingStreamingTask(spec, caller_address, 2); | ||
} | ||
// TEST_F(TaskManagerTest, TestObjectRefStreamDeletedStreamIgnored) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these tests temporarily disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, working on putting these back before we merge. I made some changes to TaskManager interface so these were failing to compile.
Please review even though premerge is failing! Thanks!
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
4969bd8
to
b5bab88
Compare
Fix lineage reconstruction bug for streaming generators by only garbage-collecting stream metadata once the refs' lineage have gone out of scope. Changes: When generator goes out of scope, add it to a list of streaming generator tasks that we scan periodically For each generator task, check if the task and streaming metadata can be removed. It can be removed if the generator task and all generated return refs have gone out of scope. Fixes an existing potential leak where task completes after the generator ref and returned refs have gone out of scope, by deleting the task metadata with the stream metadata. #43584 is probably better long-term as it refactors stream metadata to be GCed through the normal ref counting path, and lineage can be GCed eagerly. However, this fix is safer. Related issue number Closes #39151. --------- Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Closing for now in favor of #43772 but we should revisit once the TaskManager <> RefCounter deadlock situation is resolved. The merged PR is less ideal because it requires periodic GC and may have edge cases depending on order of task completion vs references out of scope. |
Why are these changes needed?
Update streaming generator metadata management to use the standard ref counting path. Changes:
Related issue number
Closes #39151.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.