Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

post to studio in thread to avoid blocking #814

Merged
merged 4 commits into from
Apr 19, 2024
Merged

post to studio in thread to avoid blocking #814

merged 4 commits into from
Apr 19, 2024

Conversation

dberenbaum
Copy link
Contributor

@@ -882,7 +883,8 @@ def make_dvcyaml(self):

@catch_and_warn(DvcException, logger)
def post_to_studio(self, event: Literal["start", "data", "done"]):
post_to_studio(self, event)
thread = threading.Thread(target=post_to_studio, args=(self, event))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions:

  • do we still need @catch_and_warn(DvcException, logger) above?
  • do we start a new thread per each request now? sound expensive and can for example exhaust OS if post_to_studio is running faster then we process them
  • can we use a single thread (and a queue for example)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points. I ended up leaving the start and done events in the main thread and create a single queue for all data events to not overwhelm the OS or Studio. PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not run start and done through the same queue? it's better a bit since start will initialize the queue - makes it a bit more thread-safer

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dberenbaum for a prompt fix! A few questions / comments.

post_to_studio(item, "data")
self._studio_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is post_to_studio defined now?

One thing to test - Ctrl-C handling for this. Just to make sure that our threads and queue wait doesn't mess up apps in any way. Thanks @dberenbaum for iterating on this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we wrap the call to post_to_studio in @catch_and_warn(DvcException, logger) ?

Copy link
Contributor Author

@dberenbaum dberenbaum Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is post_to_studio defined now?

def post_to_studio(live: Live, event: Literal["start", "data", "done"]): # noqa: C901

One thing to test - Ctrl-C handling for this. Just to make sure that our threads and queue wait doesn't mess up apps in any way. Thanks @dberenbaum for iterating on this!

No, thanks for keeping me honest here. It's a pretty simple change but could easily see it breaking some things. This is one reason I wanted to keep start and done separate per your question above. The queue is set up in daemon mode which can exit without making all expected calls (without it I noticed tests were hanging). By separating start and end, we ensure those are called and returned in most scenarios. If a particular live data call gets missed, I don't think it's as critical. (edit: it also simplifies testing to have those calls in the main thread)

should we wrap the call to post_to_studio in @catch_and_warn(DvcException, logger) ?

Sure, pushed an update with that change

@dberenbaum dberenbaum merged commit 228e9a8 into main Apr 19, 2024
14 checks passed
@dberenbaum dberenbaum deleted the post-in-thread branch April 19, 2024 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming live metrics to DVC studio severely limits training speed
2 participants