Improve how server/js client handle unexpected errors #6798

Merged: @freddyaboulton merged 14 commits into main from client-handle-server-failures on Dec 15, 2023


@freddyaboulton (Collaborator) commented Dec 14, 2023

Description

Noticed a few things while debugging flaky tests:

  • If there's an exception in the queue/data/ route, the client hangs forever. The server now catches general exceptions. My guess is that the Playwright tests are mysteriously erroring server-side, so printing the exception message in the UI can help us debug.
  • If a message didn't have an event_id (e.g. heartbeat, server_stopped), the client would error out. This wouldn't cause any noticeable errors for users (apart from console exceptions), but I wanted to fix it.
  • gr.Info and gr.Warning would also raise exceptions in the client. This would not be obvious to users, but it is undesirable anyway.
  • I think I also found the cause of the odd HTTPExceptions in the logs.
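A minimal sketch of the first fix, under stated assumptions: the generator `sse_stream`, the `next_message` callable, and the `unexpected_error` message shape are hypothetical stand-ins, not necessarily what gradio/routes.py uses. The idea is that an unexpected exception while streaming becomes one final server-sent event the client can display, instead of a stream that closes silently and leaves the client waiting forever.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, List


async def sse_stream(
    next_message: Callable[[], Awaitable[dict]],
) -> AsyncIterator[str]:
    """Hypothetical server-sent-events generator for a queue/data-style route."""
    try:
        while True:
            message = await next_message()
            yield f"data: {json.dumps(message)}\n\n"
            if message.get("msg") == "server_stopped":
                return  # normal end of stream
    except Exception as e:
        # Without this, the stream would just close and the client would keep
        # waiting for a message that never comes. Send one last event instead.
        error = {"msg": "unexpected_error", "message": str(e)}
        yield f"data: {json.dumps(error)}\n\n"


async def demo() -> List[str]:
    pending = [{"msg": "heartbeat"}]

    async def next_message() -> dict:
        # Hand out queued messages, then fail the way a broken queue might.
        if pending:
            return pending.pop(0)
        raise RuntimeError("queue exploded")

    return [chunk async for chunk in sse_stream(next_message)]


for chunk in asyncio.run(demo()):
    print(repr(chunk))
```

The demo yields the heartbeat event followed by an `unexpected_error` event carrying the exception text. The sketch deliberately catches `Exception` rather than `BaseException`: cancellation and generator shutdown should still propagate, and an async generator that yields after receiving `GeneratorExit` raises `RuntimeError`.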


@gradio-pr-bot (Contributor) commented Dec 14, 2023

🪼 branch checks and previews

| Name | Status | URL |
| --- | --- | --- |
| Spaces | ready! | Spaces preview |
| Website | ready! | Website preview |
| Storybook | ready! | Storybook preview |
| Visual tests | 1 failing test | Build review |

🦄 Changes detected! Details

Install Gradio from this PR

pip install https://gradio-builds.s3.amazonaws.com/6966114b9f36274e85c2196ab551de949e36a226/gradio-4.9.1-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@6966114b9f36274e85c2196ab551de949e36a226#subdirectory=client/python"

@gradio-pr-bot (Contributor) commented Dec 14, 2023

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
| --- | --- |
| @gradio/client | patch |
| gradio | patch |
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Improve how server/js client handle unexpected errors

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@freddyaboulton freddyaboulton added the v: patch A change that requires a patch release label Dec 15, 2023
}
}
} catch (e) {
console.error("Unexpected client exception", e);

@freddyaboulton (Collaborator, Author):

Not sure how to test this actually. Should be good for review though!

@abidlabs (Member):

Nice! cc @aliabid94

@abidlabs (Member):

Is this expected to resolve #6713?

@freddyaboulton (Collaborator, Author):

@abidlabs Hard to know for sure, since we don't know how to repro those issues. However, I did just verify that exceptions in user functions don't break the queue/app.

https://www.loom.com/share/0b7d99654b444b6a8ff6d0c320c878ed?sid=bb08ee79-eb06-48f0-822d-c89d0819a1db

gradio/routes.py (outdated diff):

    except asyncio.CancelledError as e:
        del blocks._queue.pending_messages_per_session[session_hash]
        await blocks._queue.clean_events(session_hash=session_hash)
    except Exception as e:
Review comment (Collaborator):

Has to be BaseException to catch everything: asyncio.CancelledError is not a subclass of Exception (since Python 3.8 it derives directly from BaseException).
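A small self-contained demonstration of this point (Python 3.8+). The coroutine `stream_events` is a hypothetical stand-in for the route handler, not gradio code: when its task is cancelled, the `except Exception` clause never runs, but `except BaseException` does.

```python
import asyncio
from typing import List


async def stream_events(log: List[str]) -> None:
    # Hypothetical stand-in for a long-running streaming handler.
    try:
        await asyncio.sleep(10)  # simulates waiting on the event stream
    except Exception:
        # Skipped on cancellation: CancelledError is not an Exception.
        log.append("caught by except Exception")
        raise
    except BaseException:
        # Runs on cancellation; clean up here, then re-raise.
        log.append("caught by except BaseException")
        raise


async def main() -> List[str]:
    log: List[str] = []
    task = asyncio.create_task(stream_events(log))
    await asyncio.sleep(0)  # let the task start and block in sleep()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the task re-raised its cancellation
    return log


print(issubclass(asyncio.CancelledError, Exception))
print(issubclass(asyncio.CancelledError, BaseException))
print(asyncio.run(main()))
```

Re-raising after cleanup keeps cancellation working for whoever awaits the task; swallowing `CancelledError` would hide it.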

event_callbacks[event_id] = callback;
if (!stream_open) {
Review comment (Collaborator):

Why did you move this?

@freddyaboulton (Collaborator, Author):

I figured the stream should be opened after the callback is added.

Review comment (Collaborator):

OK, sure. But if a second request is sent while the first one is still pending, the stream is already open from the first request, so we can't assume the stream only receives messages for an event after that event's POST request completes.
It can therefore happen that the event stream starts sending messages for an event before the POST request that uploads its data finishes, while the callback that handles those messages is not registered yet. That's a race condition, and I don't know whether we are hitting it. I think a PR I'm working on resolves it, though.
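One common way to close this race, sketched in Python for illustration only (the real client is JavaScript, and this is not necessarily how the follow-up PR resolves it): buffer messages for event_ids that have no callback yet and replay them when the callback is registered. `EventRouter`, `register`, and `on_message` are hypothetical names, and the sketch assumes every message carries an `event_id`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Callback = Callable[[dict], None]


class EventRouter:
    """Hypothetical model of the client's event_callbacks table, with buffering."""

    def __init__(self) -> None:
        self.event_callbacks: Dict[str, Callback] = {}
        # Messages that arrived before their event's callback was registered.
        self.pending: DefaultDict[str, List[dict]] = defaultdict(list)

    def on_message(self, message: dict) -> None:
        event_id = message["event_id"]  # assumed present in this sketch
        callback = self.event_callbacks.get(event_id)
        if callback is None:
            # The stream beat the POST response: buffer instead of failing.
            self.pending[event_id].append(message)
        else:
            callback(message)

    def register(self, event_id: str, callback: Callback) -> None:
        self.event_callbacks[event_id] = callback
        # Replay anything that arrived early, in arrival order.
        for message in self.pending.pop(event_id, []):
            callback(message)


received: List[dict] = []
router = EventRouter()
# A message for event "abc" arrives before the POST for "abc" has returned.
router.on_message({"event_id": "abc", "msg": "process_starts"})
router.register("abc", received.append)
router.on_message({"event_id": "abc", "msg": "process_completed"})
print([m["msg"] for m in received])
```

A complete version would also need to discard buffers for events that are never registered, or they accumulate for the life of the stream.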

@@ -1014,6 +1053,12 @@ export function api_factory(
    event_stream = new EventSource(url);
    event_stream.onmessage = async function (event) {
        let _data = JSON.parse(event.data);
        if (_data.session_hash) {
            await Promise.all(
                event_ids.map((event_id) => event_callbacks[event_id](_data))
Review comment (Collaborator):

If this is the only reason we're keeping track of event_ids, we already have that list in the keys of event_callbacks. No need to create a second list, right?

Review comment (Collaborator):

Also, I'd prefer the logic for targeting every event listener to be more explicit than checking whether the session_hash key is present; we don't use the session_hash value anyway. I think it would make more sense to trigger every event listener when no event_id is provided.

@freddyaboulton (Collaborator, Author):

Done!

@freddyaboulton (Collaborator, Author):

I also made the change so that a message without an event_id triggers all listeners!
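The routing rule agreed on in this thread, sketched in Python for illustration (the client itself is JavaScript; `dispatch` and the async callback shape are hypothetical): a message with an `event_id` goes to that event's callback, and a message without one (e.g. heartbeat or server_stopped) goes to every registered callback, taken from `event_callbacks` itself rather than a separate `event_ids` list. Ignoring an unknown `event_id` is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Callback = Callable[[dict], Awaitable[None]]


async def dispatch(message: dict, event_callbacks: Dict[str, Callback]) -> None:
    """Route one parsed stream message to its listener(s)."""
    event_id = message.get("event_id")
    if event_id is None:
        # No event_id: broadcast. event_callbacks already tracks the open
        # events, so no separate event_ids list is needed.
        callbacks = list(event_callbacks.values())
        await asyncio.gather(*(cb(message) for cb in callbacks))
    elif event_id in event_callbacks:
        await event_callbacks[event_id](message)
    # else: unknown event_id; ignored here instead of raising.


async def main() -> List[Tuple[str, str]]:
    seen: List[Tuple[str, str]] = []

    def make_callback(name: str) -> Callback:
        async def callback(message: dict) -> None:
            seen.append((name, message["msg"]))
        return callback

    callbacks = {"a": make_callback("a"), "b": make_callback("b")}
    await dispatch({"msg": "server_stopped"}, callbacks)  # broadcast to a and b
    await dispatch({"msg": "process_completed", "event_id": "b"}, callbacks)
    await dispatch({"msg": "process_completed", "event_id": "zzz"}, callbacks)
    return seen


print(asyncio.run(main()))
```

Keying the broadcast on the absence of `event_id` makes the intent explicit, instead of inferring it from an unrelated field such as `session_hash`.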

@aliabid94 (Collaborator):

Quoting the description: "If there's an exception in the queue/data/ route the client will hang forever. the server now catches general exceptions"

What exceptions were happening?

@freddyaboulton (Collaborator, Author):

Quoting @aliabid94: "What exceptions were happening?"

We don't know, because the client would hang forever. That said, I think this is why the Playwright tests hung forever in the cases where they failed.

@aliabid94 (Collaborator) left a comment:

LGTM!

@freddyaboulton (Collaborator, Author):

Thanks @aliabid94!

@freddyaboulton freddyaboulton merged commit 245d58e into main Dec 15, 2023
15 checks passed
@freddyaboulton freddyaboulton deleted the client-handle-server-failures branch December 15, 2023 21:01
@pngwn pngwn mentioned this pull request Dec 15, 2023
Labels: v: patch (A change that requires a patch release)

4 participants