New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve how server/js client handle unexpected errors #6798
Conversation
🪼 branch checks and previews
Install Gradio from this PR pip install https://gradio-builds.s3.amazonaws.com/6966114b9f36274e85c2196ab551de949e36a226/gradio-4.9.1-py3-none-any.whl Install Gradio Python Client from this PR pip install "gradio-client @ git+https://github.com/gradio-app/gradio@6966114b9f36274e85c2196ab551de949e36a226#subdirectory=client/python" |
🦄 change detectedThis Pull Request includes changes to the following packages.
With the following changelog entry.
Maintainers or the PR author can modify the PR title to modify this entry.
|
} | ||
} | ||
} catch (e) { | ||
console.error("Unexpected client exception", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix the HTTPException
errors we keep seeing in the logs.
Explanation
https://www.loom.com/share/5f7abfdb2f2143c58982f11dab2538b2?sid=d176fd1c-b5f5-43f5-be5f-f72af9b3abea
Fix
https://www.loom.com/share/d6b9d87d05b64f69bfb8158ac565593f?sid=ca833ba1-4579-4536-8e14-50396ba3261c
Not sure how to test this actually. Should be good for review though! |
Nice! cc @aliabid94 |
Is this expected to resolve #6713? |
@abidlabs Hard to know for sure since we don't know how to repro those issues. However, I did just verify that exceptions in user functions don't break the queue/app https://www.loom.com/share/0b7d99654b444b6a8ff6d0c320c878ed?sid=bb08ee79-eb06-48f0-822d-c89d0819a1db |
gradio/routes.py
Outdated
except asyncio.CancelledError as e: | ||
del blocks._queue.pending_messages_per_session[session_hash] | ||
await blocks._queue.clean_events(session_hash=session_hash) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has to be BaseException to catch everything - asyncio.CancelledError is not an instance of Exception
event_callbacks[event_id] = callback; | ||
if (!stream_open) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you move this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Figured the stream should be opened after the callback got added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sure, but if a second request is sent while a first one is pending, the stream will be open from the first request, so we shouldn't be expecting it to be the case that the stream is receiving messages for an event only after post request is complete.
Therefore, it can technically happen that the event stream starts sending messages for an event before the POST request to upload data finishes, and therefore the event callback to handle the message is not ready, which is a race condition that I do not know if we are hitting. I think I do resolve this in a PR I'm working on though
client/js/src/client.ts
Outdated
@@ -1014,6 +1053,12 @@ export function api_factory( | |||
event_stream = new EventSource(url); | |||
event_stream.onmessage = async function (event) { | |||
let _data = JSON.parse(event.data); | |||
if (_data.session_hash) { | |||
await Promise.all( | |||
event_ids.map((event_id) => event_callbacks[event_id](_data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only reason we're keeping track of event_ids, we already have that list in the keys of event_callbacks. No need to create a second list right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also would prefer if the logic for targetting every event listener was more explicit than if the session_hash key is present. We don't use the session_hash value anyway. I think it'd make more sense that if an event_id is not provided, we trigger every event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also made the change about checking the absence of event_id to trigger all listeners!
What exceptions were happening? |
We don't know because the client would hang forever. Although I think it's the reason the playwright tests were just hanging forever in the cases they failed. |
20c5461
to
6d0f576
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @aliabid94 ! |
Description
Noticed two things when I was debugging flaky tests:
queue/data/
route the client will hang forever. the server now catches general exceptionsevent_id
, the client would error out, e.g. heartbeat, server_stopped. This wouldn't cause any noticeable errors to users (except console exceptions) but wanted to fix.My guess is that the playwright tests are mysteriously erroring server side so printing the exception message in the UI can help us debug.gr.Info
andgr.Warning
would also raise exceptions in the client. Would not be obvious to users but not desirable anyways.🎯 PRs Should Target Issues
Before your create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.
Not adhering to this guideline will result in the PR being closed.
Tests
PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests:
bash scripts/run_all_tests.sh
You may need to run the linters:
bash scripts/format_backend.sh
andbash scripts/format_frontend.sh