Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting the CI to pass consistently on Windows (hopefully) #4624

Merged
merged 19 commits into from Jun 22, 2023

Conversation

abidlabs
Copy link
Member

@abidlabs abidlabs commented Jun 21, 2023

We had an issue where the backend CI tests on Windows would sometimes time out, maybe 40% of the time. Upon investigation, the issue was that one of the tests would try to run a Gradio demo on a port that was not available:

image

This shouldn't have been possible, because we first check to find an available port get_first_available_port method. However, it was possible that a port could become occupied between the time this method is run and the Gradio server launches. In that case, the server launch would silently fail and the script would hang forever.

What this PR does is:

  • Decrease the steps between the method call and server launch
  • More importantly, it also adds a server timeout -- if the server is not successfully launched within 5 seconds, then next port is tried
  • Also does some minor cleanups of the code

This PR is a little tricky to test besides re-running the CI, which I've done several times and it seems like everything passes consistently now.

-- also feel free to merge to this branch directory, or merge this PR in if ready, as I'll be OOO. Thanks!

@gradio-pr-bot
Copy link
Contributor

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-4624-all-demos

while not self.started:
time.sleep(1e-3)
if time.time() - start > 5:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes <10 ms on my computer to start the server. So 5 seconds is a very generous allotment for slow computers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be good to add "to start within 5 seconds." for more context. Will push this up since you're away

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this message is not displayed to the user so won't make the change


for port in server_ports:
try:
# The fastest way to check if a port is available is to try to bind to it with socket.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the copious comments, but I wanted to make sure the new logic is clear to someone who is reading the code for the first time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate it!

@abidlabs abidlabs changed the title [Test PR] Debugging Windows Jobs Getting the CI to pass consistently on Windows (hopefully) Jun 22, 2023
server_name = server_name or LOCALHOST_NAME
# if port is not specified, search for first available port
if server_port is None:
port = get_first_available_port(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't removed this method get_first_available_port from the code since we use it in reload mode. I do think we need to overhaul reload mode at some point, and then we could remove this method too

Copy link
Member

@pngwn pngwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave the code for others to review properly but looks good to me and seems to work.

@gradio-pr-bot
Copy link
Contributor

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-4624-all-demos

Copy link
Collaborator

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abidlabs !

Tested reload mode, server_name, and the repro from pr #3675 and everything is working correctly

@freddyaboulton freddyaboulton merged commit 86dc064 into main Jun 22, 2023
11 checks passed
@freddyaboulton freddyaboulton deleted the windows-jobs branch June 22, 2023 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants