Switch from SSH tunneling to FRP #2509

Merged
merged 52 commits on Dec 14, 2022

Conversation

@abidlabs (Member) commented Oct 21, 2022

Continuing from: #2396

Remaining TODOs:

  • Add back md5 encryption (@Wauplin would you be able to take a look at this?)
  • Add https for the share links (@abidlabs)
  • Fix the connection persistence issue that @freddyaboulton brought up below (@XciD)
  • Add a better page when Gradio Interface is no longer active (@abidlabs)
  • Add tests (@abidlabs)
  • Expire links in 72 hours (@XciD would you be able to take a look at this?)
  • Do some load testing to figure out if we need to set up horizontal scaling (@aliabid94)
  • Don't hardcode IP address and port (@abidlabs)

Also:

XciD and others added 2 commits October 20, 2022 18:01
* FRP Poc

* Gracefully handle exceptions in thread tunneling

* comments

* Fix share error message when files are built locally (#2502)

* fix share error message

* changelog

* formatting

* tunneling rename

* version

* formatting

* remove test

* changelog

* version

Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
Co-authored-by: Wauplin <lucainp@gmail.com>
@abidlabs mentioned this pull request Oct 21, 2022
@github-actions (Contributor)

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2509-all-demos

@freddyaboulton (Collaborator)

This works pretty well on the demos I've tried!

The one thing I noticed that's weird is that the frp client seems to drop the connection to the gradio demo after a couple of minutes of inactivity.

I have this demo running in a Jupyter notebook:

[screenshot: the demo code running in a Jupyter notebook]

It worked great the first few times I tried it locally and with the share link. However, I left it alone for a couple of minutes, and when I went back to the share link URL I got a "Connection Error" and then the "not found" page, even though the demo was still running on my machine.

[screen recording: "Connection Error" followed by the "not found" page after a few minutes of inactivity]

Refreshing the page seems to fix it, but that might be confusing to users. I'm also wondering what would happen if a prediction takes a couple of minutes to run.

@abidlabs (Member, Author)

@Wauplin do you have any ideas of what could be causing the behavior @freddyaboulton is describing?

@abidlabs (Member, Author)

Thanks to @XciD, this is now working with http://*.testing.gradiodash.com/!

@abidlabs (Member, Author)

Added this page to appear if a link expires or is invalid:

[screenshot: the new page shown when a share link has expired or is invalid]

@abidlabs self-assigned this Oct 24, 2022
@Wauplin (Contributor) commented Oct 25, 2022

@abidlabs I've made a small PR to generate the privilege key dynamically based on the timestamp: #2519 (also including some cosmetic changes). It generates exactly the same privilege key as the example that was previously hard-coded. I've tested it locally and the tunnel works fine.
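
For reference, a minimal sketch of what that timestamp-based derivation could look like, assuming frp's usual md5(token + timestamp) scheme; the actual implementation lives in #2519, and the function name here is just illustrative:

```python
import hashlib
import time
from typing import Optional, Tuple

def privilege_key(token: str, timestamp: Optional[int] = None) -> Tuple[str, int]:
    # Assumed scheme: frp derives the key as md5(token + str(unix_timestamp)),
    # so it can be regenerated on every login instead of being hard-coded.
    ts = int(time.time()) if timestamp is None else timestamp
    key = hashlib.md5(f"{token}{ts}".encode()).hexdigest()
    return key, ts
```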

But when I was talking about encryption, I was referring to encrypting the JSON payloads. @XciD, didn't you mention something like that in your presentation? Something about having created a special Docker image where encryption is skipped just for testing, but that we should turn it back on once we have a working version? Or did I hallucinate this? 😄 (related internal slack thread)

@Wauplin (Contributor) commented Oct 25, 2022

The one thing I noticed that's weird is that the frp client seems to drop the connection to the gradio demo after a couple of minutes of inactivity.

@Wauplin do you have any ideas of what could be causing the behavior @freddyaboulton is describing?

I'm sorry, I haven't been able to reproduce this error. I left a tunnel open (from a script in a terminal) with an idle Google Chrome tab connected to it. Tried it again an hour later and didn't get any connection issue 😕

@XciD (Contributor) commented Oct 25, 2022

But when I was talking about encryption, I was referring to encrypting the JSON payloads. @XciD, didn't you mention something like that in your presentation? Something about having created a special Docker image where encryption is skipped just for testing, but that we should turn it back on once we have a working version? Or did I hallucinate this? 😄 (related internal slack thread)

No, you did not hallucinate. I've commented out this code:
https://github.com/huggingface/frp/pull/1/files#diff-6e71c8e7a9485928fab7bf90204fd6404bccf67194bc471d3aa3bcd51f794facR302

Coming from: https://github.com/fatedier/golib/tree/dev/crypto

@aliabid94 (Collaborator) commented Oct 25, 2022

I created a few colab notebooks to do some load testing. This was the setup:

  • 3 identical colab notebooks for INTERFACE GENERATION (1, 2, 3) that create 100 interfaces each with share=True, for a total of 300 interfaces sharing simultaneously. The interface predictions take on average ~15 seconds to run (a random duration between 1 and 30 seconds).
  • 2 identical test notebooks for LOAD TEST (1, 2) that send 5 kb of data at regular intervals in parallel via threads to the list of interfaces generated by the previous notebooks.

I varied the sleep duration between requests in the LOAD TEST notebooks (note: requests are sent in parallel in separate threads, but there is a sleep between each thread launch). The success rate indicates how often the POST requests came back successfully (a rough sketch of this loop is shown below).
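
For context, a minimal sketch of such a load-test loop, assuming a hypothetical /api/predict endpoint and payload shape; the actual notebooks may differ:

```python
import threading
import time
from typing import Dict, List

import requests

def hit(url: str, payload: Dict, results: List[bool]) -> None:
    # Record whether this POST came back successfully.
    try:
        r = requests.post(f"{url}/api/predict", json=payload, timeout=120)
        results.append(r.ok)
    except requests.RequestException:
        results.append(False)

def load_test(share_urls: List[str], payload: Dict, sleep_s: float) -> float:
    results: List[bool] = []
    threads: List[threading.Thread] = []
    for url in share_urls:
        t = threading.Thread(target=hit, args=(url, payload, results))
        t.start()
        threads.append(t)
        time.sleep(sleep_s)  # stagger thread launches, as described above
    for t in threads:
        t.join()
    return sum(results) / len(results)  # success rate
```

With a ~5 kb payload and sleep_s set to 0.5, 0.25, and 0.1, this would reproduce the three scenarios below.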

When I ran both LOAD TEST notebooks simultaneously with the following sleep durations:

  • 0.5 second (avg 60 concurrent requests): 100% success
  • 0.25 second (avg 120 concurrent requests): 99.5% success
  • 0.1 second (avg 240 concurrent requests): 91% success

I believe the drop in success rate is not due to the infrastructure, but because the INTERFACE GENERATION colab notebooks cannot handle the load: the interfaces that failed can no longer accept POST requests, even at slower rates.

I'm not sure if this setup is the ideal way to load test, open to suggestions.

@abidlabs (Member, Author)

Great, thanks @aliabid94! This is very helpful and I think it generally looks promising. Would it be possible to investigate two additional things:

  • What is our capacity for total shared connections? I.e., how many Interfaces with share=True can we support at the same time? This will tell us if we need to add horizontal scaling. Given that we currently see about 2k concurrent requests, we should make sure that our capacity is well above that. Note that it might be faster to use networking.create_tunnel() than to create individual Interfaces (maybe?)
  • Is there any increase in prediction latency as the number of connections increases? Since the Gradio server instance is very small (a t2.micro, I believe), I was thinking the CPU may not handle switching between different connections very efficiently. If so, this might show up as an increase in latency as the number of connections grows. We could fix the prediction time and see whether latency increases as the number of concurrent connections increases (see the latency sketch below).
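
A hedged sketch of the latency measurement suggested in the second point, assuming the same hypothetical /api/predict endpoint and a fixed prediction duration on the server:

```python
import statistics
import time
from typing import Dict

import requests

def median_latency(url: str, payload: Dict, n: int = 20) -> float:
    # With the prediction time held fixed, any growth in this number as more
    # tunnels are opened points at tunnel/server overhead rather than the model.
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(f"{url}/api/predict", json=payload, timeout=120)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```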

@aliabid94
Copy link
Collaborator

  • We max out at 2k concurrent connections, probably a lot less than 2k concurrent requests. The best way to test this would be to launch more INTERFACE GENERATION notebooks; however, Google limits how many Colab sessions I can run. If we can sync up (maybe with one more person) and each run the colabs from our own accounts, we can probably get close to 2k sessions running together.
  • I can test this. Btw, isn't the gradio server a t2.2xlarge?

@Wauplin (Contributor) commented Dec 9, 2022

Following @abidlabs's message on slack (internal link) about issues running the FRP server in a notebook due to asyncio, I investigated it and decided to completely remove the async stuff, as it was more painful than anything.

Pushed the fix in c0b6801. Changes are:

  1. Normal execution of the subprocess (no asyncio)
  2. No need to have a pending thread that runs indefinitely (since there is no loop to run anymore)
  3. Changed the CURRENT_TUNNEL singleton to a CURRENT_TUNNELS list in case someone runs several demos in the same notebook.
  4. Registered a handler to kill the subprocess (see atexit.register) when the script exits. It's not bullet-proof: if Python is killed too abruptly, the cleanup code is never called. In general, some users will end up with stale processes running on their machine, and I don't think we can avoid that 😕
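
A minimal sketch of the subprocess-plus-atexit flow described above; the binary path and frpc arguments are placeholders, and the real logic lives in gradio's tunneling module as of c0b6801:

```python
import atexit
import subprocess
from typing import List

CURRENT_TUNNELS: List[subprocess.Popen] = []  # one entry per running share link

def start_tunnel(frpc_binary: str, frpc_args: List[str]) -> subprocess.Popen:
    # Plain subprocess launch: no asyncio loop and no watcher thread needed.
    proc = subprocess.Popen(
        [frpc_binary, *frpc_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    CURRENT_TUNNELS.append(proc)
    return proc

@atexit.register
def _kill_tunnels() -> None:
    # Best-effort cleanup: this never runs if the interpreter is killed hard
    # (e.g. SIGKILL), which is why stale frpc processes can still be left behind.
    for proc in CURRENT_TUNNELS:
        proc.terminate()
```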

@abidlabs (Member, Author)

Normal execution of the subprocess (no asyncio)
No need to have a pending thread that runs indefinitely (since there is no loop to run anymore)
Changed the CURRENT_TUNNEL singleton to a CURRENT_TUNNELS list in case someone runs several demos in the same notebook.
Registered a handler to kill the subprocess (see atexit.register) when the script exits. It's not bullet-proof: if Python is killed too abruptly, the cleanup code is never called. In general, some users will end up with stale processes running on their machine, and I don't think we can avoid that 😕

Amazing @Wauplin! Testing right now

@abidlabs (Member, Author)

Thank you so much @Wauplin, tested and looks awesome!

I just made a beta release gradio==3.12.0b7. Let's do some more testing early next week and plan to release mid-next week if everything looks good.

@abidlabs (Member, Author)

Testing has gone quite well! @aliabid94 is going to do some load-testing -- assuming that goes well, we should be good to merge tomorrow.

@abidlabs (Member, Author)

Thank you everyone, particularly @XciD and @Wauplin for putting this together. Excited to get this out to all of our users :)

@easrng commented Jan 31, 2023

Is the source for the frpc binary available? https://github.com/fatedier/frp seems to be incompatible with the gradio.live server, and it looks like https://github.com/huggingface/frp is private. Why?

@abidlabs (Member, Author)

As a security precaution, we haven't released the full configuration of our FRPS server. We may consider doing so in the future.

@speaknowpotato commented May 14, 2023

As a security precaution, we haven't released the full configuration of our FRPS server. We may consider doing so in the future.

Hi @abidlabs, is it possible to use my own FRPS server for the share link? For example, instead of using ***.gradio.live, I could have my.example.com to access my Gradio app running locally.
If yes, would you mind sharing some sample FRPS server configuration? Thanks!
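
For anyone exploring a self-hosted setup, a minimal sketch of an frps configuration based on upstream frp's documented frps.ini options; the values are placeholders, and this is not the gradio.live configuration:

```ini
# hypothetical frps.ini for a self-hosted FRP server
[common]
bind_port = 7000              # port that frpc clients connect to
vhost_http_port = 80          # port serving the tunneled HTTP traffic
subdomain_host = example.com  # tunnels are exposed as <subdomain>.example.com
token = change-me             # shared secret between frps and frpc
```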

@abidlabs (Member, Author)

Hi @speaknowpotato, this is something we might consider in the future, but right now it isn't really on our roadmap.
