Switch from SSH tunneling to FRP #2509

Merged
merged 52 commits on Dec 14, 2022

Conversation

@abidlabs (Member) commented Oct 21, 2022

Continuing from: #2396

Remaining TODOs:

  • Add back md5 encryption (@Wauplin would you be able to take a look at this?)
  • Add https for the share links (@abidlabs)
  • Fix the connection persistence issue that @freddyaboulton brought up below (@XciD)
  • Add a better page when Gradio Interface is no longer active (@abidlabs)
  • Add tests (@abidlabs)
  • Expire links in 72 hours (@XciD would you be able to take a look at this?)
  • Do some load testing to figure out if we need to set up horizontal scaling (@aliabid94)
  • Don't hardcode IP address and port (@abidlabs)

Also:

XciD and others added 2 commits October 20, 2022 18:01
* FRP Poc

* Gracefully handle exceptions in thread tunneling

* comments

* Fix share error message when files are built locally (#2502)

* fix share error message

* changelog

* formatting

* tunneling rename

* version

* formatting

* remove test

* changelog

* version

Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
Co-authored-by: Wauplin <lucainp@gmail.com>
@abidlabs mentioned this pull request Oct 21, 2022
@github-actions (Contributor)

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2509-all-demos

@freddyaboulton (Collaborator)

This works pretty well on the demos I've tried!

The one thing I noticed that's weird is that the frp client seems to drop the connection to the gradio demo after a couple of minutes of inactivity.

I have this demo running in a Jupyter notebook:

[screenshot: the demo code running in a Jupyter notebook]

It worked great the first few times I tried it locally and with the share link. However, I left it alone for a couple of minutes, and when I went back to the share link URL I got a "Connection Error" and then the "not found" page, even though the demo was still running on my machine.

[screen recording: "Connection Error" followed by the "not found" page after a few minutes of inactivity]

Refreshing the page seems to fix it, but that might be confusing to users. I'm also wondering what would happen if a prediction takes a couple of minutes to run.

@abidlabs (Member, Author)

@Wauplin do you have any ideas of what could be causing the behavior @freddyaboulton is describing?

@abidlabs (Member, Author)

Thanks to @XciD, this is now working with http://*.testing.gradiodash.com/!

@abidlabs (Member, Author)

Added this page to appear if a link expires or is invalid:

[screenshot: the new page shown when a share link has expired or is invalid]

@abidlabs self-assigned this Oct 24, 2022
@Wauplin (Contributor) commented Oct 25, 2022

@abidlabs I've made a small PR to generate the privilege key dynamically based on the timestamp: #2519 (also including some cosmetic changes). It generates exactly the same privilege key as the example that was previously hard-coded. I've tested it locally and the tunnel works fine.
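
For reference, a minimal sketch of what that timestamp-based derivation could look like, assuming frp's usual md5(token + timestamp) scheme; the actual implementation lives in #2519, and the function name here is just illustrative:

```python
import hashlib
import time
from typing import Optional, Tuple

def privilege_key(token: str, timestamp: Optional[int] = None) -> Tuple[str, int]:
    # Assumed scheme: frp derives the key as md5(token + str(unix_timestamp)),
    # so it can be regenerated on every login instead of being hard-coded.
    ts = int(time.time()) if timestamp is None else timestamp
    key = hashlib.md5(f"{token}{ts}".encode()).hexdigest()
    return key, ts
```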

But when I was talking about encryption, I was referring to encrypting the JSON payloads. @XciD, didn't you mention something like that in your presentation? Something about having created a special Docker image where encryption is skipped just for testing, but that we should turn it back on once we have a working version? Or did I hallucinate this? 😄 (related internal slack thread)

@Wauplin (Contributor) commented Oct 25, 2022

The one thing I noticed that's weird is that the frp client seems to drop the connection to the gradio demo after a couple of minutes of inactivity.

@Wauplin do you have any ideas of what could be causing the behavior @freddyaboulton is describing?

I'm sorry, I haven't been able to reproduce this error. I left a tunnel open (from a script in a terminal) with an idle Google Chrome tab connected to it. Tried it again an hour later and didn't get any connection issue 😕

@XciD (Contributor) commented Oct 25, 2022

But when I was talking about encryption, I was referring to encrypting the JSON payloads. @XciD, didn't you mention something like that in your presentation? Something about having created a special Docker image where encryption is skipped just for testing, but that we should turn it back on once we have a working version? Or did I hallucinate this? 😄 (related internal slack thread)

No, you did not hallucinate. I've commented out this code:
https://github.com/huggingface/frp/pull/1/files#diff-6e71c8e7a9485928fab7bf90204fd6404bccf67194bc471d3aa3bcd51f794facR302

Coming from: https://github.com/fatedier/golib/tree/dev/crypto

@aliabid94 (Collaborator) commented Oct 25, 2022

I created a few colab notebooks to do some load testing. This was the setup:

  • 3 identical colab notebooks for INTERFACE GENERATION (1, 2, 3) that create 100 interfaces each with share=True, for a total of 300 interfaces sharing simultaneously. The interface predictions take on average ~15 seconds to run (a random duration between 1 and 30 seconds).
  • 2 identical test notebooks for LOAD TEST (1, 2) that send 5 kb of data at regular intervals in parallel via threads to the list of interfaces generated by the previous notebooks.

I varied the sleep duration between requests in the LOAD TEST notebooks (note: requests are sent in parallel in separate threads, but there is a sleep between each thread launch). The success rate indicates how often the POST requests came back successfully (a rough sketch of this loop is shown below).
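
For context, a minimal sketch of such a load-test loop, assuming a hypothetical /api/predict endpoint and payload shape; the actual notebooks may differ:

```python
import threading
import time
from typing import Dict, List

import requests

def hit(url: str, payload: Dict, results: List[bool]) -> None:
    # Record whether this POST came back successfully.
    try:
        r = requests.post(f"{url}/api/predict", json=payload, timeout=120)
        results.append(r.ok)
    except requests.RequestException:
        results.append(False)

def load_test(share_urls: List[str], payload: Dict, sleep_s: float) -> float:
    results: List[bool] = []
    threads: List[threading.Thread] = []
    for url in share_urls:
        t = threading.Thread(target=hit, args=(url, payload, results))
        t.start()
        threads.append(t)
        time.sleep(sleep_s)  # stagger thread launches, as described above
    for t in threads:
        t.join()
    return sum(results) / len(results)  # success rate
```

With a ~5 kb payload and sleep_s set to 0.5, 0.25, and 0.1, this would reproduce the three scenarios below.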

When I ran both LOAD TEST notebooks simultaneously with the following sleep durations:

  • 0.5 second (avg 60 concurrent requests): 100% success
  • 0.25 second (avg 120 concurrent requests): 99.5% success
  • 0.1 second (avg 240 concurrent requests): 91% success

I believe the drop in success rate is not due to the infrastructure, but because the INTERFACE GENERATION colab notebooks cannot handle the load: the interfaces that failed can no longer accept POST requests, even at slower rates.

I'm not sure if this setup is the ideal way to load test, open to suggestions.

@abidlabs (Member, Author)

Great, thanks @aliabid94! This is very helpful and I think it generally looks promising. Would it be possible to investigate two additional things:

  • What is our capacity for total shared connections? I.e., how many Interfaces with share=True can we support at the same time? This will tell us if we need to add horizontal scaling. Given that we currently see about 2k concurrent requests, we should make sure that our capacity is well above that. Note that it might be faster to use networking.create_tunnel() than to create individual Interfaces (maybe?)
  • Is there any increase in prediction latency as the number of connections increases? Since the Gradio server instance is very small (a t2.micro, I believe), I was thinking the CPU may not handle switching between different connections very efficiently. If so, this might show up as an increase in latency as the number of connections grows. We could fix the prediction time and see whether latency increases as the number of concurrent connections increases (see the latency sketch below).
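
A hedged sketch of the latency measurement suggested in the second point, assuming the same hypothetical /api/predict endpoint and a fixed prediction duration on the server:

```python
import statistics
import time
from typing import Dict

import requests

def median_latency(url: str, payload: Dict, n: int = 20) -> float:
    # With the prediction time held fixed, any growth in this number as more
    # tunnels are opened points at tunnel/server overhead rather than the model.
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(f"{url}/api/predict", json=payload, timeout=120)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```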

@aliabid94
Copy link
Collaborator

  • We max out at 2k concurrent connections, probably a lot less than 2k concurrent requests. The best way to test this would be to launch more INTERFACE GENERATION notebooks; however, Google limits how many Colab sessions I can run. If we can sync up (maybe with one more person) and each run the colabs from our own accounts, we can probably get close to 2k sessions running together.
  • I can test this. Btw, isn't the gradio server a t2.2xlarge?

@Wauplin (Contributor) commented Dec 9, 2022

Following @abidlabs's message on slack (internal link) about issues running the FRP server in a notebook due to asyncio, I investigated it and decided to completely remove the async stuff, as it was more painful than anything.

Pushed the fix in c0b6801. Changes are:

  1. Normal execution of the subprocess (no asyncio)
  2. No need to have a pending thread that runs indefinitely (since there is no loop to run anymore)
  3. Changed the CURRENT_TUNNEL singleton to a CURRENT_TUNNELS list in case someone runs several demos in the same notebook.
  4. Registered a handler to kill the subprocess (see atexit.register) when the script exits. It's not bullet-proof: if Python is killed too abruptly, the cleanup code is never called. In general, some users will end up with stale processes running on their machine, and I don't think we can avoid that 😕
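
A minimal sketch of the subprocess-plus-atexit flow described above; the binary path and frpc arguments are placeholders, and the real logic lives in gradio's tunneling module as of c0b6801:

```python
import atexit
import subprocess
from typing import List

CURRENT_TUNNELS: List[subprocess.Popen] = []  # one entry per running share link

def start_tunnel(frpc_binary: str, frpc_args: List[str]) -> subprocess.Popen:
    # Plain subprocess launch: no asyncio loop and no watcher thread needed.
    proc = subprocess.Popen(
        [frpc_binary, *frpc_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    CURRENT_TUNNELS.append(proc)
    return proc

@atexit.register
def _kill_tunnels() -> None:
    # Best-effort cleanup: this never runs if the interpreter is killed hard
    # (e.g. SIGKILL), which is why stale frpc processes can still be left behind.
    for proc in CURRENT_TUNNELS:
        proc.terminate()
```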

@abidlabs (Member, Author)

Normal execution of the subprocess (no asyncio)
No need to have a pending thread that runs indefinitely (since there is no loop to run anymore)
Changed the CURRENT_TUNNEL singleton to a CURRENT_TUNNELS list in case someone runs several demos in the same notebook.
Registered a handler to kill the subprocess (see atexit.register) when the script exits. It's not bullet-proof: if Python is killed too abruptly, the cleanup code is never called. In general, some users will end up with stale processes running on their machine, and I don't think we can avoid that 😕

Amazing @Wauplin! Testing right now

@abidlabs (Member, Author)

Thank you so much @Wauplin, tested and looks awesome!

I just made a beta release gradio==3.12.0b7. Let's do some more testing early next week and plan to release mid-next week if everything looks good.

@abidlabs (Member, Author)

Testing has gone quite well! @aliabid94 is going to do some load-testing -- assuming that goes well, we should be good to merge tomorrow.

@abidlabs (Member, Author)

Thank you everyone, particularly @XciD and @Wauplin for putting this together. Excited to get this out to all of our users :)

@easrng commented Jan 31, 2023

Is the source for the frpc binary available? https://github.com/fatedier/frp seems to be incompatible with the gradio.live server, and it looks like https://github.com/huggingface/frp is private. Why?

@abidlabs (Member, Author)

As a security precaution, we haven't released the full configuration of our FRPS server. We may consider doing so in the future.

@speaknowpotato commented May 14, 2023

As a security precaution, we haven't released the full configuration of our FRPS server. We may consider doing so in the future.

Hi @abidlabs, is it possible to use my own FRPS server for the share link? For example, instead of using ***.gradio.live, I could have my.example.com to access my Gradio app running locally.
If yes, would you mind sharing some sample FRPS server configuration? Thanks!
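
For anyone exploring a self-hosted setup, a minimal sketch of an frps configuration based on upstream frp's documented frps.ini options; the values are placeholders, and this is not the gradio.live configuration:

```ini
# hypothetical frps.ini for a self-hosted FRP server
[common]
bind_port = 7000              # port that frpc clients connect to
vhost_http_port = 80          # port serving the tunneled HTTP traffic
subdomain_host = example.com  # tunnels are exposed as <subdomain>.example.com
token = change-me             # shared secret between frps and frpc
```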

@abidlabs (Member, Author)

Hi @speaknowpotato, this is something we might consider in the future, but right now it isn't really on our roadmap.
