
Understanding Async Behavior in ChatGPT with Gradio Interface #6749

Closed · 1 task done
podkd7226 opened this issue Dec 11, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@podkd7226

Describe the bug

I've been experimenting with the async behavior of a ChatGPT-style app built with Gradio, and I'm curious how it handles multiple simultaneous inputs. Specifically, I'm trying to understand whether Gradio truly processes inputs concurrently or handles them sequentially.

Experiment 1: I set up an async function that sleeps for 4 seconds and opened 4 sessions simultaneously to see whether the responses would come back at the same time. My assumption was that if the requests are truly processed in parallel, each session would return its result based on when it was initiated.

Result: The responses came back at 4-second intervals, suggesting that the requests were handled sequentially rather than in parallel.

Experiment 2: I repeated the same setup, but wondered whether Gradio's lack of asyncio.run wrapping could be causing the issue.

Result: The outcome was the same as the first experiment.

(Attached video: Gradio.multiple_mp4.mp4)

Question: Am I misunderstanding something here? If my experiment is correct, does it mean that when Gradio is hosted on a server, and many users connect, everyone has to wait for the preceding conversations to complete before receiving their responses? Is this how it's supposed to work? Or when hosted on Hugging Face, is Gradio deployed on several different servers, allowing multiple users to connect to individual hosts that are available?

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

import gradio as gr
import asyncio

async def async_process_prompt(prompt):
    await asyncio.sleep(4)  # async simulation
    return prompt

async def process_prompt(prompt):
    try:
        return await asyncio.wait_for(async_process_prompt(prompt), timeout=6)
    except asyncio.TimeoutError:
        return "Processing timed out."

iface = gr.Interface(
    fn=process_prompt,
    inputs="text",
    outputs="text",
    title="Async Prompt Processor with Timeout",
    description="Enter a prompt to process asynchronously. Times out after 6 seconds.",
)

iface.launch()
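
To reproduce the timing observation without opening several browser tabs, a client-side driver along these lines can be used (a rough sketch: it assumes the app above is running locally at the default http://127.0.0.1:7860, that the gradio_client package is installed, and that the endpoint uses gr.Interface's default API name "/predict"):

import threading
import time

from gradio_client import Client  # pip install gradio_client

URL = "http://127.0.0.1:7860"  # assumed default local launch address

def call(i):
    client = Client(URL)
    start = time.time()
    # "/predict" is the default API name for a single-function gr.Interface
    result = client.predict(f"hello from session {i}", api_name="/predict")
    print(f"session {i}: got {result!r} after {time.time() - start:.1f}s")

threads = [threading.Thread(target=call, args=(i,)) for i in range(4)]
overall = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"all sessions finished in {time.time() - overall:.1f}s")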

Screenshot

No response

Logs

No response

System Info

gradio version = '4.2.0'

Severity

I can work around it

@podkd7226 podkd7226 added the bug Something isn't working label Dec 11, 2023
@abidlabs (Member)

Hi @podkd7226, the reason for this is that, by default, Gradio only lets each backend function execute one request at a time. That is, a single worker is assigned to that function, and while that worker is executing a request, no other requests for the same function can start until the first one is finished.

You can change this behavior by setting the concurrency_limit parameter in Interface. For example, this code should allow you to run up to 10 executions of your function in parallel:

import gradio as gr
import asyncio

async def async_process_prompt(prompt):
    await asyncio.sleep(4) # async simulation
    return prompt

async def process_prompt(prompt):
    try:
        return await asyncio.wait_for(async_process_prompt(prompt), timeout=6)
    except asyncio.TimeoutError:
        return "Processing timed out."

iface = gr.Interface(
    fn=process_prompt,
    inputs="text",
    outputs="text",
    title="Async Prompt Processor with Timeout",
    description="Enter a prompt to process asynchronously. Times out after 5 seconds.",
    concurrency_limit=10
)

iface.launch()

Note that increasing the concurrency_limit won't cause any issues in this mock example, but if you were running a "real" function that consumed your GPU, for example, then you should set the concurrency limit carefully to ensure that you don't run out of memory, etc.
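
For reference, the same idea carries over to gr.Blocks: each event listener (e.g. .click()) accepts its own concurrency_limit, and queue() takes a default_concurrency_limit for events that don't set one. A rough sketch against the Gradio 4.x API:

import gradio as gr
import asyncio

async def process_prompt(prompt):
    await asyncio.sleep(4)  # simulate slow async work
    return prompt

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    output = gr.Textbox(label="Result")
    btn = gr.Button("Process")
    # per-event limit: up to 10 concurrent executions of this handler
    btn.click(process_prompt, inputs=prompt, outputs=output, concurrency_limit=10)

# default for any event that doesn't set its own concurrency_limit
demo.queue(default_concurrency_limit=5)
demo.launch()

As above, pick limits that match whatever resources the real function consumes.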

@podkd7226 (Author)

Thank you so much. That was extremely helpful!
