Describe the bug

I've been experimenting with ChatGPT's async capabilities, and I'm curious about how it handles multiple simultaneous inputs. Specifically, I'm trying to understand whether ChatGPT truly processes inputs concurrently or handles them sequentially.
Experiment 1: I set up an async function with a 4-second sleep and opened 4 sessions simultaneously to see whether the responses would come back at the same time. My assumption was that if requests are truly processed in parallel, each session would return its result relative to the time it was initiated.
Result: The responses came back at 4-second intervals, suggesting that the requests are not processed in parallel.
Experiment 2: I repeated the same setup, wondering whether the absence of an asyncio.run wrapper on Gradio's side could be causing the issue.
Result: The outcome was the same as in the first experiment.
Gradio.multiple_mp4.mp4
Question: Am I misunderstanding something here? If my experiment is correct, does it mean that when Gradio is hosted on a server and many users connect, everyone has to wait for the preceding conversations to complete before receiving their responses? Is this how it's supposed to work? Or, when hosted on Hugging Face, is Gradio deployed across several servers, so that users connect to whichever host is available?
Reproduction

import gradio as gr
import asyncio

async def async_process_prompt(prompt):
    await asyncio.sleep(4)  # async simulation
    return prompt

async def process_prompt(prompt):
    try:
        return await asyncio.wait_for(async_process_prompt(prompt), timeout=6)
    except asyncio.TimeoutError:
        return "Processing timed out."

iface = gr.Interface(
    fn=process_prompt,
    inputs="text",
    outputs="text",
    title="Async Prompt Processor with Timeout",
    description="Enter a prompt to process asynchronously. Times out after 6 seconds.",
)
iface.launch()
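For what it's worth, the 4-second spacing can be reproduced without Gradio at all: if all requests to one function are funneled through a single worker, they serialize. Below is a minimal asyncio sketch of that idea; the semaphore, the 0.2-second job length, and the function names are illustrative assumptions, not Gradio internals.

```python
import asyncio
import time

async def handler(prompt: str, sem: asyncio.Semaphore) -> str:
    # The semaphore stands in for a per-function worker: with a capacity
    # of 1, concurrent requests queue up and run one at a time.
    async with sem:
        await asyncio.sleep(0.2)  # stands in for the 4-second job
        return prompt

async def simulate(n_requests: int, worker_capacity: int) -> float:
    sem = asyncio.Semaphore(worker_capacity)
    start = time.monotonic()
    await asyncio.gather(*(handler(f"user-{i}", sem) for i in range(n_requests)))
    return time.monotonic() - start

# Four "sessions" against one worker finish back to back (~0.8 s total),
# while four workers let them all finish together (~0.2 s total).
serial = asyncio.run(simulate(4, 1))
parallel = asyncio.run(simulate(4, 4))
print(f"capacity 1: {serial:.2f}s, capacity 4: {parallel:.2f}s")
```

With capacity 1 the total time grows linearly with the number of sessions, which matches the 4-second intervals observed above.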
Screenshot
No response
Logs
No response
System Info
gradio version = '4.2.0'
Severity
I can work around it
Hi @podkd7226, the reason for this is that by default, Gradio lets each backend function execute only once at a time. That is, a single worker is assigned to the function, and while that worker is busy executing it, no other requests for the same function can start until the first one finishes.
You can change this behavior by setting the concurrency_limit parameter in Interface. For example, this code should allow up to 10 executions of your function to run in parallel:
import gradio as gr
import asyncio

async def async_process_prompt(prompt):
    await asyncio.sleep(4)  # async simulation
    return prompt

async def process_prompt(prompt):
    try:
        return await asyncio.wait_for(async_process_prompt(prompt), timeout=6)
    except asyncio.TimeoutError:
        return "Processing timed out."

iface = gr.Interface(
fn=process_prompt,
inputs="text",
outputs="text",
title="Async Prompt Processor with Timeout",
description="Enter a prompt to process asynchronously. Times out after 6 seconds.",
concurrency_limit=10
)
iface.launch()
Note that increasing the concurrency_limit won't cause any issues in this mock example, but if you were running a "real" function that, for example, consumed your GPU, you should set the concurrency limit so that you don't run out of memory.
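To make the effect of concurrency_limit concrete, here is a hedged, Gradio-free model of it: a limit of k behaves like a pool of k workers, so n identical jobs drain in roughly ceil(n/k) waves. The job durations and counts below are made up for illustration.

```python
import asyncio
import math
import time

async def job(sem: asyncio.Semaphore, duration: float) -> None:
    async with sem:
        await asyncio.sleep(duration)

async def run_batch(n_jobs: int, limit: int, duration: float = 0.1) -> float:
    # `limit` models concurrency_limit: at most `limit` executions are in
    # flight at any moment; the remaining requests wait in the queue.
    sem = asyncio.Semaphore(limit)
    start = time.monotonic()
    await asyncio.gather(*(job(sem, duration) for _ in range(n_jobs)))
    return time.monotonic() - start

# 10 queued jobs with a limit of 3 drain in ceil(10/3) = 4 waves of ~0.1 s.
elapsed = asyncio.run(run_batch(n_jobs=10, limit=3))
print(f"elapsed: {elapsed:.2f}s, expected about {math.ceil(10 / 3) * 0.1:.1f}s")
```

Raising the limit shortens the total wall-clock time but increases peak resource use, which is why a GPU-bound function needs a limit sized to its memory budget.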