Why can I send multiple requests at once without a TPM limit? #21359

Open
5 tasks done
vsvn-ThuyTQ opened this issue May 7, 2024 · 0 comments
Labels
🔌: openai Primarily related to OpenAI integrations 🤖:question A specific question about the codebase, product, project, or how to use a feature

Comments


Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import asyncio
import os
from datetime import datetime

from langchain_openai import AzureChatOpenAI

def llm_with_callback():
    return AzureChatOpenAI(
        azure_deployment="gpt-4-32k",  # deployment name must be a string
        azure_endpoint=os.environ.get('AZURE_OPENAI_ENDPOINT'),
        api_key=os.environ.get('AZURE_OPENAI_KEY'),
        api_version="2023-09-01-preview",
        cache=False,
        model_kwargs={"seed": 4},
        max_retries=1,
        temperature=0,
    )

async def test(spec_data, id):
    llm = llm_with_callback()
    now = datetime.now()
    sttime = now.strftime("%H:%M:%S")
    features_raw = await llm.ainvoke(str(spec_data))
    now = datetime.now()
    entime = now.strftime("%H:%M:%S")
    print(f'Start{id}: {sttime}')
    print(f'End{id}: {entime}')
    print(f'Result{id}: {features_raw}')

async def main():
    with open("prompt_sample.txt", 'r', encoding='utf8') as f:
        spec_data = f.read()

    tasks = [
        test(spec_data, "1"),
        test(spec_data, "2"),
        test(spec_data, "3"),
        test(spec_data, "4"),
        test(spec_data, "5"),
        test(spec_data, "6"),
        test(spec_data, "7"),
    ]

    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
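If client-side throttling were wanted, the same fan-out could be capped with a semaphore so only a few requests are in flight at once. A minimal sketch (the `limit` value, `fake_call` stand-in, and `worker` helper are illustrative, not part of the snippet above):

```python
import asyncio

async def fake_call(i: int) -> str:
    # Stand-in for llm.ainvoke(...); sleeps briefly instead of hitting the API.
    await asyncio.sleep(0.01)
    return f"result {i}"

async def main(limit: int = 2) -> list[str]:
    sem = asyncio.Semaphore(limit)  # at most `limit` calls run concurrently

    async def worker(i: int) -> str:
        async with sem:
            return await fake_call(i)

    return await asyncio.gather(*(worker(i) for i in range(7)))

if __name__ == "__main__":
    print(asyncio.run(main()))
```

With `limit=2` the seven tasks still all complete, but no more than two are awaiting the API at any moment, which keeps per-minute token usage spread out.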

Error Message and Stack Trace (if applicable)

No response

Description

I calculated that each request costs about 20,214 prompt tokens and 358 completion tokens. The TPM limit for gpt-4-32k is 80k tokens per minute. So why, when I send 7 requests simultaneously within the same minute, is none of them blocked or rate-limited?
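For reference, the expected total from the figures above can be worked out directly (the per-request token counts and the 80k TPM limit are taken from the description; the variable names are illustrative):

```python
# Rough arithmetic for the scenario described above.
prompt_tokens = 20214      # per-request prompt tokens (from the description)
completion_tokens = 358    # per-request completion tokens
requests = 7
tpm_limit = 80_000         # stated TPM limit for gpt-4-32k

total_tokens = requests * (prompt_tokens + completion_tokens)
print(total_tokens)              # 144004
print(total_tokens > tpm_limit)  # True: well above the stated limit
```

So the seven simultaneous requests total 144,004 tokens, roughly 1.8x the stated 80k TPM limit, which is why one would expect at least some of them to be rejected with a 429 rate-limit error.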

System Info

AzureChatOpenAI
langchain

@dosubot dosubot bot added 🔌: openai Primarily related to OpenAI integrations 🤖:question A specific question about the codebase, product, project, or how to use a feature labels May 7, 2024