
OpenAI async support #15

Open
bcordo opened this issue Jan 8, 2024 · 7 comments · Fixed by #16

Comments

@bcordo commented Jan 8, 2024

@ishanjainn

What happened?
I'm following this guide: https://grafana.com/blog/2023/11/02/monitor-your-openai-usage-with-grafana-cloud/. It works great with the normal synchronous OpenAI client (from openai import OpenAI; client = OpenAI(api_key=OPENAI_API_KEY)), but it doesn't work with the asynchronous client (from openai import AsyncOpenAI; aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)).

I get the error:

Exception has occurred: AttributeError
'coroutine' object has no attribute 'usage'
  File ..., in async_call_chatgpt
    response = await aclient.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "", line 1, in
AttributeError: 'coroutine' object has no attribute 'usage'

It has to do with the fact that internally the wrapper isn't using await, so it isn't expecting a coroutine.
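
As a quick standalone illustration (names here are made up, not the library's code): calling an async function without await returns a coroutine object, which has no usage attribute until it is awaited.

import asyncio

async def create(**kwargs):
    # Stand-in for AsyncOpenAI's chat.completions.create
    return "response"

coro = create()           # no await: this is a coroutine object, not a response
print(type(coro))         # <class 'coroutine'>
# coro.usage              # would raise AttributeError, exactly as in the traceback
print(asyncio.run(coro))  # running/awaiting the coroutine yields the real value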

What was expected to happen?
I was expecting it to also work, or to have access to an async version of the monitoring code.

Steps to reproduce the problem:

  1. Run:

     from openai import AsyncOpenAI

     aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)
     aclient.chat.completions.create = chat_v2.monitor(
         aclient.chat.completions.create,
         metrics_url="[redacted]",
         metrics_username=[redacted],
         logs_username=[redacted],
         access_token="[redacted]",
     )

  2. Run:

     response = await aclient.chat.completions.create(
         model=model,
         temperature=temperature,
         max_tokens=max_tokens,
         n=max_responses,
         top_p=top_p,
         frequency_penalty=frequency_penalty,
         presence_penalty=presence_penalty,
         messages=messages,
         stream=stream,
     )

  3. Get the error:

     Exception has occurred: AttributeError
     'coroutine' object has no attribute 'usage'
       File ..., in async_call_chatgpt
         response = await aclient.chat.completions.create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "", line 1, in
     AttributeError: 'coroutine' object has no attribute 'usage'
  3. Get error:
    Exception has occurred: AttributeError
    'coroutine' object has no attribute 'usage'
    File ... in asynccall_chatgpt
    response = await aclient.chat.completions.create(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "", line 1, in
    AttributeError: 'coroutine' object has no attribute 'usage'

Version numbers (grafana, prometheus, graphite, plugins, operating system, etc.):
Mac, grafana cloud, OpenAI plugin

@ishanjainn (Member) commented Jan 10, 2024

Hey @bcordo, thanks for raising this issue. I've added support for AsyncOpenAI; it should be available in Python library version 0.0.8. To use it with AsyncOpenAI, you'll now need to pass a flag to the function, use_async, and set its value to True. Here's a snippet:

from openai import OpenAI
import asyncio
from grafana_openai_monitoring import chat_v2

client = OpenAI(
    api_key="sk-***",
)

# Apply the custom decorator to the OpenAI API function
client.chat.completions.create = chat_v2.monitor(
    client.chat.completions.create,
    metrics_url="https://prometheus.grafana.net/api/prom",
    logs_url="https://logs.grafana.net/loki/api/v1/push",
    metrics_username=123456,
    logs_username=987654,
    access_token="glc_ey....",
    use_async=True,  # Set to True if the function is async
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )

    print(chat_completion)

asyncio.run(main())

Please feel free to reopen the issue if you still run into problems.
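
Roughly, the flag selects an async-aware wrapper that awaits the response before reading usage. A simplified sketch of the idea (illustrative only, not the library's actual source; record_usage is a stand-in for the metrics/logs export):

def monitor(func, metrics_url=None, logs_url=None, metrics_username=None,
            logs_username=None, access_token=None, use_async=False):
    def record_usage(usage):
        # Stand-in for shipping metrics and logs to Grafana Cloud
        print("prompt tokens:", usage.prompt_tokens)

    if use_async:
        async def async_wrapper(*args, **kwargs):
            response = await func(*args, **kwargs)  # await first, then read .usage
            record_usage(response.usage)
            return response
        return async_wrapper

    def sync_wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        record_usage(response.usage)
        return response
    return sync_wrapper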

@bcordo (Author) commented Jan 10, 2024

Wow @ishanjainn that was super fast! I appreciate it, you're awesome. Thanks.

@bcordo (Author) commented Jan 10, 2024

Hi @ishanjainn. Thanks for your fast response. I ran your sample code:

import os
from openai import OpenAI
import asyncio
from grafana_openai_monitoring import chat_v2
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GRAFANA_API_KEY = os.getenv("GRAFANA_API_KEY")
GRAFANA_METRICS_URL = os.getenv("GRAFANA_METRICS_URL")
GRAFANA_LOGS_URL = os.getenv("GRAFANA_LOGS_URL")
GRAFANA_METRICS_USERNAME = os.getenv("GRAFANA_METRICS_USERNAME")
GRAFANA_LOGS_USERNAME = os.getenv("GRAFANA_LOGS_USERNAME")

client = OpenAI(api_key=OPENAI_API_KEY)

# Apply the custom decorator to the OpenAI API client functions to measure usage
client.chat.completions.create = chat_v2.monitor(
    client.chat.completions.create,
    metrics_url=GRAFANA_METRICS_URL,
    logs_url=GRAFANA_LOGS_URL,
    metrics_username=GRAFANA_METRICS_USERNAME,
    logs_username=GRAFANA_LOGS_USERNAME,
    access_token=GRAFANA_API_KEY,
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )

    print(chat_completion)

asyncio.run(main())

But I get the error:

Traceback (most recent call last):
  File "test.py", line 48, in <module>
    asyncio.run(main())
  File ".conda/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "test.py", line 36, in main
    chat_completion = await client.chat.completions.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object ChatCompletion can't be used in 'await' expression

I thought there may have been a typo so I reran using AsyncOpenAI instead of OpenAI:

client = AsyncOpenAI(
    api_key=OPENAI_API_KEY,
)

And I get the error:

Traceback (most recent call last):
  File "test.py", line 48, in <module>
    asyncio.run(main())
  File ".conda/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "test.py", line 36, in main
    chat_completion = await client.chat.completions.create(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/site-packages/grafana_openai_monitoring/chat_v2.py", line 161, in wrapper
    response.usage.prompt_tokens,
    ^^^^^^^^^^^^^^
AttributeError: 'coroutine' object has no attribute 'usage'
sys:1: RuntimeWarning: coroutine 'AsyncCompletions.create' was never awaited

And just to make sure I was using the updated version, which seems right:

pip show grafana_openai_monitoring 
Name: grafana-openai-monitoring
Version: 0.0.8
Summary: Library to monitor your OpenAI usage and send metrics and logs to Grafana Cloud
Home-page: https://github.com/grafana/grafana-openai-monitoring
Author: Ishan Jain
Author-email: ishan.jain@grafana.com
License: 
Location: .conda/lib/python3.11/site-packages
Requires: requests

Maybe I am just missing something here.

@ishanjainn (Member) commented Jan 10, 2024

use_async=True

This seems to be missing.

@bcordo (Author) commented Jan 15, 2024

@ishanjainn Thanks again for the quick response. Yes, that worked, thanks. The remaining problem was in my main script, which still had the error even with use_async=True, and now I see the difference: I need to use the streaming API, and when you add stream=True you get the error. In detail:

import os
import asyncio

from openai import OpenAI
from grafana_openai_monitoring import chat_v2
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GRAFANA_API_KEY = os.getenv("GRAFANA_API_KEY")
GRAFANA_METRICS_URL = os.getenv("GRAFANA_METRICS_URL")
GRAFANA_LOGS_URL = os.getenv("GRAFANA_LOGS_URL")
GRAFANA_METRICS_USERNAME = os.getenv("GRAFANA_METRICS_USERNAME")
GRAFANA_LOGS_USERNAME = os.getenv("GRAFANA_LOGS_USERNAME")

client = OpenAI(api_key=OPENAI_API_KEY)

# Apply the custom decorator to the OpenAI API client functions to measure usage
client.chat.completions.create = chat_v2.monitor(
    client.chat.completions.create,
    metrics_url=GRAFANA_METRICS_URL,
    logs_url=GRAFANA_LOGS_URL,
    metrics_username=GRAFANA_METRICS_USERNAME,
    logs_username=GRAFANA_LOGS_USERNAME,
    access_token=GRAFANA_API_KEY,
    use_async=True,
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
        stream=True,
    )
    
    for chunk in chat_completion:
        current_content = chunk.choices[0].delta.content
        print(current_content)

asyncio.run(main())
which fails with:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "test.py", line 51, in <module>
    asyncio.run(main())
  File ".conda/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "test.py", line 37, in main
    chat_completion = await client.chat.completions.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".conda/lib/python3.11/site-packages/grafana_openai_monitoring/chat_v2.py", line 65, in async_wrapper
    response.usage.prompt_tokens,
    ^^^^^^^^^^^^^^
AttributeError: 'Stream' object has no attribute 'usage'

I guess the problem is the one reported in How_to_stream_completions.ipynb:

Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.

Any good solutions to this? Thanks again.

ishanjainn reopened this Jan 15, 2024
@ishanjainn (Member) commented Jan 15, 2024

Hey @bcordo, for streaming responses it's more than token usage: the library doesn't yet support streaming at all, since streaming needs somewhat different processing. We have that tracked as an open issue, #13. I'll see what can be done for it.

For token calculation when streaming, yes, tiktoken is the way to go (although I don't think it's 100% accurate either, it gives a near-enough estimate).
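
For example, something along these lines once the streamed chunks have been collected (a sketch; chunks stands in for the delta.content values you accumulate, and the count is only an estimate):

import tiktoken

# Placeholder for the delta.content pieces accumulated from the stream
chunks = ["This ", "is ", "a ", "test."]
completion_text = "".join(chunks)

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
completion_tokens = len(encoding.encode(completion_text))
print("estimated completion tokens:", completion_tokens)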

@ishanjainn (Member) commented

I've also reopened this issue while we get streaming implemented here. Thanks!
