
bug: ValueError: Usage object must have either {input, output, total, unit} or {promptTokens, completionTokens, totalTokens} #1249

Closed
pavelm10 opened this issue Feb 25, 2024 · 12 comments
@pavelm10

Describe the bug

When using the Langfuse callback handler with the OpenAI class (from langchain_openai import OpenAI), tracking fails. When using ChatOpenAI (from langchain_openai import ChatOpenAI), tracking works.

traceback:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/langfuse/client.py", line 1125, in update
    "usage": _convert_usage_input(usage) if usage is not None else None,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/langfuse/utils/__init__.py", line 100, in _convert_usage_input
    raise ValueError(
ValueError: Usage object must have either {input, output, total, unit} or {promptTokens, completionTokens, totalTokens}
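
For reference, a minimal sketch of the two usage shapes the error message refers to (the field values are illustrative, not from the original report):

# Langfuse-style usage object
usage_generic = {"input": 10, "output": 5, "total": 15, "unit": "TOKENS"}

# OpenAI/Langchain-style usage object
usage_openai = {"promptTokens": 10, "completionTokens": 5, "totalTokens": 15}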

To reproduce

How we use it:

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import Runnable
from langchain_openai import OpenAI
from langfuse.callback import CallbackHandler

# prompt, langfuse_public_key, langfuse_secret_key, and user_id are defined elsewhere
callbacks = []
lf_handler = CallbackHandler(
    langfuse_public_key,
    langfuse_secret_key,
    user_id=user_id,
)
callbacks.append(lf_handler)

chain: Runnable = (
    {
        "text": itemgetter("text"),
        "change_instruction": itemgetter("change_instruction"),
    }
    | prompt
    | OpenAI(
        model="some-instruct-model-name",
        temperature=0,
        api_key="some-api-key",
        callbacks=callbacks,
        max_retries=3,
        timeout=30,
    )
    | StrOutputParser()
)

Is the error expected because we are using Langfuse incorrectly with the OpenAI class? Note that we cannot use the ChatOpenAI class in our use case, as we need a completion (instruct) model.

Additional information

python 3.11
poetry 1.7.1
langfuse 2.16.2
langchain 0.1.9
langchain-openai 0.0.6

@maxdeichmann
Member

@pavelm10 thanks for raising this issue! Does it only happen with "some-instruct-model-name"? Which model exactly are you using?

@maxdeichmann
Member

I am trying to reproduce the issue here: https://github.com/langfuse/langfuse-python/pull/413/files

@pavelm10
Author

@maxdeichmann I am using gpt-3.5-turbo-instruct. I will try to reproduce your test as well and come back to you.

@pavelm10
Author

@maxdeichmann I tried your test and it indeed does not produce the traceback. However, our use case passes a list of inputs to the chain's batch() method - sorry for not including that in the original description.

input_list = [
    {"question": "where did harrison work", "language": "english"},
    {"question": "how is your day", "language": "english"},
]
runnable_chain.batch(input_list)

@maxdeichmann
Member


Thanks for the clarification - the batch() example helped me reproduce. I will ship a fix shortly.

@maxdeichmann
Member

@pavelm10 this is an issue in Langchain, but I added a workaround on our end to ensure token counts are correct for whole LLM chains. You will see the total sum of tokens on the first Generation in Langfuse; all subsequent ones will show 0 tokens.

Background:
For batches, Langchain sends the sum of all tokens in the first event; all subsequent events send an empty dict. The workaround ensures that we do not calculate tokens on our end for the empty dicts, which would result in wrong numbers for our users.

https://github.com/langfuse/langfuse-python/pull/413/files#diff-d9c0f4dec8a45fe6c408de5493f7580c8ad07ec7e251957afc57888405f35c8eR671-R673
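
A rough sketch of what that workaround amounts to (hypothetical helper, not the actual code from the PR):

def _usage_or_none(usage: dict):
    # Langchain's batch() reports the summed token usage only on the first
    # generation; all later generations arrive with an empty usage dict.
    # Treat an empty dict as "no usage" so Langfuse does not compute token
    # counts for it, which would otherwise show up as wrong totals.
    if not usage:
        return None
    return usage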

@istandleet

istandleet commented Feb 29, 2024

@maxdeichmann I believe I am running into the same issue. I tracked it down to https://github.com/langfuse/langfuse-python/blame/main/langfuse/utils/__init__.py#L55. In particular, the object we are passing in from langchain/OpenAI is such that dict(usage) yields exactly what you want, whereas usage.__dict__ yields a wrapper dict with a key called _previous that holds the dict you want. (Left untouched, the object also behaves as you want with respect to key lookup.)
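
To illustrate the difference, a small self-contained sketch with a hypothetical stand-in for the OpenAIObject wrapper described above:

class OpenAIObjectLike:
    # Hypothetical stand-in: the real values live in an internal dict.
    def __init__(self, data):
        self._previous = data

    def __iter__(self):
        # dict(obj) consumes this iterable of (key, value) pairs
        return iter(self._previous.items())

    def __getitem__(self, key):
        # plain key lookup also reaches the wrapped dict
        return self._previous[key]

usage = OpenAIObjectLike({"promptTokens": 10, "completionTokens": 5, "totalTokens": 15})
print(dict(usage))     # {'promptTokens': 10, 'completionTokens': 5, 'totalTokens': 15}
print(usage.__dict__)  # {'_previous': {'promptTokens': 10, ...}} - the wrapper, not the payload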

@maxdeichmann
Member

Ok, do you have more details, such as a screenshot? I get an empty object from Langchain even before it reaches Langfuse and the code line you shared.

@istandleet

[screenshot of the usage object attached]

Hopefully this helps! It appears to be something called an OpenAIObject.

@maxdeichmann
Member

@istandleet this release should fix your issue. Let me know if you run into further issues. Regarding the tokenisation, please create an issue with Langchain; unfortunately, they only send tokens on the first generation.

@pavelm10
Author

pavelm10 commented Mar 2, 2024

@maxdeichmann thanks for the quick fix on your end.

@gautamcrhythmx

gautamcrhythmx commented Mar 12, 2024

@maxdeichmann I am still facing this issue. As @istandleet pointed out, calling __dict__ at this point causes the wrong object to be passed on. Can we change it to dict(usage)?
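
Sketched against the linked line in _convert_usage_input, the proposed change would look roughly like this (variable name is hypothetical, not the actual source):

# current behavior (roughly): the wrapper leaks through
usage_dict = usage.__dict__   # {'_previous': {...}} - payload hidden one level down

# proposed: go through the dict protocol instead
usage_dict = dict(usage)      # yields the token fields directly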
