
Prompt token consumption grows until /reset #34

Closed
k3it opened this issue Mar 3, 2023 · 8 comments
Comments

@k3it
Contributor

k3it commented Mar 3, 2023

Hi.
Thank you for the updates and support of the turbo model!

I noticed that each subsequent query within a conversation uses up more prompt tokens. This continues until I reset the session. Does this sound like the correct behavior?

Here is an example of the same prompt, with each iteration increasing token usage:

[screenshot: the same prompt sent repeatedly, with the reported token usage increasing each time]

@n3d1117
Owner

n3d1117 commented Mar 3, 2023

Hi @k3it, good catch. Yeah, I think it's showing the sum of the prompt tokens used within a conversation.

I would say that's the expected behavior, since I'm not doing any calculations myself, just printing the usage object returned by the API. Anyway, I'll keep this issue open in case anyone has more knowledge on this!

@k3it
Contributor Author

k3it commented Mar 3, 2023

I did some more checking, and it looks like "Token used" is an individual counter for each prompt/response transaction; it is not cumulative for the current conversation. So each new question within the same conversation becomes more expensive to ask.

After a long session without a /reset, it may be cheaper to buy and weigh the banana yourself instead of asking the bot about it :)

Edit: I see now that the message query and answer history is added to each completion. That explains the growth, I think:

self.__add_to_history(chat_id, role="user", content=query)
response = openai.ChatCompletion.create(
model=self.config['model'],
messages=self.sessions[chat_id],
temperature=self.config['temperature'],

self.__add_to_history(chat_id, role="assistant", content=answer)

Edit 2: Here is what the bot thinks about this (might not be accurate?):
The GPT completion API can remember the context of the conversation by itself using its internal memory, without needing to send the full message history back each time a new message is sent.

When you create a new completion request using the GPT API, you can include some context that the endpoint can use to better understand the request. This context can come from the last few messages in the conversation, as well as any additional information you provide. The GPT model can then use this context to generate a more accurate and relevant response.

@n3d1117
Owner

n3d1117 commented Mar 4, 2023

Indeed, it looks like sending the history also consumes tokens.

Here's what the docs say:

Including the conversation history helps when user instructions refer to prior messages. [...] Because the models have no memory of past requests, all relevant information must be supplied via the conversation. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.

So I agree that the history needs to be truncated or summarized somehow. I commented on your PR about possible solutions.
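For reference, one naive way to truncate (just a sketch of a possible approach, not necessarily the fix that landed; `MAX_MESSAGES` is a hypothetical limit):

```python
MAX_MESSAGES = 10  # hypothetical cap on stored messages per chat

def truncate_history(messages, max_messages=MAX_MESSAGES):
    """Keep the system prompt plus only the most recent messages."""
    if len(messages) <= max_messages:
        return messages
    system = messages[:1]                    # assumes index 0 holds the system prompt
    recent = messages[-(max_messages - 1):]  # the most recent turns
    return system + recent
```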

@n3d1117
Owner

n3d1117 commented Mar 5, 2023

@k3it Should be fixed in 946f6a4. Feel free to reopen if the issue persists.

n3d1117 closed this as completed Mar 5, 2023
@em108

em108 commented Mar 6, 2023

Hello, and thanks a lot for the updates and fixes!
I've seen embeddings applied in another bot (link below) to solve the token issue. Would it be a good idea to implement them here to further reduce token consumption?

https://github.com/LagPixelLOL/ChatGPTCLIBot

@n3d1117
Owner

n3d1117 commented Mar 6, 2023

Hi @em108, I'm not familiar with embeddings or how they can be implemented in Python. What are the advantages?

@em108

em108 commented Mar 6, 2023

From what I've gathered from multiple sources, including the article below, embeddings can serve as long-term memory / a way to store conversation data. Depending on the application, they can cost roughly 5 to 9 times less than sending the full chat history.

Article:
https://towardsdatascience.com/generative-question-answering-with-long-term-memory-c280e237b144

Also an example of a notebook utilizing embeddings:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
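Roughly, the retrieval idea looks like this (a sketch using the pre-v1 `openai` library; the helper names are illustrative, not from either repo): embed each past exchange once, then send only the few most relevant ones with each new query instead of the full history.

```python
import numpy as np
import openai

memory = []  # list of (embedding, text) pairs

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def remember(text):
    memory.append((embed(text), text))

def recall(query, k=3):
    q = embed(query)
    # ada-002 embeddings are unit-length, so a dot product is cosine similarity
    scored = sorted(memory, key=lambda item: -float(np.dot(q, item[0])))
    return [text for _, text in scored[:k]]
```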

@n3d1117
Owner

n3d1117 commented Mar 7, 2023

Very interesting, thanks @em108. This would be great to have, although I currently don't have the time to implement it.
