
Tiktoken version is too old for gpt-3.5-turbo #1881

Closed
xingfanxia opened this issue Mar 22, 2023 · 11 comments · Fixed by #1882

Comments

@xingfanxia
Contributor

Traceback (most recent call last):
  File "/Users/xingfanxia/projects/notion-qa/qa.py", line 25, in <module>
    result = chain({"question": args.question})
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/base.py", line 116, in __call__
    raise e
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/qa_with_sources/base.py", line 118, in _call
    answer, _ = self.combine_documents_chain.combine_docs(docs, **inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py", line 143, in combine_docs
    return self._process_results(results, docs, token_max, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py", line 173, in _process_results
    num_tokens = length_func(result_docs, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 83, in prompt_length
    return self.llm_chain.llm.get_num_tokens(prompt)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/chat_models/openai.py", line 331, in get_num_tokens
    enc = tiktoken.encoding_for_model(self.model_name)
  File "/opt/homebrew/lib/python3.10/site-packages/tiktoken/model.py", line 51, in encoding_for_model
    raise KeyError(
KeyError: 'Could not automatically map gpt-3.5-turbo to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
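If upgrading tiktoken is not an option, one common workaround is to fall back to a known encoding instead of letting `encoding_for_model` raise. A minimal sketch of that pattern (the lookup table below is a stand-in for tiktoken's real `MODEL_TO_ENCODING`, not the actual one):

```python
# Stand-in for tiktoken's MODEL_TO_ENCODING table; on a new-enough
# tiktoken the real table already contains the chat models.
MODEL_TO_ENCODING = {
    "text-davinci-003": "p50k_base",
    "gpt-4": "cl100k_base",
}

def encoding_name_for(model_name: str, default: str = "cl100k_base") -> str:
    """Resolve a model to an encoding name, falling back instead of raising.

    gpt-3.5-turbo and gpt-4 both use cl100k_base, so that is a reasonable
    default for chat models when the installed table is too old to know them.
    """
    return MODEL_TO_ENCODING.get(model_name, default)

print(encoding_name_for("gpt-3.5-turbo"))  # -> cl100k_base (via the fallback)
```

With real tiktoken, the same idea is a `try: tiktoken.encoding_for_model(name) except KeyError: tiktoken.get_encoding("cl100k_base")` guard.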
hwchase17 pushed a commit that referenced this issue Mar 23, 2023
Fix #1881
This issue occurs when using `'gpt-3.5-turbo'` with
`VectorDBQAWithSourcesChain`
@harithzulfaizal

harithzulfaizal commented Mar 30, 2023

I seem to be encountering the same issue when using gpt-4, despite having the latest version of tiktoken. Any ideas as to why?

KeyError                                  Traceback (most recent call last)
Cell In[8], line 2, in answer(question)
      1 def answer(question):
----> 2     return chain({"question": question}, return_only_outputs=True)

File c:\Users\gpharith\Documents\langchain-policydoc\langchain4u\lib\site-packages\langchain\chains\base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File c:\Users\gpharith\Documents\langchain-policydoc\langchain4u\lib\site-packages\langchain\chains\base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
    112 try:
--> 113     outputs = self._call(inputs)
...
     70         "Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect."
     71     ) from None
     73 return get_encoding(encoding_name)

KeyError: 'Could not automatically map gpt-4 to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'

@plchld

plchld commented Apr 26, 2023

I get the same issue when I use Azure OpenAI gpt-3.5.

KeyError                                  Traceback (most recent call last)
Cell In[69], line 22
     20 PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
     21 chain = load_summarize_chain(gpt35, chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
---> 22 chain({"input_documents": docs}, return_only_outputs=True)

File ~/.pyenv/versions/3.10.0/envs/local/lib/python3.10/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File ~/.pyenv/versions/3.10.0/envs/local/lib/python3.10/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
    112 try:
--> 113     outputs = self._call(inputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
...
     72         "Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect."
     73     ) from None
     75 return get_encoding(encoding_name)

KeyError: 'Could not automatically map gpt-35-turbo to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
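Note that this Azure case is different from the original issue: Azure OpenAI deployment names drop the dot ("gpt-35-turbo"), so even a current tiktoken cannot match them against its table of OpenAI model ids. Normalizing the name before calling `encoding_for_model` works around it. A sketch, where the alias table is an assumption covering the names seen in this thread:

```python
# Azure OpenAI deployment names omit the dot ("gpt-35-turbo"), so tiktoken's
# model table never matches them. Map them back to the OpenAI model ids first.
AZURE_ALIASES = {
    "gpt-35-turbo": "gpt-3.5-turbo",
    "gpt-35-turbo-16k": "gpt-3.5-turbo-16k",
}

def normalize_model_name(name: str) -> str:
    """Return the OpenAI model id for an Azure-style deployment name."""
    return AZURE_ALIASES.get(name, name)

print(normalize_model_name("gpt-35-turbo"))  # -> gpt-3.5-turbo
```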

@awhillas

tiktoken seems to be set up to handle the latest models: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L13
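For reference, the lookup in that file does roughly the following: an exact-name table plus a prefix table, so versioned names like gpt-4-0314 still resolve. A simplified sketch (the tables below are abbreviated stand-ins, not tiktoken's full ones):

```python
# Abbreviated stand-ins for tiktoken's lookup tables.
MODEL_TO_ENCODING = {"gpt-3.5-turbo": "cl100k_base", "gpt-4": "cl100k_base"}
MODEL_PREFIX_TO_ENCODING = {"gpt-3.5-turbo-": "cl100k_base", "gpt-4-": "cl100k_base"}

def encoding_name_for_model(model_name: str) -> str:
    """Mimic tiktoken's lookup: exact match first, then prefix match."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding
    raise KeyError(
        f"Could not automatically map {model_name} to a tokeniser. "
        "Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect."
    )

print(encoding_name_for_model("gpt-4-0314"))  # -> cl100k_base (prefix match)
```

This is why an old tiktoken raises for gpt-3.5-turbo: the installed copy of these tables simply predates the model.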

@SamOyeAH

Has anyone been able to solve this?

@peterjhwang

I had the same issue. It worked for me after updating the tiktoken version.

@sangeetkumar1988

sangeetkumar1988 commented May 30, 2023

Hi Peter, I'm facing the same issue. Can you please let me know the tiktoken version you used to resolve it? I have updated tiktoken to the latest version, and langchain is also up to date, but it still gives the error.

@peterjhwang

At the moment I am using:
tiktoken==0.4.0
langchain==0.0.178
You can check that the model you are using is included in MODEL_TO_ENCODING here:
https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
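Before digging further, it can help to confirm that the installed tiktoken is at least as new as the combination reported to work above (0.4.0). The check below uses only the standard library, so it works even when importing tiktoken itself fails; the minimal-version comparison is a simple sketch that ignores pre-release tags:

```python
from importlib import metadata

def version_tuple(version: str) -> tuple:
    """Parse '0.4.0' -> (0, 4, 0) for a simple comparison."""
    return tuple(int(part) for part in version.split(".")[:3])

def tiktoken_at_least(minimum: str = "0.4.0") -> bool:
    """True if an installed tiktoken is at least the given version."""
    try:
        installed = metadata.version("tiktoken")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)

print(version_tuple("0.4.0") >= version_tuple("0.3.3"))  # -> True
```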

@sangeetkumar1988

Thanks for your response. Just wanted to know: can we use gpt-35-turbo for text summarization?

@sangeetkumar1988

sangeetkumar1988 commented Jun 1, 2023

> At the moment, I am using. tiktoken==0.4.0 langchain==0.0.178 You can check the model you are using is included in MODEL_TO_ENCODING from here. https://github.com/openai/tiktoken/blob/main/tiktoken/model.py

Thanks for your response. Just wanted to know: can we use gpt-35-turbo for text summarization or RetrievalQuestionAnswering-type work?

@sangeetkumar1988

Any idea why chain_type='map_reduce' can't be used with a custom prompt template? If we set chain_type='map_reduce', the method doesn't accept prompt=PROMPT.
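For what it's worth, a map_reduce chain runs one prompt per document and then a second prompt over the combined partial results, which is why it takes separate map_prompt and combine_prompt arguments (as in the load_summarize_chain call quoted in the earlier traceback) rather than a single prompt=. A toy, library-free sketch of the pattern:

```python
def map_reduce(docs, map_step, combine_step):
    """Toy map_reduce: apply map_step to each doc, then combine the partials.

    map_step and combine_step stand in for the two prompts: one is applied
    per document, the other once over all the intermediate results.
    """
    partials = [map_step(doc) for doc in docs]
    return combine_step(partials)

# Stand-in 'prompts': summarise each doc as its first word, then join.
summary = map_reduce(
    ["alpha beta", "gamma delta"],
    map_step=lambda doc: doc.split()[0],
    combine_step=lambda parts: " ".join(parts),
)
print(summary)  # -> alpha gamma
```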

@NageshMashette

> At the moment, I am using. tiktoken==0.4.0 langchain==0.0.178 You can check the model you are using is included in MODEL_TO_ENCODING from here. https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
>
> Thanks for your response. just to know if we can use gpt-35-turbo for text summarization or RetrievalQuestionAnswering kind of work?

Yes, you can.
