There seems to be a problem with tokenizer installation #64

Closed
rahilshah105 opened this issue Mar 15, 2023 · 2 comments

rahilshah105 commented Mar 15, 2023

I have installed tiktoken using pip install tiktoken, but when I try to call it with:

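import tiktoken
import pandas as pd

# Load the cl100k_base tokenizer, which is designed to work with the ada-002 model
tokenizer = tiktoken.get_encoding("cl100k_base")

df = pd.read_csv('processed/scraped.csv', index_col=0)
df.columns = ['title', 'text']

# Count the tokens in each row's text and plot their distribution
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.n_tokens.hist()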

I get the following error:

ModuleNotFoundError Traceback (most recent call last)
Cell In [1], line 1
----> 1 import tiktoken
3 # Load the c1100k base tokenizer which is designed to work with the ada-002 model
4 tokenizer = tiktoken.get_encoding("c1100k_base")
ModuleNotFoundError: No module named 'tiktoken'
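A ModuleNotFoundError right after a successful pip install often means pip put the package into a different Python environment than the one running the notebook kernel. The sketch below is a diagnostic under that assumption, not an official tiktoken tool: it uses only the standard library to report which interpreter is executing the code and whether tiktoken can be found from it. The helper name diagnose_module is hypothetical.

```python
import importlib.util
import sys
from typing import Optional


def diagnose_module(name: str) -> dict:
    """Report which interpreter is running and whether a top-level module is importable.

    Hypothetical helper for illustration; `name` should be a top-level module
    such as "tiktoken".
    """
    spec = importlib.util.find_spec(name)  # None if the module is not on sys.path
    location: Optional[str] = spec.origin if spec is not None else None
    return {
        "python": sys.executable,   # interpreter running this code (e.g. the Jupyter kernel)
        "found": spec is not None,  # False corresponds to the ModuleNotFoundError above
        "location": location,       # file the module would be loaded from, if found
    }


if __name__ == "__main__":
    print(diagnose_module("tiktoken"))
```

Comparing the reported "python" path with the output of pip --version (which prints the site-packages directory that pip installs into) in the shell where pip was run shows whether the two environments differ.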

hauntsaninja (Collaborator) commented

pip install tiktoken is the way to install tiktoken

hauntsaninja closed this as not planned on Mar 15, 2023

rahilshah105 (Author) commented

Hi. I had already installed tiktoken using pip install tiktoken, and I still got this error:
[Screenshot 2023-03-15 at 4 38 45 PM]
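If the interpreter running the notebook is not the one pip installed into (an assumed cause; the thread does not confirm it), one common workaround is to run pip through the running interpreter itself, i.e. python -m pip using that interpreter's own path. The sketch below does this from inside the notebook with only the standard library; the function names pip_install_cmd and install_into_current_interpreter are hypothetical. In Jupyter, the %pip install tiktoken magic is intended to achieve the same thing.

```python
import subprocess
import sys
from typing import List


def pip_install_cmd(package: str) -> List[str]:
    # Build a pip command bound to the interpreter running this code, so the
    # package is installed into the same environment that `import` searches.
    return [sys.executable, "-m", "pip", "install", package]


def install_into_current_interpreter(package: str) -> None:
    # Hypothetical helper: runs pip and raises subprocess.CalledProcessError
    # if pip exits with a non-zero status.
    subprocess.check_call(pip_install_cmd(package))
```

Calling install_into_current_interpreter("tiktoken") in a notebook cell and then restarting the kernel should let import tiktoken resolve against the same environment the kernel uses.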
