[EIS] Adding more info on tokens #3673
Conversation
> EIS is billed per million "tokens" used. Tokens can be thought of loosely as "words" which are given to a machine learning model to operate upon. The model may also produce a number of tokens in response.
>
> For example, the sentence:
>
> "It was the best of times, it was the worst of times."
>
> contains 52 characters, but would be tokenised into 14 tokens - one for each of the 12 words, one for the comma, and one for the period character.
>
> This is because machine learning models use words to denote meaning.
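For illustration only, here is a minimal Python sketch of the word-and-punctuation counting the excerpt describes. It is not the tokenizer EIS models actually use; it just reproduces the arithmetic in the 52-characters / 14-tokens example.

```python
import re

# Rough sketch of the word-and-punctuation counting described above.
# This is NOT how EIS models actually tokenize text; it only reproduces
# the arithmetic in the "52 characters -> 14 tokens" example.
def rough_token_count(text: str) -> int:
    # Treat each run of word characters, and each punctuation mark, as one token.
    return len(re.findall(r"\w+|[^\w\s]", text))

sentence = "It was the best of times, it was the worst of times."
print(len(sentence))                # 52 characters
print(rough_token_count(sentence))  # 14 tokens: 12 words + comma + period
```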
I'd revise to something like this:
Token-based billing
EIS is billed per million tokens used. Tokens are the fundamental units that language models process for both input and output.
Tokenizers convert text into numerical data by segmenting it into subword units. A token may be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
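To illustrate the "varies by tokenizer" point, here is a hedged sketch using tiktoken as a stand-in subword tokenizer. The library choice and encoding name are assumptions for illustration, not necessarily what any EIS model uses.

```python
# Sketch of how the count changes with the tokenizer. tiktoken is used here
# only as an example of a subword tokenizer; it is not necessarily what any
# EIS model uses, and the resulting count is illustrative.
import tiktoken

sentence = "It was the best of times, it was the worst of times."

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode(sentence)

# The exact number depends on the tokenizer's vocabulary; it will usually be
# close to, but not necessarily equal to, the naive 14-token count.
print(len(token_ids))
```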
Thanks @leemthompo !
thanks for the docs updates @seanhandley 👍
Added information about token usage in conversations.
@seanhandley added a bit of practical info to be super clear, lmkwyt
Thanks @leemthompo! One point here is that we do bill for output tokens for Chat models. But for embeddings models, we only bill for input tokens into the model - we don't bill for the embeddings that the model generates. I think it's important to be specific here.
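A small sketch of that billing distinction, with hypothetical function names and a made-up per-million-token price (not actual EIS pricing):

```python
# Sketch of the billing distinction described above: chat models are billed
# for input AND output tokens, while embeddings models are billed only for
# input tokens (the generated embedding vectors are not billed).
# Function names and the price below are hypothetical, for illustration only.
PRICE_PER_MILLION_TOKENS = 0.50  # made-up rate, not an actual EIS price

def billable_tokens(model_type: str, input_tokens: int, output_tokens: int = 0) -> int:
    if model_type == "chat":
        return input_tokens + output_tokens
    if model_type == "embeddings":
        return input_tokens
    raise ValueError(f"unknown model type: {model_type}")

def cost(model_type: str, input_tokens: int, output_tokens: int = 0) -> float:
    return billable_tokens(model_type, input_tokens, output_tokens) / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(cost("chat", input_tokens=2_000_000, output_tokens=500_000))  # 1.25
print(cost("embeddings", input_tokens=2_000_000))                   # 1.0
```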
good point, let me clarify that 👍
Co-authored-by: Liam Thompson <leemthompo@gmail.com>