
Token Usage Tracking #85

Merged · 33 commits into run-llama:main · Dec 18, 2022

Conversation

@teoh (Collaborator) commented on Dec 7, 2022

What is this?

From #56. This PR adds support for counting tokens used during calls to the LLM. This is done via the decorator llm_token_counter() that lives in gpt_index/utils.py.

At the moment, this decorator can only be used on class instance methods with a _llm_predictor attribute.

e.g.

    class GPTTreeIndexBuilder:
        ...
        @llm_token_counter("build_from_text")
        def build_from_text(self, documents: Sequence[BaseDocument]) -> IndexGraph:
            ...

If you run build_from_text(), it will print the output in the form below:

    [build_from_text] Total token usage: <some-number> tokens
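
For context, here is a minimal sketch of how such a decorator could work. It assumes the predictor object keeps a running `_total_tokens_used` counter (mentioned in the TODOs below); the actual implementation in gpt_index/utils.py may differ.

    from functools import wraps
    from typing import Any, Callable

    def llm_token_counter(method_name: str) -> Callable:
        """Print the net token usage of the decorated instance method.

        Sketch only: assumes `self._llm_predictor._total_tokens_used` exists
        and is incremented by the underlying LLM calls.
        """
        def wrap(f: Callable) -> Callable:
            @wraps(f)
            def wrapped(self: Any, *args: Any, **kwargs: Any) -> Any:
                # Snapshot the running total before the call ...
                start_count = self._llm_predictor._total_tokens_used
                result = f(self, *args, **kwargs)
                # ... and report the delta afterwards.
                used = self._llm_predictor._total_tokens_used - start_count
                print(f"[{method_name}] Total token usage: {used} tokens")
                return result
            return wrapped
        return wrap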

Why do we need this?

Calls to LLMs such as GPT-3 cost money. For example, per OpenAI's pricing, the Davinci endpoint costs $0.02 per 1,000 tokens.

Since gpt_index makes multiple LLM calls when building the index, it's handy to know how many tokens we're going through.
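
As a back-of-the-envelope illustration (the token count below is made up), the Davinci rate translates to dollars like this:

    # Rough cost estimate at Davinci's $0.02 per 1,000 tokens.
    DAVINCI_COST_PER_1K_TOKENS = 0.02

    tokens_used = 25_000  # e.g. the total reported by llm_token_counter
    cost = tokens_used / 1000 * DAVINCI_COST_PER_1K_TOKENS
    print(f"{tokens_used} tokens -> ${cost:.2f}")  # 25000 tokens -> $0.50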

Remaining TODOs for this PR

  • add tests specific to token tracking. We may have to patch the OpenAI object in chain_wrapper.py; we can't mock the whole LLMPredictor object, since the tests need the real _total_tokens_used instance attribute to do its thing (see the patching sketch after this list)
  • add support for remaining index/query classes that call the LLM
  • test with openai billing and usage
  • consider adding this change on the abstract class level so that we're not repeating the same code everywhere, or turn it into a decorator (this may be helpful).
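
For the patching idea in the first TODO, something along these lines might work. This is a hypothetical sketch, not the PR's actual tests: the patch target assumes the pre-1.0 `openai.Completion.create` API, and the canned usage numbers are invented.

    # Hypothetical: mock only the network call, so LLMPredictor's real
    # token-accounting code still runs and updates _total_tokens_used.
    from unittest.mock import patch

    FAKE_RESPONSE = {
        "choices": [{"text": "mocked completion"}],
        "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
    }

    with patch("openai.Completion.create", return_value=FAKE_RESPONSE):
        # Build an index or call a decorated method here; every LLM call
        # now returns FAKE_RESPONSE, and token counts accumulate for real.
        ...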

Other comments

Other implementations I considered

  • add a field to the LLM prediction response: this seemed brittle; today there are three return values, but tomorrow there might be four
  • make a separate class to count this: seemed unnecessary

We might also miss token counts if the LLM is called somewhere that isn't wrapped by the token-counting start/end logic.

For the future

  • estimate cost before running indexing or querying
  • actual dollar cost

Known issues:

Sometimes the token count is off by a few. See this issue for an example: openai/openai-python#150
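
One way to observe the discrepancy is to compare a local tokenizer count against what the API reports in its `usage` field. The snippet below is illustrative only; tiktoken and the model name are assumptions, not part of this PR:

    # Compare a local token count against the API-reported usage.
    import tiktoken

    enc = tiktoken.encoding_for_model("text-davinci-003")
    prompt = "Summarize the following document: ..."
    print(f"local prompt tokens: {len(enc.encode(prompt))}")
    # Compare against response["usage"]["prompt_tokens"] from the API call;
    # the two can disagree by a few tokens, as in openai/openai-python#150.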

@teoh changed the title from "[Work in progress] Token Usage Tracking" to "Token Usage Tracking" on Dec 12, 2022
@jerryjliu (Collaborator) left a comment:

thanks for doing this! a few comments/questions

Inline review threads (all resolved):

  • gpt_index/indices/base.py
  • gpt_index/utils.py (two threads, one outdated)
  • tests/indices/embedding/test_base.py (outdated)
  • tests/indices/keyword_table/test_base.py
@jerryjliu merged commit 9b3c262 into run-llama:main on Dec 18, 2022
viveksilimkhan1 pushed a commit to viveksilimkhan1/llama_index referencing this pull request on Oct 30, 2023:

…pt-3.5-turbo` (run-llama#85)

* update default recommended openai model from text-davinci-003 to gpt-3.5-turbo
* fix unintended update in models list in README