Semantic search token usage tracking #63591

lbrdnk · 2025-09-16T18:18:16Z

Closes https://linear.app/metabase/issue/BOT-410/implement-token-usage-tracking-for-semantic-search

This PR adds tracking of tokens consumed by embeddings API calls initiated from semantic search. The token count of every request is stored in the appdb. The table is trimmed daily, retaining a rolling two months of data.

This PR:

- adds a migration with the new appdb table,
- adds a new model, `:model/SemanticSearchTokenTracking`,
- modifies the `get-embedding...` implementations to take `opts`,
- passes `opts` from near the top level (`upsert-index!`, `query-index`) with the appropriate request `:type`, which is stored with the token count in the new table,
- adds a new daily job to trim that table,
- adds tests.
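The record-then-trim flow described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the atom stands in for the appdb table, and `usage-rows`, `record-usage!`, and `trim-usage!` are hypothetical names. The actual change writes through the `:model/SemanticSearchTokenTracking` model and trims from a scheduled job. The two-month cutoff follows the description, and the column names follow the ones used in the test.

```clojure
(ns token-tracking-sketch
  (:import (java.time Instant ZoneOffset)))

;; Hypothetical in-memory stand-in for the new appdb table.
(def usage-rows (atom []))

(defn record-usage!
  "Append one row per embeddings request. `opts` carries the request :type
  passed down from the caller; `total-tokens` is the count reported by the API."
  [{request-type :type} total-tokens ^Instant now]
  (swap! usage-rows conj {:request_type request-type
                          :total_tokens total-tokens
                          :created_at   now}))

(defn trim-usage!
  "Drop rows older than two months relative to `now`.
  Returns the number of rows removed."
  [^Instant now]
  (let [cutoff    (-> now (.atOffset ZoneOffset/UTC) (.minusMonths 2) .toInstant)
        keep?     (fn [row] (not (.isBefore ^Instant (:created_at row) cutoff)))
        [old new] (swap-vals! usage-rows #(filterv keep? %))]
    (- (count old) (count new))))

;; Example usage: one stale :index row and one fresh :query row.
(let [now (Instant/parse "2025-09-23T00:00:00Z")]
  (record-usage! {:type :index} 13 (Instant/parse "2025-06-01T00:00:00Z"))
  (record-usage! {:type :query} 5 now)
  (trim-usage! now)                  ; the June row falls outside the window
  (mapv :request_type @usage-rows))
```

Passing `now` explicitly keeps the trimming logic easy to test without waiting for the daily schedule.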

github-actions · 2025-09-19T21:14:39Z

e2e tests failed on `b8c2364b945cc6f2e1af9b401adb46b76f2baddd-3`

e2e test run

| File | Test Name |
| --- | --- |
| `embedding-reproductions.cy.spec.js` | (flaky) issue 40660 > static dashboard content shouldn't overflow its container (#40660) |

piranha · 2025-09-23T11:16:55Z

resources/migrations/056_update_migrations.yaml

+                  constraints:
+                    nullable: false
+              - column:
+                  name: total_tokens


do we not want to store in/out separately? :)

retro · 2025-09-23

@piranha probably no need, as it's an embedding model, so tracking total tokens is enough.

piranha

So the logic seems fine, but I kind of want confirmation that total is fine and we don't want separate in/out figures.

piranha · 2025-09-23T11:27:13Z

enterprise/backend/test/metabase_enterprise/semantic_search/embedding_test.clj

+                  (is (= 1 (t2/count :model/SemanticSearchTokenTracking)))
+                  (let [{:keys [request_type total_tokens]}
+                        (t2/select-one :model/SemanticSearchTokenTracking)]
+                    (is (= :index request_type))
+                    (is (= 13 total_tokens))))


This is by no means a request to change, but I personally find it easier to read =? stuff, like this:

(is (=? [{:request_type :index :total_tokens 13}] (t2/select :model/SemanticSearchTokenTracking)))

* Pass type of embedding request
* Initial migration
* Add remaining opts args
* Add SemanticSearchTokenTracking module
* Connect token tracking to ai-service impl
* Add test for token tracking writes.
* Remove prompt_tokens including migration
* Add usage trimmer job
* Trimmer test
* Activate trimmer job
* test
* Record tokens for openai
* Update test
* Use total-tokens
* Comment
* linter
* Add index
* Exclude from copy

metabase-bot bot assigned lbrdnk Sep 16, 2025

metabase-bot bot added the .Team/Metabot Metabot team label Sep 16, 2025

lbrdnk force-pushed the sem-search-token-tracking branch from 3b8e11b to f39c72d Compare September 17, 2025 14:14

lbrdnk added 10 commits September 19, 2025 13:06

Pass type of embedding request

89b8d3d

Initial migration

5e736ac

Add remaining opts args

4399be1

Add SemanticSearchTokenTracking module

0a61b16

Connect token tracking to ai-service impl

8da8fcd

Add test for token tracking writes.

e945bf2

Remove prompt_tokens including migration

93abdbf

Add usage trimmer job

6b7a2f6

Trimmer test

c4a0141

Activate trimmer job

ba98ce2

lbrdnk force-pushed the sem-search-token-tracking branch from f39c72d to ba98ce2 Compare September 19, 2025 14:01

lbrdnk added the backport Automatically create PR on current release branch on merge label Sep 19, 2025

lbrdnk added 7 commits September 19, 2025 16:41

test

cc0585e

Record tokens for openai

2041251

Update test

0a4d392

Use total-tokens

a647048

Comment

87562e2

linter

d7d0bd4

Add index

c4ee59e

lbrdnk changed the title ~~[WIP] Semantic search token usage tracking~~ Semantic search token usage tracking Sep 19, 2025

lbrdnk requested a review from a team September 19, 2025 15:37

Exclude from copy

b8c2364

piranha reviewed Sep 23, 2025

piranha approved these changes Sep 23, 2025

lbrdnk merged commit 63cc5c3 into master Sep 23, 2025
506 of 516 checks passed

lbrdnk deleted the sem-search-token-tracking branch September 23, 2025 14:50

github-automation-metabase mentioned this pull request Sep 23, 2025

🤖 backported "Semantic search token usage tracking" #63900

Merged

github-actions bot added this to the 0.56.7 milestone Sep 23, 2025
