Release v0.12.0 · sergey-homenko/llm_cost_tracker

Added

bin/rails llm_cost_tracker:rebuild_rollups rebuilds the llm_cost_tracker_call_rollups cache from the calls ledger. Run it to populate the cache after turning on config.cache_rollups in an app that already has calls, or to resync it if rollup totals ever drift from the calls.
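The rebuild is a full recompute rather than an incremental patch, which is why it also fixes drift. Below is a minimal sketch of that idea in plain Ruby. It assumes a hypothetical (day, provider, model) bucket key and uses a stand-in LedgerCall struct. The gem's actual rollup columns and task internals are not shown here.

```ruby
require "bigdecimal"
require "date"

# Hypothetical stand-in for a calls-ledger row; only the fields this sketch needs.
LedgerCall = Struct.new(:tracked_on, :provider, :model, :total_cost, keyword_init: true)

# Recompute every rollup bucket from the ledger. Every total is rebuilt from
# scratch, so writing the result over the old cache discards any drift.
# The (day, provider, model) bucket key is an assumption.
def rebuild_rollups(calls)
  rollups = {}
  calls.each do |call|
    key = [call.tracked_on, call.provider, call.model]
    bucket = (rollups[key] ||= { calls: 0, total_cost: BigDecimal("0") })
    bucket[:calls] += 1
    bucket[:total_cost] += call.total_cost
  end
  rollups
end
```

Running it twice over the same ledger yields identical buckets. That property is what makes a rebuild safe to rerun.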

Removed

BREAKING: the experimental Reconciliation subsystem has been removed. This covers provider invoice import + diff, the /reconciliation dashboard page, the bin/rails llm_cost_tracker:reconcile:* rake tasks, config.reconciliation_enabled, config.reconciliation_importers, the llm_cost_tracker:reconciliation generator, and the llm_cost_tracker_provider_invoices / _provider_invoice_imports tables. It was never finished and was never billing-accurate. calls.provider_response_id (captured on every call) already covers invoice cross-referencing. If invoice-vs-ledger reconciliation ships again, it will live in a separate gem. Existing installs can drop the two tables; see docs/upgrading.md.
config.instrument :gemnii (or any other typo or unknown integration name) no longer raises at config time. Instead, it logs Logging.warn("Unknown integration: :gemnii. Known: ...") once when integrations are installed. bin/rails llm_cost_tracker:doctor also shows the unknown name as a :warn row, so the typo is visible without crashing boot.
Pre-call budget enforcement for Azure-hosted OpenAI calls now keys on "azure_openai" (matching the recorded Call.provider), so pricing_overrides for Azure rates actually gate the call. Previously it always keyed on "openai" regardless of the SDK client's base_url.
BREAKING: removed the batch: keyword argument from LlmCostTracker.track, LlmCostTracker.track_stream, and stream.usage (inside track_stream blocks). Signal a batch-tier call with pricing_mode: :batch, or with any pricing_mode containing the batch token, such as :batch_flex. That is now the single source of truth. Previously batch: and pricing_mode: could disagree. This happened especially after the request-side pricing_mode merge inside Tracker.record overwrote the parser's mode but left the stored batch flag stale. As a result, calls.batch could read true while calls.pricing_mode read flex (or vice versa) for the same row.
The bin/rails llm_cost_tracker:prices:explain rake task (and LlmCostTracker::Pricing.explain) is removed — the dashboard's Data Quality page surfaces unknown-pricing models and their effective rates instead.
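Code that passed batch: true should now pass pricing_mode: :batch. If something downstream still needs a yes/no batch flag, it can be derived from the mode. Below is a sketch. It assumes modes are underscore-separated tokens, as the :batch_flex example suggests. batch_tier? is a hypothetical helper, not a gem method.

```ruby
# Hypothetical helper (not part of the gem): splits pricing_mode on
# underscores and reports whether "batch" is one of the tokens.
def batch_tier?(pricing_mode)
  return false if pricing_mode.nil?

  pricing_mode.to_s.split("_").include?("batch")
end
```

Matching whole tokens rather than substrings keeps a mode like :batchy from counting as batch.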

Changed

The RubyLLM SDK integration now requires ruby_llm >= 1.15.0 (was >= 1.14.1).
The engine no longer adds tag / tag_value to Rails filter_parameters. The Symbol filter matched by substring, so unrelated host-app params (tags, meta_tag, etc.) were being replaced with [FILTERED]. Tags::Sanitizer continues to redact secret-shaped tag values at storage.
BREAKING: the serialized event cost is now { components: {...}, total:, currency: }. This applies to both the llm_request.llm_cost_tracker notification payload and the async-ingestion inbox payload. Previously it was flat, with a top-level total_cost:. Notification subscribers should read cost[:total]. Apps using ingestion: :async should drain the inbox before a rolling deploy; see docs/upgrading.md.
BREAKING: pricing_mode in the llm_request.llm_cost_tracker notification payload is now a String (e.g. "batch", "fast_data_residency"), not a Symbol — subscribers matching it against a Symbol must compare to the String.
BREAKING: LlmCostTracker.track(tokens:) now takes the same _tokens-suffixed keys as stream.usage and the stored columns — input_tokens, output_tokens, cache_read_input_tokens, audio_input_tokens, etc. (was the short input, output, cache_read_input, …). Update manual track calls. Pricing-file / pricing_overrides field names are unchanged — they stay input, output, … (per-component rates, a separate vocabulary).
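For notification subscribers, the payload changes above come down to a few lines. These changes are the nested cost hash, the decimal-string amounts described under Fixed, and the String pricing_mode. Below is a sketch written as a plain method so it runs without Rails. The payload shape is taken from these notes, and summarize_llm_request is a hypothetical name.

```ruby
require "bigdecimal"

# Sketch of a subscriber's payload handling, as a plain method.
# Assumed shape: payload[:cost] is { components:, total:, currency: } with
# decimal-string amounts, and payload[:pricing_mode] is a String.
def summarize_llm_request(payload)
  cost = payload.fetch(:cost)
  {
    # Parse the decimal string exactly; going through Float would round it.
    total: BigDecimal(cost.fetch(:total)),
    currency: cost[:currency],
    # Exact String comparison: "batch" == :batch is false in Ruby.
    batch_mode: payload[:pricing_mode] == "batch"
  }
end
```

In a Rails app, the method would be called from an ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") block with the event's payload. A subscriber that still compares against :batch silently stops matching, which is why the String comparison matters.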
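Updating manual track calls is a mechanical rename of the tokens: hash keys. Below is a hypothetical helper that performs the rename (not part of the gem). It leaves keys that already have the suffix alone.

```ruby
# Hypothetical upgrade helper: converts pre-0.12 short token keys
# (input:, output:, cache_read_input:, ...) into the _tokens-suffixed keys
# track(tokens:) now expects. Already-suffixed keys pass through unchanged.
def upgrade_token_keys(tokens)
  tokens.transform_keys do |key|
    name = key.to_s
    name.end_with?("_tokens") ? name.to_sym : :"#{name}_tokens"
  end
end
```

Apply it only to track(tokens:) arguments. Pricing-file and pricing_overrides keys keep the short names.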

Fixed

RubyLLM streaming chats to Anthropic and Gemini (chat.ask { |chunk| }) are now recorded. Previously the streamed response's raw body was the SSE text rather than the parsed hash the integration expected. An internal lookup therefore raised, and the call was silently dropped from the ledger. Blocking RubyLLM calls were unaffected.
A malformed or very long pricing_mode (or a provider service_tier / speed with many underscore-separated tokens) no longer hangs cost calculation — the call lands cost_status: unknown instead of pinning a CPU.
Gemini preview models dated with a four-digit year (e.g. gemini-2.5-flash-preview-09-2025) now fall back to the stable model's price instead of landing cost_status: unknown.
A typo'd price-key prefix in pricing_overrides or a custom prices_file now logs an Unknown price keys warning and is ignored. Examples include bath_input for batch_input, or any unknown <mode>_<component>. Previously such a key was silently accepted, so the override quietly never applied at the intended mode or tier.
Anthropic responses with service_tier: "priority" now keep :priority as their pricing_mode instead of being silently billed at standard rates. Committed-tier customers get cost_status: unknown, a signal to add priority_input/priority_output to pricing_overrides. Previously they got an over-counted USD figure that ignored their commitment discount.
OpenAI's scale enterprise tier and priority tier are now recognized as pricing modes (no more Logging.warn about unknown tokens); calls land as cost_status: unknown when negotiated rates are absent so you can add them via pricing_overrides.
Gemini responses echoing usageMetadata.serviceTier: "unspecified" (the default) now resolve to standard pricing instead of warning about an unknown token and landing as cost_status: unknown.
Anthropic SDK batch results (client.messages.batches.results_streaming(id).each) land in the ledger with pricing_mode: :batch and the per-result provider_response_id, with a same-process best-effort dedup against already-ledgered provider_response_ids so re-iterating the stream doesn't duplicate rows (concurrent retrieves from multiple processes can still race; async-mode rows in the inbox aren't checked until they drain).
OpenAI SDK batch results are now captured automatically. Calling client.batches.retrieve(id) on a completed batch downloads the output JSONL and emits one ledger event per response. Each event carries pricing_mode: :batch and the per-response provider_response_id, with the same best-effort dedup as Anthropic batches.
OpenRouter pricing is now scraped via openrouter.ai/api/v1/models, so RubyLLM-routed OpenRouter calls (e.g. openrouter/openai/gpt-4o) get a real total_cost from the next prices_file refresh instead of landing as cost_status: unknown. The scrape also captures image / audio per-token rates so OpenRouter calls with multimodal inputs bill against the correct bucket instead of folding image/audio tokens into the text-input rate.
Misspelled pricing_mode: values now log a Logging.warn listing the unrecognized token (e.g. :bach for :batch) so the resulting cost_status: unknown call surfaces a typo instead of silently absorbing it; the warn fires once per unique token.
Whisper-style transcriptions whose response carries usage.type = "duration" now emit a transcription_minute line item (quantity = ceil(seconds / 60)) across both the OpenAI Ruby SDK patch and the Faraday / RubyLLM HTTP path; the call previously recorded with zero tokens and no line item, so audio-minute usage was invisible.
OpenAI Responses-API image_generation_call and computer_call output items now emit line items so per-call hosted-tool usage shows up on the dashboard alongside the existing web_search_call / file_search_call / code_interpreter_call coverage.
LlmCostTracker.track(..., enforce_budget: true) now actually raises BudgetExceededError pre-call when the estimated cost overshoots the budget. The estimate is the token cost plus priced service line items. This holds even when budget_exceeded_behavior: :notify is configured. Previously the kwarg was silently ignored unless the policy was already :block_requests.
Call#pricing_snapshot.rates now includes per-charge rates for non-token service line items (web search, MCP calls, TTS character billing, etc.) — previously only token rates were captured, so audit/replay of service-charge pricing had no record of the rate that was actually applied.
Tags with invalid keys (e.g. containing whitespace or characters outside [\w.-]) are now skipped at write with a Logging.warn instead of being silently written and then raising InvalidFilterError on dashboard read.
A raising default_tags proc is now captured by Logging.warn and falls back to empty default tags, so a broken user callback doesn't take down every tracked call.
LlmCostTracker::Ingestion::Worker.shutdown!(drain: true) always attempts the final inbox flush, even if waking the worker thread raises. Pending inbox rows are therefore not left behind when the host process exits.
Gemini preview-dated models (e.g. gemini-2.5-flash-preview-04-17) now resolve to the stable entry's pricing — previously the preview-MM-DD suffix didn't match the dated-snapshot regex so the call landed as cost_status: unknown.
Gemini parser now reads usageMetadata.serviceTier from the response body in addition to the x-gemini-service-tier header, so tier-aware pricing applies when only the body carries the tier signal.
Line-item and pricing-snapshot currency is now stored uppercase regardless of prices_file casing, so a prices_file with currency: "eur" shows up as EUR everywhere and service-line items don't get partitioned out of header totals on a case mismatch with cost-data currency.
Async-ingestion inbox rows reaching MAX_ATTEMPTS_BEFORE_QUARANTINE now log a Logging.warn (with row ids) at the moment they quarantine, so production sees the event in Rails.logger instead of needing to run bin/rails llm_cost_tracker:doctor to discover it.
Dashboard "Setup required" page now flags missing llm_cost_tracker_ingestion_inbox_entries and llm_cost_tracker_ingestion_leases tables when ingestion: :async is configured — previously the drift only surfaced as a worker boot crash.
Gemini image-generation models (gemini-2.5-flash-image, gemini-3-pro-image-preview, gemini-3.1-flash-image-preview) and stable preview text models (gemini-3.1-pro-preview, gemini-2.5-flash-lite-preview-09-2025, etc.) are no longer dropped by the price scraper — they flow into the pricing snapshot on the next refresh cycle.
Gemini parser splits IMAGE-modality tokens from promptTokensDetails / candidatesTokensDetails (mirroring the existing AUDIO handling), so image-output usage from Gemini calls routes to image_output rates instead of falling into the text-output bucket.
RubyLLM SDK integration over-subtracted cache-read tokens from recorded input_tokens on chat completions, so the figure landed in the ledger short by the cache-read amount; the gem now passes RubyLLM's net input_tokens through unchanged.
RubyLLM SDK integration captures service_tier from response bodies across Anthropic, OpenAI, and Gemini — previously the field was read from the wrong JSON path so batch and flex modes silently priced against standard rates.
RubyLLM SDK integration records the provider's response id in provider_response_id (previously always nil), so each ledger row carries the upstream id you can cross-reference against provider invoices and logs.
RubyLLM Anthropic chat completions split 1-hour and 5-minute cache writes into separate token buckets so 1h writes bill at the 2x extended rate instead of being lumped into the 5m bucket at 1.25x.
Async-inbox total_cost now round-trips through the JSON payload without losing precision; previously the payload coerced BigDecimal to Float and dropped digits past ~15 significant figures, so high-volume aggregate billing under ingestion: :async came out systematically short. BREAKING for subscribers to the llm_request.llm_cost_tracker ActiveSupport::Notifications event: payload[:cost] numeric values are now decimal strings (was Float) — wrap with BigDecimal(value) before arithmetic.
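The pricing_mode hang fix and the misspelled-token warning both concern parsing pricing_mode into tokens. One way to get both properties has two parts. First, cap length and token count before any per-token work. Second, remember which unknown tokens were already reported so each one warns only once. The sketch below follows that approach and is not the gem's actual code. Its token vocabulary and limits are assumptions, and it collects warnings in an array instead of calling Logging.warn.

```ruby
require "set"

# Illustrative vocabulary and limits (assumptions, not the gem's real values).
KNOWN_MODE_TOKENS = %w[standard batch flex priority scale fast data residency].freeze
MAX_MODE_LENGTH = 64
MAX_MODE_TOKENS = 4

class PricingModeParser
  attr_reader :warnings

  def initialize
    @warned_tokens = Set.new
    @warnings = []
  end

  # Returns the mode's tokens, [] for no mode, or :unknown when the mode is
  # oversized or contains an unrecognized token (i.e. cost_status: unknown).
  def parse(mode)
    text = mode.to_s
    return [] if text.empty?
    # Reject oversized input before doing any per-token work.
    return :unknown if text.length > MAX_MODE_LENGTH

    tokens = text.split("_")
    return :unknown if tokens.size > MAX_MODE_TOKENS

    unknown = tokens - KNOWN_MODE_TOKENS
    unknown.each { |token| warn_once(token) }
    unknown.empty? ? tokens : :unknown
  end

  private

  # Record a warning only the first time each unrecognized token is seen.
  def warn_once(token)
    @warnings << "Unknown pricing_mode token: #{token}" if @warned_tokens.add?(token)
  end
end
```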
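The two Gemini preview-date fixes amount to a lookup with a fallback. First, try the exact model id, since some previews are priced separately. Then strip a trailing -preview-MM-DD or -preview-MM-YYYY suffix and look up the stable entry. Below is a sketch with an illustrative regex and placeholder price entries.

```ruby
# Illustrative pattern (the gem's actual regex may differ): a trailing
# "-preview-MM-DD" or "-preview-MM-YYYY" date suffix.
PREVIEW_DATE_SUFFIX = /-preview-\d{2}-(?:\d{2}|\d{4})\z/

# Exact entry first, then the stable model's entry. A nil result corresponds
# to a call landing as cost_status: unknown.
def resolve_price_entry(model, prices)
  return prices[model] if prices.key?(model)

  stable = model.sub(PREVIEW_DATE_SUFFIX, "")
  stable == model ? nil : prices[stable]
end
```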
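The unknown-price-key check validates each key. A key is valid if it is a bare component, or if it has the form <mode>_<component> with both parts known. Below is a sketch with deliberately small, illustrative component and mode lists; the gem's real lists are longer. The rates are placeholders.

```ruby
# Illustrative subsets of components and modes (assumptions).
KNOWN_PRICE_COMPONENTS = %w[input output cache_read_input audio_input image_output].freeze
KNOWN_PRICE_MODES = %w[batch flex priority].freeze

# True for a bare component ("input") or "<mode>_<component>" ("batch_input").
def known_price_key?(name)
  return true if KNOWN_PRICE_COMPONENTS.include?(name)

  KNOWN_PRICE_MODES.any? do |mode|
    prefix = "#{mode}_"
    name.start_with?(prefix) && KNOWN_PRICE_COMPONENTS.include?(name.delete_prefix(prefix))
  end
end

# Returns [accepted rates hash, unknown key names] so the caller can warn
# about the unknown keys and ignore them.
def split_price_keys(rates)
  accepted, unknown = rates.partition { |key, _rate| known_price_key?(key.to_s) }
  [accepted.to_h, unknown.map(&:first)]
end
```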
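The batch-result dedup described for Anthropic and OpenAI batches is best-effort and per-process. Below is a sketch of that shape. It uses an in-memory Set seeded with ids already in the ledger. The class and method names are hypothetical.

```ruby
require "set"

# Hypothetical same-process dedup for batch results: ids this process has seen
# (or was seeded with) are skipped; other processes are not coordinated.
class BatchResultRecorder
  attr_reader :rows

  def initialize(already_ledgered_ids = [])
    @seen_ids = Set.new(already_ledgered_ids)
    @rows = []
  end

  # Returns true when a row was recorded, false for a duplicate id.
  def record(result)
    return false unless @seen_ids.add?(result.fetch(:provider_response_id))

    @rows << result.merge(pricing_mode: :batch)
    true
  end
end
```

Because the Set lives in one process, two processes retrieving the same batch can both record a row. This matches the caveat in the notes.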
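The transcription line-item quantity is ceil(seconds / 60). So 61 seconds bills as 2 minutes, and exactly 60 seconds bills as 1. The sketch below assumes the OpenAI-style usage shape { "type" => "duration", "seconds" => n }. The line-item hash it returns is hypothetical, not the gem's stored format.

```ruby
# Hypothetical line-item builder for duration-billed transcriptions.
def transcription_line_item(usage)
  return nil unless usage["type"] == "duration"

  seconds = usage.fetch("seconds")
  # Round any partial minute up; Rational avoids Float rounding in the division.
  { kind: "transcription_minute", quantity: (seconds.to_r / 60).ceil }
end
```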
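The enforce_budget: fix compares spend plus a pre-call estimate against the budget. The estimate is the token cost plus priced service line items. Below is a sketch of that check with a local BudgetExceededError stand-in and hypothetical argument names. It does not model budget_exceeded_behavior, since the fix makes enforce_budget: true raise regardless of that setting.

```ruby
require "bigdecimal"

# Local stand-in for the gem's BudgetExceededError so the sketch runs alone.
class BudgetExceededError < StandardError; end

# Raise when spent-so-far plus the estimate (token cost + service line items)
# would overshoot the budget; otherwise return the estimate.
def enforce_budget!(spent:, budget:, estimated_token_cost:, service_line_items: [])
  service_cost = service_line_items.sum(BigDecimal("0")) { |item| item.fetch(:cost) }
  estimate = estimated_token_cost + service_cost
  if spent + estimate > budget
    raise BudgetExceededError,
          "estimated #{estimate.to_s('F')} exceeds remaining #{(budget - spent).to_s('F')}"
  end

  estimate
end
```

Consider 9 spent of a 10 budget. A 0.9 token estimate fits on its own, but adding a 0.2 service line item raises the estimate to 1.1, and the call raises. Before the fix, that line item could be exactly what was missing from the check.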
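The two tag fixes follow a degrade-and-warn pattern. Invalid keys are skipped at write, and a raising default_tags proc falls back to {}. Below is a sketch with hypothetical method names. It collects warnings in an array where the gem uses Logging.warn.

```ruby
# Tag keys may contain only word characters, dots, and dashes.
VALID_TAG_KEY = /\A[\w.-]+\z/

# A raising default_tags proc degrades to {} with a warning.
def safe_default_tags(default_tags_proc, warnings)
  default_tags_proc.call || {}
rescue StandardError => e
  warnings << "default_tags raised #{e.class}: #{e.message}"
  {}
end

# Keys outside [\w.-] are skipped with a warning instead of being stored.
def writable_tags(tags, warnings)
  tags.each_with_object({}) do |(key, value), kept|
    if key.to_s.match?(VALID_TAG_KEY)
      kept[key.to_s] = value
    else
      warnings << "Skipping tag with invalid key #{key.inspect}"
    end
  end
end
```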
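Gemini reports per-modality counts under usageMetadata.candidatesTokensDetails, as entries like { "modality" => "IMAGE", "tokenCount" => n }. Below is a sketch that routes those entries into separate buckets. :image_output_tokens mirrors the image_output rate named above. :audio_output_tokens and the fallback to :output_tokens are assumptions.

```ruby
# Sum candidatesTokensDetails entries into per-modality token buckets.
def split_output_tokens(usage_metadata)
  details = usage_metadata.fetch("candidatesTokensDetails", [])
  details.each_with_object(Hash.new(0)) do |entry, totals|
    bucket =
      case entry["modality"]
      when "IMAGE" then :image_output_tokens
      when "AUDIO" then :audio_output_tokens
      else :output_tokens
      end
    totals[bucket] += entry["tokenCount"].to_i
  end
end
```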
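The 1-hour/5-minute split matters because the two cache-write rates differ: 2x versus 1.25x the base input rate, per the note above. Below is a worked sketch with an illustrative base rate of 3 per million tokens. With 1M tokens of each kind, the split costs 3.75 + 6 = 9.75. Lumping all 2M tokens into the 5-minute bucket gives 7.5 instead.

```ruby
require "bigdecimal"

# Multipliers on the base input rate, as stated in the release note.
CACHE_WRITE_MULTIPLIERS = { "5m" => BigDecimal("1.25"), "1h" => BigDecimal("2") }.freeze

# tokens_by_ttl: hypothetical { "5m" => tokens, "1h" => tokens } split.
# input_rate_per_million: base input price per million tokens (illustrative).
def cache_write_cost(tokens_by_ttl, input_rate_per_million)
  rate = BigDecimal(input_rate_per_million.to_s)
  tokens_by_ttl.sum(BigDecimal("0")) do |ttl, tokens|
    tokens * rate * CACHE_WRITE_MULTIPLIERS.fetch(ttl) / 1_000_000
  end
end
```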
