v1.2.0 — Multimodal token estimation + pricing inventory refresh

Mandark-droid released this 28 Apr 19:32

What's new

Closes the cost-coverage gap for multimodal traces where the provider response omits usage data. Validated against demo-multimodal_edge spans on OpenSearch where ~80% of multimodal traffic previously showed cost=0.

Added

  • New public module: genai_otel.cost_estimation — token / cost estimation helpers usable from external custom providers (tracesense / chaos-lab providers that bypass transformers.pipeline and the standard ollama entry points).
    • estimate_pipeline_usage(task, args, kwargs, result, pipe=None)
    • estimate_chat_usage(messages, response_text, image_count=0, audio_seconds_in=0.0)
    • count_images(args, kwargs), audio_seconds(args, kwargs, sampling_rate=None)
    • coerce_text(value), result_text(result)
    • Overridable module constants: CHARS_PER_TOKEN=4, IMAGE_TOKEN_ESTIMATE=256, AUDIO_TOKENS_PER_SECOND=50
  • BaseInstrumentor._estimate_usage hook — fallback estimator that fires when _extract_usage returns None. Spans are tagged gen_ai.usage.token_count_estimated=true so consumers can distinguish exact from estimated counts.
  • OllamaInstrumentor multimodal token estimation — char-count fallback (4 chars/token) + per-image token floor (256 tok/image) for /api/chat (messages[].content + messages[].images) and /api/generate (prompt + top-level images). Fixes cost=0 on multimodal Ollama spans.
  • HuggingFaceInstrumentor pipeline-aware token estimation: _record_pipeline_usage_and_cost now handles image-text-to-text, image-to-text, visual-question-answering, image-classification, automatic-speech-recognition, audio-classification, audio-to-audio, text-to-image, and text-to-speech. Emits the per-modality attributes gen_ai.usage.image_count and gen_ai.usage.audio_seconds.
  • Pricing inventory:
    • New: gpt-image-1, gpt-image-2 (low/medium/high quality × 1024/1536 sizes)
    • New: black-forest-labs/FLUX.2-pro, FLUX.2-max, FLUX.2-flex, FLUX.2-klein-4b, FLUX.2-klein-9b, FLUX.2-dev (sourced from official BFL pricing docs)
    • New: gemini-3-pro-image-preview alias for nano-banana-pro
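
The estimation defaults listed above (4 chars/token, 256 tokens/image, 50 tokens per second of audio) can be combined roughly as in the sketch below. This is an illustration of the arithmetic only, not the code in genai_otel.cost_estimation: the function name rough_token_estimate, the round-up behaviour, and the choice to count image and audio tokens on the prompt side are all assumptions.

```python
import math

# Default values quoted in the release notes.
CHARS_PER_TOKEN = 4
IMAGE_TOKEN_ESTIMATE = 256
AUDIO_TOKENS_PER_SECOND = 50


def rough_token_estimate(prompt_text, response_text, image_count=0, audio_seconds=0.0):
    """Hypothetical helper: approximate usage when the provider reports none."""
    # Text: about 4 characters per token. Rounding up is an assumption.
    text_in = math.ceil(len(prompt_text) / CHARS_PER_TOKEN)
    text_out = math.ceil(len(response_text) / CHARS_PER_TOKEN)
    # Images and input audio are counted as prompt tokens (also an assumption).
    image_tokens = image_count * IMAGE_TOKEN_ESTIMATE
    audio_tokens = math.ceil(audio_seconds * AUDIO_TOKENS_PER_SECOND)
    prompt_tokens = text_in + image_tokens + audio_tokens
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": text_out,
        "total_tokens": prompt_tokens + text_out,
        "image_count": image_count,
        "audio_seconds": audio_seconds,
        "estimated": True,
    }


# 20 prompt characters -> 5 tokens, plus 2 images x 256 = 517 prompt tokens.
usage = rough_token_estimate("Describe this image.", "A cat on a sofa.", image_count=2)
```

With these defaults a short prompt is dominated by the per-image estimate, which is why a multimodal span that previously reported cost=0 can end up with a non-trivial estimated cost.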

Changed

  • Refined Gemini image pricing to current 2026 rates:
    • nano-banana / gemini-2-5-flash-image: changed from a flat $0.03/MP to per-resolution pricing ($0.039 @ 1024×1024), plus a new batch quality tier (50% off).
    • nano-banana-pro / gemini-3-pro-image-preview: $0.134 for 1K-2K, $0.24 for 4K (standard); 50% off batch.
    • nano-banana-2 / gemini-3.1-flash-image: gains a batch quality tier.
  • Imagen 3.0 / 4.0 entries reshaped from scalar floats into the standard {quality: {dimension: price}} shape so the calculator can actually evaluate them.
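
To illustrate the {quality: {dimension: price}} shape mentioned above, here is a minimal lookup sketch. The key names ("standard", "batch", "1K-2K", "4K"), the PRICING dict, and the image_cost function are hypothetical; only the dollar figures come from the nano-banana-pro rates in this release. A scalar float entry carries no quality or dimension keys, so a lookup like this has nothing to index into.

```python
# Hypothetical pricing entry in the {quality: {dimension: price}} shape.
# Key names are illustrative assumptions; prices are the quoted
# nano-banana-pro rates, with batch at 50% off.
PRICING = {
    "nano-banana-pro": {
        "standard": {"1K-2K": 0.134, "4K": 0.24},
        "batch": {"1K-2K": 0.067, "4K": 0.12},
    },
}


def image_cost(model, quality, dimension, n_images=1):
    """Return the USD cost for n_images, or None when no price is listed."""
    price = PRICING.get(model, {}).get(quality, {}).get(dimension)
    if price is None:
        return None
    return price * n_images


two_4k_images = image_cost("nano-banana-pro", "standard", "4K", n_images=2)
unknown_model = image_cost("some-unlisted-model", "standard", "4K")
```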

Fixed

  • Non-chat call types (image, audio, embedding, …) now also set gen_ai.usage.cost.total on the span. Previously cost was added to the metric counter but dropped from span attributes, so backends couldn't aggregate cost per image-gen / audio / embedding span.
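
With the cost now on the span itself, a backend can aggregate per call type directly from span attributes. The sketch below works on plain attribute dicts rather than a real backend query. It assumes spans carry gen_ai.operation.name (an OpenTelemetry GenAI convention attribute; whether this library sets it is not stated here) and that the estimated flag is stored as a boolean. The function name cost_by_operation is hypothetical.

```python
from collections import defaultdict


def cost_by_operation(spans):
    """Sum gen_ai.usage.cost.total per operation, split into exact vs estimated."""
    totals = defaultdict(lambda: {"exact": 0.0, "estimated": 0.0})
    for attrs in spans:
        cost = attrs.get("gen_ai.usage.cost.total")
        if cost is None:
            # Spans without a cost attribute cannot be aggregated.
            continue
        op = attrs.get("gen_ai.operation.name", "unknown")
        estimated = attrs.get("gen_ai.usage.token_count_estimated", False)
        totals[op]["estimated" if estimated else "exact"] += cost
    return dict(totals)


# Illustrative span attributes; the values are made up for the example.
spans = [
    {"gen_ai.operation.name": "image_generation", "gen_ai.usage.cost.total": 0.25},
    {
        "gen_ai.operation.name": "image_generation",
        "gen_ai.usage.cost.total": 0.5,
        "gen_ai.usage.token_count_estimated": True,
    },
    {"gen_ai.operation.name": "embeddings"},  # no cost attribute, so it is skipped
]
result = cost_by_operation(spans)
```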

Tests

  • +35 tests vs baseline (1270 → 1305 passing). 0 regressions.
  • 2 pre-existing failures (test_mcp_instrumentor_integration, test_smolagents_instrumentor_integration) unchanged — unrelated to this release.

Docs

  • CHANGELOG: 1.2.0 section
  • docs/reference/semantic-conventions.md: documented the three new attributes (gen_ai.usage.token_count_estimated, gen_ai.usage.image_count, gen_ai.usage.audio_seconds)

Migration

No breaking changes. The new estimation paths only fire when the provider response would otherwise leave usage empty, and the per-modality attributes (image_count, audio_seconds) are additive. Existing callers see strictly more cost coverage with the same span / metric surface.
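
The "only fire when usage would otherwise be empty" behaviour can be pictured as the control flow below. SketchInstrumentor is a hypothetical stand-in, not the real BaseInstrumentor. The request/response dict shapes and the record_usage method are assumptions, and the sketch sets the estimated flag only on the fallback path, which may differ from how the library tags exact spans.

```python
ESTIMATED_FLAG = "gen_ai.usage.token_count_estimated"


class SketchInstrumentor:
    """Hypothetical stand-in for an instrumentor; not the real BaseInstrumentor."""

    def _extract_usage(self, response):
        # Exact counts when the provider reported them, otherwise None.
        return response.get("usage")

    def _estimate_usage(self, request, response):
        # Fallback heuristic: roughly 4 characters per token, rounded up.
        prompt_tokens = -(-len(request.get("prompt", "")) // 4)
        completion_tokens = -(-len(response.get("text", "")) // 4)
        return {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        }

    def record_usage(self, request, response, span_attributes):
        usage = self._extract_usage(response)
        if usage is None:
            # Only reached when the provider left usage empty.
            usage = self._estimate_usage(request, response)
            span_attributes[ESTIMATED_FLAG] = True
        return usage


inst = SketchInstrumentor()
exact_attrs, estimated_attrs = {}, {}
reported = {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}
exact = inst.record_usage({"prompt": "hi"}, {"text": "hello", "usage": reported}, exact_attrs)
estimated = inst.record_usage({"prompt": "12345678"}, {"text": "abcd"}, estimated_attrs)
```

In the first call the provider-reported counts pass through untouched and no flag is set. In the second call the estimator runs and the span attributes gain the estimated marker.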

External custom providers (e.g. tracesense providers that bypass transformers.pipeline) can opt in to the same estimation logic by importing genai_otel.cost_estimation:

from genai_otel.cost_estimation import estimate_pipeline_usage, estimate_chat_usage

# pil_image, prompt_text, pipeline_output and hf_pipeline are placeholders for
# objects your provider already holds (input image, prompt, pipeline result, pipeline).
usage = estimate_pipeline_usage(
    task="image-text-to-text",
    args=(),
    kwargs={"images": [pil_image], "inputs": prompt_text},
    result=pipeline_output,
    pipe=hf_pipeline,
)
# usage = {prompt_tokens, completion_tokens, total_tokens, image_count, audio_seconds, estimated: True}