v1.2.0 — Multimodal token estimation + pricing inventory refresh
What's new
Closes the cost-coverage gap for multimodal traces where the provider response omits usage data. Validated against demo-multimodal_edge spans on OpenSearch where ~80% of multimodal traffic previously showed cost=0.
Added
- New public module: genai_otel.cost_estimation. Token / cost estimation helpers usable from external custom providers (tracesense / chaos-lab providers that bypass transformers.pipeline and the standard ollama entry points).
  - estimate_pipeline_usage(task, args, kwargs, result, pipe=None)
  - estimate_chat_usage(messages, response_text, image_count=0, audio_seconds_in=0.0)
  - count_images(args, kwargs), audio_seconds(args, kwargs, sampling_rate=None)
  - coerce_text(value), result_text(result)
  - Overridable module constants: CHARS_PER_TOKEN=4, IMAGE_TOKEN_ESTIMATE=256, AUDIO_TOKENS_PER_SECOND=50
- BaseInstrumentor._estimate_usage hook: fallback estimator that fires when _extract_usage returns None. Spans are tagged gen_ai.usage.token_count_estimated=true so consumers can distinguish exact from estimated counts.
- OllamaInstrumentor multimodal token estimation: char-count fallback (4 chars/token) + per-image token floor (256 tok/image) for /api/chat (messages[].content + messages[].images) and /api/generate (prompt + top-level images). Fixes cost=0 on multimodal Ollama spans.
- HuggingFaceInstrumentor pipeline-aware token estimation: _record_pipeline_usage_and_cost now handles image-text-to-text, image-to-text, visual-question-answering, image-classification, automatic-speech-recognition, audio-classification, audio-to-audio, text-to-image, text-to-speech. Emits per-modality attributes gen_ai.usage.image_count and gen_ai.usage.audio_seconds.
- Pricing inventory:
  - New: gpt-image-1, gpt-image-2 (low/medium/high quality × 1024/1536 sizes)
  - New: black-forest-labs/FLUX.2-pro, FLUX.2-max, FLUX.2-flex, FLUX.2-klein-4b, FLUX.2-klein-9b, FLUX.2-dev (sourced from official BFL pricing docs)
  - New: gemini-3-pro-image-preview alias for nano-banana-pro
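The fallback heuristic described above (4 chars/token, 256 tokens per image, 50 tokens per second of audio) can be sketched in a few lines. This is a minimal illustration and not the genai_otel implementation: the function name sketch_estimate_usage, the ceil rounding, and the choice to count image and audio tokens on the prompt side are all assumptions made for the example.

```python
import math

# Values quoted in these release notes; the real module exposes them
# as overridable constants.
CHARS_PER_TOKEN = 4
IMAGE_TOKEN_ESTIMATE = 256
AUDIO_TOKENS_PER_SECOND = 50


def sketch_estimate_usage(prompt_text, response_text, image_count=0, audio_seconds=0.0):
    """Hypothetical stand-in for the char-count fallback (assumed rounding)."""
    # Text: roughly one token per CHARS_PER_TOKEN characters, rounded up.
    prompt_tokens = math.ceil(len(prompt_text) / CHARS_PER_TOKEN)
    # Assumption: images and input audio are billed as prompt tokens.
    prompt_tokens += image_count * IMAGE_TOKEN_ESTIMATE
    prompt_tokens += math.ceil(audio_seconds * AUDIO_TOKENS_PER_SECOND)
    completion_tokens = math.ceil(len(response_text) / CHARS_PER_TOKEN)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "image_count": image_count,
        "audio_seconds": audio_seconds,
        "estimated": True,
    }


# A 40-character prompt with one image and an 8-character reply.
print(sketch_estimate_usage("a" * 40, "b" * 8, image_count=1)["total_tokens"])  # 268
```

Under these assumptions a single image dominates a short prompt (256 of the 266 prompt tokens), which is why the per-image floor matters for closing the cost=0 gap.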
Changed
- Refined Gemini image pricing to current 2026 rates:
  - nano-banana / gemini-2-5-flash-image: from $0.03/MP to per-resolution ($0.039 @ 1024×1024) plus a new batch quality tier (50% off).
  - nano-banana-pro / gemini-3-pro-image-preview: $0.134 for 1K-2K, $0.24 for 4K (standard); 50% off batch.
  - nano-banana-2 / gemini-3.1-flash-image: gains a batch quality tier.
- Imagen 3.0 / 4.0 entries reshaped from scalar floats into the standard {quality: {dimension: price}} shape so the calculator can actually evaluate them.
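To illustrate why the {quality: {dimension: price}} shape matters, here is a rough sketch of a lookup that walks that nesting. The table layout, the key spellings ("standard", "batch", "1024x1024"), and the image_cost helper are hypothetical; only the $0.039 figure and the 50% batch discount come from the notes above.

```python
# Hypothetical pricing table in the {quality: {dimension: price}} shape.
PRICING = {
    "nano-banana": {
        "standard": {"1024x1024": 0.039},
        "batch": {"1024x1024": 0.0195},  # 50% off the standard rate
    },
}


def image_cost(pricing, model, quality, dimension, n_images=1):
    """Return the cost for n_images, or None if the entry cannot be evaluated."""
    entry = pricing.get(model)
    # A bare scalar float has no quality/dimension levels to walk, so a
    # lookup like this one cannot evaluate it.
    if not isinstance(entry, dict):
        return None
    tiers = entry.get(quality)
    if not isinstance(tiers, dict):
        return None
    per_image = tiers.get(dimension)
    if per_image is None:
        return None
    return per_image * n_images


print(image_cost(PRICING, "nano-banana", "batch", "1024x1024", n_images=4))
print(image_cost({"legacy-model": 0.04}, "legacy-model", "standard", "1024x1024"))  # None
```

The second call shows the failure mode for a scalar entry: with nothing to index by quality or dimension, the lookup returns None instead of a price.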
Fixed
- Non-chat call types (image, audio, embedding, …) now also set gen_ai.usage.cost.total on the span. Previously cost was added to the metric counter but dropped from span attributes, so backends couldn't aggregate cost per image-gen / audio / embedding span.
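With cost now present on every span, a backend can aggregate it and keep exact and estimated figures apart. The sketch below works on plain attribute dicts; the sample values are made up for illustration, and the assumption that the estimated flag may arrive either as a boolean or as the string "true" depends on how your backend stores attributes.

```python
from collections import defaultdict


def cost_by_exactness(spans):
    """Sum gen_ai.usage.cost.total, split by whether token counts were estimated."""
    totals = defaultdict(float)
    for attrs in spans:
        cost = attrs.get("gen_ai.usage.cost.total")
        if cost is None:
            continue  # span carries no cost attribute
        flag = attrs.get("gen_ai.usage.token_count_estimated")
        # Assumption: accept both True and "true" as the estimated marker.
        key = "estimated" if str(flag).lower() == "true" else "exact"
        totals[key] += cost
    return dict(totals)


# Illustrative span attributes, not real trace data.
sample_spans = [
    {"gen_ai.usage.cost.total": 0.5},
    {"gen_ai.usage.cost.total": 0.25, "gen_ai.usage.token_count_estimated": True},
    {"gen_ai.usage.cost.total": 0.25, "gen_ai.usage.token_count_estimated": "true"},
    {"gen_ai.usage.image_count": 1},
]
print(cost_by_exactness(sample_spans))  # {'exact': 0.5, 'estimated': 0.5}
```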
Tests
- +35 tests vs baseline (1270 → 1305 passing). 0 regressions.
- 2 pre-existing failures (test_mcp_instrumentor_integration, test_smolagents_instrumentor_integration) unchanged; unrelated to this release.
Docs
- CHANGELOG: 1.2.0 section
- docs/reference/semantic-conventions.md: documented the three new attributes (gen_ai.usage.token_count_estimated, gen_ai.usage.image_count, gen_ai.usage.audio_seconds)
Migration
No breaking changes. The new estimation paths only fire when the provider response would otherwise leave usage empty, and the per-modality attributes (image_count, audio_seconds) are additive. Existing callers see strictly more cost coverage with the same span / metric surface.
External custom providers (e.g. tracesense providers that bypass transformers.pipeline) can opt in to the same estimation logic by importing genai_otel.cost_estimation:
    from genai_otel.cost_estimation import estimate_pipeline_usage, estimate_chat_usage

    usage = estimate_pipeline_usage(
        task="image-text-to-text",
        args=(),
        kwargs={"images": [pil_image], "inputs": prompt_text},
        result=pipeline_output,
        pipe=hf_pipeline,
    )
    # usage = {prompt_tokens, completion_tokens, total_tokens, image_count, audio_seconds, estimated: True}