Skip to content

Untyped Map-Like Fields in the Langfuse OpenAPI Spec. #38

@edeandrea

Description

@edeandrea

I came across this while doing #36

There are Untyped Map-Like Fields in the Langfuse OpenAPI Spec.

Problem

The Langfuse OpenAPI spec (openapi.yml) has 28 fields across 18 schemas that represent JSON objects (key-value maps) but lack type: object and additionalProperties declarations. These fields have only a description and optionally nullable: true, with no type information.

In the OpenAPI 3.0 specification, a property with no type is treated as "any type" (AnyType). Code generators for strongly-typed languages map this to the language's top-level type — for example, Object in Java, any in TypeScript, or interface{} in Go — instead of a map/dictionary type. This degrades the developer experience because consumers must manually cast these values.

The spec already correctly types the same fields in other schemas (see Correctly Typed Fields below), making this an internal inconsistency rather than a design choice.

How We Identified Which Fields Should Be Typed

We used four criteria to determine which untyped fields should have type: object + additionalProperties: true added:

1. Internal Consistency Within the Spec

The same field name is correctly typed in some schemas but untyped in others. For example:

Field Correctly Typed In Missing Type In
metadata Project, OrganizationProject, legacyCreateScoreRequest Trace, Observation, Dataset, BaseScore, and 12 others
modelParameters CreateGenerationBody, UpdateGenerationBody, ObservationBody Observation, ObservationV2
config LlmConnection, UpsertLlmConnectionRequest BasePrompt, CreateChatPromptRequest, CreateTextPromptRequest

If the field is typed as type: object with additionalProperties in one schema, it should be typed the same way everywhere it appears.

2. Langfuse Documentation

The Langfuse documentation explicitly describes these fields as object/dictionary types:

  • metadata: Documented as Record<string, unknown>. The v2→v3 migration guide states: "Only the Record type is supported within our UI and endpoints to perform queries and filter events." Non-object values sent as metadata are coerced into objects (e.g., "test" becomes { "metadata": "test" }).
  • modelParameters: Documented as a string-keyed map ({ "property1": "string", "property2": "string" }). The UI renders it using Object.entries(), confirming it is always an object.
  • config (on prompts): Documented as a freeform JSON dictionary. Accessed via dictionary methods in SDKs (cfg.get("model") in Python, cfg.model in JS/TS).

3. Semantic Analysis

Some fields are structurally always JSON objects by nature:

  • inputSchema / expectedOutputSchema: These hold JSON Schema definitions, which are always JSON objects per the JSON Schema specification.
  • tokenizerConfig: A configuration map for tokenizer settings — inherently key-value.
  • lastConfig (in PromptMeta): The last-used prompt configuration, same semantics as config.

4. Exclusion Criteria — Fields That Should Remain Untyped

We explicitly excluded fields that can legitimately be any JSON value:

  • input / output / expectedOutput: The Langfuse API documentation states these "Can be any JSON" — strings, numbers, arrays, or objects are all valid. Typing these as object would be incorrect.
  • error (in IngestionError): Error payloads can be structured in various ways.
  • log (in SDKLogBody): SDK debug payloads can be any JSON value.

Correctly Typed Fields (8)

These fields already have proper type: object + additionalProperties declarations and serve as the pattern that should be applied to the missing fields:

Schema Field Current Definition
Project metadata type: object, additionalProperties: true
OrganizationProject metadata type: object, additionalProperties: true
legacyCreateScoreRequest metadata type: object, additionalProperties: true
LlmConnection config type: object, additionalProperties: true
UpsertLlmConnectionRequest config type: object, additionalProperties: true
CreateGenerationBody modelParameters type: object, additionalProperties: { $ref: MapValue }
UpdateGenerationBody modelParameters type: object, additionalProperties: { $ref: MapValue }
ObservationBody modelParameters type: object, additionalProperties: { $ref: MapValue }

Fields Missing Type Definitions (28)

These fields should have type: object and additionalProperties: true added:

metadata (19 fields)

Schema Current Definition Nullable
Trace no type specified true
Observation no type specified not set
ObservationV2 no type specified true
ObservationBody no type specified true
OptionalObservationBody no type specified true
BaseScore no type specified not set
BaseScoreV1 no type specified not set
ScoreBody no type specified true
Dataset no type specified not set
DatasetItem no type specified not set
DatasetRun no type specified not set
CreateDatasetItemRequest no type specified true
CreateDatasetRunItemRequest no type specified true
CreateDatasetRequest no type specified true
TraceBody no type specified true
BaseEvent no type specified true

modelParameters (2 fields)

Schema Current Definition Nullable
Observation no type specified not set
ObservationV2 no type specified true

Note: The write-side schemas (CreateGenerationBody, UpdateGenerationBody, ObservationBody) already correctly type modelParameters as type: object with additionalProperties: { $ref: MapValue }.

config (3 fields)

Schema Current Definition Nullable
BasePrompt empty definition ({}) N/A
CreateChatPromptRequest no type specified true
CreateTextPromptRequest no type specified true

Note: LlmConnection.config and UpsertLlmConnectionRequest.config already correctly use type: object with additionalProperties: true.

tokenizerConfig (2 fields)

Schema Current Definition Nullable
Model no type specified not set
CreateModelRequest no type specified true

inputSchema / expectedOutputSchema (4 fields)

Schema Field Current Definition Nullable
Dataset inputSchema no type specified true
Dataset expectedOutputSchema no type specified true
CreateDatasetRequest inputSchema no type specified true
CreateDatasetRequest expectedOutputSchema no type specified true

lastConfig (1 field)

Schema Current Definition Nullable
PromptMeta no type specified not set

Cascading Impact via Schema Composition (30 schemas)

The 28 untyped fields cascade into 30 additional composed schemas via oneOf, allOf, and anyOf. Fixing the base schemas will automatically fix all of these:

Composed Schema Inherits From Untyped Field
TraceWithDetails Trace metadata
TraceWithFullDetails Trace metadata
ObservationsView Observation metadata, modelParameters
BooleanScore BaseScore metadata
CategoricalScore BaseScore metadata
CorrectionScore BaseScore metadata
NumericScore BaseScore metadata
TextScore BaseScore metadata
BooleanScoreV1 BaseScoreV1 metadata
CategoricalScoreV1 BaseScoreV1 metadata
NumericScoreV1 BaseScoreV1 metadata
TextScoreV1 BaseScoreV1 metadata
DatasetRunWithItems DatasetRun metadata
ChatPrompt BasePrompt config
TextPrompt BasePrompt config
CreatePromptRequest CreateChatPromptRequest / CreateTextPromptRequest config
CreateEventBody OptionalObservationBody metadata
UpdateEventBody OptionalObservationBody metadata
CreateEventEvent BaseEvent metadata
CreateGenerationEvent BaseEvent metadata
CreateObservationEvent BaseEvent metadata
CreateSpanEvent BaseEvent metadata
UpdateGenerationEvent BaseEvent metadata
UpdateObservationEvent BaseEvent metadata
UpdateSpanEvent BaseEvent metadata
SDKLogEvent BaseEvent metadata
ScoreEvent BaseEvent metadata
TraceEvent BaseEvent metadata

Fields Correctly Left Untyped (18)

These fields are intentionally untyped because they can be any JSON value (string, number, array, object, or null). No changes needed:

Schema Field Reason
Trace input, output Documented as "Can be any JSON"
Observation input, output Documented as "Can be any JSON"
ObservationV2 input, output Documented as "Can be any JSON"
ObservationBody input, output Documented as "Can be any JSON"
OptionalObservationBody input, output Documented as "Can be any JSON"
TraceBody input, output Documented as "Can be any JSON"
DatasetItem input, expectedOutput Documented as "Can be any JSON"
CreateDatasetItemRequest input, expectedOutput Documented as "Can be any JSON"
IngestionError error Error payloads can be any structure
SDKLogBody log SDK debug payload, any JSON

Suggested Fix

For each of the 28 fields listed above, add type: object and additionalProperties: true to the property definition. For example, change:

metadata:
  nullable: true
  description: Additional metadata of the observation

to:

metadata:
  type: object
  additionalProperties: true
  nullable: true
  description: Additional metadata of the observation

And for the empty-definition case (BasePrompt.config: {}), change to:

config:
  type: object
  additionalProperties: true

Originally posted by @edeandrea in #36 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions