It turns out that the Dart inference logic does not explain functions in the prompts it produces.
More broadly, the representation of the schema in prompts is very different in the Dart inference library compared to the Python library and evals in the A2UI repository.
There is no one correct way to represent the schema or do inference - any approach is valid if it performs well. But the Dart approach is currently missing function support, and it is likely slower than necessary, because it inlines schemas in cases where $refs could be used to reduce duplication.
To solve this in a durable way, we should define a prompt specification, separate from our client libraries, then create multiple implementations (Python and Dart initially), each with tests that verify the prompt produced for a given configuration and catalog is exactly the same. That way, we can run evaluations using one library (Python) and have confidence that the results will be reproducible in Dart etc.
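As a rough illustration of what such a parity test could look like, here is a minimal Python sketch. The `build_prompt` function, the `preamble` config key and the `---BEGIN/END---` delimiters are all hypothetical and are not taken from either library; the only point is that the prompt is built deterministically and reduced to a digest that every implementation can check against the same stored fixture.

```python
import hashlib
import json


def build_prompt(config: dict, catalog: dict) -> str:
    """Hypothetical reference prompt builder (not the real A2UI API)."""
    # Canonical JSON: sorted keys and fixed separators, so equal inputs
    # serialize to the same text regardless of key insertion order.
    catalog_json = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return (
        f"{config['preamble']}\n\n"
        f"---BEGIN catalog.json---\n{catalog_json}\n---END catalog.json---\n"
    )


def prompt_digest(prompt: str) -> str:
    """SHA-256 of the UTF-8 prompt bytes, suitable as a shared golden value."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()
```

Under this scheme, a Dart test would build the prompt for the same configuration and catalog and compare its digest (or the full text) against the same fixture, so any divergence between the two implementations fails a test instead of silently changing eval results.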
A2UI repository approach
We concatenate multiple named schemas - server_to_client.json, catalog.json and common_types.json - which refer to each other via $refs.
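A minimal sketch of the concatenation idea, in Python. The three file names come from the description above; the schema bodies, the `concat_named_schemas` helper and the `---BEGIN/END---` delimiters are placeholders invented for illustration, not the actual contents or format used in the A2UI repository.

```python
import json


def concat_named_schemas(schemas: dict[str, dict]) -> str:
    """Emit each schema under its own name so cross-file $refs stay readable."""
    sections = []
    for name, schema in schemas.items():
        body = json.dumps(schema, indent=2)
        sections.append(f"---BEGIN {name}---\n{body}\n---END {name}---")
    return "\n\n".join(sections)


# Placeholder schema bodies (assumed, heavily simplified).
SCHEMAS = {
    "server_to_client.json": {
        "properties": {"component": {"$ref": "catalog.json#/components/Text"}}
    },
    "catalog.json": {
        "components": {
            "Text": {
                "properties": {
                    "text": {"$ref": "common_types.json#/$defs/StringValue"}
                }
            }
        }
    },
    "common_types.json": {"$defs": {"StringValue": {"type": "string"}}},
}

prompt_section = concat_named_schemas(SCHEMAS)
```

Each shared type is written once, in common_types.json, and every other schema points at it by name.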
Dart approach
We create one big schema by inlining the catalog components etc -
    .map((e) => Component.fromJson(e as JsonMap))

etc
Source: genui/packages/genui/lib/src/model/a2ui_message.dart, line 186 at commit 2af233c
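To make the duplication cost of inlining concrete, here is a small Python sketch. It is not a port of the Dart code referenced above; `inline_refs` and the example schema are hypothetical, and the helper assumes only local, non-recursive `#/$defs/...` references.

```python
import json


def inline_refs(node, defs):
    """Return a copy of `node` with local '#/$defs/...' refs expanded in place."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            # Replace the reference with a full copy of the definition.
            return inline_refs(defs[ref[len("#/$defs/"):]], defs)
        # Drop the $defs table itself; it is no longer needed once inlined.
        return {k: inline_refs(v, defs) for k, v in node.items() if k != "$defs"}
    if isinstance(node, list):
        return [inline_refs(v, defs) for v in node]
    return node


# Hypothetical schema: one shared type referenced from three properties.
SCHEMA = {
    "$defs": {
        "StringValue": {
            "type": "object",
            "properties": {
                "literal": {"type": "string"},
                "path": {"type": "string"},
            },
        }
    },
    "properties": {
        "title": {"$ref": "#/$defs/StringValue"},
        "subtitle": {"$ref": "#/$defs/StringValue"},
        "body": {"$ref": "#/$defs/StringValue"},
    },
}

inlined = inline_refs(SCHEMA, SCHEMA["$defs"])
with_refs_size = len(json.dumps(SCHEMA))
inlined_size = len(json.dumps(inlined))
```

In this toy case the inlined form is already longer than the $ref form with only three uses of the shared type, and the gap grows with every additional use. Whether that translates into a meaningful latency or quality difference in practice would need to be measured with evals.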