Skip to content

Integrate jsoncons for robust JSON schema validation#1274

Open
nan-yu wants to merge 21 commits intogoogle:mainfrom
nan-yu:schema-validation-cpp
Open

Integrate jsoncons for robust JSON schema validation#1274
nan-yu wants to merge 21 commits intogoogle:mainfrom
nan-yu:schema-validation-cpp

Conversation

@nan-yu
Copy link
Copy Markdown
Collaborator

@nan-yu nan-yu commented Apr 24, 2026

Description

Fixes #1268.

Pre-launch Checklist

If you need help, consider asking for advice on the discussion board.

nan-yu added 21 commits April 24, 2026 22:30
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive conformance test suite for the A2UI SDKs, covering parsing, validation, and catalog management across Python and C++ implementations. Key changes include the migration of the C++ validator to use the jsoncons library for more robust schema validation and the addition of extensive YAML-based test cases. Feedback focuses on improving the robustness of the C++ implementation, specifically regarding dependency management, error handling for unpredictable LLM outputs, performance optimizations in JSON library conversions, and the fragility of schema-based heuristics.

GIT_TAG 2.4.0
jsoncons
GIT_REPOSITORY https://github.com/danielaparker/jsoncons.git
GIT_TAG master
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using master as a GIT_TAG for an external dependency is risky as it makes builds non-reproducible and can introduce breaking changes unexpectedly. It is recommended to use a specific version tag or a commit hash.

  GIT_TAG v0.176.0

Comment on lines +780 to +783
if (it == component_ref_fields_.end()) {
throw std::runtime_error("Schema analysis failed or component type unknown: " + comp_type);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Throwing a std::runtime_error when an unknown component type is encountered during streaming can cause the entire application to crash. Since LLM output can be unpredictable and may include hallucinated component types, it's safer to handle this gracefully (e.g., by skipping the component or logging a warning) rather than throwing an exception.

Suggested change
if (it == component_ref_fields_.end()) {
throw std::runtime_error("Schema analysis failed or component type unknown: " + comp_type);
}
auto it = component_ref_fields_.find(comp_type);
if (it == component_ref_fields_.end()) {
continue;
}

Comment on lines +35 to +38
void inject_additional_properties(jsoncons::json& obj, const jsoncons::json& source_properties)
{
if (obj.is_object())
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The brace style here (Allman style) is inconsistent with the rest of the file and the repository, which generally follows the K&R style (braces on the same line).

void inject_additional_properties(jsoncons::json& obj, const jsoncons::json& source_properties) {
    if (obj.is_object()) {
References
  1. Code should follow the relevant style guides for each language. (link)

Comment on lines +276 to +277
std::string json_str = message.dump();
jsoncons::json cons_message = jsoncons::json::parse(json_str);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Converting nlohmann::json to jsoncons::json by dumping to a string and re-parsing is inefficient. Since this happens for every message in a potentially long stream, it could impact performance. Consider if jsoncons can be used more directly or if the conversion can be optimized.

Comment on lines +901 to +903
if (ref.rfind("ComponentId") != std::string::npos ||
ref.rfind("child") != std::string::npos ||
ref.find("/child") != std::string::npos) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The heuristics used to identify component references (e.g., checking if a $ref contains "ComponentId" or "child") are fragile. This logic assumes specific naming conventions in the JSON schema which might not hold true if the schema is updated or if a custom catalog uses different names.

return ss.str();
}

static std::string glob_to_regex(const std::string& glob) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual implementation of glob_to_regex for loading examples may not handle all edge cases of glob patterns (like escaping or complex character classes). While sufficient for simple test data, using a standard library or a more robust implementation would be safer for general path matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[C++ SDK] Switch to JSON Schema Library for Validator and Parser when Draft 2020-12 is supported

1 participant