Description
Describe the bug
The ParseMessage function in genkit/go/ai/format_json.go fails to parse JSON content when the message text contains other markdown code blocks (such as ```yaml, ```bash, etc.). The function uses base.ExtractJSONFromMarkdown() with a regex pattern that matches any code block, not just JSON code blocks, so YAML or other content ends up being parsed as JSON.
Full error message:
error: model failed to generate output matching expected schema: data is not valid JSON: invalid character 'y' looking for beginning of value
To Reproduce
Steps to reproduce the behavior:
- Create a Message with Content containing markdown code blocks (e.g., ```yaml)
- Call ParseMessage on a jsonHandler instance
- The function calls base.ExtractJSONFromMarkdown(), which incorrectly extracts the YAML block due to the faulty regex
- The extracted content gets passed to JSON validation, causing the error
Code sample that triggers the bug:
message := &Message{
	Content: []*Part{
		// An interpreted string is used here because a Go raw string literal cannot contain backticks.
		{Text: "{\"status\": \"ok\", \"config\": \"```yaml\\nkey: value\\n```\"}"},
	},
}
handler := &jsonHandler{...}
_, err := handler.ParseMessage(message) // This will fail with "invalid character 'y'"
The issue occurs because the message contains a ```yaml code block. The original regex ```(json)?((\n|.)*?)``` matches the ```yaml block and captures everything after the opening fence: the "yaml" tag followed by the YAML content ("key: value"). That captured text is then parsed as JSON, causing the "invalid character 'y'" error.
Expected behavior
When a message contains markdown code blocks other than ```json blocks, ParseMessage should still parse the JSON content correctly.
Root Cause Analysis
The issue is in the ExtractJSONFromMarkdown function in go/internal/base/json.go. The problematic regex pattern:
var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")
The (json)? part makes the "json" identifier optional, which means the regex will match ANY code block (```yaml, ```bash, ```python, etc.), not just ```json blocks. When it encounters a ```yaml block, the second capture group holds the language tag plus the YAML content, and the attempt to parse that as JSON fails with "invalid character 'y' looking for beginning of value" (where 'y' is the first character of the "yaml" tag).
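To make the capture groups concrete, the sketch below prints groups 1 and 2 of the quoted pattern for three different fences. The groups helper is a hypothetical name introduced for illustration; only the regex comes from the issue.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern quoted in the issue.
var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")

// groups returns capture groups 1 and 2 for the first fenced block in md.
// ok is false when the pattern does not match at all.
func groups(md string) (tag, body string, ok bool) {
	m := jsonMarkdownRegex.FindStringSubmatch(md)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	inputs := []string{
		"```json\n{\"a\": 1}\n```",
		"```yaml\nkey: value\n```",
		"```bash\necho hi\n```",
	}
	for _, md := range inputs {
		tag, body, _ := groups(md)
		// For non-json fences group 1 is empty and the tag leaks into group 2.
		fmt.Printf("group1=%q group2=%q\n", tag, body)
	}
}
```

Only the ```json fence yields a clean group 2; for the other fences the language tag becomes the start of the text handed to the JSON parser.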
Runtime:
- OS: macOS (Darwin Kernel Version 24.6.0)
- Version: macOS Sequoia
Go version
go version go1.25.0 darwin/arm64
Files Affected
- go/internal/base/json.go (lines 122-132): contains the buggy ExtractJSONFromMarkdown function
- go/ai/format_json.go (line 94): calls ExtractJSONFromMarkdown in ParseMessage
- go/ai/generate.go (line 742): calls ExtractJSONFromMarkdown in ModelResponse.Output
- go/internal/base/json.go (line 137): internal usage in the GetJsonObjectLines function
Applied Fix
I introduced a simple but unverified fix.
-var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")
+var jsonMarkdownRegex = regexp.MustCompile("```json((?s:.*?))?```")
 func ExtractJSONFromMarkdown(md string) string {
 	// TODO: improve this
 	matches := jsonMarkdownRegex.FindStringSubmatch(md)
 	if matches == nil {
 		return md
 	}
-	return matches[2]
+	return matches[1]
 }
This ensures that only ```json code blocks are matched and processed, so YAML or other non-JSON code blocks are no longer parsed as JSON. However, it cannot handle a Message whose fence is written as ``` json (with a space before the tag). I think it’s better to use another way to mark PartJson instead of markdown.
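If the markdown approach is kept, one possible way to cover the ``` json case is a pattern that still requires the "json" tag but tolerates spaces or tabs before it. The following is only a sketch under that assumption: tolerantJSONFence and extractJSONFence are hypothetical names, not part of Genkit, and the pattern has not been run against the project's test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// tolerantJSONFence is a hypothetical pattern, not the one in the Genkit repo.
// It requires the "json" tag (case-insensitive), allows spaces or tabs between
// the opening fence and the tag, and uses \b so tags such as "jsonc" are skipped.
var tolerantJSONFence = regexp.MustCompile("(?s)```[ \\t]*(?i:json)\\b(.*?)```")

// extractJSONFence returns the body of the first json fence, or md unchanged
// when no json fence is present (same fallback as the function in the issue).
func extractJSONFence(md string) string {
	m := tolerantJSONFence.FindStringSubmatch(md)
	if m == nil {
		return md
	}
	return m[1]
}

func main() {
	inputs := []string{
		"```json\n{\"a\": 1}\n```",
		"``` json\n{\"a\": 1}\n```",
		"{\"status\": \"ok\", \"config\": \"```yaml\\nkey: value\\n```\"}",
	}
	for _, md := range inputs {
		fmt.Printf("%q\n", extractJSONFence(md))
	}
}
```

A regex like this still cannot tell an opening fence from a closing one, so text such as a closing fence followed by the word "json" could be misread. That limitation supports the suggestion above to mark JSON parts by some means other than markdown.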