
[Go] JSON Parsing Failure in ParseMessage with Valid Markdown Code Block Content #3657

@stringl1l1l1l

Description

Describe the bug
The ParseMessage function in genkit/go/ai/format_json.go fails to parse JSON content when the message text contains markdown code blocks in other languages (such as ```yaml or ```bash). The function calls base.ExtractJSONFromMarkdown(), whose regex matches any fenced code block rather than only ```json blocks, so YAML or other non-JSON content is handed to the JSON parser.

Full error message:

error: model failed to generate output matching expected schema: data is not valid JSON: invalid character 'y' looking for beginning of value

To Reproduce
Steps to reproduce the behavior:

  1. Create a Message with Content containing markdown code blocks (e.g., ```yaml)
  2. Call ParseMessage on a jsonHandler instance
  3. The function calls base.ExtractJSONFromMarkdown() which incorrectly extracts YAML content due to the faulty regex
  4. The YAML content gets passed to JSON validation, causing the error

Code sample that triggers the bug:

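// A Go raw string literal cannot contain backticks, so the text is an
// interpreted string; "\\n" keeps a literal \n escape inside the JSON value.
message := &Message{
    Content: []*Part{
        {Text: "{\"status\": \"ok\", \"config\": \"```yaml\\nkey: value\\n```\"}"},
    },
}

handler := &jsonHandler{...}
_, err := handler.ParseMessage(message) // fails with "invalid character 'y' ..."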

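The issue occurs because the message contains a ```yaml code block. The original regex ```(json)?((\n|.)*?)``` matches that block and captures everything after the opening fence, including the yaml language tag, so the string "yaml\nkey: value\n" is passed to the JSON parser and fails on its first character, 'y'. Note that the message text as a whole is valid JSON; the fence only appears inside a string value.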

Expected behavior
When a message contains non-JSON markdown code blocks, ParseMessage should not treat them as JSON. In the example above, where the whole message text is valid JSON, it should parse successfully.

Root Cause Analysis
The issue is in the ExtractJSONFromMarkdown function in go/internal/base/json.go. The problematic regex pattern:

var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")

The (json)? group is optional, so the regex matches ANY code block (```yaml, ```bash, ```python, etc.), not just ```json blocks. When it encounters a ```yaml block, (json)? matches the empty string and capture group 2 receives everything up to the closing fence, including the language tag. Parsing that as JSON produces "invalid character 'y' looking for beginning of value", where 'y' is the first letter of the yaml tag rather than of the YAML content itself.

Runtime:

  • OS: macOS (Darwin Kernel Version 24.6.0)
  • Version: macOS Sequoia

Go version

go version go1.25.0 darwin/arm64

Files Affected

  • go/internal/base/json.go (lines 122-132) - Contains the buggy ExtractJSONFromMarkdown function
  • go/ai/format_json.go (line 94) - Calls ExtractJSONFromMarkdown in ParseMessage
  • go/ai/generate.go (line 742) - Calls ExtractJSONFromMarkdown in ModelResponse.Output
  • go/internal/base/json.go (line 137) - Internal usage in GetJsonObjectLines function

Applied Fix
I tried a simple fix, which I have not verified:

-var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")
+var jsonMarkdownRegex = regexp.MustCompile("```json((?s:.*?))?```")

 func ExtractJSONFromMarkdown(md string) string {
 	// TODO: improve this
 	matches := jsonMarkdownRegex.FindStringSubmatch(md)
 	if matches == nil {
 		return md
 	}
-	return matches[2]
+	return matches[1]
 }

This ensures that only ```json code blocks are matched, so YAML and other non-JSON blocks are no longer parsed as JSON. However, it does not handle a fence with a space before the language tag (``` json). I think it would be better to mark PartJson some other way instead of relying on markdown fences.

Metadata

  • Labels: bug (Something isn't working), go
  • Type: No type
  • Projects: Status Done
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
