Enforce UTF-8 validity for RFC 8259 string parsing by ionux · Pull Request #22 · ionux/ok_json

ionux · 2026-03-11T02:25:48Z

Motivation

Ensure JSON string handling conforms to RFC 8259 §8.1 by rejecting ill-formed UTF-8 bytes in string contents.

Description

Added okj_is_utf8_continuation() and okj_validate_utf8_sequence() to validate single UTF-8 scalar sequences (1–4 bytes) and detect invalid forms such as overlong encodings, surrogate-range encodings, invalid continuation bytes, truncated sequences, and out-of-range 4-byte forms.
Integrated UTF-8 validation into the string-parsing loop in src/ok_json.c so non-escaped non-ASCII bytes are checked and the parser advances by the validated byte length instead of assuming single-byte characters.
On UTF-8 validation failure the parser now returns OKJ_ERROR_BAD_STRING for the offending string token.
Added unit tests in test/ok_json_tests.c: test_utf8_valid_multibyte, test_utf8_invalid_overlong, test_utf8_invalid_surrogate, and test_utf8_invalid_truncated to cover valid multibyte text and representative invalid UTF-8 cases.

Testing

Built and ran the full test suite with make; the test runner executed and all tests passed including the new UTF-8 tests.
Test files exercised: test/ok_json_tests.c (new UTF-8 cases) and src/ok_json.c (parser changes) and reported All OK_JSON tests passed! on the test runner output.

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7745e260b1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-11T02:29:11Z

+        return 0U;
+    }
+
+    b2 = (uint8_t)src[pos + 2U];


Prevent UTF-8 validation from reading past buffer end

okj_validate_utf8_sequence() reads src[pos + 2U] (and later src[pos + 3U]) without first verifying those bytes exist, so a malformed input that ends mid-sequence (for example, an unterminated string whose last byte is 0xE0) can trigger an out-of-bounds read before the parser reports an error. In this case the parser enters the string loop because src[pos] != '\0', then dereferences beyond the terminator, which is undefined behavior and can crash under sanitizers.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-11T02:29:11Z

+                    break;
+                }
+
+                parser->position += utf8_advance;


Enforce max string length after UTF-8 byte advance

The max-length check is performed before consuming the next character, but parser->position now advances by the full UTF-8 sequence length, so a final multibyte character can push the token past OKJ_MAX_STRING_LEN and still be accepted if the next byte is the closing quote. For example, a 63-byte ASCII prefix plus a 2-byte UTF-8 character parses successfully with length 65, which bypasses the configured hard limit.

Useful? React with 👍 / 👎.

Enforce UTF-8 validity in JSON strings

7745e26

ionux added the codex label Mar 11, 2026 — with ChatGPT Codex Connector

ionux merged commit b019cef into main Mar 11, 2026
2 checks passed

ionux deleted the codex/analyze-for-rfc-8259-compliance branch March 11, 2026 02:27

chatgpt-codex-connector Bot reviewed Mar 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enforce UTF-8 validity for RFC 8259 string parsing#22

Enforce UTF-8 validity for RFC 8259 string parsing#22
ionux merged 1 commit intomainfrom
codex/analyze-for-rfc-8259-compliance

ionux commented Mar 11, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Mar 11, 2026

Uh oh!

chatgpt-codex-connector Bot Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ionux commented Mar 11, 2026

Motivation

Description

Testing

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant