functions: save FieldType as value instead of ptr in json function#10846

Open
yongman wants to merge 1 commit into pingcap:master from yongman:ym/fix-json-function

Conversation

@yongman
Member

@yongman yongman commented May 15, 2026

What problem does this PR solve?

Issue Number: close #10845

Problem Summary:

When TiFlash nextgen evaluates JSON_EXTRACT on a TEXT column with IS NULL / IS NOT NULL filters, the result can be inconsistent with JSON columns.

For example, JSON_EXTRACT(action_params, '$.popup_id') IS NULL may return rows whose extracted value is actually non-null, while IS NOT NULL returns no rows.

The root cause is that the disaggregated columnar path builds temporary FilterConditions, and JSON cast functions keep raw pointers to tipb::FieldType. After the temporary object is destroyed, those pointers can become dangling, so FunctionCastStringAsJson may read invalid FieldType metadata.
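The lifetime problem described above can be sketched in isolation. This is a minimal illustration, not TiFlash code: `FieldTypeStub`, `TempFilterConditions`, `CastByPointer`, `CastByValue`, and `buildFromTemporary` are hypothetical names standing in for `tipb::FieldType`, the temporary filter conditions, and a JSON cast function before and after the change.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-in for tipb::FieldType (the real type is a protobuf message).
struct FieldTypeStub
{
    int tp = 0;
    int flen = -1;
};

// Hypothetical stand-in for a short-lived object that owns FieldType metadata.
struct TempFilterConditions
{
    FieldTypeStub field_type;
};

// Pointer-holding shape: only remembers where the caller's metadata lives.
struct CastByPointer
{
    const FieldTypeStub * input_tp = nullptr;
    void setInput(const FieldTypeStub & tp) { input_tp = &tp; }
};

// Value-holding shape: keeps its own copy of the metadata.
struct CastByValue
{
    std::optional<FieldTypeStub> input_tp;
    void setInput(const FieldTypeStub & tp) { input_tp = tp; }
    bool hasInput() const { return input_tp.has_value(); }
};

// Configures both shapes from a short-lived owner, destroys the owner,
// and returns the value-holding one, which is still safe to read.
inline CastByValue buildFromTemporary(int tp, int flen)
{
    auto conditions = std::make_unique<TempFilterConditions>();
    conditions->field_type = FieldTypeStub{tp, flen};

    CastByPointer by_ptr;
    by_ptr.setInput(conditions->field_type);
    CastByValue by_value;
    by_value.setInput(conditions->field_type);
    assert(by_value.hasInput());

    // The owner goes away here. by_ptr.input_tp now dangles; dereferencing it
    // would be undefined behavior, so this sketch never reads it again.
    conditions.reset();
    return by_value;
}
```

Under these assumptions, the copy returned by `buildFromTemporary` still carries the original `tp` and `flen` after the owner is gone, whereas any read through `CastByPointer::input_tp` would be undefined.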

What is changed and how it works?

functions: save FieldType as value instead of ptr in json function

Store TiDB FieldType metadata by value in JSON cast functions instead of keeping raw pointers to caller-owned FieldType objects.

Use std::optional<tipb::FieldType> for optional FieldType metadata and update the missing-metadata checks accordingly in:
- FunctionCastJsonAsString
- FunctionCastIntAsJson
- FunctionCastStringAsJson
- FunctionCastTimeAsJson

This avoids dangling FieldType references when JSON cast functions are created from temporary filter conditions, and keeps TEXT-to-JSON cast behavior stable for pushed-down JSON_EXTRACT filters.
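The updated missing-metadata checks can be pictured with a small sketch. It assumes a hypothetical `FieldTypeStub` that models only `flen()`, and a hypothetical `CastToStringSketch` class; the truncation rule is purely illustrative and is not taken from the real `FunctionCastJsonAsString`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for tipb::FieldType; only flen() is modeled.
struct FieldTypeStub
{
    int flen_value = -1;
    int flen() const { return flen_value; }
};

// Hypothetical cast-like class, not the real FunctionCastJsonAsString.
class CastToStringSketch
{
public:
    // Stores a copy, so the caller's object may be destroyed afterwards.
    void setOutputTiDBFieldType(const FieldTypeStub & tp) { tidb_tp = tp; }

    // Illustrative rule: truncate to flen bytes only when metadata is present
    // and flen is non-negative. A pointer-based version of the first check
    // would compare against nullptr instead of calling has_value().
    std::string apply(const std::string & text) const
    {
        if (!tidb_tp.has_value() || tidb_tp->flen() < 0)
            return text;
        return text.substr(0, static_cast<std::string::size_type>(tidb_tp->flen()));
    }

private:
    std::optional<FieldTypeStub> tidb_tp;
};

// Small usage demo: behavior with and without metadata.
inline void demoCastToString()
{
    CastToStringSketch cast;
    assert(cast.apply("abcdef") == "abcdef"); // no metadata set yet
    cast.setOutputTiDBFieldType(FieldTypeStub{3});
    assert(cast.apply("abcdef") == "abc");
}
```

An empty `std::optional` plays the role that `nullptr` played before, so the "metadata not provided" case stays distinguishable without tying the function to the caller's object lifetime.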

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test:

Use the SQL in #10845 to create event_log1 with action_params TEXT and event_log2 with action_params JSON, then run with:

SET SESSION tidb_isolation_read_engines='tiflash';

Verify that both TEXT and JSON columns return consistent results:

WHERE JSON_EXTRACT(action_params, '$.popup_id') IS NULL
-- returns 0 rows

WHERE JSON_EXTRACT(action_params, '$.popup_id') IS NOT NULL
-- returns 5 rows

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: yongman <yming0221@gmail.com>
@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed labels May 15, 2026
@pantheon-ai

pantheon-ai Bot commented May 15, 2026

@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 15, 2026
@coderabbitai

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

Refactored JSON casting functions to use std::optional<tipb::FieldType> instead of raw pointers for storing TiDB field type configuration. Updated FunctionCastJsonAsString, FunctionCastIntAsJson, FunctionCastStringAsJson, and FunctionCastTimeAsJson with optional member storage, revised setters, and condition checks using has_value() in place of nullptr comparisons.

Changes

JSON FieldType Optional Refactoring

Layer / File(s) and Summary:

  • Header include and FunctionCastJsonAsString refactoring (dbms/src/Functions/FunctionsJson.h): Added <optional> header. Refactored FunctionCastJsonAsString to store tidb_tp as std::optional<tipb::FieldType>. Updated setOutputTiDBFieldType to assign into the optional, and changed execution condition from pointer check to tidb_tp.has_value() before accessing flen().
  • FunctionCastIntAsJson refactoring (dbms/src/Functions/FunctionsJson.h): Refactored FunctionCastIntAsJson to store input_tidb_tp as std::optional<tipb::FieldType>. Updated setInputTiDBFieldType to assign into the optional, and changed execution condition from input_tidb_tp == nullptr to !input_tidb_tp.has_value().
  • FunctionCastStringAsJson refactoring (dbms/src/Functions/FunctionsJson.h): Refactored FunctionCastStringAsJson to store both input_tidb_tp and output_tidb_tp as std::optional<tipb::FieldType>. Updated both setters to assign into optionals. Changed execution conditions from pointer checks to has_value() checks; output_tidb_tp is safely dereferenced only when present before calling hasParseToJSONFlag().
  • FunctionCastTimeAsJson refactoring (dbms/src/Functions/FunctionsJson.h): Refactored FunctionCastTimeAsJson to store input_tidb_tp as std::optional<tipb::FieldType>. Updated setInputTiDBFieldType to assign into the optional, and changed timestamp detection logic from input_tidb_tp == nullptr check to !input_tidb_tp.has_value().

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A whisker-twitch of types so neat,
From pointers raw to optional sweet,
Four functions dance in perfect sync,
With has_value() checks that link,
Safe JSON casting, tip-to-toe!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Linked Issues check (❓ Inconclusive): The PR references issue #10845 about TEXT to JSON conversion inconsistency, and the code changes refactor JSON function member storage, but the linked issue lacks sufficient detail to validate full requirement alignment. Resolution: Verify that the optional-based refactoring directly resolves the TEXT-to-JSON query inconsistency reported in issue #10845 by clarifying the bug mechanism and how the pointer-to-optional change fixes it.

✅ Passed checks (3 passed)

  • Title check (✅ Passed): The title accurately describes the main change: refactoring JSON functions to use optional values instead of pointers for FieldType storage.
  • Out of Scope Changes check (✅ Passed): The changes are focused solely on refactoring FieldType storage in JSON functions from pointers to optional values, which is directly related to the referenced issue scope.
  • Description check (✅ Passed): The PR description covers the problem statement, solution with commit message, manual test steps, and test checklist, following the template structure.


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
dbms/src/Functions/FunctionsJson.h (1)

439-439: 💤 Low value

Refactoring from pointer to value changes ownership semantics.

The change from const tipb::FieldType* to std::optional<tipb::FieldType> is semantically significant: the function now owns a copy of the FieldType rather than holding a reference to external data. This eliminates potential lifetime issues (dangling pointers), which likely addresses the consistency bug mentioned in issue #10845.

The setter copies tipb::FieldType on each call. If tipb::FieldType (a protobuf message) is large, consider adding a move-enabled overload:

void setOutputTiDBFieldType(tipb::FieldType tidb_tp_) { tidb_tp = std::move(tidb_tp_); }

However, the current implementation is correct, and the copy overhead may be acceptable.

Also applies to: 467-467, 530-530
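The copy-versus-move trade-off in this nitpick can be checked with an instrumented stand-in. The sketch below is an assumption-laden illustration: `CountingFieldType` and `SinkSetterSketch` are hypothetical, and the counter only models copy cost, not the actual size of a `tipb::FieldType` message.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for tipb::FieldType that counts copy operations.
struct CountingFieldType
{
    static inline int copies = 0;

    CountingFieldType() = default;
    CountingFieldType(const CountingFieldType &) { ++copies; }
    CountingFieldType(CountingFieldType &&) noexcept {}
    CountingFieldType & operator=(const CountingFieldType &)
    {
        ++copies;
        return *this;
    }
    CountingFieldType & operator=(CountingFieldType &&) noexcept { return *this; }
};

// Hypothetical class using the by-value "sink" setter from the suggestion.
class SinkSetterSketch
{
public:
    // An lvalue argument is copied once into the parameter; an rvalue is
    // moved (or constructed in place). The parameter is then moved into
    // the optional, so the setter body itself never copies.
    void setOutputTiDBFieldType(CountingFieldType tidb_tp_) { tidb_tp = std::move(tidb_tp_); }

    bool hasType() const { return tidb_tp.has_value(); }

private:
    std::optional<CountingFieldType> tidb_tp;
};

// Returns how many copies happen for one lvalue call plus two rvalue calls.
inline int countCopiesForMixedCalls()
{
    CountingFieldType::copies = 0;
    SinkSetterSketch cast;
    CountingFieldType named;
    cast.setOutputTiDBFieldType(named);               // one copy
    cast.setOutputTiDBFieldType(CountingFieldType{}); // no copy
    cast.setOutputTiDBFieldType(std::move(named));    // no copy
    assert(cast.hasType());
    return CountingFieldType::copies;
}
```

With this shape, callers that pass a named object still pay one copy, which matches a `const &` setter, while callers that pass a temporary or `std::move` their object pay none.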

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Functions/FunctionsJson.h` at line 439, The setter currently copies
a potentially large protobuf (setOutputTiDBFieldType) which can be expensive;
add a move-enabled overload that takes tipb::FieldType by value (or an rvalue
ref) and moves it into the std::optional member (tidb_tp) to avoid the extra
copy, and apply the same change to the other setters flagged in this file (the
other setOutputTiDBFieldType occurrences referenced in the comment).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: aa141322-ab1a-4623-ab1a-810339dd1046

📥 Commits

Reviewing files that changed from the base of the PR and between ed4e382 and e74426d.

📒 Files selected for processing (1)
  • dbms/src/Functions/FunctionsJson.h

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

@yongman: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • Test name: pull-sanitizer-asan; Commit: e74426d; Details: link; Required: false; Rerun command: /test pull-sanitizer-asan

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 15, 2026
Contributor

@JaySon-Huang JaySon-Huang left a comment

Verified the fix in the tiflash-cse columnar branch.

LGTM

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 15, 2026
@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-15 05:48:16.988262699 +0000 UTC m=+417465.521042028: ☑️ agreed by JaySon-Huang.

@JaySon-Huang
Contributor

/cc @windtalker @gengliqi

@ti-chi-bot ti-chi-bot Bot requested review from gengliqi and windtalker May 15, 2026 05:54

Labels

approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

text field convert to json result data query inconsistency

2 participants