Use simdjson for json_parse #7658

Closed
wants to merge 1 commit into from

Conversation

Yuhta
Contributor

@Yuhta Yuhta commented Nov 20, 2023

Summary: In json_parse, invalid input makes us throw an exception, which is slow (both creating it and throwing it). To avoid creating or throwing the exception, we switch the implementation to simdjson and set a pre-canned exception when the input is invalid. This reduces the CPU time of some queries (those with a JSON validity check in the filter) by more than 20 times (from 2.34 days to 2.68 hours).
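
For readers unfamiliar with the "pre-canned exception" idea, here is a minimal sketch using only the standard library. It is not the code in this PR: `looksLikeJson`, `invalidJsonError`, and `validateRows` are hypothetical names, and the toy delimiter check merely stands in for a real validator such as simdjson. The point it illustrates is that one `std::exception_ptr` is built once and then shared by every failing row, so the hot path neither constructs nor throws anything.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a real validator: it only checks that the input
// is wrapped in matching object or array delimiters.
inline bool looksLikeJson(std::string_view s) {
  if (s.size() < 2) {
    return false;
  }
  return (s.front() == '{' && s.back() == '}') ||
         (s.front() == '[' && s.back() == ']');
}

// Built once on first use and shared afterwards. Copying a
// std::exception_ptr copies a handle to the same exception object.
inline const std::exception_ptr& invalidJsonError() {
  static const std::exception_ptr error =
      std::make_exception_ptr(std::invalid_argument("Not valid JSON"));
  return error;
}

// Returns one slot per row: null for rows that pass the check, the shared
// error for rows that fail it. Nothing is thrown inside the loop.
inline std::vector<std::exception_ptr> validateRows(
    const std::vector<std::string>& rows) {
  std::vector<std::exception_ptr> errors(rows.size());
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!looksLikeJson(rows[i])) {
      errors[i] = invalidJsonError();
    }
  }
  return errors;
}
```

A caller that only needs a validity flag (for example a filter) can test each slot against `nullptr`; a caller that must surface the error can pass the slot to `std::rethrow_exception` once, at the point where the failure actually has to be reported.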

Differential Revision: D51469435

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 20, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D51469435

Contributor

@mbasmanova mbasmanova left a comment

This is clever. Thanks.

@@ -114,6 +139,9 @@ class JsonParseFunction : public exec::VectorFunction {
.argumentType("varchar")
.build()};
}

private:
mutable simdjson::dom::parser parser_;
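
A note on the `mutable` member above: presumably it is needed because the function's entry point is a `const` method, while a reusable parser changes its internal buffers on every call. The sketch below shows that pattern with the standard library only. `ScratchParser` and `JsonCheckFunction` are hypothetical stand-ins, not Velox or simdjson types, and the bracket check is a toy that does not understand string literals.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a reusable parser: it owns scratch storage that
// is overwritten on every call, so checking an input mutates the object.
class ScratchParser {
 public:
  // Reports whether '{' '}' and '[' ']' are properly nested in the input.
  bool balanced(std::string_view input) {
    stack_.clear();  // Keeps the capacity allocated by earlier calls.
    for (char c : input) {
      if (c == '{' || c == '[') {
        stack_.push_back(c);
      } else if (c == '}' || c == ']') {
        const char open = (c == '}') ? '{' : '[';
        if (stack_.empty() || stack_.back() != open) {
          return false;
        }
        stack_.pop_back();
      }
    }
    return stack_.empty();
  }

 private:
  std::string stack_;
};

// Mirrors the shape of a function object whose entry point is const.
class JsonCheckFunction {
 public:
  bool apply(std::string_view input) const {
    // Compiles only because parser_ is declared mutable.
    return parser_.balanced(input);
  }

 private:
  mutable ScratchParser parser_;
};
```

The trade-off is that a `const` method with mutable scratch state is not safe to call concurrently on the same instance, so a design like this assumes each instance is used by one thread at a time.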
Contributor

@PHILO-HE PHILO-HE Nov 22, 2023

Hi @Yuhta, I note that simdjson's ondemand parser performs better but has some limitations, e.g., it does not validate the full input. Is the dom parser used here intentionally? Ref. link.

Contributor Author

Yes, I tried the ondemand parser first and found that it fails to reject some malformed JSON.

Reviewed By: mbasmanova

@facebook-github-bot
Contributor

This pull request has been merged in 0460044.


Conbench analyzed the 1 benchmark run on commit 04600443.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
