feat: split_with_parser, split_with_scanner #6

Merged
merged 5 commits into from Apr 19, 2022


Contributor

@SKalt SKalt commented Apr 14, 2022

I'm noticing some weirdness in the split_with_scanner example. Malformed tokens seem to be getting dropped:

assert_eq!(
    pg_query::split_with_scanner("select 1; asdf; select 3;").unwrap(),
    vec!["select 1", "asdf", "select 3"],
); // fails: actually produces ["select 1", "select 3"]

Is this expected?

Resolves #5.

Member

lfittl commented Apr 14, 2022

> I'm noticing some weirdness in the split_with_scanner example. Malformed tokens seem to be getting dropped:
>
> assert_eq!(
>     pg_query::split_with_scanner("select 1; asdf; select 3;").unwrap(),
>     vec!["select 1", "asdf", "select 3"],
> ); // fails: actually produces ["select 1", "select 3"]
>
> Is this expected?

Yup, that's expected: if the statement only contains invalid tokens, we skip it here:

https://github.com/pganalyze/libpg_query/blob/13-latest/src/pg_query_split.c#L113

We could change that behavior; it's not something that anyone relies on today, to my knowledge. Would it be helpful for your use case?
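
To make the observed output concrete, here is a toy model of the skipping behavior in Rust. It is only a sketch and not the libpg_query implementation: `split_and_skip` and `starts_with_keyword` are hypothetical names, the splitter ignores quoting and comments, and the "recognized" check is an assumed stand-in for whatever the scanner treats as a valid statement.

```rust
/// Toy stand-in for the skipping behavior; NOT the libpg_query implementation.
/// Splits on every `;` (ignoring quoting and comments) and keeps a segment
/// only if the caller-supplied `is_recognized` predicate accepts it.
fn split_and_skip<'a>(sql: &'a str, is_recognized: impl Fn(&str) -> bool) -> Vec<&'a str> {
    sql.split(';')
        .map(str::trim)
        .filter(|&segment| !segment.is_empty() && is_recognized(segment))
        .collect()
}

/// Hypothetical predicate: a segment counts as recognized when its first word
/// is one of a few SQL keywords. The real scanner's rule may differ.
fn starts_with_keyword(segment: &str) -> bool {
    const KEYWORDS: [&str; 4] = ["select", "insert", "update", "delete"];
    segment
        .split_whitespace()
        .next()
        .map_or(false, |word| KEYWORDS.iter().any(|k| k.eq_ignore_ascii_case(word)))
}
```

Under these assumptions, `split_and_skip("select 1; asdf; select 3;", starts_with_keyword)` keeps the two `select` segments and silently drops `asdf`, which matches the result reported above.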

Contributor Author

SKalt commented Apr 15, 2022

Cool, I'll update my comment on the function.
I don't depend on that function, but I did have a use-case for split_with_scanner: I wanted to split the postgres regression tests into individual statements to build a SQL test corpus (skalt/sql_parser_tests). The problem was using split_with_scanner ended up mixing in psql meta-commands, so I bailed and wrote a nom parser that did what I wanted (skalt/pqsl_splitter).
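
For illustration, one simple way to keep psql meta-commands out of the splitter's input is to drop backslash-led lines first. This is a minimal sketch and not the nom parser mentioned above: `strip_meta_command_lines` is a hypothetical helper, and it assumes meta-commands sit on their own lines.

```rust
/// Drops every line whose first non-whitespace character is a backslash,
/// which is how psql meta-commands such as `\d` or `\set` begin.
/// Simplifications: a backslash command that follows SQL on the same line
/// is left untouched, and quoting is not tracked, so a line inside a
/// multi-line string literal that starts with a backslash is also dropped.
fn strip_meta_command_lines(script: &str) -> String {
    script
        .lines()
        .filter(|line| !line.trim_start().starts_with('\\'))
        .collect::<Vec<_>>()
        .join("\n")
}
```

The filtered text could then be handed to a statement splitter. A real regression-test corpus would likely need proper tokenizing, which is presumably why a dedicated parser was the better fit.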

Member

lfittl commented Apr 15, 2022

> Cool, I'll update my comment on the function. I don't depend on that function, but I did have a use-case for split_with_scanner: I wanted to split the postgres regression tests into individual statements to build a SQL test corpus (skalt/sql_parser_tests). The problem was using split_with_scanner ended up mixing in psql meta-commands, so I bailed and wrote a nom parser that did what I wanted (skalt/pqsl_splitter).

Great, and neat idea!

I'll leave the Rust code review to @seanlinsley, since he's been maintaining most of the library, but looks good from a quick glance :)

#[error("Error scanning: {0}")]
Scan(String),
#[error("Error splitting: {0}")]
Split(String),
Member


@lfittl are there cases where these functions can raise unique error messages, or should we be using the existing Parse error type instead?

Member


Good question. There are no unique Split errors (at least not in the C library), so we could rely on Parse error for split_with_parser.

For split_with_scanner it seems reasonable to have its own error class, since we don't actually call the parser there, so it'd be incorrect to treat errors as "parse errors".

For reference, here is the rather simple code that does the splitting in C: https://github.com/pganalyze/libpg_query/blob/13-latest/src/pg_query_split.c
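
A rough sketch of the split suggested here: parser-based splitting reuses a parse error, while scanner-based splitting gets its own variant. `SketchError` is a hypothetical type written for this example, the "Error parsing" message text is an assumption, and `Display` is implemented by hand so the snippet depends only on the standard library rather than on the `#[error(...)]` attributes shown above.

```rust
use std::fmt;

/// Hypothetical error type mirroring the variants discussed in this thread.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// The parser rejected the input; shared by parsing and by
    /// parser-based splitting, which has no unique errors of its own.
    Parse(String),
    /// The scanner rejected the input; used by scanner-based splitting,
    /// which never calls the parser.
    Scan(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message text assumed for illustration.
            SketchError::Parse(msg) => write!(f, "Error parsing: {}", msg),
            SketchError::Scan(msg) => write!(f, "Error scanning: {}", msg),
        }
    }
}

impl std::error::Error for SketchError {}
```

With this shape a separate `Split` variant would be unnecessary, since nothing in the splitting path produces a message that is not already a parse or scan error.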

Member

@seanlinsley seanlinsley left a comment


👍 in general, just not sure if we need the extra error types

Contributor Author

SKalt commented Apr 16, 2022

First-time contributors need a maintainer to approve running workflows

@seanlinsley would you be willing to approve workflows on this PR so we're all looking at the same unit test output?

@seanlinsley
Member

Thanks for submitting this PR @SKalt!

@seanlinsley seanlinsley merged commit 15376c6 into pganalyze:main Apr 19, 2022
Successfully merging this pull request may close these issues.

feature request: support pg_query_scan, pg_query_split, pg_query_split_with_scanner