Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More bugs when binding insert with unaligned source schema #9036

Closed
y-wei opened this issue Apr 6, 2023 · 8 comments
Closed

More bugs when binding insert with unaligned source schema #9036

y-wei opened this issue Apr 6, 2023 · 8 comments
Assignees
Labels
component/frontend Protocol, parsing, binder. needs-discussion type/bug Something isn't working
Milestone

Comments

@y-wei
Copy link
Contributor

y-wei commented Apr 6, 2023

Describe the bug

We may need to better deal with unaligned source and insert schema.

To Reproduce

dev=> create table tn (v1 int, v2 int);
CREATE_TABLE
dev=> create table ipt (v1 int);
CREATE_TABLE
dev=> insert into tn select * from ipt;
ERROR:  QueryError: Bind error: INSERT has more target columns than expressions
dev=> insert into tn (v1) select * from ipt;
ERROR:  QueryError: Bind error: INSERT has more target columns than expressions

Expected behavior

The two insert should be executed per Postgres.

Additional context

No response

@y-wei y-wei added type/bug Something isn't working component/frontend Protocol, parsing, binder. labels Apr 6, 2023
@github-actions github-actions bot added this to the release-0.19 milestone Apr 6, 2023
@y-wei
Copy link
Contributor Author

y-wei commented Apr 6, 2023

I don't see a solution under current design and maybe some discussion is needed 🥲, together with #9012

@BugenZhao
Copy link
Member

Good point. Actually, I think it's okay that we only support a subset of the DML syntax from Postgres. For example, not supporting only providing the values for a prefix of columns. However, it makes no sense that the second statement cannot be executed. 😄

Would you mind issuing a fix for this?

@y-wei
Copy link
Contributor Author

y-wei commented Apr 7, 2023

Yes I'd love to, yet I haven't figured out a way.😭 Can we pad manual Exprimpls (in this case, nulls) to BoundQuery? If so, how?

@y-wei
Copy link
Contributor Author

y-wei commented Apr 12, 2023

I think this is a side effect of #8770. This PR actually forced the input schema to be aligned with the table schema (by align I mean the field at corresponding index can be casted), while it shouldn't, if the input is a query rather than values. Above is a failed case while I find one more interesting one:

dev=> create table t2 (v1 int, v2 int);
CREATE_TABLE
dev=> insert into t2 values (1, 2);
INSERT 0 1
dev=> insert into t2 (v1) select * from t2;
INSERT 0 1
dev=> select * from t2;
 v1 | v2 
----+----
  1 |  2
  1 |  2
(2 rows)

@y-wei
Copy link
Contributor Author

y-wei commented Apr 12, 2023

My idea is that

// columns that are in the target table but not in the provided target columns
if col_indices_to_insert.len() != cols_to_insert_in_table.len() {
for col in cols_to_insert_in_table {
if let Some(col_to_insert_idx) = col_name_to_idx.get(col.name()) {
if *col_to_insert_idx != usize::MAX {
col_indices_to_insert.push(*col_to_insert_idx);
}
} else {
unreachable!();
}
}
}
shall be removed and (parts of) the "padding nulls" logic in values shall be recovered. Do you have any concerns or points to add? @xxchan @broccoliSpicy

@y-wei
Copy link
Contributor Author

y-wei commented Apr 12, 2023

Actually I'd prefer postpone the logic of padding nulls to the executor and the binder just gives a list of column indices that need to filled with null (or maybe other non-null default value as I'm working on😇). Otherwise I don't see a way to solve the insert from query case. cc. @BugenZhao

@broccoliSpicy
Copy link
Contributor

broccoliSpicy commented Apr 12, 2023

I think this is a side effect of #8770. This PR actually forced the input schema to be aligned with the table schema (by align I mean the field at corresponding index can be casted), while it shouldn't, if the input is a query rather than values. Above is a failed case while I find one more interesting one:

dev=> create table t2 (v1 int, v2 int);
CREATE_TABLE
dev=> insert into t2 values (1, 2);
INSERT 0 1
dev=> insert into t2 (v1) select * from t2;
INSERT 0 1
dev=> select * from t2;
 v1 | v2 
----+----
  1 |  2
  1 |  2
(2 rows)

I don't see how schema alignment gets involved here...
but our current behavior for this query here is semantically wrong, I suppose?
it might be helpful if we have a comprehensive picture of postgres's insert semantic and a conclusion of to what degree risingwave conforms

@y-wei y-wei self-assigned this Apr 15, 2023
@y-wei y-wei modified the milestones: release-0.19, release-0.20 May 12, 2023
@y-wei
Copy link
Contributor Author

y-wei commented May 12, 2023

#9208

@y-wei y-wei closed this as completed May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/frontend Protocol, parsing, binder. needs-discussion type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants