Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner/spannertest): Support generated columns #4742

Merged
merged 17 commits into from Sep 20, 2021

Conversation

philwitty
Copy link
Contributor

@philwitty philwitty commented Sep 10, 2021

No description provided.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the Cloud Spanner API. label Sep 10, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Sep 10, 2021
@philwitty philwitty marked this pull request as ready for review Sep 10, 2021
@philwitty philwitty requested review from hengfengli, skuruppu and a team as code owners Sep 10, 2021
@hengfengli hengfengli requested a review from olavloite Sep 13, 2021
Copy link
Contributor

@olavloite olavloite left a comment

Thanks for this change, I think it is a good step in the right direction, but there are a couple of things that need some looking into:

  1. I get two test failures when I run the tests locally. One seems to simply be a leftover t.Fail() call in one of your tests. The other is in an existing test.
  2. I think it would be better to try to integrate the check which columns a generated column depends on with the existing expression evaluation. This reduces the chances of having differences between the two.

spanner/spannertest/README.md Show resolved Hide resolved
break
}
}
// We skip this column if any of its dependencies are null
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do that?

row: row,
}

// TODO: We would need to do a topoligical sort on dependencies to ensure we can
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// TODO: We would need to do a topoligical sort on dependencies to ensure we can
// TODO: We would need to do a topological sort on dependencies to ensure we can

@@ -634,6 +683,10 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
// TODO: what happens in this case?
return status.Newf(codes.Unimplemented, "can't add NOT NULL columns to non-empty tables yet")
}
if cd.Generated != nil {
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This check should be extended to check whether the table is empty or not. Adding a generated column to an empty tables should be safe in this case.

This PR should also add a check to dropColumn to prevent a column that is a dependent of a generated column from being dropped.

Copy link
Contributor Author

@philwitty philwitty Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

Copy link
Contributor Author

@philwitty philwitty Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checking the table is non empty, as for checking if its a dependency of a generated column do you still think its worth adding that as we talked about dropping all dependency calculations above?

Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that it was already checking it.

As for the dependency checking:

  1. I don't think you need any dependency checking here.
  2. We should preferably use the dependency checking in dropColumn to prevent columns that are used by a generated column from being dropped. But preferably that dependency checking can be done using the method that also does the evaluation, so that we don't have to duplicate that code. Would it for example be possible to use default values for the evaluation context when we only want to check the dependencies?

Copy link
Contributor Author

@philwitty philwitty Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that definitely sounds like the right approach, having a way to run the evaluator in with default values in some "complete" way (i.e. we'd have to check in an OR what is required to evaluate the RHS even if the LHS evaluates to true). It feels out of scope for me right now, given it will be a fair bit of extra work to get going well and I'm not overly confident with the evaluation code and typing. For now I'll just come up with the solution for adding columns as any other features should be able to be built on top without changes so doesn't feel wasteful

@@ -1090,3 +1147,40 @@ func parseAsDate(s string) (civil.Date, error) { return civil.ParseDate(s) }
func parseAsTimestamp(s string) (time.Time, error) {
return time.Parse("2006-01-02T15:04:05.999999999Z", s)
}

// getIDsForGeneratedExpression returns a list of column names the expression depends on
func getIDsForGeneratedExpression(e spansql.Expr) []spansql.ID {
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best way to do this from a maintenance and completeness perspective. Instead I think it might be better to edit the evalContext.evalExpr function to also return the id's that the expression needs to be evaluated. That would keep the logic for what is needed for a certain expression in one place, and reduce the chance of missing something, and also ensure that whenever we add support for new expressions, those are also automatically included in this array.
This implementation currently does not detect any dependencies on other columns if they are used in:

  1. Function calls
  2. Array expressions

Copy link
Contributor Author

@philwitty philwitty Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I really didn't like this approach, it felt like it was prone to error and future failure but couldn't figure out something smarter. I think your suggestion sounds solid I can check into that, So adding a 3rd return argument to evalExpr, potentially returning and error and the list of dependencies?

Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like that was my thought as well.

rows := slurp(t, iter)
// 2 * A * B = 2 * 3 * 3
if rows[0][1].(int64) != 18 {
t.Fatal("Generated value for C should have been 18")
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use got/want format, so something like Generated value for C mismatch\n Got: %v\nWant: %v

t.Fatal("Generated value for C should be nil")
}

t.Fail()
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs here.

if rows[0][1] != nil {
t.Fatal("Generated value for C should be nil")
}

Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for update and delete?

@@ -52,6 +52,14 @@ var functions = map[string]function{
return strings.HasPrefix(s, prefix), spansql.Type{Base: spansql.Bool}, nil
},
},
"LOWER": {
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change belongs in this PR.

Copy link
Contributor Author

@philwitty philwitty Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move it to a separate one, just smuggled it in here as I was testing with it

Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I would prefer to have it in a separate PR, unless it is actually used in one of the test cases (I don't think it is at this moment, right?)

@@ -361,6 +362,83 @@ func TestConcurrentReadInsert(t *testing.T) {
}
}

func TestGeneratedColumn(t *testing.T) {
Copy link
Contributor

@olavloite olavloite Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add tests for:

  1. Adding a generated column to an existing table.
  2. Altering a generated column.
  3. Dropping a column that is a dependant of a generated column.

@philwitty
Copy link
Contributor Author

philwitty commented Sep 13, 2021

Thanks for this change, I think it is a good step in the right direction, but there are a couple of things that need some looking into:

  1. I get two test failures when I run the tests locally. One seems to simply be a leftover t.Fail() call in one of your tests. The other is in an existing test.
  2. I think it would be better to try to integrate the check which columns a generated column depends on with the existing expression evaluation. This reduces the chances of having differences between the two.

Thanks for the quick review, I think that suggestion is great, I was hoping someone could offer a better suggestion!

@philwitty
Copy link
Contributor Author

philwitty commented Sep 14, 2021

@olavloite I did a rewrite dropping all the column dependency stuff, pretty happy with it now, I don't see why that can't be added on top later when we want to support dropping columns that may be dependencies or resolving co-dependent generated columns in a reliable way. It felt like a tricky problem to solve though, needing default values and ensuring complete execution etc, a good rewrite of the db_eval stuff.

@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
if col.Generated != nil {
res, err := ec.evalExpr(col.Generated)
if err != nil {
// We assume that if the expression ended up with a type error a
Copy link
Contributor

@olavloite olavloite Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the evaluation of the different expressions, and the problem that you are running into is that those evaluations do not take into account that some of the input parameters could be NULL. So I think that this is a reasonable workaround for this for now, but we should probably try to fix the evaluation of expressions that may return null values and then remove this.

Copy link
Contributor Author

@philwitty philwitty Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, its very much a best effort but dumb approach. Cheers

@google-cla
Copy link

google-cla bot commented Sep 14, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Sep 14, 2021
@olavloite
Copy link
Contributor

olavloite commented Sep 14, 2021

@googlebot I consent.

@google-cla google-cla bot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Sep 14, 2021
@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
Copy link
Contributor

@olavloite olavloite left a comment

LGTM

@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 14, 2021
@olavloite
Copy link
Contributor

olavloite commented Sep 15, 2021

@psytale The build error is caused by the fact that the implementation uses [Errors.Is](https://pkg.go.dev/errors#Is), which is only supported from Go 1.13 and further. The Spanner client library supports Go 1.11 and onwards, so we need to change that bit as well. There are two files in the repository (errors112.go and errors113.go) that are conditionally added to the build depending on the Go version. You could add the method that you need there.

Feel free to let me know if you want me to look into it for you.

philwitty added 2 commits Sep 15, 2021
* No longer returning code.InvalidArgument when calling db_eval.evalFunc
  but that seems consistent with the rest of the module, no tests failed
  so I assume its not behaviour relied upon, we could check for the
  error type instead
@philwitty
Copy link
Contributor Author

philwitty commented Sep 15, 2021

@psytale The build error is caused by the fact that the implementation uses [Errors.Is](https://pkg.go.dev/errors#Is), which is only supported from Go 1.13 and further. The Spanner client library supports Go 1.11 and onwards, so we need to change that bit as well. There are two files in the repository (errors112.go and errors113.go) that are conditionally added to the build depending on the Go version. You could add the method that you need there.

Feel free to let me know if you want me to look into it for you.

Think I got something going with it but feel free to push any changes if you want to do it in a different way, I'm not overly confident with go. I also fixed (and added a test case) for functions and null columns, I wasn't happy with removing the codes.InvalidArgument but felt it was okay, shout if you think of a better way to do it (or feel free to push commits)

@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 15, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 15, 2021
@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 15, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 15, 2021
@olavloite
Copy link
Contributor

olavloite commented Sep 15, 2021

@psytale I decided to remove the extra error wrapping and checking, and instead add NULL checks to the most used evaluations. That should ensure that generated columns that evaluate to NULL will (in most cases) work as intended. The cases where they don't are not really related to the generated columns implementation, but to the fact that those evaluations would fail anyways before this change. So if you were to for example do a SELECT a*b FROM Foo where some values of a or b are null, the select would fail.

@hengfengli Would you mind taking a quick look at this as well?

// generated columns with fresh data.
pk := r[:t.pkCols]
rowNum, found := t.rowForPK(pk)
// this should never fail as the row was just inserted
Copy link
Contributor

@hengfengli hengfengli Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this -> This

Copy link
Contributor

@hengfengli hengfengli Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the missing period at the end ... inserted.

row: row,
}

// TODO: We would need to do a topological sort on dependencies (i.e. what other columns the expression references)
Copy link
Contributor

@hengfengli hengfengli Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a convention to limit the line length of comments to 80 characters.

@@ -643,6 +677,9 @@ func (t *table) addColumn(cd spansql.ColumnDef, newTable bool) *status.Status {
Name: cd.Name,
Type: cd.Type,
NotNull: cd.NotNull,
// TODO: We should figure out what columns the Generator expression relies on and check validate it at this time
Copy link
Contributor

@hengfengli hengfengli Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please well format the comment.

t.Errorf("Read rows failed: %v", err)
}

// Great writer has nil because NumSongs is nil
Copy link
Contributor

@hengfengli hengfengli Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be Poor writer, right?

@hengfengli
Copy link
Contributor

hengfengli commented Sep 17, 2021

@olavloite I saw some of your comments in the tests are not resolved yet. Do you think they should be fixed before merging?

@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2021
@olavloite
Copy link
Contributor

olavloite commented Sep 20, 2021

@olavloite I saw some of your comments in the tests are not resolved yet. Do you think they should be fixed before merging?

I've added the missing error message and missing delete test.

My comment on tests for dropping and altering columns is void, as we have removed the check altogether and added it to the list of limitations in the README.

So as far as I'm concerned, this is now good to be merged once CI is green.

@hengfengli hengfengli merged commit 324d11d into googleapis:master Sep 20, 2021
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Cloud Spanner API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants