Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spanner]: Statements that start with a comment or statement hint are not recognized #6847

Closed
olavloite opened this issue Jul 31, 2021 · 10 comments · Fixed by #6947 or #7032
Closed

[Spanner]: Statements that start with a comment or statement hint are not recognized #6847

olavloite opened this issue Jul 31, 2021 · 10 comments · Fixed by #6947 or #7032
Assignees
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@olavloite
Copy link
Contributor

Statements that are specified as only text and that start with a comment or a statement hint is not recognized as one of the valid Spanner command types. This means that it is not possible to use for example the tagging feature in Linq when working with Spanner through Entity Framework Core.

Also mentioned in #5857

@olavloite olavloite self-assigned this Jul 31, 2021
@olavloite olavloite added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jul 31, 2021
olavloite added a commit that referenced this issue Jul 31, 2021
SpannerCommandTextBuilder will try to determine the type of command based on the first
word of the command text if no type is specified. This works well as long as the command
text does not contain any comments or statement hints at the beginning of the command text.
This change adds a simple parser that will remove any comments and statement hints before
checking the type of command. The command that is stripped of comments and hints is only
used for determining the type of command. The original command will be passed on to
Cloud Spanner.

Fixes #6847
@amanda-tarafa
Copy link
Contributor

There are two distinct use cases covered by this issue, hints and comments (for tags). But I think there's no need to add any code for supporting either.

You can use SpannerCommandTextBuilder.CreateDmlTextBuilder and SpannerCommandTextBuilder.CreateSelectTextBuilder which both receive a string only, which is only validated for it not being null or empty. I'm guessing that client side (EF Core or any other client) you have easier wayst than stripping comments for knowing whether you need to call one or the other method. The command will be created with whatever text you passed in, the type of the command will be forced according to me method you called, and if the command is not right, the backend will fail.

For general context, we have always tried to avoid parsing commands as much as possible, just the bare minimum that we need to determine the type of command. For the most part, we let the backend fail if the command is poorly written. And in general, I think it is best if we keep it that way.

@olavloite
Copy link
Contributor Author

There are two distinct use cases covered by this issue, hints and comments (for tags). But I think there's no need to add any code for supporting either.

No, I wouldn't see this as two different use cases, but two different underlying causes of the same problem: Some valid Spanner statements are currently being rejected by the client library as being invalid, even though the statements are valid. This is caused by the fact that the client library looks at the first word in the statement to determine the kind of statement, without taking into consideration that the first keyword might be preceded by other (non-empty) text. There are two types of text that might precede a statement:

  • Comments (all statements)
  • Hints (DML and query statements, but not DDL statements)

So the following valid Spanner statements are all rejected by the client library as being invalid if you use an untyped SpannerCommand:

-- This is a comment
SELECT 1
@{OPTIMIZER_VERSION=2}
SELECT * FROM Foo
/* Create the Foo table */
CREATE TABLE Foo (Id INT64) PRIMARY KEY (Id)

You can use SpannerCommandTextBuilder.CreateDmlTextBuilder and SpannerCommandTextBuilder.CreateSelectTextBuilder which both receive a string only, which is only validated for it not being null or empty.

In a hand-written client where you know what SQL is being used, that is certainly the easiest way of doing it. That is however not always doable when the SQL string is generated or coming from user input. It is also not workable if the command that is being used is created using the ADO.NET CreateDbCommand.

The command will be created with whatever text you passed in, the type of the command will be forced according to me method you called, and if the command is not right, the backend will fail.

That is true, and the intention here is also not to try to validate the SQL string itself, only to correctly determine the type of statement that is being executed in all cases. If the SQL string itself is invalid, it should as far as possible be left to the backend to say so.

Note though that the current implementation will also fail in the following scenario:

var cmd = connection.CreateDmlCommand(...);
cmd.CommandText = "/* Some comment */ INSERT INTO Foo (Id) VALUES (1)";

That is because of this line:

The above specific problem is obviously easily fixable by removing the .FromCommandText(..) in the setter, but illustrates the possible subtle problems that can occur because the SpannerCommandTextBuilder.FromCommandText(..) method does not always return the correct value.

For general context, we have always tried to avoid parsing commands as much as possible, just the bare minimum that we need to determine the type of command. For the most part, we let the backend fail if the command is poorly written. And in general, I think it is best if we keep it that way

I completely agree with that, but I think that the current implementation goes below the bare minimum of what is needed to correctly determine the type of command, especially in cases where the SQL string is generated or coming from user input, or when the DbCommand is dynamically created by a framework.

@amanda-tarafa
Copy link
Contributor

Some valid Spanner statements are currently being rejected by the client library as being invalid, even though the statements are valid.

No they are not, they are being rejected by a specific utility method which is not the only or the main way to build commands. This method is actually fairly well documented in what it accepts, whith expressions describing how the command string should be written.
All valid Spanner commands can be built using the other methods (save for the bug you found, which would need fixing of course).

For me, in the end is just a matter of balancing feature support vs. maintainability. Parsing and stripping comments doesn't make the cut for me tbh (I'd be more onboard with parsing hints, given that they are really part of the command and are fixed values, etc., but still I'd like to see more demand for it). For the use cases you mentioned:

  • Frameworks that generate commands know exactly what they are generating so can call one or the other method.
  • We have never seen demand for supporting the use case of a user input query with comments, and still, client code can be the one stripping comments to find the command type and using the correct command creation methods in that case, so it's not really unsuported, we just don't help with it.
  • For the tagging use case, which is basically, passing some command metadata in the comments, does Spanner support searching/grouping/identifying executed commands by comment content? This is mostly out of curiosity, but also, if Spanner does not support anything like that, then it really doesn't matter if you send the tags in comments, they won't show anywhere nor allow the user to find commands with given tags.

I wouldn't mind adding a method like CreateCommand(string command, SpannerCommandType type, string table = null) that wouldn't validate the command (except for it being null or empty), and would only require the table when the the type is one of the INSERT, UPDATE, ... if that makes things easier.

This is of course my opinion, @jskeet and @skuruppu might think differently?

@amanda-tarafa amanda-tarafa added the api: spanner Issues related to the Spanner API. label Aug 3, 2021
@olavloite
Copy link
Contributor Author

For me the core if this specific issue is not that there are possible workarounds for most use cases, sometimes with very little effort and sometimes with a lot more effort. For me the core of the issue is that the following piece of code will fail with a XYZ is not a valid Spanner command when called with any of the valid statements below:

var connection = new SpannerConnection(..);
var cmd = connection.CreateCommand();
cmd.CommandText = "..."
@{OPTIMIZER_VERSION=1} SELECT * FROM Foo;

-- Tagged query
SELECT * FROM Foo;

/* Create the foo table */
CREATE TABLE Foo (..);

# Insert a new row
INSERT INTO Foo (Id) VALUES (1);

The fact that you can work around it by specifically creating a Select, Dml or Ddl command is of course good, but in my opinion this is not something that should be documented as something that you have to work around to get working.

For the tagging use case, which is basically, passing some command metadata in the comments, does Spanner support searching/grouping/identifying executed commands by comment content?

Yes, but the related issue does not use that. That issue uses Linq tagging. The problem popped up when executing the following piece of code using EF Core:

        var singersOrderedByFullName = context.Singers
            .TagWith("Use hint: force_index FullName")
            .OrderBy(s => s.FullName)
            .AsAsyncEnumerable();

The tag will be added as a comment to the generated SQL string. To be clear: The generation in this case happens in the core EF Core library, and not in the Spanner provider. It is possible to override that, but we try to prevent overriding more than necessary, to keep the implementation as close to the default implementation as possible.

The tag (comment) is then picked up by an interceptor that adds a query hint to the statement. This was while adding a sample based on the generic EF Core documentation on how to use query hints: https://docs.microsoft.com/en-us/ef/core/logging-events-diagnostics/interceptors#example-command-interception-to-add-query-hints

@skuruppu
Copy link
Contributor

skuruppu commented Aug 4, 2021

I guess when you're using a driver/ORM, there are certain restrictions on what the SQL syntax can be. But I think in this case, query hints are a key part of the Spanner SQL syntax so it is expected that our users can prefix their queries with these hints. So we have to find ways to support these through all of our clients, drivers and ORMs.

For something like OPTIMIZER_VERSION, sure, we have client library support to set it in several other ways. But for something like USE_ADDITIONAL_PARALLELISM and LOCK_SCANNED_RANGES (and there are more hints coming), and table hints, if users can't set it in the query, they can't use that functionality. So I do think we need to allow users to set these, preferably without having to use workarounds. While these hints are not used broadly, they get used by customers who are trying to tune the performance of their queries, so they're likely to be the ones that raise support tickets if they can't use it.

As for comments, ideally we'd support them as well. But if it's not generally something that's support by ADO.NET then maybe users know that there is a limitation.

does Spanner support searching/grouping/identifying executed commands by comment content?

Not so much for identifying queries. The hints mostly change the execution behavior of queries.

@amanda-tarafa
Copy link
Contributor

Just to make this point very clear, with the library as is (except for a bug that's easily fixable) both hints and comments can be used, just not by building an untyped command. If we change @olavloite sample code (two comments up) to the following, then it will work. I agree @olavloite code is preferable, I'm not sure about the burden of, specially, having to parse and strip comments.

var connection = new SpannerConnection(..);
var cmdBuilder = SpannerCommandTextBuilder.CreateSelectTextBuilder(...); // Or the appropiate create method.
var cmd = new SpannerCommand(cmdBuilder, connection);

I guess when you're using a driver/ORM, there are certain restrictions on what the SQL syntax can be.

Yes, exactly. I wouldn't expect, or at least I wouldn't be surprised, if a DB driver does not allow me to include comments on the commands, including comments on commands for a driver to execute seems somehow futile? But again, if you really want to add comments, you can use anything but the untyped command.

Hints are a different thing, because they do affect the command execution outcome. Luckily, they are also so much easier to parse than comments, given that they are known values. I still have a question: Are hints at the beginning of the command only possible for queries? If the answer is yes, then it makes it even easier to support on the untyped command, as we only have to look for a { to know that the command is a query.

@olavloite , as per your comment:

The tag (comment) is then picked up by an interceptor that adds a query hint to the statement.

I guess then that the interceptor, which already needed to parse the comment to know which hint to add, can then strip the comment out as well. I don't think you need to override command generation or anything like that.
Then, there's a point in which your code receives the string command again (without comments as they were stripped by the interceptor, but with hints), and creates the SpannerCommand from that. If we add hint support to the untyped command (which I'm more onboard with) then you can continue to use untyped commands, but if we don't, do you, at that point, know whether you are creating a query command or a dml? Or would you have to parse the string to find out?

@olavloite
Copy link
Contributor Author

I'm not sure it's clear to everyone here that the code that is calling Connection.CreateCommand() is generally speaking not in the Spanner EF Core provider, but in the core Entity Framework code. And it's not one line of code, but many spread out through the framework. One random example (first one that turned up during a search on GitHub):

https://github.com/dotnet/efcore/blob/85aaf405c1ab1b63f12f054b26b85e7039475717/src/EFCore.Relational/Storage/RelationalCommand.cs#L701

EF Core will obviously use the standard ADO.NET functionality, which means calling the Connection.CreateCommand method and then setting the command text.

Changing my sample above in our own code to call Connection.CreateSelectCommand instead of Connection.CreateCommand is of course trivial. Looking up every place in the EF Core generic implementation for all places where Connection.CreateCommand is called, figuring out what type of command is needed and figuring out how we can override that piece of code, is not trivial. And it's something that we would have to do for every new version of Entity Framework in case some new pieces of code also call Connection.CreateCommand.

Furthermore, as mentioned above: We do not always know what type of command it is. The most obvious example is when using migrations. Migrations are a core piece of functionality in Entity Framework for updating a database from one version to another. Many updates can be done using standard code (for example adding a column), but sometimes users cannot or will not use these standard operations, and instead define the migration as a set of SQL statements: https://docs.microsoft.com/en-us/ef/core/managing-schemas/migrations/operations#using-migrationbuildersql

Documenting such a SQL script that is part of a migration with comments is not uncommon.

I guess when you're using a driver/ORM, there are certain restrictions on what the SQL syntax can be.

Yes, exactly. I wouldn't expect, or at least I wouldn't be surprised, if a DB driver does not allow me to include comments on the commands, including comments on commands for a driver to execute seems somehow futile? But again, if you really want to add comments, you can use anything but the untyped command.

Sorry, but I strongly disagree with that. I find it highly surprising if a DB driver would not allow me to include comments in statements (see the example for migrations above). The comments are also not futile for generated queries/statements in the application, as they are sometimes used to debug code, and when the SQL statement is generated by a framework it can be helpful to be able to include a comment in a query that is generated in a specific place in the application, so to be able to determine that that piece of code is actually executed, for example by picking it up in an interceptor.

I guess then that the interceptor, which already needed to parse the comment to know which hint to add, can then strip the comment out as well.

Nope, that was my first attempt, but that does not work either. The command interceptor will receive the command object (and not only the SQL string). That means that the command will be created by EF Core by calling Connection.CreateCommand() and then setting the command text. The latter will fail if the command starts with a statement hint or comment, and the command interceptor will never be triggered as the error happens before that.

Are hints at the beginning of the command only possible for queries?

Hints are allowed for both queries and DML statements, but not for DDL statements.

Note that this problem with comments and statement hints is a general problem for generic database drivers with Cloud Spanner, as Cloud Spanner uses different endpoints for different types of statements. The stripping of comments and statement hints is therefore already included in:

@amanda-tarafa
Copy link
Contributor

I'm not sure it's clear to everyone here that the code that is calling Connection.CreateCommand() is generally speaking not in the Spanner EF Core provider, but in the core Entity Framework code. And it's not one line of code, but many spread out through the framework. One random example (first one that turned up during a search on GitHub):

EF Core will obviously use the standard ADO.NET functionality, which means calling the Connection.CreateCommand method and then setting the command text.

I wasn't aware of EF Core relying so heavily on ADO.NET, no. I've assumed that you can build an EF Core provider for a system that does not have an ADO.NET provider without needing to override half of EF Core, and that it was the provider developer's choice to use an underlying ADO.NET provider or not. It does makes sense the EF Core makes things easier if you have an ADO.NET provider, but I didn't think they would be hard if you didn't. This, and your comment on the interceptor being executed after the command has been created and not just with the raw string does makes things different.

Mind you, I'm still very wary of parsing comments and I still think it is not a generally useful feature (and that no one has ever asked for before now), and I would very much prefer not to do it. But @olavloite latest comment does seem to shift the point of most complexity. @jskeet what do you think?

Note that this problem with comments and statement hints is a general problem for generic database drivers with Cloud Spanner, as Cloud Spanner uses different endpoints for different types of statements.

Yep, I've been thinking that as well, not that we can do anything about it unfortunately.

@skuruppu
Copy link
Contributor

Thanks @amanda-tarafa and @olavloite for the detailed discussion.

@jskeet it would be great to hear your thoughts as well on this, especially to enable us to move forward with supporting the hints asap.

@amanda-tarafa
Copy link
Contributor

I'm re-reviewing now, I might end up writing a short PR as an alternative proposal.

amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this issue Aug 11, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this issue Aug 12, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this issue Aug 12, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this issue Aug 24, 2021
Changes in Google.Cloud.Spanner.Data version 3.12.0:

- [Commit 7c6a6f1](googleapis@7c6a6f1): feat: add support for JSON data type in Spanner ([issue 6390](googleapis#6390))
- [Commit ac367e2](googleapis@ac367e2): feat: Regenerate all APIs to support self-signed JWTs
- [Commit 61938b6](googleapis@61938b6):
  - feat(Spanner): Support comments and statement hints in untyped commands.
  - Alternative to [issue 6848](googleapis#6848)
  - Closes [issue 6847](googleapis#6847)
- [Commit d26b04c](googleapis@d26b04c): fix: address review comments
- [Commit d2025be](googleapis@d2025be): fix: use logger from SpannerSettings
- [Commit b34f6f4](googleapis@b34f6f4): cleanup: fix comment + remove unnecessary import
- [Commit fc7a41b](googleapis@fc7a41b): test: remove connectionstring tests and add settings test
- [Commit 6016ef0](googleapis@6016ef0):
  - feat: support custom SpannerSettings in SessionPoolManager
  - Support setting custom SpannerSettings when creating a SessionPoolManager.
- [Commit 0ab6b8b](googleapis@0ab6b8b):
  - feat: allow adding an additional version header
  - Allows adding an additional version header to the connection string. This will be
  - added to the `x-goog-api-client` header that is used by the underlying Spanner client.
  - Only a fixed set of values may be set for the header (currently only 'efcore' is
  - allowed), and the property is not intended for public use.
- [Commit 250124f](googleapis@250124f):
  - fix: synchronize access to the underlying transaction for ambient transactions ([issue 6616](googleapis#6616))
  - * fix: synchronize access to the underlying transaction for ambient transactions
  - Synchronizes access to the underlying Spanner transaction to prevent
  - multiple transactions from being created if parallel commands are
  - executed on the ambient transaction.
  - * test: add integration test
  - * fix: use Lazy for initialization
  - * chore: address review comments
  - * fix: remove unnecessary read call

Packages in this release:
- Release Google.Cloud.Spanner.Admin.Database.V1 version 3.12.0
- Release Google.Cloud.Spanner.Admin.Instance.V1 version 3.12.0
- Release Google.Cloud.Spanner.Common.V1 version 3.12.0
- Release Google.Cloud.Spanner.Data version 3.12.0
- Release Google.Cloud.Spanner.V1 version 3.12.0
gcf-merge-on-green bot pushed a commit that referenced this issue Aug 24, 2021
Changes in Google.Cloud.Spanner.Data version 3.12.0:

- [Commit 7c6a6f1](7c6a6f1): feat: add support for JSON data type in Spanner ([issue 6390](#6390))
- [Commit ac367e2](ac367e2): feat: Regenerate all APIs to support self-signed JWTs
- [Commit 61938b6](61938b6):
  - feat(Spanner): Support comments and statement hints in untyped commands.
  - Alternative to [issue 6848](#6848)
  - Closes [issue 6847](#6847)
- [Commit d26b04c](d26b04c): fix: address review comments
- [Commit d2025be](d2025be): fix: use logger from SpannerSettings
- [Commit b34f6f4](b34f6f4): cleanup: fix comment + remove unnecessary import
- [Commit fc7a41b](fc7a41b): test: remove connectionstring tests and add settings test
- [Commit 6016ef0](6016ef0):
  - feat: support custom SpannerSettings in SessionPoolManager
  - Support setting custom SpannerSettings when creating a SessionPoolManager.
- [Commit 0ab6b8b](0ab6b8b):
  - feat: allow adding an additional version header
  - Allows adding an additional version header to the connection string. This will be
  - added to the `x-goog-api-client` header that is used by the underlying Spanner client.
  - Only a fixed set of values may be set for the header (currently only 'efcore' is
  - allowed), and the property is not intended for public use.
- [Commit 250124f](250124f):
  - fix: synchronize access to the underlying transaction for ambient transactions ([issue 6616](#6616))
  - * fix: synchronize access to the underlying transaction for ambient transactions
  - Synchronizes access to the underlying Spanner transaction to prevent
  - multiple transactions from being created if parallel commands are
  - executed on the ambient transaction.
  - * test: add integration test
  - * fix: use Lazy for initialization
  - * chore: address review comments
  - * fix: remove unnecessary read call

Packages in this release:
- Release Google.Cloud.Spanner.Admin.Database.V1 version 3.12.0
- Release Google.Cloud.Spanner.Admin.Instance.V1 version 3.12.0
- Release Google.Cloud.Spanner.Common.V1 version 3.12.0
- Release Google.Cloud.Spanner.Data version 3.12.0
- Release Google.Cloud.Spanner.V1 version 3.12.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
3 participants