Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): support comments and statement hints in untyped commands #6848

Closed
wants to merge 4 commits into from

Conversation

olavloite
Copy link
Contributor

SpannerCommandTextBuilder will try to determine the type of command based on the first word of the command text if no type is specified. This works well as long as the command text does not contain any comments or statement hints at the beginning of the command text.
This change adds a simple parser that will remove any comments and statement hints before checking the type of command. The command that is stripped of comments and hints is only used for determining the type of command. The original command will be passed on to Cloud Spanner.

Fixes #6847

SpannerCommandTextBuilder will try to determine the type of command based on the first
word of the command text if no type is specified. This works well as long as the command
text does not contain any comments or statement hints at the beginning of the command text.
This change adds a simple parser that will remove any comments and statement hints before
checking the type of command. The command that is stripped of comments and hints is only
used for determining the type of command. The original command will be passed on to
Cloud Spanner.

Fixes #6847
@olavloite olavloite requested a review from a team as a code owner July 31, 2021 14:04
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 31, 2021
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Jul 31, 2021
@olavloite olavloite requested a review from skuruppu July 31, 2021 14:04
olavloite added a commit to googleapis/dotnet-spanner-entity-framework that referenced this pull request Aug 1, 2021
Adds a sample for how to use query hints with Cloud Spanner in Entity Framework Core.

The sample tags a query with a specific text to indicate that the query should be intercepted
by a command interceptor that will add a query hint to the query. This interceptor currently
also removes the tag (comment) from the query, as the current version of the client library
does not support untyped commands that start with either a comment or a statement hint.

Once googleapis/google-cloud-dotnet#6848 has been merged and released,
the sample can be edited so it does not remove the tag from the query. It will then also be
possible to add a sample for a statement hint (as opposed to the current sample which adds a
table hint).
olavloite added a commit to googleapis/dotnet-spanner-entity-framework that referenced this pull request Aug 1, 2021
Adds a sample for how to use query hints with Cloud Spanner in Entity Framework Core.

The sample tags a query with a specific text to indicate that the query should be intercepted
by a command interceptor that will add a query hint to the query.

This sample will only work once googleapis/google-cloud-dotnet#6848
has been merged and released, as the client library currently does not support untyped commands
that start with either a comment or a statement hint.
Copy link
Contributor

@amanda-tarafa amanda-tarafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some code specific comments on the PR.
But I'll leave my general view on this change on the issue #6847 .

@@ -218,9 +220,9 @@ internal string DatabaseToDrop
public static SpannerCommandTextBuilder FromCommandText(string commandText)
{
GaxPreconditions.CheckNotNullOrEmpty(commandText, nameof(commandText));
commandText = commandText.Trim();
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commandText is still being used in many places, let's keep it trimmed.

Suggested change
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
commandText = commandText.Trim();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super tiny nit: And just maybe move the second trim to inside RemoveCommentsAnd... so that we can skip it here.

Suggested change
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Split(new char[0]) splits the string using all whitespace characters.
var commandSections = commandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
var commandSections = trimmedCommandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
if (commandSections.Length < 2)
{
throw new ArgumentException($"'{commandText}' is not a recognized Spanner command.", nameof(commandText));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoul this be:

Suggested change
throw new ArgumentException($"'{commandText}' is not a recognized Spanner command.", nameof(commandText));
throw new ArgumentException($"'{trimmedCommandText}' is not a recognized Spanner command.", nameof(commandText));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Split(new char[0]) splits the string using all whitespace characters.
var commandSections = commandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
var commandSections = trimmedCommandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 262 here (I can't leave a comment there), the code is doing commandText = $"{commandName} {targetTable}"; and then a SpannerCommandTextBuilder is returned based on commandText, which means you loose any comments or hints, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, but that is not a command that supports comments or hints anyways. That is a pseudo command text that indicates that the command should use a mutation and not a DML statement, and mutations do not support comments or hints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail if we get a command that does not support comments or hints that has comments or hints? I would be OK if se send it all to the backend for it to fail, but here, we are just silently stripping whatever the user sent. What if they are "expecting" a hint to be used?
Is EF Core generating mutations with comments?

Copy link
Contributor Author

@olavloite olavloite Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. In this case the backend will not fail it, as the command text is only used to determine the type of command. The generated mutation will not contain the hint or comment, and will therefore be accepted by the backend. For comments, I would say that it's not really a problem. It basically means that we are introducing support for comments in these mutation command texts. For hints it might be better to fail in the client, as they will otherwise just silently be stripped away without any use.
Note: These mutation command texts is a .NET specific feature. Cloud Spanner (and also the client libraries for the other programming languages) do not support a command text like INSERT <table_name> or UPDATE <table_name>.

Is EF Core generating mutations with comments?

No.

private static readonly char s_doubleQuote = '"';
private static readonly char s_backtickQuote = '`';
private static readonly char s_hyphen = '-';
private static readonly char s_dash = '#';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sharp?

Suggested change
private static readonly char s_dash = '#';
private static readonly char s_sharp = '#';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm open to calling it anything =) But formally this is the number sign, hash (so certainly not dash...) or pound sign, while the sharp sign is slightly different: https://en.wikipedia.org/wiki/Number_sign and https://en.wikipedia.org/wiki/Sharp_(music)

{
if ((c == '\n' || c == '\r') && !isTripleQuoted)
{
throw new SpannerException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this being a parsing issue, it should throw the same exception as the FromCommandText method, which is InvalidOperationException. And applies to the rest of the exceptions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean ArgumentException (which is thrown by FromCommandText), so I've changed it to that.

// keywords SELECT, UPDATE, DELETE, WITH, and that we can keep the check simple by just
// searching for the first occurrence of a keyword that should be preceded by a closing curly
// brace at the end of the statement hint.
int startStatementHintIndex = sql.IndexOf('{');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the curly bracket is in a string literal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It returns the correct value in that case, as a string literal (in a valid SQL statement) can only come after the statement hint and keyword that determines the type of statement (SELECT, UPDATE, ...). That means that the index of the first curly brace will always be after the first keyword, which means that the substring that is taken on line 455 will never contain a closing curly brace. That again means that the parser sees it as an invalid statement and just returns the original sql string.

It is however better to specifically detect this situation and document that it is a valid statement, but that we can ignore the curly braces as they come after the first keyword, and are therefore not part of a statement hint.

Note also that:

  1. We have at this point already removed all comments, so we don't have to care about that (which is also not your point in this case, but just to be complete)
  2. Only a statement hint, as opposed to a table hint, may be added at the beginning of a statement. Statement hints have a very limited number of possible values, non of which include string literals: https://cloud.google.com/spanner/docs/query-syntax#statement-hints

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I think this can be made simpler, just for the purposes of maintenance.

  • If there's no { at the start of the command, we just assume it has not hints and return it as is, caller code or backend will handle the failure.
  • If there was a { at the start then:
    • We find and strip everything from the start until the first }.
    • If there are no } we return as is and the caller will fail.

WDYT?

Copy link
Contributor Author

@olavloite olavloite Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this can be done a lot simpler, especially when we assume that the statement is valid and well-formed. I've updated the implementation according to the above suggestion.

Copy link
Contributor

@amanda-tarafa amanda-tarafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm proposing a few simplifications that I think are possible, given that we really just want to know the type of the command and that we can let the backend fail for ill formatted commands.

Hope it makes sense. Else I can try a draft PR next week.

@@ -218,9 +220,9 @@ internal string DatabaseToDrop
public static SpannerCommandTextBuilder FromCommandText(string commandText)
{
GaxPreconditions.CheckNotNullOrEmpty(commandText, nameof(commandText));
commandText = commandText.Trim();
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super tiny nit: And just maybe move the second trim to inside RemoveCommentsAnd... so that we can skip it here.

Suggested change
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText).Trim();
var trimmedCommandText = RemoveCommentsAndStatementHint(commandText);

// Split(new char[0]) splits the string using all whitespace characters.
var commandSections = commandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
var commandSections = trimmedCommandText.Split((char[]) null, StringSplitOptions.RemoveEmptyEntries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail if we get a command that does not support comments or hints that has comments or hints? I would be OK if se send it all to the backend for it to fail, but here, we are just silently stripping whatever the user sent. What if they are "expecting" a hint to be used?
Is EF Core generating mutations with comments?

// keywords SELECT, UPDATE, DELETE, WITH, and that we can keep the check simple by just
// searching for the first occurrence of a keyword that should be preceded by a closing curly
// brace at the end of the statement hint.
int startStatementHintIndex = sql.IndexOf('{');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I think this can be made simpler, just for the purposes of maintenance.

  • If there's no { at the start of the command, we just assume it has not hints and return it as is, caller code or backend will handle the failure.
  • If there was a { at the start then:
    • We find and strip everything from the start until the first }.
    • If there are no } we return as is and the caller will fail.

WDYT?

/// Throws ArgumentException if the sql string contains an unclosed
/// literal or unclosed quoted identifier
/// </exception>
internal static string RemoveComments(string sql)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we can make this simpler as well:

  • We have 2 kinds of comments, line comments, starting with -- for instance, and block comments /* */. And we only care about those at the beginning of the command.

In a loop:

  • If the command string starts with any of the line comment starters --, #, or others, remove the entire line and loop back.
  • If the commnd string starts with any of the block comment starterts, /* (others?) just find the first comment block closer that's not escaped */.
    • If found, remove everything up to and including the closer */ and loop back again.
    • Else, return the original string and let the caller or the backend fail.
  • Return what you have when it doesn't start with a line or block comment.

WDTY? Or am I missing something?

Copy link
Contributor Author

@olavloite olavloite Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break because line comments can also start in the middle of a line, which would not be stripped by such a simplified approach. And I worry that those 'inline line comments' that would be left in the command string could break the rest of the parsing. So the following kind of statements would probably not be parsed correctly:

@{
  OPTIMIZER_VERSION=1 -- Use the first optimizer for now. Change this to @{OPTIMIZER_VERSION=2} once XYZ is done.
}
SELECT *
FROM Singers

@{OPTIMIZER_VERSION=1} -- This is a comment /* this is not a block comment
SELECT *
FROM Singers
/* This is a block comment */

The current statement hint parsing is simplified to just look for the first and next curly brace, but that can only be safely done because we at that moment know that:

  1. String literals cannot occur in that part of a valid SQL string.
  2. All comments in that part of the SQL string have been removed.

/* (others?)

No, /* is the only valid start of a block comment.

Edit

I guess the biggest problem with such a simplified parsing approach would be that both types of comments can contain both the start and end marker of the other type, which could make the parsing break:

If we remove block comments first, the following could not simply be stripped by looking for a /*:

-- This is a comment /* Still a comment, but not a block comment.
SELECT 1

While if we remove line comments first, the following could break the parsing of block comments afterwards:

/*
 This is a block comment
-- This is not a line comment */
SELECT 1

[InlineData("SELECT * FROM Foo@{FORCE_INDEX=`INDEX FOR SELECT`}", "SELECT * FROM Foo@{FORCE_INDEX=`INDEX FOR SELECT`}")]
[InlineData("@{OPTIMIZER_VERSION=1 SELECT 1", "@{OPTIMIZER_VERSION=1 SELECT 1")]
[InlineData("{OPTIMIZER_VERSION=1} SELECT 1", "{OPTIMIZER_VERSION=1} SELECT 1")]
public void RemoveStatementHintAndComments(string sql, string sqlWithoutHintsAndComments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Givent the name, should some of the inputs also have comments, or should we change the test name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've updated the test name, as we are already testing the stripping of comments above. I also changed the name of the method that removes comments and statement hints, as there could be more than one.

amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this pull request Aug 11, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this pull request Aug 12, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this pull request Aug 12, 2021
amanda-tarafa added a commit that referenced this pull request Aug 12, 2021
@jskeet
Copy link
Collaborator

jskeet commented Aug 16, 2021

Presumably this PR can be closed now that #6947 has been merged?

@olavloite
Copy link
Contributor Author

Presumably this PR can be closed now that #6947 has been merged?

Yes

@olavloite olavloite closed this Aug 16, 2021
amanda-tarafa added a commit to amanda-tarafa/google-cloud-dotnet that referenced this pull request Aug 24, 2021
Changes in Google.Cloud.Spanner.Data version 3.12.0:

- [Commit 7c6a6f1](googleapis@7c6a6f1): feat: add support for JSON data type in Spanner ([issue 6390](googleapis#6390))
- [Commit ac367e2](googleapis@ac367e2): feat: Regenerate all APIs to support self-signed JWTs
- [Commit 61938b6](googleapis@61938b6):
  - feat(Spanner): Support comments and statement hints in untyped commands.
  - Alternative to [issue 6848](googleapis#6848)
  - Closes [issue 6847](googleapis#6847)
- [Commit d26b04c](googleapis@d26b04c): fix: address review comments
- [Commit d2025be](googleapis@d2025be): fix: use logger from SpannerSettings
- [Commit b34f6f4](googleapis@b34f6f4): cleanup: fix comment + remove unnecessary import
- [Commit fc7a41b](googleapis@fc7a41b): test: remove connectionstring tests and add settings test
- [Commit 6016ef0](googleapis@6016ef0):
  - feat: support custom SpannerSettings in SessionPoolManager
  - Support setting custom SpannerSettings when creating a SessionPoolManager.
- [Commit 0ab6b8b](googleapis@0ab6b8b):
  - feat: allow adding an additional version header
  - Allows adding an additional version header to the connection string. This will be
  - added to the `x-goog-api-client` header that is used by the underlying Spanner client.
  - Only a fixed set of values may be set for the header (currently only 'efcore' is
  - allowed), and the property is not intended for public use.
- [Commit 250124f](googleapis@250124f):
  - fix: synchronize access to the underlying transaction for ambient transactions ([issue 6616](googleapis#6616))
  - * fix: synchronize access to the underlying transaction for ambient transactions
  - Synchronizes access to the underlying Spanner transaction to prevent
  - multiple transactions from being created if parallel commands are
  - executed on the ambient transaction.
  - * test: add integration test
  - * fix: use Lazy for initialization
  - * chore: address review comments
  - * fix: remove unnecessary read call

Packages in this release:
- Release Google.Cloud.Spanner.Admin.Database.V1 version 3.12.0
- Release Google.Cloud.Spanner.Admin.Instance.V1 version 3.12.0
- Release Google.Cloud.Spanner.Common.V1 version 3.12.0
- Release Google.Cloud.Spanner.Data version 3.12.0
- Release Google.Cloud.Spanner.V1 version 3.12.0
gcf-merge-on-green bot pushed a commit that referenced this pull request Aug 24, 2021
Changes in Google.Cloud.Spanner.Data version 3.12.0:

- [Commit 7c6a6f1](7c6a6f1): feat: add support for JSON data type in Spanner ([issue 6390](#6390))
- [Commit ac367e2](ac367e2): feat: Regenerate all APIs to support self-signed JWTs
- [Commit 61938b6](61938b6):
  - feat(Spanner): Support comments and statement hints in untyped commands.
  - Alternative to [issue 6848](#6848)
  - Closes [issue 6847](#6847)
- [Commit d26b04c](d26b04c): fix: address review comments
- [Commit d2025be](d2025be): fix: use logger from SpannerSettings
- [Commit b34f6f4](b34f6f4): cleanup: fix comment + remove unnecessary import
- [Commit fc7a41b](fc7a41b): test: remove connectionstring tests and add settings test
- [Commit 6016ef0](6016ef0):
  - feat: support custom SpannerSettings in SessionPoolManager
  - Support setting custom SpannerSettings when creating a SessionPoolManager.
- [Commit 0ab6b8b](0ab6b8b):
  - feat: allow adding an additional version header
  - Allows adding an additional version header to the connection string. This will be
  - added to the `x-goog-api-client` header that is used by the underlying Spanner client.
  - Only a fixed set of values may be set for the header (currently only 'efcore' is
  - allowed), and the property is not intended for public use.
- [Commit 250124f](250124f):
  - fix: synchronize access to the underlying transaction for ambient transactions ([issue 6616](#6616))
  - * fix: synchronize access to the underlying transaction for ambient transactions
  - Synchronizes access to the underlying Spanner transaction to prevent
  - multiple transactions from being created if parallel commands are
  - executed on the ambient transaction.
  - * test: add integration test
  - * fix: use Lazy for initialization
  - * chore: address review comments
  - * fix: remove unnecessary read call

Packages in this release:
- Release Google.Cloud.Spanner.Admin.Database.V1 version 3.12.0
- Release Google.Cloud.Spanner.Admin.Instance.V1 version 3.12.0
- Release Google.Cloud.Spanner.Common.V1 version 3.12.0
- Release Google.Cloud.Spanner.Data version 3.12.0
- Release Google.Cloud.Spanner.V1 version 3.12.0
@jskeet jskeet deleted the spanner-parse-comments-and-hints branch October 14, 2021 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spanner]: Statements that start with a comment or statement hint are not recognized
4 participants