SQLFluff improperly parses unquoted Bigquery project names with dashes #2771

Closed
2 of 3 tasks
kdw2126 opened this issue Mar 3, 2022 · 2 comments · Fixed by #2842
Labels: bigquery (Issues relating to the BigQuery dialect), bug (Something isn't working)

Comments

kdw2126 (Contributor) commented Mar 3, 2022

Search before asking

  • I searched the issues and found no similar issues.

What Happened

Whenever I parse valid BigQuery code that contains my organization's primary project names (which contain dashes), SQLFluff treats the dash as an operator and attempts to apply operator fixes to this code.

Expected Behaviour

I would expect that when an unquoted project name with a dash is clearly referenced as part of a FROM clause, SQLFluff understands that the dash is part of the project name and does not apply operator rules (related to subtraction) to it.

Observed Behaviour

Currently, whenever I use a dashed project name without quoting it, the dash is treated as an operator and spaces are added before and after it (which breaks the SQL code I am working on).
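The breakage can be illustrated with a deliberately naive fixer. This is a minimal sketch and not SQLFluff's actual L006 implementation. It assumes a hypothetical `space_binary_minus` helper that pads every `-` sitting between two word characters with spaces. Applied to the reference from this issue, it turns one table path into a subtraction:

```python
import re


def space_binary_minus(sql: str) -> str:
    """Hypothetical naive fix: surround any '-' between two word characters with spaces."""
    return re.sub(r"(?<=\w)-(?=\w)", " - ", sql)


fixed = space_binary_minus("SELECT col_foo\nFROM foo-bar.foo.bar")
print(fixed.splitlines()[1])  # FROM foo - bar.foo.bar
```

After this "fix", the FROM clause no longer names a single table path, so the query is no longer valid BigQuery.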

How to reproduce

(1) Generate a new BigQuery SQLFluff project.
(2) Lint the following code with it:

SELECT col_foo
FROM foo-bar.foo.bar

(3) Note that the linter output takes the following form, indicating that the dash is being treated as an operator. This can also be reproduced on online.sqlfluff.com:

Code | Line / Position | Description
     | 2 / 9           | Missing whitespace before -
     | 2 / 9           | Missing whitespace after -

Dialect

bigquery

Version

sqlfluff, version 0.10.1
Python 3.8.9
dbt version 1.0.1, although this issue does not involve the dbt templater

Configuration

[sqlfluff]
dialect = bigquery
exclude_rules = L003,L008,L011,L014,L016,L029,L031,L034

[sqlfluff:rules]
max_line_length = 120
comma_style = leading

[sqlfluff:rules:L010]
capitalisation_policy = upper

[sqlfluff:rules:L030]
capitalisation_policy = upper

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

kdw2126 added the bug label Mar 3, 2022
tunetheweb added the bigquery label Mar 6, 2022
tunetheweb (Member) commented:

More discussion on this here: #2756 (comment)

These dashes are very much the exception, so we should ensure we don't allow dashes in other unquoted identifiers.

Table name identifiers have additional syntax to support dashes (-) when referenced in FROM and TABLE clauses.
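The quoted rule is context-sensitive: a dash may belong to an identifier only in FROM and TABLE clauses.

Below is a toy version of that check. It assumes a hypothetical `is_dashed_table_path` helper and a simplified identifier pattern; BigQuery's real lexical rules are richer. It also only inspects the first part of the path, which is where a project name such as `foo-bar` sits.

```python
import re

# Clauses in which the quoted BigQuery rule allows dashed table names.
DASH_CLAUSES = {"FROM", "TABLE"}

# Hypothetical, simplified pattern: a word followed by one or more "-word" parts.
DASHED_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)+")


def is_dashed_table_path(preceding_keyword: str, token: str) -> bool:
    """Return True if `token` should be read as one dashed table path, not a subtraction."""
    first_part = token.split(".")[0]  # the project name in project.dataset.table
    return (
        preceding_keyword.upper() in DASH_CLAUSES
        and DASHED_NAME.fullmatch(first_part) is not None
    )


print(is_dashed_table_path("FROM", "foo-bar.foo.bar"))  # True
print(is_dashed_table_path("SELECT", "col_a-col_b"))    # False: ordinary subtraction
```

The context test is what keeps dashes the exception. Outside FROM and TABLE, `col_a-col_b` is still read as a subtraction.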

tunetheweb (Member) commented:

So it turns out we do allow dashes in our BigQuery dialect:

@bigquery_dialect.segment()
class HyphenatedObjectReferenceSegment(ObjectReferenceSegment):  # type: ignore
    """A reference to an object that may contain embedded hyphens."""

    type = "hyphenated_object_reference"
    match_grammar = ansi_dialect.get_segment(
        "ObjectReferenceSegment"
    ).match_grammar.copy()
    match_grammar.delimiter = OneOf(
        Ref("DotSegment"),
        Sequence(Ref("DotSegment"), Ref("DotSegment")),
        Sequence(Ref("MinusSegment")),
    )
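Here is a rough illustration of what this grammar accepts. It is a regex toy, not SQLFluff's parser, and `split_reference` is a hypothetical helper. A reference is a run of identifiers separated by any one of the three delimiters, `..`, `.` or `-`, so the `-` in `foo-bar` ends up as a delimiter alongside the dots:

```python
import re
from typing import List

# Delimiters mirroring the OneOf(...) above; ".." is listed first so it wins over ".".
DELIMITER = re.compile(r"(\.\.|\.|-)")


def split_reference(ref: str) -> List[str]:
    """Split a reference into alternating identifier and delimiter parts."""
    return [part for part in DELIMITER.split(ref) if part]


print(split_reference("foo-bar.foo.bar"))
# ['foo', '-', 'bar', '.', 'foo', '.', 'bar']
```

In the real grammar, that `-` delimiter is matched by `MinusSegment`, which is the detail that matters for L006.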

However, because it uses the MinusSegment, it falls foul of L006.

The fix is simple: replicate the MinusSegment but call it something else so L006 doesn't look at it!

PR coming up...
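The fix proposed in this comment can be sketched with a toy model. Everything in it is hypothetical and not a SQLFluff internal: the `Segment` class, the type names `"binary_operator"` and `"dash"`, and `missing_operator_whitespace`. The idea is that a whitespace rule which selects segments by type flags the dash while it is parsed as a minus operator. It then ignores the identical character once that character gets its own segment type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    raw: str
    type: str  # hypothetical type names, e.g. "binary_operator", "dash", "identifier"


def missing_operator_whitespace(segments: List[Segment]) -> List[str]:
    """Toy stand-in for L006: flag binary operators not padded by whitespace."""
    problems = []
    for i, seg in enumerate(segments):
        if seg.type != "binary_operator":
            continue  # the rule never inspects other segment types
        before: Optional[str] = segments[i - 1].type if i > 0 else None
        after: Optional[str] = segments[i + 1].type if i + 1 < len(segments) else None
        if before != "whitespace":
            problems.append(f"Missing whitespace before {seg.raw}")
        if after != "whitespace":
            problems.append(f"Missing whitespace after {seg.raw}")
    return problems


def parse_reference(dash_type: str) -> List[Segment]:
    """Build toy segments for foo-bar.foo.bar, giving the dash the supplied type."""
    return [
        Segment("foo", "identifier"),
        Segment("-", dash_type),
        Segment("bar", "identifier"),
        Segment(".", "dot"),
        Segment("foo", "identifier"),
        Segment(".", "dot"),
        Segment("bar", "identifier"),
    ]


print(missing_operator_whitespace(parse_reference("binary_operator")))
# ['Missing whitespace before -', 'Missing whitespace after -']
print(missing_operator_whitespace(parse_reference("dash")))
# []
```

The first result mirrors the two lint messages reported in the issue. The design point is that the rule keys on segment type rather than on the raw `-` character, so giving the dash a distinct segment is enough to take it out of the rule's scope.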
