- Author: Ben Du
- Date: 2021-02-17 09:55:38
- Title: Tips on Sqlfluff
- Slug: tips-on-sqlfluff
- Category: Computer Science
- Tags: Computer Science, programming, sqlfluff, SQL, lint, linter, format, issue, error
- Modified: 2021-04-17 09:55:38


## Installation

In [2]:
pip3 install -U sqlfluff

Collecting sqlfluff
  Downloading sqlfluff-0.5.2-py3-none-any.whl (278 kB)
Collecting pytest
  Downloading pytest-6.2.3-py3-none-any.whl (280 kB)
Collecting cached-property
  Using cached cached_property-1.5.2-py2.py3-none-any.whl (7.6 kB)
Collecting configparser
  Downloading configparser-5.0.2-py3-none-any.whl (19 kB)
Collecting typing-extensions
  Using cached typing_extensions-3.7.4.3-py3-none-any.whl (22 kB)
Collecting diff-cover>=2.5.0
  Downloading diff_cover-5.0.1-py3-none-any.whl (44 kB)
Collecting colorama>=0.3
  Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting bench-it
  Using cached bench_it-1.0.1-py2.py3-none-any.whl (19 kB)
Collecting oyaml
  Using cached oyaml-1.0-py2.py3-none-any.whl (3.0 kB)
Collecting appdirs
  Using cached appdirs-1.4.4-py2.py3-none-any.whl (

## General Tips and Traps 

1. SQLFluff supports Jinja templates.

## Safe to fix

1. L001: Unnecessary trailing whitespace.
2. L008: Commas should be followed by a single whitespace unless followed by a comment.


## Ignore 

L:  75 | P:   5 |  LXR | Unable to lex characters: ''${candidat'...'
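
If ignoring the lexing error is not an option, one possible workaround is to rewrite the placeholders into Jinja syntax before linting, since SQLFluff understands Jinja. The sketch below assumes the offending placeholders have the form `${name}`; the function name `to_jinja` and the sample query are made up for illustration. The resulting Jinja variables would still need values, which (as far as I know) can be supplied in a `[sqlfluff:templater:jinja:context]` section of `.sqlfluff`.

```python
import re

# Assumption: placeholders look like ${name}; adjust the pattern for other styles.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def to_jinja(sql: str) -> str:
    """Rewrite ${name} placeholders as Jinja {{ name }} expressions."""
    return PLACEHOLDER.sub(r"{{ \1 }}", sql)


# Hypothetical query; the table and placeholder names are illustrative only.
print(to_jinja("SELECT * FROM users WHERE id = ${candidate_id}"))
```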

## Parsing Error

1. PRS: Found unparsable section: '-- /*Select list of users to choose from...'

## Configuration

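1. `.sqlfluff`: the configuration file (dialect, templater, and rule settings).

2. `.sqlfluffignore`: lists files and directories that SQLFluff should skip, using `.gitignore`-style patterns.

3. You can customize linting and fixing of SQL files by customizing rules.
    Please refer to
    [Rules Reference](https://docs.sqlfluff.com/en/stable/rules.html#ruleref)
    for a complete list of rules.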
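
As a rough illustration of the two files, the sketch below writes a minimal `.sqlfluff` that restricts SQLFluff to the two safe-to-fix rules (L001 and L008) and a `.sqlfluffignore` with a couple of patterns. The helper name `write_sqlfluff_files` and the ignore patterns are assumptions made for this example, not something SQLFluff provides.

```python
import configparser
import tempfile
from pathlib import Path


def write_sqlfluff_files(project_dir, rules, ignore_patterns):
    """Write a minimal .sqlfluff and .sqlfluffignore into project_dir."""
    project_dir = Path(project_dir)
    config = configparser.ConfigParser()
    # `rules` limits linting and fixing to the listed rule codes.
    config["sqlfluff"] = {"dialect": "ansi", "rules": ",".join(rules)}
    with open(project_dir / ".sqlfluff", "w") as f:
        config.write(f)
    # One pattern per line, in the same style as a .gitignore file.
    (project_dir / ".sqlfluffignore").write_text("\n".join(ignore_patterns) + "\n")


# Demonstrate in a throwaway directory; the patterns are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    write_sqlfluff_files(tmp, ["L001", "L008"], ["build/", "*.generated.sql"])
    config_text = (Path(tmp) / ".sqlfluff").read_text()
    ignore_text = (Path(tmp) / ".sqlfluffignore").read_text()

print(config_text)
print(ignore_text)
```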

## Command-line APIs

In [3]:
!sqlfluff lint --help

Usage: sqlfluff lint [OPTIONS] [PATHS]...

  Lint SQL files via passing a list of files or using stdin.

  PATH is the path to a sql file or directory to lint. This can be either a
  file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single
  ('-') character to indicate reading from *stdin* or a dot/blank ('.'/' ')
  which will be interpreted like passing the current working directory as a
  path argument.

  Linting SQL files:

      sqlfluff lint path/to/file.sql     sqlfluff lint
      directory/of/sql/files

  Linting a file via stdin (note the lone '-' character):

      cat path/to/file.sql | sqlfluff lint -     echo 'select col from tbl'
      | sqlfluff lint -

Options:
  -n, --nocolor                   No color - if this is set then the output
                                  will be without ANSI color codes.

  -v, --verbose                   Verbosity, how detailed should the output
                                  be. This is *stackable*, so `-vv` is more
   

In [None]:
sqlfluff lint test.sql

In [None]:
sqlfluff fix test.sql
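
The same commands can be driven from a script. Below is a minimal sketch that builds the command line and only runs it when the `sqlfluff` executable is on the `PATH`. The helper names are hypothetical; `--dialect` and `--rules` are options of the `lint` and `fix` subcommands.

```python
import shutil
import subprocess


def build_sqlfluff_command(action, paths, dialect=None, rules=None):
    """Build the argument list for `sqlfluff lint` or `sqlfluff fix`."""
    if action not in ("lint", "fix"):
        raise ValueError(f"unsupported action: {action}")
    cmd = ["sqlfluff", action]
    if dialect:
        cmd += ["--dialect", dialect]
    if rules:
        # Rule codes are passed as a comma-separated list.
        cmd += ["--rules", ",".join(rules)]
    return cmd + list(paths)


def run_sqlfluff(cmd):
    """Run the command if sqlfluff is installed; return None otherwise."""
    if shutil.which("sqlfluff") is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


print(build_sqlfluff_command("lint", ["test.sql"], rules=["L001", "L008"]))
```

Keeping command construction separate from execution makes it easy to check which arguments will be passed before touching any files.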

## Customized Fix Rules for sqlfluff

Below are my customized fix rules for sqlfluff.
They use upper case for SQL keywords
and lower case for identifiers.
```
[sqlfluff]
verbose = 0
nocolor = False
dialect = ansi
templater = jinja
rules = None
exclude_rules = None
recurse = 0
output_line_length = 80
runaway_limit = 10
ignore_templated_areas = True
# Comma separated list of file extensions to lint.

# NB: This config will only apply in the root folder.
sql_file_exts = .sql,.sql.j2,.dml,.ddl

[sqlfluff:indentation]
indented_joins = False
template_blocks_indent = True

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:jinja]
apply_dbt_builtins = True

[sqlfluff:templater:jinja:macros]
# Macros provided as builtins for dbt projects
dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %}
dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %}
dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %}
dbt_var = {% macro var(variable) %}item{% endmacro %}
dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}

# Some rules can be configured directly from the config common to other rules.
[sqlfluff:rules]
tab_space_size = 4
max_line_length = 80
indent_unit = space
comma_style = trailing
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all

# Some rules have their own specific config.
[sqlfluff:rules:L003]
lint_templated_tokens = True

[sqlfluff:rules:L010]  # Keywords
capitalisation_policy = upper

[sqlfluff:rules:L014]  # Unquoted identifiers
extended_capitalisation_policy = lower

[sqlfluff:rules:L016]
ignore_comment_lines = False

[sqlfluff:rules:L029]  # Keyword identifiers
unquoted_identifiers_policy = aliases

[sqlfluff:rules:L030]  # Function names
capitalisation_policy = lower

[sqlfluff:rules:L038]
select_clause_trailing_comma = forbid

[sqlfluff:rules:L040]  # Null & Boolean Literals
capitalisation_policy = upper

[sqlfluff:rules:L042]
# By default, allow subqueries in from clauses, but not join clauses.
forbid_subquery_in = join

[sqlfluff:rules:L047]  # Consistent syntax to count all rows
prefer_count_1 = False
```
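
To make the intended style concrete, here is a toy illustration of the outcome the capitalisation settings above aim for (upper-case keywords, lower-case identifiers and function names). It is not how SQLFluff works internally: it uses a tiny hand-picked keyword set and ignores string literals, quoted identifiers, and comments. The function name `apply_case_policy` is made up for this sketch.

```python
import re

# A deliberately small keyword set, just enough for the example.
KEYWORDS = {"select", "from", "where", "join", "inner", "using", "as", "with"}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def apply_case_policy(sql: str) -> str:
    """Upper-case known keywords and lower-case every other word."""

    def fix(match):
        word = match.group(0)
        return word.upper() if word.lower() in KEYWORDS else word.lower()

    return WORD.sub(fix, sql)


print(apply_case_policy("select C1 From DB.T1"))
```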

## Python APIs

In [1]:
import sqlfluff

Parse a simple query and show its parse tree as nested tuples.

In [36]:
sqlfluff.parse("select c1 from db.t1").tree.to_tuple()

('file',
 (('statement',
   (('select_statement',
     (('select_clause',
       (('keyword', ()),
        ('whitespace', ()),
        ('select_clause_element',
         (('column_reference', (('identifier', ()),)),)))),
      ('whitespace', ()),
      ('from_clause',
       (('keyword', ()),
        ('whitespace', ()),
        ('from_expression',
         (('from_expression_element',
           (('table_expression',
             (('table_reference',
               (('identifier', ()),
                ('dot', ()),
                ('identifier', ()))),)),)),)))))),)),))
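
The nested `(type, children)` tuples returned by `to_tuple()` can be walked with plain Python. The sketch below rebuilds the tuple shown above by hand (so it runs without SQLFluff installed) and counts how often each segment type occurs. The helper name `count_types` is an assumption for this example.

```python
from collections import Counter

KW = ("keyword", ())
WS = ("whitespace", ())
IDENT = ("identifier", ())

# Hand-copied from the to_tuple() output for "select c1 from db.t1".
tree = (
    "file",
    (
        ("statement", (
            ("select_statement", (
                ("select_clause", (
                    KW, WS,
                    ("select_clause_element", (
                        ("column_reference", (IDENT,)),
                    )),
                )),
                WS,
                ("from_clause", (
                    KW, WS,
                    ("from_expression", (
                        ("from_expression_element", (
                            ("table_expression", (
                                ("table_reference", (IDENT, ("dot", ()), IDENT)),
                            )),
                        )),
                    )),
                )),
            )),
        )),
    ),
)


def count_types(node, counter=None):
    """Recursively count segment types in a (type, children) tuple tree."""
    counter = Counter() if counter is None else counter
    seg_type, children = node
    counter[seg_type] += 1
    for child in children:
        count_types(child, counter)
    return counter


counts = count_types(tree)
print(counts["table_reference"], counts["identifier"], counts["keyword"])
```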

Parse a query with CTEs and joins.

In [2]:
sql = """
    WITH foo AS (
        SELECT * FROM bar.bar
    ),
    baz AS (
        SELECT * FROM bap
    )
    SELECT 
        * 
    FROM 
        foo
    INNER JOIN 
        baz 
    USING (
        user_id
    )
    INNER JOIN 
        ban 
    USING (
        user_id
    )
    """
parsed = sqlfluff.parse(sql)

In [3]:
type(parsed)

sqlfluff.core.linter.ParsedString

In [5]:
parsed.count("table_reference")

0

In [6]:
parsed.tree.get_table_references()

{'ban', 'bap', 'bar.bar'}

In [18]:
parsed.index("SELECT")

ValueError: tuple.index(x): x not in tuple

Both `count` and `index` here are the inherited tuple methods: `ParsedString` is a tuple subclass, so they compare the argument against the tuple's own fields rather than searching the SQL segments. Use `parsed.tree` to inspect the parse tree.

In [22]:
parsed.time_dict

{'templating': 0.005710315000001742,
 'lexing': 0.004631195999998283,
 'parsing': 0.031160859000003427}

In [24]:
parsed.tree

<FileSegment: ([0](1, 1, 1))>

In [27]:
parsed.tree.allow_empty

True

In [30]:
parsed.tree.as_record?

Signature: parsed.tree.as_record(**kwargs)
Docstring:
Return the segment as a structurally simplified record.

This is useful for serialization to yaml or json.
kwargs passed to to_tuple
File:      /usr/local/lib/python3.9/site-packages/sqlfluff/core/parser/segments/base.py
Type:      method


In [25]:
dir(parsed.tree)

['__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_comments',
 '_is_expandable',
 '_name',
 '_non_comments',
 '_preface',
 '_realign_segments',
 '_reconstruct',
 '_suffix',
 'allow_empty',
 'apply_fixes',
 'as_record',
 'can_start_end_non_code',
 'comment_seperate',
 'expand',
 'get_child',
 'get_children',
 'get_end_pos_marker',
 'get_start_pos_marker',
 'get_table_references',
 'invalidate_caches',
 'is_code',
 'is_comment',
 'is_expandable',
 'is_meta',
 'is_optional',
 'is_raw',
 'is_segment',
 'is_type',
 'is_whitespace',
 'iter_patches',
 'iter_raw_seg',
 'iter_segments',
 'iter_unparsables',
 'match',
 'match_grammar',
 'matched_length',
 'nam

In [43]:
parsed.tree.segments

(<newline_RawSegment: ([0](1, 1, 1)) '\n'>,
 <whitespace_RawSegment: ([1](1, 2, 1)) '    '>,
 <StatementSegment: ([5](1, 2, 5))>,
 <newline_RawSegment: ([271](1, 21, 6)) '\n'>,
 <whitespace_RawSegment: ([272](1, 22, 1)) '    '>)

In [39]:
print(parsed.tree.stringify())

[0](1, 1, 1)        |file:
[0](1, 1, 1)        |    newline:                                                  '\n'
[1](1, 2, 1)        |    whitespace:                                               '    '
[5](1, 2, 5)        |    statement:
[5](1, 2, 5)        |        with_compound_statement:
[5](1, 2, 5)        |            keyword:                                          'WITH'
[9](1, 2, 9)        |            whitespace:                                       ' '
[10](1, 2, 10)      |            common_table_expression:
[10](1, 2, 10)      |                identifier:                                   'foo'
[13](1, 2, 13)      |                whitespace:                                   ' '
[14](1, 2, 14)      |                keyword:                                      'AS'
[16](1, 2, 16)      |                whitespace:                                   ' '
[17](1, 2, 17)      |                start_bracket:                                '('
[18](1, 2, 18)      |           

In [32]:
import json

In [35]:
print(json.dumps(parsed.tree.to_tuple(), indent=4))

                                             ]
                                                            ]
                                                        ]
                                                    ]
                                                ]
                                            ]
                                        ]
                                    ]
                                ],
                                [
                                    "newline",
                                    []
                                ],
                                [
                                    "whitespace",
                                    []
                                ],
                                [
                                    "end_bracket",
                                    []
                                ]
                            ]
                        ],
                        [
              

In [31]:
parsed.tree.to_tuple()

('file',
 (('newline', ()),
  ('whitespace', ()),
  ('statement',
   (('with_compound_statement',
     (('keyword', ()),
      ('whitespace', ()),
      ('common_table_expression',
       (('identifier', ()),
        ('whitespace', ()),
        ('keyword', ()),
        ('whitespace', ()),
        ('start_bracket', ()),
        ('newline', ()),
        ('whitespace', ()),
        ('select_statement',
         (('select_clause',
           (('keyword', ()),
            ('whitespace', ()),
            ('select_clause_element',
             (('wildcard_expression',
               (('wildcard_identifier', (('star', ()),)),)),)))),
          ('whitespace', ()),
          ('from_clause',
           (('keyword', ()),
            ('whitespace', ()),
            ('from_expression',
             (('from_expression_element',
               (('table_expression',
                 (('table_reference',
                   (('identifier', ()),
                    ('dot', ()),
                    ('ident

Extract table names.
SQLFluff returns all table references that are not CTE aliases.

In [5]:
parsed.tree.get_table_references()

{'ban', 'bap', 'bar.bar'}
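
To see why `foo` and `baz` are missing from the result, the filtering can be imitated with a naive regex sketch: collect every name after `FROM` or `JOIN`, then subtract the CTE aliases. This is only an illustration of the idea and not SQLFluff's implementation, which works on the parse tree; the regexes would break on subqueries, comments, or quoted names. The function name `naive_table_references` is made up for this example.

```python
import re

# Same query as above, with whitespace condensed.
sql = """
    WITH foo AS (
        SELECT * FROM bar.bar
    ),
    baz AS (
        SELECT * FROM bap
    )
    SELECT * FROM foo
    INNER JOIN baz USING (user_id)
    INNER JOIN ban USING (user_id)
    """


def naive_table_references(sql: str) -> set:
    """Return names after FROM/JOIN that are not defined as CTE aliases."""
    # Names introduced as `name AS (` are treated as CTE aliases.
    cte_aliases = set(re.findall(r"(\w+)\s+AS\s*\(", sql, flags=re.IGNORECASE))
    # Anything following FROM or JOIN is treated as a table reference.
    referenced = set(
        re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", sql, flags=re.IGNORECASE)
    )
    return referenced - cte_aliases


print(sorted(naive_table_references(sql)))
```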

## References 

https://github.com/sqlfluff/sqlfluff/blob/master/examples/03_extracting_references.py

https://github.com/sqlfluff/sqlfluff/tree/master/examples

[Rules Reference](https://docs.sqlfluff.com/en/stable/rules.html#ruleref)
