

Rule L003 performance: Cache the line number and last newline position #3060

Merged
merged 1 commit into main on Apr 9, 2022


@barrywhart (Member) commented Apr 9, 2022

Brief summary of the change made

For each new line, L003 was rescanning the file from the beginning up to the current line. This gives O(n^2) behavior, which is very slow on large files. This small change caches the current line number and the segment index (within raw_stack), so each scan can resume where the previous one stopped instead of starting over. Using line_profiler, I confirmed that this eliminates a lot of processing on a large file.
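The quadratic-versus-linear difference described above can be sketched as follows. This is a minimal illustration, not SQLFluff's actual L003 code. Segment, line_info_naive, LineInfoCache and total_work are hypothetical names. The sketch assumes raw_stack only ever grows by appending segments, which is what keeps a cached scan position valid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    # Hypothetical stand-in for a raw segment; only its raw text matters here.
    raw: str

    @property
    def is_newline(self) -> bool:
        return self.raw == "\n"


def line_info_naive(raw_stack: List[Segment]) -> Tuple[int, int, int]:
    """Rescan from the start every time.

    Returns (line_no, last_newline_idx, segments_visited).
    """
    line_no, last_newline, visited = 1, -1, 0
    for idx, seg in enumerate(raw_stack):
        visited += 1
        if seg.is_newline:
            line_no += 1
            last_newline = idx
    return line_no, last_newline, visited


class LineInfoCache:
    """Hypothetical cache: remembers how far raw_stack has been scanned.

    Assumes raw_stack is append-only, so earlier results stay valid and each
    call only visits segments added since the previous call.
    """

    def __init__(self) -> None:
        self.line_no = 1
        self.last_newline = -1
        self.scanned = 0  # number of segments already processed

    def update(self, raw_stack: List[Segment]) -> Tuple[int, int, int]:
        visited = 0
        for idx in range(self.scanned, len(raw_stack)):
            visited += 1
            if raw_stack[idx].is_newline:
                self.line_no += 1
                self.last_newline = idx
        self.scanned = len(raw_stack)
        return self.line_no, self.last_newline, visited


def total_work(n_lines: int) -> Tuple[int, int]:
    """Simulate a rule that inspects the stack once per new line.

    Returns (segments visited by the naive scan, segments visited with the cache).
    """
    segments: List[Segment] = []
    for _ in range(n_lines):
        # Four segments per line: "select", " ", "1", newline.
        segments.extend([Segment("select"), Segment(" "), Segment("1"), Segment("\n")])

    cache = LineInfoCache()
    naive_total = cached_total = 0
    raw_stack: List[Segment] = []
    for seg in segments:
        raw_stack.append(seg)
        if seg.is_newline:
            naive = line_info_naive(raw_stack)
            cached = cache.update(raw_stack)
            assert naive[:2] == cached[:2]  # both approaches agree on the answer
            naive_total += naive[2]
            cached_total += cached[2]
    return naive_total, cached_total


print(total_work(1000))  # (2002000, 4000)
```

With n lines of four segments each, the naive version visits 4 + 8 + ... + 4n = 2n(n+1) segments in total, while the cached version visits each segment exactly once (4n). Both produce the same line number and last-newline index at every step.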

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@barrywhart (Member, Author)

@OTooleMichael: This builds on your recent L003 changes, if you'd like to review.

@codecov bot commented Apr 9, 2022

Codecov Report

Merging #3060 (84b6573) into main (a72ec67) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #3060   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          164       164           
  Lines        12057     12061    +4     
=========================================
+ Hits         12057     12061    +4     
Impacted Files                  Coverage Δ
src/sqlfluff/rules/L003.py      100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a72ec67...84b6573.

@tunetheweb (Member) left a comment


LGTM

@barrywhart barrywhart merged commit 06fe4d6 into sqlfluff:main Apr 9, 2022
@OTooleMichael (Contributor)

Interesting, I don't see why the old code wasn't avoiding the loops in the same way. That was certainly what I intended and checked for (but didn't test, as I figured it was internal). The line number is a good idea :) LGTM

@barrywhart (Member, Author)

@OTooleMichael: I only found this because of the profiler. It was showing tons of hits on this line.

Some interesting core linter performance work going on in another open PR...


3 participants