
Large files take several minutes before large_file_skip_char_limit stops parsing them #3693

Closed
3 tasks done
WilsonSunBritten opened this issue Jul 29, 2022 · 4 comments · Fixed by #3770
Labels
enhancement New feature or request

Comments

@WilsonSunBritten

Search before asking

  • I searched the issues and found no similar issues.

Description

large_file_skip_char_limit can prevent sqlfluff from hanging, potentially forever, on very large files; however, when I timed the linting of two 2 MB DBCR files, it still took over 2 minutes.

I suspect the issue is the current order of operations: it seems files are fully loaded into memory and parsed for inline sqlfluff config before the large-file check occurs. If the large-file check happened earlier, processing could be much quicker.

Current workaround: Manually adding large files to .sqlfluffignore
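
For reference, a hedged sketch of the two settings involved (the exact config section and default should be checked against the sqlfluff docs for your version; the ignore pattern is purely illustrative):

```ini
# .sqlfluff — the existing character limit (value shown is illustrative)
[sqlfluff]
large_file_skip_char_limit = 20000
```

```
# .sqlfluffignore — gitignore-style patterns; path is hypothetical
migrations/large_backfills/*.sql
```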

Use case

As a user with large SQL migrations, I want my linting tool to run quickly while properly ignoring overly large files.

Dialect

N/A

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@WilsonSunBritten WilsonSunBritten added the enhancement New feature or request label Jul 29, 2022
@WilsonSunBritten
Author

Early proposal: during _load_raw_file_and_config, check for the large_file_skip_char_limit setting in the main config (not the config for the current file to be loaded); if present, check the file size prior to opening and fail if it exceeds the limit.
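
A rough sketch of that idea (the names and error handling here are hypothetical, not sqlfluff's actual API; note that os.path.getsize returns bytes, which only approximates a character count for multi-byte encodings):

```python
import os


def load_raw_file_and_config_sketch(fname, root_config):
    """Hypothetical sketch: enforce the size limit before reading the file."""
    # Read the limit from the *root* config — the per-file config can't be
    # resolved without opening the file, which is exactly what we want to avoid.
    limit = root_config.get("large_file_skip_char_limit")
    # os.path.getsize() is a cheap stat() call: no file contents are read.
    if limit and os.path.getsize(fname) > limit:
        raise RuntimeError(
            f"Skipping {fname!r}: file size exceeds "
            f"large_file_skip_char_limit ({limit})."
        )
    with open(fname, encoding="utf8") as f:
        return f.read()
```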

@barrywhart
Member

What is a DBCR file? Regarding the file being 2 MB, is that before or after templating has been applied?

@WilsonSunBritten
Author

@barrywhart Sorry, DBCR is internal lingo. Think of using a 100 MB SQL file to backfill a database with several years' worth of data. That's before templating is applied (or with no templating at all), so it can be quite a bit of raw data.

There are certainly other ways to accomplish this, such as using SQL to read in a separate CSV file, but it's a general enough, simple enough use case.

@alanmcruickshank
Member

I think checking for the large file limit as early as possible makes sense. Ideally we'd do it when initially loading the file, using os.path.getsize or similar.
