
Large files take several minutes before large_file_skip_char_limit stops parsing them #3693

Closed
3 tasks done
WilsonSunBritten opened this issue Jul 29, 2022 · 4 comments · Fixed by #3770
Labels
enhancement New feature or request

Comments

@WilsonSunBritten

Search before asking

  • I searched the issues and found no similar issues.

Description

large_file_skip_char_limit can prevent sqlfluff from hanging, potentially forever, on very large files; however, when I timed the linting of two 2 MB DBCR files, it still took over 2 minutes.

I suspect the issue is the current order of operations: it seems files are fully loaded into memory and parsed for inline sqlfluff config before the large-file check occurs. If the large-file check happened earlier, processing could be much quicker.

Current workaround: Manually adding large files to .sqlfluffignore
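
For reference, a hedged sketch of the two settings involved (the exact config section and default should be checked against the sqlfluff docs for your version; the ignore pattern is purely illustrative):

```ini
# .sqlfluff — the existing character limit (value shown is illustrative)
[sqlfluff]
large_file_skip_char_limit = 20000
```

```
# .sqlfluffignore — gitignore-style patterns; path is hypothetical
migrations/large_backfills/*.sql
```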

Use case

As a user with large SQL migrations, I want my linting tool to run quickly while properly ignoring overly large files.

Dialect

N/A

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@WilsonSunBritten WilsonSunBritten added the enhancement New feature or request label Jul 29, 2022
@WilsonSunBritten
Author

Early proposal: during _load_raw_file_and_config, check for the large_file_skip_char_limit setting in the main config (not the config for the current file to be loaded); if present, check the file size prior to opening and fail if it exceeds the limit.
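
A rough sketch of that idea (the names and error handling here are hypothetical, not sqlfluff's actual API; note that os.path.getsize returns bytes, which only approximates a character count for multi-byte encodings):

```python
import os


def load_raw_file_and_config_sketch(fname, root_config):
    """Hypothetical sketch: enforce the size limit before reading the file."""
    # Read the limit from the *root* config — the per-file config can't be
    # resolved without opening the file, which is exactly what we want to avoid.
    limit = root_config.get("large_file_skip_char_limit")
    # os.path.getsize() is a cheap stat() call: no file contents are read.
    if limit and os.path.getsize(fname) > limit:
        raise RuntimeError(
            f"Skipping {fname!r}: file size exceeds "
            f"large_file_skip_char_limit ({limit})."
        )
    with open(fname, encoding="utf8") as f:
        return f.read()
```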

@barrywhart
Member

What is a DBCR file? Regarding the file being 2 MB, is that before or after templating has been applied?

@WilsonSunBritten
Author

@barrywhart Sorry, DBCR is internal lingo. Think of using a 100 MB SQL file to backfill a database with several years' worth of data. That's before templating is applied (or with no templating at all), so it can be quite a bit of raw data.

There are certainly other ways to accomplish this, such as using SQL to read in a separate CSV file, but it's a general enough, simple enough use case.

@alanmcruickshank
Member

I think checking for the large file limit as early as possible makes sense. Ideally we'd do it when initially loading the file, using os.path.getsize or similar.
