Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent spacegrep from skipping files #3418

Merged
merged 7 commits into from Jun 28, 2021
Merged

Prevent spacegrep from skipping files #3418

merged 7 commits into from Jun 28, 2021

Conversation

mjambon
Copy link
Member

@mjambon mjambon commented Jun 25, 2021

The spacegrep command line can take a single file or folders. It skips files silently based on a heuristic that guess whether the files look like programs, based on average line length. This causes markdown files with very long lines to be skipped. The --force option of spacegrep disables this heuristic.

I added test to check the performance on big and/or binary files. It's not that great and will depend on the pattern. Semgrep uses a 10MB limit for all targets, which should be fine for programs with proper indentation but will cause problems on minified files or some binary files that don't contain newlines. Rule authors must ensure that paths are filtered by extension like this rule. If that's already the case in practice, then we're good.

Fixes #2987

PR checklist:

  • changelog is up to date

@mjambon mjambon requested review from aryx and IagoAbal June 28, 2021 21:48
@mjambon mjambon merged commit 692df34 into develop Jun 28, 2021
@mjambon mjambon deleted the mj-skipping-files branch June 28, 2021 22:26
- generic mode on Markdown files with very long lines will now work (#2987)

### Changed
- generic mode: files that don't look like nicely-indented programs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see any check being removed from the code, although perhaps it's a very small change and I'm missing it since there are many other changes to docs etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the --force option that was added to the spacegrep command line... and the option in question was already implemented by spacegrep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

spacegrep error when parsing a markdown file
2 participants