Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8646 for faster find of all files #8815

Merged
merged 1 commit into from Sep 30, 2020

Conversation

tleish
Copy link
Contributor

@tleish tleish commented Sep 29, 2020

After sleeping on it, I realized there was a simpler (less code) and slightly faster version of the previous #8806.

By recursively getting all directories and then at each directory level testing Rubocop Exclude on directories. Once directories are found at all levels, we search for files.

In addition, I replaced many of the base_dir string concatenations with the safer File#join.

Benchmarks on a large project removes an additional 500+ milliseconds.

The following flamegraphs illustrate the percentage of processing TargetFinder#target_files_in_dir took in using the command bundle exec rubocop -L to list rubocop files.

Original Flamegraph

rubocop-original

Changes from #8806

rubocop-update01

Changes from this MR

rubocop-update02

Note: I didn't add an entry to the Changelog as a previous one was already added.


Before submitting the PR make sure the following are checked:

  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

…ies first and then apply Rubocop Exclude on directories before finding files
@bbatsov bbatsov merged commit 90d909d into rubocop:master Sep 30, 2020
@bbatsov
Copy link
Collaborator

bbatsov commented Sep 30, 2020

Nicely done! I think it's the first time I see someone sharing flamegraphs in a PR which is extra cool! I always think of @miharekar when I see a flamegraph (a friend of mine, who's a big proponent of the concept).

@tleish tleish deleted the faster-find-files branch September 30, 2020 11:34
Tonkpils added a commit to Tonkpils/rubocop that referenced this pull request Apr 29, 2021
rubocop#8815 introduced a traversal
strategy that used recursion.
rubocop#9703 then fixed an issue with
this traversal which accounted for directories and symlinks.

When a symlink points to a parent directory that contains that symlink
it'll cause this to go into a loop until the filename is too long for
glob to handle.

We prevent this by checking for the inclusion of a symlink's real path
in the base directory's realpath. If the base directory's path starts
with the symlink's destination then we are in a loop and should skip
processing the directory
bbatsov pushed a commit that referenced this pull request May 3, 2021
#8815 introduced a traversal
strategy that used recursion.
#9703 then fixed an issue with
this traversal which accounted for directories and symlinks.

When a symlink points to a parent directory that contains that symlink
it'll cause this to go into a loop until the filename is too long for
glob to handle.

We prevent this by checking for the inclusion of a symlink's real path
in the base directory's realpath. If the base directory's path starts
with the symlink's destination then we are in a loop and should skip
processing the directory
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants