New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search_file: don't visit the same directory more than once #284

Merged
merged 1 commit into from Oct 11, 2018

Conversation

Projects
None yet
2 participants
@wjt
Contributor

wjt commented Oct 9, 2018

The directory tree being scanned may in fact be a directed graph, with
cycles, due to following symbolic links. The kernel imposes a limit on
the number of symbolic links followed in a path, beyond which operations
fail with ELOOP. (According to path_resolution(7), the limit is 40 links
on current Linux.) If the graph is very branchy, it may take a very long
time to fail.

For a real-world example, the Meson build system uses gcovr to provide a
'ninja coverage-xml' target. The systemd project (which uses Meson) has
a test suite which generates a fake sysfs tree, with many circular
references, including 64 self-referential ttyX directories:

$ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
$ ls -l build/test/sys/devices/virtual/tty/tty1
total 12
-rw-r--r-- 1 wjt wjt    4 Oct  9 12:16 dev
drwxr-xr-x 2 wjt wjt 4096 Oct  9 12:16 power
lrwxrwxrwx 1 wjt wjt   21 Oct  9 12:16 subsystem -> ../../../../class/tty
-rw-r--r-- 1 wjt wjt   16 Oct  9 12:16 uevent
$ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
$ readlink -f build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
/home/wjt/src/endlessm/systemd/build/test/sys/devices/virtual/tty/tty1

Note that each link refers to a directory which contains all the fake
tty trees again. 64 ** 40 is a very large number! On our build server, I
cancelled the build after 16 hours.

Even in less pathological cases, since search_file() only yields
absolute paths, there's no need to visit a file via more than one path.
Maintain a set of (device, inode) pairs for each directory visited, and
skip processing any directory that's been visited before.

@latk

This comment has been minimized.

Show comment
Hide comment
@latk

latk Oct 9, 2018

Member

This looks like a very sensible improvement! Thank you for discovering the problem and finding a solution :)

Unfortunately, this PR seems to break many tests under Windows, but I don't understand why. Maybe a less fragile solution would be to store the abspath in the set? If absolutely necessary, it would also be acceptable if your check gets disabled under Windows.

For the test that you modified to create many links, it seems that this isn't supported under Windows/MinGW either:

ln: failed to create symbolic link 'loop-1' -> '.': File name too long

It might be necessary to add this test case to the list of xfail tests in gcovr/tests/test_gcovr.py:

     is_windows = platform.system() == 'Windows'
     needs_symlinks = any([
-        name == 'linked' and format == 'html',
+        name == 'linked',
         name == 'filter-relative-lib',
 ])
Member

latk commented Oct 9, 2018

This looks like a very sensible improvement! Thank you for discovering the problem and finding a solution :)

Unfortunately, this PR seems to break many tests under Windows, but I don't understand why. Maybe a less fragile solution would be to store the abspath in the set? If absolutely necessary, it would also be acceptable if your check gets disabled under Windows.

For the test that you modified to create many links, it seems that this isn't supported under Windows/MinGW either:

ln: failed to create symbolic link 'loop-1' -> '.': File name too long

It might be necessary to add this test case to the list of xfail tests in gcovr/tests/test_gcovr.py:

     is_windows = platform.system() == 'Windows'
     needs_symlinks = any([
-        name == 'linked' and format == 'html',
+        name == 'linked',
         name == 'filter-relative-lib',
 ])
@wjt

This comment has been minimized.

Show comment
Hide comment
@wjt

wjt Oct 9, 2018

Contributor

Unfortunately, this PR seems to break many tests under Windows, but I don't understand why. Maybe a less fragile solution would be to store the abspath in the set? If absolutely necessary, it would also be acceptable if your check gets disabled under Windows.

The failures are weird… and I don't quite understand why the test fails under Python 2.7 on 32-bit Windows, but passes on 64-bit Windows with Python 3.7. Perhaps the files are being discovered in a different order because of the self-referential symlink? I'm afraid I don't have a Windows development system on hand to debug interactively. I'll try storing the abspath instead.

Contributor

wjt commented Oct 9, 2018

Unfortunately, this PR seems to break many tests under Windows, but I don't understand why. Maybe a less fragile solution would be to store the abspath in the set? If absolutely necessary, it would also be acceptable if your check gets disabled under Windows.

The failures are weird… and I don't quite understand why the test fails under Python 2.7 on 32-bit Windows, but passes on 64-bit Windows with Python 3.7. Perhaps the files are being discovered in a different order because of the self-referential symlink? I'm afraid I don't have a Windows development system on hand to debug interactively. I'll try storing the abspath instead.

@latk

This comment has been minimized.

Show comment
Hide comment
@latk

latk Oct 9, 2018

Member

Oh, I missed that the tests passed under Python 3.7. This might be one of those cases where standard library support has drastically improved over time. Note that both tests use the same Windows image and MinGW version, merely the Python version is different.

If the failure is specific to 2.7, let's just disable your improvements for Python 2.x under Windows, without a fallback. After all, the code currently works fine (except in extreme cases like those you encountered). Once Python 2.7 hits its EOL in 2020 that check can be removed again.

Member

latk commented Oct 9, 2018

Oh, I missed that the tests passed under Python 3.7. This might be one of those cases where standard library support has drastically improved over time. Note that both tests use the same Windows image and MinGW version, merely the Python version is different.

If the failure is specific to 2.7, let's just disable your improvements for Python 2.x under Windows, without a fallback. After all, the code currently works fine (except in extreme cases like those you encountered). Once Python 2.7 hits its EOL in 2020 that check can be removed again.

@wjt

This comment has been minimized.

Show comment
Hide comment
@wjt

wjt Oct 9, 2018

Contributor

Are you suggesting making the fix itself conditional on the Python version, or skipping the test on 2.x? I think the former, but just want to check.

Contributor

wjt commented Oct 9, 2018

Are you suggesting making the fix itself conditional on the Python version, or skipping the test on 2.x? I think the former, but just want to check.

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Oct 9, 2018

Codecov Report

Merging #284 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
+ Coverage   94.22%   94.29%   +0.07%     
==========================================
  Files          14       14              
  Lines        1507     1526      +19     
  Branches      260      263       +3     
==========================================
+ Hits         1420     1439      +19     
  Misses         43       43              
  Partials       44       44
Impacted Files Coverage Δ
gcovr/utils.py 92.25% <100%> (+1.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c0d81f...bd456f5. Read the comment docs.

codecov bot commented Oct 9, 2018

Codecov Report

Merging #284 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
+ Coverage   94.22%   94.29%   +0.07%     
==========================================
  Files          14       14              
  Lines        1507     1526      +19     
  Branches      260      263       +3     
==========================================
+ Hits         1420     1439      +19     
  Misses         43       43              
  Partials       44       44
Impacted Files Coverage Δ
gcovr/utils.py 92.25% <100%> (+1.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c0d81f...bd456f5. Read the comment docs.

@wjt

This comment has been minimized.

Show comment
Hide comment
@wjt

wjt Oct 9, 2018

Contributor

Pushed (and squashed after the tests passed) a new version that disables the new logic under (Windows && Python 2), and factored it out to a small helper class, which I think makes both the platform-conditional behaviour and the main function easier to read.

Contributor

wjt commented Oct 9, 2018

Pushed (and squashed after the tests passed) a new version that disables the new logic under (Windows && Python 2), and factored it out to a small helper class, which I think makes both the platform-conditional behaviour and the main function easier to read.

@latk

latk approved these changes Oct 9, 2018

@latk

This comment has been minimized.

Show comment
Hide comment
@latk

latk Oct 9, 2018

Member

Thanks, this is great 👍 I really like this solution.

I think that while your extra links for i in `seq 1 100`; do ln -s . loop-$$i; done; fail under Windows, this is shell so the failure is silently ignored which lets the test pass 🤷‍♂️ This is not good, but since it's more a performance test than a correctness test, I think this is acceptable. (Note to self: convince pytest to stop hiding stderr output)

One last thing: can you insert your preferred name into the AUTHORS.txt list? I'll merge tomorrow.

Member

latk commented Oct 9, 2018

Thanks, this is great 👍 I really like this solution.

I think that while your extra links for i in `seq 1 100`; do ln -s . loop-$$i; done; fail under Windows, this is shell so the failure is silently ignored which lets the test pass 🤷‍♂️ This is not good, but since it's more a performance test than a correctness test, I think this is acceptable. (Note to self: convince pytest to stop hiding stderr output)

One last thing: can you insert your preferred name into the AUTHORS.txt list? I'll merge tomorrow.

@wjt

This comment has been minimized.

Show comment
Hide comment
@wjt

wjt Oct 9, 2018

Contributor

A fun problem I just noticed is that pytest's test collector has the same bug; if you run the test suite twice without manually cleaning out the symlink farm, then scanning for tests the second time takes forever. pytest-dev/pytest#624

Sigh. I can just amend the Makefile to only create one cycle. This still makes test collection take longer the second and subsequent times, but at least it's measured in milliseconds! On my system, with just one link cycle, runtime of the linked tests (including longer collection time) across 5 runs is 2.597+/-0.3141 seconds with the fix, and 2.985+/-0.3226 seconds without it.

Contributor

wjt commented Oct 9, 2018

A fun problem I just noticed is that pytest's test collector has the same bug; if you run the test suite twice without manually cleaning out the symlink farm, then scanning for tests the second time takes forever. pytest-dev/pytest#624

Sigh. I can just amend the Makefile to only create one cycle. This still makes test collection take longer the second and subsequent times, but at least it's measured in milliseconds! On my system, with just one link cycle, runtime of the linked tests (including longer collection time) across 5 runs is 2.597+/-0.3141 seconds with the fix, and 2.985+/-0.3226 seconds without it.

search_file: don't visit the same directory more than once
The directory tree being scanned may in fact be a directed graph, with
cycles, due to following symbolic links. The kernel imposes a limit on
the number of symbolic links followed in a path, beyond which operations
fail with ELOOP. (According to path_resolution(7), the limit is 40 links
on current Linux.) If the graph is very branchy, it may take a very long
time to fail.

For a real-world example, the Meson build system uses gcovr to provide a
'ninja coverage-xml' target. The systemd project (which uses Meson) has
a test suite which generates a fake sysfs tree, with many circular
references, including 64 self-referential ttyX directories:

    $ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
    lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
    $ ls -l build/test/sys/devices/virtual/tty/tty1
    total 12
    -rw-r--r-- 1 wjt wjt    4 Oct  9 12:16 dev
    drwxr-xr-x 2 wjt wjt 4096 Oct  9 12:16 power
    lrwxrwxrwx 1 wjt wjt   21 Oct  9 12:16 subsystem -> ../../../../class/tty
    -rw-r--r-- 1 wjt wjt   16 Oct  9 12:16 uevent
    $ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
    lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
    $ readlink -f build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
    /home/wjt/src/endlessm/systemd/build/test/sys/devices/virtual/tty/tty1

Note that each link refers to a directory which contains all the fake
tty trees again. 64 ** 40 is a very large number! On our build server, I
cancelled the build after 16 hours.

Even in less pathological cases, since search_file() only yields
absolute paths, there's no need to visit a file via more than one path.
Maintain a set of (device, inode) pairs for each directory visited, and
skip processing any directory that's been visited before.

@latk latk merged commit bd456f5 into gcovr:master Oct 11, 2018

4 checks passed

codecov/patch 100% of diff hit (target 94.22%)
Details
codecov/project 94.29% (+0.07%) compared to 6c0d81f
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

latk added a commit that referenced this pull request Oct 11, 2018

@wjt wjt deleted the wjt:search_file-only-visit-paths-once branch Oct 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment