-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent multiple collecting of the same file due to recursive symlinks #7355
Prevent multiple collecting of the same file due to recursive symlinks #7355
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aklajnert,
-
pytest already has a mechanism to handle duplicate files, see
FSCollector._collectfile
. I think we should enhance it handle symlinks correctly, instead of handling that separately. -
A single
readlink
is actually not enough, as it is possible to have symlinks to symlinks, etc. So I recommend usingpathlib.Path.resolve
. There may be similar functions inpy.path.local
oros.path
, I haven't checked.
Hi @bluetech, thanks for the hints. Fair point about the symlink to symlink case, I'll work on it later. |
Hi @bluetech, I've looked at the The symlink to symlink case has been fixed and covered by tests. |
Given this directory tree
I tried 3 versions: Current master:
i.e. the catastrophic behavior. This PR (commit " Remove the
Looks like it still collects the files more than once (twice in this scenario). This patch: diff --git a/src/_pytest/config/__init__.py b/src/_pytest/config/__init__.py
index 4ff6ce707..62213f27a 100644
--- a/src/_pytest/config/__init__.py
+++ b/src/_pytest/config/__init__.py
@@ -322,7 +322,7 @@ class PytestPluginManager(PluginManager):
self._conftestpath2mod = {} # type: Dict[Any, object]
self._confcutdir = None # type: Optional[py.path.local]
self._noconftest = False
- self._duplicatepaths = set() # type: Set[py.path.local]
+ self._duplicatepaths = set() # type: Set[str]
self.add_hookspecs(_pytest.hookspec)
self.register(self)
diff --git a/src/_pytest/nodes.py b/src/_pytest/nodes.py
index 4c7aa1bcd..b085fca67 100644
--- a/src/_pytest/nodes.py
+++ b/src/_pytest/nodes.py
@@ -595,10 +595,11 @@ class FSCollector(Collector):
keepduplicates = self.config.getoption("keepduplicates")
if not keepduplicates:
duplicate_paths = self.config.pluginmanager._duplicatepaths
- if path in duplicate_paths:
+ resolved_path = str(Path(str(path)).resolve())
+ if resolved_path in duplicate_paths:
return ()
else:
- duplicate_paths.add(path)
+ duplicate_paths.add(resolved_path)
return ihook.pytest_collect_file(path=path, parent=self) # type: ignore[no-any-return] # noqa: F723
result:
This one only collects once, but has a recursion issue. So it seems that the |
Thanks @bluetech. I've noticed that the double collection was gone from my tests after I handled symlink to symlink case, but it seems that my test case isn't as good as I thought since the issue is still there. |
fc3c3b1
to
c875abe
Compare
Hi @bluetech, I've changed the Here's the output with my changes in
With changes in
As you see, the difference in execution time is huge. I've tried to profile it - it seems like too many collected directories are causing the slowdown (cProfile snapshots: snapshots.zip). Basically, because |
Hi @aklajnert, thanks for the update, I think I see why you changed Is there a chance you'll be interested to do a little refactoring in this area as well? In particular what you said "maybe os.walk() could help here?" -- I think we should really try to move away from I imagine we can't use
|
While I definitely agree with @bluetech's suggestion of moving away from |
Hi @bluetech - I'll be happy to work on the refactor, but I agree with @nicoddemus that this should be done in a separate PR. I'll work on it when I find some time (probably within a week or two). |
I still haven't dug into the collection code but it occurs to me that the strategy of completely resolving all symlinks and deduplicating them can be problematic. Consider for example the following test tree: $ tree top/
top/
├── a
│ ├── conftest.py
│ ├── __init__.py
│ └── test_it.py
├── b
│ ├── conftest.py
│ ├── __init__.py
│ └── test_it.py -> ../a/test_it.py
└── __init__.py With the following file contents: # top/a/conftest.py
import pytest
@pytest.fixture
def conftest_fixture():
return "a" # top/b/conftest.py
import pytest
@pytest.fixture
def conftest_fixture():
return "b" # top/a/test_it.py
def test_it(conftest_fixture):
print(conftest_fixture) This symlinks the same file to multiple modules, but the outcome is different because module-level fixtures. Before this PR the output is $ pytest -s -q top/
a
.b
.
2 passed in 0.02s After: $ pytest -s -q top/
a
.
1 passed in 0.02s This example is a little convoluted but I can imagine real use cases for this and it's seems legitimate. |
Hi @bluetech - fair point. The example indeed seems a bit convoluted, but I see it could be useful in some cases. Btw, your comment seems a bit broken here, as if you have pasted something twice. |
Oops, I fixed the comment now. |
0702b25
to
479d048
Compare
Hi @bluetech - I'm sorry that I couldn't work on it earlier. I've tried to rebase and work on it today but I've noticed that you did the refactor mentioned here in #7541. Now, I'm not sure if the PR should be removed completely, or should we address your observations from the comment above. |
Oh right I should have mentioned it back here. I did it as part of the effort to remove py.path.local.
Yes I don't remember any change that intentionally changes things here, but presumably it is #7541. I will check this when I get the chance.
Maybe related to #5965 / #6523? I procrastinated on this PR over checking what changed in that PR.
In light of your comments I will also check what happens in the scenario I described in the comment in master. Depending on that we can merge your tests to solidify that they won't regress. |
Hi @bluetech - I've just noticed that this is still open. Should I close it, or you'd like to merge these tests? |
Hi @aklajnert, I obviously dropped the ball here and I don't even remember the exact details anymore, sorry about that. Looking at the tests, they do look good to me (if they do indeed pass in main...). So if you could:
Then I think we can merge this. |
8dabbae
to
4e43db6
Compare
@bluetech - seems like the bug is back according to my tests, or I've messed something up. I don't have time to investigate it now, but I'll look into that later. |
4e43db6
to
0b527fa
Compare
0b527fa
to
569dfaf
Compare
0042a8f
to
be245dc
Compare
Fixes: #624.
The following directory structure will now be collected correctly.