Detect directory loops #17

Open
israel-lugo opened this issue Sep 9, 2017 · 3 comments

@israel-lugo
Owner

Now that find_duplicates_in_dirs has the follow_dirlinks parameter (see #16), we need a way to detect symlink loops. If a symlink points to . or to a parent directory, following it sends us into a loop.

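Fortunately, os.walk() seems to stop after several levels of recursion. Still, we shouldn't rely on that: os.walk() does not keep track of the directories it has already visited, and its documentation warns that following links can lead to infinite recursion.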

See what commands like find or rsync do.
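As a rough illustration of the simplest case described above (a symlink pointing to . or to a parent directory), here is a minimal sketch. The helper name points_to_ancestor is hypothetical and not part of capidup. It only catches a single link that resolves directly to its containing directory or an ancestor, and it misses cycles formed by several links pointing at each other, which is one reason a path-based check alone is probably not enough.

```python
import os


def points_to_ancestor(link_path):
    """Return True if link_path is a symlink that resolves to the directory
    containing it, or to one of that directory's ancestors.

    Hypothetical helper for illustration; not part of capidup.
    """
    if not os.path.islink(link_path):
        return False
    # Fully resolved target of the link.
    target = os.path.realpath(link_path)
    # Fully resolved directory that contains the link itself.
    parent = os.path.realpath(os.path.dirname(os.path.abspath(link_path)))
    # parent lies at or below target exactly when their common path is target.
    return os.path.commonpath([target, parent]) == target
```

With this sketch, a link such as testroot/self -> . is flagged, while a sideways link such as child2/brother -> ../child1 is not.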

@israel-lugo israel-lugo changed the title from "Detect symlink loops" to "Detect directory loops" on Sep 10, 2017
@israel-lugo
Owner Author

israel-lugo commented Sep 10, 2017

We ended up detecting any kind of directory loop, not just those caused by symlinks. A loop should never happen without symlinks, unless the filesystem itself is corrupted somehow (e.g. directory hardlinks).
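The thread doesn't show how the detection is implemented. One common way to recognize "the same directory" regardless of how it was reached is to compare device and inode numbers, sketched below. The name dir_identity is an assumption made for this example, and the approach assumes a platform where st_dev and st_ino are meaningful (e.g. POSIX systems).

```python
import os


def dir_identity(path):
    """Return a (device, inode) pair identifying the directory at path.

    os.stat() follows symlinks, so a symlink to a directory and the
    directory itself yield the same pair. Hypothetical helper, not
    capidup's actual code.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Because the key doesn't depend on the path used to get there, it would treat a symlink and a directory hardlink alike.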

@israel-lugo israel-lugo self-assigned this Sep 10, 2017
israel-lugo added a commit to israel-lugo/capidup-cli that referenced this issue Sep 10, 2017
Not all errors mean files couldn't be compared. See e.g.
israel-lugo/capidup#17 (detect directory loops).
israel-lugo added a commit that referenced this issue Sep 14, 2017
We were detecting loops related to symlinks even when follow_dirlinks
was false, i.e. when the symlinks would be irrelevant.
@israel-lugo
Owner Author

We still have false positives when there is a (forward) symlink to a subdir. For example, given the following file tree:

testroot
├── child
│   └── foo2
├── foo
├── same-child -> child
└── unique

we have the following output:

error listing 'testroot/same-child': directory loop detected
testroot/child/foo2
testroot/foo
------------------------------
error: some files/directories could not be compared/indexed (see previous errors)

In reality, the output should look like this:

$ ./capidup testroot
testroot/child/foo2
testroot/same-child/foo2
testroot/foo
------------------------------

@israel-lugo israel-lugo reopened this Sep 15, 2017
israel-lugo added a commit that referenced this issue Sep 15, 2017
Mark the current directory as visited before inspecting the subdirs.
This lets us catch symlinks to it immediately, instead of after entering
one level of loop. See #17.
@israel-lugo
Owner Author

We still have false positives. This, for example:

testroot/
├── child1
│   └── foo1
├── child2
│   ├── brother -> ../child1
│   └── foo2
├── foo0
└── unique

This produces a false positive when we enter child2/brother, because we've already visited child1:

$ ./capidup -L testroot
error listing 'testroot/child2/brother': directory loop detected
testroot/child1/foo1
testroot/child2/foo2
testroot/foo0
------------------------------
error: some files/directories could not be compared/indexed (see previous errors)

This isn't a loop. Compare e.g. with find:

$ find -L testroot
testroot
testroot/foo0
testroot/unique
testroot/child2
testroot/child2/brother
testroot/child2/brother/foo1
testroot/child2/foo2
testroot/child1
testroot/child1/foo1

Do we need to go and implement a graph and strongly connected components? :/
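A lighter alternative to a full graph may be enough, though this is only a sketch and not necessarily what capidup should adopt. Instead of remembering every directory ever visited, remember only the directories on the current descent path. A directory then counts as a loop only if it is identical to one of its own ancestors. A directory reached a second time through a sideways link, like child2/brother above, is simply listed again, as in the find -L output. The function name iter_files and its loops parameter are hypothetical.

```python
import os


def iter_files(top, follow_dirlinks=True, loops=None, _ancestors=frozenset()):
    """Yield paths of non-directory entries under top.

    Only directories on the current descent path are tracked, keyed by
    (st_dev, st_ino). Paths that turn out to be loops are appended to
    loops if a list is given. Hypothetical sketch, not capidup's code.
    """
    st = os.stat(top)  # follows symlinks
    key = (st.st_dev, st.st_ino)
    if key in _ancestors:
        # top is the same directory as one we are currently inside of.
        if loops is not None:
            loops.append(top)
        return
    # Mark this directory before inspecting its subdirs, so that a link
    # back to it is caught immediately.
    ancestors = _ancestors | {key}
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir():  # follows symlinks by default
            if entry.is_symlink() and not follow_dirlinks:
                continue  # directory symlink we were asked not to follow
            yield from iter_files(entry.path, follow_dirlinks, loops, ancestors)
        else:
            yield entry.path
```

Because the ancestor set is passed down by value rather than shared, leaving a directory automatically forgets it, so sibling directories never see each other's entries. The trade-off is that a directory reachable through several links is traversed once per link.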

@israel-lugo israel-lugo reopened this Sep 15, 2017