Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/shfmt: replaces symlinks with regular files #843

Open
benjamineskola opened this issue Apr 6, 2022 · 7 comments · Fixed by #948
Open

cmd/shfmt: replaces symlinks with regular files #843

benjamineskola opened this issue Apr 6, 2022 · 7 comments · Fixed by #948

Comments

@benjamineskola
Copy link

benjamineskola commented Apr 6, 2022

If shfmt is run on a file that is a symbolic link, and writes to that file, it breaks the symbolic link and replaces it with a copy of the original file (formatted accordingly).

$ printf "#!/bin/sh\nif true; then\ntrue\nfi\n" > test.sh
$ ln -s test.sh test2.sh
$ ls -l
.rw-r--r-- 32 ben  6 Apr 17:31 test.sh
lrwxr-xr-x  7 ben  6 Apr 17:31 test2.sh -> test.sh
$ shfmt -i 2 -w test2.sh
$ ls -l
.rw-r--r-- 32 ben  6 Apr 17:31 test.sh
.rwxr-xr-x 34 ben  6 Apr 17:31 test2.sh

Expected behaviour would be to write to the underlying file, preserving the symbolic link.

@mvdan
Copy link
Owner

mvdan commented Apr 7, 2022

I think the expected behavior should rather be to skip symbolic links entirely; otherwise shfmt -w . when there's a symbolic link from foo.sh to bar.sh could result in formatting the same file twice, or causing a data race due to writing to the same path twice concurrently (once shfmt becomes parallelized). There's also other nasty edge cases that symbolic links could cause, such as a symbolic link pointing to a parent directory, meaning that shfmt -w . could potentially overwrite files anywhere in the filesystem, not just under the current directory.

In general, if one runs shfmt over an entire source repository, I assume that the symlink targets would already get formatted, so I think skipping the links will generally be fine. Do you want to send a patch with a test?

@mvdan mvdan changed the title shfmt breaks symlinks if -w is passed cmd/shfmt: replaces symlinks with regular files Apr 7, 2022
@benjamineskola
Copy link
Author

I'm not sure about ignoring symbolic links entirely — if a symbolic link was specified as an explicit parameter then I think it would be surprising for it to be ignored. However, I think you're right that it should be safe to ignore symbolic links when recursing. (I think this is the default for tools like find.)

However, this has made me realise the problem is not limited to symbolic links but also occurs with hard links:

$ printf "#!/bin/sh\nif true; then\ntrue\nfi\n" > test.sh
$ ln test.sh test2.sh
$ stat -f '%i %N' *.sh
58677185 test.sh
58677185 test2.sh
# %i is the inode number; note that these are identical
$ shfmt -w test2.sh
$ stat -f '%i %N' *.sh
58677185 test.sh
58677241 test2.sh
# inode numbers now different

I don't think it's feasible to ignore files with hard links; every hard link is semantically identical, unlike symlinks, so it's not possible just to decide to ignore some and not others; and ignoring every file with multiple hard links seems wrong too. It also seems like it would present the same concern for parallelism.

I think the behaviour I would like (for both symbolic and hard links) is for the file to be written in-place, preserving any links. However I'm not sure how practical that is (something like reading the file, formatting its contents, then rewinding to the beginning of the file and overwriting it?).

On the other hand, I think that breaking hard links (i.e., leaving behind two files) is probably more reasonable than overwriting symlinks, so fixing this behaviour for symlinks and not hard links makes sense.

@mvdan
Copy link
Owner

mvdan commented Apr 7, 2022

Tools like shfmt and gofmt (which shfmt is modelled after) are used on source code, which is generally kept in a VCS like git - and those almost universally do not support hard links, mainly because they aren't portable. So I honestly don't think we need to worry about hard links.

Always formatting files passed explicitly is right - I failed to clarify that I meant the "skip" to only kick in when walking directories.

something like reading the file, formatting its contents, then rewinding to the beginning of the file and overwriting it?

That's an interesting idea that could be more generic to support non-regular files. However:

  1. It makes atomic writes near-impossible. shfmt currently replaces files atomically, so even if it crashes or gets killed, it will never leave a file partially written - which could potentially make the user lose data.

  2. It makes concurrency harder to implement, as we would now also have to track which files we might open more than once. Note that shfmt isn't concurrent right now, but I recently helped make gofmt concurrent, so it's likely that shfmt will be at some point too.

mvdan added a commit that referenced this issue Nov 22, 2022
That is, we skip all symlinks when walking,
but if the tool gets passed a symlink to a shell file directly,
that gets formatted.

There's nothing to fix for the original issue beyond adding a test,
as we concluded that we want the behavior that's already in place.

Fixes #843.
@mvdan mvdan closed this as completed in #948 Dec 2, 2022
mvdan added a commit that referenced this issue Dec 2, 2022
That is, we skip all symlinks when walking,
but if the tool gets passed a symlink to a shell file directly,
that gets formatted.

There's nothing to fix for the original issue beyond adding a test,
as we concluded that we want the behavior that's already in place.

Fixes #843.
@benjamineskola
Copy link
Author

I'm not quite sure if that PR addresses this issue, which was not about walking but about breaking symlinks that are explicitly specified.

You mentioned that symlinks should be ignored when walking, and I agree, but I think the correct behaviour for a symlink that is explicitly specified on the commandline would be to format the link target and preserve the link. (That is, for example, gofmt's behaviour.) Unless this has changed in another commit I think the issue will still exist here?

@mvdan
Copy link
Owner

mvdan commented Dec 3, 2022

No, we haven't changed any behavior with symlinks. I thought we already settled that we shouldn't change the behavior with explicit symlink arguments; see the second half of my response in #843 (comment).

@benjamineskola
Copy link
Author

benjamineskola commented Dec 3, 2022

I agree that shfmt should not attempt to preserve hard links, and nor should it follow symlinks found when recursing; but I think the correct behaviour for explicitly-specified symbolic links is to preserve the link and format its target. (Apologies, I think I complicated matters by bringing up hard links which are a very different issue.)

Currently if a symbolic link is specified on the command line, shfmt will replace it with a formatted copy of the link target; I'd consider this to be data loss. Actually, I think it would be preferable to ignore symlinks even when they are mentioned on the commandline than to replace a symlink with a regular file. (Note for example that git, at least, and I assume other VCSes do track symbolic links.)

It also does not match gofmt's behaviour: as mentioned, gofmt will follow explicitly-specified symlinks and format their target. To repeat my initial example with go instead:

$ printf 'package main; func main() {fmt.Println("hello world")}' > test.go
$ ln -sf test.go test2.go
$ ls -l *.go
.rw-r--r--@ 54 ben  3 Dec 12:47 test.go
lrwxr-xr-x@  7 ben  3 Dec 12:42 test2.go -> test.go
$ gofmt -w test2.go
$ ls -l *.go
.rw-r--r--@ 60 ben  3 Dec 12:47 test.go
lrwxr-xr-x@  7 ben  3 Dec 12:42 test2.go -> test.go

@mvdan
Copy link
Owner

mvdan commented Dec 9, 2022

To be clear, I never really thought about hard links in my previous responses. They're almost never present in VCS repositories, for example.

You're right that gofmt will replace the target. But that's because the way gofmt modifies files is not atomic - it opens the file, reads it, and writes to it in-place. If two gofmt processes run in parallel, they may result in weird parse errors or a corrupted result. I'd also consider replacing the destination of the link to be the wrong behavior, because then shfmt -w foo.sh might modify a file that's in an entirely different directory or filesystem.

Currently if a symbolic link is specified on the command line, shfmt will replace it with a formatted copy of the link target; I'd consider this to be data loss.

We could certainly refuse to format symlinks given as explicit arguments, with the understanding that we would break the symlink otherwise. I'll reopen for that purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants