Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: add Copy #56172

Open
fumin opened this issue Oct 12, 2022 · 19 comments
Open

proposal: os: add Copy #56172

fumin opened this issue Oct 12, 2022 · 19 comments
Labels
Milestone

Comments

@fumin
Copy link

fumin commented Oct 12, 2022

I propose adding the function os.Copy(dst, src string) that does the exact same thing as the cp command.
This function would help simplify these codes in Go itself:

In the wild, there are 91k results for the search https://github.com/search?p=4&q=os.create+%22io.copy%22&type=Code , and one-tenth of them are copying files.
This implies that this function would help 10k codes out there.

Tentative implementation:

// Copy copies src to dst like the cp command.
func Copy(dst, src string) error {
        if dst == src {
                return ErrInvalid
        }

        srcF, err := Open(src)
        if err != nil {
                return err
        }
        defer srcF.Close()

        info, err := srcF.Stat()
        if err != nil {
                return err
        }

        dstF, err := OpenFile(dst, O_RDWR|O_CREATE|O_TRUNC, info.Mode())
        if err != nil {
                return err
        }
        defer dstF.Close()

        if _, err := io.Copy(dstF, srcF); err != nil {
                return err
        }
        return nil
}

Unsettled questions:

  • Should the name be Copy or CopyFile?
  • How can the implementation handle errors from File.Close() in a clean manner without defer?
  • Should it return the number of bytes that were copied like io.Copy does?
@fumin fumin added the Proposal label Oct 12, 2022
@gopherbot gopherbot added this to the Proposal milestone Oct 12, 2022
@mvdan
Copy link
Member

mvdan commented Oct 12, 2022

Would the new file have the same FileMode bits as the existing one?

Would the API be atomic? For example, what would happen if a read or write operation fails halfway through? Would we leave a broken file behind? Existing APIs like os.WriteFile are not atomic, so perhaps the answer is "no need", but that needs to be clear.

Would we attempt to use a hard link if the operating system and filesystem support it? If those are true, it's the most efficient way to make a duplicate of a file, as the contents are not read and written entirely.

cc @stapelberg given his experience with https://pkg.go.dev/github.com/google/renameio
cc @bcmills

@fumin
Copy link
Author

fumin commented Oct 12, 2022

@mvdan Thanks for the questions, according to my current implementation, which based on the above github search seems to be also the most common one, the answers are:

  • The new file would have the same FileMode bits.
  • It would not be atomic.
  • We would not attempt to use a hard link.

These bahaviours seem to be what the cp command does?
Below is the implementation:

func Copy(dst, src string) error {
        srcF, err := Open(src)
        if err != nil {
                return err
        }
        defer srcF.Close()

        info, err := srcF.Stat()
        if err != nil {
                return err
        }

        dstF, err := OpenFile(dst, O_RDWR|O_CREATE|O_TRUNC, info.Mode())
        if err != nil {
                return err
        }
        defer dstF.Close()

        if _, err := io.Copy(dstF, srcF); err != nil {
                return err
        }
        return nil
}

@CAFxX
Copy link
Contributor

CAFxX commented Oct 12, 2022

We would not attempt to use a hard link.

What about FICLONE? Shouldn't it attempt to use that? If not, why not?

@mvdan
Copy link
Member

mvdan commented Oct 12, 2022

We would not attempt to use a hard link.

Personally I think that's a missed opportunity - copying files within the same filesystem is quite common, and you already pointed to one such example, go mod vendor. The operation can do significantly less work if it doesn't need to read and write the entire contents in an io.Copy loop.

@bcmills
Copy link
Member

bcmills commented Oct 12, 2022

A hard link creates two names for the same underlying file — mutations through the second link also affect the contents of the first. I would find that behavior very surprising in os.Copy.

@mvdan
Copy link
Member

mvdan commented Oct 12, 2022

Right, of course. I've used hard links in a few projects of mine and that restriction was fine with me in those cases, but I agree it's surprising behavior for a "copy". Then I guess I want a different API, like "file clone".

@bcmills
Copy link
Member

bcmills commented Oct 12, 2022

@CAFxX

What about FICLONE? Shouldn't it attempt to use that? If not, why not?

Implementing the proposed semantics using the FICLONE ioctl where supported seems fine.

Per https://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html:

If a file write should occur to a shared region, the filesystem must ensure that the changes remain private to the file being written. This behavior is commonly referred to as "copy on write".

@mvdan
Copy link
Member

mvdan commented Oct 12, 2022

Here's an example: git-lfs, which is written in Go, started trying to use FICLONE three years ago: git-lfs/git-lfs#3796

@fumin
Copy link
Author

fumin commented Oct 12, 2022

@mvdan and @CAFxX Thanks for suggesting FICLONE, I concur with @bcmills that it's a good way to do it whenever supported.

@ianlancetaylor ianlancetaylor changed the title proposal: os: add os.Copy proposal: os: add Copy Oct 12, 2022
@carlmjohnson
Copy link
Contributor

carlmjohnson commented Oct 12, 2022

Should it create missing directories? ISTM this is a convenience function, so if it exists at all, it should be convenient.

I also think the name should be os.CopyFile to make it clear at glance that it's not io.Copy and that it doesn't work for directories (or does it?). This matches Python's shutil.copyfile FWIW.

@stapelberg
Copy link
Contributor

stapelberg commented Oct 12, 2022

+1 for adding os.CopyFile, which I have written a couple of times in my own Go programs, too.

+1 also for not making it atomic for consistency with os.WriteFile. We can add a compatible CopyFile to renameio once it was merged.

+1 for using copy-on-write where available, but not hard links.

Below is the implementation:

Nit: you should handle the error that dstF.Close() can return. Given that (*os.File).Close() is idempotent, you can keep the defer and just change return nil to return dstF.Close().

@rittneje
Copy link

rittneje commented Oct 13, 2022

One note on the sample implementation above: if src and dst refer to the same file, it empties the file. Either it should be a no-op or an error. (cp considers it an error.) Note that this is not as simple as comparing src and dst for equality as they could also be hard or symbolic links to the same inode. You have to use os.SameFile.

Also, should it return the number of bytes that were copied like io.Copy does?

By the way, "The new file would have the same FileMode bits." is not entirely true. If the dest file already exists, then its current bits will be preserved. (This is the same as what cp does by default.)

@fumin
Copy link
Author

fumin commented Oct 13, 2022

Thanks for all the great questions, here are my answers. For those that I have no idea, added to the unsettled questions at the top of this issue:

Q: Should it create missing directories?
A: I think it should not, since the cp command does not.

Q: Should the name be Copy or CopyFile?
A: No idea, added to the unsettled questions section at the top of this issue.

Q: Implementation should handle the error that dstF.Close() can return. Given that (*os.File).Close() is idempotent.
A: Agree, but to my understanding File.Close is not idempotent, according to docs,

Close will return an error if it has already been called.

Without idempotence, we'd need to liter every return of error with Close according to the context of whether src or dst has been successfully opened. What is an idiomatic way of doing this?

Q: If src and dst refer to the same file, it empties the file. Either it should be a no-op or an error. (cp considers it an error.)
A: Good point, added this check in the implementation at the top of this issue.

Q: Should it return the number of bytes that were copied like io.Copy does?
A: No idea, added to the unsettled questions section at the top of this issue.

@mvdan
Copy link
Member

mvdan commented Oct 13, 2022

+1 to the name CopyFile. +1 to not creating parent directories.

A duplicate Close call is fine; you simply ignore the second error (inside the defer) because the first error is handled. This is a rather common pattern that we can easily figure out when it comes to the code review, and it shouldn't need discussion here.

I don't think we should return the size. io.Copy is a relatively low-level and generic IO function, and it needs to return the copied size because in some cases it can't be obtained before or after the operation. With files, it's always possible to stat either file before (or after) the call to CopyFile. Similarly, there are other file attributes that could also be reasonable to return, such as the FileMode bits, so I think we should return none.

A few more points popped to mind this morning as I mulled over this issue:

If the destination already exists, we should probably only overwrite the file if it's a regular file. If it's something else, like a directory or a named pipe, we should return an error. cp has more logic here, like placing the file inside the destination if it's a directory, but I don't think we want this much logic.

If the source is not a regular file, I similarly think we should return an error immediately. For directories because we won't support a "recursive copy". For other file types, because lots could go wrong. For example, copying from a named pipe file could hang forever, and it would rarely make sense anyway. Put another way, I think CopyFile should only work with regular files.

If the source file has read-only permission bits and the destination does not exist, CopyFile should presumably succeed. Note that this would require creating the file with read-write permission bits, doing the copy, then calling Chmod to set it to read-only after.

@stapelberg
Copy link
Contributor

stapelberg commented Oct 13, 2022

Q: Should it create missing directories?
A: I think it should not, since the cp command does not.

I agree that “whichever cp does is probably best” is a good approach in general for implementing the proposed function, but I for one would appreciate getting rid of the extra os.MkdirAll(filepath.Dir(fn)) call. Is there a reason behind the cp command’s refusal to create the missing directories?

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Oct 13, 2022

Note that a version of this, ioutil.CopyFile, was proposed and rejected before at #8868. In https://go-review.googlesource.com/c/go/+/1591/comment/e17b245e_0240ea62/ @robpike objected, though the primary objection was to the choice of the io/ioutil package, which does not apply to this proposal.

This doesn't mean we can't accept this somewhat different proposal, I just wanted to note the history.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Oct 13, 2022

os.Copy should not create directories unless we also provide a way to set the mode of any newly created directories. And that would seem to mean adding an argument that very few people would need.

@bcmills
Copy link
Member

bcmills commented Oct 13, 2022

If the source is not a regular file, I similarly think we should return an error immediately.

I think that's ok as a starting point, but we should be careful not to overspecify it so that we can perhaps relax that restriction in the future. (I could see some use-cases for, say, allowing /dev/stdout as an argument to a program that accepts a user-provided path and eventually calls Copy to write to that path.)

@beoran
Copy link

beoran commented Oct 14, 2022

A fileutils package would be more useful, something like github.com/hacdias/fileutils. Of course whether or not this belongs in the standard library or not is another question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

10 participants