Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: MkdirAll returns "file exists" error when actual error is "intput/output error" #8283

Closed
gopherbot opened this issue Jun 24, 2014 · 17 comments
Closed

Comments

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 24, 2014

by Zachary.Drew:

What does 'go version' print?

go1.3

What steps reproduce the problem?

Difficult to reproduce but easy to understand. The underlying block device is returning
"input/output error". In my case, the "input/output error" is caused
by a failed hardware RAID block device. I am seeing the incorrect error when I try to
create a directory on the file system of the failed block device.

err := os.MkdirAll( "/failed-file-system-mount-point/subdirectory-to-create",
0755 )

What happened?

os.MkdirAll is returning "file exists" error on the mount point of the failed
filesystem:

mkdir /failed-file-system-mount-point: file exists


What should have happened instead?

os.MkdirAll should have returned "input/output error":

mkdir /failed-file-system-mount-point/subdirectory-to-create: input/output error


Please provide any additional information below.

The root cause is that os.MkdirAll calls os.Stat but doesn't properly check the returned
error starting at line 21 in src/pkg/os/path.go. It assumes that the error returned is
"file exists" when it could actually be some other error such as
"input/output error".
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 24, 2014

Comment 1:

Labels changed: added repo-main, release-go1.4.

Status changed to Accepted.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Sep 16, 2014

Comment 2:

I can't follow the diagnosis. Can you get a trace of the failing call under strace -f?
It looks like MkdirAll checked, could not confirm that /mtpt/subdir was an existing
directory (perhaps due to error, but that's okay), called itself recursively to work on
/mtpt, and then that call returned an error. That error seems to be 'file exists'. So it
seems to have nothing to do with /mtpt/subdir and only with /mtpt. Is /mtpt a directory?
Does stat of /mtpt work?
It seems like Stat("/mtpt") is returning an error but Mkdir("/mtpt") is returning
EEXIST. Can you explain how that happens?
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Sep 16, 2014

Comment 3:

Status changed to WaitingForReply.

@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Sep 16, 2014

Comment 4 by Zachary.Drew:

I can't get the strace output at this moment, but perhaps I can better explain.
src/pkg/os/path.go starting at line 21 is making the assumption that if Stat() returns
an error, the error indicates the file already exists. It never checks to see what the
error is, only that the error is not null.
So if Stat() is returning some other error, like "intput/output error" in my case,
instead of returning the error to the caller it proceeds with business as usual.
On line 29 we have the commented assumption "// Doesn't already exist; make sure parent
does." This assumption is wrong. The code never checked that the directory doesn't
exist, only whether or not Stat() on it generates an error.
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Oct 6, 2014

Comment 5:

I still believe the code is correct.
func MkdirAll(path string, perm FileMode) error {
    // If path exists, stop with success or error.
    dir, err := Stat(path)
    if err == nil {
        if dir.IsDir() {
            return nil
        }
        return &PathError{"mkdir", path, syscall.ENOTDIR}
    }
    // Doesn't already exist; make sure parent does.
    i := len(path)
    for i > 0 && IsPathSeparator(path[i-1]) { // Skip trailing path separator.
        i--
    }
    j := i
    for j > 0 && !IsPathSeparator(path[j-1]) { // Scan backward over element.
        j--
    }
    if j > 1 {
        // Create parent
        err = MkdirAll(path[0:j-1], perm)
        if err != nil {
            return err
        }
    }
    // Now parent exists, try to create.
    err = Mkdir(path, perm)
    if err != nil {
        // Handle arguments like "foo/." by
        // double-checking that directory doesn't exist.
        dir, err1 := Lstat(path)
        if err1 == nil && dir.IsDir() {
            return nil
        }
        return err
    }
    return nil
}
The first bit checks to see if Stat can *succeed*. If so, I think you must agree that
path 'exists'. In that case, if it's a directory we're done and if not we return
ENOTDIR. This is a fast path.
Otherwise, we might as well try creating it. The comment says 'doesn't already exist'
and you are correct that that's not strictly true, but it's the right way to think about
what follows. It doesn't matter here that Stat might have failed with an I/O error.
We figure out what the parent of path must be and call Mkdir recursively on that. If
that fails, we stop.
If it succeeds, now we know the parent directory definitely does exist, and we call
Mkdir on the original path. Whatever Mkdir returns (ignoring the Lstat fixup), we return.
In particular, the only way you get EEXIST from this function is if the inner MkdirAll
(inside if j > 1) returns it or if Mkdir itself returns it. Since there has to be a base
case in the recursion, the EEXIST *must* be coming from the call to the Mkdir system
call.
You are right that if Stat returns I/O error then we proceed with 'business as usual'
but that is *exactly* what we should do. We want to see the error that Mkdir gives us,
not the error that Stat gives us.
I will recomment the function but I really think everything here is fine.
@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 6, 2014

Comment 6:

CL https://golang.org/cl/151460043 mentions this issue.
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Oct 6, 2014

Comment 7:

This issue was closed by revision 1eea5ca.

Status changed to Fixed.

@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 6, 2014

Comment 8 by Zachary.Drew:

Changing the comment does not fix the issue. Your statement "It doesn't matter here that
Stat might have failed with an I/O error." is wrong. The user does care what the error
is. The caller deserves to know the actually reason the call failed.
We have these two errors, occurring in this order:
stat /failed-file-system-mount-point/subdirectory-to-create: input/output error
mkdir /failed-file-system-mount-point: file exists
The important one, the first one, the one the user wants to know about, the one I want
to know about, the reason the call is actually failing--is being discarded. Yes, an
error is being returned, but it is the wrong error to return.
It would be more accurate to mark this issue "not going to fix" than to changing
comments and claim it's fixed.
As a general design principal, a routine should return the error that caused the routine
to fail. Do you agree? Yes, technically, "file exists" is the error that caused the
logic to fail. But the IO Error is what caused the routine to fail and claiming
otherwise is disingenuous.
Further, recursion is beautiful computer science and all, but is it really appropriate
here? Looks to me like it's recursion for the sake of recursion. We're not descending a
tree here. A filesystem may be a tree, but a file path is a sequence.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 6, 2014

Comment 9:

What caused this routine to fail is that Mkdir failed.  The Stat is irrelevant.
The only way I can make sense of what you are saying is if
Stat("/failed-file-system-mount-point") also fails, presumably with EIO.  It is only
after that that the code would try Mkdir("/failed-file-system-mount-point") which I
assume is failing with EEXIST.  At that point the code will call
Lstat("/failed-file-system-mount-point"), which I guess must *also* be failing with EIO.
 You seem to be suggesting that it should instead try
Mkdir("/failed-file-system-mount-point/subdirectory-to-create") which I guess would fail
with the EIO you want to see.
It's hard for me to get worked about a case where Stat(filename) fails with EIO but
Mkdir(filename) fails with EEXIST.  Why doesn't the Mkdir fail with EIO also?
@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 6, 2014

Comment 10 by Zachary.Drew:

You are completely misinterpreting what I am suggesting.
"What caused this routine to fail is that Mkdir failed.  The Stat is irrelevant." <-
This is the bug. The stat isn't irrelevant! The code is not distinguishing between the
error it assumes stat will return (file doesn't exist) and all the other possible errors
stat could return (but it wouldn't know, it never checks!)
Mark it as "not going to fix" or reassign to somebody who cares about returning the
correct error because it matters.
I ran into the bug while writing disk/filesystem stress testing code. Getting the
correct error matters to me as that is the point of the software and is a normal
occurrence since I'm testing over a petabyte a week.
In any case, I long ago stopped using this function in my code.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 6, 2014

Comment 11:

I am saying is that a function called MkdirAll should return an error from Mkdir.
You are saying that it should return an error from Stat.  I don't agree.  We can discuss
which specific Mkdir call should produce an error that should be returned, but it should
definitely be an error from Mkdir.
@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 6, 2014

Comment 12 by Zachary.Drew:

Why? What advantage is there to obscuring the errors returned by helper functions? You
say "should" as if this is an established design principle.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 6, 2014

Comment 13:

I suppose it seems obvious to me: MkdirAll should return errors from Mkdir.
Note that if Stat fails, MkdirAll will always call Mkdir.  It takes some sort of special
pleading to say that we should ignore the Mkdir error and instead return the earlier
Stat error.
@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 6, 2014

Comment 14 by Zachary.Drew:

That is neither obvious nor correct.
MkdirAll should return whatever error caused it to fail. There should be no "earlier
error." As soon as MkdirAll encounters an error it's not expecting, that is the reason
it failed and that is the error it should return. It should have never even made it to
the Mkdir call.
It is not "special pleading" to expect a function that fails to return the reason it
failed.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 6, 2014

Comment 15:

I think we must agree to disagree.  The Stat call could be omitted entirely from this
function; it's just an optimization.  Then we would have the same result as now.
@gopherbot

This comment has been minimized.

Copy link
Author

@gopherbot gopherbot commented Oct 7, 2014

Comment 16 by Zachary.Drew:

I am bordering on speechless. At least mark this issue as "not going to fix" because you
didn't fix it nor have you even demonstrated that  you understand what I am saying.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 7, 2014

Comment 17:

I don't know how I could demonstrate that I understand what you are saying.  I am open
to suggestions.
You keep saying that the function should return the reason that it failed, and you also
say that it should return the error from Stat, but the error from Stat is not the reason
that it failed.  It failed because Mkdir failed.  The error from Stat was only used to
determine which Mkdir call to make.
Changing the status to Unfortunate.

Status changed to Unfortunate.

@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman added a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
The internal comments are not completely precise about
what is going on, and they are causing confusion.

Fixes golang#8283.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/151460043
wheatman added a commit to wheatman/go-akaros that referenced this issue Jun 26, 2018
The internal comments are not completely precise about
what is going on, and they are causing confusion.

Fixes golang#8283.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/151460043
wheatman added a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
The internal comments are not completely precise about
what is going on, and they are causing confusion.

Fixes golang#8283.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/151460043
wheatman added a commit to wheatman/go-akaros that referenced this issue Jul 30, 2018
The internal comments are not completely precise about
what is going on, and they are causing confusion.

Fixes golang#8283.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/151460043
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.