New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement readlink -f: a (portable) way to find resources relative to the current script, even if invoked through a symlink #96

Closed
andychu opened this Issue Apr 18, 2018 · 19 comments

Comments

Projects
None yet
2 participants
@andychu
Contributor

andychu commented Apr 18, 2018

Suggested by skissane through e-mail:

https://stackoverflow.com/questions/59895/getting-the-source-directory-of-a-bash-script-from-within

On Linux I use this:

readonly THIS_DIR=$(cd $(dirname $0) && pwd)
$THIS_DIR/other_script.sh
cat $THIS_DIR/other_file.txt

IIRC this doesn't work correctly with symlinks -- you need readlink too. However readlink isn't portable to Mac OS X.

We could either build in something like readlink (which isn't too far out of bounds for shell) or provide a simpler, portable solution, e.g. $OIL_SCRIPT_DIR (and perhaps $OIL_SCRIPT_PATH)

Similar to $BASH_SOURCE.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 18, 2018

Contributor

So I think the key to making this portable is the -P flag to pwd, which @BatmanAoD just added? @skissane, does that solve your problem?

I hadn't used -P before, but it has the benefit of being portable not only from Linux to Mac, but also bash and now osh.

readonly THIS_DIR=$(cd $(dirname $0) && pwd -P)

It might be nice to provide a shorter osh/oil-specific variable for this, but I think it should be defined in terms of this portable snippet (assuming it's correct).

Contributor

andychu commented May 18, 2018

So I think the key to making this portable is the -P flag to pwd, which @BatmanAoD just added? @skissane, does that solve your problem?

I hadn't used -P before, but it has the benefit of being portable not only from Linux to Mac, but also bash and now osh.

readonly THIS_DIR=$(cd $(dirname $0) && pwd -P)

It might be nice to provide a shorter osh/oil-specific variable for this, but I think it should be defined in terms of this portable snippet (assuming it's correct).

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 18, 2018

Contributor

Sadly pwd -P is not terribly helpful when the script itself is a symlink, since it only dereferences symlinked directories, not text files.

Can GNU's realpath be compiled on OS X?

Contributor

BatmanAoD commented May 18, 2018

Sadly pwd -P is not terribly helpful when the script itself is a symlink, since it only dereferences symlinked directories, not text files.

Can GNU's realpath be compiled on OS X?

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 18, 2018

Contributor

OK right, it has to dereference and then dirname, not the other way around! It should be:

readonly THIS_DIR=$(cd $(dirname $(readlink --canonicalize $0)) && pwd)

which is a mouthful... and starts three processes (used to be 4 before we had the pwd builtin).

But I think that is the best way to make something portable between bash and osh? I think readlink -f / --canonicalize is within scope for osh. Possibly with some option to remove it from the "PATH".

Does that sound reasonable?

Contributor

andychu commented May 18, 2018

OK right, it has to dereference and then dirname, not the other way around! It should be:

readonly THIS_DIR=$(cd $(dirname $(readlink --canonicalize $0)) && pwd)

which is a mouthful... and starts three processes (used to be 4 before we had the pwd builtin).

But I think that is the best way to make something portable between bash and osh? I think readlink -f / --canonicalize is within scope for osh. Possibly with some option to remove it from the "PATH".

Does that sound reasonable?

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 18, 2018

Contributor

@andychu readlink and realpath actually solve the whole problem on their, with no cd necessary! The cd tricks are only for POSIX sh where readlink and realpath aren't available.

So where readlink is available, this is equivalent to what you have above:

readonly THIS_DIR="$(dirname "$(readlink --canonicalize $0)")"

Still two processes, but no cd or pwd.

Contributor

BatmanAoD commented May 18, 2018

@andychu readlink and realpath actually solve the whole problem on their, with no cd necessary! The cd tricks are only for POSIX sh where readlink and realpath aren't available.

So where readlink is available, this is equivalent to what you have above:

readonly THIS_DIR="$(dirname "$(readlink --canonicalize $0)")"

Still two processes, but no cd or pwd.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 18, 2018

Contributor

Ah OK, that makes sense. I think I started using the cd / pwd thing because of OS X. And then I stopped writing shell scripts on OS X, because it was annoying, but I never went back to using readlink!

I will change the title since I think implementing readlink is the right fix. But then we still need a way to disable it in favor of the readlink in $PATH.

command and builtin change the lookup rules, or we could have a special $PATH entry perhaps. Although that could confuse other shells on the same machine that are also looking at $PATH !

Contributor

andychu commented May 18, 2018

Ah OK, that makes sense. I think I started using the cd / pwd thing because of OS X. And then I stopped writing shell scripts on OS X, because it was annoying, but I never went back to using readlink!

I will change the title since I think implementing readlink is the right fix. But then we still need a way to disable it in favor of the readlink in $PATH.

command and builtin change the lookup rules, or we could have a special $PATH entry perhaps. Although that could confuse other shells on the same machine that are also looking at $PATH !

@andychu andychu changed the title from Provide a (portable) way to find resources relative to the current script, even if invoked through a symlink to Implement readlink -f: a (portable) way to find resources relative to the current script, even if invoked through a symlink May 18, 2018

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 18, 2018

Contributor

If it's going to be a builtin, maybe we should implement a version of realpath with no arguments, instead? That avoids the necessity of providing the other modes provided by readlink. It's still true that we should defer to the installed system version if available, I think.

Contributor

BatmanAoD commented May 18, 2018

If it's going to be a builtin, maybe we should implement a version of realpath with no arguments, instead? That avoids the necessity of providing the other modes provided by readlink. It's still true that we should defer to the installed system version if available, I think.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 19, 2018

Contributor

OK I'm not familiar with realpath. It looks like they are both in coreutils:

$ dpkg -S realpath
coreutils: /usr/bin/realpath

$ dpkg -S readlink
coreutils: /bin/readlink

If I do realpath --help on my Ubuntu machine it looks like it has slightly fewer options. It says that all but the last component must exist.

I don't have a strong opinion here. I said readlink -f because I have seen it a lot more in practice.

Contributor

andychu commented May 19, 2018

OK I'm not familiar with realpath. It looks like they are both in coreutils:

$ dpkg -S realpath
coreutils: /usr/bin/realpath

$ dpkg -S readlink
coreutils: /bin/readlink

If I do realpath --help on my Ubuntu machine it looks like it has slightly fewer options. It says that all but the last component must exist.

I don't have a strong opinion here. I said readlink -f because I have seen it a lot more in practice.

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 19, 2018

Contributor

Well, any script that uses readlink is relying on coreutils being installed.

This blog claims that BSD Coreutils are available for Mac OS X, so maybe availability isn't too much of an issue after all? https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/

I think Oil should definitely provide a built-in way to dereference symlinks, because that seems like a pretty common operation that should be simple; but I suppose it may not be as important for osh, since it seems the general strategy for most existing shells is to always use coreutils for that functionality.

Contributor

BatmanAoD commented May 19, 2018

Well, any script that uses readlink is relying on coreutils being installed.

This blog claims that BSD Coreutils are available for Mac OS X, so maybe availability isn't too much of an issue after all? https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/

I think Oil should definitely provide a built-in way to dereference symlinks, because that seems like a pretty common operation that should be simple; but I suppose it may not be as important for osh, since it seems the general strategy for most existing shells is to always use coreutils for that functionality.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 19, 2018

Contributor

Yeah, in general I want to defer to external utilities unless there's a good reason to fold things in. find/xargs is the canonical example because find is a language and xargs starts processes.

And I think that being able to find resources relative to a script's realpath is a pretty core piece of functionality. Logically it does seem to belong within the shell. You can't "source" properly without that.

But there is the issue of confusion with external utilities. We could provide it under yet another name, although I'm not sure that solution is deal either.

It could also be opt in somehow, with a set -o flag, avoiding the $PATH issue.

The advantage of a separate name is not having to emulate all of readlink / realpath, especially if different versions of those utilities have different flags.

minimal special case hack:

set -o readlink-f
readonly THIS_DIR=$(dirname $(readlink -f $0))
source $THIS_DIR/util.sh
Contributor

andychu commented May 19, 2018

Yeah, in general I want to defer to external utilities unless there's a good reason to fold things in. find/xargs is the canonical example because find is a language and xargs starts processes.

And I think that being able to find resources relative to a script's realpath is a pretty core piece of functionality. Logically it does seem to belong within the shell. You can't "source" properly without that.

But there is the issue of confusion with external utilities. We could provide it under yet another name, although I'm not sure that solution is deal either.

It could also be opt in somehow, with a set -o flag, avoiding the $PATH issue.

The advantage of a separate name is not having to emulate all of readlink / realpath, especially if different versions of those utilities have different flags.

minimal special case hack:

set -o readlink-f
readonly THIS_DIR=$(dirname $(readlink -f $0))
source $THIS_DIR/util.sh
@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 19, 2018

Contributor

Hmm.

One advantage of providing something with a new name is that none of the names so far are actually "right" in my opinion. What's really happening is that the path is being canonicalized. So maybe creating a builtin with a name based on that would be desirable.

canonicalize          # a bit long...
canonical              # better, maybe? possible conflict with apps called "canonical"
canon                   # maybe not intuitive?
canonize               # kind of punny, not sure if that's desirable

Perhaps on startup the shell could check if readlink and realpath are in $PATH, and if not, generate something like the following:

readlink () {
    if [[ "$1" != -f ]]; then
        # some kind of error message and early return...
    fi
    shift
    canonize "$@"
}

alias realpath=canonize
Contributor

BatmanAoD commented May 19, 2018

Hmm.

One advantage of providing something with a new name is that none of the names so far are actually "right" in my opinion. What's really happening is that the path is being canonicalized. So maybe creating a builtin with a name based on that would be desirable.

canonicalize          # a bit long...
canonical              # better, maybe? possible conflict with apps called "canonical"
canon                   # maybe not intuitive?
canonize               # kind of punny, not sure if that's desirable

Perhaps on startup the shell could check if readlink and realpath are in $PATH, and if not, generate something like the following:

readlink () {
    if [[ "$1" != -f ]]; then
        # some kind of error message and early return...
    fi
    shift
    canonize "$@"
}

alias realpath=canonize
@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 19, 2018

Contributor

Hm actually oil.ovm is already a busybox-style binary. So right now this works:

ln -s oil.ovm true
./true  # the binary now behaves like true

So that is the logical way to do it for readlink.

ln -s oil.ovm readlink

And the user/sys admin would have the option of putting that in their $PATH.

There are many names that are subpoptimal in shell, but osh shouldn't change them. osh is conservative by design, while Oil is supposed to "clean things up" (if that even happens, it doesn't exist yet.)

Contributor

andychu commented May 19, 2018

Hm actually oil.ovm is already a busybox-style binary. So right now this works:

ln -s oil.ovm true
./true  # the binary now behaves like true

So that is the logical way to do it for readlink.

ln -s oil.ovm readlink

And the user/sys admin would have the option of putting that in their $PATH.

There are many names that are subpoptimal in shell, but osh shouldn't change them. osh is conservative by design, while Oil is supposed to "clean things up" (if that even happens, it doesn't exist yet.)

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 20, 2018

Contributor

But you're okay with setting up a fallback when realpath and readlink don't exist (on the path)? That approach seems good to me.

As for the name canonpath (or similar), would that be acceptable for Oil?

Contributor

BatmanAoD commented May 20, 2018

But you're okay with setting up a fallback when realpath and readlink don't exist (on the path)? That approach seems good to me.

As for the name canonpath (or similar), would that be acceptable for Oil?

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 21, 2018

Contributor

I'm not sure how much to leave up to the "system packager" (e.g. Debian) vs. something builtin. Letting the sys admin decide whether to put readlink -> oil.ovm in the $PATH is the most flexible, but it will also lead to some fragmentation. There is already fragmentation of bash configurations across operating systems.

I haven't thought about a new name -- we can cross that bridge when we get to it :) OSH doesn't even work on OS X yet (due to my build system hacks), and Oil still doesn't exist :-(

Contributor

andychu commented May 21, 2018

I'm not sure how much to leave up to the "system packager" (e.g. Debian) vs. something builtin. Letting the sys admin decide whether to put readlink -> oil.ovm in the $PATH is the most flexible, but it will also lead to some fragmentation. There is already fragmentation of bash configurations across operating systems.

I haven't thought about a new name -- we can cross that bridge when we get to it :) OSH doesn't even work on OS X yet (due to my build system hacks), and Oil still doesn't exist :-(

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 21, 2018

Contributor

Oh, oh, I see, that makes sense--so osh wouldn't try to be "smart" about automatically making some kind of limited readlink functionality available, but it would include a minimal implementation for interested parties to use if they desire.

Contributor

BatmanAoD commented May 21, 2018

Oh, oh, I see, that makes sense--so osh wouldn't try to be "smart" about automatically making some kind of limited readlink functionality available, but it would include a minimal implementation for interested parties to use if they desire.

@andychu andychu added the help wanted label May 25, 2018

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 25, 2018

Contributor

Since we more or less decided what OSH should do, I'm tagging this "help wanted". (I've gotten more than one request for small tasks.)

Contributor

andychu commented May 25, 2018

Since we more or less decided what OSH should do, I'm tagging this "help wanted". (I've gotten more than one request for small tasks.)

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD May 26, 2018

Contributor

How much of the interfaces for readlink and/or realpath should we implement?

Contributor

BatmanAoD commented May 26, 2018

How much of the interfaces for readlink and/or realpath should we implement?

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu May 26, 2018

Contributor

I think it should just be the bare minimum to find resources relative to the current script.

So readlink -f would do it. I don't quite understand realpath -- maybe I missed your explanation.

On Linux, nobody would use this because they'll just use coreutils.

On Mac, there will be the extra step of ln -s oil.ovm readlink. And then your script will run portably between Linux and Mac, and bash and osh.

If a Mac user wants more flags for readlink, they should just install coreutils. This is strictly for finding resources in a portable manner.

Does that make sense? If you're interested in doing it, then you can look for true and false in AppBundleMain() in bin/oil.py and make a small new function.

I guess I don't have a great test setup for these new binaries. You could put something in demo/ or maybe spec tests will work.

(demo/ is roughly for stuff that I haven't fully automated.)

Contributor

andychu commented May 26, 2018

I think it should just be the bare minimum to find resources relative to the current script.

So readlink -f would do it. I don't quite understand realpath -- maybe I missed your explanation.

On Linux, nobody would use this because they'll just use coreutils.

On Mac, there will be the extra step of ln -s oil.ovm readlink. And then your script will run portably between Linux and Mac, and bash and osh.

If a Mac user wants more flags for readlink, they should just install coreutils. This is strictly for finding resources in a portable manner.

Does that make sense? If you're interested in doing it, then you can look for true and false in AppBundleMain() in bin/oil.py and make a small new function.

I guess I don't have a great test setup for these new binaries. You could put something in demo/ or maybe spec tests will work.

(demo/ is roughly for stuff that I haven't fully automated.)

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Jun 6, 2018

Contributor

@BatmanAoD It sounded like you were interested in doing this -- if so, are you still interested? I have a few people looking for tasks so I thought I might suggest this.

Contributor

andychu commented Jun 6, 2018

@BatmanAoD It sounded like you were interested in doing this -- if so, are you still interested? I have a few people looking for tasks so I thought I might suggest this.

@BatmanAoD

This comment has been minimized.

Show comment
Hide comment
@BatmanAoD

BatmanAoD Jun 6, 2018

Contributor

I'm not terribly interested in this particular task; it does seem like a good one to suggest for others. Thanks.

Contributor

BatmanAoD commented Jun 6, 2018

I'm not terribly interested in this particular task; it does seem like a good one to suggest for others. Thanks.

andychu pushed a commit that referenced this issue Jun 13, 2018

Initial implementation of readlink -f.
Addresses issue #96.

It uses the Python function os.path.realpath(), which we discovered has
different semantics than C's realpath().

Run demo with:

$ demo/readlink-demo.sh all

Run failing "gold" test with:

$ test/gold.sh readlink-case

- Changed the gold.sh test framework to add $REPO_ROOT/bin to the $PATH
  so we can test Oil's readlink against the one on the system.

Co-authored-by: Nicolas Hahn <nicolas@stonespring.org>

@andychu andychu closed this Jul 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment