Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement readlink -f: a (portable) way to find resources relative to the current script, even if invoked through a symlink #96

Closed
andychu opened this issue Apr 18, 2018 · 19 comments

Comments

@andychu
Copy link
Contributor

andychu commented Apr 18, 2018

Suggested by skissane through e-mail:

https://stackoverflow.com/questions/59895/getting-the-source-directory-of-a-bash-script-from-within

On Linux I use this:

readonly THIS_DIR=$(cd $(dirname $0) && pwd)
$THIS_DIR/other_script.sh
cat $THIS_DIR/other_file.txt

IIRC this doesn't work correctly with symlinks -- you need readlink too. However readlink isn't portable to Mac OS X.

We could either build in something like readlink (which isn't too far out of bounds for shell) or provide a simpler, portable solution, e.g. $OIL_SCRIPT_DIR (and perhaps $OIL_SCRIPT_PATH)

Similar to $BASH_SOURCE.

@andychu
Copy link
Contributor Author

andychu commented May 18, 2018

So I think the key to making this portable is the -P flag to pwd, which @BatmanAoD just added? @skissane, does that solve your problem?

I hadn't used -P before, but it has the benefit of being portable not only from Linux to Mac, but also bash and now osh.

readonly THIS_DIR=$(cd $(dirname $0) && pwd -P)

It might be nice to provide a shorter osh/oil-specific variable for this, but I think it should be defined in terms of this portable snippet (assuming it's correct).

@BatmanAoD
Copy link
Contributor

Sadly pwd -P is not terribly helpful when the script itself is a symlink, since it only dereferences symlinked directories, not text files.

Can GNU's realpath be compiled on OS X?

@andychu
Copy link
Contributor Author

andychu commented May 18, 2018

OK right, it has to dereference and then dirname, not the other way around! It should be:

readonly THIS_DIR=$(cd $(dirname $(readlink --canonicalize $0)) && pwd)

which is a mouthful... and starts three processes (used to be 4 before we had the pwd builtin).

But I think that is the best way to make something portable between bash and osh? I think readlink -f / --canonicalize is within scope for osh. Possibly with some option to remove it from the "PATH".

Does that sound reasonable?

@BatmanAoD
Copy link
Contributor

@andychu readlink and realpath actually solve the whole problem on their, with no cd necessary! The cd tricks are only for POSIX sh where readlink and realpath aren't available.

So where readlink is available, this is equivalent to what you have above:

readonly THIS_DIR="$(dirname "$(readlink --canonicalize $0)")"

Still two processes, but no cd or pwd.

@andychu
Copy link
Contributor Author

andychu commented May 18, 2018

Ah OK, that makes sense. I think I started using the cd / pwd thing because of OS X. And then I stopped writing shell scripts on OS X, because it was annoying, but I never went back to using readlink!

I will change the title since I think implementing readlink is the right fix. But then we still need a way to disable it in favor of the readlink in $PATH.

command and builtin change the lookup rules, or we could have a special $PATH entry perhaps. Although that could confuse other shells on the same machine that are also looking at $PATH !

@andychu andychu changed the title Provide a (portable) way to find resources relative to the current script, even if invoked through a symlink Implement readlink -f: a (portable) way to find resources relative to the current script, even if invoked through a symlink May 18, 2018
@BatmanAoD
Copy link
Contributor

If it's going to be a builtin, maybe we should implement a version of realpath with no arguments, instead? That avoids the necessity of providing the other modes provided by readlink. It's still true that we should defer to the installed system version if available, I think.

@andychu
Copy link
Contributor Author

andychu commented May 19, 2018

OK I'm not familiar with realpath. It looks like they are both in coreutils:

$ dpkg -S realpath
coreutils: /usr/bin/realpath

$ dpkg -S readlink
coreutils: /bin/readlink

If I do realpath --help on my Ubuntu machine it looks like it has slightly fewer options. It says that all but the last component must exist.

I don't have a strong opinion here. I said readlink -f because I have seen it a lot more in practice.

@BatmanAoD
Copy link
Contributor

BatmanAoD commented May 19, 2018

Well, any script that uses readlink is relying on coreutils being installed.

This blog claims that BSD Coreutils are available for Mac OS X, so maybe availability isn't too much of an issue after all? https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/

I think Oil should definitely provide a built-in way to dereference symlinks, because that seems like a pretty common operation that should be simple; but I suppose it may not be as important for osh, since it seems the general strategy for most existing shells is to always use coreutils for that functionality.

@andychu
Copy link
Contributor Author

andychu commented May 19, 2018

Yeah, in general I want to defer to external utilities unless there's a good reason to fold things in. find/xargs is the canonical example because find is a language and xargs starts processes.

And I think that being able to find resources relative to a script's realpath is a pretty core piece of functionality. Logically it does seem to belong within the shell. You can't "source" properly without that.

But there is the issue of confusion with external utilities. We could provide it under yet another name, although I'm not sure that solution is deal either.

It could also be opt in somehow, with a set -o flag, avoiding the $PATH issue.

The advantage of a separate name is not having to emulate all of readlink / realpath, especially if different versions of those utilities have different flags.

minimal special case hack:

set -o readlink-f
readonly THIS_DIR=$(dirname $(readlink -f $0))
source $THIS_DIR/util.sh

@BatmanAoD
Copy link
Contributor

Hmm.

One advantage of providing something with a new name is that none of the names so far are actually "right" in my opinion. What's really happening is that the path is being canonicalized. So maybe creating a builtin with a name based on that would be desirable.

canonicalize          # a bit long...
canonical              # better, maybe? possible conflict with apps called "canonical"
canon                   # maybe not intuitive?
canonize               # kind of punny, not sure if that's desirable

Perhaps on startup the shell could check if readlink and realpath are in $PATH, and if not, generate something like the following:

readlink () {
    if [[ "$1" != -f ]]; then
        # some kind of error message and early return...
    fi
    shift
    canonize "$@"
}

alias realpath=canonize

@andychu
Copy link
Contributor Author

andychu commented May 19, 2018

Hm actually oil.ovm is already a busybox-style binary. So right now this works:

ln -s oil.ovm true
./true  # the binary now behaves like true

So that is the logical way to do it for readlink.

ln -s oil.ovm readlink

And the user/sys admin would have the option of putting that in their $PATH.

There are many names that are subpoptimal in shell, but osh shouldn't change them. osh is conservative by design, while Oil is supposed to "clean things up" (if that even happens, it doesn't exist yet.)

@BatmanAoD
Copy link
Contributor

But you're okay with setting up a fallback when realpath and readlink don't exist (on the path)? That approach seems good to me.

As for the name canonpath (or similar), would that be acceptable for Oil?

@andychu
Copy link
Contributor Author

andychu commented May 21, 2018

I'm not sure how much to leave up to the "system packager" (e.g. Debian) vs. something builtin. Letting the sys admin decide whether to put readlink -> oil.ovm in the $PATH is the most flexible, but it will also lead to some fragmentation. There is already fragmentation of bash configurations across operating systems.

I haven't thought about a new name -- we can cross that bridge when we get to it :) OSH doesn't even work on OS X yet (due to my build system hacks), and Oil still doesn't exist :-(

@BatmanAoD
Copy link
Contributor

Oh, oh, I see, that makes sense--so osh wouldn't try to be "smart" about automatically making some kind of limited readlink functionality available, but it would include a minimal implementation for interested parties to use if they desire.

@andychu
Copy link
Contributor Author

andychu commented May 25, 2018

Since we more or less decided what OSH should do, I'm tagging this "help wanted". (I've gotten more than one request for small tasks.)

@BatmanAoD
Copy link
Contributor

How much of the interfaces for readlink and/or realpath should we implement?

@andychu
Copy link
Contributor Author

andychu commented May 26, 2018

I think it should just be the bare minimum to find resources relative to the current script.

So readlink -f would do it. I don't quite understand realpath -- maybe I missed your explanation.

On Linux, nobody would use this because they'll just use coreutils.

On Mac, there will be the extra step of ln -s oil.ovm readlink. And then your script will run portably between Linux and Mac, and bash and osh.

If a Mac user wants more flags for readlink, they should just install coreutils. This is strictly for finding resources in a portable manner.

Does that make sense? If you're interested in doing it, then you can look for true and false in AppBundleMain() in bin/oil.py and make a small new function.

I guess I don't have a great test setup for these new binaries. You could put something in demo/ or maybe spec tests will work.

(demo/ is roughly for stuff that I haven't fully automated.)

@andychu
Copy link
Contributor Author

andychu commented Jun 6, 2018

@BatmanAoD It sounded like you were interested in doing this -- if so, are you still interested? I have a few people looking for tasks so I thought I might suggest this.

@BatmanAoD
Copy link
Contributor

I'm not terribly interested in this particular task; it does seem like a good one to suggest for others. Thanks.

andychu pushed a commit that referenced this issue Jun 13, 2018
Addresses issue #96.

It uses the Python function os.path.realpath(), which we discovered has
different semantics than C's realpath().

Run demo with:

$ demo/readlink-demo.sh all

Run failing "gold" test with:

$ test/gold.sh readlink-case

- Changed the gold.sh test framework to add $REPO_ROOT/bin to the $PATH
  so we can test Oil's readlink against the one on the system.

Co-authored-by: Nicolas Hahn <nicolas@stonespring.org>
@andychu andychu closed this as completed Jul 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants