New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust determination of full path to current executable #795

Merged
merged 3 commits into from Sep 15, 2016

Conversation

Projects
None yet
8 participants
@xavierleroy
Contributor

xavierleroy commented Sep 5, 2016

The OCaml runtime system has some system-dependent code (Linux, Windows) to recover the full path to the current executable, in ways that are more reliable than searching argv[0] in $PATH.

Previously, this code worked only for paths of 256 characters or less, owing to the static allocation of a buffer.

Some OCaml users informed us that this limitation was cramping their style and that executables with very long names indeed are important to their business. "Surely you must be joking?" questions were answered with utter seriousness.

Always happy to help OCaml users do the weirdest things, I offer in this PR a reimplementation of these parts of the runtime where the buffer for the full path name is allocated dynamically and grows on demand.

As a bonus, special code to determine the path to the executable was added for Mac OS X as well, courtesy of informative StackOverflow posts.

The new code was tested under Linux, but not even compiled under Mac OS X nor under Windows. Caveat emptor.

xavierleroy added some commits Sep 5, 2016

Improve OS-dependent determination of file name for the current execu…
…table

In several places, the old code was limiting this file name to 256 characters at most.  At least one user is bothered by this restriction.

- Change API of caml_executable_name() to return a string dynamically allocated with caml_stat_alloc.
- Update implementation of caml_executable_name() byterun/unix.c and byterun/win32.c to handle file names of (almost) arbitrary length.
- Add special code for MacOS X, not tested yet.

Fun fact of the day: under (some versions of) Linux, lstat() on "/proc/self/exe" returns 0 as length, so we can't use it to determine the size of the name in advance.
Show outdated Hide outdated asmrun/startup.c
exe_name = proc_self_exe;
else
exe_name = caml_search_exe_in_path(exe_name);
caml_sys_init(exe_name, argv);
caml_stat_free(exe_name);

This comment has been minimized.

@yallop

yallop Sep 5, 2016

Member

Won't this (caml_stat_free) lead to a use-after-free error when exe_name is subsequently accessed?

@yallop

yallop Sep 5, 2016

Member

Won't this (caml_stat_free) lead to a use-after-free error when exe_name is subsequently accessed?

This comment has been minimized.

@xavierleroy

xavierleroy Sep 5, 2016

Contributor

Indeed, caml_sys_init doesn't make a copy of exe_name, I missed that. Will fix later.

@xavierleroy

xavierleroy Sep 5, 2016

Contributor

Indeed, caml_sys_init doesn't make a copy of exe_name, I missed that. Will fix later.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 5, 2016

Member

I used the CI machines to check that the compiler builds fine under MacOS, mingw and mscvc machines with the proposed changes. Could you maybe include a test that uses a super-duper-long executable name to actually test the new behavior?

There is a very suspicious CI failure on openbsd32, a segfault while running an innocuous typing test of the testsuite: see "Segmentation fault" in the logs https://ci.inria.fr/ocaml/view/precheck/job/precheck-openbsd-32/2/consoleText . I don't know where this comes from, and asked for a new build to see if it is reproducible but the results are not in yet.

Edit: the new openbsd32 build succeeded, so the observed segmentation fault (while testing basic-more/top_level_patterns.ml with ocamlopt) seems to be a transient error. Curiouser and Curiouser, but probably unrelated to the present PR.

Member

gasche commented Sep 5, 2016

I used the CI machines to check that the compiler builds fine under MacOS, mingw and mscvc machines with the proposed changes. Could you maybe include a test that uses a super-duper-long executable name to actually test the new behavior?

There is a very suspicious CI failure on openbsd32, a segfault while running an innocuous typing test of the testsuite: see "Segmentation fault" in the logs https://ci.inria.fr/ocaml/view/precheck/job/precheck-openbsd-32/2/consoleText . I don't know where this comes from, and asked for a new build to see if it is reproducible but the results are not in yet.

Edit: the new openbsd32 build succeeded, so the observed segmentation fault (while testing basic-more/top_level_patterns.ml with ocamlopt) seems to be a transient error. Curiouser and Curiouser, but probably unrelated to the present PR.

@yminsky

This comment has been minimized.

Show comment
Hide comment
@yminsky

yminsky Sep 5, 2016

In case anyone is curious, the crazy organization that runs into this is Jane Street. The problem comes up when we take a large change and break it into a long sequence of small patches, each dependent on the last. This is represented in our code review system as nested directories, one per patch. We don't often cross the 256 character limit, but when someone does, it's annoying.

So, while we're not joking, it doesn't mean we don't appreciate that our request is a little absurd.

Anyway, thanks for the patch!

yminsky commented Sep 5, 2016

In case anyone is curious, the crazy organization that runs into this is Jane Street. The problem comes up when we take a large change and break it into a long sequence of small patches, each dependent on the last. This is represented in our code review system as nested directories, one per patch. We don't often cross the 256 character limit, but when someone does, it's annoying.

So, while we're not joking, it doesn't mean we don't appreciate that our request is a little absurd.

Anyway, thanks for the patch!

asmrun/startup.c: freeing "exe_name" before use
caml_sys_init does not copy its first argument and just keeps the pointer.

(Reported by Jeremy Yallop.)
@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Sep 15, 2016

Contributor

Ping!

  • I fixed the use-after-free bug.
  • I really think the code is good to go.
  • I'd rather not merge my own pull requests.
  • I'd rather not maintain my pull request.

So: please merge in trunk. Thank you.

Contributor

xavierleroy commented Sep 15, 2016

Ping!

  • I fixed the use-after-free bug.
  • I really think the code is good to go.
  • I'd rather not merge my own pull requests.
  • I'd rather not maintain my pull request.

So: please merge in trunk. Thank you.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 15, 2016

Member

Thanks for the ping, merged.

Member

gasche commented Sep 15, 2016

Thanks for the ping, merged.

@gasche gasche merged commit 70d880a into ocaml:trunk Sep 15, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 15, 2016

Contributor

What about adding a non-regression test and a Changes entry?

Contributor

alainfrisch commented Sep 15, 2016

What about adding a non-regression test and a Changes entry?

@xavierleroy xavierleroy deleted the xavierleroy:fix-exe-name branch Sep 15, 2016

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Sep 16, 2016

Member

I don't think a regression test is really necessary. I've added the Changes entry (commit 12bd764).

Member

damiendoligez commented Sep 16, 2016

I don't think a regression test is really necessary. I've added the Changes entry (commit 12bd764).

name = caml_stat_alloc(namelen + 1);
retcode = readlink("/proc/self/exe", name, namelen);
if (retcode == -1) { caml_stat_free(name); return NULL; }
if (retcode <= namelen) break;

This comment has been minimized.

@ianthehenry

ianthehenry Aug 25, 2017

Is this check right? If readlink has to truncate its response, won't retcode == namelen?

@ianthehenry

ianthehenry Aug 25, 2017

Is this check right? If readlink has to truncate its response, won't retcode == namelen?

This comment has been minimized.

@nojb

nojb Aug 25, 2017

Contributor

Yes, but it can very well happen that more bytes than necessary were allocated, so no truncation is necessary, in which case you can have retcode < namelen.

@nojb

nojb Aug 25, 2017

Contributor

Yes, but it can very well happen that more bytes than necessary were allocated, so no truncation is necessary, in which case you can have retcode < namelen.

This comment has been minimized.

@ianthehenry

ianthehenry Aug 25, 2017

Right, but it seems like the only time that we want to break out of the loop is when retcode < namelen -- if retcode == namelen that's a sign that we want to try again with a longer name buffer.

Rephrase: if I'm understanding correctly, I think that this branch is always taken, and we never actually resize the buffer (readlink should never return retcode > namelen).

(The win32 implementation seems to have the right comparison)

@ianthehenry

ianthehenry Aug 25, 2017

Right, but it seems like the only time that we want to break out of the loop is when retcode < namelen -- if retcode == namelen that's a sign that we want to try again with a longer name buffer.

Rephrase: if I'm understanding correctly, I think that this branch is always taken, and we never actually resize the buffer (readlink should never return retcode > namelen).

(The win32 implementation seems to have the right comparison)

This comment has been minimized.

@nojb

nojb Aug 25, 2017

Contributor

Yes, sounds right. In fact, wouldn't it be even better to pass namelen + 1 as third parameter of readlink and leave the comparison as it is ?

@nojb

nojb Aug 25, 2017

Contributor

Yes, sounds right. In fact, wouldn't it be even better to pass namelen + 1 as third parameter of readlink and leave the comparison as it is ?

This comment has been minimized.

@ianthehenry

ianthehenry Aug 25, 2017

Yeah, that is even better.

@ianthehenry

ianthehenry Aug 25, 2017

Yeah, that is even better.

This comment has been minimized.

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

Fixed on trunk, commit e49e6ce, and cherry-picked to 4.05, commit 44c67eb.

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

Fixed on trunk, commit e49e6ce, and cherry-picked to 4.05, commit 44c67eb.

This comment has been minimized.

@nojb

nojb Aug 25, 2017

Contributor

But if we leave the comparison as it is and we break out of the loop it means that retcode <= namelen so that there is space for the terminating 0, right? Using retcode < namelen as exit test seems slightly less efficient because there could be some cases where retcode == namelen but not due to truncation.

@nojb

nojb Aug 25, 2017

Contributor

But if we leave the comparison as it is and we break out of the loop it means that retcode <= namelen so that there is space for the terminating 0, right? Using retcode < namelen as exit test seems slightly less efficient because there could be some cases where retcode == namelen but not due to truncation.

This comment has been minimized.

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

OK, you're both right. I think the clearer code is to have namelen as the size of the buffer and as the 3rd parameter to readlink, and to break when retcode < namelen, which indeed leaves room for the final zero. Pushing this improved fix in a minute...

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

OK, you're both right. I think the clearer code is to have namelen as the size of the buffer and as the 3rd parameter to readlink, and to break when retcode < namelen, which indeed leaves room for the final zero. Pushing this improved fix in a minute...

This comment has been minimized.

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

Commits 7a315bd on trunk and b5610e3.

@xavierleroy

xavierleroy Aug 25, 2017

Contributor

Commits 7a315bd on trunk and b5610e3.

This comment has been minimized.

@ianthehenry

ianthehenry Aug 25, 2017

Oh yeah. That's a lot more clear.

@ianthehenry

ianthehenry Aug 25, 2017

Oh yeah. That's a lot more clear.

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Merge pull request #795 from xavierleroy/fix-exe-name
More robust determination of full path to current executable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment