
Config-tethered Racket fails to start, Argument list too long #4133

Open
LiberalArtist opened this issue Jan 18, 2022 · 16 comments

Comments

@LiberalArtist
Contributor

I have a config-tethered Racket installation where racket is an ELF starter executable. It fails to start, like this:

$ ./racket 
/gnu/store/3vrh2zbkl550apv8rv3nsr3nrxlc9mpf-racket-8.3.900/bin/racket: failed to start /gnu/store/3vrh2zbkl550apv8rv3nsr3nrxlc9mpf-racket-8.3.900/bin/racket (Argument list too long)

Using file * | grep ELF reports that racket, mzscheme, and mred are ELF files, as opposed to sh launcher scripts. Trying to run mzscheme fails in exactly the same way as racket, while mred fails with an even more perplexing error:

$ ./mred
open-input-file: cannot open module file
  module path: racket/base
  path: /gnu/store/3vrh2zbkl550apv8rv3nsr3nrxlc9mpf-racket-8.3.900/lib/racket/pkgs/draw-lib/racket/base.rkt
  system error: no such file or directory; rkt_err=3

On the other hand, all of the launcher sh scripts I've tried, including drracket, raco, and gracket, seem to be working fine.

There are many non-Racket things which might be to blame, but any hints about how things might go wrong with starter executables would be much appreciated. I'm under the impression that they embed command-line arguments in themselves, though I don't understand exactly how they work.

In a bit more detail, the problematic racket is from a "main distribution" layer which chains to a "minimal Racket" layer (with base and racket-lib), which in turn chains to a "vm" layer, with no packages, based on 8.3.900 (CS). The rackets in the other layers seem to work fine: the problem seems to begin with the "main distribution" layer.

In even more detail, this problem came up while I was experimenting with Guix's Racket packaging. The package definitions are on my fork here, particularly in gnu/packages/chez-and-racket-bootstrap.scm and gnu/packages/racket.scm. I've packed the build results into a tarball at https://philipmcgrath.com/tmp/4rmfs0mgavgbis6bdygi7igzwnymxdmp-racket-tarball-pack.tar.gz (n.b. it is 573 MiB gzipped): if unpacked at the root of any x86_64-linux system, you should be able to run (or not run, as the case may be) /opt/guix-pack-profile/bin/racket and find the paths to the other layers in /opt/guix-pack-profile/etc/racket/config.rktd. (The contents can of course be inspected anywhere.)

Alternatively, on a system with Guix installed (generic instructions), you could run:

guix time-machine --url=https://gitlab.com/philip1/guix-patches.git --commit=462ec050da800498d9797451d59089c62adb58bf --disable-authentication -- build racket --root=rkt

to create rkt as a symlink to a directory much like /opt/guix-pack-profile/ in the tarball, or use:

guix time-machine --url=https://gitlab.com/philip1/guix-patches.git --commit=462ec050da800498d9797451d59089c62adb58bf --disable-authentication -- pack --save-provenance -S /opt/guix-pack-profile=. racket --root=guix-pack-racket.tar.gz

to build a tarball equivalent to the one above. On Debian stable, the following should be enough to run those commands:

sudo apt install guix
GUIX_PROFILE="$HOME/.config/guix/current"
. "$GUIX_PROFILE/etc/profile"
hash guix
@mflatt
Member

mflatt commented Jan 18, 2022

It looks like the racket launcher may be configured to start itself. (That would explain why the command line eventually gets too long, since each start of itself will add to the command line.) Could that be a problem with two-layer tethering in general, or is it specific to something else in the Guix setup?
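To make the growth concrete, here is a toy model in Racket. It is not the actual starter implementation. It assumes a starter that prepends its embedded flags to whatever arguments it receives and then execs its target. When the target is the starter itself, every hop adds the flags again, so the argument list eventually exceeds the kernel's limit and exec fails with E2BIG ("Argument list too long"). The byte accounting is simplified, and `argv-bytes` and `hops-until-too-long` are hypothetical names.

```racket
#lang racket/base
;; Toy model (not Racket's starter code): a starter that execs its target
;; with its embedded flags prepended to the arguments it was given.
;; If the target is the starter itself, every hop prepends the flags again.

;; Approximate argv size: each argument plus its NUL terminator.
;; Assumption: ignores argv[0], the environment, and pointer overhead,
;; which also count toward the real kernel limit.
(define (argv-bytes args)
  (for/sum ([a (in-list args)])
    (add1 (bytes-length (string->bytes/utf-8 a)))))

;; Number of self-exec hops after which the argument list exceeds `limit`
;; bytes, or #f if the list never grows (no embedded flags).
(define (hops-until-too-long embedded-flags user-args limit)
  (let loop ([args user-args] [hops 0])
    (cond
      [(> (argv-bytes args) limit) hops]  ; exec would fail with E2BIG here
      [(null? embedded-flags) #f]         ; nothing is added on each hop
      [else (loop (append embedded-flags args) (add1 hops))])))
```

For example, `(hops-until-too-long '("-X" "/a" "-G" "/b") '() 100)` returns 9 in this model. The real limit is system-dependent; on Linux, `getconf ARG_MAX` reports it.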

P.S. To check what the launcher runs, searching backwards in the executable for racket will probably find the path. The path and the embedded command line are at the end of the .rackprog section, and readelf --hex-dump=".rackprog" will show that specific section. But the section also begins with compiled Racket code, and it makes up most of the executable anyway.
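The backwards search can be sketched in Racket as follows. This is a minimal sketch under an assumption: that the target path is stored as a NUL-terminated string, which is consistent with the hex dump quoted later in this thread but is not a documented format. The helper names are hypothetical, and the result is only a "probably the path" heuristic, as noted above.

```racket
#lang racket/base
;; Sketch: find the last occurrence of a byte string (such as #"racket") in
;; an executable's contents and return the NUL-delimited field around it.
;; Assumption: the starter's target path is stored as a NUL-terminated string.

;; Does `needle` occur in `hay` starting at index i?
(define (match-at? hay needle i)
  (for/and ([j (in-range (bytes-length needle))])
    (= (bytes-ref hay (+ i j)) (bytes-ref needle j))))

;; Index of the last occurrence of `needle`, scanning backwards, or #f.
(define (last-index-of hay needle)
  (for/first ([i (in-range (- (bytes-length hay) (bytes-length needle)) -1 -1)]
              #:when (match-at? hay needle i))
    i))

;; The bytes between the NULs (or buffer ends) surrounding position `pos`.
(define (nul-field-around hay pos)
  (define start
    (let loop ([i pos])
      (if (or (zero? i) (zero? (bytes-ref hay (sub1 i)))) i (loop (sub1 i)))))
  (define end
    (let loop ([i pos])
      (if (or (= i (bytes-length hay)) (zero? (bytes-ref hay i)))
          i
          (loop (add1 i)))))
  (subbytes hay start end))

;; The field containing the last occurrence of `needle`, or #f if absent.
(define (last-field-containing hay needle)
  (define pos (last-index-of hay needle))
  (and pos (nul-field-around hay pos)))

;; Example (the path "./racket" is hypothetical):
;;   (require racket/file)
;;   (last-field-containing (file->bytes "./racket") #"racket")
```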

@mflatt
Member

mflatt commented Jan 18, 2022

A guess about the ./mred error: Something goes wrong with the startup so that the process cannot find the main collects directory, but it can find installed packages. The draw-lib package happens to have a racket collection, and the module loader happens to pick that package's directory to mention when complaining that base.rkt cannot be found.

@LiberalArtist
Contributor Author

Thanks for the ideas! They led me to at least one problem: the lib-search-dirs were being duplicated, including ancestor layers and #f multiple times. Fixing that hasn't resolved the overall problem, though. I also haven't yet managed to reproduce it without Guix. (I've put an attempt at https://github.com/LiberalArtist/racket-issue-4133.)

Inspecting the racket executable, I see that it is invoking itself, as you thought. Interestingly, it passes itself an -X with the correct path to the main collects directory, but the mred executable, while calling the correct original gracket, doesn't seem to supply -X:

  0x00003fc0 652d7661 72696162 6c65210c 260c2600 e-variable!.&.&.
  0x00003fd0 0c260c26 0a2e2e2f 2e2e2f6d 676b7778 .&.&.../../mgkwx
  0x00003fe0 7833716a 616b6830 66316279 76616b6e x3qjakh0f1byvakn
  0x00003ff0 76386d6e 37397261 6e647a2d 7261636b v8mn79randz-rack
  0x00004000 65742d76 6d2d6373 2d382e33 2e393030 et-vm-cs-8.3.900
  0x00004010 2f6f7074 2f726163 6b65742d 766d2f6c /opt/racket-vm/l
  0x00004020 69622f67 7261636b 65740000 2d47002f ib/gracket..-G./
  0x00004030 676e752f 73746f72 652f3237 35726134 gnu/store/275ra4
  0x00004040 63686d78 696a7377 64376b64 39723764 chmxijswd7kd9r7d
  0x00004050 6a773732 6e327872 6a6a2d72 61636b65 jw72n2xrjj-racke
  0x00004060 742d382e 332e3930 302f6574 632f7261 t-8.3.900/etc/ra
  0x00004070 636b6574 2f002d49 00736368 656d652f cket/.-I.scheme/
  0x00004080 6775692f 696e6974 00                gui/init.
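The interesting part of such a dump can be decoded mechanically. This Racket sketch parses the hex columns of readelf --hex-dump output lines, assuming the layout shown above (address, up to four hex columns, then an ASCII column), and splits the resulting bytes on NUL. Applied to the lines from 0x00004020 onward, it yields the tail of the gracket path (the dump starts mid-string), then -G with the config directory, then -I scheme/gui/init, and no -X. The helper names are hypothetical.

```racket
#lang racket/base
;; Sketch: turn `readelf --hex-dump` lines back into bytes and list the
;; NUL-terminated strings they contain.
;; Assumption: readelf's usual layout of address, up to four hex columns of
;; whole bytes, then an ASCII column.
(require racket/string)

;; Bytes from the hex columns of one dump line (#"" if it doesn't match).
(define (hex-dump-line->bytes line)
  (define m (regexp-match
             #px"^\\s*0x[0-9a-fA-F]+((?: (?:[0-9a-fA-F]{2}){1,4}){1,4})"
             line))
  (if m
      (apply bytes
             (for*/list ([word (in-list (string-split (cadr m)))]
                         [i (in-range 0 (string-length word) 2)])
               (string->number (substring word i (+ i 2)) 16)))
      #""))

;; Split a byte string at every NUL byte, keeping empty pieces.
(define (split-on-nul bs)
  (let loop ([start 0] [i 0] [acc '()])
    (cond
      [(= i (bytes-length bs)) (reverse (cons (subbytes bs start i) acc))]
      [(zero? (bytes-ref bs i))
       (loop (add1 i) (add1 i) (cons (subbytes bs start i) acc))]
      [else (loop start (add1 i) acc)])))

;; Every string in the dumped region, decoded as UTF-8 (bad bytes become #\?).
(define (hex-dump->strings lines)
  (define all (apply bytes-append (map hex-dump-line->bytes lines)))
  (for/list ([piece (in-list (split-on-nul all))])
    (bytes->string/utf-8 piece #\?)))
```

With the readelf output saved to a file (the name "mred.hexdump" is hypothetical), `(hex-dump->strings (file->lines "mred.hexdump"))` lists the embedded strings; `file->lines` comes from `racket/file`.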

One difference from previous ways of building Racket is that I've been invoking the original racket executable (with -G) first to do a raco setup, then again to do the raco pkg install that produces launchers (e.g. drracket).

I think I remember that the command line is just written into a static string in the binary. If it were instead embedded in compiled Racket code, that could matter: I know there have been problems in the past from Guix trying to patch paths (or not finding paths it expects to be able to patch).

Does the launcher executable exec the original, or does it do something more complicated? I've gotten a bit lost trying to understand how launcher executables actually work. I've been curious anyway because Guix improved its handling of shared libraries in December: I doubt it's relevant here, but I've wondered whether setting the RUNPATH on the launcher executable could be an alternative way to find the libraries for ffi-lib.

@LiberalArtist
Contributor Author

With further experimentation, it appears it is indeed the final raco pkg install step that somehow rewrites the racket binary incorrectly. I'll try to reproduce more minimally.

LiberalArtist added a commit to LiberalArtist/racket-issue-4133 that referenced this issue Jan 25, 2022
@LiberalArtist
Contributor Author

The example repository for #4133 (https://github.com/LiberalArtist/racket-issue-4133) now, as of commit cbf1bed, reproduces the issue without Guix. Just adding a launcher with racket-launcher-{names,libraries} wasn't enough, but installing, in a layer, a package that depends on gui-lib produces a racket starter executable that tries to start itself.

@LiberalArtist
Contributor Author

The associated mred has the same issue: it has a correct -G flag and the correct original executable, but no -X flag.

@LiberalArtist
Contributor Author

As a workaround, skipping the separate raco setup step seems to avoid the issue with racket, but it doesn't solve the problem with mred.

The working racket executable, in addition to correctly supplying -G and -X flags, patches the embedded coNFIg dIRECTORy to the absolute path to the original config directory, but mred leaves it as coNFIg dIRECTORy:../etc. Both executables contain coLLECTs dIRECTORy:../collects.
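Those markers can also be checked without a hex viewer. The sketch below assumes each marker is followed directly by its value and a terminating NUL, which is consistent with the strings quoted here but is an assumption about the layout; `marker-value` is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: return the bytes stored after a marker such as
;; #"coNFIg dIRECTORy:" in a starter executable, up to the next NUL.
;; Assumption: the value immediately follows the marker and ends at a NUL.
;; Returns #f when the marker does not occur.
(define (marker-value hay marker)
  (define m (regexp-match-positions (regexp-quote marker) hay))
  (and m
       (let* ([start (cdar m)] ; index just past the marker
              [end (let loop ([i start])
                     (if (or (= i (bytes-length hay)) (zero? (bytes-ref hay i)))
                         i
                         (loop (add1 i))))])
         (subbytes hay start end))))

;; Example (the path "./mred" is hypothetical):
;;   (require racket/file)
;;   (marker-value (file->bytes "./mred") #"coNFIg dIRECTORy:")
```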

@mflatt
Member

mflatt commented Jan 25, 2022

@LiberalArtist
Contributor Author

Yes, I can confirm that just replacing:
https://github.com/racket/gui/blob/adb9a995cff267be13f84a89b4f987658508ecb8/gui-lib/mred/installer.rkt#L74-L75
with:

(define (config-flags)
  (list "-X" (path->string (find-collects-dir)) "-G" (path->string (find-config-dir))))

is enough to be able to run mred.

(I haven't looked into whether there's anything else around there that also ought to be repaired.)
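To see which values the patched `config-flags` would embed for a given installation, the same expressions can be evaluated directly; `find-collects-dir` and `find-config-dir` come from `setup/dirs`. Running this sketch with a layer's own racket is one way to compare against what a generated launcher actually contains. It assumes `find-collects-dir` returns a path (it can return #f in unusual setups).

```racket
#lang racket/base
;; Sketch: compute the flags that the patched `config-flags` in
;; gui-lib/mred/installer.rkt would produce for the running installation.
(require setup/dirs)

(define (config-flags)
  ;; -X: main collects directory; -G: configuration directory.
  ;; Assumption: both functions return paths here (not #f).
  (list "-X" (path->string (find-collects-dir))
        "-G" (path->string (find-config-dir))))

;; Print one flag or value per line when run as a program.
(module+ main
  (for ([f (in-list (config-flags))])
    (displayln f)))
```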

On the other hand, it appears that skipping the separate raco setup as I suggested above, while it does build a working racket, ends up not building a raco launcher in the main-distribution layer.

@LiberalArtist
Contributor Author

I think the problem with racket is that the call to find-exe here (or maybe some similar call):

(let ([exe (find-exe #:cross? #t #:untethered? #t mred? variant)])

finds the path to the racket from that main distribution layer, if one exists, despite being called with #:untethered? #t.

@LiberalArtist
Contributor Author

I think we need to do something here:

(append
 (->list (and (not untethered?)
              (find-addon-tethered-console-bin-dir)))
 (->list (and (not untethered?)
              (find-config-tethered-console-bin-dir)))
 (get-console-bin-search-dirs))))

to account for the fact that the result of find-{addon,config}-tethered-console-bin-dir may also be included in the list returned by (get-console-bin-search-dirs).
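A toy model of the quoted expression shows why requesting an untethered executable is not enough on its own: the tethered directory is skipped at the front of the list but still arrives through the general search list. The model uses strings instead of paths, `search-dirs` stands in for `(get-console-bin-search-dirs)`, and `model-bases` is a hypothetical name.

```racket
#lang racket/base
;; Toy model of the console branch of `bases` (not the real find-exe code).
;; `tethered-dirs` stands for the addon/config tethered bin directories and
;; `search-dirs` for the result of (get-console-bin-search-dirs).
(define (model-bases untethered? tethered-dirs search-dirs)
  (append (if untethered? '() tethered-dirs) ; skipped when untethered
          search-dirs))                      ; ...but may contain them anyway

;; If the config lists the tethered dir among the search dirs, an
;; "untethered" lookup can still find the tethered executable first.
(define tethered "/layer3/bin") ; hypothetical path
(model-bases #t (list tethered) (list tethered "/layer2/bin"))
;; => '("/layer3/bin" "/layer2/bin")
```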

LiberalArtist added a commit to LiberalArtist/racket-issue-4133 that referenced this issue Jan 25, 2022
@LiberalArtist
Contributor Author

In LiberalArtist/racket-issue-4133@4ab1e4f, I've adjusted the config.rktd files to set bin-dir and the intermediate entries in bin-search-dirs to placeholder directories (which are not used) rather than config-tethered-console-bin-dir. That gets racket working successfully.

mflatt added a commit to racket/gui that referenced this issue Jan 31, 2022
@mflatt
Member

mflatt commented Jan 31, 2022

I've pushed the change to mred/installer.rkt.

For the other problem, we could change https://docs.racket-lang.org/raco/tethered-install.html where it says "The 'bin-dir and 'gui-bin-dir configurations can point to the same directories" to say that they should not point to the same directories. (I think the original text said that, and then later I forgot why it mattered.) But maybe it's better to change find-exe to specifically exclude the tethered directories when it's not supposed to include them?

@LiberalArtist
Contributor Author

LiberalArtist commented Feb 5, 2022

> I've pushed the change to mred/installer.rkt.

Thanks! I've switched to building with racket/gui@563c684, and it seems to be working.

> For the other problem, we could change https://docs.racket-lang.org/raco/tethered-install.html where it says "The 'bin-dir and 'gui-bin-dir configurations can point to the same directories" to say that they should not point to the same directories. (I think the original text said that, and then later I forgot why it mattered.) But maybe it's better to change find-exe to specifically exclude the tethered directories when it's not supposed to include them?

I'm still thinking this through (and trying to remember what I thought about this before). I tracked down our previous conversation about bin-dir and the tethered variants to #3834. I remember making some notes about my thoughts at the time, but I haven't found them yet.

Concretely, as far as I can tell, it doesn't cause any problems for my use case to define bin-dir etc. to point to ignored directories, rather than the -tethered- directories: that's what I'm doing now as a workaround. When I wrote #4133 (comment), though, I'd been thinking of something like adjusting:

(define bases (if mred?
                  (append
                   (->list (and (not untethered?)
                                (find-addon-tethered-gui-bin-dir)))
                   (->list (and (not untethered?)
                                (find-config-tethered-gui-bin-dir)))
                   (if cross?
                       (get-cross-lib-search-dirs)
                       (get-lib-search-dirs)))
                  (append
                   (->list (and (not untethered?)
                                (find-addon-tethered-console-bin-dir)))
                   (->list (and (not untethered?)
                                (find-config-tethered-console-bin-dir)))
                   (get-console-bin-search-dirs))))

to remove the tethered directories, as you suggest (I think). That approach does seem somewhat better, in that having to configure paths to directories that will never be used and may not even exist seems rather confusing.

But something about this doesn't feel entirely satisfying, though I'm not sure what a better approach would be. I'll keep thinking about this.
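One possible shape for excluding the tethered directories, sketched as a standalone helper: normalize directories so that trailing separators don't matter, then drop search entries that match an excluded directory. `exclude-dirs` and `normalize-dir` are hypothetical names, and this is not necessarily how find-exe should be changed; the commented usage only shows where it might slot into the console branch above.

```racket
#lang racket/base
;; Hypothetical helper (not part of setup/dirs or launcher): drop every
;; directory in `search-dirs` that also appears in `excluded`, comparing
;; normalized paths so "/x/bin" and "/x/bin/" count as the same directory.

;; Complete, syntactically simplified (no filesystem access), directory form.
(define (normalize-dir p)
  (path->directory-path (simplify-path (path->complete-path p) #f)))

(define (exclude-dirs search-dirs excluded)
  (define excluded* (map normalize-dir excluded))
  (for/list ([d (in-list search-dirs)]
             #:unless (member (normalize-dir d) excluded*))
    d))

;; In the untethered console case, something like (requires setup/dirs):
;;   (exclude-dirs (get-console-bin-search-dirs)
;;                 (filter values
;;                         (list (find-addon-tethered-console-bin-dir)
;;                               (find-config-tethered-console-bin-dir))))
```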

@LiberalArtist
Contributor Author

> Concretely, as far as I can tell, it doesn't cause any problems for my use case to define bin-dir etc. to point to ignored directories, rather than the -tethered- directories: that's what I'm doing now as a workaround.

Well, apparently I spoke too soon. As I've continued working on updating Guix to Racket 8.4, I've discovered that an mzscheme executable (and nothing else) gets created in the bogus-untethered-bin directory for the main-distribution layer. The "minimal Racket" layer doesn't create that directory at all, let alone put anything in it.

Running bogus-untethered-bin/mzscheme --help works fine, but invoking it with no arguments in an empty container produces:

$ /gnu/store/zjrpgiv05v1jw8lq227r5wvi5b7lfc90-racket-8.4/lib/racket/bogus-untethered-bin/mzscheme
Welcome to Racket v8.4 [cs].
standard-module-name-resolver: collection not found
  for module path: (lib "scheme/init")
  collection: "scheme"
  in collection directories:
   /home/philip/.local/share/racket/8.4/collects
   /gnu/store/zjrpgiv05v1jw8lq227r5wvi5b7lfc90-racket-8.4/lib/racket/collects/

revealing that it can find neither the main collections directory nor the configuration directory (which configures installation-name as 8.4-guix). Inspecting the binary likewise shows that -X, -G, and similar are not set.

The executables in the actual config-tethered-console-bin directory work fine. For context, in an empty container:

$ /gnu/store/zjrpgiv05v1jw8lq227r5wvi5b7lfc90-racket-8.4/bin/raco pkg show
/gnu/store/2q95bm8k2y21rk37f8p90pd6bkd15xzm-racket-vm-cs-8.4/opt/racket-vm/share/pkgs:
 [none]
/gnu/store/j496nzfja83ic21pyl9n7jvq70v93mpn-racket-minimal-8.4/lib/racket/pkgs:
 Package     Checksum               Source
 racket-lib                         static-link...ket-lib
 [1 auto-installed package not shown]
Installation-wide:
 Package            Checksum             Source
 main-distribution                       static-link...ution
 [200 auto-installed packages not shown]
User-specific for installation "8.4-guix":
 [none]

I'll investigate further. For now, this still works ok: I've just put bogus-untethered-bin in some out-of-the-way location.

@mflatt
Member

mflatt commented Feb 27, 2022

I think mzscheme gets created in the bogus directory because "mzscheme-lib/mzscheme/installer.rkt" lacks the exists-in-another-layer? check that "gui-lib/racket/gui/installer.rkt" uses to avoid a useless gracket or mred.
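For illustration only, a hypothetical version of such a check: it reports whether an executable named `name` already exists in one of the other layers' bin directories, so an installer could skip creating a redundant launcher. This is not the actual helper used by gui-lib, whose logic may differ (for example, in how it locates the other layers' directories or handles platform-specific executable names).

```racket
#lang racket/base
;; Hypothetical sketch, not gui-lib's actual helper: does an executable
;; called `name` already exist in one of the other layers' bin directories?
;; `other-bin-dirs` may contain #f for layers without such a directory.
(define (exists-in-another-layer? name other-bin-dirs)
  (for/or ([dir (in-list other-bin-dirs)])
    (and dir
         (let ([p (build-path dir name)])
           ;; A dangling symlink also counts as "present".
           (or (file-exists? p) (link-exists? p))))))

;; An installer could then skip the redundant launcher, e.g.
;;   (unless (exists-in-another-layer? "mzscheme" dirs)
;;     ...create the mzscheme launcher...)
```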

2 participants