New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when binary is stripped (Assertion error making OSH AUR package) #47

Closed
andychu opened this Issue Oct 13, 2017 · 22 comments

Comments

Projects
None yet
4 participants
@andychu
Contributor

andychu commented Oct 13, 2017

Hm this is because the ./configure step writes out the PREFIX to _tmp/detected-config.sh. If the file isn't installed to that place, it won't run.

The core issue is that a Unix executable doesn't know where it is run from! The executable has to be set as PYTHONPATH in order to find the bytecode.

https://stackoverflow.com/questions/4031672/without-access-to-argv0-how-do-i-get-the-program-name

$>   export OVM_VERBOSE=1
$>   oil.ovm 
ovm_path = ovm_path_buf (/usr/local/bin/oil.ovm)
ovm_path: /usr/local/bin/oil.ovm
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
oil.ovm: Modules/main.c:347: Ovm_Main: Assertion `sts != -1' failed.
Aborted (core dumped)

In the PKGBUILD, I do execute ./configure and make to compile OSH. I do not, however, invoke the install script that ships with OSH. This is because the PKGBUILD convention dictates that you shouldn’t rely on an external script to actually do the placement of binaries on the system.

The PKGBUILD convention prefers that any binaries that need to be installed on the system be placed within a specific subdirectory beneath wherever the PKGBUILD file is sitting (this is referred to as $pkgdir).

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 14, 2017

Contributor

Possible solutions:

  • Instead of finding the .zip file dynamically with PYTHONPATH / open(), we can compile it into the binary.
  • We could use two files instead of a concatenated file. I don't like that distros put stuff in /usr/local/share though? But that means that arch-independent and arch-dependent files are separate, which is good.
Contributor

andychu commented Oct 14, 2017

Possible solutions:

  • Instead of finding the .zip file dynamically with PYTHONPATH / open(), we can compile it into the binary.
  • We could use two files instead of a concatenated file. I don't like that distros put stuff in /usr/local/share though? But that means that arch-independent and arch-dependent files are separate, which is good.
@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 16, 2017

Contributor

I don't follow what you mean by "The executable has to be set as PYTHONPATH in order to find the bytecode". Can you clarify?

Contributor

timetoplatypus commented Oct 16, 2017

I don't follow what you mean by "The executable has to be set as PYTHONPATH in order to find the bytecode". Can you clarify?

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 17, 2017

Contributor

This blog post has some details:

http://www.oilshell.org/blog/2017/05/05.html

But to summarize, I wanted to treat the fact that Oil is written in Python as an implementation detail. So basically I created an app bundle with the compiled Python interpreter and a zip file of all the .pyc files.

However it uses a trick about zip files.. you can simply do this:

cat python_interpreter app.zip > oil.ovm

Then make oil.ovm executable. When it runs, it will run the Python interpreter -- the stuff at the end is ignored. The zip file header lives at the end of the file, so oil.ovm is both a valid ELF file and a valid zip file. And then the Python interpreter has to find ITSELF to open() app.zip!!! This location is set to PYTHONPATH dynamically (i.e. since PYTHONPATH=app.zip is a conventional way of looking inside a .zip file.)

However this is actually a hard problem in Unix!!! There are Linux-specific solutions, but no generic Unix ones. argv[0] can be changed by the caller, so you can't necessarily find it by looking there. (See the stackoverflow link above.)

I need to fix this long term... it might even require ditching this app bundle concept altogether unfortunately.

That said, is there a workaround? Unfortunately I lost your Lobsters private messages. I think you could actually fix it by doing

./configure --prefix /usr  # assume oil.ovm and osh go into `/usr/bin/`
make
# Then don't call install, instead move it into the arch location.  The binary will NOT work
# in this location because --prefix is wrong.
# But then when the user installs it it might work???

Does Arch test the binary in the temporary location, or only in the final installed location?

Contributor

andychu commented Oct 17, 2017

This blog post has some details:

http://www.oilshell.org/blog/2017/05/05.html

But to summarize, I wanted to treat the fact that Oil is written in Python as an implementation detail. So basically I created an app bundle with the compiled Python interpreter and a zip file of all the .pyc files.

However it uses a trick about zip files.. you can simply do this:

cat python_interpreter app.zip > oil.ovm

Then make oil.ovm executable. When it runs, it will run the Python interpreter -- the stuff at the end is ignored. The zip file header lives at the end of the file, so oil.ovm is both a valid ELF file and a valid zip file. And then the Python interpreter has to find ITSELF to open() app.zip!!! This location is set to PYTHONPATH dynamically (i.e. since PYTHONPATH=app.zip is a conventional way of looking inside a .zip file.)

However this is actually a hard problem in Unix!!! There are Linux-specific solutions, but no generic Unix ones. argv[0] can be changed by the caller, so you can't necessarily find it by looking there. (See the stackoverflow link above.)

I need to fix this long term... it might even require ditching this app bundle concept altogether unfortunately.

That said, is there a workaround? Unfortunately I lost your Lobsters private messages. I think you could actually fix it by doing

./configure --prefix /usr  # assume oil.ovm and osh go into `/usr/bin/`
make
# Then don't call install, instead move it into the arch location.  The binary will NOT work
# in this location because --prefix is wrong.
# But then when the user installs it it might work???

Does Arch test the binary in the temporary location, or only in the final installed location?

@icarroll

This comment has been minimized.

Show comment
Hide comment
@icarroll

icarroll Oct 17, 2017

Contributor

Since the executable is loaded into memory, would it be possible to read the zip contents from there, instead of looking back in the file system?

Contributor

icarroll commented Oct 17, 2017

Since the executable is loaded into memory, would it be possible to read the zip contents from there, instead of looking back in the file system?

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 17, 2017

Contributor

Yes definitely, although it's not a trivial change in the build system. It couldn't be a .zip file anymore -- it would have to be some other in-memory format. Well maybe you could mmap() it or something, not sure.

Also, compiling the .pyc into the executable makes building for multiple platforms slower in theory. That is, I wanted an architecture-independent and architecture-dependent part. If you put all the bytecode into a C program and compile it, everything becomes architecture-dependent.

Most software has that problem but I was trying to make Oil more "optimal"... but that created some different problems as you can see!


Also a bit of a tangent about Arch... I noticed here that you are only supposed to retrieve PKGBUILD one at a time, not get the whole thing:

https://wiki.archlinux.org/index.php/Arch_Build_System#Retrieve_PKGBUILD_source_using_Svn

I wonder if there is a way to get a copy of all PKGBUILD? I have parsed all Alpine APKBUILD here:

http://www.oilshell.org/git-branch/master/21552818/wild.wwz/distro/alpine-aports/index.html

That's 5,000+ shell scripts.

Contributor

andychu commented Oct 17, 2017

Yes definitely, although it's not a trivial change in the build system. It couldn't be a .zip file anymore -- it would have to be some other in-memory format. Well maybe you could mmap() it or something, not sure.

Also, compiling the .pyc into the executable makes building for multiple platforms slower in theory. That is, I wanted an architecture-independent and architecture-dependent part. If you put all the bytecode into a C program and compile it, everything becomes architecture-dependent.

Most software has that problem but I was trying to make Oil more "optimal"... but that created some different problems as you can see!


Also a bit of a tangent about Arch... I noticed here that you are only supposed to retrieve PKGBUILD one at a time, not get the whole thing:

https://wiki.archlinux.org/index.php/Arch_Build_System#Retrieve_PKGBUILD_source_using_Svn

I wonder if there is a way to get a copy of all PKGBUILD? I have parsed all Alpine APKBUILD here:

http://www.oilshell.org/git-branch/master/21552818/wild.wwz/distro/alpine-aports/index.html

That's 5,000+ shell scripts.

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 17, 2017

Contributor

That said, is there a workaround? Unfortunately I lost your Lobsters private messages.

I asked around the IRCs and the workaround I thought may be an option won't, understandably, be admissible in the AURs. Invoking sudo ./install works in the PKGBUILD, but we can't use that because it goes against the convention.

Does Arch test the binary in the temporary location, or only in the final installed location?

There is a check() function available in PKGBUILD, but Arch doesn't do any testing unless that function is provided. In the PKGBUILD I'm currently making, I do not have that check() function, so there isn't any testing by Arch in the temporary location or the final location . The errors I'm getting are from trying to invoke osh myself after installing using the PKGBUILD.

Contributor

timetoplatypus commented Oct 17, 2017

That said, is there a workaround? Unfortunately I lost your Lobsters private messages.

I asked around the IRCs and the workaround I thought may be an option won't, understandably, be admissible in the AURs. Invoking sudo ./install works in the PKGBUILD, but we can't use that because it goes against the convention.

Does Arch test the binary in the temporary location, or only in the final installed location?

There is a check() function available in PKGBUILD, but Arch doesn't do any testing unless that function is provided. In the PKGBUILD I'm currently making, I do not have that check() function, so there isn't any testing by Arch in the temporary location or the final location . The errors I'm getting are from trying to invoke osh myself after installing using the PKGBUILD.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 18, 2017

Contributor

OK then I think there should be some setting for --prefix to make it work? Unless I am understanding something about Arch.

The assertion error happens when the binary can't find itself at runtime, i.e. PYTHONPATH is set to something invalid. It expects PYTHONPATH=/usr/bin/oil.ovm essentially.

But if it's OK that it doesn't run in the temp location, then I think we should able to make it work in the final location only.

Do you have a link to the PKGBUILD file so I can see what's going on? (I guess Arch doesn't use GitHub because it's in SVN? ) And the output of the build process would be useful.

Also if you use IRC we could arrange a time to chat there, which could be easier.

Contributor

andychu commented Oct 18, 2017

OK then I think there should be some setting for --prefix to make it work? Unless I am understanding something about Arch.

The assertion error happens when the binary can't find itself at runtime, i.e. PYTHONPATH is set to something invalid. It expects PYTHONPATH=/usr/bin/oil.ovm essentially.

But if it's OK that it doesn't run in the temp location, then I think we should able to make it work in the final location only.

Do you have a link to the PKGBUILD file so I can see what's going on? (I guess Arch doesn't use GitHub because it's in SVN? ) And the output of the build process would be useful.

Also if you use IRC we could arrange a time to chat there, which could be easier.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 18, 2017

Contributor

tl;dr I think ./configure --prefix /usr should work, but if it doesn't send me the code :)

install is a tiny file, so I think it could be duplicated in Arch (for now, although that's not ideal). Basically the build process outputs oil.ovm, and then osh is simply a symlink to it, in the style of Busybox.

Contributor

andychu commented Oct 18, 2017

tl;dr I think ./configure --prefix /usr should work, but if it doesn't send me the code :)

install is a tiny file, so I think it could be duplicated in Arch (for now, although that's not ideal). Basically the build process outputs oil.ovm, and then osh is simply a symlink to it, in the style of Busybox.

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 18, 2017

Contributor

I haven't put the PKGBUILD file anywhere public yet, I'm just testing locally. I'll post a link to it later though so y'all can test it out as well.

OK then I think there should be some setting for --prefix to make it work? Unless I am understanding something about Arch.

I think I can clarify when we chat. There's a specific subdirectory (referred to as $pkgdir within the PKGBUILD) that we have to install binaries to. It's a fakeroot where $pkgdir represents the root of the filesystem. We install binaries relative to that and makepkg theoretically does the actual placement on the real filesystem.
If I set the prefix to anything on the real filesystem, like /usr, it will actually install to that location and that violates the PKGBUILD convention. Now, if I set the prefix to /usr then manually change the actual installation location in the PKGBUILD to $pkdir/usr, I still get the same error as before. Here's the PKGBUILD file for that scenario: https://timetoplatypus.com/static/oil.PKGBUILD

Contributor

timetoplatypus commented Oct 18, 2017

I haven't put the PKGBUILD file anywhere public yet, I'm just testing locally. I'll post a link to it later though so y'all can test it out as well.

OK then I think there should be some setting for --prefix to make it work? Unless I am understanding something about Arch.

I think I can clarify when we chat. There's a specific subdirectory (referred to as $pkgdir within the PKGBUILD) that we have to install binaries to. It's a fakeroot where $pkgdir represents the root of the filesystem. We install binaries relative to that and makepkg theoretically does the actual placement on the real filesystem.
If I set the prefix to anything on the real filesystem, like /usr, it will actually install to that location and that violates the PKGBUILD convention. Now, if I set the prefix to /usr then manually change the actual installation location in the PKGBUILD to $pkdir/usr, I still get the same error as before. Here's the PKGBUILD file for that scenario: https://timetoplatypus.com/static/oil.PKGBUILD

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 24, 2017

Contributor

Any further thoughts on this? There's definitely something lost in translation when makepkg places the binaries on the filesystem. I've tried a bunch of different permutations of options, but I still get that error. Let me know if there's anything else I should try. Otherwise, it seems like the solution will have to be more with how OSH is shipped

Contributor

timetoplatypus commented Oct 24, 2017

Any further thoughts on this? There's definitely something lost in translation when makepkg places the binaries on the filesystem. I've tried a bunch of different permutations of options, but I still get that error. Let me know if there's anything else I should try. Otherwise, it seems like the solution will have to be more with how OSH is shipped

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 25, 2017

Contributor

Sorry I've been procrastinating on this. One reason is that I don't really have a good short term solution for the path issue.

I think you, along with some performance problems I noted in the latest blog post, have convinced me to get rid of the .zip file scheme in the long term. But in the short term I think the best I can do is improve the error message. It's possible to add a Linux-specific hack, but I'm not sure I want to go there yet.

One other possibility: does the Arch build system run "strip" on the binaries? See the dontStrip option here in Nix:

https://github.com/NixOS/nixpkgs/blob/master/pkgs/shells/oil/default.nix

I didn't realize this, but apparently stripping a binary with a .zip file appended messes with the zip file. That is another reason that this error could occur.

I will definitely add an improved error message for the next release so we can see why the import is failing.

Thanks for publishing the code -- that helps!

Contributor

andychu commented Oct 25, 2017

Sorry I've been procrastinating on this. One reason is that I don't really have a good short term solution for the path issue.

I think you, along with some performance problems I noted in the latest blog post, have convinced me to get rid of the .zip file scheme in the long term. But in the short term I think the best I can do is improve the error message. It's possible to add a Linux-specific hack, but I'm not sure I want to go there yet.

One other possibility: does the Arch build system run "strip" on the binaries? See the dontStrip option here in Nix:

https://github.com/NixOS/nixpkgs/blob/master/pkgs/shells/oil/default.nix

I didn't realize this, but apparently stripping a binary with a .zip file appended messes with the zip file. That is another reason that this error could occur.

I will definitely add an improved error message for the next release so we can see why the import is failing.

Thanks for publishing the code -- that helps!

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 26, 2017

Contributor

I've got some good news. Your guess about the binary stripping appears to have been accurate. makepkg does strip binaries by default and after explicitly selecting an option to not strip the binary in the PKGBUILD, it's working for me.

Identifying that as the problem also greatly simplifies the PKGBUILD, as I can now simply invoke the install script that ships with OSH.

Contributor

timetoplatypus commented Oct 26, 2017

I've got some good news. Your guess about the binary stripping appears to have been accurate. makepkg does strip binaries by default and after explicitly selecting an option to not strip the binary in the PKGBUILD, it's working for me.

Identifying that as the problem also greatly simplifies the PKGBUILD, as I can now simply invoke the install script that ships with OSH.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 26, 2017

Contributor

OK awesome! I will definitely put a better error message in the next release!

Thanks again for doing this and sorry for procrastinating :)

Contributor

andychu commented Oct 26, 2017

OK awesome! I will definitely put a better error message in the next release!

Thanks again for doing this and sorry for procrastinating :)

@andychu andychu changed the title from Assertion error making OSH AUR package to Improve error message when binary is stripped (Assertion error making OSH AUR package) Oct 26, 2017

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 26, 2017

Contributor

No problem, I'm glad we got it figured out.

By the way, am I supposed to be able to invoke oil.ovm directly? After installing OSH using the current corrected PKGBUILD, osh works fine but trying to run oil.ovm yields a "-bash: /usr/bin/oil.ovm: No such file or directory" error. It's worth noting that it only errors out when I try to run oil.ovm, and I do not get an error when I run it using absolute path (/usr/local/bin/oil.ovm)

Contributor

timetoplatypus commented Oct 26, 2017

No problem, I'm glad we got it figured out.

By the way, am I supposed to be able to invoke oil.ovm directly? After installing OSH using the current corrected PKGBUILD, osh works fine but trying to run oil.ovm yields a "-bash: /usr/bin/oil.ovm: No such file or directory" error. It's worth noting that it only errors out when I try to run oil.ovm, and I do not get an error when I run it using absolute path (/usr/local/bin/oil.ovm)

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 26, 2017

Contributor

That's interesting, maybe try in a new bash session? I think bash cached the location wrong?

I know that bash has logic for caching entries in $PATH, but I'm not sure how it's invalidated. If /usr/local/bin is in your PATH, then clearly it's done something wrong.

I would debug it like this:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin: ...

$ which oil.ovm
/usr/local/bin/oil.ovm
$ oil.ovm
Usage: oil MAIN_NAME [ARG]...
       MAIN_NAME [ARG]...

oil behaves like busybox.  If it's invoked through a symlink, e.g. 'osh', then
it behaves like that binary.  Otherwise the binary name can be passed as the
first argument, e.g.:

    oil osh -c 'echo hi'


Missing name of main()
Contributor

andychu commented Oct 26, 2017

That's interesting, maybe try in a new bash session? I think bash cached the location wrong?

I know that bash has logic for caching entries in $PATH, but I'm not sure how it's invalidated. If /usr/local/bin is in your PATH, then clearly it's done something wrong.

I would debug it like this:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin: ...

$ which oil.ovm
/usr/local/bin/oil.ovm
$ oil.ovm
Usage: oil MAIN_NAME [ARG]...
       MAIN_NAME [ARG]...

oil behaves like busybox.  If it's invoked through a symlink, e.g. 'osh', then
it behaves like that binary.  Otherwise the binary name can be passed as the
first argument, e.g.:

    oil osh -c 'echo hi'


Missing name of main()
@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 26, 2017

Contributor

In other words there's nothing special about oil.ovm -- if it's in your $PATH you should be able to run it, and if it isn't you won't. Since osh is in your PATH, and osh and oil.ovm live in the same directory, there's something wrong. But I feel like this has to do with bash caching.

Contributor

andychu commented Oct 26, 2017

In other words there's nothing special about oil.ovm -- if it's in your $PATH you should be able to run it, and if it isn't you won't. Since osh is in your PATH, and osh and oil.ovm live in the same directory, there's something wrong. But I feel like this has to do with bash caching.

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Oct 26, 2017

Contributor

Ohp, ya it looks like it's probably just a caching issue. I started a new bash session and it worked.

Contributor

timetoplatypus commented Oct 26, 2017

Ohp, ya it looks like it's probably just a caching issue. I started a new bash session and it worked.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Oct 26, 2017

Contributor

Yeah this is related to issue #44. I consider that a bash bug.

Contributor

andychu commented Oct 26, 2017

Yeah this is related to issue #44. I consider that a bash bug.

@garthy

This comment has been minimized.

Show comment
Hide comment
@garthy

garthy Oct 29, 2017

Ok I fixed an issue like this in pyinstaller. I used objcopy to append the zip file to the executable in a valid elf section. This doesn't get stripped off the executable.

pyinstaller/pyinstaller#1812

garthy commented Oct 29, 2017

Ok I fixed an issue like this in pyinstaller. I used objcopy to append the zip file to the executable in a valid elf section. This doesn't get stripped off the executable.

pyinstaller/pyinstaller#1812

andychu pushed a commit that referenced this issue Nov 7, 2017

Andy Chu
Add a more detailed error message when the app bundle can't be imported.
Fixes issue #47.

Also:

- Fix the wild table sorting.  It was broken after removing the columns
  about timing.
- Add web code to the line count.  csv2html.py is new for the OSH 0.2
  release.
- Fix a build issue with osh-quick-ref.  This build step outputs both
  code that goes in the app bundle and docs that go in _release/VERSION.
@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Nov 7, 2017

Contributor

@garthy Thanks, I didn't know about that! I added a better error message just now, but I will think about doing the objcopy instead.

Do you know if objcopy is GNU- or Linux- specific?

I might get rid of the app bundle thing altogether, so I'm not sure how much I want to fiddle with it.

Contributor

andychu commented Nov 7, 2017

@garthy Thanks, I didn't know about that! I added a better error message just now, but I will think about doing the objcopy instead.

Do you know if objcopy is GNU- or Linux- specific?

I might get rid of the app bundle thing altogether, so I'm not sure how much I want to fiddle with it.

@timetoplatypus

This comment has been minimized.

Show comment
Hide comment
@timetoplatypus

timetoplatypus Nov 10, 2017

Contributor

objcopy is a specifically a GNU development tool. It ships with the GNU binutils package.

Contributor

timetoplatypus commented Nov 10, 2017

objcopy is a specifically a GNU development tool. It ships with the GNU binutils package.

@andychu

This comment has been minimized.

Show comment
Hide comment
@andychu

andychu Nov 14, 2017

Contributor

@timetoplatypus

OK thanks, that's what I thought. I improved the error message for this release, so for now I'm closing this bug. But if people continue running into problems (mainly packagers I expect, not end users) then we can revisit it.

Contributor

andychu commented Nov 14, 2017

@timetoplatypus

OK thanks, that's what I thought. I improved the error message for this release, so for now I'm closing this bug. But if people continue running into problems (mainly packagers I expect, not end users) then we can revisit it.

@andychu andychu closed this Nov 14, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment