How to execute single file packaged python binaries #45

drunksaint · 2020-07-27T19:02:41Z

I have a custom Python file, argtest.py.txt, that counts the number of lines in an input file and writes the count to an output file:

$ python argtest.py input.txt output.txt
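For readers without the attachment, a script of this kind might look roughly like the sketch below. It is a hypothetical stand-in, not the attached file; the names count_lines and main are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for argtest.py: count lines in one file, write the count to another."""
import sys


def count_lines(in_path):
    # Count lines by iterating over the file object.
    with open(in_path, "r") as f:
        return sum(1 for _ in f)


def main(argv):
    # argv follows the sys.argv convention: [program, input, output].
    if len(argv) != 3:
        print("usage: argtest.py INPUT OUTPUT", file=sys.stderr)
        return 2
    in_path, out_path = argv[1], argv[2]
    with open(out_path, "w") as out:
        out.write(f"{count_lines(in_path)}\n")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The important property for gg is that the script reads only the named input file and writes only the named output file.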

I am trying to run this using gg. I tried packaging it into a single binary using PyInstaller (pip install pyinstaller):

$ pyinstaller argtest.py --onefile --distpath .

This creates a single binary argtest which gives the expected output.

$ ./argtest input.txt output.txt

But after adding the correct wrapper

#!/bin/bash
model-generic "/path/to/argtest @infile @outfile" "$@"

and correctly generating the thunk in output.txt

$ gg infer argtest input.txt output.txt

running gg force output.txt results in the following error:

$ gg force output.txt 
→ Loading the thunks...  done (0 ms).
[104] Cannot open self /tmp/thunk-execute.FANRSb/argtest or archive /tmp/thunk-execute.FANRSb/argtest.pkg
std::exception
 `Tmrvv.MZ1JLsEcE3l9Jyz4bxjJ0kXnhl_ewuzxSceamw00000107': process exited with failure status 255
gg-force: `Tmrvv.MZ1JLsEcE3l9Jyz4bxjJ0kXnhl_ewuzxSceamw00000107': process exited with failure status 5

the binary is of type:

$ file argtest
argtest: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=294d1f19a085a730da19a6c55788ec08c2187039, stripped

Is there something I am doing wrong here? Any help will be appreciated!


sadjad · 2020-07-27T19:58:53Z

Hi @drunksaint,

Let me try this out first, and will get back to you in a couple hours.

Best,
Sadjad

drunksaint · 2020-07-27T22:43:31Z

@sadjad let me know if i can help with anything

sadjad · 2020-07-28T02:23:32Z

Okay, I managed to reproduce the error with a simple script, and I'm getting the exact same message.

The problem

PyInstaller's bootstrap code tries to open and read the binary itself. From the looks of it, it takes argv[0] as the path to the binary, but no file exists at that path (the actual binary is located in .gg/blobs/BINARY_HASH).

That's the path in the error message you're getting: /tmp/thunk-execute.FANRSb/argtest. It tries to find argtest in the current directory.

I can think of a few solutions:

Solution 1

You can instruct gg to create a link to the binary in the execution directory. However, it's a new feature and is currently only available through gg create-thunk, using the --link option. For example, in your case, after creating your binary, you can create your thunk like this:

gg create-thunk \
    --value $(gg hash input.txt) \
    --output output.txt \
    --executable $(gg hash argtest) \
    --placeholder output.txt \
    --link input.txt=$(gg hash input.txt) \
    --link argtest=$(gg hash argtest) \
    $(gg hash argtest) \
    argtest input.txt output.txt

Two links are created: one to input.txt and one to argtest. This simplifies the application, since it can refer to files by those names, and PyInstaller would be happy...

Solution 2

Of course, doing all that is not the most convenient way to create thunks. You can also change modes/model-generic.cc and add an option to tell it to include the link to the executable... (I can help with this, if you wanna go down this road).

Solution 3

Take a look at Nuitka, it's a Python compiler that's faster and makes smaller binaries than PyInstaller. I tried it with my simple script, and it works out of the box with gg (I haven't used this in real life before, it just looked promising!).

Please let me know if any of these helps!

Best,
Sadjad

drunksaint · 2020-07-28T20:17:43Z

Thanks for your help looking at this @sadjad. Both solution 1 and 3 worked for positional arguments! Solution 2 may not be required yet. I'll go down this road if I need to later. I had tried cython earlier but this was causing problems compiling larger libraries. That's why I started looking at pyinstaller. Nuitka seems to work great for larger libraries as well though. Thanks for this suggestion!

I expanded the test python file argtest.py to use a more complex combination of positional and optional arguments and this caused failure using both solution 1 & 3. In solution 1, gg was trying to read the optional arguments itself (gg-create-thunk: unrecognized option '--inputfile=i.txt') and in solution 3, i think my wrapper function is incorrect.

the command i used (i.txt and j.txt are input files whose number of lines are read):

python argtest.py 34 i.txt o.txt --arg 45 --inputfile j.txt --outputfile p.txt

my wrapper file:

#!/bin/bash
model-generic "/path/to/argtest @ @infile @outfile --inputfile=@infile --outputfile=@outfile" "$@"

is there something wrong with my wrapper file?
does gg create-thunk accept commands that use optional arguments?
is there some documentation on how to create wrappers?

sadjad · 2020-07-28T20:34:41Z

Hello there,

Glad it worked!

> is there something wrong with my wrapper file?

I think the only thing it's missing is the --arg option. You need to tell model-generic about the non-file options as well, so it can parse the whole command correctly. For example, in this case, you need to add --arg=@ to the description.

> does gg create-thunk accept commands that use optional arguments?

Yes, it does, but you need to tell it explicitly where the create-thunk options end and your arguments begin, by passing -- right before the positional arguments:

gg create-thunk \
    --value $(gg hash input.txt) \
    ...
    --output output.txt \
    -- $(gg hash binary) argtest input.txt output.txt --any-option-you-like test

> is there some documentation on how to create wrappers?

Sadly no. There are a few examples in here, and frankly, that's really all that's supported by model-generic.

drunksaint · 2020-07-28T21:24:45Z

I tried adding --arg=@. But the gg model creation gives an error.
My wrapper file:

#!/bin/bash
model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"

The error I get:

$ gg infer argtest 34 i.txt o.txt --arg 45 --inputfile j.txt --outputfile p.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpected token in description
/path/to/argtest: line 2:  3516 Aborted                 (core dumped) model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"
$ gg infer argtest 34 i.txt o.txt --arg=45 --inputfile=j.txt --outputfile=p.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpected token in description
/path/to/argtest: line 2:  3531 Aborted                 (core dumped) model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"

I can add a PR for simple documentation to use a custom binary with gg (create wrapper file, python to binary) if that helps.

sadjad · 2020-07-28T21:56:26Z

The issue was that we didn't have support for non-file positional arguments (the first @ in your arguments). I just pushed a commit that should fix that problem.

> I can add a PR for simple documentation to use a custom binary with gg (create wrapper file, python to binary) if that helps.

That would be amazing. Thank you!

drunksaint · 2020-07-28T22:23:41Z

Nice! now the thunk creation goes through. But gg force fails:

$ gg infer argtest 65 i.txt o.txt --arg=23 --inputfile=j.txt --outputfile=p.txt
$ gg force o.txt 
→ Loading the thunks...  done (0 ms).
usage: argtest [-h] [--arg ARG] [--inputfile INPUTFILE]
               [--outputfile OUTPUTFILE]
               posarg posinfile posoutfile
argtest: error: argument posinfile: can't open 'i.txt': [Errno 2] No such file or directory: 'i.txt'
std::exception
 `TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d': process exited with failure status 2
gg-force: `TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d': process exited with failure status 5

the python file if it helps. I created the binary using

python -m nuitka --follow-imports argtest.py -o argtest
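The attached file is not reproduced in this thread. Judging only from the usage message above, a script with the same command-line interface could be sketched as follows. This is an assumption-laden reconstruction: what the real script does with posarg and --arg is unknown, so the sketch parses them as integers and ignores them, and it opens the paths by hand rather than in the parser. The helper names count_lines, build_parser and main are hypothetical.

```python
import argparse
import sys


def count_lines(path):
    # Count lines by iterating over the file object.
    with open(path, "r") as f:
        return sum(1 for _ in f)


def build_parser():
    # Mirrors the usage line: three positionals plus three optional arguments.
    p = argparse.ArgumentParser(prog="argtest")
    p.add_argument("posarg", type=int)
    p.add_argument("posinfile")
    p.add_argument("posoutfile")
    p.add_argument("--arg", type=int)
    p.add_argument("--inputfile")
    p.add_argument("--outputfile")
    return p


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Assumed behaviour: each output file receives the line count of its paired input.
    with open(args.posoutfile, "w") as out:
        out.write(f"{count_lines(args.posinfile)}\n")
    if args.inputfile and args.outputfile:
        with open(args.outputfile, "w") as out:
            out.write(f"{count_lines(args.inputfile)}\n")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

With an interface like this, the thunk has two file inputs (i.txt, j.txt) and two file outputs (o.txt, p.txt), which is what the wrapper description has to convey to model-generic.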

sadjad · 2020-07-28T23:40:31Z

Could you please run gg describe TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d and post the output here?

drunksaint · 2020-07-28T23:52:33Z

$ gg describe TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d
{
 "function": {
  "hash": "VwndxS_gxNE2mcwtSLrx9tEXMvi75zyQaPl5DAZEY8PA00068380",
  "args": [
   "argtest",
   "65",
   "i.txt",
   "o.txt",
   "--arg=23",
   "@{GGHASH:VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016}",
   "--outputfile=p.txt"
  ],
  "envars": []
 },
 "values": [
  "VQJaFeszdSCpcqZ.8IO313LxfNQhtfxIAF7wcf7U2nZc0000001c",
  "VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016"
 ],
 "thunks": [],
 "executables": [
  "VwndxS_gxNE2mcwtSLrx9tEXMvi75zyQaPl5DAZEY8PA00068380"
 ],
 "outputs": [
  "p.txt",
  "o.txt"
 ],
 "links": [],
 "timeout": 0
}

$ cat o.txt 
#!/usr/bin/env gg-force-and-run
TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d#o.txt
$ cat p.txt 
#!/usr/bin/env gg-force-and-run
TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d

sadjad · 2020-07-29T03:32:15Z

It looks like there's a bug in model-generic. Look at the function.args above; it totally omitted --inputfile and also didn't convert i.txt to @{GGHASH:}. I'm gonna take a look at it and fix the issue.


sadjad · 2020-07-29T17:17:09Z

I just pushed a commit that hopefully fixes the issue!

When I was looking at the model-generic implementation, I remembered how narrow it was. It should be fine for now, but, for example, if instead of --inputfile A you pass --inputfile=A, it would not work. I'm motivated to redo the implementation to include support for all POSIX-style options, but that'll take some time :)

Please let me know if this fixes your problem.

Thank you!

drunksaint · 2020-07-29T18:49:59Z

@sadjad that would really help using gg with custom commands! :).

Your changes + replacing --inputfile=A with --inputfile A works perfectly!! Thanks for the fixes!
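The replacement of --inputfile=A with --inputfile A can be automated before the arguments reach gg infer. The following is a minimal sketch, assuming every token of the form --name=value should become two tokens; the function name split_equals_options is hypothetical and not part of gg.

```python
def split_equals_options(argv):
    """Rewrite every '--name=value' token as two tokens: '--name', 'value'."""
    out = []
    for i, tok in enumerate(argv):
        if tok == "--":
            # Everything after a bare '--' is passed through untouched.
            out.extend(argv[i:])
            break
        if tok.startswith("--") and "=" in tok:
            name, value = tok.split("=", 1)
            out.extend([name, value])
        else:
            out.append(tok)
    return out
```

For example, the arguments 65 i.txt o.txt --arg=23 --inputfile=j.txt --outputfile=p.txt become 65 i.txt o.txt --arg 23 --inputfile j.txt --outputfile p.txt.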

I tried 2 other things:

boolean flags don't seem to work right now (command --flag). I can see the problem here. Maybe something needs to be added in generic.cc as well, but I'm not sure.
i tried seeing if gg could be made to use the redirection operator > by substituting it with @ in the wrapper file. Seems like that doesn't work as expected.

$ helloworld > o.txt

associated wrapper file:

#!/bin/bash
model-generic "/path/to/helloworld @ @outfile" "$@"

model inference error:

$ gg infer helloworld > o.txt 
terminate called after throwing an instance of 'std::runtime_error'
  what():  missing positional argument
/path/to/helloworld: line 2: 19498 Aborted                 (core dumped) model-generic "/path/to/helloworld @ @outfile" "$@"

The error message seems to be related to your latest commit, so i thought it might be relevant.

I really appreciate your help with everything here. Thanks!

UPDATE: just realized that the shell is removing everything from the redirection operator. not sure what the best way to do this is.

sadjad · 2020-07-29T19:27:49Z

Awesome!

> boolean flags don't seem to work right now (command --flag). I can see the problem here. Maybe something needs to be added in generic.cc as well, but I'm not sure.

You should not include boolean flags in the description---only options with a required argument are necessary.

> i tried seeing if gg could be made to use the redirection operator > by substituting it with @ in the wrapper file. Seems like that doesn't work as expected.

The redirection operator is handled by the shell itself and is never passed to the program. So, in the case of helloworld > o.txt, the shell runs helloworld and writes its stdout to o.txt. The contract in a gg thunk is that it writes its output to a file, and then that file is grabbed by gg. Currently, there's no mechanism to directly tell gg to grab the stdout.

However, there's a trick you can play. You can wrap the command you wanna run in another script. For example:

#!/bin/sh

helloworld >o.txt

Then, create a thunk for this script, which writes its output to o.txt!
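The same trick can be sketched in Python for cases where the wrapper should be reusable across commands. This is only an illustration under assumptions: the name run_to_file is hypothetical, and like the other Python scripts in this thread it would presumably need to be compiled into a binary before gg could use it as a thunk executable.

```python
import subprocess
import sys


def run_to_file(command, out_path):
    """Run `command` (a list of argv tokens) and write its stdout to `out_path`.

    Returns the exit status of the command.
    """
    with open(out_path, "wb") as out:
        return subprocess.run(command, stdout=out).returncode


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: stdout_to_file.py OUTPUT COMMAND [ARGS...]
    sys.exit(run_to_file(sys.argv[2:], sys.argv[1]))
```

Because the output path is an ordinary argument, the thunk can declare it as an output file just as with the shell script.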

A year ago, I was trying to make gg work for simple command line programs like cat and grep that write their output to stdout, by creating a generic wrapper (iowrap). It was abandoned since, but feel free to take a look: https://github.com/sadjad/ggsh

drunksaint · 2020-07-29T22:05:49Z

> You should not include boolean flags in the description---only options with a required argument are necessary.

Nice, this works! I've added this with our whole discussion to the documentation in this pull request

> You can wrap the command you wanna run in another script.

Sounds good. I'll try this.

> A year ago, I was trying to make gg work for simple command line programs like cat and grep that write their output to stdout, by creating a generic wrapper (iowrap). It was abandoned since, but feel free to take a look: https://github.com/sadjad/ggsh

This is neat! much better than having to write wrapper commands for all scripts. I tried running it but wasn't sure how to add iowrap as a thunk. I added the files from ggsh/models to gg/src/models/wrappers and kept ggsh/iowrap in the current directory that i ran the commands from. looks like gg didn't detect the iowrap thunk or something.

$ gg infer cat i.txt
TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117

$ gg describe TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117
{
 "function": {
  "hash": "VkIXLi2AvcdLUIbAIYdr4IfjH5c.ikp.MZ4QNEELTWPY00000133",
  "args": [
   "iowrap",
   "-",
   "out",
   "cat",
   "@{GGHASH:VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016}"
  ],
  "envars": []
 },
 "values": [
  "VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016=i.txt"
 ],
 "thunks": [],
 "executables": [
  "VkIXLi2AvcdLUIbAIYdr4IfjH5c.ikp.MZ4QNEELTWPY00000133=iowrap"
 ],
 "outputs": [
  "out"
 ],
 "links": [],
 "timeout": 0
}

$ gg force out 
→ Loading the thunks...  done (0 ms).
TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117: execvpe failed
std::exception
 `TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117': process exited with failure status 1
gg-force: `TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117': process exited with failure status 5

$ gg create-thunk --value $(gg hash iowrap) --executable $(gg hash iowrap) $(gg hash iowrap) iowrap
gg-create-thunk: a thunk needs at least one output

cat especially helps with the linking step for custom commands.
Some help with how to set this up will be great. Thanks!

sadjad · 2020-07-29T23:35:03Z

> Nice, this works! I've added this with our whole discussion to the documentation in this pull request

Thank you for the pull request! I just had a peek and it looks great. Will merge it as soon as possible.

> I tried running it but wasn't sure how to add iowrap as a thunk.

You're almost there! You need to collect the iowrap file. From the directory of your program, run gg collect /path/to/iowrap to make a copy in .gg/blobs directory. Also you may need to collect your input file (i.txt) manually as well (these should be easy to fix).

The nice part is that you can pipe these commands together. For example, you can run:

gg infer sh -c 'cat i.txt | grep hello'

And it will work. (as far as I remember!)

(Unfortunately, gg infer cat i.txt | grep hello would not work. But imagine if instead of bash, there's a gg shell that understands these commands and takes care of things without having to explicitly type gg infer. That was the ultimate idea behind this ggsh thing...)

drunksaint · 2020-07-30T00:56:27Z

> You're almost there! You need to collect the iowrap file. From the directory of your program, run gg collect /path/to/iowrap to make a copy in .gg/blobs directory. Also you may need to collect your input file (i.txt) manually as well (these should be easy to fix).

Nice, it works with this fix! Piping works too! Thanks! But I'm not sure I'll be able to use it, since the modeled cat looks like it works with only one input file. I'm not sure it is possible to send an unknown number of input files to a command. If I only need cat for the final linking step, that step can be done locally.

I'm trying to parallelize a simple script. To do this, I'm splitting a file into small pieces and trying to create an output for each piece in an output directory. Outputs to the current directory work fine, but outputs to a subdirectory give an error:

$ mkdir outputdir
$ gg infer fileoutputtest outputdir/out.txt
$ gg force outputdir/out.txt 
→ Loading the thunks...  done (0 ms).
Issue in opening the Output file
std::exception
 `TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f': process died on signal 11
gg-force: `TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f': process exited with failure status 5

$ gg describe TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f
{
 "function": {
  "hash": "VGbXEAZKy6aaAzFPLtIR0m1JOnTchAJ2vw_7UJLiVe1s000020f8",
  "args": [
   "fileoutputtest",
   "o/out.txt"
  ],
  "envars": []
 },
 "values": [],
 "thunks": [],
 "executables": [
  "VGbXEAZKy6aaAzFPLtIR0m1JOnTchAJ2vw_7UJLiVe1s000020f8"
 ],
 "outputs": [
  "o/out.txt"
 ],
 "links": [],
 "timeout": 0
}

Seems like the issue is that the directory outputdir doesn't exist in the execution context. Looks like inputs can have directories since they are referred to by their hash but outputs cannot since there is no implicit directory creation in the execution context. Am I thinking about this the right way? Or is there some other way to create output files in a subdirectory?

sadjad · 2020-07-30T02:36:47Z

You're right about this. Currently the system doesn't create the output directory automatically. Although, I think you can try creating the o/ directory in your script, and then put the output file there.
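Creating the directory inside the script could look like the sketch below. It assumes the script receives the output path as an argument; the helper name open_output is hypothetical.

```python
import os


def open_output(path, mode="w"):
    """Open `path` for writing, creating its parent directory first if needed."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this safe when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    return open(path, mode)
```

A script would then use `with open_output("o/out.txt") as f:` in place of a plain open call.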

sadjad · 2020-07-30T02:42:59Z

> I'm not sure it is possible to send an unknown number of input files to a command.

This should be possible, because at the time of thunk generation, I think we know how many files we have. But I'm not sure if the current implementation of iowrap supports multiple inputs.

drunksaint · 2020-07-30T19:07:00Z

Ah i see, the gg create-thunk command can be generated dynamically. I've added multiple file support for cat to this pull request.
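Generating the command dynamically could look like the sketch below. It only assembles an argument list and does not run anything. The flags are the ones used in the create-thunk example earlier in this thread; their ordering, the helper name build_create_thunk_argv, and the hash_of parameter (a caller-supplied function, for instance one that wraps gg hash) are assumptions, and the result has not been checked against gg itself.

```python
def build_create_thunk_argv(hash_of, executable, inputs, output):
    """Return the argv for one `gg create-thunk` call covering all `inputs`.

    hash_of: caller-supplied function mapping a path to its gg hash
             (hypothetical helper, e.g. a wrapper around `gg hash`).
    """
    exe_hash = hash_of(executable)
    argv = ["gg", "create-thunk"]
    for path in inputs:
        h = hash_of(path)
        # Each input is declared as a value and linked under its own name.
        argv += ["--value", h, "--link", f"{path}={h}"]
    argv += [
        "--output", output,
        "--placeholder", output,
        "--executable", exe_hash,
        "--link", f"{executable}={exe_hash}",
        # '--' separates create-thunk's own options from the command line to run.
        "--", exe_hash, executable, *inputs, output,
    ]
    return argv
```

Since the number of inputs is known when the thunk is generated, the list of pieces produced by the split step can be passed in directly as inputs.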

I think I have a much better understanding of how gg can be used to parallelize a custom workload now. Thanks for your help with everything here! I'll close this issue.

sadjad added a commit that referenced this issue Jul 29, 2020

generic.cc: Fix a bug with mixed positional and optional arguments (#45…

1b830f5


drunksaint closed this as completed Jul 30, 2020
