Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper command argument parsing #661

Merged
merged 16 commits into from Jun 22, 2021
Merged

Proper command argument parsing #661

merged 16 commits into from Jun 22, 2021

Conversation

hugsy
Copy link
Owner

@hugsy hugsy commented Jun 12, 2021

Description/Motivation/Screenshots

Use argparse module to provide a lightweight way to parse command line arguments, instead of the hackish getopt we used so far.

This PR adds a new decorator, @parse_arguments() that takes 2 arguments: a dict of the required arguments, and another of optional non-positional arguments. Items of those dicts have the format { "argument_name" : default_value }: the type expected is inferred from the default value (therefore avoid using None, for strings prefer "", for ints prefer 0, etc.)

Positional (required) arguments also allow to pass a list of default values:

@parse_arguments({"foo": ["bar", "baz"]}, {"--plop": 1})
def do_invoke()...

Note that the argparsing is optional, but if we use it we must change the signature of the decorated do_invoke. Doing so allows us to retrieve the parsed arguments in kwargs["arguments"] . It is done as such

@parse_arguments({"foo": 1}, {"--bleh": ""})
def do_invoke(self, argv, *args, **kwargs):
  args = kwargs["arguments"]
  if args.foo == 1: ...

The only exception to the type inferrence comes for boolean values: doing

@parse_arguments({}, {"--plop": True})
def do_invoke()...

will not mean that kwargs["arguments"] == True by default : boolean behaves as store_true items for argparse . Therefore only their presence on the command line will set the value to True (meaning here, args.plop will be False by default, unless mycmd --plop is entered). The other way goes for @parse_arguments({}, {"--plop": False}) with store_false.

This was done on purpose to not break the existing API for do_invoke, and therefore break internal or external gef commands.

This PR closes #659 (and also closes #658 by extension).

Cheers,

How Has This Been Tested?

Architecture Yes/No Comments
x86-32 ✔️
x86-64 ✔️
ARM ✖️
AARCH64 ✔️
MIPS ✖️
POWERPC ✖️
SPARC ✖️
RISC-V ✖️
make tests ✔️

This is a pure Python change, I don't expect tests would go wrong on any other arch (still tested successfully aarch64 though).

Checklist

  • My PR was done against the dev branch, not master.
  • My code follows the code style of this project.
  • My change includes a change to the documentation, if required.
  • My change adds tests as appropriate.
  • I have read and agree to the CONTRIBUTING document.

hugsy added 11 commits June 8, 2021 17:35
* ported a few functions to use it
* updated unit tests to reflect the syntax change
* ported a few functions to use it
* updated unit tests to reflect the syntax change
- added `pattern` command
- changed the default period to 4 for better compat with `pwntools` (fixed #658)
- updated unit tests
- fixed ArgumentParser prog name (#659)
Copy link
Collaborator

@Grazfather Grazfather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty awesome. Will review more when I get clarification of how we have optional in the required parts.

docs/api.md Outdated Show resolved Hide resolved
gef.py Show resolved Hide resolved
gef.py Outdated
if argtype == int:
argtype = int_wrapper

if argname.startswith("-"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we have optional args in the required_arguments part?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have this section.

Why would we support this:

@parse_arguments({"--x": 0}, {})

AND

@parse_arguments({}, {"--x": 0})

(which do the same thing)

and

@parse_arguments({"x": 0}, {})

but not

@parse_arguments({}, {"x": 0})

We should either:

  1. Take 2 arguments. Everything in arg 1 is required, everything in arg2 is option
  2. Take 1 argument. Everything that starts with -- is optional, everything missing it is required.

Copy link
Owner Author

@hugsy hugsy Jun 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your points better, so let me clarify (keep in mind I made a lot of implementation choices on best effort towards what would be the most useful. I don't claim they are perfect 😄 ).

The approach parse_arguments( required_dict, optional_dict) was made to easily allow creating mandatory arguments: a lot of commands that don't use getopt still do some checks over argv, having required positional arguments. This was meant to (in the future) wraps all those commands easily. Basic (non-existing) example would be replacing

def do_invoke(self, argv):
   if len(argv) != 1: 
      err(...)
      return
   ...

To

@parse_arguments({"whatever", ""})
def do_invoke(self, argv, *args, **kwargs):

Also non-required arguments cannot be positional because I allowed nargs='*' which is useful (see AssembleCommand for an example). Hence the check for non-required args starting with -.

Regarding what you said about making the required dict in the format {"name" : type }, I don't like, I find it ugly and confusing. Also since I allowed the use of nargs='*' you need default values (think of the case you have 2 nargs='*'). But if @daniellimws feels the same way, I'll follow the majority.

We should either:

  1. Take 2 arguments. Everything in arg 1 is required, everything in arg2 is option

Yes that's exactly what is there now, with the fact that arg1 also accepts --args that are marked as required=true.

  1. Take 1 argument. Everything that starts with -- is optional, everything missing it is required.

No that's a simplistic approach: using nargs='*' prevents from having multiple position nargs='*' arguments (I've tried, if you find a way, I'm open). But on the other hand, you may want to pass other options that you (command developer) want required. Hence the possibility to provide required --foo.

Think of the AssembleCommand: you wanna do asm --arch=x86 nop; nop; nop; I used positional argument with nargs='*' to collect all instructions. But say I want to force the user to provide an architecture, with your approach it wouldn't work.

Hope it helps understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that helps a lot! Do we have any commands that are positional that require a - prefix? I think we could avoid that, and then simply put positional (and variadic) args in the first and optional in the second. Does that work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding what you said about making the required dict in the format {"name" : type }, I don't like, I find it ugly and confusing.

What do you mean? This is already how you did it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose (because I don't see why not 😃 ), but then if I must change the prototype of parse_arguments then let's agree for good now about the form it'd take, because I'll have to update each command and test again.

With variadic your extra code makes sense. Let's make it so that positional cannot be flags, except for boolean, basically:

disasm $pc -hex makes sense, where hex is a conditional

asm --arch=x86 nop, nop, nop makes sense, with variadic positional and one option, but in this case, since each instruction can be more than one 'word' it might make more sense to expect a string, e.g.

asm --arch=arm "mov r0, 1; mov r1, 0"

Does this arg parser correct put that whole string into one arg?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can be done. Then we proscribe the use of variadiac arguments altogether in the argparser right?
I'm cool with it, just want things to be clear and ok for all of us.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to support variadic

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make a choice here because we cannot have variadic (as in nargs='*') positional arguments.

Personally I prefer being able to type stuff like asm --arch=x86 nop; nop; nop rather than asm --arch=x86 "nop; nop; nop" (double quoted) but I can be fine with both. Does someone has a strong (argumented) opinion a way or another?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nop; nop; nop isn't a great, example. What about
mov eax, 1; xor ebx, ebx; pop rdi. Would you go --arch=x86 "mov eax, 1" "xor ebx, ebx" "pop rdi"?

If each argument is a word, then how do we get it back into 3 instructions (since we would have EIGHT args, each word)? Just do " ".join(...)?.

I am like supporting nargs = + or *, I just think that when providing asm (another language) it makes sense to provide all of the assembly as one string are, since if arg parse chops it up it'll do one arg per word, and we have to 'rebuild' it into a string anyway.

Basically I think that variadic args are good, but maybe asm isn't the best example of where it's handy.

That said, I am still OK with asm supporting your way (I just wouldn't use it that way).

gef.py Outdated Show resolved Hide resolved
gef.py Outdated Show resolved Hide resolved
gef.py Show resolved Hide resolved
gef.py Outdated Show resolved Hide resolved
gef.py Outdated Show resolved Hide resolved
gef.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@daniellimws daniellimws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the decorator easy to use, just left my opinion on the required arguments part. Additionally, do you have plans to support adding descriptions to each argument, instead of having them all in _syntax_?

Haven't ran this yet. May update more later.

gef.py Outdated
if argtype == int:
argtype = int_wrapper

if argname.startswith("-"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused as well about why the required values would have default values, because don't think they will ever be used? In that case, for required arguments, supplying the type feels more appropriate.

Also, for required arguments, shouldn't we use nargs='+' instead of 'nargs='*'. For easy reference (taken from Python docs):

'+'. Just like '*', all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present.

gef.py Show resolved Hide resolved
@hugsy
Copy link
Owner Author

hugsy commented Jun 17, 2021

Yes, that can be done. Then we proscribe the use of variadiac arguments altogether in the argparser right?
I'm cool with it, just want things to be clear and ok for all of us.

Any update? There's no reason not to merge this soon.

@daniellimws
Copy link
Collaborator

I'm ok with what we have here now. 👍

@hugsy
Copy link
Owner Author

hugsy commented Jun 20, 2021

Very cool then, I will merge later today then. This way we can continue progressively porting the other commands to it.

Thanks for your feedback guys!

Copy link
Collaborator

@Grazfather Grazfather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving comments that haven't actually been fixed so we don't accidentally merge first.

@@ -212,6 +212,17 @@ gef_on_exit_unhook
> GDB are only present in the very latest version of GDB.


```
@parse_arguments( {"required_argument_1": DefaultValue1, ...}, {"--optional-argument-1": DefaultValue1, ...} )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe show a boolean in here, e.g. "-flag": doesntmatter

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ? flag is a valid flag

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why support -flag and flag? That just makes our cli less consistent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what's the point of the test for startswith -, then?

gef.py Outdated
if argtype == int:
argtype = int_wrapper

if argname.startswith("-"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to support variadic

gef.py Outdated
parser.add_argument(argname, type=argtype, default=argvalue, nargs=nargs)

for argname in optional_arguments:
if not argname.startswith("-"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want two dashes here, then?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argparse also supports one-dash arguments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but don't you think we can'should impose some standard so that we only support a more narrow 'look' of args?

With the 'alternative' names I can see why we'd support both, e.g. --flag and -f, so I am fine with this, just think about how you want it :)

gef.py Outdated Show resolved Hide resolved
hugsy added 3 commits June 20, 2021 12:09
Copy link
Collaborator

@Grazfather Grazfather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few Qs but otherwise good to go.

gef.py Show resolved Hide resolved
@parse_arguments( {"required_argument_1": DefaultValue1, ...}, {"--optional-argument-1": DefaultValue1, ...} )
```

This decorator aims to facilitate the argument passing to a command. If added, it will use the `argparse` module to parse arguments, and will store them in the `kwargs["arguments"]` of the calling function (therefore the function **must** have `*args, **kwargs` added to its signature). Argument type is inferred directly from the default value **except** for boolean, where a value of `True` corresponds to `argparse`'s `store_true` action. For more details on `argparse`, refer to its Python documentation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that you can provide a tuple as a key (as in your example) to provide alternative arg names, unless that's what you mean by 'argument flags' below.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I forgot to mention it. Will fix.

gef.py Outdated
pbcopy = which("pbcopy")
prog = [pbcopy]
else:
raise NotImplementedError("paste: Unsupported OS")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not paste, but copy to clipboard

Copy link
Owner Author

@hugsy hugsy Jun 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make up your mind 😋 -> #661 (comment)
I literally copied what you wanted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 my bad.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do 'copy'

@hugsy hugsy merged commit fac0698 into dev Jun 22, 2021
@hugsy hugsy deleted the dev-argparse branch June 22, 2021 03:14
hugsy pushed a commit that referenced this pull request Jun 23, 2021
hugsy pushed a commit that referenced this pull request Jun 23, 2021
hugsy pushed a commit that referenced this pull request Jun 23, 2021
hugsy pushed a commit that referenced this pull request Jun 23, 2021
hugsy pushed a commit that referenced this pull request Jun 23, 2021
hugsy pushed a commit that referenced this pull request Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unified argument parsing support for commands Switch Debruijn pattern to use 4 byte period
3 participants