Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argparse optionals with nargs='?', '*' or '+' can't be followed by positionals #53584

Open
bethard mannequin opened this issue Jul 23, 2010 · 26 comments
Open

argparse optionals with nargs='?', '*' or '+' can't be followed by positionals #53584

bethard mannequin opened this issue Jul 23, 2010 · 26 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@bethard
Copy link
Mannequin

bethard mannequin commented Jul 23, 2010

BPO 9338
Nosy @rhettinger, @jwilk, @merwok, @berkerpeksag, @kernc, @jacksonriley, @iritkatriel
Files
  • test_pos_after_var_args.patch
  • issue9338_argparse.patch
  • issue9338_3.patch
  • test_9338.py
  • issue9338_7.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-07-23.10:46:27.420>
    labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
    title = "argparse optionals with nargs='?', '*' or '+' can't be followed by positionals"
    updated_at = <Date 2022-02-17.10:51:52.599>
    user = 'https://bugs.python.org/bethard'

    bugs.python.org fields:

    activity = <Date 2022-02-17.10:51:52.599>
    actor = 'wdoekes'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2010-07-23.10:46:27.420>
    creator = 'bethard'
    dependencies = []
    files = ['18328', '19691', '30296', '30297', '30350']
    hgrepos = []
    issue_num = 9338
    keywords = ['patch']
    message_count = 25.0
    messages = ['111270', '111280', '111282', '112511', '121013', '121720', '122016', '141639', '141660', '166237', '166239', '189198', '189200', '189470', '189472', '189867', '189868', '200858', '200862', '216990', '285866', '314213', '355748', '370265', '408332']
    nosy_count = 19.0
    nosy_names = ['rhettinger', 'bethard', 'wrobell', 'jwilk', 'eric.araujo', 'TD22057', 'wdoekes', 'catherine', 'elsdoerfer', 'Kotan', 'berker.peksag', 'paul.j3', 'phawkins', 'kernc', 'sebix', 'extmind', 'jacksonriley', 'iritkatriel', 'kottalovag']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9338'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @bethard
    Copy link
    Mannequin Author

    bethard mannequin commented Jul 23, 2010

    [From the old argparse tracker: http://code.google.com/p/argparse/issues/detail?id=20]

    You can't follow a nargs='+' optional argument with a positional argument:

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='PROG')
    >>> parser.add_argument('--badger', nargs='+')
    >>> parser.add_argument('spam')
    >>> parser.parse_args('--badger A B C D'.split())
    usage: PROG [-h] [--badger BADGER [BADGER ...]] spam
    PROG: error: too few arguments

    Ideally, this should produce:

    >>> parser.parse_args('--badger A B C D'.split())
    Namespace(badger=['A', 'B', 'C'], spam='D')

    The problem is that the nargs='+' causes the optional to consume all the arguments following it, even though we should know that we need to save one for the final positional argument.

    A workaround is to specify '--', e.g.:

    >>> parser.parse_args('--badger A B C -- D'.split())
    Namespace(badger=['A', 'B', 'C'], spam='D')

    The problem arises from the fact that argparse uses regular-expression style matching for positional arguments, but it does that separately from what it does for optional arguments.

    One solution might be to build a regular expression of the possible things a parser could match. So given a parser like::

      parser = argparse.ArgumentParser()
      parser.add_argument('-w')
      parser.add_argument('-x', nargs='+')
      parser.add_argument('y')
      parser.add_argument('z', nargs='*')

    the regular expression might look something like (where positionals have been replaced by the character A)::

    (-w A)? (-x A+)? A (-w A)? (-x A+)? A* (-w A)? (-x A+)?

    Note that the optionals can appear between any positionals, so I have to repeat their regular expressions multiple times. Because of this, I worry about how big the regular expression might grow to be for large parsers. But maybe this is the right way to solve the problem.

    @bethard bethard mannequin added the type-feature A feature request or enhancement label Jul 23, 2010
    @merwok merwok closed this as completed Jul 23, 2010
    @bethard
    Copy link
    Mannequin Author

    bethard mannequin commented Jul 23, 2010

    This is definitely a different bug from the one you just marked it as a duplicate of.

    @bethard bethard mannequin reopened this Jul 23, 2010
    @merwok
    Copy link
    Member

    merwok commented Jul 23, 2010

    I read too fast, I’m sorry for that.

    @catherine
    Copy link
    Mannequin

    catherine mannequin commented Aug 2, 2010

    Here's a unit test for the simplest cases.

    @merwok
    Copy link
    Member

    merwok commented Nov 12, 2010

    Looks good to me. Do you want to propose a code patch too, and/or more tests for non-simple cases?

    @Kotan
    Copy link
    Mannequin

    Kotan mannequin commented Nov 20, 2010

    My attached patch adds the "--" between the optionals and the arguments, if there are optionals which have variable length and at least some positional argument can be provided.

    Patch is for python 3.2 svn revision 86553.

    @merwok
    Copy link
    Member

    merwok commented Nov 21, 2010

    Patch looks good, thanks! Does it integrate Catherine’s tests?

    @merwok merwok added the stdlib Python modules in the Lib dir label Nov 21, 2010
    @wrobell
    Copy link
    Mannequin

    wrobell mannequin commented Aug 4, 2011

    is there a chance to fix this issue?

    @merwok
    Copy link
    Member

    merwok commented Aug 5, 2011

    There is. Someone wanting to help could reply to the question I asked :)

    @merwok merwok added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Aug 5, 2011
    @bethard
    Copy link
    Mannequin Author

    bethard mannequin commented Jul 23, 2012

    So Kotan's patch doesn't actually solve the original problem. Instead, it inserts the workaround into the help message of the parser. I think this is probably not the right fix. We should probably do two things:

    (1) Right now: create a documentation patch which at least explains the current limitations of argparse parsing, and describes the '--' workaround. Probably this patch should add a separate section about '--', give an example like the one in this issue, and then cross-reference this section from nargs='?', nargs='*', nargs='+' and the "Arguments containing -" section.

    (2) Longer term: create a code patch that implements the changes to the regular expression-based parsing like I've suggested.

    @bethard bethard mannequin changed the title argparse optionals with nargs='+' can't be followed by positionals argparse optionals with nargs='?', '*' or '+' can't be followed by positionals Jul 23, 2012
    @bethard
    Copy link
    Mannequin Author

    bethard mannequin commented Jul 23, 2012

    And I guess bpo-9182 is the right place for (1).

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 14, 2013

    I've played a bit the idea that barthard sketched. I don't have all the details worked out, but I believe this is what will happen:

    With

      parser = argparse.ArgumentParser()
      parser.add_argument('-w')
      parser.add_argument('-x', nargs='+')
      parser.add_argument('y')
      parser.add_argument('z', nargs='*')

    some possible parses are

    '-w 1 -x 2 3 4 5',     # w:1, x:[2,3,4], y:5, z:[] -
                           # fail +
    '-w 1 2 -x 3 4 5',     # w:1, y:2, x:[3 4 5], z:[] +
    
    '-w 1 -x 2 3',         # w:1, x:[2], y:3, z:[]     -
                           # fail +
    '-x 1 2 -w 3 4 5 6',   # w:3, x:[1,2], y:4, z:[5,6] +
                           # w:3, x:[1], y:2, z:[4,5,6] -
    
    '-x 1 2 3 4 -w 5 6 7', # w:5, x:[1,2,3,4], y:5, z:[7] +
                           # w:5, x:[1,2,3], y:4, z:[6,7] -
    
    '1 2 3 -x 4 5 -w 6',   # w:6, x:[4,5], y:1, z:[2,3] +
    

    '+' lines are those currently produced
    '-' lines are ones that would be produced by these ideas

    '-w 1 -x 2 3 4 5' is the protypical problem case. The current parser allocates all [2,3,4,5] to -x, leaving none for y, thus failing. So desired solution is to give 5 to y, leaving -x with the rest.

    '-x 1 2 -w 3 4 5 6' is a potentially ambiguous case. The current parser lets -x grab [1,2]; y then gets 4, and z the remainder. But the alternative is to give 2 to y, leaving -x with just [1].

    In this case
    arg_strings_pattern = 'OAAOAAAA'

    replacing the Os with the option flags: '-xAA-wAAAA'

    I match this with a refined version of bethard's regex:

    pat1='((?:-wA)|(?:-xA+)|(?:-wA-xA+)|(?:-xA+-wA))'
    pat = _re.compile('%s?(?P<y>A)%s?(?P<z>A*)%s?'%(pat1,pat1,pat1))
    

    groups (without the Nones) and groupdict are

    ['-xA', 'A', '-wA', 'AAA']
    {'z': 'AAA', 'y': 'A'}
    

    So this does effectively give y the 2nd argument, leaving -x with just the 1st.

    The current parser effectively groups the arguments as

    ['-xAA, '-wA', 'A', 'AA']
    

    In the real world, generating and apply a global pattern like this could get complicated. For example there are long option names ('--water'), and combined argument strings ('-x1', '-x=1').

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 14, 2013

    I need to make one correction to my last post:

    '-x 1 2 -w 3 4 5 6',   # w:3, x:[1,2], y:4, z:[5,6] +
                           # w:3, x:[1], y:2, z:[4,5,6] -
    

    The second solution is only possible if 'z' is not consumed when 'y' is being processed. In current version, if consume_positionals() is called with a 'AOAAAA' pattern, 'y' will match the first 'A', and 'z' will match ''. That means '4 5 6' will be left over.

    It's only when I use the patch in http://bugs.python.org/issue14191#msg187051
    (argparse doesn't allow optionals within positionals)
    that the processing 'z' is delayed, so it can get [4,5,6].

    So at least with the 4 arguments in this example, bethard's idea only seems to make a difference in the case of '-w 1 -x 2 3 4 5', where 'y' lays claim to the last string, and '-x' gets the rest.

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 17, 2013

    This patch implements, I think, the ideas bethard proposed. It is test patch, not intended for production.

    Most of work is in ArgumentParser._get_alt_length() which

    • generates a pattern along the lines bethard proposed
    • generates a string like arg_strings_pattern, but with optionals strings ('-x') instead of 'O'.
    • runs a match
    • from groups like '-xAAA', creates dict entries like:
      alt_opt_length['x'] = 3

    Later, in consume_optionals(), this alternative count replaces arg_count if it is lower. The next consume_positionals() then takes care of consuming the unconsumed arguments.

    If _get_alt_length() has any problems, it logs an error, and returns an otherwise empty dict. So it 'fails' quietly without affecting regular parsing.

    Reasons for failing include (for now) the use of subparsers, optionals with explicit args, and special prefix_chars. With exclusions like this, test_argparse.py runs without errors or failures.

    Since this is still a testing vehicle, it writes an bpo-9338.log file with debugging entries.

    This version works, but is both not sufficiently general and too general. As bethard notes, the testing pattern could get very large if there are many optionals. Ideally the pattern will allow the optionals in any order and combination between positionals. The ambiguities that I discussed in the previous 2 posts disappear if the patching pattern is sufficiently general.

    But I also suspect it is too general. It does not need to match every case, just those where an optional is consuming arguments that should go to a positional. But if we come up with something more specific, this could still be a useful testing tool.

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 17, 2013

    This is a test file for the patch I just submitted.

    It is not a formal unitttest, but uses print output as much as assert.

    Cases include the example bethard used, as well as ones from test_argparse.py that initially caused problems.

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 23, 2013

    Here's another approach to the problem, using an iterative localized search. For simple cases it produces the same thing, but in complex cases it is more robust.

    It is based on two ideas:

    • if the action in consume_optional() is being 'greedy', use
        slots = self._match_arguments_partial([action]+positionals, selected_patterns)

    to determine if this action can share arguments with any of the remaining positionals. This is similar to how consume_positionals() allocates arguments to the set of positionals.

    • try this 'sharing' with the last optional. If that is not enough, try the penultimate optional as well, and continue working toward the start as needed.

    Since the normal parsing is from left to right, figuring out when to start 'sharing' requires some sort of search strategy. I have moved the start_index loop into a consume_loop() function, and added a switch so it can parse the arguments without invoking take_action() (so arguments are not evaluated, and namespace is not changed).

    If there is a suspected 'greed' problem, consume_loop() is called (in test mode) one or more times to determine the right-most optionals to use, and once more (with take_action) to parse and evaluate the arguments.

    As in the previous patch this writes a log file for debugging purposes. test_argparse.py now has a number of tests for this issue.

    It is more robust than the previous patch, and does not need special handling for things like subparsers and explicit arguments.

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented May 23, 2013

    Oops, I attached the wrong file. Here's the correct one.

    @phawkins
    Copy link
    Mannequin

    phawkins mannequin commented Oct 22, 2013

    I ran into this bug the first time I needed nargs + in a tool.

    I found of course that if the option with the nargs is followed by another option before the positional arguments it will work as expected. But then the help would have to point this out, and it still could be used incorrectly (so then I get a mail about a bug in my tool.) ;-)>

    My workaround was to use action=append instead of nargs, then user would just have to give the option for each nargs desired. Since my use would be short this was OK. But the usage message does not reflect the multiple use nature of this option

    But what I expected to find in the doc was a way to specify the use of a separator char between the nargs option arguments. For example specify that ',' is the separator arg (currently a space is the separator.) So if option is -foo the cli could be:

    myprog.py -foo bar1,bar2,bar3 pos1 pos2

    (Of course I could just have the tool take a comma delimited single argument and parse it in the tool's logic, but again then a custom usage message would be needed.)

    Has this solution been considered?

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented Oct 22, 2013

    parse_args() would see ['-foo', 'bar1,bar2,bar3', 'pos1', 'pos2']. The splitting on space is done by the shell. So having your own code split 'bar1,bar2,bar3' is simplest. But that would be messed up if the user entered 'bar1, bar2, bar3...'. You could also ask the user to use quotes - "bar1, bar2, bar3".

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented Apr 22, 2014

    http://bugs.python.org/issue15112
    breaks one test that I added to issue

    +class TestPositionalsAfterOptionalsPlus(ParserTestCase):
    +    """Tests specifying a positional that follows an arg with nargs=+
    +    http://bugs.python.org/issue9338#msg111270
    +    prototypical problem"""
    +
    +    argument_signatures = [
    +        Sig('-w'),
    +        Sig('-x', nargs='+'),
    +        Sig('y', type=int),
    +        Sig('z', nargs='*', type=int)]
    +    failures = ['1 -x 2 3 -w 4 5 6' # error: unrecognized arguments: 5 6
    +                # z consumed in 1st argument group '1'
    +    ]
    

    This no longer fails. Due to 15112, z=[5,6]. That is, it is no longer consumed by the 1st argument group.

    @paulj3
    Copy link
    Mannequin

    paulj3 mannequin commented Jan 20, 2017

    Recent StackOverFlow question related to this issue - where the following positional is a subparsers.

    http://stackoverflow.com/questions/41742205/how-to-argparse-with-nargs-and-subcommands

    @TD22057
    Copy link
    Mannequin

    TD22057 mannequin commented Mar 21, 2018

    Is there any chance this will ever get fixed? Patches have been available for 5 years with no progress.

    @jacksonriley
    Copy link
    Mannequin

    jacksonriley mannequin commented Oct 31, 2019

    I'm a newcomer and thought about trying to follow up on this and potentially update existing patches, would this be a good idea?

    @rhettinger
    Copy link
    Contributor

    Paul, do you have any thoughts on this one.

    ISTM that this issue isn't well suited for a newcomer because it's somewhat complex and it isn't clear what if anything should be done.

    @iritkatriel
    Copy link
    Member

    Reproduced on 3.11:

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='PROG')
    >>> 
    >>> parser.add_argument('--badger', nargs='+')
    _StoreAction(option_strings=['--badger'], dest='badger', nargs='+', const=None, default=None, type=None, choices=None, help=None, metavar=None)
    >>> parser.add_argument('spam')
    _StoreAction(option_strings=[], dest='spam', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
    >>> parser.parse_args('--badger A B C D'.split())
    usage: PROG [-h] [--badger BADGER [BADGER ...]] spam
    PROG: error: the following arguments are required: spam

    @iritkatriel iritkatriel added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Dec 11, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    adang1345 added a commit to adang1345/delvewheel that referenced this issue Sep 9, 2022
    This reverts commit cf92927.
    Due to python/cpython#53584, it is not very
    user-friendly to have an optional argument with `nargs='?'` be followed
    by a positional.
    
    Fix #31
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Nov 24, 2022
    …ment
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 10 years. python/cpython#53584
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Nov 24, 2022
    …ment
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 12 years. python/cpython#53584
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Nov 24, 2022
    …ment
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 12 years. python/cpython#53584
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Nov 24, 2022
    …ment
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 12 years. python/cpython#53584
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Dec 1, 2022
    …ment
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 12 years. python/cpython#53584
    pmhahn pushed a commit to univention/univention-corporate-server that referenced this issue Dec 1, 2022
    …nivention-app configure
    
      This was done with text replacement, because this is a bug from argparse
      and it was not fixed in the last 12 years. python/cpython#53584
    makkes added a commit to makkes/repo-stat-tools that referenced this issue Feb 13, 2023
    This deliberately doesn't use `nargs` because of
    python/cpython#53584 where the interpreter
    can't distinguish additional arguments from positional arguments.
    Separating values by comma also seems like a more common way to pass
    multiple values.
    @hchau630
    Copy link

    any updates on this? I need to use subparsers as well as an optional argument with nargs='*', which is impossible due to this bug.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: Bugs
    Development

    No branches or pull requests

    4 participants