Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore compatibility with docopt 0.6.2 docstrings #36

Merged
merged 14 commits into from Sep 8, 2022

Conversation

h4l
Copy link
Contributor

@h4l h4l commented Aug 27, 2022

This is my proposal to fix #33. (With the assumption that compatibility with docopt is a goal of docopt-ng.)

This PR adds two new functions, parse_docstring_sections() and parse_options(); and uses them to parse docstrings accepted by docopt 0.6.2, while retaining docopt-ng's improvements to supported syntax.

Currently, docopt-ng parses option-defaults using a strategy that was in docopt's master branch, but considered unstable by the author, and was not released in docopt. It looks for option descriptions in an "options:" section, which is ended on the first blank line. This has the side-effect that options defined in a man-page style — with blank lines in-between — are not found. Neither are options outside an options: section (docopt allows options to follow the usage with no section heading).

parse_docstring_sections() is used to separate the usage section from the rest of the docstring. The text before the usage is ignored. The usage body (without its header) is parsed for the argument pattern and the usage header with its body is used to print the usage summary help. The text following the usage is parsed for options descriptions, using parse_options(), which supports option the description syntax of both docopt and the current docopt-ng.

Note that docopt 0.6.2 recognises option descriptions in the text prior to the usage section, but this change does not, as it seems like an unintended side-effect of the previous parser's implementation, and seems unlikely to be used in practice.

The testcases have two cases added for docopt 0.6.2 compatibility.

The first commit ("Fix test for missing arg before --") could be merged separately, but my final commit's tests depends on it.

@h4l
Copy link
Contributor Author

h4l commented Aug 27, 2022

Just pushed a change to import Sequence from typing instead of collections.abc as that won't work on Python 3.7.

@h4l
Copy link
Contributor Author

h4l commented Aug 27, 2022

Related issue in the docopt repo: docopt/docopt#259

assert parse_section("usage:", "foo bar fizz buzz") == []
assert parse_section("usage:", "usage: prog") == ["usage: prog"]
assert parse_section("usage:", "usage: -x\n -y") == ["usage: -x\n -y"]
assert parse_section("usage:", usage) == [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h4l It looks like the constant usage isn't used anymore, should be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I'll get rid of it.

Copy link
Contributor Author

@h4l h4l Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • remove unused usage constant

"""
usage_pattern = r"""
# Any number of lines precede the usage section
\A(?P<before_usage>(?:.*\n)*?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the ? redundant in the *? ? can't we just have *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so this is a bit tricky. I thought the same looking at the original regexes docopt used, I assumed +? could just be *. But the ? acts as a modifier to make the * non-greedy . So the effect here is that we'll match as few lines as possible before a usage: line occurs.

Without the non-greedy operator it'd match as many lines as possible. This is actually still OK for correctness, as the match will initially consume all the lines, then backtrack until it finds a usage: line to match the usage section with, and succeed from there. However the effect of that is that the final usage: line in the doc will be used. With the non-greedy match, the first usage: section gets used.

So it only matters if a doc incorrectly has multiple usage sections. There is some code that tries to check for multiple usage sections, and raises an error if there are multiple (that's lint_docstring in this version). I just tried with the standard greedy match, and lint_docstring currently fails to notice the second usage section in this case, as it currently looks only in the usage body and the section after usage, not in the section before.

The same effect as the non-greedy match can be achieved in this case using a negative lookahead assertion in this before_usage group to not allow it to match lines containing usage:. That seems like a more clear approach, as non-greedy matching is pretty obscure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that makes sense. I think you're right, the negative lookahead is more clear, otherwise I'd want a comment here, and it would be great to not have a comment.

Copy link
Contributor Author

@h4l h4l Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • make usage_pattern regex more intuitive by using lookahead instead of non-greedy match

# Any number of lines precede the usage section
\A(?P<before_usage>(?:.*\n)*?)
# The `usage:` section header.
^(?P<usage_header>.*usage:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably overkill, but perhaps .*\busage: so that it won't match "ingredients in pork sausage:" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll do it.

""",
],
)
def test_parse_docstring_sections__reports_invalid_docstrings(invalid_docstring: str):
Copy link
Contributor

@NickCrews NickCrews Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a test case for usage: being present, but nothing after that?

"""
program summary

usage:
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, that test is now in test_lint_docstring.

# They can be indented with whitespace
[ \t]*
# The description itself starts with the short or long flag (-x or --xxx)
(-\S+?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, don't we want -\S+? We need one or more non-whitespace after the -?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story as the other with the non-greedy modifier, but looking at this again I don't think it's necessary. I think the repetition itself isn't needed. I'll get rid of it.

@NickCrews
Copy link
Contributor

@h4l sorry I just left a few thoughts but I don't think I'm done with a review yet. Let me look at this a bit more before you go and rework things, in case I find anything else

@h4l
Copy link
Contributor Author

h4l commented Sep 6, 2022

Oh ok, no probs. No rush.

-i, --interactive interactive picking
-p, --patch select hunks interactively
"""
@pytest.mark.xfail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be $ prog --interactive?

Copy link
Contributor

@NickCrews NickCrews Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always pass because no test case is actually generated.

Really, we need to fix conftest.py

def parse_test(raw: str):
raw = re.compile("#.*$", re.M).sub("", raw).strip()
if raw.startswith('"""'):
raw = raw[3:]
for fixture in raw.split('r"""'):
doc, _, body = fixture.partition('"""')
cases = []
for case in body.split("$")[1:]:
argv, _, expect = case.strip().partition("\n")
expect = json.loads(expect)
prog, _, argv = argv.strip().partition(" ")
cases.append((prog, argv, expect))
yield doc, cases

Something like

if len(cases) == 0:
    raise DocoptTestException(f"Each example must have at least one test case starting with '$'. Example: {doc}")

Do you want to add this? I can add in followup commit as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, oops. That @pytest.mark.xfail line is a mistake too — I was playing with allowing marking cases with pytest marks (to allow xfailing cases, or tagging cases to run a subset) but then I pulled that out, but forgot to remove that line. And seems like I butchered the lines around there somehow.

I can put in a check for that, yep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some error handling to conftest based on your suggestion, should be more robust against these kind of mistakes now! It also reports details of which test fails if there's a JSON syntax error in an test too.

@@ -950,8 +950,35 @@ local options: --baz
other options:
--egg
--spam
-not-an-option-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't delete this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so the reason I got rid of this is that with docopt 0.6.2 this is actually a valid way to specify an option. In docopt-ng currently, option descriptions have to have leading whitespace, but not in 0.6.2.

I could put it back with a different name, like

-option-without-whitespace

although the "docopt 0.6.2 compatibility" test below also has options without leading whitespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. I guess it follows the line starting with - or -- rule, so you're right the other tests cover this case.

But, separate from the leading whitespace thing, I'm still hung up on the
-starts-with-single-dash-but-is-long, that isn't intuitive to me if that should be a long or short option.

Testing:

-long-solo
-s-d --single--double
--double-single -d-s

results in
{"-long-solo": false, "--single--double": false, "--double-single": false}
which I think makes sense, but probably should add a test case to make it official. Can add in follow up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that there was only one dash on that option. The option description parsing code is very lenient, it just assumes the doc is formatted according to the spec, and does almost no validation. The spec does have quite a clear definition of short vs long option in general (not specifically for option descriptions):

Words starting with one or two dashes (with exception of "-", "--" by themselves) are interpreted as short (one-letter) or long options, respectively.

  • Short options can be "stacked" meaning that -abc is equivalent to -a -b -c.

And the usage pattern / argv parsers (the Pattern classes) do implement this behaviour (slight quirk with the hyphen):

>>> docopt.docopt('usage: prog [-single-dash]', []) 
{'--': False, '-a': False, '-d': False, '-e': False, '-g': False, 
 '-h': False, '-i': False, '-l': False, '-n': False, '-s': 0}

And the argv parser/matcher doesn't recognise the the single dash option:

>>> docopt.docopt('usage: prog [options]\noptions: -single-dash\n', '-single-dash')
usage: prog [options]
options: -single-dash
An exception has occurred, use %tb to see the full traceback.

I'd suggest following the behaviour implemented in the Pattern code, as it is significantly more detailed/careful than than the option default parsing code.

def test_issue_126_defaults_not_parsed_correctly_when_tabs():
section = "Options:\n\t--foo=<arg> [default: bar]"
assert parse_defaults(section) == [Option(None, "--foo", 1, "bar")]
assert parse_options(section) == [Option(None, "--foo", 1, "bar")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just add this as a test case to the cases below? Before, we didn't have this "unit" testing for parsing options, so it made sense to have this as a separate test. But now we do have very similar tests in test_parse_options, so just put this case in there too. Having it separate implies it is somehow special, but I think this is just another instance of those other cases we test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • remove redundant test test_issue_126_defaults_not_parsed_correctly_when_tabs (include its example in main options test)

@@ -950,8 +950,35 @@ local options: --baz
other options:
Copy link
Contributor

@NickCrews NickCrews Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say in one of the commit messages:

Note that docopt 0.6.2 recognises option descriptions in the text
prior to the usage section, but this change does not, as it seems like
an unintended side-effect of the previous parser's implementation, and
seems unlikely to be used in practice.

We need to change the documentation in the README that says

Every line in doc that starts with - or -- (not counting spaces) is treated as an option description, e.g.

Then we should add a test for this:

# lines before the usage section that start with --
# are not treated as options.
r"""My CLI program

--speed is ignored as an option

usage: prog --direction
"""
# Note that "--speed" is not present
$ prog --direction
{"--direction": true}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, docopt.org says "Every line that starts with "-" or "--" (not counting spaces) is treated as an option description, e.g.:" so I think we want to stick with that. People are going to be reading that spec, and it's so stable I don't want to mess with it.

That does really throw a wrench in this implementation. What do you think, re-work this implementation to allow for --options to be scattered anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a problem to allow that. We just need to include sections.before_usage in the string that gets passed to parse_options(). I think you're right, better to stick to the 0.6.2 behaviour.

Copy link
Contributor Author

@h4l h4l Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Include content in before_usage when parsing option descriptions

# The first line of the body may follow the header without a line break:
(?:
# Some non-whitespace content
[ \t]*\S.*(?:\n|\Z)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we simplify (?:\n|\Z) to [\n\Z]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can't, the \Z magic doesn't work in character classes

@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

Merging #36 (1cdb833) into master (6b8f84e) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 1cdb833 differs from pull request most recent head abf13dd. Consider uploading reports for the commit abf13dd to get more accurate results

@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
- Coverage   99.79%   99.79%   -0.01%     
==========================================
  Files           1        1              
  Lines         492      487       -5     
==========================================
- Hits          491      486       -5     
  Misses          1        1              
Impacted Files Coverage Δ
docopt/__init__.py 99.79% <100.00%> (-0.01%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

h4l added a commit to h4l/docopt-ng that referenced this pull request Sep 7, 2022
This implements Nick's suggestion from jazzband#36. When generating tests from
testcases.docopt, we now raise an error if an example doc has no
testcases following it to actually exercise the example.

Also, tests with invalid JSON testcases are reported using the same
error reporting mechanism.
@h4l
Copy link
Contributor Author

h4l commented Sep 7, 2022

CI should pass now, I fixed the 3.7/3.8 compatibility issue.

I should have covered everything, but need to discuss the -not-an-option thing and behaviour of the usage_pattern regex.

@h4l
Copy link
Contributor Author

h4l commented Sep 7, 2022

Oh, also, fixing the testcases.docopt tests revealed an option description parsing bug which turned out to be a pre-existing issue, see commit 45f2a19 here.

@h4l
Copy link
Contributor Author

h4l commented Sep 7, 2022

The simple fixes are not-yet-squashed fixup commits, I can rebase/squash to cleanup the history before merging as and when we're ready.

"""\
\t --foo ARG, -f ARG With a long

wrapped description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h4l what actually terminates an --option item? eg

--foo  Intended description.

Now I'm back to describing the usage of my program
in general. I include [default: 42] on accident
(OK this is very contrived)

should --foo have a default of 42? To me it looks like that option description ends on the single line and so it should not have that default.

Should we have the same behavior as when the usage: section terminates? eg the option description starts when we see r"(?: )|$" (as we currently have), and then continues for any number of indented lines?
Oh but then this needs to be modified to "continues for any number of indented lines as long as they don't start with - or --"...yuck

This isn't really solving a likely problem, so we could just leave it unspecified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you answered it yourself, but the description continues until a line starts with [ \t]-\S. The previous and current code just splits the entire option description text block on lines starting in that way, and everything in between becomes part of the description for the option. So it's not at all nuanced in its handling of unusual stuff.

As for if the default should apply, I've certainly written option descriptions which extend over several lines. But I would keep them indented. Seems reasonable to end a description on a non-indented line, but it'd certainly require some thought.

Like I was mentioning in the other thread, the option description parsing code is really lenient. There was talk of defining a grammar for the docopt language in another issue — even if the implementation continued to use the hand-written parser, it'd be useful to have a grammar for the specification value of what is/isn't valid.

pytest.param("", id="empty"),
pytest.param("This is a prog\n", id="1line"),
pytest.param(
"This is a prog\n\n" "Info:\n Blah blah\n\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird little " " hole in the middle of this constant

Copy link
Contributor Author

@h4l h4l Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah strange, probably due to Black formatting a wrapped string.

  • Fix split string literal

Option("-e", None, 0, False),
],
),
]
Copy link
Contributor

@NickCrews NickCrews Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All of these are the "happy cases" where there truly are options. We should add cases for non-options, such as some text --foo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I've added another case to test this.

@NickCrews
Copy link
Contributor

@h4l those fixups look great! Just a few more tiny requests. The large one ("what terminates an --option?") can be a separate PR if you want to just get this in (My head is starting to hurt trying to keep all of these changes in my head) but the small ones should be included in this PR. Squashing the fixups would be nice, but if it's a pain then I'm fine with them staying in there.

@h4l
Copy link
Contributor Author

h4l commented Sep 8, 2022

@NickCrews Great, thanks for all your help reviewing this, it's really helpful. And sorry to make things trickier with that fix.

That's the "fix: handle options with descriptions on new line" commit, right? I'd be happy to take it out for another PR. The only trouble is that one of the testcases.docopt tests fails because of it — the --bar option gets mis-parsed in this example:

r"""Usage: prog [options]

global options: --foo
local options: --baz
               --bar
other options:
 --egg
 --spam
-not-an-option-
"""

We could take out the fix and mark this test as xfail to correct later.

FWIW the change shouldn't actually change the definition of how far an option description (the part with the [default: ]) extends, it only changes where the initial part (-f , --file FILE) ends.

What's happening is that other and options: are parsed as being arguments for --bar. So bar becomes default None/null, which causes the test to fail, because it expects it to be False. The specific example in this test doesn't happen with docopt-ng master because on master the other options: line is a section heading which stops --bar from seeing it. With this PR, options: headings are no longer significant, so --bar can consume it.

But the same general issue can occur on master:

In [1]: from docopt import parse_defaults

In [2]: parse_defaults("""
   ...: options:
   ...:  -f
   ...:  BLAH BLAH --other-option
   ...: """)
Out[2]: [Option('-f', '--other-option', 1, None)]

I don't think this fix should affect backwards compatibility, as the initial part this affects was always intended be on a single line, and putting them on multiple lines only "works" using this contrived syntax (the BLAH BLAH here stops the --other-option becoming a second option definition). And if the BLAH BLAH is indented more than 1 space, it doesn't count as an argument to -f either.

h4l added a commit to h4l/docopt-ng that referenced this pull request Sep 8, 2022
This implements Nick's suggestion from jazzband#36. When generating tests from
testcases.docopt, we now raise an error if an example doc has no
testcases following it to actually exercise the example.

Also, tests with invalid JSON testcases are reported using the same
error reporting mechanism.
@h4l
Copy link
Contributor Author

h4l commented Sep 8, 2022

@NickCrews I fixed those two things. Let me know what you'd like to do with the option defaults newline fix.

I pushed a separate branch to my repo with the fixups squashed out as an example: docopt-0-6-2-compat_squashed.

It was throwing an error due to the options: section occurring in
usage:, but the test was passing because the message of the exception
was not checked.
parse_docstring_sections() divides the docstring into the main
non-overlapping sections that are to be further analysed. These are
text before the usage, the usage header ("usage:" or similar), the usage
body (" proc [options]") and text following the usage.

The intention of this partitioning is to facilitate restoring the option
description parsing behaviour of the last stable docopt release (0.6.2),
while retaining the current improved behaviour of not requiring a blank
line after the usage: section; and also removing a small quirk in the
0.6.2 parser, which is that option-defaults can occur before the usage:
section. (But this partitioning provides access to the text before the
usage section, so this behaviour could be retained if desired.)
parse_options() parses option descriptions in a way that is compatible
with docopt 0.6.2, while also compatible with docopt-ng's current
behaviour.

The differences are:

  - docopt-ng requires option descriptions to be in an "options:"
    section, docopt allows them to be anywhere outside the "usage:"
    section.
  - docopt-ng requires options descriptions have leading whitespace,
    docopt allows them to start at column 0.
  - docopt-ng allows options descriptions to begin on the same line as
    a section heading, docopt does not. e.g. `options: --foo` is OK with
    docopt-ng.

parse_options() parses options following either docopt or docopt-ng's
behaviour. Although it expects the `docstring` argument to be a
sub-section of the overall docstring, so the caller is in control of
where in the docstring options are parsed from.
This commit uses parse_docstring_sections() and parse_options() to
parse docstrings accepted by docopt 0.6.2, while retaining docopt-ng's
improvements to supported syntax.

Currently, docopt-ng parses option-defaults using a strategy that was in
docopt's master branch, but considered unstable by the author, and was
not released in docopt. It looks for option descriptions in an
"options:" section, which is ended on the first blank line. This has
the side-effect that options defined in a man-page style — with blank
lines in-between — are not found. Neither are options outside an
options: section (docopt allows options to follow the usage with no
section heading).

parse_docstring_sections() is used to separate the usage section from
the rest of the docstring. The text before the usage is ignored. The
usage body (without its header) is parsed for the argument pattern and
the usage header with its body is used to print the usage summary help.
The text following the usage is parsed for options descriptions, using
parse_options(), which supports option the description syntax of both
docopt and the current docopt-ng.

Note that docopt 0.6.2 recognises option descriptions in the text
prior to the usage section, but this change does not, as it seems like
an unintended side-effect of the previous parser's implementation, and
seems unlikely to be used in practice.

The testcases have two cases added for docopt 0.6.2 compatibility.

This fixes jazzband#33
This implements Nick's suggestion from jazzband#36. When generating tests from
testcases.docopt, we now raise an error if an example doc has no
testcases following it to actually exercise the example.

Also, tests with invalid JSON testcases are reported using the same
error reporting mechanism.
This fixes a pre-existing bug in docopt-ng that wasn't previously
triggered by a test. Options in the options: section without a
description on the same line and with content on the following line
(e.g. a line-wrapped description) would be parsed as if the content on
the following line was part of the option name & arguments.

Option.parse() now ends the option name & arguments section on the end
of the line, not just on two consecutive spaces.
It's a bit of an obscure and non-obvious regex feature. The regex uses
a negative lookahead assertion now, which is still a little odd, but
it's clearer I think.
The usage section heading now needs to have a word break before usage:,
so e.g sausage: doesn't match as a usage: section.
lint_docstring() now checks for the usage_body being empty, and fails
with a message indicating this.
parse_docstring_sections() is no longer responsible for rejecting docs
with empty usage sections. Previously it did so to some extent, but
failed to reject whitespace-only usage bodies. lint_docstring() now
rejects empty usage bodies, which allows the regex's usage_body group to
be simplified.

The removed test for empty usage bodies is tested via
test_lint_docstring.
Docopt 0.6.2 allows option descriptions anywhere in the doc, not just
after the usage: section. With this change, we parse both the section
prior to the usage: and the section after for option descriptions.
This adds another case for test_parse_options demonstrating how
parse_options() parses docs which contain both options and text that
mentions options that should not be interpreted as options.

This includes an example (the '-b' option) where we do currently parse
an option mentioned in prose as a real option. In principle we could fix
this, but it's here for now to document current behaviour. And it's not
hard to work around when writing a usage doc - just need to adjust where
a line is wrapped.
@h4l
Copy link
Contributor Author

h4l commented Sep 8, 2022

@NickCrews I've updated this PR's branch to the squashed & rebased version (same as #38) so you can merge here if you're happy.

@NickCrews NickCrews merged commit 850293b into jazzband:master Sep 8, 2022
NickCrews pushed a commit that referenced this pull request Sep 8, 2022
This implements Nick's suggestion from #36. When generating tests from
testcases.docopt, we now raise an error if an example doc has no
testcases following it to actually exercise the example.

Also, tests with invalid JSON testcases are reported using the same
error reporting mechanism.
@NickCrews
Copy link
Contributor

Brilliant, thank you so much @h4l! I'll close #38 as it was superseded by this.

OK, so now I'm trying to summarize what were the changes, to determine if version number should be bumped as breaking, and what to put in changelog:

  1. BREAKING Now any line starting with - or -- is treated as an option. This means that some things that did not used to be treated as options, now ARE treated as options:
    a. lines before usage
    b. non-indented --options
    c. lines not inside the options: section
  2. NONBREAKING now allow for blank lines between options
  3. NONBREAKING now allow options descriptions to be immediately wrapped, don't have to have a two-space separator
  4. BREAKING (?) different errors raised for empty usage section

Am I missing anything? Woudl you describe this any differently?

@h4l
Copy link
Contributor Author

h4l commented Sep 9, 2022

Fantastic, thanks for your help with this @NickCrews, I really appreciate it, especially as this turned out to be quite a significant change!

I'd want to frame the changes in the context of restoring compatibility with 0.6.2, to make sure people understand why the behaviour has changed. This also implies the changes shouldn't be backwards incompatible/breaking for people who were using original docopt.

Can also link to #33 which was fixed by this PR.


  1. This looks good.

  2. Yep.

  3. I'd phrase this a little differently — it's more of a fix for an edge case that most likely nobody would have encountered. Could be summarised as:

    Options in the options: section without an argument, with descriptions on the following line, indented with only one space, are no longer interpreted as having an argument.

    If you want to include an example, docopt-ng would previously interpret --foo as having an argument here:

    options:
        --foo
     Enable the foo behaviour. (There's one space before "Enable", and 
       "Enable" was interpreted as the argument of --foo.)
    

    Whereas with this definition, --foo would be a regular boolean option as expected:

    options:
        --foo
      Enable the foo behaviour. (2 space before "Enable".)
    

    Now, docopt-ng interprets --foo as being a regular boolean option with both of these examples .

  4. The actual exception type is the same (DocoptLanguageError), just the exception's message has changed a bit. __all__ doesn't advertise DocoptLanguageError at the moment, so I don't think this constitutes an API change from the POV of a developer using docopt-ng.
    Previously, a usage section without any content would be treated as if it didn't exist, so the error would be that no usage section was found. Now the error says that a usage section was found, but is empty.


If you want to be very strict, in the PR we also stopped allowing usage section headings without a regex word break before usage: (like your sausage: example). I don't think this really needs mentioning though.

While rebasing the other PR I notice we've also fixed #9 here. That could be phrased as:

  • docs with "options:" headings containing 3 or more words no longer fail with an AssertionError. (e.g. "Very Advanced Options:")

@h4l h4l deleted the docopt-0-6-2-compat branch September 9, 2022 10:24
@h4l
Copy link
Contributor Author

h4l commented Sep 9, 2022

Because it's still v0, under semver, the version bump could be minor if you'd rather not do v1 yet.

@NickCrews
Copy link
Contributor

Awesome, thanks for helping me wrap my head around it. I'll bump from 0.8.1 to 0.9.0
and add this summary to the CHANGELOG

@h4l
Copy link
Contributor Author

h4l commented Sep 9, 2022

Great, thanks for handling that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docopt-ng fails to parse usage string that worked with docopt
2 participants