Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autosummary table gets confused by long / complex type hints #6311

Closed
svenevs opened this issue Apr 18, 2019 · 11 comments
Closed

autosummary table gets confused by long / complex type hints #6311

svenevs opened this issue Apr 18, 2019 · 11 comments

Comments

@svenevs
Copy link
Contributor

svenevs commented Apr 18, 2019

Describe the bug

While creating the table for .. autosummary::, the function

def mangle_signature(sig, max_chars=30):

will not produce the expected results, some type hints remain, albeit broken up. Here is the long function signature in question:

from pathlib import Path
# ...
def filter_file(path: Union[Path, str], pattern: str,
                repl: Union[Callable[[Match], str], str], count: int = 0,
                flags: int = 0, backup_extension: str = ".orig",
                line_based: bool = False, demand_different: bool = True,
                encoding: Optional[str] = None) -> Path:

In the .. autosummary:: table, the following is produced:

filter_file(path, str], pattern, repl, str], …)

The rendered docs can be found online here, but the root of the problem can be seen by the path and repl parameters (first and third parameter).

Stepping through, what actually happens is that everything just gets split on ', ':

opt_re = re.compile(r"^(.*, |)([a-zA-Z0-9_*]+)=")
while s:
m = opt_re.search(s)
if not m:
# The rest are arguments
args = s.split(', ')
break

So when we have something like Union[Path, str] or the (rather ugly) more complicated Union[Callable[[Match], str], str], splitting on ', ' will give us something like this (splitting each argument on line to make clear):

(Pdb) p args
[
    'path: Union[pathlib.Path',
    'str]',
    'pattern: str',
    'repl: Union[Callable[[Match[AnyStr]]',
    'str]',
    # ... remaining arguments ...
]

This is actually a pretty hard problem to solve with regular expressions, if it's even possible to do so (unless I'm missing something, it's kind of late xD). I think we would have to start counting how many [ we have seen and match them with ] (after having seen a :).

So I was playing around with alternatives, found a nice SO answer that has some relevant stuff, here's a rough prototype:

#!/usr/bin/env python3

import ast
import re
class HintDestroyer(ast.NodeTransformer):
    # See: https://stackoverflow.com/a/42734810/3814202
    # we only need to visit function since we only create function
    def visit_FunctionDef(self, node):
        node.returns = None
        if node.args.args:
            for arg in node.args.args:
                # just for a glimpse ...
                print("Removing annotation for {}: {}".format(arg.arg, arg.annotation.__dict__))
                arg.annotation = None
        return node

# sample input that breaks
sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"

# create a valid function signature to parse with ast module.
function = "def foo{sig}: pass\n".format(sig=sig)

# parse and transform
parsed_function = ast.parse(function)
transformed = HintDestroyer().visit(parsed_function)
# we only created a single "function", there should be a single module with a
# function as its body.
transformed_function = transformed.body[0]

# is this what we want?
args = ", ".join([item.arg for item in transformed_function.args.args])
print(args)
# output:
# path, pattern, repl, count, flags, backup_extension, line_based, demand_different, encoding

Right now I believe what args and opts in the current autosummary code are doing is trying to shove opts at the end if there is space (since there's a maximum length here). I'm not sure why this happens, since all of the keyword arguments have to be last anyway right?

Basically we can use ast and just limited join on the transformed thing if we don't care about necessarily preserving existing behavior. But I'm not sure about the implications of all this ... so thought I would open up the issue for discussion of other possible solutions or maybe downsides to use ast (it definitely doesn't come for free, but it's also a simple parse).

To Reproduce
Steps to reproduce the behavior:

$ git clone https://github.com/svenevs/ci_exec.git
$ cd ci_exec
$ tox -e docs
$ open .tox/docs/tmp/html/index.html

# ... or ...

$ pip install -r docs/requirements.txt
$ cd docs
$ make html 
$ open _build/html/index.html

Expected behavior
No stray str] end up as a "parameter" in the autosummary table.

Your project
https://github.com/svenevs/ci_exec

Screenshots

over_parse

Environment info

  • OS: linux
  • Python version: 3.7.2
  • Sphinx version: 2.0.1
  • Sphinx extensions: sphinx.ext.autosummary
  • Extra tools: n/a
@svenevs
Copy link
Contributor Author

svenevs commented Apr 19, 2019

Redacted previous comment, sorry for noise. I need to spend more time looking at the function grammar, I made assumptions about ordering that I apparently cannot. I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;) We don't need to do any transforming, but we do need to parse a def foo{sig}: pass\n like thing. I just need to go learn more about python than I expected, the rules of where *args and other things are a lot more complicated than I thought x0

@tk0miya
Copy link
Member

tk0miya commented May 6, 2019

Sorry for late response. I just posted #6344 to fix this.

I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;)

Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (a=<object object at 0x1047a60f0>). So ast module is not good for this case...

@svenevs
Copy link
Contributor Author

svenevs commented May 7, 2019

Ah interesting. I should be able to test the PR tonight but it looks good to me from inspection!

@svenevs
Copy link
Contributor Author

svenevs commented May 8, 2019

Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (a=<object object at 0x1047a60f0>). So ast module is not good for this case...

In this case, we could cheat the system if we can rely on <angle brackets> for inspected stuff. Just replace <anything> with object() or some dummy value that will satisfy ast? I think #6344 can be patched to work as needed, but I may bring this up in the context of autodoc soon too. I'll reach out to the mailing list when I figure out exactly what I am trying to accomplish, the goal would be to enable both extensions to use the same type hint stripping.

@svenevs
Copy link
Contributor Author

svenevs commented May 8, 2019

Sharing the test code that I was playing with back then for ast, the idea would be to make raw_signature available outside of mangle so that we can have a generic way to remove type hints. So maybe rename raw_signature to remove_type_hints (so that users could wield it with autodoc-process-signature if they wanted). The outstanding question is creating test cases where it's an inspected object.

As said previously, I think the easy solution is to find out how to fix #6344. I can revisit this in the future when I have more time. I just figured I'd make this code available publicly in case anybody wants to step in and see if they can get it to work with inspected objects. We'd need to pre-process sig to remove <object at 0xdeadbeef> stuff, replacing it with something that ast will be ok with e.g. object()

#!/usr/bin/env python3

import ast
import itertools


def raw_signature(sig):
    # create a valid function signature to parse with ast module.
    function = "def foo{sig}: pass\n".format(sig=sig)
    parsed = ast.parse(function)
    # parsed will have a body that is the module, with one element: the function
    parsed_function = parsed.body[0]

    # Gather all function parameter names without any annotations.  Store the
    # col_offset as the first index so that argument order can be preserved by
    # sorting.
    all_args = []  # type: List[Tuple[int, str]]
    # args.args: list of positional arguments
    if parsed_function.args.args:
        for item in parsed_function.args.args:
            all_args.append((item.col_offset, item.arg))
    # args.vararg: variadic pack (e.g., *args), * is removed from argument name
    # so we add it back.
    if parsed_function.args.vararg:
        all_args.append((
            parsed_function.args.vararg.col_offset,
            "*{0}".format(parsed_function.args.vararg.arg)
        ))
    # args.kwonlyargs: keyword only args (e.g., after a '*,')
    if parsed_function.args.kwonlyargs:
        for item in parsed_function.args.kwonlyargs:
            all_args.append((item.col_offset, item.arg))
    # args.kwarg: variadic keyword pack (e.g., **kwargs), ** is removed from
    # argument name so we add it back.
    if parsed_function.args.kwarg:
        all_args.append((
            parsed_function.args.kwarg.col_offset,
            "**{0}".format(parsed_function.args.kwarg.arg)
        ))

    # Sort based off the col_offset, but keep the name.
    ordered_arguments = [item[1] for item in sorted(all_args)]
    COMPARE = True
    if COMPARE:
        unordered = [item[1] for item in all_args]
        print("unordered: ", ", ".join(unordered))
    # TODO: this becomes a limited_join
    return ", ".join(ordered_arguments)


if __name__ == "__main__":
    print("==> no parameters:")
    print(raw_signature("()"))

    print("==> no annotations:")
    print(raw_signature("(boom, blam, blah)"))

    print("==> (stage: str, *args, **kwargs)")
    print(raw_signature("(stage: str, *args, **kwargs)"))
    # TODO: what are the rules????
    print(raw_signature("(stage: str, x=12, *args, x=11, **kwargs)"))

    print("==> *, thingy")
    print(raw_signature("(x=12, *, y=13, **kwargs)"))
    print(raw_signature("(foo: str, *, alpha: int = 2, beta: str = 'hi')"))

    sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"
    print("==> complicated example:")
    print(raw_signature(sig))

@tk0miya
Copy link
Member

tk0miya commented May 9, 2019

I found other case.

(a, b[, c]) :: (a, b[, c])
(a, b[, cxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]) :: (a, b[, ...)

I don't know where such signatures come from. But it has been supported by mangle_signature().

IMO, current approach is not good. autosummary already have raw python function (or method) object once. So we can generate better signature from it. We don't need to modify signature string.

Note: Users can override the signature of functions (or methods) by docstring when autodoc_docstring_signature is set.

@svenevs
Copy link
Contributor Author

svenevs commented May 9, 2019

@pv do you happen to know what these test cases for autosummary are?

TL;DR this thread is trying to find a way to remove [ and ] from type hints.

https://github.com/sphinx-doc/sphinx/blame/165897a74951fb03e497d6e05496ce02e897f820/tests/test_ext_autosummary.py#L43

If we removed them, would that be OK or do they serve a specific purpose? There's a couple other tests like this that don't seem to be "valid python function signatures", we're a little confused about what they represent.

@tk0miya
Copy link
Member

tk0miya commented May 11, 2019

You can do it by overriding signature via docstring (when autodoc_docstring_signature enabled) like following:

def foo():
    """foo(x[, y, z])
    docstring.
    """

I don't know why this is useful. But it is current behavior.

@pv
Copy link
Contributor

pv commented May 11, 2019

@svenevs: that's the output produced by numpy ufunc and f2py docstring generators. The notation means positional-only arguments --- at the time I think Python had not converged on a common convention on these. I'm not immediately sure if this has been changed in numpy/f2py by now.

Supplying the signature on the first line of the docstring is necessary for extension modules (eg written in C), since Python usually cannot inspect the signatures of the functions provided by them.

So, it would be useful to not break this if there's no pressing need to.

tk0miya added a commit to tk0miya/sphinx that referenced this issue May 12, 2019
tk0miya added a commit to tk0miya/sphinx that referenced this issue May 12, 2019
tk0miya added a commit to tk0miya/sphinx that referenced this issue May 12, 2019
tk0miya added a commit that referenced this issue May 13, 2019
…mplex_typehints2

Fix #6311: autosummary: autosummary table gets confused by complex type hints
@tk0miya
Copy link
Member

tk0miya commented May 13, 2019

Fixed by #6361.
Thank you for reporting.

@tk0miya tk0miya closed this as completed May 13, 2019
@svenevs
Copy link
Contributor Author

svenevs commented May 13, 2019

Yay! Thank you very much for the help and information, I learned about some interesting new features 🙂

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants