Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional arguments false positive #30

Closed
ghjklw opened this issue Jun 23, 2023 · 20 comments · Fixed by #58
Closed

Optional arguments false positive #30

ghjklw opened this issue Jun 23, 2023 · 20 comments · Fixed by #58

Comments

@ghjklw
Copy link

ghjklw commented Jun 23, 2023

Both of these examples trigger an error DOC105: Function test: Argument names match, but type hints do not match:

def test(var1: Optional[str] = None) -> str:
    """Test function.

    Args:
        var1 (str, optional): var1


    Returns:
        str
    """
def test(var1: str | None = None) -> str:
    """Test function.

    Args:
        var1 (str, optional): var1


    Returns:
        str
    """

Changing the var1 type to str | None solves it, but this is a bit redundant and, as far as I know, inconsistent with how most other tools work. You can see for example the function module_level_function from Sphinx napoleon documentation.

@jsh9
Copy link
Owner

jsh9 commented Jun 24, 2023

Hi @ghjklw,

I understand your point. However, I am a bit concerned that this could become a slippery slope.

For example, if pydoclint allows str, optional to be equivalent to Optional[str], shall it also allow :obj:`str`, optional (what the Sphinx doc uses)? And what about other variations, such as str or None, str or `None` , str or ``None`` ?

It is possible that, with lots of bespoke logic, pydoclint can support all these variations, but I'm also a bit concerned that parsing these variations will make pydoclint a lot slower.

@jsh9
Copy link
Owner

jsh9 commented Jun 24, 2023

Another example to show that things could become intractable is a type hint like this:

Optional[Union[int, float, Optional[bool], List[str]]]

To "translate" this to pseudo natural language, it would have to be:

"int or float or (bool, optional) or list of str, optional"

😅

@real-yfprojects
Copy link
Contributor

Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Which could be written in pseudo natural language like this:

int, float, bool or list of str, optional

While I think pydoclint should support the syntaxes for optional arguments as defined in the official specifications, I don't think it should bother to support things like list of str. However, one could discuss supporting natural language lists like int, float or bool in another issue.

@jsh9
Copy link
Owner

jsh9 commented Jun 25, 2023

First of all, I checked the official Google Python style guide, and I didn't see Google recommending/enforcing any guidance for writing type hints in docstrings.

The example shown in the Sphinx documentation is Sphinx's style ("str, optional"). It's a fine style, but not quite practical for a linter.

There is only one way to specify type before Python 3.10, and a new way since Python 3.10. This standardization simplifies people's lives. At least, I personally only need to remember that "docstring's type hint should match the signature's type hint".

On the contrary, allowing natural languages can introduce ambiguities, variations, and people forgetting what the rules are.

Let me give you another example:

int | float | bool | str

Shall we allow int, float, bool, or str or int, float, bool or str or both? The former contains the Oxford comma and the latter does not.

@ghjklw
Copy link
Author

ghjklw commented Jun 27, 2023

Hi!

I now realise my initial comment was a bit misleading! I actually didn't understand str, optional as pseudo natural language nor as an equivalent to Optional[str] or str | None. I understood it as a special keyword to indicate that a keyword argument is not required / has a default value.

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:

copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested

I couldn't however find a good example to determine whether the type in the docstring should be var1 (str, optional) or var1 (str | None, optional). I personally think the second one makes more sense (and works fine as-is with Pydoclint) and is less ambiguous (it's not obvious with the first option that None is a valid value for var1)

@ghjklw
Copy link
Author

ghjklw commented Jun 27, 2023

Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Putting aside the question of the natural language representation of types (which I think, if desirable, should be tracked as an independent issue), this raises another interesting point:

from typing import Optional


def example1(var: str | None) -> None:
    """Test function 1.

    Args:
        var (str | None): Variable description.
    """


def example2(var: str | None) -> None:
    """Test function 2.

    Args:
        var (str|None): Variable description.
    """


def example3(var: Optional[str]) -> None:
    """Test function 3.

    Args:
        var (str | None): Variable description.
    """

Raises the following validation errors:

    12: DOC105: Function `example2`: Argument names match, but type hints do not match 
    20: DOC105: Function `example3`: Argument names match, but type hints do not match 

I guess that's because types are compared as strings, but it is also possible to do:

>>> (str | None) == (str|None)
True
>>> (str | None) == Optional[str]
True
>>> Optional[Union[int, float, Optional[bool], List[str]]] == (int | float | bool | List[str] | None)
True

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

@real-yfprojects
Copy link
Contributor

real-yfprojects commented Jun 27, 2023

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

It definitely be a desirable feature but I imagine it could be quite difficult to implement.

@jsh9
Copy link
Owner

jsh9 commented Jun 28, 2023

Interesting!

I didn't know that we can directly compare type hints (str | None == str|None). TIL.

@ghjklw you are correct that the comparison of types in pydoclint is currently done via string comparison.

Here's a code snippet:

def __eq__(self, o: 'Arg') -> bool:
if not isinstance(o, Arg):
return False
return self.name == o.name and self._eq(self.typeHint, o.typeHint)

pydoclint only does static syntax analysis (more in what I mentioned in this comment: #35 (comment)). Doing "dynamic" code analysis on the full source code is most likely why darglint is slow.

That said, dynamically parsing type hint strings may not be that slow.

This is something I can do right now:

exec('left = str | None')
exec('right = str|None')
assert left == right

However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

@real-yfprojects
Copy link
Contributor

For example, the argument type is a custom class that needs to be quoted.

What do you mean by that? No type hints have to be quoted and why would that be a problem?

@jsh9
Copy link
Owner

jsh9 commented Jun 29, 2023

No type hints have to be quoted and why would that be a problem?

See this example:

def __lt__(self, other: 'Arg') -> bool:

@real-yfprojects
Copy link
Contributor

from __future__ import annotations doesn't fix that?

@ghjklw
Copy link
Author

ghjklw commented Jun 29, 2023

exec('left = str | None')
exec('right = str|None')
assert left == right

However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

Putting aside the security risk of using exec (probably acceptable in this case, but I might be wrong), I think there's possibly an even greater challenge than quoted type hints: the types could come from a large number of places: standard libraries, imports, wildcard imports, objects defined in the same script, libraries specific to the environment you're using (e.g. Spark when working in an environment like Databricks or Azure Synapse), dyamically created types or typevars... The more I think about it, the more I'm afraid this again comes back to the ast vs inspect approach...

@jsh9
Copy link
Owner

jsh9 commented Jun 29, 2023

Oh wow, I didn't know about from __future__ import annotations. TIL.

Naturally, there will be people like me who don't know about from __future__ import annotations and thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

I agree with @ghjklw 's comment above; there may be too many edge cases that we can't possibly think of.

That said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

@real-yfprojects
Copy link
Contributor

real-yfprojects commented Jun 29, 2023

Thinking about it I only see the following cases where it might make sense to have a docstring type that doesn't match the type hint.

  • Type Aliases (In case of a library exposing an API, they make sense to be used as a type hint but might not make sense to be use in docstrings read by the users of the API)
  • Type Variables, ParamSpec, TypeVarTuple (I am very unsure about the best way to document those in the context of an API)
  • White spaces
  • Optional[] (However you could use from __futures__ import annotations instead)
  • Union[] (same as above)
  • Dict[], Set[], List[], ... (same as above)
  • Literal (to improve readability)
  • Annotated
  • TypeGuard (basically an alias for bool)
  • Imported types (If I import a type with from foo import T, I would want to document it as foo.T)

@real-yfprojects
Copy link
Contributor

thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

The type system will use typing.ForwardRef to represent those. For string to string comparisons you can simply strip any quotes (outside Literal[]).

hat said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

I don't know of a type hint that doesn't separate two types by either , or |. So that should be possible.

@ghjklw
Copy link
Author

ghjklw commented Jun 29, 2023

A bit hacky, but can probably be improved and certainly more robust/generic than just ignoring spaces:

ast.dump(ast.parse("str|None")) == ast.dump(ast.parse("str | None"))

@jsh9
Copy link
Owner

jsh9 commented Jun 30, 2023

I already have a method to remove quotes from the type hint:

@classmethod
def _stripQuotes(cls, string: str) -> str:
return string.replace('"', '').replace("'", '')

And I think using ast.dump(ast.parse(...)) is a good idea.

With _stripQuotes() and ast.dump(ast.parse(...)), we can at least tackle the following cases:

  • str | int | None vs str|int|None
  • 'MyClass' | None vs 'MyClass'|None
  • Union[int, bool] vs Union[int,bool]
  • ...

But ast.dump(ast.parse('str | None')) == ast.dump(ast.parse('Optional[str]')) is still False, simply because ast doesn't actually know what | does and what Optional means -- it's a static syntax parser and that's why it's fast.

@jsh9
Copy link
Owner

jsh9 commented Jun 30, 2023

Hi @ghjklw , I forgot to address your previous point above

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:

copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested>

I used to write , default=True after bool, but darglint reported it as a style violation ("type hints don't match"), so I trained myself to put "default = True` in the argument description.

And now in pydoclint, I'm actually "inheriting" that behavior from darglint.

I think there are 2 new features that I can potentially add:

  1. For numpy-style docstrings, parse the string "bool, default: True" to ignore , default: True, so that people don't have to move it away from the type hint
  2. Parse the default value from the docstring and compare that with the default value in the signature

No. 2 is a slippery slope, because rule-based natural language parsers are never easy to implement. Too many edge cases. I can already think of some:

  • arg1 : int, default is 1
  • arg1 : int, default value: 1
  • arg1 : int (default = 1)
  • arg1 : bool = True

@jsh9
Copy link
Owner

jsh9 commented Jul 20, 2023

Update: I haven't got a chance to get to it yet. But this issue is still on my mind.

@jsh9
Copy link
Owner

jsh9 commented Aug 19, 2023

Hi @ghjklw , I found that this PR (#58) actually fixed the issue of comparing str|int|bool with str | int | bool (only spaces are different).

However, I still think that to consider str | int and Union[str, int] equal, the implementation may be too cumbersome and fragile.

Therefore, I'll label str | int vs Union[str, int] as "may tackle in the future", and close this issue for now.

But if you have any additional comments or suggestions, please feel free to add comments below, or re-open this issue.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants