Optional arguments false positive #30

ghjklw · 2023-06-23T10:20:55Z

Both of these examples trigger an error DOC105: Function test: Argument names match, but type hints do not match:

def test(var1: Optional[str] = None) -> str:
    """Test function.

    Args:
        var1 (str, optional): var1


    Returns:
        str
    """

def test(var1: str | None = None) -> str:
    """Test function.

    Args:
        var1 (str, optional): var1


    Returns:
        str
    """

Changing the var1 type to str | None solves it, but this is a bit redundant and, as far as I know, inconsistent with how most other tools work. You can see for example the function module_level_function from Sphinx napoleon documentation.

The text was updated successfully, but these errors were encountered:

jsh9 · 2023-06-24T09:49:26Z

Hi @ghjklw,

I understand your point. However, I am a bit concerned that this could become a slippery slope.

For example, if pydoclint allows str, optional to be equivalent to Optional[str], shall it also allow :obj:`str`, optional (what the Sphinx doc uses)? And what about other variations, such as str or None, str or `None` , str or ``None`` ?

It is possible that, with lots of bespoke logic, pydoclint can support all these variations, but I'm also a bit concerned that parsing these variations will make pydoclint a lot slower.

jsh9 · 2023-06-24T09:59:57Z

Another example to show that things could become intractable is a type hint like this:

Optional[Union[int, float, Optional[bool], List[str]]]

To "translate" this to pseudo natural language, it would have to be:

"int or float or (bool, optional) or list of str, optional"

😅

real-yfprojects · 2023-06-24T13:37:19Z

Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Which could be written in pseudo natural language like this:

int, float, bool or list of str, optional

While I think pydoclint should support the syntaxes for optional arguments as defined in the official specifications, I don't think it should bother to support things like list of str. However, one could discuss supporting natural language lists like int, float or bool in another issue.

jsh9 · 2023-06-25T00:29:55Z

First of all, I checked the official Google Python style guide, and I didn't see Google recommending/enforcing any guidance for writing type hints in docstrings.

The example shown in the Sphinx documentation is Sphinx's style ("str, optional"). It's a fine style, but not quite practical for a linter.

There is only one way to specify type before Python 3.10, and a new way since Python 3.10. This standardization simplifies people's lives. At least, I personally only need to remember that "docstring's type hint should match the signature's type hint".

On the contrary, allowing natural languages can introduce ambiguities, variations, and people forgetting what the rules are.

Let me give you another example:

int | float | bool | str

Shall we allow int, float, bool, or str or int, float, bool or str or both? The former contains the Oxford comma and the latter does not.

ghjklw · 2023-06-27T10:33:26Z

Hi!

I now realise my initial comment was a bit misleading! I actually didn't understand str, optional as pseudo natural language nor as an equivalent to Optional[str] or str | None. I understood it as a special keyword to indicate that a keyword argument is not required / has a default value.

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:

If it is not necessary to specify a keyword argument, use optional:
x : int, optional
Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:
Description of parameter `x` (the default is -1, which implies summation
over all axes).
or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:
copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested

I couldn't however find a good example to determine whether the type in the docstring should be var1 (str, optional) or var1 (str | None, optional). I personally think the second one makes more sense (and works fine as-is with Pydoclint) and is less ambiguous (it's not obvious with the first option that None is a valid value for var1)

ghjklw · 2023-06-27T11:02:20Z

Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Putting aside the question of the natural language representation of types (which I think, if desirable, should be tracked as an independent issue), this raises another interesting point:

from typing import Optional


def example1(var: str | None) -> None:
    """Test function 1.

    Args:
        var (str | None): Variable description.
    """


def example2(var: str | None) -> None:
    """Test function 2.

    Args:
        var (str|None): Variable description.
    """


def example3(var: Optional[str]) -> None:
    """Test function 3.

    Args:
        var (str | None): Variable description.
    """

Raises the following validation errors:

    12: DOC105: Function `example2`: Argument names match, but type hints do not match 
    20: DOC105: Function `example3`: Argument names match, but type hints do not match

I guess that's because types are compared as strings, but it is also possible to do:

>>> (str | None) == (str|None)
True
>>> (str | None) == Optional[str]
True
>>> Optional[Union[int, float, Optional[bool], List[str]]] == (int | float | bool | List[str] | None)
True

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

real-yfprojects · 2023-06-27T15:17:35Z

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

It definitely be a desirable feature but I imagine it could be quite difficult to implement.

jsh9 · 2023-06-28T09:32:32Z

Interesting!

I didn't know that we can directly compare type hints (str | None == str|None). TIL.

@ghjklw you are correct that the comparison of types in pydoclint is currently done via string comparison.

Here's a code snippet:

pydoclint/pydoclint/utils/arg.py

Lines 31 to 35 in 1a3596f

    
           def __eq__(self, o: 'Arg') -> bool: 
        
               if not isinstance(o, Arg): 
        
                   return False 
        
               return self.name == o.name and self._eq(self.typeHint, o.typeHint)

pydoclint only does static syntax analysis (more in what I mentioned in this comment: #35 (comment)). Doing "dynamic" code analysis on the full source code is most likely why darglint is slow.

That said, dynamically parsing type hint strings may not be that slow.

This is something I can do right now:

exec('left = str | None')
exec('right = str|None')
assert left == right

However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

real-yfprojects · 2023-06-28T15:41:08Z

For example, the argument type is a custom class that needs to be quoted.

What do you mean by that? No type hints have to be quoted and why would that be a problem?

jsh9 · 2023-06-29T06:03:40Z

No type hints have to be quoted and why would that be a problem?

See this example:

pydoclint/pydoclint/utils/arg.py

Line 37 in 1a3596f

def __lt__(self, other: 'Arg') -> bool:

real-yfprojects · 2023-06-29T07:03:50Z

from __future__ import annotations doesn't fix that?

ghjklw · 2023-06-29T07:04:17Z

exec('left = str | None')
exec('right = str|None')
assert left == right
However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

Putting aside the security risk of using exec (probably acceptable in this case, but I might be wrong), I think there's possibly an even greater challenge than quoted type hints: the types could come from a large number of places: standard libraries, imports, wildcard imports, objects defined in the same script, libraries specific to the environment you're using (e.g. Spark when working in an environment like Databricks or Azure Synapse), dyamically created types or typevars... The more I think about it, the more I'm afraid this again comes back to the ast vs inspect approach...

jsh9 · 2023-06-29T07:18:37Z

Oh wow, I didn't know about from __future__ import annotations. TIL.

Naturally, there will be people like me who don't know about from __future__ import annotations and thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

I agree with @ghjklw 's comment above; there may be too many edge cases that we can't possibly think of.

That said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

real-yfprojects · 2023-06-29T07:24:58Z

Thinking about it I only see the following cases where it might make sense to have a docstring type that doesn't match the type hint.

Type Aliases (In case of a library exposing an API, they make sense to be used as a type hint but might not make sense to be use in docstrings read by the users of the API)
Type Variables, ParamSpec, TypeVarTuple (I am very unsure about the best way to document those in the context of an API)
White spaces
Optional[] (However you could use from __futures__ import annotations instead)
Union[] (same as above)
Dict[], Set[], List[], ... (same as above)
Literal (to improve readability)
Annotated
TypeGuard (basically an alias for bool)
Imported types (If I import a type with from foo import T, I would want to document it as foo.T)

real-yfprojects · 2023-06-29T07:28:00Z

thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

The type system will use typing.ForwardRef to represent those. For string to string comparisons you can simply strip any quotes (outside Literal[]).

hat said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

I don't know of a type hint that doesn't separate two types by either , or |. So that should be possible.

ghjklw · 2023-06-29T07:43:52Z

A bit hacky, but can probably be improved and certainly more robust/generic than just ignoring spaces:

ast.dump(ast.parse("str|None")) == ast.dump(ast.parse("str | None"))

jsh9 · 2023-06-30T06:10:32Z

I already have a method to remove quotes from the type hint:

pydoclint/pydoclint/utils/arg.py

Lines 96 to 98 in 1a3596f

    
           @classmethod 
        
           def _stripQuotes(cls, string: str) -> str: 
        
               return string.replace('"', '').replace("'", '')

And I think using ast.dump(ast.parse(...)) is a good idea.

With _stripQuotes() and ast.dump(ast.parse(...)), we can at least tackle the following cases:

str | int | None vs str|int|None
'MyClass' | None vs 'MyClass'|None
Union[int, bool] vs Union[int,bool]
...

But ast.dump(ast.parse('str | None')) == ast.dump(ast.parse('Optional[str]')) is still False, simply because ast doesn't actually know what | does and what Optional means -- it's a static syntax parser and that's why it's fast.

jsh9 · 2023-06-30T06:30:45Z

Hi @ghjklw , I forgot to address your previous point above

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:
If it is not necessary to specify a keyword argument, use optional:
x : int, optional
Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:
Description of parameter `x` (the default is -1, which implies summation
over all axes).
or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:
copy : bool, default True
copy : bool, default=True
copy : bool, default: True
In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested>

I used to write , default=True after bool, but darglint reported it as a style violation ("type hints don't match"), so I trained myself to put "default = True` in the argument description.

And now in pydoclint, I'm actually "inheriting" that behavior from darglint.

I think there are 2 new features that I can potentially add:

For numpy-style docstrings, parse the string "bool, default: True" to ignore , default: True, so that people don't have to move it away from the type hint
Parse the default value from the docstring and compare that with the default value in the signature

No. 2 is a slippery slope, because rule-based natural language parsers are never easy to implement. Too many edge cases. I can already think of some:

arg1 : int, default is 1
arg1 : int, default value: 1
arg1 : int (default = 1)
arg1 : bool = True

jsh9 · 2023-07-20T07:58:47Z

Update: I haven't got a chance to get to it yet. But this issue is still on my mind.

jsh9 · 2023-08-19T07:21:30Z

Hi @ghjklw , I found that this PR (#58) actually fixed the issue of comparing str|int|bool with str | int | bool (only spaces are different).

However, I still think that to consider str | int and Union[str, int] equal, the implementation may be too cumbersome and fragile.

Therefore, I'll label str | int vs Union[str, int] as "may tackle in the future", and close this issue for now.

But if you have any additional comments or suggestions, please feel free to add comments below, or re-open this issue.

Thanks!

jsh9 added the may-tackle-in-the-future label Aug 19, 2023

jsh9 mentioned this issue Aug 12, 2023

Improve type hint comparison algorithm #58

Merged

jsh9 closed this as completed Aug 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optional arguments false positive #30

Optional arguments false positive #30

ghjklw commented Jun 23, 2023

jsh9 commented Jun 24, 2023

jsh9 commented Jun 24, 2023

real-yfprojects commented Jun 24, 2023

jsh9 commented Jun 25, 2023

ghjklw commented Jun 27, 2023

ghjklw commented Jun 27, 2023

real-yfprojects commented Jun 27, 2023 •

edited

Loading

jsh9 commented Jun 28, 2023

real-yfprojects commented Jun 28, 2023

jsh9 commented Jun 29, 2023

real-yfprojects commented Jun 29, 2023

ghjklw commented Jun 29, 2023

jsh9 commented Jun 29, 2023

real-yfprojects commented Jun 29, 2023 •

edited

Loading

real-yfprojects commented Jun 29, 2023

ghjklw commented Jun 29, 2023

jsh9 commented Jun 30, 2023 •

edited

Loading

jsh9 commented Jun 30, 2023

jsh9 commented Jul 20, 2023

jsh9 commented Aug 19, 2023

Optional arguments false positive #30

Optional arguments false positive #30

Comments

ghjklw commented Jun 23, 2023

jsh9 commented Jun 24, 2023

jsh9 commented Jun 24, 2023

real-yfprojects commented Jun 24, 2023

jsh9 commented Jun 25, 2023

ghjklw commented Jun 27, 2023

ghjklw commented Jun 27, 2023

real-yfprojects commented Jun 27, 2023 • edited Loading

jsh9 commented Jun 28, 2023

real-yfprojects commented Jun 28, 2023

jsh9 commented Jun 29, 2023

real-yfprojects commented Jun 29, 2023

ghjklw commented Jun 29, 2023

jsh9 commented Jun 29, 2023

real-yfprojects commented Jun 29, 2023 • edited Loading

real-yfprojects commented Jun 29, 2023

ghjklw commented Jun 29, 2023

jsh9 commented Jun 30, 2023 • edited Loading

jsh9 commented Jun 30, 2023

jsh9 commented Jul 20, 2023

jsh9 commented Aug 19, 2023

real-yfprojects commented Jun 27, 2023 •

edited

Loading

real-yfprojects commented Jun 29, 2023 •

edited

Loading

jsh9 commented Jun 30, 2023 •

edited

Loading