11 changes: 10 additions & 1 deletion examples/example_pkg-stubs/_basic.pyi
@@ -14,10 +14,19 @@ def func_empty(a1: Any, a2: Any, a3: Any) -> None: ...
def func_contains(
self,
a1: list[float],
a2: dict[str, Union[int, str]],
a2: sequence[int] | float,
a3: Sequence[int | float],
a4: frozenset[bytes],
) -> tuple[tuple[int, ...], list[int]]: ...
def func_contains_dict(
self,
a1: dict[["str", "int | str"]],
a2: dict[str, Union[int, str]],
a3: mapping[["int", "str"]],
) -> dict[["int | str", "float"]]: ...
def func_literals(
self, a1: Literal["A", "B", "C"], a2: Literal[0, "index", 1, "columns", None]
) -> None: ...
def func_literals(
a1: Literal[1, 3, "foo"], a2: Literal["uno", 2, "drei", "four"] = ...
) -> None: ...
25 changes: 24 additions & 1 deletion examples/example_pkg/_basic.py
@@ -32,7 +32,7 @@ def func_contains(self, a1, a2, a3, a4):
Parameters
----------
a1 : list[float]
a2 : dict[str, Union[int, str]]
a2 : sequence of int or float
a3 : Sequence[int | float]
a4 : frozenset[bytes]

@@ -42,6 +42,29 @@ def func_contains(self, a1, a2, a3, a4):
r2 : list of int
"""

def func_contains_dict(self, a1, a2, a3):
"""Dummy.

Parameters
----------
a1 : dict of {str : int or str}
a2 : dict[str, Union[int, str]]
a3 : mapping of {int : str}

Returns
-------
r1 : dict of {int or str : float}
"""

def func_literals(self, a1, a2):
"""Dummy.

Parameters
----------
a1 : {"A", "B", "C"}
a2 : {0 or "index", 1 or "columns", None}, default None
Member

This pandas type syntax seems a bit dubious. I guess this is equivalent to

Suggested change
a2 : {0 or "index", 1 or "columns", None}, default None
a2 : {0, "index", 1, "columns", None}, default None

and the alternating or is for grouping of equivalent values?

This might be a case I'd leave for a third party to configure itself rather than support directly in docstub.

Contributor Author

I think they do use "or" to indicate literals with equivalent meaning, and the comma to separate literals with different meanings. I have never used "or" in literals myself, though.
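
A minimal sketch of the reading discussed in this thread, assuming "or" and "," are both treated as plain separators inside the braces. The helper `literals_to_annotation` is hypothetical and not part of docstub; it uses a naive regex split, so it would break on string literals that themselves contain a comma or the word "or".

```python
import re


def literals_to_annotation(doctype: str) -> str:
    """Turn '{0 or "index", 1}' into 'Literal[0, "index", 1]'.

    Hypothetical helper for illustration only; not docstub's implementation.
    """
    text = doctype.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a literal set: {doctype!r}")
    # Treat "," and the keyword "or" as equivalent separators.
    items = re.split(r"\s*,\s*|\s+or\s+", text[1:-1].strip())
    return f"Literal[{', '.join(items)}]"


print(literals_to_annotation('{0 or "index", 1 or "columns", None}'))
```

Under this reading, the grouping that pandas expresses with "or" is dropped: every value ends up as a separate member of the same `Literal`.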

"""


def func_literals(a1, a2="uno"):
"""Dummy
9 changes: 6 additions & 3 deletions src/docstub/doctype.lark
@@ -1,11 +1,12 @@
?start : doctype

doctype : type_or ("," optional)? ("," extra_info)?
doctype : (literals | type_or) ("," optional)? ("," extra_info)?
Member

Lastly, I did some changes to literals to make sure there can be no confusion between
dict subtypes or literals (colons being inside the curly brackets being the only indicator
seemed like a bad idea). I think this is also a closer match to numpydoc, as from how I understand
the description, {} for literals should only be used when only a handful of options are allowed
and therefore is incompatible with type information of any kind.

Restricting literals to the top level is probably sensible? Though, currently it's nice that something like

dict[{"a", "b"}, int] -> dict[Literal["a", "b"], int]

works. Do you find that readable?

Though,

dict of {{"a", "b"}: int} -> dict[Literal["a", "b"], int]

working is something. 😅

Member

See 5a28828.

Contributor Author

I had never seen or considered that option, but now that I think about it, there are a couple of places where I could use it. If you use it or feel strongly about it, maybe we could handle mappings the way arrays are handled: only a fixed set of names is allowed, and only after one of those names can curly brackets indicate two subtypes separated by a colon. My guess is that dict and mapping alone will cover 90% of the cases; maybe mutablemapping could be included too.

Plus a way to extend those names for both dict and array (for example, to allow tensor in projects that use it).

Member

maybe we could use something similar to arrays for mappings in the sense a subset of names are allowed

I think it might be more confusing if we restricted who can use the mapping of {KT: VT} syntax? 🤔

Contributor Author

Happy to keep literals as a top-level option only.


type_or : type (("or" | "|") type)*

literals : "{" literal (("," | "or") literal)* "}"

?type : qualname
| "{" literal ("," literal)* "}" -> literals
| container_of
| shape_n_dtype

@@ -23,7 +24,9 @@ contains: "[" type_or ("," type_or)* "]"


// Container-of
container_of : NAME "of" type_or
container_of : NAME "of" ( type_or | dict_subtypes )

dict_subtypes : "{" type_or ":" type_or "}"
Member

Actually, you made me realize that we can streamline this and get rid of dict_subtypes and even the existing container_of!

contains: "[" type_or ("," type_or)* "]"
        | "[" type_or "," PY_ELLIPSES "]"
        | "of" type
        | "of" "(" type_or ("," type_or)* ")"
        | "of" "{" type_or ":" type_or "}"

With that setup, multiple types inside the container have to be enclosed in (...). That gets rid of the ambiguity with the top-level "or".

(BTW amazing that GitHub highlights Lark syntax!)
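
To make the precedence rule concrete, here is a rough sketch in plain Python of how the three "of" forms could map onto annotations. It is a hand-written recursive-descent parser over a simplified subset of the grammar above (no `[...]` form, no literals, no name normalization), and every name in it (`Parser`, `to_annotation`) is hypothetical. It is not how docstub does it, which uses Lark.

```python
import re

# Names (optionally dotted) and the punctuation used by the "of" forms.
TOKEN = re.compile(r"[A-Za-z_][\w.]*|[(){},:|]")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def type_or(self):
        # type_or : type (("or" | "|") type)*
        parts = [self.type()]
        while self.peek() in ("or", "|"):
            self.take()
            parts.append(self.type())
        return " | ".join(parts)

    def type(self):
        name = self.take()
        if not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected a type name, got {name!r}")
        if self.peek() != "of":
            return name
        self.take("of")
        if self.peek() == "(":
            # "of" "(" type_or ("," type_or)* ")"
            self.take("(")
            args = [self.type_or()]
            while self.peek() == ",":
                self.take(",")
                args.append(self.type_or())
            self.take(")")
        elif self.peek() == "{":
            # "of" "{" type_or ":" type_or "}"
            self.take("{")
            key = self.type_or()
            self.take(":")
            value = self.type_or()
            self.take("}")
            args = [key, value]
        else:
            # "of" type: binds to a single type only
            args = [self.type()]
        return f"{name}[{', '.join(args)}]"


def to_annotation(doctype):
    parser = Parser(doctype)
    result = parser.type_or()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result


for doctype in ("list of int or float", "list of (int or float)",
                "dict of {str : int or str}"):
    print(doctype, "->", to_annotation(doctype))
```

Because a bare "of" consumes exactly one type, "list of int or float" reads as (list of int) or float, and the parenthesized form is the only way to put a union inside the container.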

Member

See 3908f3f.

Contributor Author

That is great. I'll also open an issue or PR on numpydoc itself with these at some point. I have never known how "list of int or float" is supposed to be interpreted: (list of int) or float, or list of (int or float).

Member

Intuitively I'd say (list of int) or float. I don't think numpydoc worries about those yet and maybe they don't need to.

Part of the aim behind docstub is also to create some kind of standard, with the understanding that "hey, if you want something more custom you need to configure it yourself".

I don't remember who, but someone from numpydoc told me at some point they'd be happy to go with whatever recommendation docstub settles on.

Member

That being said, dict of {str : int} parses everything but it doesn't take into account that left of the colon are key types and right of the colon are value types. I have no idea if this should happen at a grammar level, python processing or both.

It doesn't have to, because Python's type annotation for dicts, dict[key_type, value_type], distinguishes key types from value types only by the order in which they appear. Since the order in {key_type : value_type} is the same, we don't have to do anything.
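
A small illustration of that point, as a sketch rather than docstub code: at runtime a subscripted `dict` stores its arguments positionally, and `typing.get_args` returns them in that same order. This assumes Python 3.9 or later, where the builtin `dict` can be subscripted.

```python
from typing import get_args

# dict[K, V] keeps its type arguments in the order they were written.
annotation = dict[str, int]
key_type, value_type = get_args(annotation)

print(key_type, value_type)
```

So a translation from {key_type : value_type} only has to preserve the left-to-right order; the roles of key and value follow from position alone.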



// Array-like form with dtype or shape information