New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple dependent types #3062

Open
JukkaL opened this Issue Mar 27, 2017 · 45 comments

Comments

Projects
None yet
@JukkaL
Copy link
Collaborator

JukkaL commented Mar 27, 2017

[This is based on a discussion between me and @gvanrossum last week. I'm writing this down so that we won't forget about this.]

Several stdlib functions have signatures where the return type depends on the value of an argument. Examples:

  • open returns an IO[str] or IO[bytes] depending on whether the mode argument has 'b' as a substring.
  • subprocess.check_output returns a str or bytes depending on the value of the boolean universal_newlines argument.

A simple way to support these would be to introduce a few additional types that work with str and bool literals:

  • A string pattern type that can describe the contents of a string literal, perhaps using a regular expression.
  • Types for False and True.

For example, if we had FalseType and TrueType, we could write a function whose return value depends on the value of a boolean argument:

@overload
def f(x: FalseType) -> bytes: ...
@overload
def f(x: TrueType) -> str: ...

def f(x: Any) -> Any:
    if x:
        return 'x'
    else:
        return b'x'

reveal_type(f(False))  # bytes
reveal_type(f(True)  # str

Not sure about this one:

...
x = bool('')
reveal_type(f(x))  # Union[str, bytes] since we don't know the boolean value statically?

We could also have a generic class where a generic type argument depends on the value of an argument to __init__. Not sure how we should represent these -- here is one potential approach:

class C(Generic[AnyStr]):
    @overload
    @returntype('C[str]')
    def __init__(self, x: TrueType) -> None: ...
    @overload
    @returntype('C[bytes]')
    def __init__(self, x: FalseType) -> None: ...
    ...

reveal_type(C(True))  # C[str]

Alternatively, we could use __new__, but this would be problematic if there is no corresponding method defined at runtime.

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 27, 2017

I think we can have LiteralType as in LiteralType['w'], LiteralType[True] etc. I think it will be better than special-casing True/False/string. Maybe we will get simple constant propagation "falling out" of type aliases (which might be helpful for e.g. NamedTuple)

I think regex will be problematic. How do you determine whether they overlap? String literals seems to suffice.

(And I don't think it counts as dependent type since it has little to do with runtime).

@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Mar 27, 2017

The mode string syntax for open() is more complicated than just 'r', 'w', etc. Hence the regex.

I find LiteralType[True] pretty verbose; also it's more general than we want to implement (we really don't want this to be a big project, just enough to handle the cases @JukkaL mentioned).

If this doesn't count as a dependent type maybe you can explain your definition of dependent types? And what would you call this?

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 27, 2017

I'm not sure going into definitions is helpful, but the difference is (I guess) the reason you call it "Simple" dependent types - there nothing like def foo(x: int, y: bar[x]) -> str if x == 5 else bool. There's only information immediately available at compile time. Instead of defining True to be a literal of type bool, we define it as being a literal of type TrueType which is a subtype of (or coercible to) bool. I don't see the difference between this and the kind of types we already have. Just because the type has precisely single value you have precise information about its runtime value, but this is already the case with None, Void and Tuple[()].

However, I might be wrong. I am also too easily dragged into issues of terminology.

@sixolet

This comment has been minimized.

Copy link
Collaborator

sixolet commented Mar 27, 2017

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 27, 2017

I suspect special cases will continue to appear and build, and one day we will have to do this big project - which I have the feeling is not big right now, but will be big then and not easily backwards-compatible since special-casing is too easy to over fit for specific applications.

How complex is the mode for open? Correct me if I'm wrong, but it is not a language of infinite size; can't it be handled simply by many @overloads in the stub file?

@sixolet

This comment has been minimized.

Copy link
Collaborator

sixolet commented Mar 27, 2017

Back when there were a couple interns working on this feature for a week, we had a few different use cases that they wanted to support:

  • open (a finite string-based language)
  • pow (wanted to dispatch on positive vs. negative exponents)
  • subprocess.check_output

What else? Is pow an odd one out here? Is open? Or are these complicated ones rare, and in the common case you just want to dispatch on a constant?

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 27, 2017

Maybe also Tuple * 5... Maybe pow can be handled by having positive and negative integer "subset types".

Any finite language is dispatch on a constant.

@JelleZijlstra

This comment has been minimized.

Copy link
Collaborator

JelleZijlstra commented Mar 27, 2017

As of 3.6, the subprocess.check_output case is actually more complicated: it returns str if universal_newlines is True OR encoding is non-None, and bytes otherwise. I suppose existing overloads can take care of the encoding part: you'd have overloads for encoding is non-None returning str, universal_newlines=True returning str, and universal_newlines=False returning bytes.

@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Mar 27, 2017

But the main thing we really want is determining the correct return type when the argument is a compile-time literal. You can't overload on True or False because there's no way to spell "the subtype of bool that only contains True" (etc.). If it can't be determined what the value is at compile time it's okay to revert to the more general overload. Similar for open() and everything that takes a similar mode string (it's bred quite a following).

@pkch

This comment has been minimized.

Copy link
Contributor

pkch commented Mar 28, 2017

I would prefer this syntax:

from pathlib import Path

# type function
def open_return_type(kwargs: Dict[str, object]) -> Type[object]:
    assert isinstance(kwargs['mode'], str)
    if 'b' in kwargs['mode']:
        return IO[bytes]
    else:
        return IO[str]

def open(file: Union[str, bytes, int, Path], mode: str = 'r', buffering: int = -1,
         encoding: str = None, errors: str = None, newline: str = None,
         closefd: bool = ...) -> open_return_type: ...

It's flexible, and can be implemented like this:

  • TypeAnalyser.visit_unbound_type needs to allow "type functions" in function annotation
  • TypeChecker.check_return_statement, if it sees a type function, needs to call it, passing it the dict of all the arguments for which it knows literal values
  • if the type function returns a value, the type checker can use it as the return type
  • if the type function raises KeyError, the type checker should assume it relies on an argument that it doesn't know the value of; since it cannot infer the return type, it should do something sensible (e.g., issue a warning and use the return type based on the type context)
@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 28, 2017

if it sees a type function, needs to call it,

How? Running arbitrary user code is certainly out of question. What do you suggest? To me this sounds like a turing-complete metaprogramming language and full-blown dependent type system.

I do like is the idea of avoiding the scoping problem (how to express def f(x, y:g(x)): ... using another function. But it will only be needed for fully-dependent type system, not for singleton-types.

@pkch

This comment has been minimized.

Copy link
Contributor

pkch commented Mar 28, 2017

@elazarg Ah yes, I did mean to run arbitrary user code. If the type checker is not allowed to do that for security or other reasons, my suggestion won't work.

@JelleZijlstra

This comment has been minimized.

Copy link
Collaborator

JelleZijlstra commented Mar 28, 2017

It can't really do that in general, since there's nothing requiring type checkers to run in the same Python version as they're checking for, or even in Python at all.

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 28, 2017

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 28, 2017

Regarding syntax, perhaps we can allow something like this:

@overload
def f(x: False) -> bytes: ...
@overload
def f(x: True) -> str: ...

It's not too pretty, but it certainly isn't verbose... Strings are problematic, but we can use raw strings:

@overload
def open(file: str, mode: r'w') -> IO[str]: ...
@overload
def open(file: str, mode: r'wb') -> IO[bytes]: ...

It is only intended for stub files, so people will guess the meaning even without dedicated name for the type. And it is consistent with None being used for the type of None.

(TBH I'm not really sure about this proposal)

@elazarg

This comment has been minimized.

Copy link
Contributor

elazarg commented Mar 28, 2017

If we only care about overload dispatch, we may pass it as an argument to overload:

@overload(mode='w')
def open(file: str, mode: str) -> IO[str]: ...
@overload(mode='wb')
def open(file: str, mode: str) -> IO[bytes]: ...

Thus avoiding the question of how to spell the name of the type.

@pkch

This comment has been minimized.

Copy link
Contributor

pkch commented Mar 29, 2017

@elazarg

I think regex will be problematic. How do you determine whether they overlap? String literals seems to suffice.

I don't think we need to perfectly determine overlaps? If someone does a Union, it's ok to just include both regexes, without simplifying. If we need a meet, just return UninhabitedType - it's already quite common that meet is not precise, as long as it's a subtype of both operands, it's fine.

@gvanrossum

You can't overload on True or False because there's no way to spell "the subtype of bool that only contains True" (etc.).

But wouldn't @JukkaL TrueType or @elazarg LiteralType[True] spell that? And with automatic upcasting on second assignment, when mypy sees

x = True
...
x = False

it infers x to be of type TrueType in the scope starting from its first binding up to the next time it's assigned anything; and bool from that second assignment (regardless of what was assigned) till the end. Since TrueType is a subtype of bool, all type checking should still work fine.

@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Mar 29, 2017

That's easy for locals, but what about attributes or class vars? Any time you call anything it may change one of those:

class C:
    def f(self) -> None:
        self.foo = True
        self.bar()
        # Here we don't know if self.foo is still True
    def bar(self): ...  # A subclass may override
@pkch

This comment has been minimized.

Copy link
Contributor

pkch commented Mar 29, 2017

I thought this whole discussion was limited to literals or simple identifiers of immutable basic types. [Edit: now I understand what you mean; what if an attribute gets this type, we still have to deal with it. Auto-upcasting on a function call or auto-upcasting immediately would seem to solve it, though.]

I guess we can make it work for attributes and global variables, by automatically upcasting attributes and global variables on any function calls rather than just on the second assignment. That seems ok, since for example, there's no hope to predict the return type of pow when it's called like this:

class C:
    def f(self) -> None:
        self.foo = 5
        self.bar()
        pow(2, self.foo) # is it int or float?
@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Mar 30, 2017

@pkch pkch referenced this issue Mar 30, 2017

Closed

Dependent Types #366

@pkch

This comment has been minimized.

Copy link
Contributor

pkch commented Apr 17, 2017

TypeScript 2.0 (Oct 2016) added precisely these simple (literal) dependent types. Their stated use cases are enum, dealing with legacy APIs, and as a tag for a Union; all somewhat relevant.

Just thought I'd point this out since they seem to be solving a similar problem to what is being discussed here.

@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Apr 17, 2017

Interesting. We couldn't borrow their syntax because in PEP 484 using a string literal where a type is expected just means a forward reference. Python already has string enums. For union tags PEP 484 uses isinstance() so I think we don't need that either.

I still think that the only use case we don't yet support is overloading on value, with the most complex use cases being pow()/** (since pow(int, int >= 0) -> int but pow(int, int < 0) -> float) and open(str, mode) (which returns IO[bytes] or IO[str] depending on whether mode contains 'b'). Other use cases probably indeed need specific string values, specific integer values, or specific enum values (including bools).

@davidroeca

This comment has been minimized.

Copy link

davidroeca commented Apr 28, 2017

Constant/immutable types as mentioned in #1214 could be treated like literals in the typecheck flow, as long as they were assigned to a literal or another constant.

Cases where reassignment occurs seem to be most challenging:

# sane.py
MODE = 'rb'  # Const[str]

with open(path, MODE) as f:
    # since MODE never gets reassigned, f is binary
    ...
# insane.py
MODE = 'wa'
MODE = MODE.replace('a', 'b')

with open(path, MODE) as f:
    # human understands this is binary, but no simple typecheck
    ...

Also, the flow of a bounded integer type is complex but potentially doable with well-documented functions and immutable variables. Mutability makes these bounds harder to reason about:

# wild.py
from random import randint

# hypothetically assuming randint has input/output bounding rules
my_int = randint(0, 10)  # Bound[int, 0, 10]

# potentially doable but likely more trouble than it's worth
int_two = randint(my_int, 2 * my_int)  # Bound[int, 0, 20]

# rough
int_two /= 10  # Bound[float, 0.0, 2.0]  
@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Jul 18, 2017

We recently added a mypy plugin system which lets mypy infer a precise return type for things like open(...) if the mode argument is not a runtime-computed value. The same approach can be used for some other cases discussed above, such as subprocess.check_output.

Here's a summary of how mypy the plugin hook for open(...) works, for example:

  • Check if the mode argument is not provided. If yes, the return type is TextIO.
  • If the mode argument is not a string literal, fall back to the imprecise type IO[Any].
  • If the mode argument is a string literal, infer TextIO or BinaryIO based on the value of the literal.

Since plugins are already supported and have a pretty low complexity cost we should first see how far we can get with them before we think about more general literal/dependent types.

@silviogutierrez

This comment has been minimized.

Copy link

silviogutierrez commented Oct 25, 2017

Would the plugin system be able to accomodate a use case like this?

@overload
def foo(null: False) -> int:
    ...


@overload
def foo(null: True) -> str:
    ...


def foo(null: Union[True, False]) -> Union[int, str]:
    if null is False:
        return 1
    else:
        return 'asdsd'

# No error
foo(True) + 'a'

# Error, can't add str to an int
foo(False) + 'a'
@gvanrossum

This comment has been minimized.

Copy link
Member

gvanrossum commented Oct 25, 2017

@brainrape

This comment has been minimized.

Copy link

brainrape commented Nov 17, 2017

Hi! As a long time Python user I appreciate all the hard work. These problems have been researched for decades. I'd like to point out a way this can be exploited, with the following result:

You can write dependently typed Python today.

Idris, a practical programming language with dependent types, also compiles to Python: https://github.com/ziman/idris-py

Here's how a type definition looks, using an existing Python library, tracking the size of matrices in the type, effectively turning IndexErrors into compile time type errors:
https://github.com/ziman/idris-py/blob/master/lib/Python/Lib/Numpy/Matrix.idr#L65
Example usage:
https://github.com/ziman/idris-py/blob/master/examples/np.idr

More examples using requests, beautifulsoup4, threading:
https://github.com/ziman/idris-py/blob/master/examples/example.idr

It seems worthwhile to explore an existing solution that can cover even the hairiest cases that were given up on in mypy, in a rather elegant way. Expressing the type of pow shouldn't be impossible. (Although it conflates functions of different types and should rather be separated. That doesn't seem likely though.)

To complete the cycle and typecheck existing python code (with type annotations), one would need to parse it and output equivalent Idris, which shouldn't be too hard and can be implemented at the AST level. If it compiles, the original should be safe. Properties of Python programs can then be expressed and verified with high precision.

[edit: add numpy matrix example, change example order]

@saulshanabrook

This comment has been minimized.

Copy link

saulshanabrook commented Jul 2, 2018

And here's how a type definition looks for an existing Python library, tracking the size of matrices in the type, effectively turning IndexErrors into compile time type errors:

@brainrape Is it possible to use this compile time type checking from Python? What could this look like?

@brainrape

This comment has been minimized.

Copy link

brainrape commented Jul 3, 2018

@saulshanabrook Here's an example that uses numpy from Idris:
https://github.com/ziman/idris-py/blob/master/examples/np.idr
This will compile to Python (but the commented-out type-incorrect line wouldn't compile, which is not the case with mypy).
As things currently stand, one can use existing Python libraries (given correct type signatures) and target Python VMs. Idris can also rule out more than just runtime exceptions, like infinite loops. The only way left to fail would be to run out of memory.
To use all this "from Python", by which I assume you mean using Python surface syntax, as I pointed out above, one would need to translate the Python source to Idris, which can be done in principle at the syntax level. Incidentally, this would also allow compiling Python code to other Idris backends like native, JS, Java, etc.
For any "real-world" project that needs type safety and has to target Python, mypy is probably still your best bet, although it doesn't (and judging by this thread, won't) offer this level of safety. mypy is a great achievement, a step in the "correct" direction, and fits comfortably with the rest of Python, which can't be said for the above proof-of-concept.

@mehdigmira

This comment has been minimized.

Copy link

mehdigmira commented Oct 19, 2018

Hello,

What's the final answer to this ? Is this still under discussion ?

@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Oct 19, 2018

We have plans to work on this, but no concrete timeline yet.

@mehdigmira

This comment has been minimized.

Copy link

mehdigmira commented Oct 22, 2018

@JukkaL Is there some workaround I can use for now ? you mentioned a plugin. How can i use it in my code ?

@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Oct 30, 2018

@mehdigmira The plugin API is not documented, so you are on your own if you want to go there. There are some examples at https://github.com/python/mypy/tree/master/mypy/plugins.

@mehdigmira

This comment has been minimized.

Copy link

mehdigmira commented Oct 30, 2018

@JukkaL Thank you for the response.
I've peeked into these example.
My understanding is that if i wanted to do that I'd have to write one function-hook for every function in my code that relies on this dependant type logic. Is this true ?

@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Oct 30, 2018

@mehdigmira Yes. The plugin approach is really only practical for library functions with relatively stable interfaces, in my opinion. Otherwise it will be too difficult to maintain.

@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Oct 30, 2018

Here's my current thinking on what's needed to support this in a minimally useful fashion.

Our current idea for the syntax is like this:

  • Type syntax is Literal[x], where x can be True, False, an integer literal or a string literal (not sure about bytes).
  • Literal[x, y] is equivalent to Union[Literal[x], Literal[y]].

Literal types are not inferred for variables unless the lvalue is final:

y: Literal['foo'] = 'foo'  # Without the annotation the type would be 'str'
x: Final = 'foo'  # Type is Literal['foo']
z = 'foo'  # Type of 'z' is 'str'
zz: Final = z  # The type of 'zz' is 'str' here as well

Note that the Final exception only applies to simple types:

a: Final = ['a']  # Inferred type is still List[str]

We'd likely promote a string literal to str in many/most contexts, but I'm not sure what the precise rules should be.

Literal types are compatible with the corresponding non-literal types, but not the other way around:

x = 'foo'
y: Literal['foo'] = x  # Error
z: str = y  # Ok

For overloads we may need a fallback in addition to literal types to avoid false positives with legacy interfaces:

@overload
def f(x: Literal[True]) -> str: ...
@overload
def f(x: Literal[False]) -> bytes: ...
@overload
def f(x: bool) -> Any: ...

y: bool = ...
a = f(y)  # Any inferred
b = f(True)  # str inferred

cc @ilevkivskyi

@srittau

This comment has been minimized.

Copy link
Contributor

srittau commented Oct 30, 2018

I really don't like the enforced Literal[...] syntax. Python type annotations are already annoyingly verbose and often hard to read, compared to, for example, typescript. I understand that strings are problematic, due to forward references, but couldn't at least booleans and integers (and potentially bytes) be abbreviated with just x: True or y: 42? That would open the possibility to have a potential __future__ import (that only works in combination with annotations) to use string literals as dependent types as well.

@lubieowoce

This comment has been minimized.

Copy link

lubieowoce commented Nov 1, 2018

If we don't want to execute arbitrary Python code at the type level, how about specifying a value-to-type mapping with a dict? Something like this:

def f(x: bool) -> FResult[x]: ...
#                         ^ the argument `x`   

FResult = TypeMapping[bool]({
  True:  str,
  False: bytes,
})

It's less verbose than @overloading every case, and perhaps more obvious than Literal[True]. Referring to the value of x in a type looks a bit weird [1], but that's what dependently-typed languages like Idris do [2], so it's not without precedent.

Here's a bigger example where the type depends on two arguments:

def open_file(path: str, access: str = 'r', encoding: str = 'u') -> SomeFile[access, encoding]: ...

SomeFile = TypeMapping[str, str]({
  ('r', 'u'): ReadableFile[str], # a read-only handle
  ('r', 'b'): ReadableFile[bytes],
  ('w', 'u'): WritableFile[str], # a write-only handle
  ('w', 'b'): WritableFile[bytes],
})

There's a bunch of ways TypeMapping can be made more expressive if desired. For example it could take a default parameter, making it like defaultdict. We could also make it more DSL-ish and allow more general pattern matching – I can post ideas for that in a follow-up if anyone's interested.


[1] Using a function argument in a type annotation would constrain this extension to python 3.7+ where type annotations aren't evaluated. This could be remedied with something like x = ArgVar('x') above the definition, but that could interfere with the actual code.

[2] For an example, see the printf function in typesafe printf

@JukkaL

This comment has been minimized.

Copy link
Collaborator Author

JukkaL commented Nov 1, 2018

@srittau Something like the Literal[...] syntax is needed at least until Python 3.8, since things like Tuple[True, False] generate a runtime error at the moment. Bare literal as a type can be proposed for Python 3.8 or later. I think that we should start with something that works with already released Python versions, however.

@lubieowoce Your proposal introduces quite a bit of new syntax for functionality that @overload already supports. I think that there would have to be some important use cases that could be supported with the proposed syntax before it might be worth the extra implementation effort (and cognitive load). Verbosity does not sound like a very compelling argument to me, since I don't expect overloads involving literal types to be that common.

@ilevkivskyi

This comment has been minimized.

Copy link
Collaborator

ilevkivskyi commented Nov 1, 2018

@JukkaL -- @Michael0x2a and me are already working on a draft-PEP about Final and Literal (it is not yet public, we will publish when we are relatively happy with the draft). Basically, it is just extended/detailed version of your comment above.

@lubieowoce -- Overloads (including type guards) is an important use case which will be explicitly mentioned in the draft PEP.

@srittau -- I don't think literal types will be used so often that saving 6 keystrokes is important. Second, I think explicit is better than implicit here, some people may be confused by bare literals in annotations. Finally, as Jukka mentioned, it just doesn't work.

Anyway, I would propose to pause the discussion here until the actual draft is posted here (hopefully very soon).

@brainrape

This comment has been minimized.

Copy link

brainrape commented Nov 1, 2018

@ilevkivskyi Would something like Literal[x>0] work?

@ilevkivskyi

This comment has been minimized.

Copy link
Collaborator

ilevkivskyi commented Nov 1, 2018

No.

@lubieowoce

This comment has been minimized.

Copy link

lubieowoce commented Nov 1, 2018

@ilevkivskyi @JukkaL thank you for your comments. I'm aware (and not too happy) that my proposal duplicates some functionality – I didn't mention it to keep the post short. I'll hold off with responding until you publish the draft PEP – perhaps it'll clarify some things that concerned me about the @overload+Literal approach.

@Michael0x2a

This comment has been minimized.

Copy link
Collaborator

Michael0x2a commented Nov 20, 2018

Small update -- here's the draft PEP mentioned above: https://github.com/Michael0x2a/peps/blob/literal-types/pep-9999.rst.

I think the new plan is to try discussing it on our shiny new typing-sigs mailing list instead of here, though? We'll see how it goes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment