Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariance or contravariance #2

Closed
gvanrossum opened this issue Oct 15, 2014 · 26 comments
Closed

Covariance or contravariance #2

gvanrossum opened this issue Oct 15, 2014 · 26 comments

Comments

@gvanrossum
Copy link
Member

As Lukasz mentions in NOTES-ambv.txt: "The covariance/contravariance discussion needs to happen at some point."

Reference: http://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)

I haven't fully digested all the viewpoints yet. It seems that mypy stays on the safe side and treats e.g. Set[Employee] and Set[Manager] as distinct types. However, Set[Employee] is acceptable where a Set (without parameter) is expected.

@gvanrossum
Copy link
Member Author

The PEP currently says: "As with function arguments, generics are covariant, which is in spirit of duck typing." Jeremy Siek objects to this statement.

Jeremy wants to be able to express strictly static checking as an option, doesn't want to limit strength of type system. Put another way, Jeremy's hope for gradual typing is to let programmers choose anywhere in between fully dynamic and fully static checking, but if the type system is weakened by allowing covariance, then the fully static option goes away.

@JukkaL
Copy link
Contributor

JukkaL commented Oct 16, 2014

Currently mypy special cases certain ABCs to support covariant subtyping (such as Iterable and Sequence). This is obviously pretty ugly. Other classes are invariant.

I don't think that List[Manager] should be a subtype of List[Employee] (or vice versa), since List supports mutation. However, Iterable[Manager] could be (and I argue that it should be) a subtype of List[Employee], since it can only be used for reading values.

We could allow a type variable to be defined as covariant (or contravariant). For example, Iterable could be defined like this:

T = typevar('T', covariant=True)

class Iterable(AbstractGeneric[T]):
    ...

Here covariant is probably not a good name, but it should make my intention clear. By default, a type variable would probably be invariant. Now Iterable[Manager] would be a subtype of Iterable[Employee]. Without covariant=True there would be no subtyping relationship.

A covariant type variable could not generally be used in a method argument type to maintain type safety (I can give an example if it's not obvious why this makes sense). There are some other technicalities which the PEP can probably ignore.

@gvanrossum
Copy link
Member Author

I happened to mention this to Peter Norvig, and he had the following to say:

I remember when my daughter, as a sophomore college student learning Java, asked "how come if I have City as a subclass of Place, and I have a function that expects List<Place>, I can't pass in a List<City>". I was embarrassed to tell her she had to use List<? extends Place>.

He also reminded me that Josh Bloch has said something like "I think we made it too complicated there".

@mvitousek
Copy link

Jukka's suggestion for a covariant setting for type variables seems like a good one; in fact, I think this is pretty similar to how C# does things -- see the section on in T and out T here: http://blogs.msdn.com/b/csharpfaq/archive/2010/02/16/covariance-and-contravariance-faq.aspx Covariant type variables ideally would not be able to appear in contravariant positions (such as the types of method parameters) in order to preserve safety. The terms "in" and "out" might be better than "covariant" and (if included) "contravariant", since they indicate that the type variable can only appear in input or output positions.

One thing I want to note is that languages like C# and Java get away with covariant arrays only because they perform run-time checks that raise exceptions when ill-typed data is written to arrays. Since Python doesn't have those, covariant arrays could provide a really easy and (to most programmers) unexpected way to write programs with type errors that type analyzers can't detect and that are hard to debug.

@JukkaL
Copy link
Contributor

JukkaL commented Nov 9, 2014

I agree that "in" and "out" are better terms. Since "in" is reserved, we could do something like this:

T = typevar('T', kind='in')

As a bonus, 'kind' is probably more Googleable than 'in'.

Michael, good point about the lack of runtime checking. It's a good argument for the importance of getting rid of glaring type safety holes (when feasible).

@vlasovskikh
Copy link
Member

I would suggest to use invariance by default, but to introduce declaration-site variance annotations in order to simplify things for users of generic classes.

You can specify variance of a generic parameter inside a) its usage as a type annotation or b) its declaration as a parameter of a generic class. For many classes it makes sense to specify default variance at the declaration site. (Scala allows that while Java doesn't). For example, immutable classes are often covariant and mutable classes are usually invariant.

Specifying variance at the declaration site would allow the users of these classes not to bother with variance and use simple type annotations like List[T] or Iterable[T] where variance is already specified by the author of a generic class.

The example below statically prevents an Engineer to join a List of Manager objects, but allows him to be a part of evaluating an Iterable of Employee objects. OK: and Error: messages below refer to the results of a static type checker.

from abc import abstractmethod
from typing import Var, Generic, Dict


# Clases from typing with variance set at the declaration site

T = Var('T')  # kind='inout', invariant
TOut = Var('T', kind='out')  # covariant


class Iterable(Generic[TOut]):
    @abstractmethod
    def __iter__(self) -> 'Iterator[TOut]': pass


class Iterator(Iterable[TOut]):
    @abstractmethod
    def __next__(self) -> TOut: pass


class List(Generic[T]):
    def add(self, item: T): pass
    def pop(self) -> T: pass
    ...


def evaluate_employees(employees: Iterable[Employee]) -> Dict[Employee, float]:
    return {x: evaluate(x) for x in employees}


def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
    employees.add(new_employee)


def test(managers: List[Manager]):
    evaluate_employees(managers)  # OK: List[Manager] is a subtype of Iterable[Employee]
    john = Engineer('John')
    hire_employee(managers, john)  # Error: List[Manager] is not a subtype of List[Employee]
    assert john in managers

Sometimes however the user may want to allow List[Manager] value to be passed to a function expecting List[Employee]. As long as he uses only 'out' methods on the List (those returning the value of the generic type parameter), he can specify the 'out' kind in his method annotation:

def best_employee(employees: List[Var('T', Employee, kind='out')]) -> Employee:
    return max(employees, key=evaluate)


def test(managers: List[Manager]):
    best = best_employee(managers)  # OK: List[Manager] is a subtype of List[Var('T', Employee, kind='out')]

The annotation of best_employee() looks verbose to me. But is it necessary if we want a type-safe List class that has to be invariant in this case.

What could happen if List was covariant:

T = Var('T', kind='out')  # covariant


class List(Generic[T]):
    def add(self, item: T): pass
    def pop(self) -> T: pass
    ...


def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
    employees.add(new_employee)


def test(managers: List[Manager]):
    john = Engineer('John')
    hire_employee(managers, john)  # OK: List[Manager] is a subtype of List[Employee]
    assert john in managers  # AssertionError: John is in the list of managers now

An alternative way to simplify things for the user is to use covariance by default, but to limit kinds of methods that could be called on the value of a covariant type to only 'out' methods:

T = Var('T', kind='out')  # covariant


class List(Generic[T]):
    def add(self, item: T): pass
    def pop(self) -> T: pass
    ...


def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
    employees.add(new_employee)  # Error: Employee is not a subtype of Var('T', Employee, kind='out')
                                 # i.e. only super types of Employee allowed here


def test(managers: List[Manager]):
    john = Engineer('John')
    hire_employee(managers, john)  # OK: List[Manager] is a subtype of List[Employee]
    assert john in managers

But this approach might be even more complicated since error messages are harder to understand. And it is not clear what you have to change in order to fix your code. Compare:

  • Invariance by default, invariant List
    • Error: List[Manager] is not a subtype of List[Employee]
    • Hint: Adding Employee to the list of List[Manager] would make any Employee a Manager
    • Action: Don't do this unsafe operation or delete the annotation from test(managers: List[Manager]) or change it to List[Employee] or even to just List
  • Covariance by default, covariant List
    • Error: Employee is not a subtype of Var('T', Employee, kind='out') i.e. only super types of Employee allowed here
    • Hint: List(Var('T', Employee, kind='in')) is required for passing Employee to methods of List[Employee]
    • Action: Change the annotation to employees: List(Var('T', Employee, kind='in')), but hire_employee(managers, john) now fails to typecheck, so you have to fix variance problems on the call site by following the tips from the "Invariance by default" option

@gvanrossum
Copy link
Member Author

I like declaring the variance kind with the type variable, but I also want
to point out that many ABCs are read-only (i.e. have no mutating methods --
even if the actual object passed in may be mutable) and in that case
covariance might be the default. E.g. Sequence[T] could be covariant while
List[T] should default to invariant (because it has a mutable interface).

On Thu, Nov 13, 2014 at 10:11 AM, Andrey Vlasovskikh <
notifications@github.com> wrote:

I would suggest to use invariance by default, but to introduce declaration-site
variance
annotations in order to simplify things for users of generic
classes.

You can specify variance of a generic parameter inside a) its usage as a
type annotation or b) its declaration as a parameter of a generic class.
For many classes it makes sense to specify default variance at the
declaration site. (Scala allows that while Java doesn't). For example,
immutable classes are often covariant and mutable classes are usually
invariant.

Specifying variance at the declaration site would allow the users of these
classes not to bother with variance and use simple type annotations like
List[T] or Iterable[T] where variance is already specified by the author
of a generic class.

The example below statically prevents an Engineer to join a List of
Manager objects, but allows him to be a part of evaluating an Iterable of
Employee objects. OK: and Error: messages below refer to the results of a
static type checker.

from abc import abstractmethodfrom typing import Var, Generic, Dict

Clases from typing with variance set at the declaration site

T = Var('T') # kind='inout', invariant
TOut = Var('T', kind='out') # covariant

class Iterable(Generic[TOut]):
@AbstractMethod
def iter(self) -> 'Iterator[TOut]': pass

class Iterator(Iterable[TOut]):
@AbstractMethod
def next(self) -> TOut: pass

class List(Generic[T]):
def add(self, item: T): pass
def pop(self) -> T: pass
...

def evaluate_employees(employees: Iterable[Employee]) -> Dict[Employee, float]:
return {x: evaluate(x) for x in employees}

def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
employees.add(new_employee)

def test(managers: List[Manager]):
evaluate_employees(managers) # OK: List[Manager] is a subtype of Iterable[Employee]
john = Engineer('John')
hire_employee(managers, john) # Error: List[Manager] is not a subtype of List[Employee]
assert john in managers

Sometimes however the user may want to allow List[Manager] value to be
passed to a function expecting List[Employee]. As long as he uses only
'out' methods on the List (those returning the value of the generic type
parameter), he can specify the 'out' kind in his method annotation:

def best_employee(employees: List[Var('T', Employee, kind='out')]) -> Employee:
return max(employees, key=evaluate)

def test(managers: List[Manager]):
best = best_employee(managers) # OK: List[Manager] is a subtype of List[Var('T', Employee, kind='out')]

The annotation of best_employee() looks verbose to me. But is it
necessary if we want a type-safe List class that has to be invariant in
this case.

What could happen if List was covariant:

T = Var('T', kind='out') # covariant

class List(Generic[T]):
def add(self, item: T): pass
def pop(self) -> T: pass
...

def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
employees.add(new_employee)

def test(managers: List[Manager]):
john = Engineer('John')
hire_employee(managers, john) # OK: List[Manager] is a subtype of List[Employee]
assert john in managers # AssertionError: John is in the list of managers now

An alternative way to simplify things for the user is to use covariance by
default, but to limit kinds of methods that could be called on the value of
a covariant type to only 'out' methods:

T = Var('T', kind='out') # covariant

class List(Generic[T]):
def add(self, item: T): pass
def pop(self) -> T: pass
...

def hire_employee(employees: List[Employee], new_empoloyee: Employee) -> None:
employees.add(new_employee) # Error: Employee is not a subtype of Var('T', Employee, kind='out')
# i.e. only super types of Employee allowed here

def test(managers: List[Manager]):
john = Engineer('John')
hire_employee(managers, john) # OK: List[Manager] is a subtype of List[Employee]
assert john in managers

But this approach might be even more complicated since error messages are
harder to understand. And it is not clear what you have to change in order
to fix your code. Compare:

Invariance by default, invariant List
- Error: List[Manager] is not a subtype of List[Employee]
- Hint: Adding Employee to the list of List[Manager] would make any
Employee a Manager
- Action: Don't do this unsafe operation or delete the annotation
from test(managers: List[Manager]) or change it to List[Employee]
or even to just List
-

Covariance by default, covariant List
- Error: Employee is not a subtype of Var('T', Employee, kind='out')
i.e. only super types of Employee allowed here
- Hint: List(Var('T', Employee, kind='in')) is required for passing
Employee to methods of List[Employee]
- Action: Change the annotation to employees: List(Var('T',
Employee, kind='in')), but hire_employee(managers, john) now fails
to typecheck, so you have to fix variance problems on the call site by
following the tips from the "Invariance by default" option


Reply to this email directly or view it on GitHub
#2 (comment).

--Guido van Rossum (python.org/~guido)

@vlasovskikh
Copy link
Member

@gvanrossum Exactly. I've compared a covariant Iterable to an invariant List in my example above. And the good thing about declaration-site variance that there is no need to treat Sequence or Iterable as special cases, it would be enough to declare them covariant:

 TOut = Var('T', kind='out')  # covariant

 class Sequence(Generic[TOut]):
     @abstractmethod
     def __getitem__(self, index: int) -> TOut]: pass

@gvanrossum
Copy link
Member Author

@vlasovskikh - I can't completely follow your long message -- either you are basically reiterating the need for mutable containers to use invariance, or your point is too subtle (or longwinded) for me. Regarding the use of kind='in' / kind='out', I'm not sure that is easier to remember (except perhaps for C# users) -- I've already forgotten which means covariant and which contravariant, and what should we use to specify the default explicitly? Finally, your use of an in-line Var() call feels suspect -- at least in mypy, type variables cannot be used inline (you must declare them in the form X = typevar('X')) and I like that. But I can't tell whether you were using it as a shorthand or as an essential feature?

@vlasovskikh
Copy link
Member

@gvanrossum I want to stress two points in my message:

  1. By default generic types should be invariant
  2. There should be an ability to use declaration-site variance as well as use-site variance

The rest of my message is just an explanation why I feel that those two are important. I didn't see any mentions of declaration-site variance in discussions of typing, so felt that it was necessary to remind about this option in this discussion.

Anything else including kind='in'/'out' as opposed to some other notation for specifying variance as well as in-line Var() is not important.

@gvanrossum
Copy link
Member Author

OK, I like that generic types should default to invariant.

(However, Tuple and Callable have different rules -- see the theory article, which is now https://www.python.org/dev/peps/pep-0483/. Tuple. being immutable, is covariant -- Callable is covariant in the return type, but contravariant in the arguments. I doubt this is controversial.)

For now I'll take your word on declaration-site variance vs. use-site variance; once I think I understand it I'll let you know what I think of it. :-)

@vlasovskikh
Copy link
Member

@gvanrossum That makes perfect sense for tuple and callable types.

@gvanrossum
Copy link
Member Author

I found a good discussion of declaration-site variance on Wikipedia: http://en.wikipedia.org/wiki/Covariance_and_contravariance_%28computer_science%29#Declaration-site_variance_annotations

Let me summarize it in my own words:

  • With declaration-site variance, you state in the interface definition which of the type variables are co- or contravariant (by default they are invariant). For a sound type system, the type checker must verify that covariant variables are not used as method argument types, and contravariant variables are not used as return types. This exists in C#, Scala, OCaml.
  • Use-site variance, known as wildcards in Java, lets the user override the variance when they declare the type of a variable, e.g. as List<? extends Animal> animals;. (This is the syntax that Norvig and Bloch said is too complicated.)

If we choose declaration-site variance, there is an opportunity here for a "cute" notation borrowed from Scala and OCaml: +T for using T covariantly, -T for using it contravariantly (C# uses out/in, but the latter is a bit problematic because it is a Python keyword). This is syntactically valid Python, and easy to implement in the runtime typing.py that I am developing, by overloading __pos__ and __neg__ on the TypeVar class. Example:

class Node(Generic[+T]):
   def add_left(self, node: T):
       self.left = node

(TBD: Add usage example with Employee/Manager, check with vlasovskikh's comment.)

@gvanrossum
Copy link
Member Author

OK, I think I've got it right, and your List[TypeVar('T', Employee, kind='out')] is similar to Java's List<? extends Employee>, right? And your recommendation is just not to go there at all.

Using my "cute" spelling idea, perhaps we could replace your

T = TypeVar('T', kind='out')

with

T = +TypeVar('T')

(This goes against the rule that a TypeVar() expression must be assigned to a variable with the same name, but perhaps we could relax that just enough to allow unary + and -. Then again it's perhaps too cute. But then again somebody who doesn't get +TypeVar(...) probably also doesn't get TypeVar(..., kind='out').)

I'm not sure what Jukka thinks, and we'll have to see how this works out in a larger practice trial (i.e. Python 3.5 :-) but so far I like it.

Apart from how to spell co/contravariant type variables (and whether that should be part of the variable declaration or part of the class definition) I also think we need to discuss bounded type variables in general, but I think that can be done separate from the co/contravariant discussion. (Maybe in #1.)

@vlasovskikh
Copy link
Member

@gvanrossum That's right. The + and - syntax is nice, but IMHO is equally hard to get and remember as kind='in' / 'out', since the problem of variance is hard in itself.

I should note that declaration-site variance and use-site variance can co-exist in a language. Maybe in order to simplify things we could use only declaration-site variance.

Here is an example of declaration-site variance vs use-site variance.

Given a hierarchy of Employee classes (Manager and Engineer are Employee), in order to define the best_empoyee function we can either:

  • Use a covariant type for the employees argument (such as immutable Iterable[T] or Sequence[T] ABC types):
T = TypeVar('T'): ...

class Iterable(Generic[+T]): ...

def best_employee(employees: Iterable[Employee]) -> Employee:
    return max(employees, key=evaluate)
  • Or use a concrete mutable invariant List type, but change its variance at the use site:
E = TypeVar('E', Employee)

def best_employee(employees: List[+E]) -> E:
    return max(employees, key=evaluate)

In this case having covariant Iterable and Sequence superclasses for List feels natural. In general though using declaration-site variance only would require introducing new classes with covariant subsets of methods of invariant types only for making the type checker happy. But again, it's a pretty rare case.

@gvanrossum
Copy link
Member Author

OK, that's very clear, and I am beginning to understand what the 39% quoted at http://en.wikipedia.org/wiki/Covariance_and_contravariance_%28computer_science%29#Comparing_Declaration-site_and_Use-site_annotations refers to. I also now understand what they mean with "the Scala Collections library defines three separate interfaces for classes which employ covariance".

I'd love some input from Jukka on this issue (summary: whether to support only declaration-site variance declarations (first bullet), or also use-site variance declarations (second bullet); and what you think of the +T/-T syntax instead of (or as a shorthand for) kind='out'/'in' in the TypeVar() call. Also, what "e = TypeVar('E', Employee)" should mean (without the variance declaration).

@JukkaL
Copy link
Contributor

JukkaL commented Jan 15, 2015

My vote goes to only having declaration-site variance. Many uses of wildcard types as in Java programs (use-site variance) could probably be replaced by using just type variables or using Any, even if we can't use declaration-site variance to represent them. For example, consider this example using Java-like syntax:

def f(x: List[? extends object]) -> None: ...

We can use a type variable to achieve a similar effect:

T = TypeVar('T')
def f(x: List[T]) -> None: ...

We could also rewrite it using Any, at the cost of less effective type checking:

def f(x: List[Any]) -> None: ...

I looked at this some years ago and it seemed that wildcard types were rarely used in interesting ways, but they may be more widely used in recent Java code.

@gvanrossum
Copy link
Member Author

SGTM!

On Wednesday, January 14, 2015, Jukka Lehtosalo notifications@github.com
wrote:

My vote goes to only having declaration-site variance. Many uses of
wildcard types as in Java programs (use-site variance) could probably be
replaced by using just type variables or using Any, even if we can't use
declaration-site variance to represent them. For example, consider this
example using Java-like syntax:

def f(x: List[? extends object]) -> None: ...

We can use a type variable to achieve a similar effect:

T = TypeVar('T')
def f(x: List[T]) -> None: ...

We could also rewrite it using Any, at the cost of less effective type
checking:

def f(x: List[Any]) -> None: ...

I looked at this some years ago and it seemed that wildcard types were
rarely used in interesting ways, but they may be more widely used in recent
Java code.


Reply to this email directly or view it on GitHub
#2 (comment).

--Guido van Rossum (on iPad)

@vlasovskikh
Copy link
Member

@JukkaL Your second example is not equivalent to your first example if we consider List to be invariant, not covariant. With declaration-site variance only and invariant List you have to introduce covariant ABC superclasses of List such as class Iterable(Generic[+TypeVar('T')]): ... or just use List[Any] or List (BTW are those equivalent now?).

@gvanrossum
Copy link
Member Author

This is now the most important issue left to decide. It's pretty complex (I always get a headache thinking about this stuff) and I don't really like the TypeVar('T', kind='in') syntax, but let's call that a strawman. I checked in some code to typing.py just now (9a434e6) that implements this syntax and passes some simple test cases. (I had to fix a bunch of semi-unrelated things that were broken too.) I will next try to do a write-up of how I think this should work in the type checker, and why, and post it as the next comment here (unless someone else beats me to it).

@gvanrossum
Copy link
Member Author

I wrote something up here: https://github.com/ambv/typehinting/blob/master/VARIANCE.rst

@gvanrossum
Copy link
Member Author

FWIW, I think I got in/out mixed up. According to the wikipedia covariant is 'out' (or +) and contravariant is 'in' (or -). This is because I wrote both the typing.py patch and the VARIANCE.rst file from memory. So much for in/out being mnemonic. :-(

@gvanrossum
Copy link
Member Author

I can't sleep. :-)

I fixed this in VARIANCE.rst and typing.py. But my real proposal is to drop kind=('in'|'out') and instead add separate keyword args covariant=True or contravariant=True. (Then we can set both if we add bivariance, which is mentioned in the Wikipedia article too. :-)

@JukkaL
Copy link
Contributor

JukkaL commented Apr 14, 2015

@gvanrossum That sounds reasonable -- it has the benefit that anybody can easily google what the technical terms mean, and even though the terms are technical, they are pretty widely used with a generally agreed on meaning.

@NYKevin
Copy link

NYKevin commented May 5, 2015

How would this interact with constraints? IIUC, the current proposal does something like this:

T = TypeVar('T', covariant=True)

class Project(Generic[T]):
    pass

assert issubclass(Project[Manager], Project[Employee])

But consider:

T = TypeVar('T', Employee, covariant=True)

class Project(Generic[T]):
    def __init__(self, leader: T):
        self.leader = leader  # type: T
        leader.employee_method()

Is this legal? Do we need to cast leader to Employee? Or does T just get reduced to Employee since it only has one constraint? More generally, how do we indicate that "T is always (must always be) compatible with Employee" while preserving generic functionality?

@gvanrossum
Copy link
Member Author

We now have agreement on this (we're going to use covariant=True/contravariant=True) and typing.py already implements this, but we still need to update the PEP.

@NYKevin: This is not what constraint arguments to type variables are meant for. You should just use Employee instead of a type variable.

@gvanrossum gvanrossum added the bug label May 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants