Type aliases without implicit conversion ("NewType") #1284
Several users (@wittekm among them) have asked for a feature where you can define something that's like a type alias in referring to an existing underlying type at runtime, but which isn't considered equivalent to it in the type-checker -- like what Hack and Haskell call newtype.
One classic application of this would be aliases/newtypes of
Other classic uses include distinguishing identifiers of different things (users, machines, etc.) that are all just integers, so they don't get mixed up by accident.
A user can always just define a class, say an empty subclass of the underlying type, but if an application is handling a lot of IDs or text fragments or the like, it costs a lot at runtime for them to be some other class rather than actual
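To make the empty-subclass alternative concrete, here is a minimal sketch. The names `UserId` and `MachineId` are hypothetical and only serve as an illustration; they are not taken from the discussion.

```python
class UserId(int):
    """Hypothetical empty subclass used only to tell IDs apart."""


class MachineId(int):
    """A second hypothetical ID type with the same underlying representation."""


uid = UserId(42)

# Each value is a real instance of the subclass rather than a plain int,
# so every ID pays for being "some other class" at runtime.
print(type(uid) is int)       # False
print(isinstance(uid, int))   # True

# Inherited operators return plain ints, so the distinction disappears
# as soon as the value is manipulated.
print(type(uid + 1) is int)   # True

# IDs of different kinds still compare equal when the numbers match.
print(uid == MachineId(42))   # True
```

The sketch only shows the runtime behaviour; how a type checker treats the two subclasses is a separate question.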
The main open question I see in how this feature might work is how to provide for converting these values to the underlying type. For a feature like this to be useful there has to be some way to do that -- so it's possible to write the conversion functions that take appropriate care like escaping text into HTML -- but preferably one that's private to a limited stretch of code, or failing that is at least easy to audit with
The Hack solution could work, and has the advantage that it means no run-time overhead at all, other than invoking the intended conversion functions that live in the newtype's file. It feels odd to me, though, because Python generally doesn't treat specially whether a thing is defined in a given module vs. imported into it. Another solution could be something like
@gvanrossum, @ddfisher, and I discussed this for a while in person. The use case we understand relatively well is for things like user IDs, where one rarely wants to manipulate them as the underlying type (like doing arithmetic.) For things like HTML fragments, we'll want to understand the use case better.
Even for user IDs, one thing to be careful of getting right is equality -- you need to be able to write
It'll also be important to make a newtype like this possible to gradually adopt in an already-annotated codebase -- so that if you have a bunch of code already annotated with
I was playing with this the other day and had the idea to use classes that the type checker would see, but which would never be instantiated at runtime. It kind-of works :-)
At first I tried:
class Celsius(int):
    def __new__(cls, value):
        return value

class Fahrenheit(int):
    def __new__(cls, value):
        return value

def is_it_boiling(temp: Celsius) -> bool:
    return temp > 100

c = Celsius(80)
f = Fahrenheit(120)
print('is', c, 'boiling?', 'yes' if is_it_boiling(c) else 'no')
print('is', f, 'boiling?', 'yes' if is_it_boiling(f) else 'no')
The classes are only there to introduce types for the type checker — at runtime there is only
This correctly flags the second call to
However, since both classes inherit from
x = Celsius(10) + Fahrenheit(20)
That does not give an error since the type checker sees that as a call to
To get around that, I tried removing the common base class. That looks like this:
from typing import Union

class Celsius:
    def __new__(cls, value):
        return value

    def _binop(self, other: Union['Celsius', int]) -> 'Celsius':
        pass
    __add__ = __sub__ = _binop

    def _boolbinop(self, other: Union['Celsius', int]) -> bool:
        pass
    __lt__ = __le__ = __eq__ = __ne__ = __ge__ = __gt__ = _boolbinop
x = Celsius(10) + Fahrenheit(20)
is flagged with
The use of
So, the basic idea was: make a new type unrelated to all other types for each newtype. The new type gets the special methods needed to behave like the base type it is an alias for. At runtime, the new type is never instantiated — so there's no performance penalty due to an extra layer of indirection.
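The "never instantiated" part of this idea can be checked at runtime with a short sketch. It reuses the `Celsius` name from the example above; the variable names are assumptions made for the illustration.

```python
class Celsius:
    def __new__(cls, value):
        # Return the argument unchanged: no Celsius instance is ever created,
        # and __init__ is skipped because the result is not a Celsius.
        return value


raw = 80
c = Celsius(raw)

print(c is raw)                 # True: the very same int object comes back
print(type(c) is int)           # True
print(isinstance(c, Celsius))   # False: the class exists only for the checker
```

The only runtime cost in this sketch is the call to `Celsius(...)` itself; the value that flows through the program afterwards is an ordinary `int`.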
Since the types are not actually used at runtime,
Excuse me for not bringing much except a long
We would very much like to start using mypy. Most of our codebase is just very nasty Python 3 code (none of them have stubs so far, okay maybe requests), so the important parts would be the "business logic", and those parts would benefit most from these custom value types. Though using classes works, and they only have a negligible memory overhead, they are a bit distracting (especially since developers have to be conscious not to use them for anything other than annotating/hinting).
Thanks for your efforts!
This has been a fairly frequently requested feature. The main challenge is that there are multiple potential, different use cases, and we don't understand how important all of these are. Supporting all of the possible use cases might be hard, and almost certainly not a good idea anyway.
Here's a list of things which might be useful to support (some of these could be worthless):
Use cases 1, 2 and 4 could be implemented via wrapper objects, but this would have extra memory overhead, and potentially also speed impact. Also, use case 4 would potentially require replicating the entire interface of the target class, resulting in a lot of boilerplate code. However, use case 4 seems pretty marginal. Currently use case 3 isn't properly supported, as far as I can see.
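For comparison, a wrapper-object approach might look roughly like the sketch below. `UserIdWrapper` is a hypothetical name, and only equality and hashing are replicated here, which hints at how much boilerplate a full interface would need.

```python
class UserIdWrapper:
    """Hypothetical wrapper: a separate object that holds the underlying int."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    # Every operation the wrapper should support has to be written by hand.
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, UserIdWrapper):
            return NotImplemented
        return self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


w = UserIdWrapper(42)
print(w == UserIdWrapper(42))   # True
print(w == 42)                  # False: a wrapper never equals a bare int
print(isinstance(w, int))       # False: it is an extra object around the int
```

Each wrapped value is an additional object allocated next to the `int` it holds, which is where the memory overhead mentioned above comes from.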
My guess is that use case 1 is reasonably common, but the others are less so.
For start simply the ability to define a type, while avoiding the use of
Usually the fact that a
(I'm not Pas, but here's my two cents for an example:)
from typing import TypeAlias

UserId = TypeAlias('UserId', int)

class User(object):
    @classmethod
    def get_by_user_id(cls, user_id):
        # type: (UserId) -> User
        db_result = db.hey_this_function_accepts_an_int(user_id)
        return User(db_result)

    @property
    def name(self):
        ...

from user import UserId, User

# just use a type: comment to say 'hey, I want to treat user_id_as_int as a UserId'
def get_user_name(user_id_as_int):
    # type: (int) -> str
    user_id = user_id_as_int  # type: UserId
    # the above would fail if 'user_id_as_int' were not an int.
    # could probably also be accomplished by a cast().
    return User.get_by_user_id(user_id).name
The basic rules I could foresee would be:
RULE 2: functions that accept a UserId need to have their parameters explicitly identified as a UserId, through either a #type: or a cast().
[sidenote for if this proposal does go through: this will probably be used for IDs a lot. You should publish some opinion like the mypy authors think capitalizing like 'UserId' is more idiomatic casing than 'UserID', or the other way around. The most annoying thing in the world would be if somebody had a TypeAlias called UserID and another guy had a type alias called EmailId.]
Thanks to wittekm's detailed example all I have to add is another use case where type equivalence can cause problems.
So let's say we have this:
Then hinting these callbacks (and other dynamically loaded and utilized Callables) seems like a big selling point of mypy, but in this case I wasn't able to come up with a class that somehow subtypes this type. (Probably because I don't know enough about how special the code in typing.py is, or maybe just because Callable is Final.) And of course it's unlikely that there are many similarly strange/ugly typed variables (functions), though there are a lot of uses of callable values with simpler types.
For example Flask and Werkzeug use class valued variables (response_class, app_ctx_globals_class and others), and they could be typed as
(I'll respond to @wittekm's example and rules in a separate comment.)
@PAStheLoD: Callable is indeed final, you can't subclass it. It's about as special as Union or Tuple. And type-checking of Callable types is entirely structural, since that's what "callable" means fundamentally. (In fact it's a lot more fundamental than "int" or "str".)
I think what you're trying to do here is the following.
You have an API that takes a callback that implements some policy (e.g. for caching). There may be some predefined policies, but you are also fine with users implementing their own. However, you want them to explicitly state when they are writing a policy, so that only functions explicitly marked as policy functions are accepted as policy parameters in your APIs.
In a dynamic world, you could easily do this by requiring a
I think this is an interesting if slightly esoteric use case, and it looks possible that the solution onto which we are converging for simpler type aliases can be made to work for callables too. For example:
# Library code
from typing import Any, Callable, Optional, Tuple, TypeVar, cast

T = TypeVar('T')
_Policy = Callable[[Any, str, Any, T], Tuple[Optional[T], Optional[T]]]  # The raw signature
Policy = TypeAlias('Policy', _Policy)  # The marked type (proposed TypeAlias, not an existing API)

def make_policy(func: _Policy) -> Policy:  # The decorator
    return cast(Policy, func)  # Maybe the cast isn't even needed

def some_api(p: Policy) -> None:
    <this only accepts functions marked with @make_policy>

# User code
from library import T, Policy, make_policy, some_api

@make_policy
def my_policy(a, b: str, c, d: T) -> Tuple[Optional[T], Optional[T]]:
    <implement a policy>

def not_a_policy(a, b: str, c, d: T) -> Tuple[Optional[T], Optional[T]]:
    <looks like a policy but isn't>

some_api(my_policy)  # OK
some_api(not_a_policy)  # Error (from mypy!)
@wittekm: I have a few questions about the details of your proposal.
(I also edited your example a bit to correct obvious typos, like a missing
You are proposing asymmetric rules: any function that takes an int will silently accept a UserId, while any function that takes a UserId won't take an int. (In mypy -- at runtime they both accept either type since UserId is just int at runtime.)
A slight problem with this is that it seems to allow int operations on UserId instances, since operations are just syntactic sugar for functions. So this code would still be allowed:
def get(uid: UserId) -> None:
    x = uid + 1  # Has type int, but makes no sense
    <do something with x>
I'm not sure if this is a show-stopping deficiency or something that we can live with. It seems to go against Jukka's (1) from #1284 (comment), where he says that the alias wouldn't inherit most operations. I'm curious if we can define a rule that actually follows (1). Certainly some operations should still be allowed, e.g.
My other question is about using
user_id = user_id_as_int # type: UserId
and you wrote (both in the comment and in RULE 2) that it might also possibly use a cast. I would really hope we won't need to use or encourage casts here, since it's an expensive runtime operation (at least it is until we teach CPython about it -- right now it is a user-defined function, which is incredibly slow compared to a plain assignment). Unfortunately (and here @JukkaL might know better) I think that there's not much difference between the above and
since in both cases mypy sees an int (
var user_id: UserId = user_id_as_int
However, all that seems to point us in the direction of requiring a cast, and as I said I really don't like that. How can we formulate a rule that does what we want? You tried with RULE 1 and RULE 2 but I think we'll need some additional subtlety that eludes me. Maybe we need to distinguish between different kinds of contexts (e.g. assignment being different from arguments) and allow conversion of int to UserId in one context but not another.
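For context on the cost being discussed, `typing.cast` is an ordinary function that returns its second argument unchanged, so its runtime price is one function call. A small sketch, with variable names chosen only for illustration:

```python
from typing import cast

user_id_as_int = 12345

# cast(typ, val) takes the target type first and the value second.
user_id = cast(int, user_id_as_int)

# The same object comes back; nothing is copied or converted.
print(user_id is user_id_as_int)    # True

# No check happens at runtime either: the "wrong" type is accepted silently.
print(cast(str, user_id_as_int))    # 12345
```

Whether that call is cheap enough compared with a plain assignment is exactly the judgment call raised above; the sketch does not measure it.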
Going back to @JukkaL's list of use cases (1)-(4) and the Celsius/Fahrenheit example, I would also like to discuss runtime costs more. Some use cases require defining a new class using standard Python class syntax (e.g.
Finally, once we agree on how to do it I think we should come up with a better name, since "type alias" in PEP 484 is already used for pure aliases that aren't type-checked -- they are just shorthands to avoid having to write the same thing over and over.
Sorry for the rambling!
PS. Re: capitalization of UserID vs. UserId, I find the latter looks better. PEP 8 has something to say about abbreviations like HTTP (it prefers HTTPServerError), but I think that's really more about initialisms like HTTP (== HyperText Transfer Protocol), while to me ID feels more like a shortening of "Identity". I could see HTTP being evolved from H.T.T.P., but writing I.D. for Identity makes no sense -- even though I've seen it.
@gvanrossum: You are correct, mypy uses the same rules for assignments and function calls. Making assignments special wouldn't be too hard technically, but this special casing feels pretty ad-hoc to me. Users might expect that an assignment with a type annotation can be used as a cast in general and would get confused as it would only work for certain types. Using a type alias would still have a runtime cost (an extra assignment statement), and I'm not yet convinced that the runtime overhead of a call is significant enough to make the type system less consistent.
Instead of using
A) Make the alias a callable object
This should be slightly more efficient than a cast as there is only a single argument. This is arguably more readable and less error-prone than with a cast:
B) Special type that allows both
We'd add a new kind of type, but we wouldn't need to touch the semantics of assignments. The idea is that a special type would accept
At first look, I like (A) best. The notation is very intuitive, since
So I think I'm okay with the cost of the alias function.
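One way option (A) could behave at runtime is sketched below. The helper `new_type` is hypothetical, written only to illustrate the "alias is a callable" idea; the special treatment a type checker would give it is not modelled here.

```python
from typing import Any, Callable


def new_type(name: str, tp: type) -> Callable[[Any], Any]:
    """Hypothetical helper: build an identity function that acts as the alias.

    `tp` is unused at runtime; it would matter only to a type checker.
    """
    def identity(x: Any) -> Any:
        return x

    identity.__name__ = name
    return identity


UserId = new_type("UserId", int)

raw = 12345
uid = UserId(raw)   # reads like a conversion, costs one function call

print(uid is raw)         # True: the value is returned unchanged
print(type(uid) is int)   # True: no new class exists at runtime
print(UserId.__name__)    # UserId
```

Under this sketch the alias costs a single one-argument call per conversion and nothing afterwards, which matches the "cost of the alias function" accepted above.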
Some downsides of (B) include: two new primitives (NewType and
@JukkaL You write
I think the cases 3 and 4 are also covered by the Celsius class I wrote. Mypy believes
Instead of trying to cover specific use cases, I was trying to model Haskell's
@gvanrossum I agree that A is better. Here's a more detailed proposal for how it could work:
I think that this would cover the
This actually would not directly correspond to any of my four ideas from above. This would be a simplified variant of (3) where there is less new magic.
Some open issues:
To make the
This would be similar to the original proposal, but it would allow defining custom types for operations in the body of the class. However, this would be more confusing as the syntax would imply that this is a class, even though it's actually not a real class -- the class decorator would return a function. The name