Type aliases without implicit conversion ("NewType") #1284

gnprice opened this issue Mar 10, 2016 · 23 comments · Fixed by #1939


gnprice commented Mar 10, 2016

Several users (@wittekm among them) have asked for a feature where you can define something that's like a type alias in referring to an existing underlying type at runtime, but which isn't considered equivalent to it in the type-checker -- like what Hack and Haskell call newtype.

One classic application of this would be aliases/newtypes of str (or unicode or bytes) to distinguish fragments of HTML, Javascript, SQL, etc., from arbitrary text and enforce that conversions are only done with proper escaping, to prevent classes of vulnerabilities like XSS and SQL injection. A definition might look like HtmlType = NewType("HtmlType", str).

Other classic uses include distinguishing identifiers of different things (users, machines, etc.) that are all just integers, so they don't get mixed up by accident.

A user can always just define a class, say an empty subclass of the underlying type, but if an application is handling a lot of IDs or text fragments or the like, it costs a lot at runtime for them to be some other class rather than actual str or int, so that isn't a good solution.

The main open question I see in how this feature might work is how to provide for converting these values to the underlying type. For a feature like this to be useful there has to be some way to do that -- so it's possible to write the conversion functions that take appropriate care like escaping text into HTML -- but preferably one that's private to a limited stretch of code, or failing that is at least easy to audit with grep. In Hack the types are implicitly equivalent just within the source file where the newtype is defined; in Haskell the newtype comes with a data constructor which is the only way to convert, and typically one just doesn't export that from the module where it's defined.

The Hack solution could work, and has the advantage that it means no run-time overhead at all, other than invoking the intended conversion functions that live in the newtype's file. It feels odd to me, though, because Python generally doesn't treat specially whether a thing is defined in a given module vs. imported into it. Another solution could be something like html = HtmlType.make(text) and text = HtmlType.unmake(html), which would follow perfectly normal Python scoping and would be reasonably auditable.
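
A minimal sketch of the make/unmake idea, written with `typing.NewType` as it eventually landed in the `typing` module rather than with the `HtmlType.make` spelling floated above. The helper names `escape_to_html` and `render_paragraph` are hypothetical, and `html.escape` merely stands in for whatever escaping an application really needs.

```python
import html
from typing import NewType

# HtmlType is a plain str at runtime; only the type checker tells them apart.
HtmlType = NewType("HtmlType", str)

def escape_to_html(text: str) -> HtmlType:
    # Hypothetical helper: the one audited place where arbitrary text
    # is converted into HtmlType.
    return HtmlType(html.escape(text))

def render_paragraph(body: HtmlType) -> HtmlType:
    # Hypothetical consumer: a type checker should reject a bare str here.
    return HtmlType("<p>" + body + "</p>")

fragment = render_paragraph(escape_to_html("1 < 2 & 3"))
print(fragment)
```

Searching for `HtmlType(` finds every conversion site, which is roughly the grep-auditability asked for above.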

gnprice commented Mar 10, 2016

@gvanrossum, @ddfisher, and I discussed this for a while in person. The use case we understand relatively well is for things like user IDs, where one rarely wants to manipulate them as the underlying type (like doing arithmetic.) For things like HTML fragments, we'll want to understand the use case better.

Even for user IDs, one thing to be careful of getting right is equality -- you need to be able to write if user_id == requester_user_id: or whatever, so you need at least that one operation. And you probably want an error if someone ever compares a user ID to a plain int.

It'll also be important to make a newtype like this possible to gradually adopt in an already-annotated codebase -- so that if you have a bunch of code already annotated with int for the user IDs, you can go back and annotate a swath of it at a time with UserId instead. David suggested having a feature like UncheckedUnion, which is sort of like Any but narrowed to the specific types in the union; then one could write e.g. UncheckedUserId = UncheckedUnion[UserId, int], and at the boundary between UserId code and still-int code type things as UncheckedUserId. That saves having to write a bunch of casts, and it means no runtime effect.

Copy link

I was playing with this the other day and had the idea to use classes that the type checker would see, but which would never be instantiated at runtime. It kind-of works :-)

At first I tried:

class Celsius(int):
    def __new__(cls, value):
        return value

class Fahrenheit(int):
    def __new__(cls, value):
        return value

def is_it_boiling(temp: Celsius) -> bool:
    return temp > 100

c = Celsius(80)
f = Fahrenheit(120)
print('is', c, 'boiling?', 'yes' if is_it_boiling(c) else 'no')
print('is', f, 'boiling?', 'yes' if is_it_boiling(f) else 'no')

The classes are only there to introduce types for the type checker; at runtime there are only ints. The __new__ methods let you use the class when creating a new value. The type checker apparently expects that __new__ returns an instance of the cls passed in, so c = Celsius(123) will conveniently make the type checker assign the type Celsius to c. Alternatively, you could have used cast(Celsius, value).

This correctly flags the second call to is_it_boiling where the argument was in Fahrenheit, not Celsius.
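
To make the runtime side concrete, here is a small sketch that reuses the definitions above and inspects what Python actually builds. It only illustrates the trick; it does not show how a type checker reasons about it.

```python
class Celsius(int):
    def __new__(cls, value):
        return value  # hand back the argument itself; no instance is created

class Fahrenheit(int):
    def __new__(cls, value):
        return value

def is_it_boiling(temp: Celsius) -> bool:
    return temp > 100

c = Celsius(80)
f = Fahrenheit(120)
print(type(c), type(f))   # both are plain int at runtime
print(is_it_boiling(f))   # runs without complaint; only a type checker objects
```

Because `__new__` returns an object that is not an instance of `cls`, Python skips `__init__` and the caller simply receives the original int.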

However, since both classes inherit from int, they can be used in mixed expressions like

x = Celsius(10) + Fahrenheit(20)

That does not give an error since the type checker sees that as a call to int.__add__(other: int) and Fahrenheit is a subclass of int.

To get around that, I tried removing the common base class. That looks like this:

from typing import Union

class Celsius:
    def __new__(cls, value):
        return value

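    def _binop(self, other: Union['Celsius', int]) -> 'Celsius':
        ...  # signature only; never called, since values are plain ints at runtime

    __add__ = __sub__ = _binop

    def _boolbinop(self, other: Union['Celsius', int]) -> bool:
        ...  # signature only, as above

    __lt__ = __le__ = __eq__ = __ne__ = __ge__ = __gt__ = _boolbinop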

The Fahrenheit class would need similar boilerplate, which one might be able to put in a generic base class. With these definitions, a line like

x = Celsius(10) + Fahrenheit(20)

is flagged with

error: Argument 1 has incompatible type "Fahrenheit"; expected "Union[Celsius, Number]"

The use of Union[..., int] allows you to do comparisons with normal integers and you can thus save on the number of Celsius(...) wrappers needed. (I had originally thought I could use numbers.Number instead of int, but that doesn't work for some reason.)

So, the basic idea was: make a new type unrelated to all other types for each newtype. The new type gets the special methods needed to behave like the base type it is an alias for. At runtime, the new type is never instantiated — so there's no performance penalty due to an extra layer of indirection.

Since the types are not actually used at runtime, isinstance checks will fail (isinstance(Celsius(123), Celsius) is False since c is an int at runtime).
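
A short sketch of the runtime behaviour just described, using a trimmed copy of the base-less Celsius class. The stub method is never called, because every value is really an int.

```python
from typing import Union

class Celsius:
    def __new__(cls, value):
        return value  # never builds a Celsius instance

    # Signature-only stub for the checker; at runtime int.__add__ runs instead.
    def _binop(self, other: Union['Celsius', int]) -> 'Celsius': ...

    __add__ = __sub__ = _binop

total = Celsius(10) + Celsius(20)
print(total, type(total))
print(isinstance(total, Celsius))
```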

Copy link


Excuse me for not bringing much except a long +1.

We would very much like to start using mypy, but most of our codebase is just very nasty Python 3 code (none of it has stubs so far, except maybe requests), so the important parts would be the "business logic", and those parts would benefit most from these custom value types. Though using classes works, and they have only a negligible memory overhead, they are a bit distracting (especially since developers have to be careful not to use them for anything other than annotating/hinting).

Thanks for your efforts!

JukkaL commented Mar 18, 2016

This has been a fairly frequently requested feature. The main challenge is that there are multiple potential, different use cases, and we don't understand how important all of these are. Supporting all of the possible use cases might be hard, and almost certainly not a good idea anyway.

Here's a list of things which might be useful to support (some of these could be worthless):

  1. Create a very limited type for things like IDs that is separate from other types but is represented at runtime as an int/str or similar. It wouldn't inherit most operations from the target type. Currently defining a dummy class as proposed above can be used as a workaround, and we could easily provide a better syntax for this. Type checking of == operations is still an open issue, as mypy allows arbitrary objects to be compared for equality.
  2. Create something that provides a (tweaked) subset of operations supported by int or another type, but is represented at runtime as int (or whatever). The Celsius example above is like this.
  3. Create something that behaves more or less like a subclass of int/str but is a regular int/str at runtime. This already kind of works via a __new__ hack, but it's not quite perfect as binary operations will likely have wrong result types by default (e.g. MyInt() + MyInt() is int instead of MyInt).
  4. Create something that behaves exactly like another type such as str but that is not compatible at all with the other type. For example, create an alias Path for str, and don't support mixing Path and str objects in operations at all. Also, Path would not be a subtype of str. However, Path() + Path() would be okay.

Use cases 1, 2 and 4 could be implemented via wrapper objects, but this would have extra memory overhead, and potentially also speed impact. Also, use case 4 would potentially require replicating the entire interface of the target class, resulting in a lot of boilerplate code. However, use case 4 seems pretty marginal. Currently use case 3 isn't properly supported, as far as I can see.

My guess is that use case 1 is reasonably common, but the others are less so.
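
For comparison, the wrapper-object route mentioned for use case 1 might look like the sketch below. The class is hypothetical and kept deliberately small. It gives a genuinely distinct runtime type, at the price of one extra object per ID.

```python
import sys

class UserId:
    """Hypothetical wrapper: distinct at runtime, but one extra object per ID."""
    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        # Only another UserId can compare equal; a plain int never does.
        if not isinstance(other, UserId):
            return NotImplemented
        return self.value == other.value

    def __hash__(self) -> int:
        return hash(("UserId", self.value))

print(UserId(7) == UserId(7))
print(UserId(7) == 7)
print(sys.getsizeof(UserId(7)), "bytes for the wrapper, on top of the int itself")
```

The exact size printed depends on the interpreter, so treat the last line as an order-of-magnitude hint only.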

Copy link

For a start, simply having the ability to define a type without using class, and without any primitive type semantics, would be okay.

Usually the fact that a user_id is an integral type (or that it's numeric at all) doesn't matter. It's just a unique id for lookups and update access. And it'd help catch a lot of bugs due to accidentally calling a function with the wrong arguments, returning the user name instead of the id and so on. (Or maybe a better example is user name vs. email address, both are usually represented as simple (base)string typed variables/values, yet semantically they are completely different.)

Copy link

Could you work out a complete example of how that would look?

wittekm commented Mar 19, 2016

(I'm not Pas, but here's my two cents for an example:)
This is a model. Maybe some business logic.

from typing import TypeAlias

UserId = TypeAlias('UserId', int)

class User(object):
  @classmethod
  def get_by_user_id(cls, user_id):
    # type: (UserId) -> User
    db_result = db.hey_this_function_accepts_an_int(user_id)
    return User(db_result)

  @property
  def name(self):
    ...

This is some HTTP controller, whose actions can only take in primitive arguments. Its only purpose in life is to call into smarter logic like the above, and maybe some validation.

from user import UserId, User

# just use a type: comment to say 'hey, I want to treat user_id_as_int as a UserId '
def get_user_name(user_id_as_int):
  # type: (int) -> str

  user_id = user_id_as_int  # type: UserId
  # the above would fail if 'user_id_as_int' were not an int.
  # could probably also be accomplished by a cast().
  return User.get_by_user_id(user_id).name

The basic rules I could foresee would be:
RULE 1: functions that accept an int, can also accept a UserId

  • as exemplified by db.hey_this_function_accepts_an_int in get_by_user_id - the passed in variable is a UserId.
  • Highly simplifies migrations to a TypeAlias-friendly setup.

RULE 2: functions that accept a UserId need to have their parameters explicitly identified as a UserId, through either a #type: or a cast().

  • as exemplified in the controller.
  • forces the person consuming the library to say, "oh geez, okay, i promise you that this is a UserId".

[sidenote for if this proposal does go through: this will probably be used for IDs a lot. You should publish some opinion like the mypy authors think capitalizing like 'UserId' is more idiomatic casing than 'UserID', or the other way around. The most annoying thing in the world would be if somebody had a TypeAlias called UserID and another guy had a type alias called EmailId.]

Copy link

Thanks to wittekm's detailed example, all I have to add is another use case where type equivalence can cause problems.

So let's say we have this:

ConflictResolutionFunction = Callable[[Any, str, Any, T], Tuple[Optional[T], Optional[T]]]

Then hinting these callbacks (and other dynamically loaded and utilized Callables) seems like a big selling point of mypy, but in this case I wasn't able to come up with a class that somehow subtypes this type. (Probably because I don't know enough about how special the code in is, or maybe just because Callable is Final.) And of course it's unlikely that there are many similarly strange/ugly typed variables (functions), though there are a lot of uses of callable values with simpler types.

For example, Flask and Werkzeug use class-valued variables (response_class, app_ctx_globals_class and others), and they could be typed as Callable[[], Any]s, because due to duck typing the actual classes instantiated by the framework don't have to implement anything. (Or even if Flask uses ABCs, some other code might not.)

Copy link

(I'll respond to @wittekm's example and rules in a separate comment.)

@PAStheLoD: Callable is indeed final, you can't subclass it. It's about as special as Union or Tuple. And type-checking of Callable types is entirely structural, since that's what "callable" means fundamentally. (In fact it's a lot more fundamental than "int" or "str".)

I think what you're trying to do here is the following.

You have an API that takes a callback that implements some policy (e.g. for caching). There may be some predefined policies, but you are also fine with users implementing their own policies. However, you want them to explicitly state when they are writing a policy, so that only functions explicitly marked as policy functions are accepted as policy parameters in your APIs.

In a dynamic world, you could easily do this by requiring a @policy decorator that just adds a special _policy_ attribute to the function object, and in the API you check the policy callbacks for that attribute. But you wish that you could let mypy do this for you -- users would still add a @policy decorator, but that decorator would subtly modify the type, and your policy API arguments would be declared as only accepting such a policy. Then mypy would catch mistakes without runtime overhead.
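
The purely dynamic variant described above can be sketched as follows. The `_policy_` attribute name comes from the paragraph above. `some_api` and the two callbacks are hypothetical stand-ins.

```python
from typing import Any, Callable

def policy(func: Callable[..., Any]) -> Callable[..., Any]:
    # Mark the function so the API can recognise it later.
    func._policy_ = True  # type: ignore[attr-defined]
    return func

def some_api(p: Callable[..., Any]) -> None:
    # Hypothetical API entry point that insists on a marked callback.
    if not getattr(p, "_policy_", False):
        raise TypeError("callback is not marked with @policy")

@policy
def my_policy(a, b, c, d):
    return (d, None)

def not_a_policy(a, b, c, d):
    return (d, None)

some_api(my_policy)  # accepted
try:
    some_api(not_a_policy)
except TypeError as exc:
    print("rejected:", exc)
```

The check happens on every call, which is exactly the runtime cost a checker-only marker would avoid.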

I think this is an interesting if slightly esoteric use case, and it looks possible that the solution onto which we are converging for simpler type aliases can be made to work for callables too. For example:

# Library code
from typing import Any, Callable, Optional, Tuple, TypeVar, cast

T = TypeVar('T')

_Policy = Callable[[Any, str, Any, T], Tuple[Optional[T], Optional[T]]]  # The raw signature
Policy = TypeAlias('Policy', _Policy)  # The marked type (TypeAlias: the special form proposed in this thread)

def make_policy(func: _Policy) -> Policy:  # The decorator
    return cast(Policy, func)  # Maybe the cast isn't even needed

def some_api(p: Policy) -> None:
    ...  # this only accepts functions marked with @make_policy

# User code
from typing import Optional, Tuple
from library import T, Policy, make_policy, some_api

@make_policy
def my_policy(a, b: str, c, d: T) -> Tuple[Optional[T], Optional[T]]:
    ...  # implement a policy

def not_a_policy(a, b: str, c, d: T) -> Tuple[Optional[T], Optional[T]]:
    ...  # looks like a policy but isn't

some_api(my_policy)  # OK
some_api(not_a_policy)  # Error (from mypy!)

Copy link

@wittekm: I have a few questions about the details of your proposal.

(I also edited your example a bit to correct obvious typos, like a missing self for the name property and to make the get() call correspond to the definition of get_by_user_id(). I hope that's OK.)

You are proposing asymmetric rules: any function that takes an int will silently accept a UserId, while any function that takes a UserId won't take an int. (In mypy -- at runtime they both accept either type since UserId is just int at runtime.)

A slight problem with this is that it seems to allow int operations on UserId instances, since operations are just syntactic sugar for functions. So this code would still be allowed:

def get(uid: UserId) -> None:
    x = uid + 1  # Has type int, but makes no sense
    ...  # do something with x

I'm not sure if this is a show-stopping deficiency or something that we can live with. It seems to go against Jukka's (1) from #1284 (comment), where he says that the alias wouldn't inherit most operations. I'm curious if we can define a rule that actually follows (1). Certainly some operations should still be allowed, e.g. __str__ or __eq__. How would we define the set of operations that TypeAlias() strips without special-casing e.g. int? This is essentially the problem from #1284 (comment) (the example with Celsius and Fahrenheit). We really need to decide about this before we can move forward.

My other question is about using cast() vs # type comments. Your example has

user_id = user_id_as_int  # type: UserId

and you wrote (both in the comment and in RULE 2) that it might also possibly use a cast. I would really hope we won't need to use or encourage casts here, since it's an expensive runtime operation (at least it is until we teach CPython about it -- right now it is a user-defined function, which is incredibly slow compared to a plain assignment). Unfortunately (and here @JukkaL might know better) I think that there's not much difference between the above and


since in both cases mypy sees an int (user_id_as_int) in a context where a UserId is required. To understand this it helps to realize that putting a # type comment on an assignment is really a crutch for a variable declaration -- in some hypothetical future Python syntax the above example might be written as

var user_id: UserId = user_id_as_int

IOW the # type comment just gives the type of the variable, not of the expression, and mypy then has to decide whether the expression's type is acceptable for the variable's type. There really can be only a single rule to decide whether an expression's type is acceptable for a given context (either a variable or argument supplies a context).

However, all that seems to point us in the direction of requiring a cast, and as I said I really don't like that. How can we formulate a rule that does what we want? You tried with RULE 1 and RULE 2 but I think we'll need some additional subtlety that eludes me. Maybe we need to distinguish between different kinds of contexts (e.g. assignment being different from arguments) and allow conversion of int to UserId in one context but not another.

Going back to @JukkaL's list of use cases (1)-(4) and the Celsius/Fahrenheit example, I would also like to discuss runtime costs more. Some use cases require defining a new class using standard Python class syntax (e.g. class Celsius(int): ...). There's no way that can have zero runtime cost, and I think if we're going that road it's not worth also adding special handling to mypy. The thing that's exciting would be to have a TypeAlias() special form that disappears entirely at runtime. But then I want to avoid the need for casts too. Here I am getting a headache. :-)

Finally, once we agree on how to do it, I think we should come up with a better name, since "type alias" in PEP 484 is already used for pure aliases that aren't type-checked -- they are just shorthands to avoid having to write the same thing over and over.

Sorry for the rambling!

PS. Re: capitalization of UserID vs. UserId, I find the latter looks better. PEP 8 has something to say about abbreviations like HTTP (it prefers HTTPServerError), but I think that's really more about initialisms like HTTP (== HyperText Transfer Protocol), while to me ID feels more like a shortening of "Identity". I could see HTTP being evolved from H.T.T.P., but writing I.D. for Identity makes no sense -- even though I've seen it.

Copy link

@gvanrossum: Yes, thank you, that's the broadest use case I had in mind, sans the decorator, as I was envisioning that marking a function as this "non-equal alias" type would be enough information (both for the developer and mypy).

Copy link

gvanrossum commented Mar 20, 2016 via email

Copy link

JukkaL commented Mar 20, 2016

@gvanrossum: You are correct, mypy uses the same rules for assignment and function calls. Making assignments special wouldn't be too hard technically, but this special casing feels pretty ad-hoc to me. Users might expect that an assignment with a type annotation can be used as a cast in general and would get confused as it would only work for certain types. Using a type alias would still have a runtime cost (an extra assignment statement), and I'm not yet convinced that the runtime overhead of a call is significant enough to make the type system less consistent.

Instead of using cast or an assignment, there are other options we could pursue. Here are a few ideas.

A) Make the alias a callable object

This should be slightly more efficient than a cast as there is only a single argument. It is arguably also more readable and less error-prone than a cast:

UserId = NewType('UserId', int)

def get_user_name(user_id):
  # type: (int) -> str
  return User.get_by_user_id(UserId(user_id)).name

B) Special type that allows both UserId and int

We'd add a new kind of type, but we wouldn't need to touch the semantics of assignments. The idea is that a special type would accept int as a value, but the type would still be a subtype of UserId. This is similar to the unchecked union proposed by @ddfisher, but I'd rather use a different name for this. I don't have a great proposal for the name, though. The code would be efficient as we don't need an assignment or a call.


from typing import NewType, Implicit

UserId = NewType('UserId', int)

def get_user_name(user_id):
  # type: (Implicit[UserId]) -> str
  return User.get_by_user_id(user_id).name  # ok

get_user_name(2)  # ok

The type Implicit[UserId] would allow UserId or int, but maybe it wouldn't allow other aliased types such as FileId (even if that is internally an int). It would be compatible with UserId -- it's basically a bridge between int and UserId. Implicit[t] probably isn't the best way to phrase this. Some other ideas, none of which seem very intuitive:

  • UserId.implicit
  • Unchecked[UserId]
  • Promote[UserId]

Copy link

At first look, I like (A) best. The notation is very intuitive, since we already write int(x), str(x), float(x) etc. so UserId(x) is instantly understandable. At runtime it would be an identity function. I tried to understand the timing of this but it appears a lot of the time goes towards looking up the function to be called, not calling it. Best I can tell, on my Mac:

  • pass takes 15 nsec
  • x = 0 takes 25 nsec (that's a LOAD_CONST + STORE_FAST)
  • x = UserId(0) takes 95 nsec (using def UserId(a): return a)
  • x = cast(int, 0) takes 125 nsec (I guess that's the cost of the extra arg and looking up int)
  • x = int(0) takes 125 nsec (strange; its complex signature must slow it down)
  • x = id(0) takes 70 nsec (that's the fastest built-in function I could find)

So I think I'm okay with the cost of the alias function.
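
Anyone wanting to repeat the measurement could start from a `timeit` sketch like the one below. It is not the script used for the numbers above. The statement list mirrors the bullets, and the iteration count is an arbitrary choice. Absolute figures will differ by machine and Python version.

```python
import timeit
from typing import cast

def UserId(a):
    # Identity function, as in the measurement above.
    return a

statements = ["pass", "x = 0", "x = UserId(0)", "x = cast(int, 0)", "x = int(0)", "x = id(0)"]
number = 200_000  # arbitrary iteration count

results = {}
for stmt in statements:
    seconds = timeit.timeit(stmt, globals={"UserId": UserId, "cast": cast}, number=number)
    results[stmt] = seconds / number * 1e9  # nanoseconds per execution

for stmt, nsec in results.items():
    print(f"{stmt:20s} {nsec:8.1f} nsec")
```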

Some downsides of (B) include: two new primitives (NewType and Implicit) rather than one (just NewType); and everybody will have to look up what Implicit[X] means when they first encounter it.

Copy link

@JukkaL You write

(3.) Create something that behaves more or less like a subclass of int/str but is a regular int/str at runtime. This already kind of works via a __new__ hack, but it's not quite perfect as binary operations will likely have wrong result types by default (e.g. MyInt() + MyInt() is int instead of MyInt).

(4.) Create something that behaves exactly like another type such as str but that is not compatible at all with the other type. For example, create an alias Path for str, and don't support mixing Path and str objects in operations at all. Also, Path would not be a subtype of str. However, Path() + Path() would be okay.

Use cases 1, 2 and 4 could be implemented via wrapper objects, but this would have extra memory overhead, and potentially also speed impact.

I think cases 3 and 4 are also covered by the Celsius class I wrote. Mypy believes Celsius(10) + Celsius(20) returns another Celsius and the type is distinct from int and all other types. There's no runtime overhead apart from the Celsius(...) invocations that make Mypy believe that we're dealing with Celsius objects when we're really dealing with ints all along.

Instead of trying to cover specific use cases, I was trying to model Haskell's newtype as closely as possible. Of course Haskell doesn't have a concept of (object oriented) classes and subtypes, so the semantics will have to be tweaked a little. But I would hope one could stay fairly close.

JukkaL commented Mar 21, 2016

@gvanrossum I agree that A is better. Here's a more detailed proposal for how it could work:

  • C = typing.NewType('C', t) defines a new type that mypy treats as equivalent to this (but there would be no such class definition at runtime):

    class C(t): 
        def __init__(self, _x: t): ...
  • At runtime NewType evaluates to an identity function with name 'C'. The function would have some attribute __foo__ that contains a reference to t (for introspection). At runtime instances of C would actually be instances of t. Potential implementation of NewType:

    def NewType(name, t):
        def f(_x): return _x
        f.__name__ = name
        f.__foo__ = t
        return f
  • Normal rules for subtyping and inheritance would apply. The type would not be special in other ways.

  • isinstance and issubclass would fail for C since function objects don't support these operations. This is actually good since there's no way to distinguish instances of C at runtime from instances of t.

  • C cannot be used as a base class, since it's not a real class. However, perhaps it could be used as the target type in another NewType definition. Not sure if the latter would be useful.

I think that this would cover the UserId use case pretty well:

  • Plain int can't be used when UserId is expected.
  • Another int-based NewType such as DocumentId can't be used when UserId is expected.
  • UserId(1) + 1 is fine, but the result has type int and it can't be used when UserId is expected, so we'd prevent some operations on user ids indirectly (though not all of them).
  • Going from int to UserId is easy and reasonably efficient via UserId(x).
  • UserId is valid whenever int is expected, hopefully making migration from int to UserId easy.
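
The bullets above can be tried out with `typing.NewType` as it later shipped. This is a sketch, assuming the released behaviour matches the proposal here. The functions `get_user_name` and `double` are hypothetical. The lines a checker is expected to reject are left as comments so that the snippet still runs.

```python
from typing import NewType

UserId = NewType('UserId', int)
DocumentId = NewType('DocumentId', int)

def get_user_name(user_id: UserId) -> str:
    return "user-%d" % user_id  # hypothetical lookup

def double(n: int) -> int:
    return n * 2

uid = UserId(42)
print(type(uid))             # plain int: UserId(...) returns its argument
print(get_user_name(uid))    # OK
print(double(uid))           # OK: UserId is valid wherever int is expected
bumped = uid + 1             # fine, but the checker sees the result as int
print(UserId.__supertype__)  # the introspection attribute in the released typing module

# A type checker such as mypy should reject these, although they run fine:
#   get_user_name(42)
#   get_user_name(DocumentId(42))
#   get_user_name(bumped)
```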

This actually would not directly correspond to any of my four ideas from above. This would be a simplified variant of (3) where there is less new magic.

Some open issues:

  • Should it be okay to create new types based on Callable, Union and Tuple? This could be useful, but it would complicate the implementation somewhat. If yes, @PAStheLoD's example could work.
  • There would be no way to tweak or restrict the exposed interface of the type. In particular, all operations would be inherited from the target type. Thus the Celsius example wouldn't be directly supported. For example, Celsius + int would return int instead of Celsius.

To make the Celsius example work, we could perhaps use a slightly different definition:

class UserId(int): ...

This would be similar to the original proposal, but it would allow defining custom types for operations in the body of the class. However, this would be more confusing as the syntax would imply that this is a class, even though it's actually not a real class -- the class decorator would return a function. The name NewType / newtype also wouldn't work, as a class definition always defines a new type.

JukkaL commented Mar 21, 2016

Most of the proposals would require a PEP change (or a custom extension).

Copy link

For PEP discussion please see python/typing#189. I'm going to shoot for the 3.5.2 deadline there too. (It's easy to shoot for since the release date hasn't been set yet. :-)

Copy link

I don't even know if this PEP change will go through, or when (even though I have it in the 3.5.2 milestone for the PEP -- that just means I want to think about it).

@gvanrossum gvanrossum changed the title Type aliases without implicit conversion ("newtype") Type aliases without implicit conversion ("Newtype") Jun 6, 2016
Copy link

So the PEP work on this feature (spec in pep-0484.txt and runtime support in is complete (python/typing#226) -- all we need is the mypy implementation now!

Copy link

wittekm commented Jun 6, 2016


@gvanrossum gvanrossum changed the title Type aliases without implicit conversion ("Newtype") Type aliases without implicit conversion ("NewType") Jun 6, 2016
roganov commented Jul 20, 2016

Since's implementation of NewType returns new function every time, are T1 and T2 going to be different types?

T1 = NewType('T', int)
T2 = NewType('T', int)

JukkaL commented Jul 20, 2016

The example would be rejected since the name passed to NewType doesn't match the assignment target. However, this would define two different types:

T1 = NewType('T1', int)
T2 = NewType('T2', int)
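
At runtime the two definitions are simply two separate callables that both return their argument, as this small sketch shows. Treating them as different types is left entirely to the checker.

```python
from typing import NewType

T1 = NewType('T1', int)
T2 = NewType('T2', int)

print(T1 is T2)                    # two separate runtime objects
print(T1.__name__, T2.__name__)
print(type(T1(3)) is type(T2(3)))  # both calls just return an int
```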

@gvanrossum gvanrossum modified the milestones: 0.5, Undetermined priority Jul 20, 2016
gvanrossum pushed a commit that referenced this issue Jul 28, 2016
This pull request implements NewType as described in PEP 484 and in issue #1284. It also adds a variety of test cases to verify NewType works correctly when used and misused and adds information about using NewType to the mypy docs.

Fixes #1284.