# Data types as objects

Python is dynamically typed, meaning that types are determined at runtime, not at compile time.

In Python, you'll often need to determine the type of an object, which you can do with the `type()` function:

In [2]:
print(type(5))  # noqa: UP003

print(type(["foo", "bar"]))


<class 'int'>
<class 'list'>


Note that Python returns objects (`<class 'int'>`, `<class 'list'>`) in response to the calls to `type()`.

Let's go one step further and check the type of the result of invoking `type()`:

In [3]:
print(type(type(5)))  # noqa: UP003


<class 'type'>


The object returned by `type()` is itself an instance of a class, `<class 'type'>`, which is the class that represents types.

In Python, we can conclude that "type" and "class" are interchangeable concepts, and both of them are objects, like almost everything else.
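As a small illustration of classes being ordinary objects, the sketch below stores a few built-in classes in a list and inspects them. The names `kinds` and `kind` are only illustrative.

```python
# Classes are objects: they can be stored in containers and passed around
kinds = [int, str, list]

for kind in kinds:
    # Every class is an instance of `type`
    assert isinstance(kind, type)
    # Calling a class object creates an instance of that class
    instance = kind()
    assert type(instance) is kind

# `type` is itself an instance of `type`
assert type(type) is type
```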

## Using types

Because types are themselves objects (instances of `type`), the types of any two Python objects can be compared, just like any other objects:

In [2]:
assert type("foo") == type("bar")  # noqa: E721

assert type("foo") is type("bar")

assert type("foo") is not type(5)

### Types and user-defined classes

One of the most common use cases related to types is to find whether a particular object is an instance of a class.

In [None]:
class A:
    ...

class B(A):
    ...

b = B()

# Print the type of the instance: <class '__main__.B'>
print(type(b))

# The same information is available in the
# __class__ attribute of the instance
print(b.__class__)

# Because it's a regular Python object you can assign it to a variable
# and pass it around
b_class = b.__class__

assert b_class == B

# You can get the name of the class using the __name__ attribute
print(b_class.__name__)

# You can explore the class hierarchy using `__bases__`
# That returns a tuple with all the base classes
print(b_class.__bases__)

<class '__main__.B'>
<class '__main__.B'>
B
(<class '__main__.A'>,)


With:
+ `__class__` to get the type/class of an instance
+ `__bases__` to get the base classes of a class
+ `__name__` to get the name of a class

you can do a full analysis of the class inheritance structure associated with any instance.
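A minimal sketch of such an analysis, using only the three attributes listed above. The helper name `describe_hierarchy` is hypothetical and not part of Python.

```python
def describe_hierarchy(obj):
    """Return the class names found by walking up from obj's class."""
    names = []
    pending = [obj.__class__]
    while pending:
        cls = pending.pop(0)
        if cls.__name__ not in names:
            names.append(cls.__name__)
        # __bases__ is a tuple with the direct base classes
        pending.extend(cls.__bases__)
    return names


class A:
    ...

class B(A):
    ...

print(describe_hierarchy(B()))
```

Every class ultimately derives from `object`, so the walk always ends there.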

However, Python provides better options for obtaining most of the information you typically need:
+ `isinstance(obj, cls)`: used to determine if the object passed is an instance of the given class (or of one of its subclasses).
+ `issubclass(cls1, cls2)`: used for classes, lets you determine if the given class is a subclass of another.


In [17]:
class C:
    ...

class D:
    ...

class E(D):
    ...

x = 12  # it's an int
c = C()
d = D()
e = E()

assert not isinstance(x, E)
assert not isinstance(c, E)
assert isinstance(e, E)
assert isinstance(e, D) # True: an instance of a subclass is also an instance of the base class
assert not isinstance(d, E) # but not the other way around
assert isinstance(x, type(5))
assert isinstance(x, int)

Note that `E` inherits from `D`. `isinstance(e, D)` returns True because all instances of `E` are also instances of `D`.

This becomes clearer when using `issubclass(cls1, cls2)`:

In [20]:
class C:
    ...

class D:
    ...

class E(D):
    ...

assert not issubclass(C, D)
assert issubclass(E, D)
assert not issubclass(D, E)
assert issubclass(D, D)

# if you don't have access to the class, you can rely on obj.__class__
# attribute
e = E()
assert issubclass(e.__class__, D)

Note that `issubclass(A, A)` returns True: every class is considered a subclass of itself. The first argument to `issubclass()` is the class being tested, and the second is the candidate base class.
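Both functions also accept a tuple of classes as the second argument, in which case the check succeeds if any class in the tuple matches. A short sketch reusing the same class layout as above:

```python
class C:
    ...

class D:
    ...

class E(D):
    ...

e = E()

# True if `e` is an instance of at least one class in the tuple
assert isinstance(e, (C, D))
assert not isinstance(e, (C, int))

# The same applies to issubclass()
assert issubclass(E, (C, D))
assert not issubclass(C, (D, E))
```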

### Exercise

Suppose that you want to make sure that object `x` is a list before you try appending to it. What code would you use? What would be the difference between using `type()` and `isinstance()`? Would this be the "Look Before You Leap" (LBYL) or the "Easier to Ask Forgiveness than Permission" (EAFP) approach? What other options might you have besides checking the type explicitly?

In [25]:
x = [1, 2, 3]
target_l = []

if type(x) is type([]):
    target_l.append(x)

print(target_l)

# Alternatively
y = [4, 5, 6]
if isinstance(y, target_l.__class__):
    target_l.append(y)
print(target_l)

# Alternatively
z = [7, 8]
if isinstance(z, list):
    target_l.append(z)
print(target_l)

# Alternatively
a = [9]
if type(a) is list:
    target_l.append(a)
print(target_l)



[[1, 2, 3]]
[[1, 2, 3], [4, 5, 6]]
[[1, 2, 3], [4, 5, 6], [7, 8]]
[[1, 2, 3], [4, 5, 6], [7, 8], [9]]


This is the "LBYL" approach, because we're checking before performing the action.
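To address the remaining questions of the exercise, here is a hedged sketch of how `type()` and `isinstance()` differ for subclasses, followed by an EAFP alternative. `MyList` and `try_append` are hypothetical names introduced only for this illustration.

```python
class MyList(list):
    """A hypothetical list subclass used only for this illustration."""


x = MyList([1, 2, 3])

# type() requires an exact match, so a subclass instance is rejected
assert type(x) is not list
# isinstance() also accepts instances of subclasses
assert isinstance(x, list)


# EAFP: try the operation and handle the failure instead of checking first
def try_append(target, item):
    try:
        target.append(item)
    except AttributeError:
        # `target` does not provide an append() method
        return False
    return True


assert try_append(x, 4)
assert not try_append("not a list", 4)
assert x == [1, 2, 3, 4]
```

The EAFP version never inspects the type at all: any object with a working `append()` method is accepted.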

## Duck Typing

Using `type(obj)`, `isinstance(obj, cls)`, and `issubclass(subcls, supercls)` makes it fairly easy for code to correctly determine an object's class or inheritance hierarchy.

Additionally, Python can work with *duck typing* (as in "if it walks like a duck and quacks like a duck, it probably is a duck").

The underlying idea is that Python decides whether an object can be used in a particular scenario not by checking its type (as happens in Java), but by checking whether the object provides the required interface.

For example, if an operation needs an iterator, the object doesn't need to be a subclass of any particular iterator class. As long as the object conforms to the expected iterator interface for yielding new objects, everything will be fine.
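As a sketch of the iterator case, the hypothetical `Countdown` class below inherits only from `object`, yet it works anywhere an iterator is expected because it implements `__iter__()` and `__next__()`.

```python
class Countdown:
    """Hypothetical iterator that does not inherit from any iterator class."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))
```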

Note that this kind of flexibility can sometimes allow bugs to slip by undetected. It is becoming more common nowadays to restrict duck typing in favor of explicit type checking through type hints that could be verified by a type checker.
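One way to combine duck typing with type hints is `typing.Protocol` (available since Python 3.8), which lets a type checker verify the interface rather than the inheritance chain. The following is a sketch under that assumption; `Quacker`, `Duck`, `Robot`, and `make_noise` are hypothetical names.

```python
from typing import Protocol


class Quacker(Protocol):
    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    def quack(self) -> str:
        return "Beep-quack"


def make_noise(thing: Quacker) -> str:
    # A type checker verifies that `thing` has a compatible quack() method;
    # nothing is checked at runtime.
    return thing.quack()


print(make_noise(Duck()))
print(make_noise(Robot()))
```

Neither `Duck` nor `Robot` inherits from `Quacker`; they are accepted because they provide the method the protocol describes.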

## Special/dunder/magic methods

A special/dunder/magic method is an attribute of a Python class with special meaning to Python.

It's defined as a method, but it isn't intended to be called directly by client code. Instead, Python itself calls it in response to a demand made on an object of that class.

Special method attributes are marked by double underscore characters (hence "dunder", short for "double underscore") and they are also sometimes called "magic" methods.

The simplest example is the `__str__` method which is intended to return a user-readable representation of an instance.

In [26]:
class Color:
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

    def __str__(self):
        # :d is for printing numbers in decimal format
        return f"Color: R={self.red:d}, G={self.green:d}, B={self.blue:d}"

c = Color(15, 35, 55)
print(c)

Color: R=15, G=35, B=55


You can see how `print()` invokes the `c.__str__()` method behind the scenes.

## Making an object behave like a list

This section explores how to use *duck typing* to make an object behave like a list.

Let's assume that you have a large text file containing records of people, with each record on its own line and its fields separated by `::`:

```
John Smith::37::Springfield, Massachusetts, USA
Ellen Nelle::25::Springfield, Connecticut, USA
Dale McGladdery::29::Springfield, Hawaii, USA
```

The goal is to treat that text file as a list of lines, but without reading the entire text file into memory at once:

In [1]:
# This works, but a very large file may exhaust the available memory (OOM)
from pathlib import Path

with Path("./people.txt").open("r") as f:
    lines = f.readlines()

lines[4]

'Ellen Nelle::25::Springfield, Connecticut, USA\n'

The first step is to introduce the `__getitem__()` special method in a custom class.

This will enable the instances of that class to respond to list access syntax and semantics (`obj[n]` and `for x in obj`):

In [None]:
from pathlib import Path


class PeopleLineReader:
    def __init__(self, filename):
        self.file = Path(filename).open("r")  # noqa: SIM115

    def __getitem__(self, index):
        # we ignore the index
        line = self.file.readline()
        if line == "":
            self.file.close()
            raise IndexError
        # return only name and age
        return line.split("::")[:2]

for name, age in PeopleLineReader("people.txt"):
    print(f"{name=}, {age=}")

people = PeopleLineReader("people.txt")
people[3] # We're not correctly using the index

name='Jason Isaacs', age='61'
name='Mahersala Ali', age='50'
name='Zendaya', age='28'
name='John Smith', age='37'
name='Ellen Nelle', age='25'
name='Dale McGladdery', age='29'
name='Florence Pugh', age='28'
name='Margot Robbie', age='34'
name='Riz Ahmed', age='42'


['Jason Isaacs', '61']

Note that the example above is only intended to illustrate the `__getitem__()` special method and what it provides.
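For comparison, here is a hedged sketch of a variant that does honor the index while still avoiding loading the whole file. `IndexedLineReader` is a hypothetical name, and the sample file is created in a temporary directory only to keep the sketch self-contained.

```python
import tempfile
from itertools import islice
from pathlib import Path


class IndexedLineReader:
    """Hypothetical variant that honors the index, reading lazily."""

    def __init__(self, filename):
        self.path = Path(filename)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported")
        with self.path.open("r") as f:
            # islice skips `index` lines without loading the whole file
            for line in islice(f, index, index + 1):
                # return only name and age
                return line.rstrip("\n").split("::")[:2]
        # The file has fewer than index + 1 lines
        raise IndexError(index)


# Build a small sample file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "people.txt"
    sample.write_text(
        "John Smith::37::Springfield, Massachusetts, USA\n"
        "Ellen Nelle::25::Springfield, Connecticut, USA\n"
        "Dale McGladdery::29::Springfield, Hawaii, USA\n"
    )
    people = IndexedLineReader(sample)
    second = people[1]
    names = [name for name, age in people]

print(second)
print(names)
```

The trade-off is that every access reopens the file and scans from the top, so this sketch favors low memory use over speed.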

Let's look at another more comprehensive example, in which we define a `TypedList` object that is a list that can only contain elements of a given type.

In [11]:
class TypedList:
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        if initial_list is None:
            # Default to an empty list so the loop and the copy below work
            initial_list = []
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must be a list")
        for element in initial_list:
            if not isinstance(element, self.type):
                raise TypeError("Attempted to add an element of incorrect type to the list in initialization.")
        self.elements = initial_list[:]

    def __check_type(self, element):
        if type(element) is not self.type:
            raise TypeError("Attempted to add an element of incorrect type to the list.")

    def __setitem__(self, i, element):
        self.__check_type(element)
        self.elements[i] = element

    def __getitem__(self, i):
        return self.elements[i]

    def __str__(self):
        return f"TypedList: {self.elements}"


# typed list of strings initialized with 5 empty strings
# (see list initialization techniques)
print(5 * [""])
print(3 * ["ab"])
print(2 * ["abc"])

x = TypedList("", 5 * [""])
print(x)

x[2] = "Hello"
x[3] = "to"
x[4] = "Jason Isaacs"

print(f"{x[2]}-{x[3]}-{x[4]}")

a, b, c, d, e = x
print(f"{a=}, {b=}, {c=}, {d=}, {e=}")

['', '', '', '', '']
['ab', 'ab', 'ab']
['abc', 'abc']
TypedList: ['', '', '', '', '']
Hello-to-Jason Isaacs
a='', b='', c='Hello', d='to', e='Jason Isaacs'


This can be enhanced with:

+ `__len__` to make the object respond to `len(x)`
+ `__delitem__` to make it respond to `del x[i]`
+ `__add__` to allow list concatenation as in `x + y`
+ `__mul__` (and `__rmul__`) to allow for list initialization as in `5 * x`

Additionally, we could define an `append(elem)` method.

In [22]:
class TypedList:
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        if initial_list:
            if not isinstance(initial_list, list):
                raise TypeError("Second argument of TypedList must be a list")
            for element in initial_list:
                if not isinstance(element, self.type):
                    raise TypeError("Attempted to add an element of incorrect type to the list in initialization.")
            self.elements = initial_list[:]
        else:
            self.elements = []

    def __check_type(self, element):
        if type(element) is not self.type:
            raise TypeError("Attempted to add an element of incorrect type to the list.")

    def __setitem__(self, i, element):
        self.__check_type(element)
        self.elements[i] = element

    def __getitem__(self, i):
        return self.elements[i]

    def __len__(self):
        return len(self.elements)

    def append(self, element):
        self.__check_type(element)
        self.elements.append(element)

    def __delitem__(self, i):
        del self.elements[i]

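    def __add__(self, typed_list):
        if self.type is not typed_list.type:
            raise TypeError("Attempted to concatenate lists of different types.")
        # The first element doubles as the type example, so this raises
        # IndexError when this list is empty
        return TypedList(self.elements[0], self.elements + typed_list.elements)

    def __rmul__(self, num):
        # Same caveat as __add__: requires at least one element
        return TypedList(self.elements[0], num * self.elements)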

    def __mul__(self, num):
        return self.__rmul__(num)

    def __str__(self):
        return f"TypedList: {self.elements}"


x = TypedList("example")
assert len(x) == 0

x.append("one")
assert len(x) == 1
assert x[0] == "one"

del x[0]
assert len(x) == 0

# list concatenation: first with real lists
a = ["one", "two", "three"]
b = ["one", "two"]
print(a + b)


# now with TypedLists
x.append("one")
x.append("two")
x.append("three")

y = TypedList("example", ["uno", "dos"])
z = x + y
print(z)

# mul
x = TypedList(0, [123])
y = 5 * x
print(y)

x = TypedList(0, [321])
y = x * 5
print(y)



['one', 'two', 'three', 'one', 'two']
TypedList: ['one', 'two', 'three', 'uno', 'dos']
TypedList: [123, 123, 123, 123, 123]
TypedList: [321, 321, 321, 321, 321]


## Subclassing from built-in types

Instead of creating a class for a typed list as done in the previous examples, it is possible to subclass the list type and override only the methods that need to be different.

By doing so, you will inherit all of the list operations out of the box.


In [30]:
class TypedList(list):
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        if initial_list:
            if not isinstance(initial_list, list):
                raise TypeError("Second argument of TypedList must be a list")
            for element in initial_list:
                if not isinstance(element, self.type):
                    raise TypeError("Attempted to add an element of incorrect type to the list in initialization.")
        super().__init__(initial_list if initial_list else [])

    def __setitem__(self, i, element):
        if type(element) is not self.type:
            raise TypeError("Attempted to add an element of incorrect type")
        super().__setitem__(i, element)

In [34]:
# testing the implementation
x = TypedList("")
assert len(x) == 0

x.append("one")
print(x)
assert x[0] == "one"

x[0] = "uno"
assert x[0] == "uno"

z = 3 * x
print(z)

['one']
['uno', 'uno', 'uno']
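One caveat worth checking with this approach: methods inherited from `list`, such as `append()`, do not go through `__setitem__()`, so they skip the type check. The sketch below uses a condensed copy of the class above (initial-list validation omitted for brevity) to show this.

```python
class TypedList(list):
    # Condensed copy of the list subclass defined above
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        super().__init__(initial_list if initial_list else [])

    def __setitem__(self, i, element):
        if type(element) is not self.type:
            raise TypeError("Attempted to add an element of incorrect type")
        super().__setitem__(i, element)


x = TypedList("")
x.append(42)  # not rejected: list.append() does not call __setitem__
assert x == [42]

try:
    x[0] = 43  # rejected: item assignment goes through __setitem__
except TypeError as exc:
    print(f"TypeError: {exc}")

# The result of multiplication is a plain list, not a TypedList
assert type(3 * x) is list
```

A fully type-safe subclass would also need to override `append()`, `extend()`, `insert()`, and the other mutating methods.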


### Subclassing `UserList`

An alternative approach to the previous one is subclassing `UserList`, a wrapper class found in the `collections` module.

This class exposes the underlying list as the `data` attribute, which simplifies how we interact with the underlying list:

In [35]:
from collections import UserList


class TypedList(UserList):
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        if initial_list:
            if not isinstance(initial_list, list):
                raise TypeError("Second argument of TypedList must be a list")
            for element in initial_list:
                if not isinstance(element, self.type):
                    raise TypeError("Attempted to add an element of incorrect type to the list in initialization.")
        super().__init__(initial_list if initial_list else [])

    def __setitem__(self, i, element):
        if type(element) is not self.type:
            raise TypeError("Attempted to add an element of incorrect type")
        self.data[i] = element

# testing the implementation
x = TypedList("")
assert len(x) == 0

x.append("one")
print(x)
assert x[0] == "one"

x[0] = "uno"
assert x[0] == "uno"

z = 3 * x
print(z)

['one']
[]


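Note that multiplication does not work out of the box with this `UserList` subclass. `UserList` builds the result by calling `self.__class__(self.data * n)`, so the multiplied list is passed to our constructor as `example_element` and the new object ends up empty.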
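One possible way around this, sketched here as an assumption rather than the only option, is to override `__mul__()` and `__rmul__()` so the result is built from a shallow copy instead of going through the constructor. The class is a condensed copy of the one above.

```python
import copy
from collections import UserList


class TypedList(UserList):
    # Condensed copy of the UserList subclass defined above
    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        super().__init__(initial_list if initial_list else [])

    def __mul__(self, n):
        # Copy the instance (keeps `type`, bypasses __init__), then
        # replace the underlying list with the repeated one
        result = copy.copy(self)
        result.data = self.data * n
        return result

    # `3 * x` falls back to __rmul__, which behaves the same way
    __rmul__ = __mul__


x = TypedList("", ["uno"])
z = 3 * x
print(z)
```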

## When to use special method attributes

You should be cautious with the use of special method attributes. What might be natural for you as the class designer might feel complicated for the class consumers.

As a rule of thumb, use the special methods:
+ If you have a class that behaves like a Python built-in type, for example, sequence-like objects or numeric (math-like) objects.
+ If you have a class that behaves identically or almost identically to a built-in class (for example, when implementing lists with an optimized underlying implementation).

This caution doesn't apply to `__str__` and `__repr__`, which you should always implement in your classes.
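As a sketch of how the two methods differ, the `Color` class from earlier can be extended with a `__repr__()` method. The exact format of the strings is a design choice, not a requirement.

```python
class Color:
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

    def __str__(self):
        # Readable text for end users; used by print() and str()
        return f"Color: R={self.red:d}, G={self.green:d}, B={self.blue:d}"

    def __repr__(self):
        # Unambiguous text for developers; used by repr() and the REPL
        return f"Color({self.red!r}, {self.green!r}, {self.blue!r})"


c = Color(15, 35, 55)
print(str(c))
print(repr(c))
print([c])  # containers display their elements using repr()
```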

### Exercise

Create a dictionary that only allows strings for both keys and values by subclassing the `dict` type.

In [43]:
class StringDict(dict):
    def __init__(self):
        super().__init__()

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("key must be a string")
        if not isinstance(value, str):
            raise TypeError("value must be a string")
        return super().__setitem__(key, value)

str_dict = StringDict()
str_dict["one"] = "uno"
str_dict["two"] = "dos"

assert len(str_dict) == 2
assert str_dict["one"] == "uno"
assert str_dict["two"] == "dos"

for k, v in str_dict.items():
    print(f"{v!r} is Spanish for {k!r}")

print("Counting in English")
for k in str_dict:
    print(k)

print("Counting in Spanish")
for v in str_dict.values():
    print(v)

'uno' is Spanish for 'one'
'dos' is Spanish for 'two'
Counting in English
one
two
Counting in Spanish
uno
dos


That works well when the string dictionary is initialized as empty, but we'd like to have the same flexibility we have with `dict`: