# Pythonic Code

In programming, an idiom is a particular way of writing code in order to perform a specific task. It is something common that repeats and follows the same structure every time. Some could even argue that idioms are patterns, but be careful: they are not design patterns (which we will explore later on). The main difference is that design patterns are high-level ideas, (mostly) independent of the language, that do not translate into code immediately. Idioms, on the other hand, are actually coded. They are the way things should be written when we want to perform a particular task.

As idioms are code, they are language dependent. Every language has its own idioms, that is, its own way of doing things (for example, how you would open and write a file in C, C++, and so on). When code follows these idioms, it is known as being idiomatic, which in Python is often referred to as Pythonic.

There are multiple reasons to follow these recommendations and write Pythonic code. First (as we will see and analyze), writing code in an idiomatic way often performs better. It is also more compact and easier to understand, which are traits we always want in our code. Secondly, as introduced in the previous chapter, it is important that the entire development team gets used to the same patterns and code structure, because this helps them focus on the true essence of the problem and avoid making mistakes.

## Indexes and Slices - Creating your own sequences

**\_\_getitem\_\_** is the method that is called when an expression like **myobject[key]** is evaluated, passing the key (the value inside the square brackets) as a parameter. A sequence, in particular, is an object that implements both **\_\_getitem\_\_** and **\_\_len\_\_**, and for this reason, it can be iterated over. Lists, tuples, and strings are examples of sequence objects in the standard library.
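To see what the key actually is, the following sketch uses a hypothetical `KeyInspector` class whose `__getitem__` just returns whatever Python passes in. Plain indexing passes the value as-is, while slice syntax passes a built-in `slice` object, which is what a custom sequence has to handle.

```python
class KeyInspector:
    """Hypothetical helper that returns the key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


inspector = KeyInspector()
print(inspector[3])        # 3
print(inspector[1:4])      # slice(1, 4, None)
print(inspector[::2])      # slice(None, None, 2)
print(inspector["name"])   # name
```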

In the case that your class is a wrapper around a standard library object, you might as well delegate the behavior as much as possible to the underlying object. This means that if your class is actually a wrapper around a list, you should call all of the same methods on that list to make sure that it remains compatible. In the following listing, we can see an example of how an object wraps a list, and for the methods we are interested in, we just delegate to the corresponding version on the list object:

In [1]:
class Items:
    def __init__(self, *values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        return self._values.__getitem__(item)

If, however, you are implementing your own sequence that is not a wrapper and does not rely on any built-in object underneath, keep in mind the following points:
  - When indexing by a range (a slice), the result should be an instance of the same type as the class
  - In the range provided by the slice, respect the semantics that Python uses, excluding the element at the end
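Both points can be illustrated with a hypothetical `Interval` class (not from the original text) that represents consecutive integers without storing them in a list. This is a minimal sketch: slices return a new `Interval` rather than a list, the end of the range is excluded, and only a step of 1 is supported to keep it short.

```python
class Interval:
    """Hypothetical sequence of consecutive integers, not backed by a list.

    Uses Python's half-open convention: Interval(2, 5) holds 2, 3 and 4.
    """

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __len__(self):
        return max(0, self.stop - self.start)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() normalises negative and missing bounds for our length
            begin, end, step = index.indices(len(self))
            if step != 1:
                raise ValueError("only a step of 1 is supported in this sketch")
            # Return an instance of the same class, not a list
            return Interval(self.start + begin, self.start + max(begin, end))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("Interval index out of range")
        return self.start + index

    def __repr__(self):
        return f"Interval({self.start}, {self.stop})"


r = Interval(10, 20)
print(r[2:5])          # Interval(12, 15)
print(list(r[2:5]))    # [12, 13, 14]
print(r[-1])           # 19
```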

## Context Managers

Context managers are a distinctively useful feature that Python provides. They are so useful because they correctly respond to a pattern: every situation where we want to run some code that has preconditions and postconditions, meaning that we want to run things before and after a certain main action.

Most of the time, we see context managers around resource management. For example, when we open files, we want to make sure that they are closed after processing (so we do not leak file descriptors); if we open a connection to a service (or even a socket), we also want to be sure to close it accordingly; the same applies when removing temporary files, and so on.

 

In all of these cases, you would normally have to remember to free all of the resources that were allocated, and that is just thinking about the best case. What about exceptions and error handling? Handling every possible combination and execution path of our program makes it harder to debug, so the most common way of addressing this issue is to put the cleanup code in a finally block, so that we are sure we do not miss it. For example, a very simple case would look like the following:

In [9]:
filename = "foo.txt"
def process_file(fd):
    pass

fd = open(filename)
try:
    process_file(fd)
finally:
    fd.close()

Nonetheless, there is a much more elegant and Pythonic way of achieving the same thing:

In [10]:
with open(filename) as fd:
    process_file(fd)

The with statement (PEP-343) enters the context manager. In this case, the open function implements the context manager protocol, which means that the file will be automatically closed when the block is finished, even if an exception occurred.

Context managers consist of two magic methods: **\_\_enter\_\_** and **\_\_exit\_\_**. When the with statement is executed, it calls the first method, **\_\_enter\_\_**, and whatever this method returns is assigned to the variable named after the as keyword. This is optional: we don't really need to return anything specific from the **\_\_enter\_\_** method, and even if we do, there is still no strict reason to assign it to a variable if it is not required.

After this line is executed, the code enters a new context, where any other Python code can be run. After the last statement of that block is finished, the context is exited, meaning that Python calls the **\_\_exit\_\_** method of the original context manager object we first invoked.
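The sequence of calls can be made visible with a hypothetical `Tracer` class that only logs each step. The second half spells out by hand roughly what the with statement does; it is a simplified sketch of the semantics described in PEP 343, not the exact expansion the interpreter performs.

```python
class Tracer:
    """Hypothetical context manager that records the order of calls."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self  # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False  # do not suppress exceptions


# Using the with statement
with Tracer() as tracer:
    tracer.log.append("body")
print(tracer.log)  # ['enter', 'body', 'exit']

# Roughly what the with statement does behind the scenes (simplified)
manager = Tracer()
value = manager.__enter__()
try:
    value.log.append("body")
except BaseException as exc:
    # On error, __exit__ receives the exception; re-raise unless it returns True
    if not manager.__exit__(type(exc), exc, exc.__traceback__):
        raise
else:
    manager.__exit__(None, None, None)
print(manager.log)  # ['enter', 'body', 'exit']
```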

If there is an exception or error inside the context manager block, the **\_\_exit\_\_** method is still called, which makes it convenient for safely handling cleanup. In fact, this method receives the exception that was raised in the block, in case we want to handle it in a custom fashion.
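As a quick check of this behavior, the following sketch uses a hypothetical `ErrorReporter` class that stores the exception details it receives in `__exit__`. Because it returns False, the exception still reaches the caller; when the block finishes normally, the arguments are all None.

```python
class ErrorReporter:
    """Hypothetical context manager that records the exception it receives."""

    def __init__(self):
        self.received = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # All three arguments are None when the block finishes normally
        self.received = (exc_type, exc_value)
        return False  # let any exception propagate


reporter = ErrorReporter()
try:
    with reporter:
        raise KeyError("missing")
except KeyError:
    print("the exception still reached the caller")
print(reporter.received[0])  # <class 'KeyError'>

clean = ErrorReporter()
with clean:
    pass
print(clean.received)  # (None, None)
```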

Despite the fact that context managers are very often found when dealing with resources (like the example we mentioned with files, connections, and so on), this is not the sole application they have. We can implement our own context managers in order to handle the particular logic we need.

 

Context managers are a good way of separating concerns and isolating parts of the code that should be kept independent, because if we mix them, then the logic will become harder to maintain.

As an example, consider a situation where we want to run a backup of our database with a script. The caveat is that the backup is offline, which means that we can only do it while the database is not running, and for this we have to stop it. After running the backup, we want to make sure that we start the process again, regardless of how the backup itself went. Now, the first approach would be to create a huge monolithic function that tries to do everything in the same place: stop the service, perform the backup task, handle exceptions and all possible edge cases, and then try to restart the service again. You can imagine such a function, so I will spare you the details and instead come up directly with a possible way of tackling this issue with context managers:

In [13]:
def run(cmd):
    print(cmd)

def stop_database():
    run("systemctl stop postgresql.service")


def start_database():
    run("systemctl start postgresql.service")


class DBHandler:
    def __enter__(self):
        stop_database()
        return self

    def __exit__(self, exc_type, ex_value, ex_traceback):
        start_database()


def db_backup():
    run("pg_dump database")


def main():
    with DBHandler():
        db_backup()

In this example, we don't need the result of the context manager inside the block, and that's why we can consider that, at least for this particular case, the return value of **\_\_enter\_\_** is irrelevant. This is something to take into consideration when designing context managers: what do we need once the block is started? As a general rule, it is good practice (although not mandatory) to always return something from **\_\_enter\_\_**.

 

In this block, we only run the backup task, independently of the maintenance tasks, as we saw previously. We also mentioned that even if the backup task raises an error, **\_\_exit\_\_** will still be called.

Notice the signature of the **\_\_exit\_\_** method. It receives the values for the exception that was raised in the block. If there was no exception in the block, they are all None.

The return value of **\_\_exit\_\_** is something to consider. Normally, we would want to leave the method as it is, without returning anything in particular. If this method returns True, any exception raised in the block will not propagate to the caller and will stop there. Sometimes this is the desired effect, maybe even depending on the type of exception that was raised, but in general it is not a good idea to swallow the exception. Remember: errors should never pass silently.
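For the cases where suppression depends on the exception type, the following sketch uses a hypothetical `IgnoreMissingKey` class whose `__exit__` returns True only for `KeyError`. Any other exception still propagates, which keeps the "errors should never pass silently" rule for everything we did not explicitly decide to ignore.

```python
class IgnoreMissingKey:
    """Hypothetical context manager that swallows KeyError and nothing else."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning True suppresses the exception; returning False lets it propagate
        return exc_type is not None and issubclass(exc_type, KeyError)


config = {}
with IgnoreMissingKey():
    config["timeout"]          # raises KeyError, which __exit__ swallows
print("execution continues after the block")

propagated = False
try:
    with IgnoreMissingKey():
        int("not a number")    # raises ValueError, which is not suppressed
except ValueError:
    propagated = True
print(propagated)  # True
```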

## Implementing context managers

In general, we can implement context managers like the one in the previous example. All we need is just a class that implements the **\_\_enter\_\_** and **\_\_exit\_\_** magic methods, and then that object will be able to support the context manager protocol. While this is the most common way for context managers to be implemented, it is not the only one.

In this section, we will see not only different (sometimes more compact) ways of implementing context managers but also how to take full advantage of them by using the standard library, in particular with the contextlib module.

The contextlib module contains a lot of helper functions and objects to either implement context managers or use some already provided ones that can help us write more compact code.

Let's start by looking at the contextmanager decorator.

When the contextlib.contextmanager decorator is applied to a function, it converts the code of that function into a context manager. The function in question has to be a particular kind of function called a generator function, whose statements will be split into what would go in the **\_\_enter\_\_** and **\_\_exit\_\_** magic methods, respectively.

 

If at this point you are not familiar with decorators and generators, this is not a problem because the examples we will be looking at will be self-contained, and the recipe or idiom can be applied and understood regardless. These topics are discussed in detail in Chapter 7, Using Generators.

The equivalent code of the previous example can be rewritten with the contextmanager decorator like this:

In [14]:
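import contextlib

@contextlib.contextmanager
def db_handler():
    stop_database()          # runs as part of __enter__
    try:
        yield                # the body of the with block runs here
    finally:
        start_database()     # runs as part of __exit__, even if the block fails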


with db_handler():
    db_backup()

systemctl stop postgresql.service
pg_dump database
systemctl start postgresql.service


Here, we define the generator function and apply the **@contextlib.contextmanager** decorator to it. The function contains a yield statement, which makes it a generator function. Again, details on generators are not relevant in this case. All we need to know is that when this decorator is applied, everything before the yield statement will be run as if it were part of the **\_\_enter\_\_** method. Then, the yielded value is going to be the result of the context manager evaluation (what **\_\_enter\_\_** would return), and what would be assigned to the variable if we chose to assign it with as **x:**. In this case, nothing is yielded (which means the yielded value will be None, implicitly), but if we wanted to, we could yield a value that we might want to use inside the context manager block.

At that point, the generator function is suspended, and the context manager is entered, where, again, we run the backup code for our database. After this completes, the execution resumes, so we can consider that every line that comes after the yield statement will be part of the **\_\_exit\_\_** logic.
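Both parts of this split, plus a yielded value, can be seen in the following sketch. `temporary_value` is a hypothetical helper (not part of the original example) that overrides a key in a dictionary only while the block runs. The yielded value is what `as` binds, and the try/finally guarantees that the restore logic after the yield runs even if the block raises.

```python
import contextlib

_MISSING = object()  # sentinel: distinguishes "no key" from a stored None


@contextlib.contextmanager
def temporary_value(settings, key, value):
    """Hypothetical helper: override settings[key] only inside the with block."""
    previous = settings.get(key, _MISSING)  # this part behaves like __enter__
    settings[key] = value
    try:
        yield value  # the yielded value is bound to the name after "as"
    finally:
        # This part behaves like __exit__ and runs even if the block raises
        if previous is _MISSING:
            del settings[key]
        else:
            settings[key] = previous


settings = {"timeout": 60}
with temporary_value(settings, "timeout", 5) as timeout:
    print(timeout)  # 5
print(settings)  # {'timeout': 60}

try:
    with temporary_value(settings, "retries", 3):
        raise RuntimeError("backup failed")
except RuntimeError:
    pass
print(settings)  # {'timeout': 60}
```

Without the try/finally, an exception in the block would be re-raised at the yield and the restore code would never run.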

Writing context managers like this has the advantage that it is easier to refactor existing functions and reuse code, and in general it is a good idea when we need a context manager that doesn't belong to any particular object. Adding the extra magic methods would make another object of our domain more coupled, with more responsibilities, and supporting something that it probably shouldn't. When we just need a context manager function, without keeping much state, and completely isolated and independent from the rest of our classes, this is probably a good way to go.

 

There are, however, more ways in which we can implement context managers, and once again, the answer is in the contextlib module from the standard library.

Another helper we could use is **contextlib.ContextDecorator**. This is a mixin base class that provides the logic for applying a decorator to a function that will make it run inside the context manager, while the logic for the context manager itself has to be provided by implementing the aforementioned magic methods.

In order to use it, we have to extend this class and implement the logic on the required methods:


In [15]:
class dbhandler_decorator(contextlib.ContextDecorator):
    def __enter__(self):
        stop_database()

    def __exit__(self, exc_type, ex_value, ex_traceback):
        start_database()


@dbhandler_decorator()
def offline_backup():
    run("pg_dump database")

Do you notice something different from the previous examples? There is no with statement. We just have to call the function, and **offline_backup()** will automatically run inside a context manager. This is the logic that the base class provides to use it as a decorator that wraps the original function so that it runs inside a context manager.

The only downside of this approach is that, by the way the objects work, they are completely independent (which is a good trait): the decorator doesn't know anything about the function that it is decorating, and vice versa. This, however good, means that you cannot get an object that you would like to use inside the context manager (for example, assigning with offline_backup() as bp:), so if you really need to use the object returned by the **\_\_enter\_\_** method, one of the previous approaches will have to be the one of choice.

Being a decorator, this also has the advantage that the logic is defined only once, and we can reuse it as many times as we want by simply applying the decorator to other functions that require the same invariant logic.
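A small sketch of that reuse follows. `traced` is a hypothetical `ContextDecorator` subclass that only logs when it starts and stops; it is applied to two hypothetical backup functions, and because it still implements the context manager protocol, the same class also works in an ordinary with statement.

```python
import contextlib


class traced(contextlib.ContextDecorator):
    """Hypothetical context manager and decorator that logs start and stop."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("start")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("stop")
        return False


calls = []


@traced(calls)
def backup_users():
    calls.append("backup users")


@traced(calls)
def backup_orders():
    calls.append("backup orders")


backup_users()
backup_orders()

# The same class can still be used explicitly in a with statement
with traced(calls):
    calls.append("manual step")

print(calls)
```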

Let's explore one last feature of contextlib, to see what we can expect from context managers and get an idea of the sort of thing we could use them for.

Note that **contextlib.suppress** is a utility that enters a context manager which, if one of the provided exceptions is raised, doesn't fail. It's similar to running that same code in a try/except block and passing on (or logging) the exception, but the difference is that calling suppress makes it more explicit that those exceptions are controlled as part of our logic.

For example, consider the following code:

In [17]:
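import contextlib

class DataConversionException(Exception):  # illustrative exception type
    pass

def parse_data(data):  # illustrative parser that always fails
    raise DataConversionException(f"cannot convert {data!r}")

with contextlib.suppress(DataConversionException):
    parse_data({"invalid": "payload"})  # the exception is silenced here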
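For comparison, the following self-contained sketch writes the same intent first with an explicit try/except and then with `suppress`. `DataConversionException` and `parse_data` are illustrative stand-ins, not part of any library. The last part shows that `suppress` only silences the exception types it was given.

```python
import contextlib


class DataConversionException(Exception):
    """Illustrative exception type (an assumption, not from any library)."""


def parse_data(data):
    # Illustrative parser that always fails, to exercise the error path
    raise DataConversionException(f"cannot convert {data!r}")


# Explicit try/except that deliberately ignores the error
try:
    parse_data({"invalid": "payload"})
except DataConversionException:
    pass

# Same behaviour, with the intent stated by suppress()
with contextlib.suppress(DataConversionException):
    parse_data({"invalid": "payload"})

# suppress() only silences the exception types it was given
unrelated_propagated = False
try:
    with contextlib.suppress(DataConversionException):
        raise TypeError("unrelated problem")
except TypeError:
    unrelated_propagated = True
print(unrelated_propagated)  # True
```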

## Properties, attributes, and different types of methods for objects

All of the properties and functions of an object are public in Python, which is different from other languages where properties can be public, private, or protected. That is, there is no way to prevent caller objects from accessing any of the attributes an object has.

There is no strict enforcement, but there are some conventions. An attribute that starts with an underscore is meant to be private to that object, and we expect that no external agent calls it (but again, there is nothing preventing this).

Before jumping into the details of properties, it's worth mentioning some traits of underscores in Python, understanding the convention, and the scope of attributes.

## Underscores in Python

There are some conventions and implementation details that make use of underscores in Python, which is an interesting topic that's worthy of analysis.
 

Like we mentioned previously, by default all attributes of an object are public. Consider the following example to illustrate this:

In [19]:
class Connector:
    def __init__(self, source):
        self.source = source
        self._timeout = 60
 
conn = Connector("postgresql://localhost")
conn.source     # 'postgresql://localhost'
conn._timeout   # 60
conn.__dict__   # {'source': 'postgresql://localhost', '_timeout': 60}

Here, a **Connector** object is created with **source**, and it starts with two attributes: the aforementioned **source** and **\_timeout**. The former is public, and the latter private. However, as we can see from the preceding lines, when we create an object like this, we can actually access both of them.

The interpretation of this code is that **\_timeout** should be accessed only within the connector itself and never from a caller. This means that you should organize the code so that you can safely refactor the timeout wherever it's needed, relying on the fact that it's not being accessed from outside the object (only internally), hence preserving the same interface as before. Complying with these rules makes the code easier to maintain and more robust because we don't have to worry about ripple effects when refactoring the code if we maintain the interface of the object. The same principle applies to methods as well.

This is the Pythonic way of clearly delimiting the interface of an object. There is, however, a common misconception that some attributes and methods can actually be made private. Let's imagine that now the timeout attribute is defined with a double underscore instead:

In [23]:
class Connector:
    def __init__(self, source):
        self.source = source
        self.__timeout = 60
    def connect(self):
        print("connecting with {0}s".format(self.__timeout))
        # ...

conn = Connector("postgresql://localhost")
conn.connect()
conn.__timeout

connecting with 60s


AttributeError: 'Connector' object has no attribute '__timeout'

Some developers use this method to hide some attributes, thinking, like in this example, that timeout is now private and that no other object can modify it. Now, take a look at the exception that is raised when trying to access **\_\_timeout**. It's **AttributeError**, saying that the attribute doesn't exist. It doesn't say something like **"this is private"** or **"this can't be accessed"**; it says it does not exist. This should give us a clue that, in fact, something different is happening and that this behavior is just a side effect, not the real effect we want.

What's actually happening is that with the double underscores, Python creates a **different name** for the attribute (this is called **name mangling**). What it does is create the attribute with the following name instead: `"_<class-name>__<attribute-name>"`. In this case, an attribute named **'\_Connector\_\_timeout'** will be created, and such an attribute can be accessed (and modified) as follows:



In [24]:
vars(conn)
conn._Connector__timeout
conn._Connector__timeout = 30
conn.connect()

connecting with 30s


## Properties

When the object needs to just hold values, we can use regular attributes. Sometimes, we might want to do some computations based on the state of the object and the values of other attributes. Most of the time, properties are a good choice for this.
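As a minimal sketch of that kind of computation (the `Rectangle` class is hypothetical, not from the original text), a property can derive its value from other attributes every time it is read, so it always reflects the current state. Since no setter is defined, it is also read-only.

```python
class Rectangle:
    """Hypothetical example: area is derived from the other attributes."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Computed on every access, so it always reflects the current state
        return self.width * self.height


rect = Rectangle(3, 4)
print(rect.area)   # 12
rect.width = 5
print(rect.area)   # 20

try:
    rect.area = 100  # no setter defined, so the property is read-only
except AttributeError:
    print("area cannot be assigned directly")
```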

Properties are to be used when we need to define access control to some attributes in an object, which is another point where Python has its own way of doing things. In other programming languages (like Java), you would create access methods (getters and setters), but idiomatic Python would use properties instead.

Imagine that we have an application where users can register and we want to protect certain information about the user from being incorrect, such as their email, as shown in the following code:

In [26]:
import re

EMAIL_FORMAT = re.compile(r"[^@]+@[^@]+\.[^@]+")


def is_valid_email(potentially_valid_email: str):
    return re.match(EMAIL_FORMAT, potentially_valid_email) is not None


class User:
    def __init__(self, username):
        self.username = username
        self._email = None

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, new_email):
        if not is_valid_email(new_email):
            raise ValueError(f"Can't set {new_email} as it's not a valid email")
        self._email = new_email

By putting email under a property, we obtain some advantages for free. In this example, the first **@property** method will return the value held by the private attribute **\_email**. As mentioned earlier, the leading underscore determines that this attribute is intended to be used as private, and therefore should not be accessed from outside this class.

Then, the second method uses `@email.setter`, with the already defined property of the previous method. This is the one that is going to be called when `<user>.email = <new_email>` runs from the caller code, and `<new_email>` will become the parameter of this method. Here, we explicitly defined a validation that will fail if the value being set is not an actual email address. If it is valid, the attribute is updated with the new value, as follows:

In [28]:
u1 = User("jsmith")
u1.email = "jsmith@"

ValueError: Can't set jsmith@ as it's not a valid email

In [29]:
u1.email = "jsmith@g.co"
u1.email

'jsmith@g.co'

This approach is much more compact than having custom methods prefixed with **get\_** or **set\_**. It's clear what is expected because the attribute is simply called email.

You might find that properties are a good way to achieve command and query separation (CC08). Command and query separation states that a method of an object should either answer something or do something, but not both. If a method of an object is doing something and at the same time it returns a status answering a question of how that operation went, then it's doing more than one thing, clearly violating the principle that functions should do one thing, and one thing only.

Depending on the name of the method, this can create even more confusion, making it harder for readers to understand what the actual intention of the code is. For example, if a method is called `set_email`, and we use it as `if self.set_email("a@j.com"): ...`, what is that code doing? Is it setting the email to `a@j.com`? Is it checking whether the email is already set to that value? Both (setting and then checking if the status is correct)?

 

With properties, we can avoid this kind of confusion. The **@property** decorator is the query that will answer to something, and the **@<property_name>.setter** is the command that will do something.

Another piece of good advice derived from this example is as follows: don't do more than one thing in a method. If you want to assign something and then check the value, break that down into two or more statements.
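Applied to the ambiguous `set_email` call above, the separation might look like the following sketch. `Account` is a hypothetical class; the setter is the command, the getter is the query, and the caller performs them as two separate statements instead of one call that both changes state and reports on it.

```python
class Account:
    """Hypothetical class separating a command from a query."""

    def __init__(self):
        self._email = None

    @property
    def email(self):
        # Query: answers a question and changes nothing
        return self._email

    @email.setter
    def email(self, new_email):
        # Command: changes state and returns nothing
        self._email = new_email


account = Account()
account.email = "a@j.com"          # command: do something
if account.email == "a@j.com":     # query: ask, in a separate statement
    print("email updated")
```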

## Iterable objects

In Python, we have objects that can be iterated by default. For example, lists, tuples, sets, and dictionaries can not only hold data in the structure we want but also be iterated over in a for loop to get those values repeatedly.

However, the built-in iterable objects are not the only kind that we can have in a for loop. We could also create our own iterable, with the logic we define for iteration.

In order to achieve this, we rely on, once again, magic methods.

Iteration works in Python by its own protocol (namely, the iteration protocol). When you try to iterate an object in the form `for e in myobject: ...`, what Python checks at a very high level are the following two things, in order:
  - If the object implements `__iter__`, which returns an iterator (an object with a `__next__` method)
  - If the object is a sequence and has `__len__` and `__getitem__`
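Put together, a for loop behaves roughly like the following sketch. `manual_for` is a hypothetical helper written only to make the steps explicit: get an iterator with `iter()`, call `next()` on it repeatedly, and stop when `StopIteration` is raised. It is a simplification, not the interpreter's actual implementation.

```python
def manual_for(iterable, action):
    """Simplified sketch of what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)      # uses __iter__, or falls back to __getitem__
    while True:
        try:
            item = next(iterator)  # calls __next__ on the iterator
        except StopIteration:
            break                  # the iterator is exhausted: leave the loop
        action(item)


collected = []
manual_for(["a", "b", "c"], collected.append)
print(collected)  # ['a', 'b', 'c']
```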

## Creating iterable objects

When we try to iterate an object, Python will call the `iter()` function over it. One of the first things this function checks for is the presence of the `__iter__` method on that object, which, if present, will be executed.

 

The following code creates an object that allows iterating over a range of dates, producing one day at a time on every round of the loop:



In [33]:
from datetime import timedelta
from datetime import date

class DateRangeIterable:
    """An iterable that contains its own iterator object."""

    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._present_day = start_date

    def __iter__(self):
        return self

    def __next__(self):
        if self._present_day >= self.end_date:
            raise StopIteration
        today = self._present_day
        self._present_day += timedelta(days=1)
        return today

for day in DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5)):
    print(day)

2018-01-01
2018-01-02
2018-01-03
2018-01-04


Here, the for loop is starting a new iteration over our object. At this point, Python will call the `iter()` function on it, which in turn will call the `__iter__` magic method. This method is defined to return self, indicating that the object is its own iterator, so at that point every step of the loop will call the `next()` function on that object, which delegates to the `__next__` method. In this method, we decide how to produce the elements and return one at a time. When there is nothing else to produce, we have to signal this to Python by raising the `StopIteration` exception.

This means that what is actually happening is similar to Python calling `next()` every time on our object until there is a StopIteration exception, on which it knows it has to stop the for loop:

In [34]:
r = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
print(next(r))
print(next(r))
print(next(r))
print(next(r))
print(next(r))

2018-01-01
2018-01-02
2018-01-03
2018-01-04


StopIteration: 

This example works, but it has a small problem: once exhausted, the iterable will continue to be empty, hence raising `StopIteration`. This means that if we use this in two or more consecutive `for` loops, only the first one will work, while the second one will be empty:

In [35]:
r1 = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
", ".join(map(str, r1))

'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'

In [36]:
max(r1)

ValueError: max() arg is an empty sequence

This is because of the way the iteration protocol works: an iterable constructs an iterator, and that iterator is the one being iterated over. In our example, `__iter__` just returned self, but we can make it create a new iterator every time it is called. One way of fixing this would be to create new instances of `DateRangeIterable`, which is not a terrible issue, but we can instead make `__iter__` use a generator (generators are iterator objects), so that a new one is created every time:

In [37]:
class DateRangeContainerIterable:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date

    def __iter__(self):
        current_day = self.start_date
        while current_day < self.end_date:
            yield current_day
            current_day += timedelta(days=1)

In [38]:
r1 = DateRangeContainerIterable(date(2018, 1, 1), date(2018, 1, 5))
", ".join(map(str, r1))

'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'

In [39]:
max(r1)

datetime.date(2018, 1, 4)

The difference is that each for loop is calling `__iter__` again, and each one of those is creating the generator again.

This is called a container iterable.

Generators will be explained in more detail in Chapter 7, Using Generators.

## Creating sequences

Maybe our object does not define the `__iter__()` method, but we still want to be able to iterate over it. If `__iter__` is not defined on the object, the `iter()` function will look for the presence of `__getitem__`, and if this is not found, it will raise `TypeError`.

A sequence is an object that implements `__len__` and `__getitem__` and expects to be able to get the elements it contains, one at a time, in order, starting at zero as the first index. This means that you should be careful in the logic so that you correctly implement `__getitem__` to expect this type of index, or the iteration will not work.
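The fallback can be seen with a hypothetical `Letters` class that defines `__getitem__` and `__len__` but no `__iter__`. In this sketch, `iter()` calls `__getitem__` with 0, 1, 2, and so on, and stops when `IndexError` is raised; an object with neither method is rejected with `TypeError`.

```python
class Letters:
    """Hypothetical sequence with __getitem__ and __len__, but no __iter__."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised
        return self._text[index]


print(list(Letters("abc")))  # ['a', 'b', 'c']


class NotIterable:
    pass


try:
    iter(NotIterable())
except TypeError as exc:
    print(exc)  # 'NotIterable' object is not iterable
```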

The example from the previous section had the advantage that it uses less memory: it only holds one date at a time and knows how to produce the days one by one. However, it has the drawback that if we want to get the nth element, we have no way to do so but to iterate n times until we reach it. This is a typical trade-off in computer science between memory and CPU usage.

 

The implementation with an iterable will use less memory, but it takes up to O(n) to get an element, whereas implementing a sequence will use more memory (because we have to hold everything at once) but supports indexing in constant time, O(1).

This is what the new implementation might look like:

In [40]:
class DateRangeSequence:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._range = self._create_range()

    def _create_range(self):
        days = []
        current_day = self.start_date
        while current_day < self.end_date:
            days.append(current_day)
            current_day += timedelta(days=1)
        return days

    def __getitem__(self, day_no):
        return self._range[day_no]

    def __len__(self):
        return len(self._range)

In [41]:
s1 = DateRangeSequence(date(2018, 1, 1), date(2018, 1, 5))
", ".join(map(str, s1))

'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'

In [42]:
s1[0]

datetime.date(2018, 1, 1)

In the preceding code, indexing works just as it does on a list, and negative indices also work (for example, `s1[-1]` returns `datetime.date(2018, 1, 4)`). This is because the `DateRangeSequence` object delegates all of the operations to its wrapped object (a list), which is the best way to maintain compatibility and consistent behavior.

## Container Objects

Containers are objects that implement a `__contains__` method (that usually returns a Boolean value). This method is called in the presence of Python's `in` keyword.

Something like the following:

`element in container`

is translated by Python into this:

`container.__contains__(element)`

You can imagine how much more readable (and Pythonic!) the code can be when this method is properly implemented.
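As a tiny illustration, the hypothetical `EvenNumbers` container below holds "every even integer" without storing anything, because `__contains__` simply answers the membership question. The explicit `__contains__` call is shown only to make the translation visible.

```python
class EvenNumbers:
    """Hypothetical container that holds every even integer."""

    def __contains__(self, element):
        # Called by the `in` operator; should return a Boolean
        return isinstance(element, int) and element % 2 == 0


evens = EvenNumbers()
print(4 in evens)              # True
print(evens.__contains__(4))   # True, the same call made explicitly
print(7 not in evens)          # True
```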

Let's say we have to mark some points on a map of a game that has two-dimensional coordinates. We might expect to find a function like the following:

In [43]:
def mark_coordinate(grid, coord):
    if 0 <= coord.x < grid.width and 0 <= coord.y < grid.height:
        grid[coord] = MARKED

Now, the part that checks the condition of the first `if` statement seems convoluted; it doesn't reveal the intention of the code, it's not expressive, and worst of all it calls for code duplication (every part of the code where we need to check the boundaries before proceeding will have to repeat that if statement).

What if the map itself (called `grid` in the code) could answer this question? Even better, what if the map could delegate this action to an even smaller (and hence more cohesive) object? We can then ask the map whether it contains a coordinate, and the map itself can hold the information about its limits in a separate object and ask that object the following:

```python
class Boundaries:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __contains__(self, coord):
        x, y = coord
        return 0 <= x < self.width and 0 <= y < self.height


class Grid:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.limits = Boundaries(width, height)

    def __contains__(self, coord):
        return coord in self.limits
```

This code alone is a much better implementation. First, it uses a simple composition and relies on delegation to solve the problem. Both objects are highly cohesive and hold the minimal possible logic. The methods are short, and the logic speaks for itself: `coord in self.limits` is pretty much a declaration of the problem to solve, expressing the intention of the code.

From the outside, we can also see the benefits. It's almost as if Python is solving the problem for us:

```python
def mark_coordinate(grid, coord):
    if coord in grid:
        grid[coord] = MARKED
```
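As written, the `Grid` above has no `__setitem__`, so `grid[coord] = MARKED` would raise a `TypeError`. The sketch below is a runnable version under a few assumptions, all hypothetical additions. `Coord` is a `namedtuple`, so it supports both `coord.x` and tuple unpacking. `MARKED` is an arbitrary marker. `Grid` stores marked cells in a dictionary.

```python
from collections import namedtuple

# Hypothetical coordinate type: supports both coord.x and `x, y = coord`
Coord = namedtuple("Coord", ["x", "y"])
MARKED = "X"  # hypothetical marker value


class Boundaries:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __contains__(self, coord):
        x, y = coord
        return 0 <= x < self.width and 0 <= y < self.height


class Grid:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.limits = Boundaries(width, height)
        self._cells = {}  # hypothetical storage: coordinate -> value

    def __contains__(self, coord):
        # Delegate the boundary question to the smaller object
        return coord in self.limits

    def __setitem__(self, coord, value):
        self._cells[coord] = value

    def __getitem__(self, coord):
        # Unmarked cells read as None
        return self._cells.get(coord)


def mark_coordinate(grid, coord):
    if coord in grid:
        grid[coord] = MARKED


grid = Grid(3, 2)
mark_coordinate(grid, Coord(1, 1))
mark_coordinate(grid, Coord(5, 0))  # out of bounds: ignored
print(grid[Coord(1, 1)])  # X
print(grid[Coord(5, 0)])  # None
```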

## Dynamic attributes for objects

It is possible to control the way attributes are obtained from objects by means of the `__getattr__` magic method. When we write something like `<myobject>.<myattribute>`, Python calls `__getattribute__`, which looks for `<myattribute>` in the dictionary of the object (and in other places, such as its class). If the attribute is not found (namely, the object does not have the attribute we are looking for), then the extra method, `__getattr__`, is called, passing the name of the attribute (`myattribute`) as a parameter. By receiving this value, we can control what is returned for that name. We can even create new attributes, and so on.

In the following listing, the `__getattr__` method is demonstrated:

```python
class DynamicAttributes:

    def __init__(self, attribute):
        self.attribute = attribute

    def __getattr__(self, attr):
        if attr.startswith("fallback_"):
            name = attr.replace("fallback_", "")
            return f"[fallback resolved] {name}"
        raise AttributeError(
            f"{self.__class__.__name__} has no attribute {attr}"
        )
```

```python
dyn = DynamicAttributes("value")
dyn.attribute
```

```
'value'
```

```python
dyn.fallback_test
```

```
'[fallback resolved] test'
```

```python
dyn.__dict__["fallback_new"] = "new value"
dyn.fallback_new
```

```
'new value'
```

```python
getattr(dyn, "something", "default")
```

```
'default'
```

The first call is straightforward: we just request an attribute that the object has and get its value as a result. The second is where this method takes action. The object does not have anything called `fallback_test`, so `__getattr__` runs with that name. Inside that method, we placed the code that returns a string, and what we get is the result of that transformation.

The third example is interesting because a new attribute named `fallback_new` is created (this call is actually equivalent to running `dyn.fallback_new = "new value"`). When we request that attribute, the logic we put in `__getattr__` does not apply, simply because that code is never called.

Now, the last example is the most interesting one. There is a subtle detail here that makes a huge difference. Take another look at the code in the `__getattr__` method. Notice the exception it raises when the value cannot be retrieved: `AttributeError`. This is not only for consistency (as is the message in the exception) but also required by the built-in `getattr()` function, which returns the default value only when `AttributeError` is raised. Had the method raised any other exception, `getattr()` would propagate it, and the default value would not be returned.
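The difference is easy to reproduce. The two classes below are hypothetical and exist only to compare exception types. One raises `AttributeError` from `__getattr__`, and the other raises `KeyError`.

```python
class RaisesAttributeError:
    def __getattr__(self, name):
        raise AttributeError(f"{type(self).__name__} has no attribute {name}")


class RaisesKeyError:
    # Hypothetical counterexample: raises the wrong exception type
    def __getattr__(self, name):
        raise KeyError(name)


print(getattr(RaisesAttributeError(), "something", "default"))  # default

try:
    getattr(RaisesKeyError(), "something", "default")
except KeyError as exc:
    # getattr() only swallows AttributeError, so the KeyError escapes
    print(f"KeyError escaped: {exc}")
```

The same applies to the built-in `hasattr()`, which in Python 3 treats only `AttributeError` as meaning "attribute missing".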



## Callable objects

It is possible (and often convenient) to define objects that can act as functions. One of the most common applications for this is to create better decorators, but it's not limited to that.

The magic method `__call__` will be called when we try to execute our object as if it were a regular function. Every argument passed to it will be passed along to the `__call__` method.

The main advantage of implementing functions this way, through objects, is that objects have state, so we can save and maintain information across calls.

When we have an object, a statement like `object(*args, **kwargs)` is translated by Python to `object.__call__(*args, **kwargs)`.

This method is useful when we want to create callable objects that will work as parametrized functions, or in some cases functions with memory.
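For the "parametrized function" case, here is a minimal sketch using a hypothetical `Multiplier` class. The parameter is fixed once, at construction time, and stored as state. Every call then behaves like a one-argument function.

```python
class Multiplier:
    """Hypothetical parametrized function: multiplies by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor  # the "parameter", kept as object state

    def __call__(self, value):
        return value * self.factor


triple = Multiplier(3)
print(triple(5))                      # 15
print(triple.__call__(5))             # 15, the explicit form of triple(5)
print(callable(triple))               # True
print(list(map(triple, [1, 2, 3])))   # [3, 6, 9]
```

Because `triple` is callable, it can be passed anywhere a function is expected, such as `map()` above.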

The following listing uses this method to construct an object that when called with a parameter returns the number of times it has been called with the very same value:

```python
from collections import defaultdict


class CallCount:

    def __init__(self):
        self._counts = defaultdict(int)

    def __call__(self, argument):
        self._counts[argument] += 1
        return self._counts[argument]
```

```python
cc = CallCount()
print(cc(1))
print(cc(2))
print(cc(1))
print(cc(1))
print(cc("something"))
```

```
1
1
2
3
1
```


## Summary of magic methods

We can summarize the concepts we described in the previous sections in the form of a cheat sheet like the one presented as follows. For each action in Python, the magic method involved is presented, along with the concept that it represents:

| Statement                              | Magic method                | Python concept              |
|----------------------------------------|-----------------------------|-----------------------------|
| `obj[key]`, `obj[i:j]`, `obj[i:j:k]`   | `__getitem__(key)`          | Subscriptable object        |
| `with obj: ...`                        | `__enter__` / `__exit__`    | Context manager             |
| `for i in obj: ...`                    | `__iter__` / `__next__`     | Iterable object             |
| `for i in obj: ...`                    | `__len__` / `__getitem__`   | Sequence                    |
| `element in obj`                       | `__contains__(element)`     | Container                   |
| `obj.<attribute>`                      | `__getattr__`               | Dynamic attribute retrieval |
| `obj(*args, **kwargs)`                 | `__call__(*args, **kwargs)` | Callable object             |

## Caveats in Python

Besides understanding the main features of the language, being able to write idiomatic code is also about being aware of the potential problems of some idioms, and how to avoid them. In this section, we will explore common issues that might cause you long debugging sessions if they catch you off guard.

Most of the points discussed in this section are things to avoid entirely, and I would dare to say that there is almost no possible scenario that justifies the presence of the anti-pattern (or idiom, in this case). Therefore, if you find one of these in the codebase you are working on, feel free to refactor it in the way that is suggested. If you find these traits while doing a code review, this is a clear indication that something needs to change.

### Mutable default arguments

Simply put, don't use mutable objects as the default arguments of functions. If you use mutable objects as default arguments, you will get results that are not the expected ones.

Consider the following erroneous function definition:

```python
def wrong_user_display(user_metadata: dict = {"name": "John", "age": 30}):
    name = user_metadata.pop("name")
    age = user_metadata.pop("age")

    return f"{name} ({age})"
```

This has two problems, actually. Besides the mutable default argument, the body of the function mutates a mutable object, hence creating a side effect. But the main problem is the default argument for `user_metadata`.

The function will actually work only the first time it is called without arguments. The second time we call it without explicitly passing something to `user_metadata`, it will fail with a `KeyError`, like so:

```python
wrong_user_display()
```

```
'John (30)'
```

```python
wrong_user_display({"name": "Jane", "age": 25})
```

```
'Jane (25)'
```

```python
wrong_user_display()
```

```
KeyError: 'name'
```

The explanation is simple. The dictionary with the default data is assigned to `user_metadata` in the definition of the function, so it is created only once (when the `def` statement runs) and the variable `user_metadata` points to it. The body of the function modifies this object, which remains alive in memory as long as the program is running. When we pass a value, it takes the place of the default argument for that call. When we call the function again without an argument, it receives that same default object, which the previous run modified. It no longer contains the keys, since the earlier call removed them.
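We can observe this directly, because default values are stored on the function object itself, in its `__defaults__` tuple. The sketch below repeats `wrong_user_display` so that it runs on its own. It inspects that tuple before and after a call.

```python
def wrong_user_display(user_metadata: dict = {"name": "John", "age": 30}):
    name = user_metadata.pop("name")
    age = user_metadata.pop("age")
    return f"{name} ({age})"


# The default dict lives on the function object, created once at `def` time
default = wrong_user_display.__defaults__[0]
print(default)        # {'name': 'John', 'age': 30}

wrong_user_display()  # the first call empties that very dict
print(default)        # {}
print(default is wrong_user_display.__defaults__[0])  # True: same object
```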

The fix is also simple: we need to use `None` as a default sentinel value and assign the default in the body of the function. Because the body runs on every call, a new dictionary is created and assigned to `user_metadata` every time `None` is received:

```python
from typing import Optional


def user_display(user_metadata: Optional[dict] = None):
    user_metadata = user_metadata or {"name": "John", "age": 30}

    name = user_metadata.pop("name")
    age = user_metadata.pop("age")

    return f"{name} ({age})"
```
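Two details of `user_display` may be worth tightening. It still `pop()`s keys from a dictionary the caller passes in, which is the side effect mentioned earlier. Also, `or` replaces any falsy value, including an empty `{}`, with the default. The sketch below addresses both under stated assumptions. `safe_user_display` and `DEFAULT_USER` are hypothetical names. The function checks `is None` explicitly and reads the values instead of removing them.

```python
from typing import Optional

# Hypothetical default; it is never mutated below, so sharing it is safe
DEFAULT_USER = {"name": "John", "age": 30}


def safe_user_display(user_metadata: Optional[dict] = None) -> str:
    if user_metadata is None:  # only None triggers the default
        user_metadata = DEFAULT_USER
    # Read the values instead of pop()-ing them: no side effect on the argument
    return f"{user_metadata['name']} ({user_metadata['age']})"


print(safe_user_display())  # John (30)
print(safe_user_display())  # John (30): the default is intact

jane = {"name": "Jane", "age": 25}
print(safe_user_display(jane))  # Jane (25)
print(jane)  # {'name': 'Jane', 'age': 25}: the caller's dict is unchanged
```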

### Extending built-in types

The correct way of extending built-in types such as lists, strings, and dictionaries is by means of the `collections` module (`UserList`, `UserString`, and `UserDict`).

If you create a class that directly extends `dict`, for example, you will obtain results that are probably not what you are expecting. The reason for this is that in CPython the methods of the built-in class don't call each other (as they should), so if you override one of them, the rest will not reflect the change, resulting in unexpected outcomes. For example, you might override `__getitem__` and then notice that the logic you put in that method is not applied when you call `get()` or iterate over the values.

This is all solved by using `collections.UserDict`, for example, which provides a transparent interface to actual dictionaries and is more robust.
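Here is a small sketch of the difference, observed on CPython. It uses two hypothetical classes that upper-case string values on lookup. One extends `dict` directly, and the other extends `collections.UserDict`.

```python
from collections import UserDict


class UpperDict(dict):
    # Hypothetical: return string values upper-cased
    def __getitem__(self, key):
        return super().__getitem__(key).upper()


class UpperUserDict(UserDict):
    def __getitem__(self, key):
        return super().__getitem__(key).upper()


d = UpperDict(greeting="hello")
ud = UpperUserDict(greeting="hello")

print(d["greeting"], d.get("greeting"))      # HELLO hello (get() bypasses the override)
print(ud["greeting"], ud.get("greeting"))    # HELLO HELLO
print(list(d.values()), list(ud.values()))   # ['hello'] ['HELLO']
```

`UserDict` keeps the real data in its `data` attribute, and its other methods go through `self[key]`, so a single override is honored everywhere.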

Let's say we want a list, originally created from numbers, that returns its values converted to strings with a prefix. The first approach might look like it solves the problem, but it is erroneous:

```python
class BadList(list):
    def __getitem__(self, index):
        value = super().__getitem__(index)
        if index % 2 == 0:
            prefix = "even"
        else:
            prefix = "odd"
        return f"[{prefix}] {value}"
```

```python
bl = BadList((0, 1, 2, 3, 4, 5))
print(bl[0])
print(bl[1])
"".join(bl)
```

```
[even] 0
[odd] 1
```

```
TypeError: sequence item 0: expected str instance, int found
```

The `join` method will try to iterate over (run a `for` loop on) the list, and it expects values of type string. This should work, because that is exactly the type of change we made to the list. But when the list is iterated, our changed version of `__getitem__` is apparently not called.

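This issue is actually an implementation detail of CPython (a C optimization). CPython does not specify when the overridden methods of a subclass of a built-in type are called implicitly, and other implementations, such as PyPy, may behave differently (see the differences between PyPy and CPython in the references at the end of this chapter).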

Regardless of this, we should write code that is portable and compatible across all implementations, so we will fix it by extending not from `list` but from `UserList`:

```python
from collections import UserList


class GoodList(UserList):
    def __getitem__(self, index):
        value = super().__getitem__(index)
        if index % 2 == 0:
            prefix = "even"
        else:
            prefix = "odd"
        return f"[{prefix}] {value}"
```

```python
gl = GoodList((0, 1, 2, 3, 4, 5))
print(gl[0])
print(gl[1])
"".join(gl)
```

```
[even] 0
[odd] 1
```

```
'[even] 0[odd] 1[even] 2[odd] 3[even] 4[odd] 5'
```