### Idiomatic Python - Using Named Tuples for Function Return Values

Although tuples are often explained as "immutable lists", I think they are more appropriately thought of as lightweight data structures.

What I mean is that we often use tuples to group a fixed number of related objects, each of which has a specific meaning, rather than as a plain list of values. We also often reuse the same structure for multiple "instances" of that structure. And that's where named tuples come in.

The `namedtuple` function is essentially a little code generator that lets us combine the efficiency of tuples with the convenience of naming the elements of the tuple, which is exactly in line with the structure view of tuples.

Before I show you the Pythonic way of returning multiple values from a function, let's do a quick refresher on `namedtuple`.

To use named tuples, we first have to define the structure of the tuple:

In [1]:
from collections import namedtuple

Coordinates = namedtuple("Coordinates", "latitude, longitude")
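As a small aside, the field names do not have to be given as a comma-separated string. The sketch below shows three spellings that `namedtuple` accepts and checks that they produce the same fields; the names `A`, `B` and `C` are only placeholders for this illustration.

```python
from collections import namedtuple

# Three equivalent ways of specifying the same field names
A = namedtuple("Coordinates", "latitude, longitude")      # comma-separated string
B = namedtuple("Coordinates", "latitude longitude")       # whitespace-separated string
C = namedtuple("Coordinates", ["latitude", "longitude"])  # sequence of strings

# _fields holds the field names, in order, as a tuple of strings
print(A._fields)
print(A._fields == B._fields == C._fields)
```

Which spelling to use is mostly a matter of taste; the list form tends to be handy when the field names are built programmatically.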

Now we can create as many instances of that structure as we want, this way:

In [2]:
london = Coordinates(51.51, -0.12)

or, better yet, using named arguments:

In [3]:
paris = Coordinates(latitude=48.86, longitude=2.35)

The additional benefit of using named arguments is that we lower the risk of introducing a bug if we inadvertently switch the position of latitude and longitude:

In [4]:
paris = Coordinates(longitude=2.35, latitude=48.86)

Now, these namedtuple instances behave like regular tuples:

In [5]:
paris[0], london[1]

(48.86, -0.12)

And in fact they **are** tuples (the namedtuples inherit from the `tuple` class):

In [6]:
isinstance(paris, tuple)

True

But, they also allow accessing the structure based on the names we specified:

In [7]:
paris.latitude, london.longitude

(48.86, -0.12)

Again, this helps reduce the risk of reading the wrong value (is latitude the first or second element of the tuple?)
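Because named tuples really are tuples, they also keep the other tuple behaviors: they unpack positionally and they are immutable. Here is a minimal sketch of that, re-declaring `Coordinates` so it runs on its own; the variable names `lat`, `lon` and `moved`, and the replacement longitude, are just for illustration.

```python
from collections import namedtuple

Coordinates = namedtuple("Coordinates", "latitude, longitude")
paris = Coordinates(latitude=48.86, longitude=2.35)

# Positional unpacking works exactly as with a plain tuple
lat, lon = paris
print(lat, lon)

# Fields cannot be reassigned: named tuples are immutable
try:
    paris.latitude = 0.0
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# _replace builds a *new* instance with some fields changed
moved = paris._replace(longitude=2.29)
print(paris)
print(moved)
```

So "modifying" a named tuple always means creating a new one, which is the same trade-off we accept with regular tuples.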

So, the point I am trying to make here is that named tuples are an efficient way of creating small data structures that let us use names both to specify and to read back the values in the structure. This reduces the risk of making mistakes when reading and writing the data, and also allows IDEs to provide auto-completion.

Try this, even in Jupyter:

```python
london.
```

then press the `Tab` key, and you get the attributes of the named tuple, including our field names.

One of the main uses of named tuples is returning multiple values from a function.

In Python, functions can return multiple values by returning a tuple of these values.

Let's take a look at this example, which calculates and returns the minimum and maximum values in a list:

In [8]:
def min_max(lst: list[int]) -> tuple[int, int]:
    min_ = min(lst)
    max_ = max(lst)
    return min_, max_

When we call this function:

In [9]:
l = [1, 3, 2, 0, 20, -5]
results = min_max(l)
print(results)

(-5, 20)


we get the result back as a tuple, which we could just unpack directly:

In [10]:
max_, min_ = min_max(l)
print(f"max={max_}, min={min_}")

max=-5, min=20


Ah, so I unpacked the results into the wrong variable names, and that's always the problem: every time we unpack a regular tuple returned by a call, we need to know the specific order in which the values were assembled into the returned tuple.

The Pythonic way of solving this potential issue is to use a named tuple:

In [11]:
MinMax = namedtuple("MinMax", "min, max")

def min_max(lst: list[int]) -> MinMax:
    min_ = min(lst)
    max_ = max(lst)
    return MinMax(min=min_, max=max_)

And now we can use field names when retrieving the values (and even use auto completion):

In [12]:
minmax = min_max(l)

print(f"min={minmax.min}, max={minmax.max}")

min=-5, max=20


Which is much more readable, and less prone to swapping the results, than this:

In [13]:
minmax = min_max(l)

print(f"min={minmax[0]}, max={minmax[1]}")

min=-5, max=20


So, use named tuples when returning multiple values from a function: it reduces the risk of introducing bugs, and makes the code much clearer and more explicit.
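As a side note, the standard library also offers a class-based spelling, `typing.NamedTuple`, which lets each field carry a type annotation. The sketch below is an alternative to the `namedtuple(...)` call, not the approach used in the rest of this section; it re-declares `MinMax` and `min_max` so that it is self-contained, and the names `result`, `low` and `high` are only for illustration.

```python
from typing import NamedTuple


class MinMax(NamedTuple):
    # Field order here is the positional order of the tuple
    min: int
    max: int


def min_max(lst: list[int]) -> MinMax:
    return MinMax(min=min(lst), max=max(lst))


result = min_max([1, 3, 2, 0, 20, -5])
print(f"min={result.min}, max={result.max}")

# Existing callers that unpack positionally keep working
low, high = result
print(low, high)
```

Either way, the returned object is still a tuple, so switching a function from a plain tuple to a named tuple should not break callers that index or unpack the result.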

A question you are probably asking yourself is whether there is a performance penalty, and if so, how large. (Just a reminder: don't fall into the premature optimization trap!)

So, let's investigate.

In [14]:
import random
from timeit import timeit

In [15]:
def min_max(lst):
    return min(lst), max(lst)

def min_max_named(lst):
    return MinMax(min=min(lst), max=max(lst))

def generate_lists(row_count, col_count):
    random.seed(0)
    return [
        [
            random.randint(-5_000, 5_000) for _ in range(col_count)
        ] for _ in range(row_count)
    ]

In [16]:
from pprint import pprint

lists = generate_lists(5, 5)
pprint(lists)

[[1311, 1890, -4337, -758, 3376],
 [2961, 1634, -31, 2808, 866],
 [4558, -1422, 3268, -2719, -383],
 [-2711, -3447, -896, 3725, 4861],
 [-2593, 81, -3382, -3792, 409]]


Next, let's define some code that will call `min_max` and `min_max_named`:

In [17]:
def all_min_max(lists):
    for lst in lists:
        yield min_max(lst)

In [18]:
def all_min_max_named(lists):
    for lst in lists:
        yield min_max_named(lst)

In [19]:
list(all_min_max(lists))

[(-4337, 3376), (-31, 2961), (-2719, 4558), (-3447, 4861), (-3792, 409)]

In [20]:
list(all_min_max_named(lists))

[MinMax(min=-4337, max=3376),
 MinMax(min=-31, max=2961),
 MinMax(min=-2719, max=4558),
 MinMax(min=-3447, max=4861),
 MinMax(min=-3792, max=409)]

Now let's time using a named tuple vs a regular tuple for the function return value:

In [21]:
lists = generate_lists(row_count=100, col_count=10_000)

In [22]:
timeit("list(all_min_max(lists))", globals=globals(), number=100)

1.9665906250011176

In [23]:
timeit("list(all_min_max_named(lists))", globals=globals(), number=100)

1.9606279579456896

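So, it looks like there isn't much of a performance hit: the two timings are within measurement noise of each other (the named tuple version even came out marginally faster in this run).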
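Since a single `timeit` run can be noisy, one common way to steady this kind of comparison is to use `timeit.repeat` and keep the fastest of several runs. The following is only a sketch under my own assumptions: the list size, `repeat=5` and `number=1_000` are arbitrary and much smaller than the values used above, and `data`, `plain` and `named` are names introduced just for this snippet.

```python
import random
from collections import namedtuple
from timeit import repeat

MinMax = namedtuple("MinMax", "min, max")


def min_max(lst):
    return min(lst), max(lst)


def min_max_named(lst):
    return MinMax(min=min(lst), max=max(lst))


random.seed(0)
data = [random.randint(-5_000, 5_000) for _ in range(1_000)]

# repeat() returns one total time per run; the minimum is the least disturbed run
plain = min(repeat(lambda: min_max(data), repeat=5, number=1_000))
named = min(repeat(lambda: min_max_named(data), repeat=5, number=1_000))

print(f"{plain=:.4f}s")
print(f"{named=:.4f}s")
```

The exact numbers will vary from machine to machine, so treat them as a rough comparison rather than a benchmark.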

What about reading the values?

In [24]:
def read_values_by_index(min_max_results: list[tuple | MinMax]):
    for result in min_max_results:
        yield result[0], result[1]
        
def read_values_unpacking(min_max_results: list[tuple | MinMax]):
    for min_, max_ in min_max_results:
        yield min_, max_
        
def read_values_by_name(min_max_named_results: list[MinMax]):
    for result in min_max_named_results:
        yield result.min, result.max


Let's create our results lists first:

In [25]:
lists = generate_lists(row_count=1, col_count=100)
results = list(all_min_max(lists))
results_named = list(all_min_max_named(lists))

In [26]:
results

[(-4982, 4882)]

And let's time things:

In [27]:
number = 10_000_000

In [28]:
standard_tuple_by_index = timeit(
    "list(read_values_by_index(results))", 
    globals=globals(), 
    number=number
)

named_tuple_by_index = timeit(
    "list(read_values_by_index(results_named))", 
    globals=globals(), 
    number=number
)

standard_tuple_unpacking = timeit(
    "list(read_values_unpacking(results))", 
    globals=globals(), 
    number=number
)

named_tuple_unpacking = timeit(
    "list(read_values_unpacking(results_named))", 
    globals=globals(), 
    number=number
)

named_tuple_by_name = timeit(
    "list(read_values_by_name(results_named))", 
    globals=globals(), 
    number=number
)

In [29]:
print(f"{standard_tuple_by_index=:.3f}")
print(f"{named_tuple_by_index=:.3f}")
print(f"{standard_tuple_unpacking=:.3f}")
print(f"{named_tuple_unpacking=:.3f}")
print(f"{named_tuple_by_name=:.3f}")

standard_tuple_by_index=1.735
named_tuple_by_index=1.772
standard_tuple_unpacking=1.693
named_tuple_unpacking=1.844
named_tuple_by_name=1.990


As you can see, reading values back from named tuples is slower, but over `10_000_000` iterations the gap between the fastest and slowest approach is only about 0.3 seconds in total, which is negligible for most code.
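To put those timings in perspective, here is the arithmetic for the largest gap, using the figures printed above (plain tuple unpacking vs. named tuple access by name). The variable names `extra_seconds`, `relative` and `per_call_ns` are mine, and keep in mind that each timed call also includes the generator and `list` overhead, so this is the extra cost per timed statement, not per attribute lookup.

```python
# Timings copied from the run above (seconds for `number` calls)
number = 10_000_000
standard_tuple_unpacking = 1.693
named_tuple_by_name = 1.990

extra_seconds = named_tuple_by_name - standard_tuple_unpacking
relative = extra_seconds / standard_tuple_unpacking
per_call_ns = extra_seconds / number * 1e9

print(f"relative overhead: {relative:.1%}")          # relative overhead: 17.5%
print(f"extra time per call: {per_call_ns:.1f} ns")  # extra time per call: 29.7 ns
```

In other words, the relative difference is visible, but in absolute terms it amounts to a few tens of nanoseconds per call on this machine.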

So, yes, named tuples do have a slight performance penalty, but unless profiling shows that named tuple access is the actual bottleneck in that area of the code, prefer named tuples for readability and for reducing the risk of bugs.