# Subtle bugs and puzzles

> A collection of subtle (or not so subtle) mistakes I made and puzzles I've come across.

- toc: true
- badges: true
- comments: true
- categories: [python]

In [1]:
import pandas as pd
import numpy as np

## Changing a mutable element of an immutable sequence

The puzzle is from page 40 in [Fluent Python](https://www.oreilly.com/library/view/fluent-python/9781491946237/).

In [2]:
t = (1, 2, [3, 4])
t[2] += [5, 6]

TypeError: 'tuple' object does not support item assignment

In [4]:
type(t).__name__

'tuple'

In [3]:
t

(1, 2, [3, 4, 5, 6])

What's going on here? As part of the assignment, Python does the following:

1. Performs augmented addition on the value of `t[2]`, which works because that value is the list `[3, 4]`, which is mutable and is therefore extended in place.

2. Then it tries to assign the result from step 1 back to `t[2]`, which fails with the `TypeError` above, because `t` is a tuple and tuples are immutable.

3. But because the element at index 2 of `t` is not the list itself but a reference to it, and because the list was already changed in step 1, the value of `t[2]` has changed, too, even though the assignment failed.

A great way to visualise the process is to see what happens under the hood using the amazing [Python Tutor](http://www.pythontutor.com).

![](input/puzzler1.png)

![](input/puzzler2.png)
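The three steps described above can also be spelled out by hand. This is only a rough sketch of what the augmented assignment amounts to, not the exact bytecode Python runs; the names `inner`, `result`, `assignment_failed` and `u` are made up for illustration.

```python
t = (1, 2, [3, 4])

# Step 1: fetch the list and extend it in place.
# list.__iadd__ mutates the list and returns that same list object.
inner = t[2]
result = inner.__iadd__([5, 6])
print(result is inner)  # True

# Step 2: store the result back into the tuple, which tuples do not allow.
try:
    t[2] = result
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Step 3: the tuple still references the same, now longer, list.
print(assignment_failed)  # True
print(t)                  # (1, 2, [3, 4, 5, 6])

# Mutating the list directly skips the assignment, so no error is raised.
u = (1, 2, [3, 4])
u[2].extend([5, 6])
print(u)                  # (1, 2, [3, 4, 5, 6])
```

Calling `extend` on the list is probably the cleaner way to express the intent, since it never touches the tuple itself.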

## NaNs are truthy

I have a dataframe with some data: 

In [62]:
df = pd.DataFrame({'data': list('abcde')})
df

|   | data |
|---|------|
| 0 | a    |
| 1 | b    |
| 2 | c    |
| 3 | d    |
| 4 | e    |


I can shift the data column:

In [63]:
df.data.shift()

0    NaN
1      a
2      b
3      c
4      d
Name: data, dtype: object

I want to add a check column that tells me where the shift is missing:

In [64]:
df['check'] = np.where(df.data.shift(), 'ok', 'missing')
df

|   | data | check |
|---|------|-------|
| 0 | a    | ok    |
| 1 | b    | ok    |
| 2 | c    | ok    |
| 3 | d    | ok    |
| 4 | e    | ok    |


That's not what I wanted: the first row should say `missing`. The reason it happens is that **missing values that aren't `None` evaluate to `True`**. `NaN` is a float that isn't zero, so it is truthy (this follows from the [docs](https://docs.python.org/2/library/stdtypes.html#truth-value-testing)). One way to see this:

In [104]:
[e for e in [np.nan, 'hello', True, None] if e]

[nan, 'hello', True]
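A small sketch using only the standard library makes the same point without pandas. It relies on `np.nan` being an ordinary Python float, so `float('nan')` behaves the same way; the names `nan`, `values` and `truthy` are just for illustration.

```python
import math

nan = float('nan')  # np.nan is this same kind of Python float

# NaN is a non-zero float, so it counts as truthy.
print(bool(nan))        # True

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to detect it either.
print(nan == nan)       # False

# An explicit check is the reliable way to detect it.
print(math.isnan(nan))  # True

# Only None, zero and the empty string are dropped by a truthiness filter.
values = [nan, 'hello', True, None, 0.0, '']
truthy = [v for v in values if v]
print(len(truthy))      # 3
```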

Hence, to get the check I wanted I should do this:

In [65]:
df['correct_check'] = np.where(df.data.shift().notna(), 'ok', 'missing')
df

|   | data | check | correct_check |
|---|------|-------|---------------|
| 0 | a    | ok    | missing       |
| 1 | b    | ok    | ok            |
| 2 | c    | ok    | ok            |
| 3 | d    | ok    | ok            |
| 4 | e    | ok    | ok            |


## Truthy vs True

As follows clearly from the [docs](https://docs.python.org/2/library/stdtypes.html#truth-value-testing), `True` is only one of many values that evaluate to `True` in a boolean context. This seems clear enough. Yet I just caught myself getting confused by the following:

I have a list of values that I want to filter for truthy elements, that is, elements that evaluate to `True`:

In [102]:
mylist = [np.nan, 'hello', True, None]
[e for e in mylist if e]

[nan, 'hello', True]

This works as intended. For a moment, however, I got confused by the following:

In [103]:
[e for e in mylist if e is True]

[True]

I expected it to yield the same result as the above. But it doesn't, because `is` tests identity rather than truthiness: it only returns values that actually are `True`, as in having the same object ID as the value `True` ([this](https://stackoverflow.com/a/20421344/13666841) Stack Overflow answer makes the point nicely). We can see this below, where only the third element has the same ID as `True`:

In [97]:
[id(e) for e in mylist]

[4599359344, 4859333552, 4556488160, 4556589160]

In [91]:
id(True)

4556488160
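To put the three kinds of test side by side, here is a minimal sketch, using `float('nan')` in place of `np.nan` so that it runs without NumPy. The variable names are made up for illustration.

```python
mylist = [float('nan'), 'hello', True, None]

# Truthiness: keeps everything that evaluates to True in a boolean context.
truthy = [e for e in mylist if e]
print(len(truthy))    # 3

# Identity: keeps only the object that is the singleton True.
identical = [e for e in mylist if e is True]
print(identical)      # [True]

# Converting to bool first makes the identity test behave like the truthiness test.
via_bool = [e for e in mylist if bool(e) is True]
print(len(via_bool))  # 3

# Equality is yet another test: 1 equals True but is not the same object.
one = 1
print(one == True)    # True
print(one is True)    # False
```

In practice a plain `if e` is usually what is wanted; `is True` is only useful when the value must be the boolean `True` itself.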