# Python Gotchas
> We love Python and its third-party libraries. But some features will, once in a while, cost us a couple of hours chasing a silly bug. This is an ever-growing list of the ones that gave me a headache.

- toc: true
- author: Fabrizio Damicelli
- badges: true
- comments: false
- categories: [python, numpy]

![](https://imgs.xkcd.com/comics/python.png)

## NumPy
---

### ```array *= something``` is very different from ```array = array * something```

#### TL;DR: Only use the form ```array *= something``` if you're 100% sure you are doing the right thing; otherwise, just go for ```array = array * something```.

We define two functions that, in the eyes of many (including past me), do exactly the same thing.

In [1]:
import numpy as np

def multiply(array, scalar):
    array *= scalar  # <-- handy shorthand, right? ;)
    return array

def multiply2(array, scalar):
    array = array * scalar
    return array

Let's see them in action:

In [2]:
a = np.arange(10.)  # the trailing dot gives a float array, which avoids dtype casting errors
b = np.arange(10.)
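
As a side note on the "type errors" mentioned in the comment, here is a minimal sketch of what can go wrong with an integer array. The names `ints` and `halved` are made up for illustration. The in-place form has to write the result back into the existing integer buffer, so multiplying by a float fails, while the out-of-place form simply allocates a new float array.

```python
import numpy as np

ints = np.arange(5)  # integer dtype (no trailing dot)

raised = False
try:
    ints *= 0.5      # the float result cannot be written back into an integer buffer
except TypeError as err:
    raised = True
    print("in-place multiplication failed:", type(err).__name__)

halved = ints * 0.5  # out-of-place: NumPy allocates a new float array
print(halved.dtype)  # float64
print(ints)          # the original integer array is untouched
```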

In [3]:
multiply(a, 2)

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])

In [4]:
multiply(a, 2)

array([ 0.,  4.,  8., 12., 16., 20., 24., 28., 32., 36.])

Hey, wait! What's going on?

In [5]:
a

array([ 0.,  4.,  8., 12., 16., 20., 24., 28., 32., 36.])

> Warning: The operation modified the array **in place**.
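
One way to check this yourself, sketched here with a fresh array `x` (a made-up name) instead of the `a` above: the function returns the very same object it received, and both names share the same memory.

```python
import numpy as np

def multiply(array, scalar):
    array *= scalar  # in place: writes into the caller's buffer
    return array

x = np.arange(5.)
result = multiply(x, 2)

print(result is x)                  # True: same Python object
print(np.shares_memory(result, x))  # True: same underlying buffer
print(x)                            # [0. 2. 4. 6. 8.]
```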

Let's see what the other version of our function does.

In [6]:
multiply2(b, 2)

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])

In [7]:
multiply2(b, 2)

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])

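This time the input array stays the same, i.e., the modification stayed within the scope of the function: ```array * scalar``` creates a new array, and the assignment only rebinds the local name ```array``` to it.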

Basic as this may look, in real-life cases it is much harder to debug than in the toy example, for instance in the middle of a long data preprocessing pipeline.
If you load your data *once* and run the preprocessing pipeline *once*, you will probably not notice the bug (that's the tricky thing!).\
But if the loaded data are passed to the pipeline more than once (without reloading them), each pass will actually be fed a *different input*.\
For instance, if you run K-Fold cross-validation, most likely nothing will crash, but you will be passing K different datasets to your model and your validation results will be rubbish!
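
A toy sketch of that scenario, assuming a hypothetical one-step "pipeline" (`preprocess_inplace` and `preprocess_safe` are made-up names, not part of any library). Calling the in-place version repeatedly on the same loaded data gives a different result each time, whereas a defensive copy at the top keeps every pass identical.

```python
import numpy as np

def preprocess_inplace(data, scale):
    data *= scale       # hypothetical step that silently mutates its input
    return data

def preprocess_safe(data, scale):
    data = data.copy()  # defensive copy: the caller's array is left alone
    data *= scale
    return data

raw = np.ones(3)
runs_inplace = [float(preprocess_inplace(raw, 2).sum()) for _ in range(3)]
print(runs_inplace)     # [6.0, 12.0, 24.0] -> every "fold" saw different data

raw = np.ones(3)
runs_safe = [float(preprocess_safe(raw, 2).sum()) for _ in range(3)]
print(runs_safe)        # [6.0, 6.0, 6.0]
print(raw)              # [1. 1. 1.] -> the loaded data are intact
```

The copy costs extra memory, so whether it is worth it depends on the size of your data. The alternative is to write each step out of place, as in ```array = array * something```.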

**Conclusion:** you'd better be *really* sure of what you're doing with ```array *= something```.