## Explore Python Pointers

In [1]:
import numpy as np
import pandas as pd
import sys

In [2]:
a = [1, 2, 3]
b = a
b[0] = 99
print(a)

[99, 2, 3]


Changing `b` also changes `a`. This is because `b = a` does not copy the list: both names refer to the same object in memory. We can check this:

In [3]:
print(id(b), id(a))
print(b is a)
print(sys.getrefcount(a))

140451832672416 140451832672416
True
3


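Above we print the identity of `a` and `b` (in CPython, `id` is the object's memory address) and can see they are the same. This can also be checked with `is`. We can also count the references to this object with `sys.getrefcount`. The count is 3 rather than 2 because passing `a` as an argument to `sys.getrefcount` creates one extra, temporary reference for the duration of the call.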
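As a small sketch of the reference counting described above, the snippet below compares the count before and after binding a second name. The absolute numbers can vary between interpreters and contexts, so only the difference is checked; the names `before` and `after` are illustrative and not part of the original notebook.

```python
import sys

a = [1, 2, 3]
before = sys.getrefcount(a)  # includes the temporary reference held by the call itself
b = a                        # bind a second name to the same list object
after = sys.getrefcount(a)

print(b is a)                # True: both names refer to one object
print(after - before)        # 1: the new name added exactly one reference
```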

A simple way to change this behavior is to call `.copy()` on `a`, which forces Python to create a new list when creating `b`:

In [4]:
a = [1, 2, 3]
b = a.copy()
b[0] = 99
print(a)

[1, 2, 3]
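One caveat worth sketching: `list.copy()` is a shallow copy, so it creates a new outer list but keeps references to the same inner objects. The nested list below is an illustrative assumption (the notebook only uses flat lists), and `copy.deepcopy` from the standard library is shown as one way to copy the inner objects as well.

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a.copy()         # new outer list, same inner lists
deep = copy.deepcopy(a)    # new outer list and new inner lists

shallow[0][0] = 99         # mutates an inner list that `a` also refers to
deep[1][0] = -1            # only affects the deep copy

print(a)                   # [[99, 2], [3, 4]]
print(shallow[0] is a[0])  # True
print(deep[0] is a[0])     # False
```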


However, let's see what happens in different scenarios to better understand Python's pointer behavior.

In [5]:
a = [1, 2, 3]
b = a[0]
b = 99
print(a)

[1, 2, 3]


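This time `a` is unchanged, but not because Python made a copy. `b = a[0]` binds `b` to the same integer object that `a[0]` refers to. Integers are scalars, and scalars are immutable, meaning they cannot be changed in place. So `b = 99` simply rebinds the name `b` to a different object, and the list is never touched.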
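A minimal sketch of the difference between rebinding a name and mutating the list, using `is` to check object identity. The variable names `same_before` and `same_after` are illustrative only.

```python
a = [1, 2, 3]
b = a[0]
same_before = b is a[0]  # True: b and a[0] refer to the same int object

b = 99                   # rebinds the name b; the list is not involved
same_after = b is a[0]   # False: b now refers to a different object
print(a)                 # [1, 2, 3]

a[0] = 99                # to change the list, assign through the list itself
print(a)                 # [99, 2, 3]
```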

In [6]:
a = np.array([1, 2, 3])
b = a
b = b + 1
print(a)

[1 2 3]


The addition operator creates a new array, so `b` is rebound to the result and `a` is left unchanged.

In [7]:
a = np.array([1, 2, 3])
b = a
b += 1
print(a)

[2 3 4]


The `+=` operator tries in-place addition first [(reference)](https://stackoverflow.com/questions/2347265/why-does-behave-unexpectedly-on-lists?noredirect=1&lq=1), which means `b` continues to point to the same array as `a`.
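The same distinction can be sketched with built-in types, which is the case the linked reference discusses. Lists implement in-place addition, so `+=` mutates the shared object; tuples are immutable and do not, so `+=` falls back to building a new object and rebinding the name.

```python
a = [1, 2, 3]
b = a
b += [4]         # list.__iadd__ extends the existing list in place
print(a)         # [1, 2, 3, 4]
print(b is a)    # True

c = a
c = c + [5]      # builds a new list and rebinds c
print(a)         # [1, 2, 3, 4]
print(c is a)    # False

t = (1, 2)
u = t
u += (3,)        # tuples are immutable, so this creates a new tuple
print(t)         # (1, 2)
print(u is t)    # False
```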

In [55]:
a = pd.DataFrame({'test':[1, 2, 3, np.nan]})
b = a
b.dropna(inplace=True)
print(len(a))

3


Here `b` continues to point to the same DataFrame as `a` when using `dropna` with `inplace=True`, so the row is dropped from `a` as well.

In [54]:
a = pd.DataFrame({'test':[1, 2, 3, np.nan]})
b = a
b = b.dropna()
print(len(a))

4


But without `inplace`, `dropna` returns a new DataFrame. Assigning it to `b` rebinds the name, so `b` no longer points to `a`.
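The two `dropna` cases above can be checked directly with `is`. This is a small sketch that reuses the same DataFrame; the name `c` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'test': [1, 2, 3, np.nan]})
b = a

c = b.dropna()          # returns a new DataFrame; a and b are untouched
print(c is a)           # False
print(len(a), len(c))   # 4 3

b.dropna(inplace=True)  # mutates the object that both a and b refer to
print(b is a)           # True
print(len(a))           # 3
```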

In [57]:
a = pd.DataFrame({'test':[1, 2, 3, np.nan]})
b = a
b = b[b.notnull()]
print(len(a))

4


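We see similar behavior using a boolean mask: indexing returns a new object, so `a` is unchanged. (Note that masking a DataFrame with a boolean DataFrame keeps every row and leaves `NaN` where the mask is `False`, so this particular expression does not actually drop the missing row from `b`.)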

In [9]:
a = pd.DataFrame({'test':[1, 2, 3, np.nan]})
b = a
def drop_null(df):
    df.dropna(inplace=True)
drop_null(b)  # mutates in place and returns None, so do not assign the result to b
print(len(a))


3


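Does wrapping the `inplace` call in a function change this pointer behavior? No, it does not: the function receives a reference to the same DataFrame, so mutating it inside the function also changes `a`.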
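The same rule applies to any mutable argument, not only DataFrames. The sketch below uses a plain list and two hypothetical helper functions, `mutate` and `rebind`, to contrast mutating the caller's object with rebinding a local name.

```python
def mutate(items):
    # Changes the caller's list, because `items` refers to the same object.
    items.append(4)


def rebind(items):
    # Builds a new list and rebinds only the local name `items`.
    items = items + [5]
    return items


a = [1, 2, 3]
mutate(a)
result = rebind(a)

print(a)       # [1, 2, 3, 4]
print(result)  # [1, 2, 3, 4, 5]
```

Returning a new object, as `rebind` does, makes the data flow explicit at the call site, which is one reason to prefer `b = b.dropna()` over `inplace=True`.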

### Performance?
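Using `inplace` appears faster in the timings below, but it is not recommended. Note also that `%%timeit` reruns the cell many times, so after the first in-place call `df` has no missing rows left to drop, and the two timings are not directly comparable.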

In [80]:
%%timeit
df2 = df.dropna(inplace=False)

15.3 ms ± 692 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [81]:
%%timeit
df.dropna(inplace=True)

11.6 ms ± 67.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
