
# Object references, mutability and recycling

## Variables are not boxes

Python variables are like reference variables in Java, so it’s better to think of them as labels attached to objects.

In the example below, variables a and b hold references to the same list, not copies of the list.

In [None]:
a = [1, 2, 3]
b = a
a.append(4)
b

[1, 2, 3, 4]

If you imagine variables are like boxes, you can’t make sense of assignment
in Python. Think of variables as Post-it notes. Then this becomes easy to explain.

<img src='https://github.com/rahiakela/img-repo/blob/master/var-box.png?raw=1' width='800'/>

With reference variables it makes much more sense to say that the variable is assigned to an object, and not the other way around. After all, the object is created before the assignment.

In [None]:
# Variables are assigned to objects only after the objects are created.
class Gizmo:
  def __init__(self):
    print(f"Gizmo id: {id(self)}")

In [None]:
# The output Gizmo id: ... is a side effect of creating a Gizmo instance.
x = Gizmo()

Gizmo id: 140464294279992


In [None]:
# Multiplying a Gizmo instance will raise an exception.
y = Gizmo() * 10

Gizmo id: 140464294279936


TypeError: unsupported operand type(s) for *: 'Gizmo' and 'int'

Here is proof that a second Gizmo was actually instantiated before the
multiplication was attempted.

But variable y was never created, because the exception happened while the right-hand side of the assignment was being evaluated.

In [None]:
dir()

['Gizmo',
 'In',
 'Out',
 '_',
 '_1',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 '_sh',
 'a',
 'b',
 'exit',
 'get_ipython',
 'quit',
 'x']
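The same point can be checked without reading through the whole `dir()` listing. The sketch below is only an illustration: the variable names `gizmo_times_ten`, `error_message` and `name_was_bound` are made up for this example, and the exception is caught so that the cell runs to completion.

```python
class Gizmo:
    def __init__(self):
        print(f'Gizmo id: {id(self)}')

try:
    # The right-hand side runs first: a Gizmo is created, then * fails.
    gizmo_times_ten = Gizmo() * 10
except TypeError as exc:
    error_message = str(exc)

# The assignment never happened, so the name was never bound.
name_was_bound = 'gizmo_times_ten' in dir()
print(error_message)
print(name_was_bound)
```

The Gizmo id line is still printed, which shows that the instance was created even though no variable ended up attached to it.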

## Identity, equality and aliases

Since variables are mere labels, nothing prevents an object from having several labels assigned to it. When that happens, you have aliasing.

For example, Gulzar is the pen name of the famous Indian poet Sampooran Singh Kalra. Gulzar is not only equal to Sampooran Singh Kalra: they are one and the same.

So gulzar and ss_kalra refer to the same object.

In [None]:
gulzar = {'name': 'Sampooran Singh Kalra', 'born': 1934}
# ss_kalra is an alias for gulzar
ss_kalra = gulzar
ss_kalra is gulzar

True

In [None]:
# The is operator and the id function confirm it.
id(gulzar), id(ss_kalra)

(140464294434784, 140464294434784)

In [None]:
# Adding an item to ss_kalra is the same as adding an item to gulzar.
ss_kalra['balance'] = 86790
ss_kalra

{'balance': 86790, 'born': 1934, 'name': 'Sampooran Singh Kalra'}

In [None]:
gulzar

{'balance': 86790, 'born': 1934, 'name': 'Sampooran Singh Kalra'}

However, suppose an impostor — let’s call him Dr. Alexander Pedachenko — claims he is Sampooran Singh Kalra, born in 1934. His credentials may be the same, but Dr. Pedachenko is not Sampooran Singh Kalra.

<img src='https://github.com/rahiakela/img-repo/blob/master/var-label.png?raw=1' width='800'/>

So gulzar and ss_kalra are bound to the same object; alex is bound to a separate object of equal contents.



In [None]:
# alex and gulzar compare equal, but alex is not gulzar.

# alex refers to an object that is a replica of the object assigned to gulzar.
alex = {'name': 'Sampooran Singh Kalra', 'born': 1934, 'balance': 86790}

# The objects compare equal, because of the __eq__ implementation in the dict class.
alex == gulzar

True

In [None]:
# But they are distinct objects.
alex is not gulzar

True

In [None]:
alex is gulzar

False

The key point is that an object's id is guaranteed to be a unique numeric
label among all objects that exist at the same time, and it will never change during the life of the object.

The == operator compares the values of objects (the data they hold), while is compares their identities.

We often care about values and not identities, so == appears more frequently than is in Python code.

However, if you are comparing a variable to a singleton, then it makes sense to use is.
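As a small sketch of the singleton case, the most common one is testing against None. The function name `describe` is hypothetical and exists only for this illustration.

```python
# Hypothetical helper: tells "no value given" apart from falsy but valid
# values by testing identity with the None singleton.
def describe(value=None):
    if value is None:      # identity test: there is only one None object
        return 'missing'
    return f'got {value!r}'

results = [describe(), describe(0), describe([])]
print(results)
```

Here `0` and `[]` are falsy, but they are not None, so only the first call reports a missing value.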

### The relative immutability of tuples

Tuples, like most Python collections (lists, dicts, sets, etc.), hold references to objects. If the referenced items are mutable, they may change even if the tuple itself does not. In other words, the immutability of tuples really refers to the physical contents of the tuple data structure (i.e. the references it holds), and does not extend to the referenced objects.

Here, t1 and t2 initially compare equal, but changing a mutable item inside
tuple t1 makes it different.

In [None]:
# t1 is immutable, but t1[-1] is mutable.
t1 = (1, 2, [30, 40])
# Build a tuple t2 whose items are equal to those of t1.
t2 = (1, 2, [30, 40])

# Although distinct objects, t1 and t2 compare equal, as expected.
t1 == t2

True

In [None]:
# Inspect the identity of the list at t1[-1].
id(t1[-1])

140464293598280

In [None]:
# Modify the t1[-1] list in place.
t1[-1].append(99)
t1

(1, 2, [30, 40, 99])

In [None]:
# The identity of t1[-1] has not changed, only its value.
id(t1[-1])

140464293598280

In [None]:
# t1 and t2 are now different.
t1 == t2

False
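To make the two kinds of change explicit, the following sketch contrasts replacing a reference held by the tuple (which fails) with mutating the object that reference points to (which works). The names `t` and `rebinding_error` are chosen just for this example.

```python
t = (1, 2, [30, 40])

try:
    t[2] = [30, 40, 99]    # replacing a reference held by the tuple: not allowed
except TypeError as exc:
    rebinding_error = str(exc)

t[2].append(99)            # mutating the list the tuple refers to: allowed
print(rebinding_error)
print(t)
```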

## Copies are shallow by default

The easiest way to copy a list (or most built-in mutable collections) is to use the built-in constructor for the type itself.

In [1]:
l1 = [3, [55, 44], (7, 8, 9)]
# list(l1) creates a copy of l1
l2 = list(l1)
l2

[3, [55, 44], (7, 8, 9)]

In [2]:
# the copies are equal;
l2 == l1

True

In [3]:
# but refer to two different objects.
l2 is l1

False

However, using the constructor or [:] produces a shallow copy, i.e. the outermost container is duplicated, but the copy is filled with references to the same items held by the original container. This saves memory and causes no problems if all the items are immutable. But if there are mutable items, this may lead to unpleasant surprises.
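One way to see what "shallow" means is to compare identities item by item. This is a minimal sketch; the variable names are chosen for illustration only.

```python
original = [3, [55, 44], (7, 8, 9)]
by_constructor = list(original)
by_slice = original[:]

# Each copy is a new outer list...
outer_distinct = by_constructor is not original and by_slice is not original

# ...but every item inside is the very same object as in the original.
items_shared = all(
    a is b is c for a, b, c in zip(original, by_constructor, by_slice)
)
print(outer_distinct, items_shared)
```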

<img src='https://github.com/rahiakela/img-repo/blob/master/object-assignment.png?raw=1' width='800'/>

In [5]:
l1 = [3, [66, 55, 44], (7, 8, 9)]
# l2 is a shallow copy of l1. This state is depicted in the figure above.
l2 = list(l1)
# Appending 100 to l1 has no effect on l2.
l1.append(100)
# Here we remove 55 from the inner list l1[1]. This affects l2 because l2[1] is bound to the same list as l1[1].
l1[1].remove(55)
print('l1:', l1)
print('l2:', l2)

# For a mutable object like the list referred to by l2[1], the operator += changes the
# list in place. This change is visible at l1[1], which is an alias for l2[1].
l2[1] += [33, 22]
# += on a tuple creates a new tuple and rebinds the variable l2[2] here. This is
# the same as doing l2[2] = l2[2] + (10, 11).
# Now the tuples in the last position of l1 and l2 are no longer the same object.
l2[2] += (10, 11)
print('l1:', l1)
print('l2:', l2)

l1: [3, [66, 44], (7, 8, 9), 100]
l2: [3, [66, 44], (7, 8, 9)]
l1: [3, [66, 44, 33, 22], (7, 8, 9), 100]
l2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]


<img src='https://github.com/rahiakela/img-repo/blob/master/object-assignment2.png?raw=1' width='800'/>

It should be clear now that shallow copies are easy to make, but they may or may not be what you want.

### Deep and shallow copies of arbitrary objects

Working with shallow copies is not always a problem, but sometimes you need to make deep copies, i.e. duplicates that do not share references to embedded objects.

The copy module provides the deepcopy and copy functions that return deep and shallow copies of arbitrary objects.

Consider a simple class, Bus, representing a school bus that is loaded with passengers and then picks up or drops off passengers on its route.

In [6]:
class Bus:

  def __init__(self, passengers=None):
    if passengers is None:
      self.passengers = []
    else:
      self.passengers = list(passengers)

  def pick(self, name):
    self.passengers.append(name)

  def drop(self, name):
    self.passengers.remove(name)

We will create a Bus instance, bus1, and two clones: a shallow copy (bus2) and a deep copy (bus3), to observe what happens as bus1 drops a student.

In [10]:
import copy

bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.copy(bus1)
bus3 = copy.deepcopy(bus1)
# Using copy and deepcopy we create three distinct Bus instances.
id(bus1), id(bus2), id(bus3)

(139800284821152, 139800284822104, 139800284821656)

In [12]:
bus1.passengers, bus2.passengers, bus3.passengers

(['Alice', 'Bill', 'Claire', 'David'],
 ['Alice', 'Bill', 'Claire', 'David'],
 ['Alice', 'Bill', 'Claire', 'David'])

In [13]:
bus1.drop('Bill')
# After bus1 drops 'Bill', he is also missing from bus2.
bus2.passengers

['Alice', 'Claire', 'David']

In [14]:
# Inspection of the passengers attributes shows that bus1 and bus2 share the same
# list object, because bus2 is a shallow copy of bus1.
id(bus1.passengers), id(bus2.passengers)

(139800284954952, 139800284954952)

In [15]:
# bus3 is a deep copy of bus1, so its passengers attribute refers to another list.
bus1.passengers, bus2.passengers, bus3.passengers

(['Alice', 'Claire', 'David'],
 ['Alice', 'Claire', 'David'],
 ['Alice', 'Bill', 'Claire', 'David'])

Note that making deep copies is not a simple matter in the general case. Objects may have cyclic references which would cause a naïve algorithm to enter an infinite loop.

The deepcopy function remembers the objects already copied to handle cyclic references gracefully.

In [16]:
# Cyclic references: b refers to a, and then is appended to a; deepcopy still manages to copy a.
a = [10, 20]
b = [a, 30]
a.append(b)
a

[10, 20, [[...], 30]]

In [17]:
from copy import deepcopy

c = deepcopy(a)
c

[10, 20, [[...], 30]]
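The two reprs look identical, so it may help to check with `is` that the copy is a separate structure whose cycle points back at the copy rather than at the original. A small sketch, with result names invented for the example:

```python
from copy import deepcopy

a = [10, 20]
b = [a, 30]
a.append(b)        # a now contains b, which contains a: a cycle

c = deepcopy(a)

copy_is_new = c is not a        # the outer list was duplicated
inner_is_new = c[2] is not b    # so was the inner list
cycle_kept = c[2][0] is c       # the copy's inner list refers back to the copy
print(copy_is_new, inner_is_new, cycle_kept)
```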

Also, a deep copy may be too deep in some cases. For example, objects may refer to external resources or singletons that should not be copied.

You may control the behavior of both copy and deepcopy by implementing the `__copy__()` and `__deepcopy__()` special methods.
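As a rough sketch of how these hooks might be used, the hypothetical `Route` class below deep-copies its list of stops but keeps sharing a `logger` attribute, which stands in for an external resource. The class and its attribute names are assumptions made for this example only.

```python
import copy

class Route:
    """Hypothetical class: stops belong to the instance, the logger is shared."""

    def __init__(self, stops, logger):
        self.stops = list(stops)
        self.logger = logger    # stands in for an external resource

    def __copy__(self):
        # Shallow copy: a new Route that shares both attributes.
        new = type(self).__new__(type(self))
        new.stops = self.stops
        new.logger = self.logger
        return new

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new    # register before recursing, in case of cycles
        new.stops = copy.deepcopy(self.stops, memo)
        new.logger = self.logger    # shared on purpose, not copied
        return new

shared_log = []
r1 = Route(['Depot', 'School'], shared_log)
r2 = copy.copy(r1)
r3 = copy.deepcopy(r1)
print(r2.stops is r1.stops, r3.stops is r1.stops, r3.logger is r1.logger)
```

Without the custom `__deepcopy__`, `copy.deepcopy` would also duplicate the list bound to `logger`.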

## Function parameters as references

The only mode of parameter passing in Python is call by sharing.

Call by sharing means that each formal parameter of the function gets a copy of each reference in the arguments.

In other words, the parameters inside the function become aliases of the actual arguments.

The result of this scheme is that a function may change any mutable object passed as an argument, but it cannot change the identity of those objects, i.e. it cannot replace an object with another one altogether.

In [18]:
# A function may change any mutable object it receives.
def f(a, b):
  a += b
  return a

In [19]:
x = 1
y = 2

# The number x is unchanged.
f(x, y)

3

In [20]:
a = [1, 2]
b = [3, 4]

f(a, b)

[1, 2, 3, 4]

In [21]:
# The list a is changed.
a, b

([1, 2, 3, 4], [3, 4])

In [22]:
t = (10, 20)
u = (30, 40)

f(t, u)

(10, 20, 30, 40)

In [23]:
# The tuple t is unchanged.
t, u

((10, 20), (30, 40))
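The other half of call by sharing, that a function cannot replace the caller's object, can be sketched by contrasting rebinding with mutation. The function names `rebind` and `mutate` are hypothetical.

```python
def rebind(seq):
    # Rebinding the local name: the caller's variable still refers to the old object.
    seq = seq + [99]
    return seq

def mutate(seq):
    # Mutating in place: the caller sees the change through its own reference.
    seq.append(99)
    return seq

mine = [1, 2]
r = rebind(mine)
after_rebind = list(mine)    # snapshot taken before mutate() runs
m = mutate(mine)
print(after_rebind, mine, r is mine, m is mine)
```

After `rebind`, `mine` is still `[1, 2]`; only `mutate` changes the list the caller holds.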

### Mutable types as parameter defaults: bad idea

Optional parameters with default values are a great feature of Python function definitions, allowing our APIs to evolve while remaining backward-compatible. However, you should avoid mutable objects as default values for parameters.

Here we tried to be clever and instead of having a default value of `passengers=None` we have `passengers=[]`, thus avoiding the if in the
previous `__init__`. This “cleverness” gets us into trouble.

In [41]:
# A simple class to illustrate the danger of a mutable default.
class HauntedBus:
  """A bus model haunted by ghost passengers"""

  # When the passengers argument is not passed, this parameter is bound to the
  # default list object, which is initially empty.
  def __init__(self, passengers=[]):
    # This assignment makes self.passengers an alias for passengers which is itself
    # an alias for the default list, when no passengers argument is given.
    self.passengers = passengers

  # When the methods .remove() and .append() are used with self.passengers
  # we are actually mutating the default list, which is an attribute of the function object.
  def pick(self, name):
    self.passengers.append(name)

  def drop(self, name):
    self.passengers.remove(name)

In [42]:
bus1 = HauntedBus(['Alice', 'Bill'])
bus1.passengers

['Alice', 'Bill']

In [43]:
bus1.pick('Charlie')
bus1.drop('Alice')
# So far, so good: no surprises with bus1.
bus1.passengers

['Bill', 'Charlie']

In [44]:
# bus2 starts empty, so the default empty list is assigned to self.passengers.
bus2 = HauntedBus()
bus2.pick('Carrie')
bus2.passengers

['Carrie']

In [45]:
# bus3 also starts empty, again the default list is assigned.
bus3 = HauntedBus()
# The default is no longer empty!
bus3.passengers

['Carrie']

In [46]:
bus3.pick('Dave')
# Now Dave, picked by bus3, appears in bus2.
bus2.passengers

['Carrie', 'Dave']

In [47]:
# The problem: bus2.passengers and bus3.passengers refer to the same list.
bus2.passengers is bus3.passengers

True

In [48]:
# But bus1.passengers is a distinct list.
bus1.passengers

['Bill', 'Charlie']

The problem is that HauntedBus instances that don’t get an initial passenger list end up sharing the same passenger list among themselves.

This happens because each default value is evaluated when the function is defined (usually when the module is loaded), and the default values become attributes of the function object. So if a default value is a mutable object and you change it, the change will affect every future call of the function.

You can inspect the `HauntedBus.__init__` object and see the ghost students haunting its `__defaults__` attribute.

In [49]:
dir(HauntedBus.__init__)

['__annotations__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

In [50]:
HauntedBus.__init__.__defaults__

(['Carrie', 'Dave'],)

Finally, we can verify that bus2.passengers is an alias bound to the first element of the `HauntedBus.__init__.__defaults__` attribute:

In [51]:
HauntedBus.__init__.__defaults__[0] is bus2.passengers

True

The issue with mutable defaults explains why None is often used as the default value for parameters that may receive mutable values.
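The same idiom applies to plain functions. The sketch below uses a hypothetical function, `add_passenger`, to show that a None default yields a fresh list on every call that omits the argument.

```python
def add_passenger(name, passengers=None):
    # A new list is created on each call that omits the argument,
    # so nothing is shared between calls.
    if passengers is None:
        passengers = []
    passengers.append(name)
    return passengers

first = add_passenger('Carrie')
second = add_passenger('Dave')
print(first, second, first is second)
print(add_passenger.__defaults__)
```

The `__defaults__` tuple holds only None, which is immutable, so there is nothing for later calls to corrupt.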

### Defensive programming with mutable parameters

When you are coding a function that receives a mutable parameter you should carefully consider whether the caller expects the argument passed to be changed.

The TwilightBus class below violates the “Principle of least astonishment”, a best practice of interface design. It surely is astonishing that when the bus drops a student, her name is also removed from the basketball team roster that was passed to the constructor.

In [52]:
# A simple class to show the perils of mutating received arguments.
class TwilightBus:
  """A bus model that makes passengers vanish"""
  def __init__(self, passengers=None):
    if passengers is None:
      # Here we are careful to create a new empty list when passengers is None.
      self.passengers = []
    else:
      # However, this assignment makes self.passengers an alias for passengers,
      # which is itself an alias for the actual argument passed to __init__ (basketball_team).
      self.passengers = passengers

  def pick(self, name):
    self.passengers.append(name)

  def drop(self, name):
    self.passengers.remove(name)

In [53]:
# Passengers disappear when dropped by a TwilightBus.
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']

# A TwilightBus is loaded with the team.
bus = TwilightBus(basketball_team)

# The bus drops one student, then another.
bus.drop('Tina')
bus.drop('Pat')

# The dropped passengers vanished from the basketball team!
basketball_team

['Sue', 'Maya', 'Diana']

The problem here is that the bus is aliasing the list that is passed to the constructor. Instead, it should keep its own passenger list. 

The fix is simple: in `__init__`, when the passengers parameter is provided, self.passengers should be initialized with a copy of it.

In [None]:
# The corrected class: __init__ copies the received list instead of aliasing it.
class TwilightBus:
  """A bus model that keeps its own passenger list"""
  def __init__(self, passengers=None):
    if passengers is None:
      # Here we are careful to create a new empty list when passengers is None.
      self.passengers = []
    else:
      # Make a copy of the passengers list, or convert it to a list if it’s not one.
      self.passengers = list(passengers)

  def pick(self, name):
    self.passengers.append(name)

  def drop(self, name):
    self.passengers.remove(name)

Now our internal handling of the passenger list will not affect the argument used to initialize the bus.

As a bonus, this solution is more flexible: now the argument passed
to the passengers parameter may be a tuple or any other iterable, like a set or even database results, because the list constructor accepts any iterable.
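Both claims can be checked with a short sketch. The class name `CopyingBus` is hypothetical; its `__init__` follows the same logic as the corrected class above, written compactly.

```python
class CopyingBus:
    """Hypothetical name; same copy-on-init logic as the corrected class."""

    def __init__(self, passengers=None):
        self.passengers = [] if passengers is None else list(passengers)

    def drop(self, name):
        self.passengers.remove(name)

# A tuple works as the argument, because list() accepts any iterable.
team = ('Sue', 'Tina', 'Maya')
bus = CopyingBus(team)
bus.drop('Tina')
print(bus.passengers, team)

# A list passed in is left untouched by the bus.
roster = ['Diana', 'Pat']
bus2 = CopyingBus(roster)
bus2.drop('Pat')
print(bus2.passengers, roster)
```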

## del and garbage collection