# Python Internals


## Asking for an address is not the same as asking for a house

In [1]:
davids_address = "32 Lark Lane"

davids_address

'32 Lark Lane'

In [2]:
# I've given you my address, and you've written it down in your address book...

address_book = {
    'david': davids_address
}

address_book

{'david': '32 Lark Lane'}

In [3]:
class House:
    def __init__(self, address, occupants, contents):
        self.address = address
        self.occupants = occupants
        self.contents = contents
        
    def __repr__(self):
        return self.address + ' : ' + str(self.occupants)
        
# I live with a mouse
davids_house = House(davids_address, ["David", "Jeremy (the mouse)"], ['A bronze horse sculpture', 'A beachball'])
davids_house

32 Lark Lane : ['David', 'Jeremy (the mouse)']

In [4]:
# Your address book contains my address.  Not my house.  It wouldn't fit, and it's hard to copy houses.

address_book['david']

'32 Lark Lane'

## A brief primer on C

Python shares some syntax with C, and the reference implementation of Python
(i.e. what you get when you download and install Python) is CPython: Python written in C.

The C language has basic datatypes similar to Python's (int, float etc.).  There are differences, but they represent conceptually the same kinds of values.

In addition, C has "pointers", indicated by the * symbol. These do _not_ contain the underlying data, but rather the _address_ of the data.

You must manually specify all types (including pointer types) in C.

Understanding a little about pointers explains a lot of what happens in Python.
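Python's standard-library `ctypes` module exposes real C types and pointers, so the C idea can be sketched without leaving Python. A minimal illustration, assuming CPython (where `ctypes` ships with the standard library):

```python
import ctypes

# A C int, like `int value = 42;` in C
value = ctypes.c_int(42)

# A C pointer to it, like `int *ptr = &value;`: it holds an address, not the data
ptr = ctypes.pointer(value)

# Dereferencing, like `*ptr` in C, reads the data stored at that address
print(ptr.contents.value)  # 42

# Writing through the pointer, like `*ptr = 99;`, changes the original int
ptr.contents.value = 99
print(value.value)  # 99

# The pointer and the int refer to the same memory address
print(ctypes.addressof(ptr.contents) == ctypes.addressof(value))  # True
```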

## Back to Python

In [5]:
# Nearly everything in Python is really a _pointer_, not a value
# That is, we are not moving data around in memory, or making copies - 
# we are simply giving out addresses

In [6]:
a = ['shoes', 'bats', 'wheels']
b = ['red', 'green', 'shocking pink']

In [7]:
# The names a and b are 'identifiers'
# Identifiers are really just entries in an address book, and Python conveniently has a built-in datatype for that: the dict
# At the top level, these are stored as 'globals', which can be seen anywhere in the module
# Inside functions, they are stored as 'locals' (locals() returns them as a dict)

globals()

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 ...

In [8]:
def see_locals(x):
    y = 123
    print(locals())

see_locals(a)

{'y': 123, 'x': ['shoes', 'bats', 'wheels']}
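Since a function's locals are just more address-book entries, calling a function hands it the address of the argument, not a copy of the object. A small sketch (the function names `add_item` and `rebind` are hypothetical, chosen for illustration):

```python
def add_item(items):
    # 'items' is a local entry pointing at the caller's list, so this mutates that list
    items.append('kites')

def rebind(items):
    # This only changes the local entry for 'items'; the caller's list is untouched
    items = ['something else']
    return items

toys = ['shoes', 'bats', 'wheels']
add_item(toys)
print(toys)  # ['shoes', 'bats', 'wheels', 'kites']

rebind(toys)
print(toys)  # ['shoes', 'bats', 'wheels', 'kites']
```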


In [9]:
# When an assignment runs, the Python interpreter evaluates the right-hand side first, looking up entries in the address book,
# and then binds the name on the left, possibly creating a new entry

a = 5  # Take the integer literal 5, and update the 'a' entry in the address book to point to it
a = [a] # Create a new list; look up 'a', and place it in the list;  update 'a' to point to this new list

b = a # Look up a (which is a list);  update 'b' to refer to this list

a,b

([5], [5])

In [10]:
# These are equal to each other (which is what you'd expect)

a == b

True

In [11]:
# But they're not just equal... a _is_ b

a is b

True

In [12]:
# Things can be equal, but not be the same object

c = [0,1,2]
d = [0,1,2]

c == d, c is d

(True, False)
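Under the hood, `is` compares identity. The built-in `id()` returns an integer that is unique to an object for its lifetime (in CPython it happens to be the object's memory address), so while both objects are alive `x is y` gives the same answer as `id(x) == id(y)`:

```python
c = [0, 1, 2]
d = [0, 1, 2]
e = c  # a second name for the same list

print(c == d, c is d, id(c) == id(d))  # True False False
print(c == e, c is e, id(c) == id(e))  # True True True
```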

In [13]:
d.append(3)
c == d

False

In [14]:
a, b

([5], [5])

In [15]:
# In this example, we might say we're appending to a...

a.append(7)

#... but really, we're appending to the object that is referred to by a
# (i.e. the object we get when we look up a in the address book)
# This is the same object we'd get when we look up b; 'a' and 'b' really contain the addresses of the objects (pointers)
# not the objects themselves

a, b

([5, 7], [5, 7])

In [16]:
# What will happen here?

a.append(a)

a

[5, 7, [...]]

In [17]:
a[2] is a

True

In [18]:
# There seem to be some inconsistencies... basic datatypes like int, float and str look as though they store the _value_
# In fact they are still pointers to objects, but these types are immutable: an operation that appears to change one
# actually creates a new object and updates the identifier to point at it

In [19]:
f = 5.1
g = f

# Looks up f, adds 1.0 to produce a new float object, and updates the entry for f in the symbol table to point at it
# g still points at the original 5.1
f += 1.0

f,g

(6.1, 5.1)
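One way to check that `f += 1.0` rebinds `f` to a new object (rather than overwriting a stored value) is to compare identities before and after. Strings, also immutable, behave the same way:

```python
f = 5.1
g = f
print(f is g)   # True: both names point at the same float object

f += 1.0        # builds a new float object and rebinds f to it
print(f is g)   # False: g still points at the original 5.1
print(f, g)     # 6.1 5.1

s = 'Lark'
t = s
s += ' Lane'    # str is immutable too, so a new string is built
print(s is t)   # False
print(s, t)     # Lark Lane Lark
```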

In [20]:
h = [1]
i = h

# For lists, += modifies the existing list in place (it calls list.__iadd__, which works like extend)
h += [2]


# h and i both still point to the same object
h, i

([1, 2], [1, 2])
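The in-place `+=` on a list is not the same as writing out `h = h + [2]`. The longer form builds a brand new list and rebinds `h`, so `i` keeps pointing at the old one:

```python
h = [1]
i = h
h += [2]             # list.__iadd__: extends the existing list in place
print(h, i, h is i)  # [1, 2] [1, 2] True

h = [1]
i = h
h = h + [2]          # list.__add__: creates a new list, then rebinds h
print(h, i, h is i)  # [1, 2] [1] False
```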

In [21]:
# If you really want a copy of something (instead of a pointer to it), you can call... copy
# Many built-in containers (list, dict, set) have a copy method, and the copy module covers the other cases

j = h.copy()

j == h, j is h

(True, False)

In [22]:
# However, copy is shallow: it only goes one layer down. Consider the following

a = dict(x = 12.1, y = 17.2)
b = [a]

c = b.copy()

c.append(3)

print(b, c)

a["z"] = 31.1 

print(b, c)

[{'x': 12.1, 'y': 17.2}] [{'x': 12.1, 'y': 17.2}, 3]
[{'x': 12.1, 'y': 17.2, 'z': 31.1}] [{'x': 12.1, 'y': 17.2, 'z': 31.1}, 3]


In [23]:
# You can use deepcopy for these situations:
# from copy import deepcopy
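Repeating the previous example with `copy.deepcopy` from the standard-library `copy` module shows the difference: the nested dict is copied too, so later changes to `a` no longer appear in the copy:

```python
from copy import deepcopy

a = dict(x=12.1, y=17.2)
b = [a]

c = deepcopy(b)   # copies the list *and* the dict inside it
c.append(3)

a["z"] = 31.1     # mutate the original dict

print(b)          # [{'x': 12.1, 'y': 17.2, 'z': 31.1}]
print(c)          # [{'x': 12.1, 'y': 17.2}, 3]
print(c[0] is a)  # False
```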

## Numpy arrays (hint: they're C arrays)

In [24]:
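# NumPy stores its data much like C does: one contiguous block of raw values of a single type,
# rather than a list of pointers to separate Python objects. This is a large part of why it's fast!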

In [25]:
import numpy as np

x = np.linspace(0,1.0, 11)

x

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [26]:
# NumPy arrays are Python wrappers around actual C arrays
# That means their elements are stored contiguously in memory, and they can be passed to C functions, etc.

x.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
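The flags above describe how the array's elements are laid out in that block of C memory. A related attribute, `strides`, gives the number of bytes to step to reach the next element along each axis. Taking every other element produces a view that skips over memory, so it is no longer contiguous:

```python
import numpy as np

x = np.linspace(0, 1.0, 11)   # 11 float64 values, 8 bytes each
print(x.itemsize, x.strides)  # 8 (8,)

every_other = x[::2]          # a view onto the same memory, with a bigger step
print(every_other.strides)    # (16,)
print(every_other.flags['C_CONTIGUOUS'])  # False
print(x.flags['C_CONTIGUOUS'])            # True
```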

In [27]:
# We can look at the real underlying address in memory:

x.ctypes.data

2308006670784

In [28]:
# For basic slicing, NumPy avoids copying data: it creates a 'view' holding only a pointer (plus indexing information such as shape and strides)

# y is a view into x: it points at x's memory, starting at element 5
y = x[5:]

y[:] = 3.0

x

array([0. , 0.1, 0.2, 0.3, 0.4, 3. , 3. , 3. , 3. , 3. , 3. ])

In [29]:
x.ctypes.data

2308006670784
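NumPy can confirm the view relationship directly: `np.shares_memory` reports whether two arrays overlap in memory. The address arithmetic also works out, since `y` starts five float64 elements (40 bytes) into `x`. An explicit `.copy()` gets memory of its own:

```python
import numpy as np

x = np.linspace(0, 1.0, 11)
y = x[5:]                       # basic slicing returns a view, not a copy

print(np.shares_memory(x, y))   # True
print(y.ctypes.data - x.ctypes.data == 5 * x.itemsize)  # True

z = x[5:].copy()                # an explicit copy allocates new memory
print(np.shares_memory(x, z))   # False
```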

In [30]:
# Remember, while slices are views into memory, indexing a single element copies its value out into a new Python scalar

z = x[0]
print(z)
z = 10.0  # The z identifier is updated, but it is not referring to any part of our array
print(z, x)

0.0
10.0 [0.  0.1 0.2 0.3 0.4 3.  3.  3.  3.  3.  3. ]


In [31]:
# However, a length-one slice is still a view, so writing to it changes the actual array

z = x[0:1]
print(z)
z[0] = 10.0
print(x)

[0.]
[10.   0.1  0.2  0.3  0.4  3.   3.   3.   3.   3.   3. ]


In [32]:
# New data is created whenever an operation is not done in place...

# Looks up y, adds 3.0 (not in place) to produce a new array, and adds a new identifier entry called z
z = y + 3.0
print(z)
z[:] = 0.0
print(z)
print(y) # Unchanged

[6. 6. 6. 6. 6. 6.]
[0. 0. 0. 0. 0. 0.]
[3. 3. 3. 3. 3. 3.]
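The difference between creating new data and working in place matters for views. `y + 3.0` allocates a new array, but the augmented assignment `y += 3.0` writes into the memory that `y` shares with `x`. A small sketch with a fresh array:

```python
import numpy as np

x = np.zeros(6)
y = x[3:]                      # view onto the last three elements of x

z = y + 3.0                    # new array: x is untouched
print(x)                       # [0. 0. 0. 0. 0. 0.]
print(np.shares_memory(z, x))  # False

y += 3.0                       # in place: writes through the view into x
print(x)                       # [0. 0. 0. 3. 3. 3.]
```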


## Garbage Collection

In [33]:
# Some final notes about memory and identifiers/symbol tables
# In C (and C++), you need to manage memory manually
# If you don't free something you allocated, it stays allocated until the program exits, and you can run out of memory
# In Python this is automatic... but how?

class Thing:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return self.name

# It's a small universe...
things_that_exist = [Thing('dogs'), Thing('hats'), Thing('the future')]

# I only care about hats
things_that_i_care_about = things_that_exist[1]

# Let's clean up the universe
things_that_exist = None 

# Because the hat is still referenced by things_that_i_care_about, it still exists
print(things_that_i_care_about)

# However, the others are gone!  There is no way to look up their addresses, and so no possible meaningful way to interact
# with them - they are garbage, and it is collected

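# This is called 'reference counting': every time an object is bound (e.g. placed in a container, or assigned to an identifier),
# 1 is added to its reference count.  When one of those references disappears, 1 is subtracted
# When the count reaches 0, the object is deleted (garbage collected)
# CPython also runs a separate cycle collector (the gc module) for objects that only refer to each other,
# such as a list that contains itself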

# This is easy and possible because of the 'everything is a pointer' approach

hats
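CPython lets us watch reference counting happen. `sys.getrefcount` reports an object's count (its own argument adds a temporary reference, so comparing two counts is more reliable than reading one), and `weakref.ref` lets us observe an object without keeping it alive. A self-referencing object never reaches a count of 0 on its own; that is what the `gc` module's cycle collector is for. A sketch assuming CPython; other implementations such as PyPy do not use reference counting, so objects may be freed later:

```python
import gc
import sys
import weakref

class Thing:
    def __init__(self, name):
        self.name = name

# Compare counts, since getrefcount's own argument adds a temporary reference
dogs = Thing('dogs')
before = sys.getrefcount(dogs)
pets = [dogs]                          # the list holds one more reference
print(sys.getrefcount(dogs) - before)  # 1

# A weak reference watches the object without adding to its count
watch_dogs = weakref.ref(dogs)
del dogs, pets                         # remove every strong reference
print(watch_dogs() is None)            # True: freed as soon as the count hit 0

# A self-referencing object (like a.append(a) earlier) keeps its own count above 0
gc.disable()                           # pause automatic collection so the demo is predictable
loop = Thing('loop')
loop.myself = loop
watch_loop = weakref.ref(loop)
del loop                               # count drops to 1 (the self-reference), not 0
print(watch_loop() is None)            # False: unreachable, but still alive
gc.collect()                           # the cycle collector finds and frees it
print(watch_loop() is None)            # True
gc.enable()
```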
