# 01 - Variables are Memory References

We can find the memory address using `id()` which returns a base-10 number and can be converted to hex using `id(hex())`

If we first assign a variable to an integer, that variable looks points to the memory address (reference) which points to the integer. 

If we assign a second variable to the first variable, the second variable points to the reference of the first variable, not the integer. 

So, the integer has two references to it (count=2). We can safely delete/reassign the first variable away and then, count=1. Once the count drops to 0, that object is thrown away and that space (memory address) can be used anywhere else.

# 02 Reference Counting


We can get reference count with getrefcount via `sys.getrefcount(my_var)`. One nuance of this is that passing `my_var` to `getrefcount()` actually creates another reference to `my_var`, because variables are always passed by reference in Python. 

In [7]:
import sys
my_var = [1,2,3]
sys.getrefcount(my_var)

2

We can get round this extra reference using a method from the C library: `ctypes.c_long.from_address(address).value`, but we need to pass the memory address (reference) instead.

In [8]:
import ctypes

def ref_count(address: int):
    return ctypes.c_long.from_address(address).value

ref_count(id(my_var))

1

# 03 - Garbage Collection

Below is `my_var` which points to an object with an instance variable `var_1` (the name of the variable passed into a class which is turned into an attributed; in `self.age = age`, `age` is the instance variable) which points to another object with another instance variable.

<img src=s3-images/3.1.png width=500 />

Now, in a scenario where `var_2` does not point to `Object A`, setting `my_var = None` would remove reference to Object A, therefore Object A has no references **to** it. So with its ref count = 0, it disappears. Therefore, Object B has nothing pointing to it, so it must disappear. And all that memory can be used elsewhere. 

Now if instead we have `var_2` pointing to `Object A` (as in above image), then, deleting `my_var` does not produce that chain reaction because Objects A and B are self-referencing each other. This is known as **Circular References**. 

Since the ref_count is non-zero for both, we cannot get rid of them via Python Memory Manager, and we have a **memory leak**.

To solve this, we need a **Garbage Collector** which can be controlled via the module **gc**. It runs periodically, but we can also call it manually. It's used to clean up circular references.


Let's see what this looks like in code:

In [9]:
import ctypes
import gc

def ref_count(address):
    return ctypes.c_long.from_address(address).value

We create a function that will search the objects in the GC for a specified id and tell us if the object was found or not:

In [10]:
def object_by_id(object_id):
    for obj in gc.get_objects():
        if id(obj) == object_id:
            return "Object exists"
    return "Not found"

In [11]:
class A:
    def __init__(self):
        self.b = B(self)
        print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))

Let's break this down:

The`self` in the `__init__` constructor refers to an **instance of A**. `B(self)` is an **instance of B**. 

So, the 1st term in the print statement is the **instance of A** and the 2nd is **instance of B**.

In [12]:
class B:
    def __init__(self, a):
        self.a = a
        print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))

Let's break this down:

The`self` in the `__init__` constructor in `class B` now refers to **instance of B**. 

What is `a`? From class A's calling of `B(self)`, `a` is `self` and `self` is an **instance of A**.

So, the 1st term in the print statement is the **instance of B** and the 2nd is **instance of A**.

Comparing the two markdowns above, we expect that the 1st printed term in the 1st markdown is equal to the 2nd printed term in the 2nd markdown (both being **instance of A**

In [13]:
gc.disable() 

We turn off the GC so we can see how reference counts are affected when the GC does not run and when it does (by running it manually).

In [14]:
my_var = A()

B: self: 0x7f55ff2d0070, a: 0x7f55ff2d2920
A: self: 0x7f55ff2d2920, b:0x7f55ff2d0070


Sure enough, we're correct.

From the above line, we can easily see that `my_var` is an instance of A().

In [24]:
print('a: \t{0}'.format(hex(id(my_var))))
print('a.b: \t{0}   <- This is my_var.b; my_var is an instance of A and b = B(self) and thus, b is an instance of B.'.format(hex(id(my_var.b)))) 
print('a.b.a: \t{0}   <- b is an instance of B, and b has an attribute called a which is equal to a which is equal to A\'s self (instance of A).'.format(hex(id(my_var.b.a))))

a: 	0x7f55ff2d2920
a.b: 	0x7f55ff2d0070   <- This is my_var.b; my_var is an instance of A and b = B(self) and thus, b is an instance of B.
a.b.a: 	0x7f55ff2d2920   <- b is an instance of B, and b has an attribute called a which is equal to a which is equal to A's self (instance of A).


# 04 - Dynamic vs Static Typing

In statically typed languages, we have the following arrangement:

<img src="s3-images/3.2.png" width="600"/>

The key detail is that **myVar has been declared as a string**

In Python, my_var is just a reference, nothing more, that points to an object that *happens to be* a string. 

<img src="s3-images/3.3.png" width="600"/>

No type is attached to my_var.

All we do when we reassign `my_var` is to make it point to a **brand new**, different object. Remember **`my_var` in python has no type at all**. All that's changed is the type of object that `my_var` is pointing to.

# 05 - Variable Re-Assignment

<img src="s3-images/3.4.png" width='400'/>

**the value inside the <u>int</u> objects can *never* be changed. We always reassign a variable to a new memory reference with that newly calculated int value.**
Therefore, integers are immutable.

# 06 - Object Mutability

Certain Python built-in object types (aka data types) are **mutable**.

That is, the internal contents (state) of the object in memory can be modified.

Immutable Examples are: numbers (floats, int, booleans), strings, tuples, frozen sets, user-defined classes (this can be forced to be immutable)

Mutable Examples are: lists, sets, dictionaries, user-defined classes (can be made to be mutable). 

Consider:

In [1]:
a = [1, 2]
b = [3, 4]

In [2]:
t = (a, b)
print(t)

([1, 2], [3, 4])


In [3]:
a.append(3)
b.append(5)

In [4]:
t

([1, 2, 3], [3, 4, 5])

The tuple is immutable, therefore the memory addresses of the 1st and 2nd slot in the tuple are fixed forever; we can't even add or remove slots to the tuple. Therefore, [1, 2] and [1, 2, 3] have the same memory address. However, since lists are mutable, we were able to **change the state** of the object (memory address) referenced by [1, 2] to a different state.

In [5]:
c = 7
d = 8

t2 = (c,d)
t2

(7, 8)

In [6]:
c = 9
d = 10
t2

(7, 8)

As expected, t2 did not change because the integers are immutable.

Now as we said before, lists are mutable, so the memory id of a list doesn't change when we add values via `list.append(some_value)`. 

But there are other ways that we can append a value which *does* change create a new memory id

In [7]:
my_list_1 = [1, 2, 3]
id(my_list_1)

140690577989696

In [8]:
my_list_1 = my_list_1 + [4]
id(my_list_1)

140690578193600

This is because, on the RHS, `my_list_1` has a memory id and `[4]` has a memory id; the result of concatenation points to a *completely different/new* memory id 

# 07 - Function Arguments and Mutability

Consider a function that receives a *string* argument, and changes the argument in some way:

In [50]:
def process(s):
    print('initial s # = {0}'.format(hex(id(s))))
    s = s + ' world'
    print('s after change # = {0}'.format(hex(id(s))))

In [51]:
my_var = 'hello'
print('my_var # = {0}'.format(hex(id(my_var))))

my_var # = 0x7f55ffaae330


In [53]:
process(my_var)
print('my_var # = {0}'.format(hex(id(my_var))))

initial s # = 0x7f55ffaae330
s after change # = 0x7f55d790fbb0
my_var # = 0x7f55ffaae330


Why is this? Because the change made in the function creates a new reference for `s` to point to, but that `s` is limited to the `process()` scope.

So, calling my_var in the module scope, it will point to the original reference. So, functions are unable to change the value of an immutable object.

<img src='s3-images/3.5.png' width=400 />

# 08 - Shared References and Mutability

One interesting thing to note is that if two variables have the same integer value, they will both point to the same memory reference, even though they were created independently. This is because integers are mutable objects and so we get a shared reference, done by Python Memory Manager.

In [57]:
a = 10
b = 10

print(hex(id(a)))
print(hex(id(b)))

0x7f5602434210
0x7f5602434210


But this isn't always the case; let's try a large number

In [62]:
a = 500
b = 500

print(hex(id(a)))
print(hex(id(b)))

0x7f55d79ee330
0x7f55d79ee7f0


We will see why in Section 09 - Variable Equality

With mutable objects like lists, we will never create a shared reference...

In [63]:
c = [1,2,3]
d = [1,2,3]

print(hex(id(a)))
print(hex(id(b)))   

0x7f55d79ee330
0x7f55d79ee7f0


...unless we point one variable to another.

In [64]:
e = [1,2,3]
f = e

print(hex(id(e)))
print(hex(id(f)))   

0x7f55d790da80
0x7f55d790da80


# 09 - Variable Equality

We compare memory addresses using `is (not)`: the identity operator.

We compare object states (data) using `== (!=)`: the equality operator.

The `None` object is a real object assigned to one memory address. When other variables are set to `None`, they all point to this one address.

# 10 - Everything is an Object

Functions, Classes (not just the instances of the class, but the class itself) and Type are all objects in Python.

This means that somewhere, there is: `Class Function:` etc.
 
**If something is an object, then it must have a memory address**

In [67]:
def my_func():
    pass

id(my_func)

140007019172272

So, functions are first-class citizens. You can treat them like how you treat all other variables.

# 11 - Python Optimizations - Interning

So why does Python use a shared reference for:

In [76]:
a = 10
b = 10

hex(id(a)) == hex(id(b))

True

But not for:

In [77]:
a = 500
b = 500

hex(id(a)) == hex(id(b))

False

This is called **interning** which is basically reusing objects on-demand.

At startup, Python (CPython), preloads (caches) a global list of integers in the range [-5, 256]. So memory addresses for all these numbers already exist.

**Singletons**: The range [-5, 256] are singleton objects. Basically, they are from classes which can only be instantiated once. 

Python does this as an optimisation strategy since these numbers show up a lot. 

It does not matter how we instantiate these Singleton objects, they will always point to the same address.

In [78]:
a = 10
b = int(10)
c = int('10')
d = int('1010', base=2)

a is b is c is d

True

# 12 - Python Optimizations - String Interning

Python will automatically intern *certain* strings.

In particular all the identifiers (variable names, function names, class names, etc) are interned (singleton objects created).

Python will also intern string literals that look like identifiers.

For example:

In [80]:
a = 'hello'
b = 'hello'
print(id(a))
print(id(b))

140007338337072
140007338337072


But not the following:

In [81]:
a = 'hello, world!'
b = 'hello, world!'
print(id(a))
print(id(b))

140007016788848
140006666628080


but it will work if they resemble identifiers:

In [82]:
a = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
b = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
print(id(a))
print(id(b))

140007009816896
140007009816896


We can force any string to be interned, that is, force the string to be a Singleton using sys.intern(\<str>).

In general, we dont need to use this. But it can be useful when tokenising a large corpus of text in Natural Language Processing. For example, if we had a book with 1000 'the's, then we would prefer to create one memory address and 1000 references as opposed to 1000 addresses and 1000 references. This will reduce the memory overhead.

Comparing memory addresses via `is` is *very* fast compared to comparing two strings via `==` because `==` needs to check for equality on a character-by-character basis.

Let's run a comparison using `==` many times to make the time savings non-negligible

In [83]:
def compare_using_equals(n):
    a = 'a long string that is not interned' * 200 # This is a very very long string that is not interned.
    b = 'a long string that is not interned' * 200
    for i in range(n):
        if a == b:
            pass   

In [84]:
def compare_using_interning(n):
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [85]:
import time

start = time.perf_counter()
compare_using_equals(100000000)
end = time.perf_counter()

print('equality: ', end - start)

equality:  23.455144189996645


In [87]:
start = time.perf_counter()
compare_using_interning(100000000)
end = time.perf_counter()

print('identity: ', end-start)

identity:  9.880999281002005


# 13 - Python Optimizations - Peephole

#### Constant Expressions

Peephole optimizations refer to a certain class of optimization strategies Python employs during any compilation phases.

Let's see how Python reduces constant expressions for optimization purposes:

In [99]:
def my_func():
    a = 24 * 60
    b = (1, 2) * 5
    c = 'abc' * 3
    d = 'ab' * 11
    e = 'the quick brown fox' * 500
    f = [1, 2] * 5

We can access the compiled constants of this function using:

In [100]:
my_func.__code__.co_consts

(None,
 1440,
 (1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
 'abcabcabc',
 'ababababababababababab',
 'the quick brown fox',
 500,
 1,
 2,
 5)

We see all numbers and strings in the present code, but some of them have been **pre-calculated/cached** like 24*\60 = 1440. But we don't see the quick brown fox printed 500 times because its too long.

We also don't see [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]. This is because a list is a mutable object so its not a constant expression. The list may be modified during compilation e.g. a number may be appended to the list.

#### Membership Tests

If we perform a membership test such as `if i in [1, 2, 3]:`, then in this case, the list is a **constant** even though it's mutable. This is because the list can't change once the script is run. That is, we can't append anything to that list.

In this case, **the constant mutable list expression is replaced by its immutable expression, i.e., a tuple**.

For completeness, sets (which are mutable and created using {}) get converted to their immutable expression, i.e., frozen sets.

They are converted to these immutable forms because its much faster to perform a membership test on immutables. 

write `if i in {1, 2, 3}:` because it's faster (if the set is large and used repeatedly).

In [103]:
def my_func():
    if e in [1, 2, 3]:
        pass

In [104]:
my_func.__code__.co_consts

(None, (1, 2, 3))

In [105]:
def my_func():
    if e in {1, 2, 3}:
        pass

In [106]:
my_func.__code__.co_consts

(None, frozenset({1, 2, 3}))