# Variables and Memory

## Variables are Memory References

- Memory can be thought of as a series of slots (boxes) in the computer, in which data can be stored and from which it can be retrieved.
- Each slot has an address, referred to as its *memory address*.
- An object stored in memory may occupy more than one slot, depending on how much space its data needs.
- The region of memory in which objects are stored is called the *heap*. Storing objects on the heap and retrieving them is handled by the *Python Memory Manager*.
- Consider the code: `my_var_1 = 10`. When the code is run, Python creates an object in memory at some address (say '0x1000'). `my_var_1` is simply a name, or alias, for the memory address where the object is stored. It is said to be a *reference* to the object at memory address '0x1000'.
- In CPython, `id()` returns the memory address of the object referenced by a variable as a base-10 integer. The `hex()` function converts it to hexadecimal.

In [1]:
# Create a variable
my_var = 10
# Check the memory address of my_var
print(id(my_var))

2845974594064


In [2]:
# Check the memory address in hexadecimal form
print(hex(id(my_var)))

0x296a14e0210


In [3]:
greetings = "Hello"
# Check memory address of `greetings`
print(id(greetings))
# Check memory address of `greetings` in hexadecimal
print(hex(id(greetings)))

2846051346480
0x296a5e12830


### Reference Counting

- Consider the code `my_var = 10`. Python creates the object at some memory address, and `my_var` becomes a reference to it. This means that the *reference count* of that object is 1. Now consider `other = my_var`. The same object gains a second reference, so its reference count increases to 2.
- If `my_var` is then deleted, the reference count of the object decreases to 1. When `other` is deleted as well, the reference count drops to 0. At this point, the *Python Memory Manager* destroys the object, and the memory can be reused.
- `sys.getrefcount()` returns the reference count of an object. However, passing the variable (which is itself a reference) to the function creates an extra reference, so the reported count is 1 higher than expected.
- `ctypes.c_long.from_address(address).value` is another way to read the reference count. Because the **memory address (an integer), not the reference**, is passed, the reference count does not increase. This relies on a CPython implementation detail: in standard builds the reference count is stored at the start of the object.

In [4]:
import sys
import ctypes

In [5]:
# Create a list
a = [1, 2, 3]
print(id(a))

2846050336448


In [6]:
# Get the reference count using sys module
sys.getrefcount(a)

2

In [7]:
# Get reference count using ctypes
def ref_count(address: int):
    return ctypes.c_long.from_address(address).value

ref_count(id(a))

1

In [8]:
# Try creating a new reference to the same address
b = a
# Check reference count of memory address referred by a
ref_count(id(a))

2

In [9]:
# Create a new reference
c = a
ref_count(id(a))

3

In [10]:
# Rebind c to a different object; the list loses one reference
c = 10
ref_count(id(a))

2

In [11]:
# Rebind b to a different object; the list loses another reference
b = None
ref_count(id(a))

1

In [12]:
# Save the address first, then rebind a; the list has no references left
a_id = id(a)
a = None
ref_count(a_id)

1

**After `a` is rebound, no references to the original list remain, so the Python Memory Manager destroys the object and its memory becomes available for reuse. We cannot know what is stored at that address afterwards, so the 1 returned above is arbitrary and must not be read as a reference count.**
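
Because reading memory at the address of a destroyed object is unreliable, a weak reference is a safer way to observe the moment an object goes away. The following is a minimal sketch, assuming CPython's immediate reference counting. `Payload` is a hypothetical list subclass introduced only for illustration, because plain lists cannot be weakly referenced.

```python
import weakref

class Payload(list):
    """Hypothetical list subclass; plain lists do not support weak references."""

a = Payload([1, 2, 3])
b = a                     # second reference to the same object
probe = weakref.ref(a)    # a weak reference does not increase the count

a = None
alive_with_one_reference = probe() is not None   # b still refers to the object

b = None
alive_with_no_references = probe() is not None   # the last reference is gone

print(alive_with_one_reference, alive_with_no_references)
```

Calling `probe()` returns the object while it is alive and `None` once it has been destroyed, so no freed memory is ever read.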

## Garbage Collection

### Circular References

- Consider a variable `my_var` that points to `Object A`. Assume `Object A` has an instance variable `var_1` that points to another object, `Object B`. If `my_var` is now rebound or deleted, the reference count of `Object A` drops to 0 and Python destroys it. As a result, the reference count of `Object B` also drops to 0, and it is destroyed as well.
- Now assume `Object B` also has an instance variable `var_2` that points back to `Object A`. In this case, even if `my_var` is rebound or deleted, the reference counts of `Object A` and `Object B` are both still 1. Reference counting alone will not destroy them.
- This is known as a **Circular Reference**. It can lead to a **Memory Leak**: although `my_var` is gone, `Object A` and `Object B` remain in memory.
- This is where the *Garbage Collector* comes into play. The GC can identify circular references and destroy the objects involved.
- The *Garbage Collector* can be controlled programmatically using the **gc** module.
- By default, the *gc* is turned on, and it runs on its own periodically. It can be turned off if we are sure our code doesn't create circular references. It can also be called manually, which lets us do our own cleanup.

In [13]:
import gc

In [14]:
def object_by_id(object_id):
    for obj in gc.get_objects():
        if id(obj) == object_id:
            return "Object Exists"
    return "Not Found"

In [15]:
class A:
    def __init__(self):
        self.b = B(self)
        print(f"A: self: {hex(id(self))}, b: {hex(id(self.b))}")

In [16]:
class B:
    def __init__(self, a):
        self.a = a
        print(f"B: self: {hex(id(self))}, a: {hex(id(self.a))}")

In [17]:
gc.disable()

In [18]:
my_var = A()

B: self: 0x296a5e30880, a: 0x296a5e30cd0
A: self: 0x296a5e30cd0, b: 0x296a5e30880


In [19]:
hex(id(my_var))

'0x296a5e30cd0'
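
To complete the picture, the sketch below shows a cycle surviving the loss of its last outside reference while the garbage collector is disabled, and then being reclaimed by a manual `gc.collect()`. It assumes CPython; `Node`, `partner`, and `make_cycle` are hypothetical names used only for this illustration, and weak references are used to check whether the objects still exist.

```python
import gc
import weakref

class Node:
    """Hypothetical class with one attribute that can point at another Node."""
    def __init__(self, name):
        self.name = name
        self.partner = None

def make_cycle():
    a = Node("A")
    b = Node("B")
    a.partner = b          # A -> B
    b.partner = a          # B -> A closes the cycle
    # Only weak references leave the function; the names a and b disappear.
    return weakref.ref(a), weakref.ref(b)

gc.disable()               # no automatic cycle detection
ref_a, ref_b = make_cycle()
alive_before = ref_a() is not None and ref_b() is not None

gc.collect()               # a manual collection still works while disabled
alive_after = ref_a() is not None or ref_b() is not None
gc.enable()                # restore the default behaviour

print(alive_before, alive_after)
```

With reference counting alone both objects stay alive, because each keeps the other's count at 1. The explicit collection is what finds and frees the cycle.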

## Dynamic vs Static Typing

- Some languages (Java, C++, Swift) are statically typed.
- Consider the Java code `String myVar = "hello";`. The string literal "hello" is created in memory as an object at some memory address. `myVar` is declared as a `String`, so it cannot later be assigned an integer value.
- Python, in contrast, is dynamically typed.
- In Python, after `my_var = 'hello'`, `my_var` is purely a reference to a string object with the value `'hello'`. No type is 'attached' to `my_var`.
- Python variables do not have an inherent static type. When `type(my_var)` is called, Python looks up the object that `my_var` references (points to) and returns the type of that object.

In [20]:
a = "hello"
type(a)

str

In [21]:
a = 10
type(a)

int

In [22]:
a = lambda x: x**2
type(a)

function

In [23]:
a = 3 + 4j
type(a)

complex

## Object Mutability

- Consider an object in memory.
- Changing the data inside the object is called modifying the internal state of the object. That doesn't change the memory address. 
- An object whose internal state can be changed is called **Mutable**. An object whose internal state cannot be changed is called **Immutable**.
- Immutable
    - Numbers (int, float, bool, complex, etc.)
    - Strings
    - Tuples
    - Frozen Sets
    - User-defined classes can be immutable if designed that way.
- Mutable
    - Lists
    - Sets
    - Dictionaries
    - User-defined classes can be mutable if designed that way.
- Consider a tuple of integers: `t = (1, 2, 3)`. Tuples are immutable: elements cannot be deleted, inserted, or replaced. In this case, both the container (the tuple) and all its elements (ints) are immutable.
- Now consider the following: `a = [1, 2]; b = [3, 4]; t = (a, b); a.append(3); b.append(5)`. Afterwards `t` is `([1, 2, 3], [3, 4, 5])`. Lists are mutable: elements can be deleted, inserted, or replaced. The tuple therefore appears to have changed. Although the tuple itself is immutable, its elements are not. The object references stored in the tuple did not change, but the referenced objects did mutate.

In [24]:
my_list = [1, 2, 3]
type(my_list)

list

In [25]:
id(my_list)

2846050634560

In [26]:
my_list.append(4)
my_list

[1, 2, 3, 4]

In [27]:
id(my_list)

2846050634560

In [28]:
my_list_1 = [1, 2, 3]
id(my_list_1)

2846051230848

In [29]:
my_list_1 = my_list_1 + [4]
my_list_1

[1, 2, 3, 4]

In [30]:
id(my_list_1)

2846051407808

In [31]:
my_dict = dict(key1=1, key2='a')
my_dict

{'key1': 1, 'key2': 'a'}

In [32]:
id(my_dict)

2846051228992

In [33]:
my_dict['key3'] = 10.5
my_dict

{'key1': 1, 'key2': 'a', 'key3': 10.5}

In [34]:
id(my_dict)

2846051228992

In [35]:
t = (1, 2, 3)
id(t)

2846051353088

In [36]:
t[0]

1

In [37]:
id(t[0])

2845974593776

In [38]:
id(t[1])

2845974593808

In [39]:
t = ([1, 2], [3, 4])
id(t)

2846050330240

In [40]:
t[0]

[1, 2]

In [41]:
t[1]

[3, 4]

In [42]:
t[0].append(3)
t

([1, 2, 3], [3, 4])
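
The difference between replacing a tuple element and mutating the object it refers to can be sketched as follows. The variable names `first_id`, `rebinding_failed`, and `same_list` are introduced only for this illustration.

```python
t = ([1, 2], [3, 4])
first_id = id(t[0])

try:
    t[0] = [9, 9]          # replacing a reference held by the tuple is not allowed
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

t[0].append(3)             # mutating the list that the tuple refers to is allowed
same_list = id(t[0]) == first_id

print(rebinding_failed, same_list, t)
```

The tuple still holds the same two references throughout; only the state of the first list changes.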

## Function Arguments and Mutability

- Once a string has been created, the contents of the object can never be changed. Consider an example: `my_var = 'hello'`. The only way to modify the "value" of `my_var` is to re-assign `my_var` to another object.
- Immutable objects are safe from unintended side-effects.
- Consider the code below:
```
def process(s):
    s = s + ' world'
    return s

my_var = 'hello'
process(my_var)
```
- When `process(my_var)` is called, the reference held by `my_var` is passed to `process()`, whose scope stores it in the variable `s`. When the line `s = s + ' world'` runs, a new string object `'hello world'` is created at another memory address, and `s` is rebound to it. The object referenced by `my_var` doesn't change. This happens because `str` is immutable, so the only way to "modify" `s` is to create a new object and reference that instead.
- Mutable objects are not safe from unintended side-effects.
- Consider the code below:
```
def process(lst):
    lst.append(100)

my_list = [1, 2, 3]

process(my_list)
```
- The reference held by `my_list` is passed to `process()`. When `process(my_list)` is called, `lst` in the scope of `process()` refers to the same list object `[1, 2, 3]` that `my_list` refers to. `lst.append(100)` changes that list to `[1, 2, 3, 100]` in place, so its memory address stays the same. Hence `my_list` now also shows `[1, 2, 3, 100]`. This is a side effect: the state of the caller's object has changed.

In [43]:
def process(s):
    print(f"Initial s # = {id(s)}")
    s = s + ' world'
    print(f"Final s # = {id(s)}")

In [44]:
my_var = 'hello'
print(f"my_var # = {id(my_var)}")

my_var # = 2846047011696


In [45]:
process(my_var)

Initial s # = 2846047011696
Final s # = 2846051404400


In [46]:
my_var

'hello'

In [47]:
def modify_list(lst):
    print(f"Initial lst # = {id(lst)}")
    lst.append(100)
    print(f"Final lst # = {id(lst)}")

In [48]:
my_list = [1, 2, 3]
id(my_list)

2846051404416

In [49]:
modify_list(my_list)

Initial lst # = 2846051404416
Final lst # = 2846051404416


In [50]:
def modify_tuple(t):
    print(f"Initial t # = {id(t)}")
    t[0].append(100)
    print(f"Final t # = {id(t)}")

In [51]:
my_tuple = ([1, 2], 'a')
id(my_tuple)

2846051349376

In [52]:
modify_tuple(my_tuple)

Initial t # = 2846051349376
Final t # = 2846051349376


In [53]:
my_tuple

([1, 2, 100], 'a')
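
One common way to avoid such a side effect is to work on a copy inside the function. This is only a sketch of that idea, not something the examples above require; `modify_copy` and `local` are hypothetical names, and `list(lst)` makes a shallow copy, so nested mutable elements would still be shared.

```python
def modify_copy(lst):
    local = list(lst)      # shallow copy: a new list object at a new address
    local.append(100)      # mutates only the copy
    return local

my_list = [1, 2, 3]
result = modify_copy(my_list)

print(my_list, result, result is my_list)
```

The caller's list is left untouched because the function never mutates the object it was handed.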

## Shared References and Mutability

- The term *Shared Reference* describes two variables referencing the same object in memory, i.e., holding the same memory address.
- Consider code: `a = 10; b = 10; s1 = 'hello'; s2 = 'hello';`
    - In both cases above, Python's memory manager may decide to automatically re-use the existing object. `a` and `b` then refer to the same memory address, and so do `s1` and `s2`. This is safe because the objects are immutable.
- With mutable objects, the Python memory manager never creates shared references on its own. Consider the code `a = [1, 2, 3]; b = [1, 2, 3]`. Here Python creates a separate object for `a` and for `b`, even though they have the same value. A shared reference to a mutable object only arises from an explicit assignment such as `b = a`.

In [54]:
a = "hello"
b = a
hex(id(a)), hex(id(b))

('0x296a59f0370', '0x296a59f0370')

In [55]:
a = "harry"
b = "harry"
hex(id(a)), hex(id(b))

('0x296a5e46370', '0x296a5e46370')

In [56]:
a = [1, 2, 3]
b = a
b.append(100)
b, a

([1, 2, 3, 100], [1, 2, 3, 100])
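
For contrast, here is a short sketch of two lists built separately from equal contents. The names `equal_state` and `same_object` are introduced only for this illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]              # equal contents, but a separate list object

equal_state = a == b       # compares the data
same_object = a is b       # compares the identity

b.append(100)              # affects only b
print(equal_state, same_object, a, b)
```

Since the two names never share a reference, mutating one list leaves the other unchanged.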

## Variable Equality

- Variable equality can be thought of in two fundamental ways:
    - Memory address: the `is` identity operator is used. `var_1 is var_2`. Negation: `var_1 is not var_2`
    - Object state (data): the `==` equality operator is used. `var_1 == var_2`. Negation: `var_1 != var_2`
- The `None` object can be assigned to variables to indicate that they are not set (in the way we expect them to be), i.e., an "empty" value (or null pointer).
- But the `None` object is a real object managed by the Python Memory Manager. The Memory Manager will always use a shared reference when assigning a variable to `None`. Hence, we can test whether a variable is "not set" or "empty" by comparing its memory address to the memory address of `None` using the `is` operator.

In [57]:
a = 10
b = 10
id(a), id(b)

(2845974594064, 2845974594064)

In [58]:
print(a is b)

True


In [59]:
print(a == b)

True


In [60]:
a = 500
b = 500
id(a), id(b)

(2846050498480, 2846050496688)

In [61]:
print(a is b)

False


In [62]:
print(a == b)

True


In [63]:
a = 10
b = 10.0
print(a is b)
print(a == b)

False
True


In [64]:
id(None)

140733849384952

In [65]:
type(None)

NoneType

In [66]:
a = None
b = None
a is b

True
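
A typical use of this identity test is a function parameter that defaults to `None`. The following is a minimal sketch; `describe` is a hypothetical function used only for illustration.

```python
def describe(value=None):
    # Identity test: true only when no real value was supplied.
    if value is None:
        return "not set"
    return f"set to {value!r}"

results = [describe(), describe(0), describe("")]
print(results)
```

Values such as `0` and `""` are falsy but are not `None`, which is why `is None` is used here rather than a truthiness test.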

## Everything is an Object

- All data types (e.g. integers, booleans, strings, lists) and constructs (e.g. operators, functions, classes, types) are `objects`, i.e. instances of classes. For example:
    - Functions (functions)
    - Classes (class)
    - Types (type)
- This means they all have a memory address.
- As a consequence: 
    - Any object can be assigned to a variable including functions...
    - Any object can be passed to a function including functions...
    - Any object can be returned from a function including functions...
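
The three consequences listed above can be sketched with a few small functions. All names here (`square`, `apply`, `make_multiplier`, `triple`) are hypothetical and exist only for this illustration.

```python
def square(x):
    return x ** 2

def apply(func, value):
    # A function received as an argument is called like any other.
    return func(value)

def make_multiplier(factor):
    # A function object is created and returned to the caller.
    def multiply(x):
        return x * factor
    return multiply

f = square                     # assign a function to a variable
triple = make_multiplier(3)    # receive a function as a return value

print(f is square, apply(f, 4), triple(5), type(square).__name__)
```

`f` and `square` refer to the same function object, which is why `f is square` holds.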

## Python Optimizations

### Interning: Reusing objects on-demand

- At startup, Python pre-loads (caches) a global list of integers in the range [-5, 256]. Any time an integer in that range is referenced, Python uses the cached version of that object. The integers in the range [-5, 256] are therefore **singletons**: only one object exists for each value.
- This is an *optimization strategy*: small integers show up often.
- When we write `a = 10`, Python just has to point to the existing object for 10. But if we write `a = 257`, Python does not use that global list, and a new object is generally created.
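
The cache can be observed with integers built at run time, which sidesteps any sharing of literals by the compiler. This is a sketch that assumes CPython; the cached range is an implementation detail and other interpreters may behave differently.

```python
# Build the integers at run time so the compiler cannot share literals.
small_a = int("10")
small_b = int("10")
large_a = int("257")
large_b = int("257")

print(small_a is small_b)   # both names point at the cached object for 10
print(large_a is large_b)   # 257 is outside the cached range
print(large_a == large_b)   # the values are still equal
```

This is also why `==`, not `is`, is the right way to compare integers by value.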

### String Interning

- Some strings are also automatically interned, but not all.
- As the Python code is compiled, identifiers are interned:
    - variable names
    - function names
    - class names
    - etc.
- Some string literals may also be automatically interned:
    - String literals that look like identifiers.
    - A literal that starts with a digit may still get interned, even though it is not a valid identifier.
- Python does this as an optimization for speed (and possibly memory). Python, both internally and in the code you write, deals with lots of dictionary-type lookups on string keys, which means a lot of string equality testing.
- Suppose we want to check whether two strings are equal: `a = 'some_long_string'; b = 'some_long_string'`. `a == b` compares the two strings *character by character*. However, if we know that `some_long_string` has been *interned*, then `a` and `b` are the same string exactly when they point to the same memory address. Hence we can use `a is b` instead, which compares two integers (the addresses) and is much faster than the character-by-character comparison.
- Not all strings are automatically interned by Python, but we can force strings to be interned by using the `sys.intern()` function.

### Peephole

- Peephole optimization occurs at compile time.
- Constant expressions are pre-calculated and stored, e.g. `24 * 60`.
- Constant expressions that produce short sequences are also pre-calculated and stored, e.g. `(1, 2) * 5; 'abc' * 3; 'hello' + ' world'`. A limit of length < 20 applied in older versions; the exact limit depends on the Python version, and the `co_consts` output below shows longer strings being folded.
- Membership tests: mutables are replaced by immutables. When a membership test such as `if e in [1, 2, 3]:` is encountered, the `[1, 2, 3]` constant is replaced by its immutable counterpart, in this case the tuple `(1, 2, 3)`.
- Set membership is much faster than list or tuple membership. So, instead of writing `if e in [1, 2, 3]` or `if e in (1, 2, 3)`, write `if e in {1, 2, 3}`.

In [67]:
a = 'hello'
b = 'hello'
print(id(a), id(b))
print(a == b)
print(a is b)

2846047011696 2846047011696
True
True


In [68]:
a = 'hello world'
b = 'hello world'
print(id(a), id(b))
print(a == b)
print(a is b)

2846051748784 2846051750896
True
False


In [69]:
a = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
b = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
print(id(a), id(b))
print(a == b)
print(a is b)

2846051731360 2846051731360
True
True


In [70]:
# Force String Intern
import sys
a = sys.intern('hello world')
b = sys.intern('hello world')
c = 'hello world'
print(id(a), id(b), id(c))

2846051755440 2846051755440 2846051755376


In [71]:
print(a == b)
print(a is b)

True
True


In [72]:
def compare_using_equals(n):
    a = "a long string that is not interned" * 200
    b = "a long string that is not interned" * 200
    for i in range(n):
        if a == b:
            pass

In [73]:
def compare_using_interning(n):
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [74]:
import time

In [75]:
start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()
print(f"Time taken for equality check: {end - start}")

Time taken for equality check: 1.6368136999662966


In [76]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()
print(f"Time taken for identity check: {end - start}")

Time taken for identity check: 0.15855599998030812


In [77]:
def my_func():
    a = 24 * 60
    b = (1, 2) * 5
    c = 'abc' * 3
    d = 'ab' * 11
    e = 'the quick brown fox' * 5
    f = ['a', 'b'] * 3

In [78]:
my_func.__code__.co_consts

(None,
 1440,
 (1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
 'abcabcabc',
 'ababababababababababab',
 'the quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown fox',
 'a',
 'b',
 3)

In [79]:
def my_func(e):
    if e in [1, 2, 3]:
        pass

In [80]:
my_func.__code__.co_consts

(None, (1, 2, 3))

In [81]:
# Set Membership vs List Membership
import string
import time

char_list = list(string.ascii_letters)
char_tuple = tuple(string.ascii_letters)
char_set = set(string.ascii_letters)

def membership_test(n, container):
    for i in range(n):
        if 'z' in container:
            pass
        
start = time.perf_counter()
membership_test(10000000, char_list)
end = time.perf_counter()
print(f"Time for list: {end-start}")

start = time.perf_counter()
membership_test(10000000, char_tuple)
end = time.perf_counter()
print(f"Time for tuple: {end-start}")

start = time.perf_counter()
membership_test(10000000, char_set)
end = time.perf_counter()
print(f"Time for set: {end-start}")

Time for list: 1.9330784000339918
Time for tuple: 1.9431662999559194
Time for set: 0.17709630000172183
