# Memory and Optimization in Python

## Variables are Memory References

We can find the memory address that a variable *references* by using the `id()` function.

In CPython, the `id()` function returns the memory address of its argument as a base-10 integer.

We can use the function `hex()` to convert the base-10 number to base-16.

In [None]:
user_id = 12
print(f"user_id:{user_id}")
print(f"Address of user_id:{hex(id(user_id))}")

In [None]:
user_name = "Mike"
print(f"user_name:{user_name}")
print(f"Address of user_name:{hex(id(user_name))}")
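
To make the idea of a "reference" concrete, here is a minimal sketch that binds a second name to an existing object and checks that both names report the same `id()`. The names `first_name` and `alias` are illustrative only and do not appear elsewhere in this notebook.

```python
# Two names bound to the same object share one identity.
first_name = "Mike"
alias = first_name  # no copy is made; alias references the same str object

same_object = id(first_name) == id(alias)
print(f"first_name: {hex(id(first_name))}")
print(f"alias:      {hex(id(alias))}")
print(f"Same object: {same_object}")
```

Assignment never copies the object here; it only adds another name that points at it.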

## References

Let's make a variable and check its reference count:

In [None]:
customer_list = [1,2,3]

In [None]:
import sys 
sys.getrefcount(customer_list)

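It returns 2, not 1. Have we found a bug in Python?

No. The *sys.getrefcount()* function takes **customer_list** as an argument, which means the function itself receives (and holds) a reference to the same object for the duration of the call. The reported count is therefore one higher than expected.

To read the count without creating that extra reference, we can use **ctypes**. *ctypes.c_long.from_address()* reads the integer stored at a given memory address, and in a standard CPython build the reference count is stored at the start of the object. This relies on CPython implementation details, so treat it as a learning tool rather than production code.

Let's write a function that returns the reference count for a given memory address using **ctypes**: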

In [3]:
import ctypes

def reference_count(address):
    return ctypes.c_long.from_address(address).value

In [None]:
address = id(customer_list)

reference_count(address)

Now let's bind a second name to the same list and check the count again:

In [None]:
dup_customer_list = customer_list

address_dup = id(dup_customer_list)

reference_count(address_dup)
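
As a cross-check that does not read raw memory, the sketch below uses only `sys.getrefcount()` and compares counts *relative* to a baseline, so the extra reference held by the call itself cancels out. The variable names `baseline`, `after_alias`, and `after_del` are illustrative assumptions, and the absolute numbers are a CPython detail that may vary between versions.

```python
import sys

customer_list = [1, 2, 3]

# The absolute value includes the temporary reference held by the call,
# so we only look at differences from this baseline.
baseline = sys.getrefcount(customer_list)

dup_customer_list = customer_list            # bind a second name to the same list
after_alias = sys.getrefcount(customer_list)

del dup_customer_list                        # remove the second name again
after_del = sys.getrefcount(customer_list)

print(f"Change after aliasing: {after_alias - baseline}")
print(f"Change after del:      {after_del - baseline}")
```

Each additional name bound to the list raises the count by one, and deleting that name lowers it again.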

## Garbage Collection

In [1]:
import gc

We create a function that searches the objects tracked by the GC for a specified id and tells us whether the object was found:

In [16]:
def obj_availabilty(obj_id):
    for obj in gc.get_objects():
        if id(obj) == obj_id:
            return "Object exists!"
    return "Not Found!"

Next we define two classes that we will use to create a circular reference

Class User's constructor will create an instance of class Purchase_History and pass itself to class Purchase_History's constructor that will then store that reference in some instance variable.

In [5]:
class User:
    def __init__(self):
        self.purchase_history = Purchase_History(self)
        print(f'User: self: {hex(id(self))}, Purchase_History:{hex(id(self.purchase_history))}')

In [6]:
class Purchase_History:
    def __init__(self, user):
        self.user = user
        print(f'Purchase_History: self: {hex(id(self))}, User: {hex(id(self.user))}')

We turn off the GC so we can see how reference counts are affected when the GC does not run and when it does (by running it manually).

In [2]:
gc.disable()

Now we create an instance of User, which will, in turn, create an instance of Purchase_History which will store a reference to the calling User instance.

In [7]:
user_1 = User()

Purchase_History: self: 0x1bfe98a5da0, User: 0x1bfe98a5e10
User: self: 0x1bfe98a5e10, Purchase_History:0x1bfe98a5da0


As we can see User and Purchase_History's constructors ran, and we also see from the memory addresses that we have a circular reference.

In fact `user_1` is also a reference to the same User instance:

In [None]:
print(hex(id(user_1)))

Now let's check how many references the `User` and `Purchase_History` instances have, and whether both objects are still available:

In [8]:
user_id = id(user_1)
purchase_history_id = id(user_1.purchase_history)

In [17]:
print(f'reference_count(User) = {reference_count(user_id)}')
print(f'reference_count(Purchase_History) = {reference_count(purchase_history_id)}')
print(f'User: {obj_availabilty(user_id)}')
print(f'Purchase_History: {obj_availabilty(purchase_history_id)}')

reference_count(User) = 2
reference_count(Purchase_History) = 1
User: Object exists!
Purchase_History: Object exists!


Now, let's remove the reference to the User instance that is being held by `user_1`:

In [18]:
user_1 = None

In [19]:
print(f'reference_count(User) = {reference_count(user_id)}')
print(f'reference_count(Purchase_History) = {reference_count(purchase_history_id)}')
print(f'User: {obj_availabilty(user_id)}')
print(f'Purchase_History: {obj_availabilty(purchase_history_id)}')

reference_count(User) = 1
reference_count(Purchase_History) = 1
User: Object exists!
Purchase_History: Object exists!


As we can see, the reference counts are now both equal to 1 (a pure **circular reference**), and reference counting alone did not destroy the User and Purchase_History instances. They're still around.
If no garbage collection is performed, this would result in a **memory leak**.

Let's run the GC manually and re-check

In [20]:
gc.collect()

177

In [21]:
print(f'reference_count(User) = {reference_count(user_id)}')
print(f'reference_count(Purchase_History) = {reference_count(purchase_history_id)}')
print(f'User: {obj_availabilty(user_id)}')
print(f'Purchase_History: {obj_availabilty(purchase_history_id)}')

reference_count(User) = 0
reference_count(Purchase_History) = 0
User: Not Found!
Purchase_History: Not Found!


Now let's re-enable the GC so it can do its work:

In [22]:
gc.enable()
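
Reading a reference count from the address of an object that may already have been freed is fragile. As an alternative, the following sketch observes the same behaviour with a weak reference, which reports `None` once its target has been destroyed. The `Node` class, `make_cycle()` helper, and `probe` variable are hypothetical names introduced only for this illustration.

```python
import gc
import weakref


class Node:
    """Minimal object that can point at a partner to form a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that reference each other; return a weak reference to one."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


gc.disable()  # stop automatic cycle collection while we observe
try:
    probe = make_cycle()
    # No names refer to the nodes any more, but the cycle keeps their counts at 1.
    alive_before = probe() is not None
    gc.collect()  # a manual collection still works while the GC is disabled
    alive_after = probe() is not None
finally:
    gc.enable()

print(f"Alive before gc.collect(): {alive_before}")
print(f"Alive after gc.collect():  {alive_after}")
```

On CPython the cycle survives until `gc.collect()` runs, which mirrors the `User` / `Purchase_History` result above.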

## Object Mutability

Certain Python built-in object types (aka data types) are **mutable**.

That is, the internal contents (state) of the object in memory can be modified.

In [23]:
my_list = [1, 2, 3]
print(my_list)
print(hex(id(my_list)))

[1, 2, 3]
0x1bfe8b621c8


In [24]:
my_list.append(4)
print(my_list)
print(hex(id(my_list)))

[1, 2, 3, 4]
0x1bfe8b621c8


As you can see, the memory address of *my_list* has **not** changed.

But the **contents** of *my_list* have changed from *[1, 2, 3]* to *[1, 2, 3, 4]*.

On the other hand, consider this:

In [25]:
my_list_1 = [1, 2, 3]
print(my_list_1)
print(hex(id(my_list_1)))

[1, 2, 3]
0x1bfea6a1448


In [26]:
my_list_1 = my_list_1 + [4]
print(my_list_1)
print(hex(id(my_list_1)))

[1, 2, 3, 4]
0x1bfea6a1508
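
Here the address changed: concatenation with `+` builds a **new** list and rebinds the name, whereas `append()` (and the augmented assignment `+=`) modify the existing list in place. A minimal sketch of the difference, using the illustrative names `items` and `original_address`:

```python
items = [1, 2, 3]
original_address = id(items)

items += [4]  # in-place extension: same list object
same_after_inplace = id(items) == original_address

items = items + [5]  # concatenation: a new list is created and the name is rebound
same_after_concat = id(items) == original_address

print(items)
print(f"Same object after +=: {same_after_inplace}")
print(f"Same object after +:  {same_after_concat}")
```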


## Args Mutability

Consider a function that receives a *string* argument and changes the argument:

In [33]:
def hex_id(var):
    return hex(id(var))

In [34]:
def greet(name):
    print(f"name location before concatenation:{hex_id(name)}")
    name = "Hello" + " " + name
    print(f"name location after concatenation:{hex_id(name)}")
    return name

In [35]:
u_name = "Steve"
print(f"u_name is located at {hex_id(u_name)}")

u_name is located at 0x1bfe973fca8


Note that when *name* is received, it is referencing the same object as *u_name*.

After we "modify" *name*, *name* is pointing to a new memory address:

In [36]:
greet(u_name)

name location before concatenation:0x1bfe973fca8
name location after concatenation:0x1bfea6a1630


'Hello Steve'

Mutating a mutable object that is passed as an argument:

In [48]:
def list_updater(container, value):
    print(f"container location before update:{hex_id(container)}")
    # Append the value only if it is not already present.
    # append() mutates the caller's list in place, so nothing is returned.
    if value not in container:
        container.append(value)
        print(f"container location after update:{hex_id(container)}")

In [49]:
my_items = [21,92,43]

print(f"my_items is located at:{hex_id(my_items)}")

my_items is located at:0x1bfe9a18b88


In [50]:
list_updater(my_items,21)

container location before update:0x1bfe9a18b88


In [51]:
list_updater(my_items,22)

container location before update:0x1bfe9a18b88
container location after update:0x1bfe9a18b88


In [52]:
print(my_items)

[21, 92, 43, 22]


The memory address referenced by *my_items* and *container* is always the **same** (shared) reference.
We are simply modifying the contents (**internal state**) of the object at that memory address.

Mutating a mutable object held inside an immutable argument:

In [54]:
def tuple_updater(container, value):
    print(f"container location before update:{hex_id(container)}")
    # The tuple itself cannot change, but the list stored in its first slot can.
    container[0].append(value)
    print(f"container location after update:{hex_id(container)}")

In [55]:
my_tuple = ([23,91],"Range")
print(f"my_tuple is located at:{hex_id(my_tuple)}")

my_tuple is located at:0x1bfea67a108


In [57]:
tuple_updater(my_tuple,20)

container location before update:0x1bfea67a108
container location after update:0x1bfea67a108


In [58]:
print(f"my_tuple is located at:{hex_id(my_tuple)}")

my_tuple is located at:0x1bfea67a108


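As you can see, the address of the tuple did not change. The tuple still holds the same two references, but the list stored as its first element was mutated in place.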
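
The sketch below makes both halves of that statement visible: appending to the inner list succeeds, while trying to rebind the tuple's first slot raises `TypeError`. The names `record` and `rebind_error` are illustrative and not part of the notebook above.

```python
record = ([23, 91], "Range")
inner_id_before = id(record[0])

record[0].append(100)  # allowed: this mutates the list, not the tuple
inner_id_after = id(record[0])

try:
    record[0] = [1, 2]  # not allowed: a tuple cannot rebind its slots
    rebind_error = None
except TypeError as exc:
    rebind_error = exc

print(record)                             # ([23, 91, 100], 'Range')
print(inner_id_before == inner_id_after)  # True
print(type(rebind_error).__name__)        # TypeError
```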

## Variable Equality

In [59]:
unit = 10
customer_id = 10 
print(f"Unit is located at:{hex_id(unit)}")
print(f"Customer_id is located at:{hex_id(customer_id)}")
print(f"Integer value 10 is located at:{hex_id(10)}")

Unit is located at:0x7ff8acbdb470
Customer_id is located at:0x7ff8acbdb470
Integer value 10 is located at:0x7ff8acbdb470


When we use the **is** operator, we are comparing the memory address **references**:

In [60]:
print(f"Unit and Customer_id are located at the same location:{unit is customer_id}")

Unit and Customer_id are located at the same location:True


Let's take a mutable object example:

In [66]:
user_ids = [1,2,3]
data_rows = [1,2,3]

print(f"user_ids is located at:{hex_id(user_ids)}")
print(f"data_rows is located at:{hex_id(data_rows)}")

user_ids is located at:0x1bfe9a2cc48
data_rows is located at:0x1bfea647d48


Although they are not the same objects, they do contain the same "values":

In [67]:
print(f"user_ids and data_rows are located at the same location:{user_ids is data_rows}")
print(f"user_ids and data_rows have the same value:{user_ids == data_rows}")

user_ids and data_rows are located at the same location:False
user_ids and data_rows have the same value:True


In [68]:
row_no = 90
age = 90.0
print(f"row_no is located at:{hex_id(row_no)}")
print(f"age is located at:{hex_id(age)}")

row_no is located at:0x7ff8acbdbe70
age is located at:0x1bfe8a12d38


In [69]:
print(f"row_no belongs to:{type(row_no)}")
print(f"age belongs to:{type(age)}")

row_no belongs to:<class 'int'>
age belongs to:<class 'float'>
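
The two cells above show that `row_no` and `age` are different objects of different types. The short sketch below completes the comparison: `==` compares values, so an `int` and a `float` that represent the same number are equal, while `is` compares identity and is false here. The result names (`values_equal`, `same_object`, `same_type`) are illustrative.

```python
row_no = 90
age = 90.0

values_equal = row_no == age           # value comparison across int and float
same_object = row_no is age            # identity comparison
same_type = type(row_no) is type(age)  # int versus float

print(f"Equal values: {values_equal}")  # True
print(f"Same object:  {same_object}")   # False
print(f"Same type:    {same_type}")     # False
```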


## Python Optimizations: Interning

Earlier, we saw shared references being created automatically by Python.

CPython pre-caches integer objects in the range [-5, 256]:

In [71]:
high = 257
low = 256
current = 256

In [72]:
print(f"stocks high @ {id(high)}")
print(f"stocks low @ {id(low)}")
print(f"stocks current @ {id(current)}")

stocks high @ 1923767769712
stocks low @ 140706026738480
stocks current @ 140706026738480


In [73]:
low is current

True

The integers in the range [-5, 256] are essentially **singleton** objects.
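
To see the cache boundary without the compiler merging identical literals, the following sketch builds the integers at runtime with `int()`. This is a CPython implementation detail, so the identity results are what CPython typically shows rather than a language guarantee. The variable names are illustrative.

```python
# Build the integers at runtime so that no compile-time constant sharing is involved.
small_a = int("256")
small_b = int("256")
large_a = int("1000000")
large_b = int("1000000")

print(f"256 is 256:         {small_a is small_b}")  # cached small integer on CPython
print(f"1000000 is 1000000: {large_a is large_b}")  # typically two separate objects
print(f"1000000 == 1000000: {large_a == large_b}")  # value equality always holds
```

Because identity for integers depends on the implementation, compare numbers with `==` rather than `is`.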

## What About Strings?

Python will automatically intern *certain* strings.

In particular, all the identifiers (variable names, function names, class names, etc.) are interned (singleton objects created).

CPython will also intern string literals that look like identifiers. This is an implementation detail, so the exact results can differ between versions and depend on how the code is compiled.

For example:

In [74]:
greet_1 = 'hello'
greet_2 = 'hello'
print(id(greet_1))
print(id(greet_2))

1923782028232
1923782028232


In [75]:
# but not the following
greet_1 = 'hello, world!'
greet_2 = 'hello, world!'
print(id(greet_1))
print(id(greet_2))

1923783239280
1923770121200


However, because the following literals resemble identifiers, even though they are quite long, Python will still automatically intern them:

In [76]:
greet_1 = 'hello_world'
greet_2 = 'hello_world'
print(id(greet_1))
print(id(greet_2))

1923783220592
1923783220592


How about even longer ones:

In [80]:
string1 = '_this_is_a_long_string_that_could_be_used_as_an_Identifier'
string2 = '_this_is_a_long_string_that_could_be_used_as_an_Identifier'
print(id(string1))
print(id(string2))

1923770694544
1923770694544


In [78]:
statement1 = "HELLO_ALL_HOW_ARE_DOING"
statement2 = "HELLO_ALL_HOW_ARE_DOING"
print(id(statement1))
print(id(statement2))

1923768028016
1923768028016


In [82]:
greet_1 = 'hello world'
greet_2 = 'hello world'
print(id(greet_1))
print(id(greet_2))

1923783239280
1923770080496


Interning strings (making them singleton objects) means that testing for string equality can be done faster by comparing memory addresses instead of comparing the strings character by character.

#### <font color="orange">Note: Remember, using `is` ONLY works if the strings were interned!</font>

Here's where this technique fails:

In [83]:
greet_1 is greet_2

False

But equality still works:

In [88]:
greet_1 == greet_2

True

We *can* force strings to be interned (but only use it if you have a valid performance optimization need):

In [85]:
greet_1 = sys.intern('hello world')
greet_2 = sys.intern('hello world')
greet_3 = 'hello world'

In [86]:
print(id(greet_1))
print(id(greet_2))
print(id(greet_3))

1923770120944
1923770120944
1923770102320


Notice how `greet_1` and `greet_2` are pointing to the same object, but `greet_3` is **NOT**.

So, since both `greet_1` and `greet_2` were interned we can use `is` to test for equality of the two strings:

In [87]:
greet_1 is greet_2

True
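
Literals are a special case because the compiler can share them. Strings built at runtime are a more realistic target for `sys.intern()`, as in the sketch below. It is a minimal illustration with made-up names (`words`, `runtime_1`, `interned_1`, and so on), not a recommendation to intern strings in general.

```python
import sys

words = ["hello", "world"]

# Strings assembled at runtime are separate objects unless we intern them.
runtime_1 = " ".join(words)
runtime_2 = " ".join(words)

interned_1 = sys.intern(runtime_1)
interned_2 = sys.intern(runtime_2)

print(f"Equal values:           {runtime_1 == runtime_2}")
print(f"Interned share object:  {interned_1 is interned_2}")
```

`sys.intern()` returns the canonical copy of a string, so two interned strings with equal contents are always the same object. For everyday comparisons, `==` remains the right tool.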