##### STA 141B Data & Web Technologies for Data Analysis

### Lecture 5, 1/20/26, Memory handling


### Announcements:

- First homework assignment was uploaded on Gradescope. Deadline is 01/30/26.
- Group registration deadline this Friday!

### Today's topics

 - Scope of variables
 - Memory Handling in Python
     - Stack and Heap
     - Types
     - Reference Semantics
     - Interning

### Scope

Note that in Python, unlike many other languages, we do not have to declare variables. They are initialized the moment we assign them.

The location(s) from which a variable can be accessed is called the scope of the variable. Variables assigned inside a function are *local* by default: they can only be accessed within the environment where they are created, not outside of it.

Before presenting our first examples, we have to clean our workspace to ensure it is empty.

In [6]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])?  y


In [7]:
dir() # only IPython's default names remain

['In',
 'Out',
 '__builtin__',
 '__builtins__',
 '__name__',
 '_dh',
 '_i',
 '_i7',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'get_ipython',
 'open',
 'quit']

One of the most common examples of such an environment is a function. (Local) variables assigned within a function cannot be used outside of the function:

In [None]:
def example_fn():
    my_var = "australia"
    print("So many kangaroos in", my_var)

example_fn()

Note that variables defined within a function only "live" within this function and cannot be accessed from outside.

In [None]:
print(my_var)  # NameError: name 'my_var' is not defined

In contrast, variables created outside of functions (global variables) can be read within them.

In [None]:
my_var = "australia"

def example_fn():
    print("So many kangaroos in", my_var)

example_fn()

In [None]:
print(my_var)

However, there is one exception. If a variable is assigned anywhere within a function, the Python interpreter treats it as local throughout the whole function. Using it before the assignment therefore raises an `UnboundLocalError`, even if a global variable with the same name exists.

In [None]:
my_var = "australia"

def example_fn():
    print("So many kangaroos in", my_var)  # UnboundLocalError: my_var is local here
    my_var = "austria"
    print("So many kangaroos in", my_var)

example_fn()
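To confirm that the error is raised at the first `print`, before any assignment happens, we can catch it. The function name `shadowing_fn` is a hypothetical example; note that `UnboundLocalError` is a subclass of `NameError`.

```python
my_var = "australia"

def shadowing_fn():
    # The assignment below makes my_var local for the *whole* function body,
    # so this read happens before the local variable has a value.
    print("So many kangaroos in", my_var)
    my_var = "austria"

try:
    shadowing_fn()
except UnboundLocalError as err:
    caught = err  # keep the exception object for inspection
    print("Caught:", type(err).__name__)

print(issubclass(UnboundLocalError, NameError))  # True
```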

Recall that variables defined within a function only "live" within this function.

In [None]:
my_var = "australia"

def example_fn():
    my_var = "austria"
    print("So many kangaroos in", my_var)

example_fn()
print("So many kangaroos in", my_var)

If you want to reassign a global variable from within a function, you can use the keyword `global`.

In [None]:
my_var = "australia"

def example_fn():
    global my_var
    print(my_var)
    my_var = "austria"
    print("So many kangaroos in", my_var)

example_fn()
print("So many kangaroos in", my_var) # compare with the previous code!

However, changing variables within a function using the `global` keyword is considered bad practice, because the change is invisible to a reader who only reads the main code. It is better to pass the value as an argument and return the result:

In [None]:
# Better:
my_var = "australia"

def example_fn(my_var):
    print(my_var)
    my_var = "austria"
    print("So many kangaroos in", my_var)
    return my_var

my_var = example_fn(my_var)
print("So many kangaroos in", my_var)

Note that without `return`, reassigning the parameter inside the function does not change the variable outside of it:

In [None]:
# Without return: the outer my_var is not changed
my_var = "australia"

def example_fn(my_var):
    print(my_var)
    my_var = "austria"
    print("So many kangaroos in", my_var)

example_fn(my_var)
print("So many kangaroos in", my_var)  # still "australia"

In [None]:
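# Make sure global_var is undefined (a plain `del global_var` fails if it does not exist)
globals().pop("global_var", None)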

In [None]:
def example_fn():
    global global_var
    global_var = "australia"

print("So many kangaroos in", global_var)  # NameError: example_fn() was never called

In [None]:
# Correction: the function has to be called for the assignment to run
global_var = None

def example_fn():
    global global_var
    global_var = "australia"

example_fn()
print("So many kangaroos in", global_var)
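A `global` declaration by itself does not create anything; the global variable only comes into existence when the assignment is executed. The sketch below checks this via `globals()`, using a hypothetical function `make_global` and a hypothetical variable `created_later`:

```python
def make_global():
    global created_later          # declares the name as global, creates nothing yet
    created_later = "australia"   # the global variable is created here

print("created_later" in globals())  # False: make_global() has not run yet
make_global()
print("created_later" in globals())  # True: the assignment has now been executed
```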

In [None]:
%reset

In [None]:
# Another solution:
def example_fn():
    global_var = "australia"
    return global_var

global_var = example_fn()
print("So many kangaroos in", global_var)

### Stack and Heap

In [None]:
x = True
type(x)

`x` is a variable that refers to a <kbd>bool</kbd> object with value `True`. The variable itself merely holds a reference to this object. This reference is stored in local memory (the *stack*). The Python interpreter takes care of allocating stack memory; we don't have to do that. 

The <kbd>bool</kbd> object and its value are stored in a different region of memory, the *heap*. With `id` we obtain the identity of the object, which in CPython is its address on the heap (i.e., the value that the reference on the stack holds): 

In [None]:
id(x)

In [None]:
help(id)

In [None]:
help(hex)

In [None]:
hex(id(x))

In [None]:
y = float(x)
hex(id(y))

In [None]:
y = int(y)
hex(id(y))

In Python, we can change the type of a variable by rebinding it to a new object of another type.

In [None]:
hex(id(x))

In [None]:
x = int(x)
type(x)

In [None]:
hex(id(x))

<img src="../images/memory1.png" alt="" width="800"/>

In [None]:
del x

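As soon as no variable references an object any more (for example, after `del x` or after `x` has been rebound to another object), its memory can be reclaimed. CPython frees an object as soon as its reference count drops to zero, and the garbage collector additionally cleans up objects that only reference each other. (The <kbd>bool</kbd> objects `True` and `False` are an exception: they are singletons that exist for the whole lifetime of the interpreter.)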
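The following sketch makes this reclamation visible with a weak reference, which does not keep its target alive. It relies on CPython's reference counting (other implementations may free the object later). The class `Kangaroo` is a hypothetical example, needed because `bool`, `int`, and `str` objects cannot be weakly referenced.

```python
import weakref

class Kangaroo:
    """A minimal user-defined class; its instances support weak references."""

k = Kangaroo()
watcher = weakref.ref(k)   # a weak reference does not increase the reference count
print(watcher() is k)      # True: the object is still alive

del k                      # the last (strong) reference disappears
print(watcher() is None)   # True in CPython: the object has been freed
```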



Let's work through the phrase *Everything in Python is an object*. Some basic built-in objects (*types*) we have already met are 

- Numeric: <kbd>int</kbd>, <kbd>float</kbd>, <kbd>complex</kbd>
- Boolean: <kbd>bool</kbd>
- String: <kbd>str</kbd>
- Sequence: <kbd>list</kbd>, <kbd>tuple</kbd>, <kbd>range</kbd>
- Mapping: <kbd>dict</kbd>

The function `sys.getsizeof` ([docs](https://docs.python.org/3/library/sys.html?highlight=getsizeof#sys.getsizeof)) returns the size in bytes of the object the variable points to. 

In [None]:
x = float(True)

In [None]:
y = int(1)

In [None]:
import sys
sys.getsizeof(x)

In [None]:
sys.getsizeof(y)

A <kbd>float</kbd> takes less memory than an <kbd>int</kbd>. This is because an <kbd>int</kbd> has arbitrary precision and stores information about its size (its number of digits) together with the actual value. The larger the integer, the more memory is required. 

In [None]:
sys.getsizeof(100 ** 10)

In [None]:
sys.getsizeof(100.0 ** 10)
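To see the growth of <kbd>int</kbd> objects more systematically, we can measure a few powers of two. The exact byte counts are CPython- and platform-specific (typical 64-bit builds store an int in 30-bit "digits", see `sys.int_info`), so this sketch only checks that the size never shrinks and eventually grows:

```python
import sys

# Sizes of ints with increasingly many binary digits
values = [1, 2**30, 2**60, 2**90, 2**300]
sizes = [sys.getsizeof(v) for v in values]
for v, s in zip(values, sizes):
    print(f"{v.bit_length():4d} bits -> {s} bytes")

print(sizes == sorted(sizes) and sizes[0] < sizes[-1])  # True
```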

However, an <kbd>int</kbd> can store arbitrarily large values (limited only by the available memory), whereas a <kbd>float</kbd> is limited to about $1.8 \times 10^{308}$. 

In [None]:
x = 500 ** 500 
type(x)

In [None]:
x

In [None]:
sys.getsizeof(x)

In [None]:
float(x)  # OverflowError: int too large to convert to float

In [None]:
import sys
print(sys.float_info)

In [None]:
print(sys.int_info)
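Besides the range, there is also a difference in precision: `sys.float_info.mant_dig` reports the number of bits in the float mantissa (53 for the usual IEEE 754 double precision), so not every integer above $2^{53}$ can be represented exactly as a float. A small check:

```python
import sys

n = 2**53 + 1                    # the smallest positive integer a double cannot represent
print(sys.float_info.mant_dig)   # typically 53
print(float(n) == float(2**53))  # True: the +1 is lost when converting to float
print(n == 2**53)                # False: the int keeps the exact value
```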

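The function `range(start, stop, step)` ([docs](https://docs.python.org/3/library/stdtypes.html#range)) creates a <kbd>range</kbd> object representing the integers `start`, `start + step`, ..., up to but excluding `stop`. It does not instantiate a sequence of that length; it only stores `start`, `stop`, and `step`, so its size does not depend on its length. 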

In [None]:
x = range(0, 500**500)
sys.getsizeof(x)

In [None]:
sys.getsizeof(500**500)
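Because a `range` stores only `start`, `stop`, and `step`, its size is the same whatever its length, and its elements are computed on demand. A short sketch:

```python
import sys

small = range(0, 10)
huge = range(0, 500**500)
print(sys.getsizeof(small) == sys.getsizeof(huge))  # True: size does not depend on length
print(list(range(0, 10, 3)))                        # [0, 3, 6, 9]: stop itself is excluded
print(huge[123], 123 in huge)                       # 123 True: computed on demand
```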

A <kbd>tuple</kbd> is an ordered collection of values; think of coordinates. Tuples are immutable, which means they cannot be changed after they are created.

In [None]:
x = 1, 3.0, "horse" # parentheses are optional, but should be used for clarity 
x

In [None]:
type(x)

In [None]:
sys.getsizeof(x)

A tuple with one element is created with a trailing comma:

In [None]:
z = 1,   # or z = (1,)
type(z)
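It is the comma, not the parentheses, that makes a tuple. A quick comparison:

```python
not_a_tuple = (1)     # parentheses alone only group an expression
a_tuple = (1,)        # the trailing comma creates a one-element tuple
empty = ()            # the empty tuple is the one case without a comma

print(type(not_a_tuple).__name__, type(a_tuple).__name__, len(empty))  # int tuple 0
```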

A <kbd>tuple</kbd> is immutable: once created, it can't be changed!

In [None]:
x[2] = 'horsies'  # TypeError: 'tuple' object does not support item assignment

In [None]:
try:
    x[2] = 'horsies'
except TypeError:  # catch only the expected error
    print('Tuples are immutable!')

This is a feature, not a shortcoming, of <kbd>tuple</kbd>. Since tuples can neither be changed nor appended to, they are more economical than lists. <kbd>list</kbd> is the mutable counterpart of <kbd>tuple</kbd>. Lists are created with square brackets. 
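A minimal check of the claim that tuples are more economical, comparing a tuple and a list with the same three elements. The byte counts depend on the Python version, so only the comparison is printed:

```python
import sys

as_tuple = (1, 3.0, "horse")
as_list = [1, 3.0, "horse"]
# Both store three references, but the list needs extra bookkeeping
# (and usually spare capacity) to support appending.
print(sys.getsizeof(as_tuple) < sys.getsizeof(as_list))  # True
```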

In [None]:
y = [1, 3.0, "horse"]
y

In [None]:
type(y)

In [None]:
sys.getsizeof(y)

Lists are mutable, and in particular appendable. Since these actions are allowed, <kbd>list</kbd> objects require more memory. Note that the value returned by `sys.getsizeof` does not include the sizes of the values stored in the list! Instead, `y` is a variable with a reference to a <kbd>list</kbd> object on the heap, *which itself is a collection of addresses*. This collection of addresses takes $120$ bytes here (the exact number depends on the Python version, because lists reserve extra room for future appends). 

In [None]:
sys.getsizeof(y)

In [None]:
sum([sys.getsizeof(i) for i in y])

In [None]:
sys.getsizeof(1) + sys.getsizeof(3.0) + sys.getsizeof("horse")

Unlike tuples, lists can be changed after they are created. 

In [None]:
y[2] = "horsies"
y
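Part of the extra memory of lists comes from *over-allocation*: when a list grows, CPython reserves room for several future elements at once, so `sys.getsizeof` only jumps now and then. The sketch below records the size after each `append`; the exact numbers are CPython-specific:

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

print(sizes)
# Far fewer distinct sizes than appends: most appends reuse spare capacity
print(len(set(sizes)) < len(sizes))  # True
```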

### Reference Semantics

Lists use *reference semantics*, which means that if you assign a list to two different variables, there's still only one list in memory, and both variables refer to it. As a result, changing the list with one variable changes the list for the other variable.

In [None]:
x = y

In [None]:
hex(id(x))

In [None]:
hex(id(y))

In [None]:
x[0] = "my"
y
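Reference semantics also applies to function arguments: a function that receives a list gets a reference to the same list object, so changes made through that reference are visible outside, without any `global` keyword. Rebinding the parameter name, however, only affects the local variable. The functions `add_animal` and `replace_animals` are hypothetical examples:

```python
def add_animal(animals):
    # animals refers to the caller's list object, not to a copy
    animals.append("wombat")

zoo = ["kangaroo"]
add_animal(zoo)
print(zoo)  # ['kangaroo', 'wombat']

def replace_animals(animals):
    # rebinding the local name does not affect the caller's list
    animals = ["koala"]

replace_animals(zoo)
print(zoo)  # still ['kangaroo', 'wombat']
```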

A new list object (a copy of `y`) can be created by slicing. 

In [None]:
z = y[:]

In [None]:
hex(id(z))

In [None]:
z

In [None]:
z[1] = 4

In [None]:
hex(id(z[1]))

In [None]:
hex(id(y[1]))

In [None]:
print(y)

In [None]:
y[1] = 0

In [None]:
print(z)

In [None]:
print(x)

<img src="../images/memory2.png" alt="" width="1000"/>

Alternatively, we can call the list's `copy()` method on the original list (see also the `copy` module, [docs](https://docs.python.org/3/library/copy.html)). 

In [None]:
z = y[:] # or even better:

In [None]:
z = y.copy()
hex(id(z))

In [None]:
hex(id(y))

While the copies `y` and `z` are *equal*, they are not *identical*, because they point to different objects. 

In [None]:
y == z # equal

In [None]:
y is z # identical

In [None]:
y is x # identical

In [None]:
y[1] = 2
print(y)
print(z) 

Attention! This is a *shallow copy*, i.e., the objects within the list are not copied; both lists reference the same element objects! Above, the command `y[1] = 2` merely replaces the reference stored at position `1` of `y` with a reference to an <kbd>int</kbd> object of value `2`; `z[1]` still refers to the old object. 

In [None]:
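print(hex(id(z[0])) == hex(id(y[0])))  # True: both lists share the same "my" object
print(hex(id(z[1])) == hex(id(y[1])))  # False: y[1] now refers to another object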

This becomes tricky if the list contains a reference to another list: 

In [None]:
a = ['a', 'list']

In [None]:
y = [1, 2, 'three', a]

In [None]:
print(y)

In [None]:
z = y.copy()

In [None]:
z[3][0] = 1

In [None]:
print(z)

In [None]:
print(y)

In [None]:
hex(id(y))

In [None]:
hex(id(z))

In [None]:
hex(id(y[3]))

In [None]:
hex(id(z[3]))

In [None]:
z[0] = 3

In [None]:
y

In [None]:
z

In [None]:
y[3][1] = 'ha'

In [None]:
print(y)
print(z)

In [None]:
hex(id(z[3])) == hex(id(y[3])) 

Although `y` and `z` are distinct list objects, their last elements reference the same inner list `a`, which has not been copied. 

This behaviour does not depend on the variable `a`. We can even remove `a` from the scope: since the list object that `a` pointed to is still referenced by `y` and `z`, it will not be reclaimed by the garbage collector. 

In [None]:
hex(id(a))

In [None]:
del a

In [None]:
hex(id(z[3]))

We can also copy the nested lists by calling `copy.deepcopy`, which recursively copies all objects contained in the list. 

In [None]:
from copy import deepcopy
z = deepcopy(y)

In [None]:
z

In [None]:
y

In [None]:
y[3][0] = "deepcopy"

In [None]:
print(y)
print(z)
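`deepcopy` also keeps track of the objects it has already copied (in a so-called memo dictionary). If the same inner list appears twice, the copy contains one new inner list that is referenced twice, so the sharing structure is preserved:

```python
from copy import deepcopy

inner = [1, 2]
outer = [inner, inner]          # the same inner list is referenced twice
clone = deepcopy(outer)

print(clone[0] is clone[1])     # True: the shared structure is preserved
print(clone[0] is inner)        # False: but the inner list itself was copied
```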

In [None]:
hex(id(x))

In [None]:
hex(id(y))

In [None]:
hex(id(y)) == hex(id(x))

In [None]:
hex(id(z[1]))

In [None]:
hex(id(y[1]))

After the change to `y[3]`, the deep copy `z` is not even *equal* to `y` any more, and it is of course not *identical* to it: 

In [None]:
y == z # equal

In [None]:
y is z # identical

In [None]:
# however, without a deepcopy
u = y
y[3][0] = 'hi'
print(u[3])

In [None]:
a = [1, 2, 3]
u = ['foo', a]
a[1] = 'bar'
print(u[1])

In [None]:
# in contrast
from copy import deepcopy
a = [1, 2, 3]
u = ['foo', a]
v = deepcopy(u)
a[1] = 'two'
u[1][0] = "asdf"
print(u)
print(v)

In [None]:
# how many levels?
from copy import deepcopy

b = [1, 2]
c = ['foo', b]
d = ['bar', c]

s = deepcopy(d)
b[0] = 'penguins'

print(d)
print(s)
# answer: all levels are copied, since deepcopy recurses into every nested list

### Interning 

In lower-level languages, heap memory has to be reserved and released by the programmer, which is tedious; in Python this is done automatically. To use memory efficiently, Python applies *interning*: certain immutable objects are created only once and then reused. Since a variable is merely a reference to an object, e.g., to the <kbd>int</kbd> object with value `1`, any number of variables can point to the same address.  

In [None]:
x = 3.37877

In [None]:
y = 3.37877

In [None]:
hex(id(x)) == hex(id(y))

In [None]:
x = 1

In [None]:
y = 1

In [None]:
hex(id(x)) == hex(id(y))

This does not mean that integers use reference semantics! 

In [None]:
print(x == y)
print(x is y)

In [None]:
y = 2
x

In [None]:
hex(id(x)) == hex(id(y))

In CPython, only the integers from `-5` to `256` are interned (cached in advance); this is an implementation detail, not a language guarantee. 

In [None]:
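x = 4.0
y = float("4.0")  # created at runtime; equal literals in one cell may share one constant
hex(id(x)) == hex(id(y))

In [None]:
x = 400
y = int("400")  # created at runtime; 400 is outside the small-integer cache
hex(id(x)) == hex(id(y))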
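Creating the integers at runtime with `int(...)` keeps the compiler from reusing constants within a cell and shows the boundary of CPython's small-integer cache. This is a sketch of CPython behaviour, not something the language guarantees:

```python
a, b = int("256"), int("256")
c, d = int("257"), int("257")
print(a is b)  # True: 256 is taken from the small-integer cache
print(c is d)  # False: 257 is created twice as separate objects
```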

Interning also applies to strings. Short strings that look like identifiers, such as `"Hi"`, are interned automatically by CPython: 

In [None]:
x = "Hi"
y = "Hi"

In [None]:
hex(id(x)) == hex(id(y))

Interning can be forced using `sys.intern`. 

In [None]:
a = "This is quite a long string."
b = " ".join(["This", "is", "quite", "a", "long", "string."])  # equal string, built at runtime
hex(id(a)) == hex(id(b))

In [None]:
a = "This is quite a long string."
b = a
hex(id(a)) == hex(id(b))

In [None]:
import sys
a = sys.intern("This is quite a long string.")
b = sys.intern("This is quite a long string.")
hex(id(a)) == hex(id(b))

In [None]:
c = "This is quite a long string."
hex(id(a)) == hex(id(c))

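With `sys.intern`, the string is stored in an interpreter-wide table of interned strings, and interning an equal string later returns the object from that table. Keep in mind that `id` values may be reused once an object no longer exists, so equal addresses across a `del` are not conclusive. 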

In [None]:
a = sys.intern("This is quite a long string.")
hex(id(a))

In [None]:
del a
b = sys.intern("This is quite a long string.")
hex(id(b))

For recurring data, interning lets us use the heap economically. 
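A sketch of this idea for a column of repeated labels, e.g. values read from a file. The hypothetical label `"category_A"` is built at runtime with `str.join` as a stand-in for reading data, so the equal labels start out as separate objects; after `sys.intern`, all of them share one object:

```python
import sys

# Hypothetical raw data: equal labels created as separate string objects
raw = ["".join(["c", "a", "t", "e", "g", "o", "r", "y", "_", "A"]) for _ in range(3)]
print(raw[0] == raw[1], raw[0] is raw[1])      # True False

labels = [sys.intern(s) for s in raw]
print(labels[0] is labels[1] is labels[2])     # True: one shared object
```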

### Summary 

- There is stack and heap memory
- All objects are stored on the heap
- Lists are versatile, but need more memory than tuples
- Optimize heap usage via interning

##### Type hinting

In [None]:
def upper_case(text: str) -> str:
    return text.upper()

In [None]:
my_text: str = "Hello world"
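Type hints are annotations for readers and for tools such as type checkers; Python itself does not enforce them at runtime. A small sketch (the function `double` is a hypothetical example) shows that the hints are stored in `__annotations__`, but an argument of the wrong type is not rejected by the interpreter:

```python
def double(x: int) -> int:
    # the hints document the expected types; they are not checked when the function runs
    return x * 2

print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double(21))              # 42
print(double("ab"))            # abab: the str argument is not rejected
```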