In [1]:
import copy
import sys

# Memory references

Python is often described as a 'dynamically typed language' with 'duck-typing'. These descriptions allude to aspects of the Python VM design, which we will look at in more detail here.

## The short answer

In Python, every time a variable appears, it is not a box that holds a value. Every name in Python is a __memory reference__. This means that the variable points to a piece of memory, and that the object itself lives at that memory location.

This may seem like an academic observation, but it has important ramifications. Python, being a high-level language, deliberately abstracts these concepts away from the user, whilst also giving relevant tools to dig into those aspects and use them (in this sense, the language and VM are actually pretty well designed, although by no means perfect).

Consider the following variables defined and manipulated:

In [2]:
a = [1,2,3]  # we define a list 
b = a  # we assign a variable b to the variable a. This is an 'assignment by reference', not 'by value'
b[1] = 10  # we manipulate b in some way
print(b)  # we see the effect on b
print(a)  # and we also see the effect on a

[1, 10, 3]
[1, 10, 3]


We assigned b to a "by reference". This means that a and b both refer to the same piece of memory, so manipulations on the memory referred to by b also affect what a sees. We can get round this by allocating a new piece of memory and transferring the values into it:

In [3]:
b = copy.copy(a)  # under the hood, this makes a new list in memory and copies the elements over (a shallow copy)
b[1] = 100  # we manipulate b in some way
print(b)  # we see the effect on b
print(a)  # we see there is no effect on a

[1, 100, 3]
[1, 10, 3]


Now a and b refer to different pieces of memory, so manipulations on the memory referred to by b have no effect on the memory referred to by a.
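
One caveat is worth a sketch: `copy.copy` makes a shallow copy, so only the outer list is new. If the list contains other mutable objects, those are still shared, and `copy.deepcopy` is needed to copy them as well. The names `nested`, `shallow` and `deep` below are purely illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]       # a list whose elements are themselves lists
shallow = copy.copy(nested)     # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)    # new outer list and new inner lists

shallow[0][0] = 99              # mutates an inner list that nested also refers to
deep[1][1] = -1                 # only touches the deep copy

print(nested)                   # [[99, 2], [3, 4]]
print(shallow)                  # [[99, 2], [3, 4]]
print(deep)                     # [[1, 2], [3, -1]]
print(shallow[0] is nested[0])  # True: the same inner object
print(deep[0] is nested[0])     # False: an independent copy
```

For a flat list of numbers, as in the cell above, a shallow copy is enough; the distinction only matters once the elements are themselves mutable.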

This does raise the question of managing memory, which is usually abstracted away from the user. In CPython, each object in memory has a 'reference count', the number of references currently pointing to it:

In [4]:
c = [5,6,7]
print(sys.getrefcount(c))  # this will count the actual ref, c , as well as a temp arg ref under the hood.
d = c  # assign another reference
print(sys.getrefcount(c))  # the reference count increases by 1

2
3


In [5]:
# we can directly see the memory each variable refers to, and they are indeed the same
print(hex(id(c)))
print(hex(id(d)))

0x7f3eb72b9300
0x7f3eb72b9300
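
Comparing `id` values by eye is what the `is` operator does for you, whereas `==` compares contents. A minimal sketch, using the illustrative names `first`, `alias` and `twin` so as not to disturb `c` and `d`:

```python
first = [5, 6, 7]
alias = first          # a second reference to the same list
twin = [5, 6, 7]       # a separate list that happens to hold equal values

print(first is alias)            # True: one object, two names
print(first == twin)             # True: equal contents
print(first is twin)             # False: two distinct objects in memory
print(id(first) == id(alias))    # True: same identity
```

Treating `id` as a memory address is a CPython implementation detail; the language only guarantees that it is unique among objects alive at the same time.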


In [6]:
# updating d to a new object means it now refers to a different piece of memory; 
# the memory pointed by c has its reference count go down by 1
d = [8,9,10]
print(hex(id(c)))  # same as before
print(hex(id(d)))  # a new address
print(sys.getrefcount(c))  # ref count has gone down by 1

0x7f3eb72b9300
0x7f3eb72af7c0
2


Under the hood, Python keeps a reference count for every object allocated within the Python VM. When an object's reference count reaches 0, CPython frees its memory automatically; a separate __garbage collector__ additionally cleans up groups of objects that only refer to each other. These actions are abstracted away from the user.
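
The release of memory can be observed indirectly. The sketch below assumes CPython and uses `weakref.finalize` to record when an object is actually destroyed; the class `Node` and the list `events` are hypothetical helpers (a plain list cannot be weakly referenced, hence the small class). It also shows the one case where counting alone is not enough: two objects that refer to each other.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class; instances can be weakly referenced."""
    def __init__(self):
        self.other = None

events = []  # records which objects have been destroyed, in order

# Case 1: no cycle. Dropping the last reference frees the object at once.
n = Node()
weakref.finalize(n, events.append, "plain node freed")
del n
print(events)        # ['plain node freed'] on CPython

# Case 2: a reference cycle. The two counts never reach zero by themselves.
was_enabled = gc.isenabled()
gc.disable()         # keep the automatic cycle collector out of the way
a = Node()
b = Node()
a.other = b
b.other = a
weakref.finalize(a, events.append, "cycle freed")
del a, b
print(events)        # unchanged: the cycle is still alive

gc.collect()         # the cycle collector finds and frees the unreachable pair
if was_enabled:
    gc.enable()
print(events)        # ['plain node freed', 'cycle freed']
```

In normal use the cycle collector runs periodically without being asked; it is disabled here only to make the order of events deterministic.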

As a side note, when memory is not released because reference counts never reach zero (usually due to a coding error that keeps variables, objects or references around), the Python VM running the code experiences a __memory leak__.

## The involved answer

Python tries to handle all the memory management for you. The other extreme is C++98, where references, as opposed to variables or objects, are declared explicitly. Since everything is declared explicitly, and the code translates directly to the memory layout of the compiled program at runtime, C++98 is considered a 'statically typed language'.

An additional kind of object, called a __pointer__, is also used; it holds the address of a specific piece of memory. It gives a way of navigating around memory by incrementing or decrementing the pointer (pointers don't appear in Python, so we will not go into them here).

In [7]:
%%script false  --no-raise-error  
// C++ code; this will not run in Jupyter
int i = 3;  // this variable is defined on the stack
int &ref = i;  // A reference (or alias) for i
int *ptr = &i;  // A pointer to variable i (or stores address of i)

In C++, memory can also be allocated explicitly with `new`, and that memory lives on the __heap__. In Python, since everything is a reference, implicitly all memory used for objects is on the __heap__.

In [8]:
%%script false  --no-raise-error
// C++ code; this will not run in Jupyter
// These three lines are effectively what is happening in Python when running "ref = 3"
int *ptr = new int;  // initialise a pointer and allocate some memory to it
int &ref = *ptr;  // assign a reference to the pointer and allocated memory
ref = 3;  // assign the value
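
Seen from the Python side, this means that even a plain integer is a full object with a type, an identity and a size of its own. A small sketch; the exact address and size are CPython implementation details, so no values are shown for them, and the name `other` is illustrative.

```python
import sys

ref = 3                      # the name ref is bound to an int object
other = ref                  # a second name for the same object

print(type(ref))             # <class 'int'>
print(other is ref)          # True: both names refer to one object
print(hex(id(ref)))          # the object's address in CPython; varies between runs
print(sys.getsizeof(ref))    # size of the whole object in bytes, not just the raw value
```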

In the same way that memory is allocated manually, it must also be freed manually to avoid memory leaks:

In [9]:
%%script false  --no-raise-error
// C++ code; this will not run in Jupyter
int *ptr = new int;  // initialise a pointer and allocate some memory to it
int &ref = *ptr;  // assign a reference to the pointer and allocated memory
delete ptr;  // releases the memory. Using ref or dereferencing ptr after this is undefined behaviour and can trash memory (e.g. SEGFAULT)!!!

This `delete` operation is, in effect, what the Python VM runs on your behalf when a reference count reaches zero. Whilst C++98 gives you the freedom to play directly with memory, it gives a lot of rope to hang yourself with, and yields an intense debugging experience working out how memory is being manipulated incorrectly. Common problems are references and pointers referring to incorrect or out-of-date memory, or dereferencing a NULL pointer (a default 'zero' value that does not point to any real memory).

## The broken answer
Java is an intermediate example, and has more in common with Python. Java has a garbage collector, and every object in Java is accessed through a reference (primitive types such as `int` are the exception). Garbage collection in a JVM serves the same purpose as in a Python VM, although the mechanism differs: rather than counting references, typical JVM garbage collectors free objects that are no longer reachable from the running program. The syntax for declaring variables and allocating memory is similar to C++98, and as such, Java is often considered a 'statically typed language'. The main difference in syntax is that Java, as a language, does not have pointers.

In [10]:
%%script false  --no-raise-error
// Java code; this will not run in Jupyter
Integer num1 = 10;
Integer num2 = new Integer(20);  // explicit allocation; deprecated since Java 9 in favour of Integer.valueOf(20)
Integer num3 = null;

Despite not having pointers, the language supports a `null` value. Since, in Java, all objects are accessed through references, a reference can be null, i.e. refer to no object at all. Calling a method on num3 in Java code will, at runtime, yield a NullPointerException.
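
For comparison, a name in Python can refer to `None` in much the same way, and the mistake also only surfaces at runtime, as an `AttributeError` rather than a NullPointerException. A minimal sketch; `bit_length` is simply a method that exists on `int` but not on `None`, and `outcome` is an illustrative name.

```python
num3 = None                      # the name num3 refers to the None object

try:
    num3.bit_length()            # valid on an int, but None has no such method
    outcome = "no error"
except AttributeError as exc:
    outcome = type(exc).__name__
    print(exc)                   # 'NoneType' object has no attribute 'bit_length'

print(outcome)                   # AttributeError
```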

The syntax of Java gives the impression the language is statically typed. However, after compilation and at runtime in the JVM, all objects are references open to NullPointerExceptions, and so behave more like dynamically typed objects.