# Notebook 1 - Measuring resource usage and profiling of Python code
----------------------------------------------


## Table of Contents <a id='toc'></a>

0. **[Introduction - meet the code](#0)**  
   <br>
   
1. **[Timing](#2)**  
    1.1 [Timing a single object](#1.1)  
    1.2 [Timing a set of lines](#1.2)  
    1.3 [Profiling](#1.3)  
    <br>
    
2. **[Measuring RAM usage](#3)**  
    2.1 [Line-by-line memory](#2.1)  
    2.2 [Time-based memory usage](#2.2)  
    2.3 [Getting the size of a single object](#2.3)  
    <br>


<br>
<br>

## Introduction - meet the code <a id='0'></a>
---------------------------------------

The first step of any code optimization process should be measuring what your code is doing, in order to pinpoint where your effort should be focused.

In [None]:
# Load the IPython `autoreload` extension.
%load_ext autoreload
%autoreload 2

<br>

In this notebook we will mostly focus on a simple function which computes pairwise distances between a set of vectors (rows of a 2-dimensional matrix), a very classical operation present in many data analysis methods.

In [None]:
def pairwise_distance(x):
    """Compute pairwise Euclidean distances between the rows of the input matrix `x`.
    
    Arguments:
        x: a 2-dimensional numpy array (matrix) of numbers, or any nested sequence
           of sequences that all have the same length.
    """
    
    # Create a square matrix whose size is the number of vectors (rows) in the
    # input matrix.
    # This matrix will be used to store the Euclidean distances between each pair
    # of vectors (rows of the matrix).
    num_vectors = len(x)          # Number of rows of the input matrix.
    num_measurements = len(x[0])  # Number of columns of the input matrix.
    distance_matrix = [[0]*num_vectors for _ in range(num_vectors)]
    
    # Loop over all possible combinations of vectors (rows of the input matrix).
    for i in range(num_vectors):
        for j in range(num_vectors):
            
            # Compute the squared distances between all elements of vectors (rows) "i" and "j".
            d = []
            for k in range(num_measurements):
                d.append((x[i][k] - x[j][k]) ** 2)
            
            # Euclidean distance between vectors (rows) "i" and "j". This is
            # computed as the square root of the sum of squared distances between
            # individual elements of the vectors.
            distance_matrix[i][j] = sum(d) ** 0.5

    return distance_matrix

<br>

As a test dataset, let's generate a 200 x 100 matrix filled with random data.

In [None]:
import numpy as np

num_vector = 200    # Number of rows.
num_measures = 100  # Number of columns.

data = np.random.uniform(size=(num_vector, num_measures))
print(data)
print(data.shape)

<br>

Let's also generate some random nucleotide data that we will need for another example, later on.
We store these nucleotide sequences in files that we create on-disk: `data/large_file.fas` and `data/medium_file.fas`.  
Notes:
* The cell can take 1-2 minutes to run.
* **`%%time`** is an
  **[IPython "magic" command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#)**
  that measures and displays the time needed to run a cell.  
  More specifically, it will give the following values:
  * CPU time: time during which the CPU was in use.
  * Wall time: actual time it took to run the cell (includes things such as time spent reading/writing to disk).
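
The difference between CPU time and wall time can be illustrated outside of IPython with the standard `time` module. The sketch below is only an illustration: the helper name `cpu_and_wall_time` is ours (hypothetical), and it assumes that `time.process_time()` is a fair stand-in for the "CPU time" reported by `%%time`.

```python
import time


def cpu_and_wall_time(func):
    """Run func() once and return (cpu_seconds, wall_seconds).

    Hypothetical helper, for illustration only.
    """
    cpu_start = time.process_time()   # CPU time used by this process.
    wall_start = time.perf_counter()  # Elapsed "real" time.
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu, wall


# Sleeping (like waiting for a disk) uses almost no CPU:
# the wall time is much larger than the CPU time.
cpu_sleep, wall_sleep = cpu_and_wall_time(lambda: time.sleep(0.2))
print(f"sleep:   CPU {cpu_sleep:.3f}s, wall {wall_sleep:.3f}s")

# A pure computation keeps the CPU busy: both values are usually close.
cpu_busy, wall_busy = cpu_and_wall_time(lambda: sum(i * i for i in range(200_000)))
print(f"compute: CPU {cpu_busy:.3f}s, wall {wall_busy:.3f}s")
```

A large gap between the two values usually means the code spends its time waiting (disk, network, sleep) rather than computing.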

In [None]:
%%time

import numpy as np
import os

# Create a "data" directory, if not already present.
if not os.path.isdir("data"):
    os.mkdir("data")

# 1'000'000 random sequences of 500 nucleotides.
with open("data/large_file.fas", "w") as OUT:
    for i in range(1000000):
        print(">seq{}".format(i), file=OUT)
        s = "".join(np.random.choice(list("ATGC"), size=500))
        print(s, file=OUT)

# 500 random sequences of 500 nucleotides.
with open("data/medium_file.fas", "w") as OUT :
    for i in range(500):
        print(">seq{}".format(i), file=OUT)
        s = "".join(np.random.choice(list("ATGC"), size=500))
        print(s, file=OUT)


<br>
<br>

[Back to ToC](#toc)

# 1. Measuring time usage <a id='2'></a>


## 1.1 Timing a single object <a id='1.1'></a>

In your terminal, you can measure the time taken to run a Python script using:

* On **Linux** and **Mac OS**.
```sh
time python my_script.py
```

* On **Windows**, the following should work in Windows PowerShell (but not in the older `cmd` shell).
```sh
Measure-Command {start-process command-to-benchmark -Wait}
```

<br>

Or, by using **[IPython "magic" commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html#)**:
* **`%time`**: measures the time to run a line of code.
* **`%%time`**: measures the time to run an entire cell.

<div class="alert alert-block alert-warning">
    
> **Warning:** IPython magic commands only work in Jupyter-Notebook (or other IPython compatible shells).
  They will not work if you try to run them in a classic python interpreter.

</div>
    

**Examples:**
* Measuring execution time of **a single line**:

In [None]:
%time D = pairwise_distance(data)

* Measuring execution time of **an entire Jupyter cell**:

In [None]:
%%time
# Applies on the whole cell.

D = pairwise_distance(data)

<br>

That is all nice, but **there is always a bit of variability between runs**, which becomes very apparent on lines (or cells) that have a smaller execution time.  
* To illustrate this, let's try to run the `pairwise_distance()` on a small 100 x 10 matrix.

In [None]:
small_data = np.random.uniform(size=(100, 10))

%time D = pairwise_distance(small_data)

<br>

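To solve this problem, we have the **`timeit`** module, which runs a statement many times.
From the command line, it takes the statement to benchmark (not a script file), plus an optional setup statement given with `-s`.
Here `my_module` and `my_function` are placeholder names:

```sh
python -m timeit -s "from my_module import my_function" "my_function()"
```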

Or the equivalent magic command:
* **`%timeit`**: benchmarks the time needed to execute a line of code. `%timeit` carries out multiple
  replicates of the line to benchmark and averages the results.
* By default, `timeit` will perform a number of replicates that it deems *optimal*, but
  the number of **repeats** and **loops** can also be manually specified:
  * **`-n`:** number of loops, i.e. the number of times the code is run in a given repeat/run.
    The benchmark time is then averaged over all `n` loops.
  * **`-r`:** number of runs/repeats. The benchmark is repeated `r` times and only the best run/repeat is kept.
    This helps discard runs that were slow for some reason external to the code (e.g.
    the machine was busy doing something else at the same time).
  * Example: `%timeit -n 2 -r 10`, 2 loops and 10 repeats.  

<br>

* **`%%timeit`** is essentially the same as `%timeit`, but applies to an entire cell.

In [None]:
%timeit D = pairwise_distance(small_data)

<br>

Using the **`-r`** and **`-n`** options, the number of runs and loops to perform can be manually specified.
* `%timeit` will perform `-r` **repeats** of `-n` loops each. The repeat with the best execution time is kept.

In [None]:
%timeit -n 2 -r 10 D = pairwise_distance(small_data)
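
Outside of IPython, the standard-library `timeit` module exposes the same two knobs under the names `number` (loops) and `repeat` (repeats). The following is a minimal sketch, in which the function `work` is a hypothetical stand-in for the code to benchmark:

```python
import timeit


def work():
    """Hypothetical stand-in for the code to benchmark."""
    return sum(i * i for i in range(1000))


loops = 100   # Equivalent of "-n": calls per repeat.
repeats = 5   # Equivalent of "-r": number of repeats.

# timeit.repeat() returns one total time per repeat (each covering `loops` calls).
raw_timings = timeit.repeat(work, number=loops, repeat=repeats)

# Divide by the number of loops to get a time per call for each repeat.
per_call = [t / loops for t in raw_timings]

best = min(per_call)
mean = sum(per_call) / len(per_call)
print(f"best: {best * 1e6:.1f} us per call, mean: {mean * 1e6:.1f} us per call")
```

Comparing `best` and `mean` over a few executions gives a feel for how much the external noise affects each of the two summaries.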

<div class="alert alert-block alert-success">
    
**Question:** why does `%timeit` take the "best" out of `r` repeats (rather than the average)?

</div>

<br>

OK, this is already nice: we will definitely be using it to compare different implementations.  
For example, I have rewritten the function using **`numpy`**:

In [None]:
# Here is the old version for comparison:
def pairwise_distance(X):

    num_vectors = len(X)
    num_measurements = len(X[0])
    D = [[0]*num_vectors for x in range(num_vectors)]
    
    for i in range(num_vectors):
        for j in range(num_vectors):
            d = []
            for k in range(num_measurements):
                d.append((X[i][k] - X[j][k]) ** 2)
            
            D[i][j] = sum(d) ** 0.5
    return(D)

def pairwise_distance_numpy(X):

    num_vectors = X.shape[0]
    num_measurements = X.shape[1] 
    D = np.empty((num_vectors, num_vectors), dtype=np.float64)
    
    for i in range(num_vectors):
        for j in range(num_vectors):
            d = np.square(np.subtract(X[i], X[j]))
            D[i, j] = np.sqrt(np.sum(d))
    return(D)

<br>

**Let's run a benchmark** on these 2 implementations using a **100 x 100 matrix of random values**.

In [None]:
data = np.random.uniform(size=(100, 100))

print("native python:")
%timeit -n 5 -r 3 D = pairwise_distance(data)

print("numpy:")
%timeit -n 5 -r 3 D = pairwise_distance_numpy(data)

Nice! 

**Trick**: adding the option **`-o`** to the `%timeit` command lets you save the results to a variable (`timeit_res` in the example below):

In [None]:
timeit_res = %timeit -n 5 -r 3 -o D = pairwise_distance_numpy(data)
print("average:",timeit_res.average , "standard-dev", timeit_res.stdev )

That is a neat trick to help us investigate how execution time evolves with the data size:

In [None]:
%%time
times = []
N = []

num_measures = 10
num_vector_list = range(10, 200, 10)

for num_vector in num_vector_list:
    data = np.random.uniform(size=(num_vector,num_measures))
    
    # Here we also use the "-q" option to suppress the text output.
    timeit_res = %timeit -n 2 -r 7 -o -q D = pairwise_distance_numpy(data)
    times.extend(timeit_res.timings)
    N.extend( [num_vector]*len(timeit_res.timings) )


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(figsize = (8, 8))
ax = fig.add_subplot(111)

sns.lineplot(x = N, y = times, ax = ax)

ax.set_aspect(1.0/ax.get_data_ratio(), adjustable="box")

plt.xlabel("number of vectors", size=18)
plt.ylabel("time(s)", size=18)
plt.show()

<div class="alert alert-block alert-success">

**Question:** What do you think about the shape of this curve? Was this expected?

</div>

<br>

[Back to ToC](#toc)

## 1.2 Timing a set of lines <a id='1.2'></a>


Sometimes you will want to measure the execution time of a particular step in your code, but it may not be easy to isolate it for use with `time` or `timeit`.

For instance, consider the following code which reads a [FASTA file](https://en.wikipedia.org/wiki/FASTA_format) and computes the [GC content](https://en.wikipedia.org/wiki/GC-content) of each sequence:

In [None]:
%%time
GC=[]

with open("data/large_file.fas", "r") as IN :
    for l in IN:
        if not l.startswith(">"):
            GC.append((l.count("C") + l.count("G")) * 100 / len(l.strip()))


<br>

The code takes about 4.5s to complete, but how much of this is file reading, and how much is GC content computation?
* Here we do not really want to re-write the code to have neatly separated steps to apply `%time` on.
  Additionally, we may not be able to store all the sequences in memory.
* In these situations, the **`time()`** function from module **`time`** (so, **`time.time()`**) is quite useful.

In [None]:
import time
time.time()

It returns the time (in seconds) since the Epoch, which is 00:00:00 UTC on 1 January 1970 on most systems.

> **Epoch**, as defined in [wikipedia](https://en.wikipedia.org/wiki/Epoch_(computing)):  
> In computing, an epoch is a date and time from which a computer measures system time. Most computer systems determine time as a number representing the seconds removed from particular arbitrary date and time. For instance, Unix and POSIX measure time as the number of seconds that have passed since Thursday 1 January 1970 00:00:00 UT, a point in time known as the Unix epoch.  
Windows NT systems, up to and including Windows 11 and Windows Server 2022, measure time as the number of 100-nanosecond intervals that have passed since 1 January 1601 00:00:00 UTC, making that point in time the epoch for those systems.

Applied to our code:

In [None]:
GC_time = 0

start = time.time()  # Record time at the start of the block to benchmark.
with open("data/large_file.fas", "r") as IN:
    GC = []
    for l in IN:
        if not l.startswith(">"):
            t1 = time.time()   # Record time at the start of the block to benchmark.
            GC.append((l.count("C") + l.count("G")) * 100 / len(l.strip()))
            t2 = time.time()   # Record time at the end of the block to benchmark.
            GC_time += t2 - t1 # Compute elapsed time.

stop = time.time()         # Record time at the end of the block to benchmark.
total_time = stop - start  # Compute elapsed time.

print("Total: {:.3f}s".format(total_time))
print("  GC%: {:.3f}s".format(GC_time))
print(" read: {:.3f}s".format(total_time - GC_time))
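
For measuring short intervals, the standard library also provides `time.perf_counter()`, a monotonic clock with the highest available resolution. Below is a minimal sketch of the same accumulation pattern using it. To stay self-contained, it reads two made-up sequences from an in-memory `io.StringIO` object instead of `data/large_file.fas`.

```python
import io
import time

# Tiny in-memory FASTA "file" (made-up sequences, for illustration only).
fasta = io.StringIO(">seq0\nATGCGC\n>seq1\nAATTGG\n")

gc_values = []
gc_time = 0.0

start = time.perf_counter()  # Start of the whole block.
for line in fasta:
    if not line.startswith(">"):
        seq = line.strip()
        t1 = time.perf_counter()  # Start of the GC% computation.
        gc_values.append((seq.count("G") + seq.count("C")) * 100 / len(seq))
        gc_time += time.perf_counter() - t1  # Accumulate elapsed time.
total_time = time.perf_counter() - start

print("Total: {:.6f}s".format(total_time))
print("  GC%: {:.6f}s".format(gc_time))
print(" read: {:.6f}s".format(total_time - gc_time))
```

Keep in mind that the two clock calls per sequence have a cost of their own, so very short steps can look slower than they really are.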

<br> 

[Back to ToC](#toc)

## 1.3 Profiling <a id='1.3'></a>


Most of the time your code is more complex than a single function, and before optimizing you first want to see which function you should optimize. That is when **profiling** comes in handy.

For instance, consider the following code, which reads sequences from a FASTA file, sorts them by GC content, then computes a matrix of pairwise similarities between all sequences, and finally writes this matrix to a file.

In [None]:

def read_fasta(filename):
    """Reads a FASTA file and returns its sequences as a dictionary."""
    Dseq={}
    curseq = ""
    cur = ""
    with open(filename, "r") as IN:
        for l in IN:
            if l.startswith(">"):
                if cur != "":
                    Dseq[cur] = curseq
                cur = l[1:].strip()
                curseq = ""
            else:
                curseq += l.strip()
                
        if cur != "":
            Dseq[cur] = curseq
            
    return Dseq
            

def computeGC( seq ):
    """Takes a sequence of nucleotides (str) and compute its
    GC percentage (float).
    """
    gc = 0
    for i in range(len(seq)):
        if seq[i] in "GC":
            gc += 1
    return 100 * gc / len(seq)

def computeGC_dict(Dseq):
    """Takes a dictionary containing sequences as values 
    and compute a dictionary containing their GC%.
    """
    Dgc = {}
    for k in Dseq:
        Dgc[k] = computeGC(Dseq[k])
    return Dgc

def compute_sequence_similarity(seqA, seqB):
    """Compute similarity between 2 sequences as the fraction
    of positions where they have the same value.
    """
    l = len(seqA)
    similar = 0
    for i in range(l):
        if seqA[i] == seqB[i]:
            similar += 1
    return similar / l



def main_script(input_filename, output_filename):
    
    # Step 1: read fasta
    Dseq = read_fasta(input_filename)
    
    # Step 2: compute GC%
    Dgc = computeGC_dict( Dseq )
    
    # Step 3: sort by GC%.
    ordered_seq = sorted(Dgc.keys(), key = lambda x: Dgc[x])
    
    # Step 4: compute pairwise similarity matrix.
    sim = np.zeros((len(Dseq), len(Dseq)))
    for i, s1 in enumerate(ordered_seq):
        for j, s2 in enumerate(ordered_seq):
            sim[i, j] = compute_sequence_similarity(Dseq[s1], Dseq[s2])

    # Step 5: write the matrix.
    with open(output_filename, "w") as OUT:
        print(",".join(ordered_seq), file=OUT)
        for i in range(len(ordered_seq)):
            print( *(sim[i]), sep=",", file=OUT)

    
    

Now, if you have an eye for it, it looks like most of these functions could be rewritten to be faster.

For instance, the function to compute the GC%:

In [None]:
def computeGC_better( seq ):
    """Takes a sequence (str) and computes its GC% (float)."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)

seq = "ATGC" * 5000
%timeit -n 100 -r 10 computeGC(seq)
%timeit -n 100 -r 10 computeGC_better(seq)

That is a commendable speedup!  
But, considering that coding time is a finite resource, where should we start? Where is our effort best spent?

We recommend using **[cProfile](https://docs.python.org/3/library/profile.html)**.  
In the terminal:

```sh
python -m cProfile -o profile.log myScript.py
```

will execute the script and profile the time usage of every function.
* `-o`: output file for the profiling log.
* `-s cumtime`: sorts the report by cumulative time. This option only applies when the report is printed
  to the terminal, i.e. when `-o` is not given (`python -m cProfile -s cumtime myScript.py`).

Then to interpret the output log from cProfile, we recommend the
**[snakeviz](https://jiffyclub.github.io/snakeviz)** library.
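
A profiling log can also be inspected without any extra dependency, using the standard-library `pstats` module. The sketch below is only an illustration: it profiles a small made-up workload (`slow_square_sum` and `workload` are hypothetical names), writes the log to a temporary file, then reads it back and prints the entries sorted by cumulative time. With a log produced from the terminal, `pstats.Stats("profile.log")` would be the starting point.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_square_sum(n):
    """Toy function standing in for real work (hypothetical example)."""
    return sum(i * i for i in range(n))


def workload():
    return [slow_square_sum(20_000) for _ in range(20)]


# Profile the workload.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save the profile to a file, as "python -m cProfile -o" would.
    log_path = os.path.join(tmp_dir, "profile.log")
    profiler.dump_stats(log_path)

    # Read the log back and write a report sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(log_path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(10)

report = buffer.getvalue()
print(report)
```

The report uses the same `ncalls`, `tottime` and `cumtime` columns as the `%prun` output.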

Otherwise, in Jupyter:

In [None]:
# IMPORTANT: keep single quotes around the file names/path arguments, otherwise
#            it won't run on Windows.

%prun -l 30 -s cumtime  main_script('data/medium_file.fas', 'test.out')
# The %prun magic command activates profiling
#  -l 30: limits the report to 30 lines
#  -s cumtime: sort by decreasing cumtime


The columns correspond to:
 * `ncalls`: for the **number of calls**.
 * `tottime`: for the **total time** spent in the given function (excluding time spent in calls to sub-functions).
 * `percall`: is the quotient of tottime divided by ncalls.
 * `cumtime`: is the **cumulative time spent in this and all subfunctions** (from invocation till exit).
   This figure is accurate even for recursive functions.
 * `percall`: is the quotient of cumtime divided by primitive calls.


<br>

<div class="alert alert-block alert-success">

### Micro-Exercise 1

* Look at the profile. Where should optimization efforts go first?
  What would be the effect of using our better implementation of the GC% computing function?

</div>

<br>
<br>

We now have the tools to help us diagnose which part of our code takes the most time. But, before we move on to optimization, let's see what would happen if we launched our code on a larger dataset:


In [None]:
%time main_script('data/large_file.fas', 'test.out')

Here the problem is not the execution time, but the RAM (random access memory) usage. 

While time is a somewhat flexible constraint (it is always possible to wait a bit longer), memory is a hard limit: you either make your code less memory-hungry, or you move to another computer...  
Let's focus on memory now.

<br>
<br>

[Back to ToC](#toc)


# 2. Measuring RAM usage <a id='3'></a>

## 2.1 Line-by-line memory <a id='2.1'></a>

To measure the memory footprint of your code, a nice tool is **[memory-profiler](https://pypi.org/project/memory-profiler)**.

If you haven't already, you can install it with:

```python
!pip install --user memory_profiler
```

Basically, in your code you add a **decorator** to the function of interest:

In [None]:
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    time.sleep(1)
    del b
    return a

Then, you may either run it from the command line:

```sh
python -m memory_profiler example.py
```

or use the Jupyter magic command:

In [None]:
%load_ext memory_profiler

In [None]:
%memit _= my_func()

It gives us a result such as: `peak memory: 210.19 MiB, increment: 152.58 MiB`

However, the line-by-line report of `@profile` does not work when the function is defined in the notebook itself. Let's move it to a separate script (here `utils.py`) and import it:

In [None]:
from utils import my_func

_ = my_func()

Now, that is quite nice: we have a rundown of the RAM usage, line by line.

Let's see how that works for our `pairwise_distance` function:

In [None]:
%%writefile tmp.py

import numpy as np
from memory_profiler import profile

# The memory increments are fairly small here, so we set the precision higher.
@profile(precision=3)
def pairwise_distance_profile(X):

    num_vectors = len(X)
    num_measurements = len(X[0])
    D = [[0]*num_vectors for x in range(num_vectors)]
    
    for i in range(num_vectors):
        for j in range(num_vectors):
            d = []
            for k in range(num_measurements):
                d.append((X[i][k] - X[j][k]) ** 2)
            
            D[i][j] = sum(d) ** 0.5
    return(D)


# The precision parameter does not work in jupyter notebooks :-( 
# so I integrate the test to the script.
num_vector = 100
num_measures = 10

data = np.random.uniform(size=(num_vector,num_measures))
_ = pairwise_distance_profile(data) 

In [None]:
%%time
!python tmp.py

> NB : the precision parameter does not work in jupyter notebooks, so here the tests are directly integrated to the script.

That is super neat, but note that this also took a while: tracking all of this memory creates a lot of overhead.

Compare with the version without `@profile`:

In [None]:
num_vector = 100
num_measures = 10
data = np.random.uniform(size=(num_vector,num_measures))

%time _=pairwise_distance(data)

That is a big slowdown: about x100. 
For scripts with longer execution times, profiling memory at such a fine grain can become prohibitive.

Let's explore an alternative with less overhead.

<br>

[Back to ToC](#toc)

## 2.2 Time-based memory usage <a id='2.2'></a>

[**mprof**](https://pypi.org/project/memory-profiler) is an executable which lets you monitor the memory usage of any script over time (you can install mprof with `pip` or `conda`).

It comes packaged with `memory_profiler` and allows some nice integration: 
it uses `@profile` to annotate its report and plot.

**However,** this only works if you **don't import memory_profiler in the script**... otherwise it falls back to line-by-line profiling.

On the previous script:

In [None]:
%%writefile tmp.py

import numpy as np
# from memory_profiler import profile --> do not import this

@profile(precision=3)
def pairwise_distance_profile(X):

    num_vectors = len(X)
    num_measurements = len(X[0])
    D = [[0]*num_vectors for x in range(num_vectors)]
    
    for i in range(num_vectors):
        for j in range(num_vectors):
            d = []
            for k in range(num_measurements):
                d.append( ( X[i][k] - X[j][k] )**2 )
            
            D[i][j] = sum(d) **0.5
    return(D)


num_vector = 500
num_measures = 10

data = np.random.uniform(size=(num_vector,num_measures))

_=pairwise_distance_profile(data)
 

Then we run this script with mprof:

In [None]:
!time mprof run tmp.py


This generated a file with a name like `mprofile_20220711084326.dat`.

We can generate a plot of this profile:

In [None]:
!mprof plot -o tmp.png

Finally, we load this image in the jupyter notebook:

In [None]:
from IPython.display import Image
Image("tmp.png")

Here we clearly see the initial memory loading (linked to the importation of libraries and generation of the data), and the function of interest is clearly marked. 
We can see it leads to an increase in RAM usage of about 8 MiB.

Also note that the overhead is much lower:

In [None]:
num_vector = 500
num_measures = 10
data = np.random.uniform(size=(num_vector,num_measures))

%time _=pairwise_distance(data)

1.5s, vs. 2.3s with `mprof`: much more reasonable.

* `mprof` is very useful to explore and pinpoint memory spikes, especially since it
  **works with all executables** and not only Python scripts.
* You can increase the granularity of the report using the `--interval` parameter (default: 0.1s).
* `mprof` also has a mode designed to monitor executables using multiprocessing.



<br>

[Back to ToC](#toc)

## 2.3 Getting the size of a single object <a id='2.3'></a>

Last but not least, when you know your code well enough, you can often point to the precise object that accounts for most of its RAM usage.

In the case of the script computing distances between sequences in a FASTA file, the error message pinpointed the problematic line:

```python
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
File <timed eval>:1, in <module>

Input In [59], in main_script(input_filename, output_filename)
     57 ordered_seq = sorted(Dgc.keys() , key=lambda x: Dgc[x])
     59 # step 4 : compute pairwise distance matrix
---> 60 sim = np.zeros((len(Dseq), len(Dseq)))
     61 for i, s1 in enumerate(ordered_seq):
     62     for j, s2 in enumerate(ordered_seq):

MemoryError: Unable to allocate 7.28 TiB for an array with shape (1000000, 1000000) and data type float64
```

So here there is no need for advanced tools: the problem is this square matrix.
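
The 7.28 TiB reported in the error message can be checked by hand: a `float64` value takes 8 bytes, so an N x N matrix needs N * N * 8 bytes. A minimal sketch of this arithmetic (the helper name `square_matrix_bytes` is ours, for illustration only):

```python
def square_matrix_bytes(n, itemsize=8):
    """Bytes needed to store the values of an n x n matrix.

    Hypothetical helper: itemsize=8 corresponds to float64 values.
    """
    return n * n * itemsize


TIB = 1024 ** 4  # Number of bytes in one TiB.

size_in_bytes = square_matrix_bytes(1_000_000)
size_in_tib = size_in_bytes / TIB
print("{:.2f} TiB".format(size_in_tib))
```

This only counts the values themselves, which is the dominant term for a large array.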

We can investigate the memory needed by an object in memory with `sys.getsizeof`.

In [None]:
import sys

a = 0.5 # a simple float
print("size of a float:" , sys.getsizeof(a))

b ="abcdef" # a simple string
print("size of a str:" , sys.getsizeof(b))


The reported size is in bytes. To get KiB, divide by 1024. 

To get MiB, divide by 1024*1024.

In [None]:
N = 1000
c = np.zeros((N,N)) #NxN matrix 
print("size of a {}x{} matrix: {:.2f} MiB".format(N,N, sys.getsizeof(c)/(1024*1024)) )

A little *caveat* though. Consider the following code: 

In [None]:
import sys

a = 0.5 # a simple float
print("size of a float:" , sys.getsizeof(a))

b = [np.random.random() for i in range(10)]
print("size of a list of 10 floats:" , sys.getsizeof(b))

See anything strange? 

If a single `float` is 24 bytes, then how can a list of 10 floats be less than 10*24=240 bytes?

<br>

This is because `getsizeof` only accounts for the memory of the object itself: it does not follow references to other objects. 
In practice, that means it under-reports the size of containers.
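
A small sketch makes this visible: two lists holding a single string each report the same size, whatever the length of the string they refer to (the variable names are ours, for illustration only).

```python
import sys

short_item = "a"
long_item = "a" * 1_000_000  # A string of one million characters.

list_short = [short_item]
list_long = [long_item]

# The list only stores a reference to its element,
# so both lists have the same reported size.
print("list with short string:", sys.getsizeof(list_short))
print("list with long string: ", sys.getsizeof(list_long))

# The size of the element itself has to be queried separately.
print("long string alone:     ", sys.getsizeof(long_item))
```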

The official documentation points to this function if you want to get the total size of an object, including everything it contains or refers to:

In [None]:
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """Returns the approximate memory footprint of an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}
    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)

In [None]:
print("total size of a list of 10 floats:" , total_size(b))

So, `getsizeof` returned 184, which is the size of the list itself. Add to this the 10 floats (10*24 bytes), and you get
$184+24*10=424$.
It works!

Note that for NumPy arrays such as `c` this does not change anything, because the array stores its values directly in its own data buffer rather than as references to separate Python objects.

In [None]:
print("array getsizeof :", sys.getsizeof(c))
print("array total_size:", total_size(c))

<br>

<div class="alert alert-block alert-success">

### Micro-Exercise 2

* Find out which is the largest square matrix your RAM can reasonably accommodate.
    (NB: you can do this by testing it, or deducing it from the size of smaller objects)
* **Additional task (if you have time):** how could we modify the `main_script` to make it less memory hungry?

</div>


<br>
<br>
<br>

[Back to ToC](#toc)

# Additional material
------------------------------

This section will not be covered in the course, but you can read it on your own if you are interested.

## Annex 1 - Kmeans implementation profiling

Imagine you have a script implementing a Kmeans algorithm. 
Here are the functions which look like the best candidates for optimization:
* `computeDistanceToCentroid`: computes the distance between a point and a centroid.
* `computeNearestCentroid`: computes which centroid is the closest to each point (it actually
  calls `computeDistanceToCentroid`, but also performs some other potentially costly computations).
* `computeCentroids`: computes the position of the centroids of the points, given a cluster assignment.

Are they really the best candidates? Which one should we go for first?


In [None]:
# Generating some random data.

def generateCluster(n, means, sds):
    P = np.random.randn(len(means), n)
    for i in range(len(means)):
        P[i,] = P[i,] * sds[i] + means[i]
    
    return P


clusterSizes = [4000, 2000, 4000, 4000, 2000]
clusterMeans = [ [0, -2], [3, 3], [-1, 3], [-5, 0], [5,-1]]
clusterSDs = [[0.5,1], [1,0.5], [0.5,0.5], [2,1], [1,1]]
C = []
A = []
for i in range(len(clusterSizes)):
    C.append(generateCluster(clusterSizes[i], clusterMeans[i], clusterSDs[i]))
    A += [i]*clusterSizes[i]
Points = np.concatenate(C, axis=1)
realAssignment = np.array(A)


<br>

Run the time profiling with the `%prun` IPython magic command:
* `-l 30`: limits the report to 30 lines.
* `-s cumtime`: sort output lines by decreasing cumtime (cumulative time).

In [None]:
from Kmeans import Kmeans

# Performing Kmeans.
k = 5
%prun -l 30 -s cumtime  kmeanAssignment = Kmeans(Points, k, maxNbRounds=1000)