
<hr style="height: 2px; background: linear-gradient(to right, #E31B1D 50%, #00A4DD 50%);">

<div style="display: flex;">
    
<div style="width: 20%; text-align: left;">
    <img src="logos/hpc_logo.png" alt="HPC logo" width="100px">
</div>

<div style="width: 60%; text-align: center;">
    <strong><center><font size = "6">Python profiling</font></center></strong>
    <br>
    <strong><center><font size = "4">Python + HPC</font></center></strong>
</div>

<div style="width: 20%; text-align: right;display: flex; justify-content: center;align-items: center;">
    <img src="logos/unilu_logo.png" alt="University of Luxembourg logo" width="100px">
</div>
    
</div>

<hr style="height: 2px; background: linear-gradient(to right, #E31B1D 50%, #00A4DD 50%);">

By: **Oscar J. CASTRO-LOPEZ** (oscar.castro@uni.lu)

**University of Luxembourg | HPC | PCOG**

<hr>

## Table of Contents

0. [Workshop Overview](#workshopoverview)
1. [Introduction](#introduction)
2. [Use Case](#usecase)
3. [Timing Executions](#timingexecutions)
4. [Profiling with prun](#profilingprun)
5. [Line profiling with lprun](#profilinglprun)
6. [Memory profiling](#memoryprofiling)
7. [Conclusion](#conclusion)


## 0. Workshop Overview <a name="workshopoverview"></a>

Welcome to the **Python + HPC** workshop. In this hands-on session, we will dive into profiling Python code to identify performance bottlenecks and optimize your applications. By the end of this workshop, you will be equipped with essential profiling tools and techniques.


### Prerequisites 

Before we begin, please make sure you have the following:

- A basic understanding of Python programming.
- Jupyter Notebook installed and configured on your system (_preferably already installed on an HPC node_).
- Familiarity with Jupyter Notebook. 

### Agenda

1. **Introduction to Profiling**
2. **Timing executions**
3. **Code profiling %prun**
4. **Break**
5. **Line Profiling %lprun**
6. **Memory Profiling %mprun**
7. **Q&A and Closing Remarks**

### Workshop Key Goals
The primary objectives of this workshop are:

- To provide a basic understanding of Python code profiling and timing.
- To equip you with practical skills in profiling Python code.
- To explore a use case and discover bottlenecks.

### Getting Started

To get started with this workshop, follow these steps:

1. Clone or download the workshop materials from the [GitHub repository](https://github.com/ULHPC/python-school).
2. Open a terminal and navigate to the workshop directory.
3. Open this notebook (`1_Python_profiling.ipynb`) in your browser.

Let's get started!

<hr style="height: 2px; background: linear-gradient(to right, #E31B1D 50%, #00A4DD 50%);">

## 1. Introduction <a name="introduction"></a>

<center><strong><font size=5 style="color: red;">Why Profiling?</font></strong></center>

- **Identify Bottlenecks**: Profiling helps you identify which parts of your code are consuming the most time or other resources. This allows you to focus your optimization efforts where they will have the greatest impact.

- **Data-Driven Optimization**: Profiling provides concrete data about your code's performance. This data helps you make informed decisions about where to optimize based on actual measurements rather than guessing or intuition.

- **Prevent Premature Optimization**: Profiling helps you avoid the common pitfall of premature optimization, which can lead to code complexity and reduced maintainability. By profiling first, you can ensure that you're optimizing areas of the code that genuinely need it.

- **Prioritize Efforts**: When dealing with limited resources (time, budget, etc.), profiling helps you prioritize which parts of your code to optimize. You can focus on the critical sections that have the most significant impact on overall performance.

- **Avoid Over-Engineering**: Profiling helps you strike a balance between performance and readability/maintainability. Without profiling, you might over-engineer your code for performance, making it more complex than necessary.

- **Benchmark Improvements**: After making changes to your code, profiling allows you to measure the actual impact of those changes. This helps you verify that your optimizations are effective and didn't introduce new issues.

- **Continuous Improvement**: Profiling should be an ongoing process. As your codebase evolves, new bottlenecks may emerge, or the performance characteristics may change. Regular profiling ensures that your code continues to perform well over time.

- **Debugging**: Profilers often provide insights into unexpected behavior or errors in your code. You may discover unintended inefficiencies or even bugs that are only apparent when looking at performance data.

In summary, profiling is a crucial step in the optimization process. It provides a solid foundation for making informed decisions about how to improve your code's performance efficiently and effectively.

## 2. Use case KNN Classifier <a name="usecase"></a>

This implementation defines a KNNClassifier class with methods for fitting the model, calculating Euclidean distances, and making predictions. It uses a simple k-nearest neighbors approach to classify data points based on their proximity to training data.

**Note: Code generated by ChatGPT**

For the purpose of testing the profiler it is not necessary to fully understand the logic behind the code. However, if more detail is required you can check: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

### Install prerequisites: NumPy, line_profiler, memory_profiler, matplotlib

In [1]:
!pip install numpy line_profiler memory_profiler matplotlib

Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
     |████████████████████████████████| 14.8 MB 10.6 MB/s            
Collecting line_profiler
  Downloading line_profiler-4.1.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (669 kB)
     |████████████████████████████████| 669 kB 124.5 MB/s            
Collecting memory_profiler
  Downloading memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Collecting matplotlib
  Downloading matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5 MB)
     |████████████████████████████████| 11.5 MB 126.6 MB/s            
Collecting psutil
  Downloading psutil-5.9.6-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (283 kB)
     |████████████████████████████████| 283 kB 128.6 MB/s            
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.3.1-cp36-cp3

In [3]:
import numpy as np

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def euclidean_distance(self, x1, x2):
        diff = (x1 - x2)
        sqr_diff = diff ** 2
        sqr_diff_sum = np.sum(sqr_diff)
        return np.sqrt(sqr_diff_sum)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        # Calculate distances from the input point to all training points
        distances = [self.euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Return the most common class label among the k nearest neighbors
        most_common = np.bincount(k_nearest_labels).argmax()
        return most_common

The following code cell shows a basic example of using the KNNClassifier class.

In [4]:
# Create train data
X_train = np.array([[1, 2], [2, 3], [3, 4], [5, 6]])
y_train = np.array([0, 0, 1, 1])

# Create an object and pass data
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create test data
X_test = np.array([[2, 3], [4, 5]])

# Generate Predictions
predictions = knn.predict(X_test)
# Print result
print(predictions)  # Output: [0 1]

[0 1]


Create a bigger dataset with random values.

In [5]:
# Example with random data
rows = 100
cols = 50
np.random.seed(699)
X_train = np.random.rand(rows*cols).reshape((rows,cols))
y_train = np.random.randint(2, size=rows)
print(f'X_train shape {X_train.shape} - y_train shape {y_train.shape}')

X_train shape (100, 50) - y_train shape (100,)


Use the KNNClassifier and check how many are correct.

In [6]:
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create random indices to test
test_size = 10
X_test = np.random.randint(rows, size=test_size)

# Generate Predictions
predictions = knn.predict(X_train[X_test])
print(f'Prediction {predictions}')
print(f'Label      {y_train[X_test]}')
# Calculate the number of equal elements
print(f'correct {np.sum(y_train[X_test] == predictions)}')

Prediction [0 0 1 0 0 0 1 1 1 0]
Label      [0 1 1 0 1 0 1 1 1 0]
correct 8


Create an even bigger dataset with random values.

In [7]:
# Example with bigger-random data
rows = 1000
cols = 50
np.random.seed(699)
X_train = np.random.rand(rows*cols).reshape((rows,cols))
y_train = np.random.randint(2, size=rows)
print(f'X_train shape {X_train.shape} - y_train shape {y_train.shape}')

X_train shape (1000, 50) - y_train shape (1000,)


Create the KNNClassifier.

In [8]:
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create random indices to test
test_size = 100
X_test = np.random.randint(rows, size=test_size)

## 3. Timing executions <a name="timingexecutions"></a>

In this section we measure the elapsed time of generating predictions.

### Taking time inside the code

There are several ways to measure the execution time in Python. For example:

- `time.time()`: returns the current time in seconds since the epoch as a floating-point number. The difference between two calls gives the total elapsed time.
    - This value is often referred to as "wall-clock time" or "real time."
    - It includes the time spent in sleeping or waiting for I/O operations, making it suitable for measuring the total elapsed time for a program or a specific task.
- `time.process_time()`: returns the current CPU time used by the current process in seconds, as a floating-point number. 
    - This value represents the amount of CPU time consumed by your program and excludes time spent in sleep or waiting for I/O.

In [20]:
import time
import statistics

# start_time = time.time()
# time.sleep(2.4)
# end_time = time.time()
# elapsed_time = end_time - start_time
# print(f"Elapsed time: {elapsed_time} seconds")

num_iterations = 10  # Change this to the desired number of iterations
elapsed_times = {}

for i in range(num_iterations):
    start_time = time.time()
    time.sleep(2.4)
    end_time = time.time()
    elapsed_time = end_time - start_time
    elapsed_times[i] = elapsed_time

# Calculate mean and standard deviation
mean_time = statistics.mean(elapsed_times.values())
std_deviation = statistics.stdev(elapsed_times.values())

# print("Elapsed times:", elapsed_times)
print(f"Mean elapsed time: {mean_time} seconds")
print(f"Standard deviation of elapsed time: {std_deviation} seconds")

Mean elapsed time: 2.4021880865097045 seconds
Standard deviation of elapsed time: 0.00013346687532447528 seconds


In [10]:
import time
start_cpu_time = time.process_time()
time.sleep(2.4)
end_cpu_time = time.process_time()
elapsed_cpu_time = end_cpu_time - start_cpu_time
print(f"CPU time used: {elapsed_cpu_time} seconds")

CPU time used: 0.0011552599999999913 seconds
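The difference between the two clocks is easiest to see side by side. The following is a minimal sketch, not part of the original notebook: the helper `measure` and the function `busy_sum` are hypothetical names, and `time.perf_counter()` (a standard-library clock intended for measuring short durations, not discussed above) is used here for the wall-clock reading.

```python
import time


def measure(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock reading
    cpu_start = time.process_time()    # CPU time of this process
    result = func(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, wall_elapsed, cpu_elapsed


def busy_sum(n):
    """CPU-bound work: the interpreter is busy the whole time."""
    return sum(i * i for i in range(n))


# Sleeping consumes wall-clock time but almost no CPU time.
_, wall_sleep, cpu_sleep = measure(time.sleep, 0.2)
# CPU-bound work consumes both.
_, wall_busy, cpu_busy = measure(busy_sum, 200_000)

print(f"sleep: wall={wall_sleep:.3f} s  cpu={cpu_sleep:.3f} s")
print(f"busy : wall={wall_busy:.3f} s  cpu={cpu_busy:.3f} s")
```

For the sleeping call the CPU time should be close to zero, while for the CPU-bound call the two values should be similar on an otherwise idle machine.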


- **`timeit` module**: a built-in library that provides a simple way to measure the execution time of small code snippets.
    - It has both a command-line interface and a callable one.
    - Measures wall-clock time.
    - For more details: https://docs.python.org/3/library/timeit.html

In [11]:
import timeit

code_to_measure = """
result = sum(range(1000))
"""

time_taken = timeit.timeit(code_to_measure, number=10000)

print(f"Time taken: {time_taken} seconds")

Time taken: 0.1314708001445979 seconds
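Note that `timeit.timeit` returns the total time for all `number` executions, not the time per call. As a small sketch (the function `sum_range` and the list `values` are illustrative names, not part of the workshop code), the snippet can also be passed as a callable, or as a string that sees existing names through the `globals` argument:

```python
import timeit


def sum_range():
    return sum(range(1000))


number = 10_000

# Passing a callable avoids writing the code as a string.
total = timeit.timeit(sum_range, number=number)
per_call = total / number  # timeit returns the total, so divide by `number`
print(f"Total: {total:.4f} s, per call: {per_call * 1e6:.2f} µs")

# A string statement can access existing objects via `globals`.
values = list(range(1000))
total_str = timeit.timeit("sum(values)", globals={"values": values}, number=number)
print(f"Total (string statement): {total_str:.4f} s")
```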


- **`datetime` module**: measures the elapsed time in hours:minutes:seconds format.
    - Measures wall-clock time (total elapsed time).

In [13]:
import datetime
start_time = datetime.datetime.now()
time.sleep(63)
end_time = datetime.datetime.now()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time}")

Elapsed time: 0:01:03.063159
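Subtracting two `datetime` objects yields a `datetime.timedelta`. If a plain number of seconds is needed, for example to compare with `time.time()` measurements, `timedelta.total_seconds()` can be used. The sketch below uses hypothetical, hard-coded timestamps (chosen to reproduce the elapsed time printed above) instead of actually sleeping:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps; the 63.063159 s gap mirrors the output above.
start_time = datetime(2024, 1, 1, 12, 0, 0)
end_time = start_time + timedelta(seconds=63, microseconds=63159)

elapsed = end_time - start_time      # a datetime.timedelta object
print(elapsed)                       # 0:01:03.063159
print(elapsed.total_seconds())       # 63.063159
```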


### Take the elapsed time of running a cell

`%time` is used to measure the execution time of a single statement or expression in a code cell.

In [14]:
%time predictions = knn.predict(X_train[X_test])

CPU times: user 741 ms, sys: 0 ns, total: 741 ms
Wall time: 742 ms


`%timeit` performs a more thorough timing analysis. It runs the specified statement multiple times and computes statistics such as the average, best, and worst execution times; the printed summary shows the mean and standard deviation.

In [15]:
%timeit predictions = knn.predict(X_train[X_test])

730 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


For a cell with multiple statements, use the cell magic `%%timeit` on the first line of the cell:

In [16]:
%%timeit
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create random indices to test
test_size = 100
X_test = np.random.randint(rows, size=test_size)
predictions = knn.predict(X_train[X_test])

738 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


To time the cell with a single repetition and a single loop, use the `-r` and `-n` options:
- `-n`: how many times to execute the statement in each run (loops per run).
- `-r`: how many times to repeat the run.

In [21]:
%%timeit -r 1 -n 1
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create random indices to test
test_size = 100
X_test = np.random.randint(rows, size=test_size)
predictions = knn.predict(X_train[X_test])

748 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


Time units used in the output:
- Second (s)
- Millisecond (ms): 1 second = 1,000 milliseconds
- Microsecond (&micro;s): 1 second = 1,000,000 microseconds
- Nanosecond (ns): 1 second = 1,000,000,000 nanoseconds
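The roles of `-n` and `-r` can be reproduced outside of IPython with `timeit.repeat`. This is only a rough sketch of the idea: the function `work` is a hypothetical stand-in for the code being timed, and the use of the population standard deviation (so that a single run yields 0, as in the `-r 1` output above) is an assumption about how the summary is computed.

```python
import statistics
import timeit


def work():
    return sum(range(10_000))


number = 100  # plays the role of -n: executions per run
repeat = 5    # plays the role of -r: number of runs

# timeit.repeat returns one total time per run.
totals = timeit.repeat(work, number=number, repeat=repeat)
per_loop = [t / number for t in totals]

mean = statistics.mean(per_loop)
stdev = statistics.pstdev(per_loop)  # assumption: population std. dev.

print(f"{mean * 1e6:.1f} µs ± {stdev * 1e6:.1f} µs per loop "
      f"(mean ± std. dev. of {repeat} runs, {number} loops each)")
```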

Save the result of **timeit** with the command `%timeit -o my_code(args)`. For example:

In [22]:
predict_elapsed_time = %timeit -o knn.predict(X_train[X_test])

733 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [23]:
print('Best', predict_elapsed_time.best)
print('Average', predict_elapsed_time.average)
print('Standard Deviation', predict_elapsed_time.stdev)
print('Worst', predict_elapsed_time.worst)

Best 0.731269588926807
Average 0.733383246730747
Standard Deviation 0.001720338706617552
Worst 0.7364832749590278


## 4. Profiling with prun <a name="profilingprun"></a>

Python contains a built-in code profiler, `cProfile` (described in the Python documentation), but IPython offers a much more convenient way to use it: the magic function `%prun`.


```python
%prun my_function()
```
`%prun` runs the statement under `cProfile` and collects data on how much time is spent in each function or method called within `my_function`.

Magic commands are special commands that help you run and analyze code in your notebook. They add functionality that is not straightforward to achieve with plain Python code or the Jupyter Notebook interface.

Magic commands are easy to spot in the code. They are preceded by `%` when they apply to a single line (line magics) or by `%%` when they apply to the whole cell (cell magics).

For more detail on magic commands: 

https://ipython.readthedocs.io/en/stable/interactive/magics.html


In [24]:
%prun predictions = knn.predict(X_train[X_test])

 

The output consists of the following columns:
- ncalls: number of times the function or method was called.
- tottime: total time spent in the function, excluding time spent in the subfunctions it calls (in seconds).
- percall: average time per call based on tottime (tottime / ncalls).
- cumtime: cumulative time spent in the function, including all the subfunctions it calls (in seconds).
- percall: average time per call based on cumtime (cumtime / number of primitive, i.e. non-recursive, calls).
- filename:lineno(function): provides information about the function or method, indicates the file, line number, and name.
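The same table can be produced outside of IPython with the standard-library modules `cProfile` and `pstats`. The following is a minimal sketch; `inner` and `outer` are hypothetical functions standing in for the code under study, and sorting by cumulative time is just one possible choice.

```python
import cProfile
import io
import pstats


def inner(n):
    return sum(i * i for i in range(n))


def outer():
    return [inner(1_000) for _ in range(200)]


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Write the report into a string, sorted by cumulative time,
# and limited to the first 10 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)

report = stream.getvalue()
print(report)
```

Sorting by `"cumulative"` tends to surface the high-level functions that dominate the run time, while sorting by `"tottime"` highlights the functions whose own code is expensive.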

## 5. Line profiling with lprun <a name="profilinglprun"></a>

For a more detailed and granular analysis of code performance, as well as a clear and comprehensible report, you can employ the `lprun` profiler. Unlike some other profilers that provide high-level insights, `lprun` delves into the code at the line-by-line level, offering a meticulous examination of your code's execution. 

This can be particularly valuable when you need to pinpoint specific bottlenecks or areas for optimization within your codebase. By generating a line-by-line report, `lprun` allows you to identify exactly where the most time-consuming operations occur, aiding in the optimization process and enabling you to achieve optimal code performance.

The first step is to load the `line_profiler` extension. Then we call the profiler as follows:

`%lprun -f function_to_profile code_to_run(arg)`


In [25]:
%load_ext line_profiler

In the following line we profile `knn.predict` function while we call the code `predictions = knn.predict(X_train[X_test])`.

In [31]:
%lprun -f knn.predict predictions = knn.predict(X_train[X_test])

The output of `%lprun` contains the following columns:

- **Line #**: line number of the code being profiled.
- **Hits**: how many times each line was executed.
- **Time**: total time spent executing the line, expressed in the timer unit printed at the top of the report (for example, `Timer unit: 1e-06 s` means microseconds).
- **Per Hit**: average time spent on each execution of the line, in the same timer unit.
- **% Time**: percentage of the function's total execution time spent on each line.
- **Line Contents**: the actual code of the line being profiled.

If we want to inspect functions called inside `knn.predict(X_train[X_test])` then we can specify another function. 

For example in the following command we profile `knn._predict` while we call `knn.predict(X_train[X_test])`. With this approach we can profile all functions called inside our code to track bottlenecks.

In [27]:
%lprun -f knn._predict predictions = knn.predict(X_train[X_test])

In [33]:
%lprun -f knn.euclidean_distance predictions = knn.predict(X_train[X_test])

### Profiling with the command line
- Add the `@profile` decorator to the function you want to profile.
- Run `kernprof -l myfile.py`. This generates a `myfile.py.lprof` file.
- Run `python -m line_profiler myfile.py.lprof` to display the results.
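The cells that follow profile a file named `test_lprun.py` whose contents are not shown in this notebook. Judging from the profiler listing, it presumably resembles the sketch below. The `try`/`except` fallback and the final call to `test()` are assumptions added here so that the file also runs as a plain script; `kernprof -l` normally makes `profile` available at run time, and adding the fallback shifts the line numbers relative to the listing.

```python
# test_lprun.py (reconstructed sketch, not the original file)

try:
    profile  # made available by `kernprof -l` at run time
except NameError:
    def profile(func):
        """No-op stand-in so the file also runs without kernprof."""
        return func


@profile
def test():
    A = 1 * [100]
    B = 2 * [200]
    C = 3 * [300]
    D = 4 * [400]
    print('hello world')


test()
```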

In [29]:
!kernprof -l test_lprun.py

hello world
Wrote profile results to test_lprun.py.lprof
Inspect results with:
python -m line_profiler -rmt "test_lprun.py.lprof"


In [30]:
!python -m line_profiler test_lprun.py.lprof

Timer unit: 1e-06 s

Total time: 1.806e-05 s
File: test_lprun.py
Function: test at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           @profile
     2                                           def test():
     3         1          1.0      1.0      5.4      A = 1 * [100]
     4         1          0.3      0.3      1.9      B = 2 * [200]
     5         1          0.2      0.2      1.3      C = 3 * [300]
     6         1          0.3      0.3      1.5      D = 4 * [400]
     7         1         16.2     16.2     89.9      print('hello world')



## 6. Memory profiling with `memit` and `mprun` <a name="memoryprofiling"></a>

The `memory_profiler` package profiles the memory usage of Python code on a line-by-line basis. It is particularly useful for identifying memory leaks or memory-hungry parts of the code.

First, load the package extension using the `%load_ext memory_profiler` magic command. 

In [34]:
%load_ext memory_profiler

The `memit` command measures the memory usage of a single statement.

In [35]:
%memit predictions = knn.predict(X_train[X_test])

peak memory: 64.83 MiB, increment: 0.07 MiB


The output of `memit` is limited to the peak memory and the increment. For a more detailed line-by-line report we must use `mprun`. However, `mprun` requires the code to be stored in a file. In the following cell we use the magic command `%%file filename.py` at the beginning of the cell, which saves the contents of the cell to `filename.py`.

In [36]:
%%file knn_demo.py

import numpy as np
from memory_profiler import profile

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
        

    def euclidean_distance(self, x1, x2):
        diff = (x1 - x2)
        sqr_diff = diff ** 2
        sqr_diff_sum = np.sum(sqr_diff)
        return np.sqrt(sqr_diff_sum)

    
    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    def _predict(self, x):
        # Calculate distances from the input point to all training points
        distances = [self.euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Return the most common class label among the k nearest neighbors
        most_common = np.bincount(k_nearest_labels).argmax()
        return most_common

Overwriting knn_demo.py


After the file is saved, we need to import the class from it. Additionally, we instantiate the KNNClassifier and create a test vector before profiling the code.

In [37]:
from knn_demo import KNNClassifier
knn = KNNClassifier(k=2)
knn.fit(X_train, y_train)

# Create random indices to test
test_size = 100
X_test = np.random.randint(rows, size=test_size)

Similar to the `lprun` syntax, to use `mprun` in a Jupyter notebook we need to use the command:

`%mprun -f function_to_profile code_to_run(arg)` 

In [38]:
%mprun -f knn._predict knn.predict(X_train[X_test])




In the `%mprun` output, each row provides detailed information about memory usage and code execution. The columns in the output serve the following purposes:

1. Line Number: This is the first column and indicates the line number in the code where memory measurements were taken.

2. Mem Usage (MiB): The second column, expressed in mebibytes (MiB), displays the memory usage of the Python interpreter after that line has been executed.

3. Increment (MiB): In the third column, the increment indicates the change in memory usage compared to the previous line. Positive values represent memory allocation, while negative values indicate memory deallocation.

4. Occurrences: This fourth column reveals how many times a particular line of code was executed. It provides insights into code behavior and repetition.

5. Code Content: The fifth column contains the actual code present on the line.

Measurements are shown in MiB (mebibytes), but you can convert them to MB (megabytes) using the conversion factor 1 MiB = 1.048576 MB.
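As a quick worked example of this conversion, the sketch below converts the peak value reported by `%memit` earlier in this notebook. The helper name `mib_to_mb` is illustrative only.

```python
MIB_IN_BYTES = 2 ** 20   # 1 MiB = 1,048,576 bytes
MB_IN_BYTES = 10 ** 6    # 1 MB  = 1,000,000 bytes


def mib_to_mb(mib):
    """Convert mebibytes to megabytes."""
    return mib * MIB_IN_BYTES / MB_IN_BYTES


peak_mib = 64.83  # peak memory reported by %memit in this notebook
print(f"{peak_mib} MiB = {mib_to_mb(peak_mib):.2f} MB")  # 64.83 MiB = 67.98 MB
```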

We can also profile multiple functions at the same time:

In [39]:
%mprun -f knn.fit -f knn.predict -f knn._predict -f knn.euclidean_distance knn.predict(X_train[X_test])




<strong><font size=2 style="color: red;">Why does the output show increments of 0 MiB?</font></strong>
- Probably because the variables involved are too small to register in the report.
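A back-of-the-envelope calculation supports this explanation. The sketch below assumes 8 bytes per element (`float64` rows from `np.random.rand`, and 64-bit integers for a large `np.arange` array, which is the default on most Linux systems but is platform-dependent). The reports in this notebook print one decimal place, so anything well below 0.1 MiB shows up as 0.0.

```python
BYTES_PER_ELEMENT = 8   # assumption: float64 / int64 elements
MIB = 2 ** 20           # bytes in one MiB

cols = 50  # row length of X_train in this notebook

# Each temporary array created in euclidean_distance holds one row.
row_mib = cols * BYTES_PER_ELEMENT / MIB
print(f"One {cols}-element row: {row_mib:.6f} MiB")

# A much larger array, e.g. cols * 10,000,000 elements, is clearly visible.
big_elements = cols * 10_000_000
big_mib = big_elements * BYTES_PER_ELEMENT / MIB
print(f"One {big_elements:,}-element array: {big_mib:.1f} MiB")
```

With these assumptions a single row takes roughly 0.0004 MiB, far below the resolution of the report, whereas an array of 500 million elements takes several thousand MiB.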

Run the following profiling command only for the `euclidean_distance` function, increasing the size of the input arrays.

In [40]:
%mprun -f knn.euclidean_distance knn.euclidean_distance(np.arange(cols*10000000), np.arange(cols*10000000))




### A "simple" code snippet to profile with `mprun` and different ways to use it

There are different ways to use `mprun`; we have already shown one using Jupyter notebooks. We can also run the memory profiler from the command line.

Let's start by creating another file that contains a function named `sum_of_lists()` and it is called at the end of the file.

The main difference is that we need to import the following:
```python
from memory_profiler import profile
```
We also need to add the `@profile` decorator to the functions that we want to profile.

In [41]:
%%file delete_me.py

from memory_profiler import profile
import numpy as np

@profile
def sum_of_lists(N):
    a = [0] * (N*10)
    b = [1] * (N*20)
    c = [1.0] * (N*30)
    d = ['A'] * (N*40)
    e = np.arange(N*10)
    f = [j ^ 3 for j in range(N)]
    total = 0
    for i in range(10):
        L = [j ^ 3 for j in range(N)]
        total += sum(L)
        del L # remove reference to L
    del a
    del b
    del c
    del d
    del e
    del f
    return total

sum_of_lists(100000)

Writing delete_me.py


We can profile with `python -m memory_profiler file_to_profile.py`.

In [42]:
!python -m memory_profiler delete_me.py

Filename: delete_me.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     5     53.8 MiB     53.8 MiB           1   @profile
     6                                         def sum_of_lists(N):
     7     61.2 MiB      7.4 MiB           1       a = [0] * (N*10)
     8     76.6 MiB     15.5 MiB           1       b = [1] * (N*20)
     9     99.3 MiB     22.7 MiB           1       c = [1.0] * (N*30)
    10    130.0 MiB     30.7 MiB           1       d = ['A'] * (N*40)
    11    137.5 MiB      7.5 MiB           1       e = np.arange(N*10)
    12    141.5 MiB      4.0 MiB      100003       f = [j ^ 3 for j in range(N)]
    13    141.5 MiB      0.0 MiB           1       total = 0
    14    142.3 MiB      0.0 MiB          11       for i in range(10):
    15    145.3 MiB -1419220.0 MiB     1000030           L = [j ^ 3 for j in range(N)]
    16    145.3 MiB      0.0 MiB          10           total += sum(L)
    17    142.3 MiB    -30.0 MiB          10           de

Or we can profile with `!mprof run file_to_profile.py`.

In [43]:
!mprof run delete_me.py

mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Filename: delete_me.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     5     54.0 MiB     54.0 MiB           1   @profile
     6                                         def sum_of_lists(N):
     7     61.4 MiB      7.4 MiB           1       a = [0] * (N*10)
     8     76.9 MiB     15.5 MiB           1       b = [1] * (N*20)
     9     99.6 MiB     22.7 MiB           1       c = [1.0] * (N*30)
    10    130.2 MiB     30.7 MiB           1       d = ['A'] * (N*40)
    11    137.7 MiB      7.5 MiB           1       e = np.arange(N*10)
    12    141.7 MiB      4.0 MiB      100003       f = [j ^ 3 for j in range(N)]
    13    141.7 MiB      0.0 MiB           1       total = 0
    14    142.5 MiB      0.0 MiB          11       for i in range(10):
    15    145.5 MiB -1422322.7 MiB     1000030           L = [j ^ 3 for j in range(N)]
    16    145.5 MiB      0.0 MiB          10           tot

We can also plot the recorded memory usage (by default, the last one).

In [46]:
!mprof plot -o mem_usage_plot.png

Using last profile data.


Output of `mprof plot`:

![image](mem_usage_plot.png "mprof output")

<strong><font size=4 style="color: red;">Why does the output of `mprof` not show the same numbers when allocating and deallocating memory?</font></strong>

Memory allocation and deallocation in Python are not as straightforward as they may seem due to:
- Garbage collection
- Memory fragmentation
- Optimizations by Python
- Operating system and Hardware

Because of these factors, you may see varying memory usage changes when creating and deleting variables in Python. 

Python's memory management is dynamic and complex, and the exact behavior can vary depending on many factors.
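One way to separate these effects is to look only at the memory that Python itself has allocated. The sketch below uses the standard-library `tracemalloc` module, which is not covered in this workshop and is swapped in here purely for illustration: it tracks allocations made by the Python interpreter, whereas `memory_profiler` reports the memory of the whole process as seen by the operating system. The list size is an arbitrary choice.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# About 8 MB of list slots on a 64-bit CPython build.
data = [0] * 1_000_000
after_alloc, _ = tracemalloc.get_traced_memory()

del data  # drop the only reference to the list
after_del, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"allocated by Python after creating the list: {(after_alloc - baseline) / 2**20:.2f} MiB")
print(f"allocated by Python after deleting the list: {(after_del - baseline) / 2**20:.2f} MiB")
print(f"peak traced memory: {peak / 2**20:.2f} MiB")
```

At the Python level the memory is released as soon as the list is deleted. The process-level numbers shown by `mprof` may not drop by the same amount, because the allocator and the operating system decide when memory is actually returned.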

## 7. Conclusions / Take away <a name="conclusion"></a>

Profiling is a valuable skill for any Python developer, and it can lead to more efficient, faster, and higher-quality code. It's a tool for diagnosing and solving performance issues and should be part of the toolkit of every Python programmer.

In this tutorial, we've covered the following key areas:
- Understanding profiling tools.
- Identifying bottlenecks.
- Debugging skills.

**Note**: The reviewed tools work best for sequential applications (non-parallel).

--------
If you need to reload libraries: run the following commands once so that modified libraries are reloaded automatically. Otherwise, you would have to restart the notebook kernel each time you want to reload a library.

In [45]:
%load_ext autoreload
%autoreload 2