* * *
<pre> INSEA            <i> Dimensionality Reduction Techniques - 2025 </i></pre>
* * *


<h1 align="center"> TP 1: Introduction to Python for data science </h1>

<pre align="left"> Monday, November 24, 2025            <i> Author: Hicham Janati </i></pre>
* * *



# How to follow this lab:

- The goal is to **understand AND retain in the long term**: resist copy-pasting, prefer typing manually.
- Getting stuck while programming is completely normal: search online, read the documentation, or ask an *AI* assistant.
- When prompting the AI, you must be specific. Explain that your goal is to learn, not to get an instant solution no matter what. Ask for short, explained answers with alternatives.
- **NEVER ASK THE AI TO PRODUCE MORE THAN ONE LINE OF CODE!**
- Adopt the `Solve-It` method: always try to solve a question or predict the output of code *before* running it. Learning happens when you confirm your understanding—and even more when you’re wrong and surprised.


# I - Python basics


## 1 - Variables, names and identity
In Python, variable names are references to objects that live in memory. The _identity_ of an object refers to its unique location in memory.

### 1.1 The identity of a variable:
#### Question 1: Predict the output of:

In [None]:
a = [1, 2, 3]
b = a

print(f"a = {a}, b = {b}")

print(f"a == b ? {a == b}")

The variables `a` and `b` do not “contain” the list. Instead, they are names pointing to the same object. We can confirm this with the `id` function, which returns the object's identity (in CPython, its memory address):

In [None]:
print(f"id(a) = {id(a)}, id(b) = {id(b)}")

### 1.2 Comparing with `==` or `is`:

Equality (`==`) checks whether two objects have the same value.

Identity (`is`) checks whether two names refer to the same object.

#### Question 2: Predict the output of:

In [None]:
a = [1, 2, 3]
b = [1, 2, 3]

print(f"a = {a}, b = {b}")

print(f"a == b ? {a == b}")
print(f"a is b ? {a is b}")


print(f"id(a) = {id(a)}, id(b) = {id(b)}")


### Question 3
What does the following code change internally ?

In [None]:
a = 10
a = 20

### 1.3 Mutable / immutable types
Some Python types are immutable: once created, their value cannot be changed without creating a new object:
- ints, floats, booleans, strings, tuples.

Others are mutable and can be changed in place:
- dicts, lists, sets
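
As a minimal sketch of this distinction (the names `numbers`, `point` and `tuple_error` are only illustrative), item assignment works on a list but is rejected by a tuple:

```python
# Mutable: a list accepts item assignment and is changed in place.
numbers = [1, 2, 3]
numbers[0] = 99

# Immutable: a tuple rejects item assignment with a TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_error = None
except TypeError as err:
    tuple_error = err

print(numbers)
print(type(tuple_error).__name__)
```

The tuple is left untouched; the only way to "change" it is to build a new tuple and rebind the name.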

Analyse what happens in these two snippets of code:

In [None]:
a = 10
a = a + 5


The operation `a = a + 5` appears to change the value of `a`, but it actually computes `10 + 5`, stores the result in a new object and attaches the name `a` to it.

In [None]:
x = [5, 4]
x.append(1)


The operation `x.append(1)` is said to "modify" (mutate) the object "in-place": it changes the contents of the existing object instead of creating a new one.
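
One way to see the difference between rebinding and mutating is to compare `id` before and after each operation. This is only a sketch; the names `count`, `items`, `int_rebound` and `list_same_object` are made up for the illustration:

```python
# Rebinding an immutable int: the name moves to a different object.
count = 1000
id_before = id(count)
count = count + 5
int_rebound = id(count) != id_before

# Mutating a list: the same object is modified in place.
items = [5, 4]
id_before = id(items)
items.append(1)
list_same_object = id(items) == id_before

print(int_rebound, list_same_object)
```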

Try this with strings:

In [None]:
s = "something"
s = s.upper()


#### Question 4: What is the output of the following cells? Analyse with `id`.

In [None]:
a = [1, 2, 3]
b = a

a.append(15)

print(f"a = {a}, b = {b}")


In [None]:
a = [1, 2, 3]
b = a

a = a + [15]

print(f"a = {a}, b = {b}")


In [None]:
a = [1, 2, 3]
b = a
a += [15]

print(f"a = {a}, b = {b}")


In [None]:
a = [2, 3, 4]
b = a

b[0] = 1000


The operation `+=` on a list is also `in-place`, unlike `a = a + [15]`, which builds a new list. When manipulating lists, always be careful about what you're actually changing!
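
The contrast between the two forms can be checked with `is`. A short sketch, with illustrative names (`first`, `second`, `alias`, `alias2`):

```python
first = [1, 2, 3]
alias = first
first += [15]            # in place: the existing list is extended
in_place = first is alias

second = [1, 2, 3]
alias2 = second
second = second + [15]   # builds a new list and rebinds the name
rebound = second is not alias2

print(in_place, alias)
print(rebound, alias2)
```

For immutable types such as ints or tuples, `+=` cannot modify the object and falls back to creating a new one.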


### Question 5: What about slicing ?

In [None]:
a = [0, 1, 2, 3, 4, 5, 6]
b = a[:4]

b[0] = 1000

Let's say I want to create a copy `b` of a list `a` in a different physical location (so that changing `b` won't affect `a`). How can I do it, based on the previous question?

### Question 6: Try the operations seen on lists with tuples:

In [None]:
a = (1, 2, 3)
b = a
a = (15, 2)

In [None]:
a = (1, 2, 3)
b = a * 2
print(f"a = {a}, b = {b}")

What if we put a mutable type inside an immutable type ?

In [None]:
a = (1, [1, 2, 3])
b = a
a[1].append(1000)
print(f"a = {a}, b = {b}")


We want to create a copy of a list using a slice like before, but the list contains another list. Would it work ?

In [None]:
a = [1, 2, 3, [4, 5, 6]]
b = a[:]
b[3].append(1000)
print(f"a = {a}, b = {b}")



The slice copies the `references` to the elements of the list, but NOT recursively: it does not go through the elements of the elements (if any are lists). A recursive copy is called a `deepcopy`, and is provided by the `copy` module:

In [None]:
import copy
a = [1, 2, 3, [4, 5, 6]]
b = copy.deepcopy(a)
b[3].append(1000)
print(f"a = {a}, b = {b}")
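
The difference between the two kinds of copy can also be checked directly with `is`. A small sketch, using illustrative names (`original`, `shallow`, `deep`):

```python
import copy

original = [1, 2, [4, 5, 6]]
shallow = original[:]             # copies only the outer list
deep = copy.deepcopy(original)    # copies the nested list as well

print(shallow is original)        # False: the outer list is new
print(shallow[2] is original[2])  # True: the inner list is shared
print(deep[2] is original[2])     # False: the inner list was copied too
```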


### 1.4 What happens inside a function call

### Question 7: Call this function with different types, does the function make a copy of the object ?

In [None]:
def return_self(x):
    print(f"inside function: x = {x}, id(x) = {id(x)}" )
    return x


#### Question 8: Same for this one which manipulates an immutable int. Is the returned object the same ?

In [None]:
def add_one(x):
    print(f"inside function: x = {x}, id(x) = {id(x)}" )
    x = x + 1
    print(f"inside function after increment: x = {x}, id(x) = {id(x)}" )
    return x

a = 10
b = add_one(a)


#### Question 9: What about mutable types like lists ?

In [None]:
def append_one(x):
    print(f"inside function before append: x = {x}, id(x) = {id(x)}" )
    x.append(1)
    print(f"inside function after append: x = {x}, id(x) = {id(x)}" )
    return x

a = [1]
b = append_one(a)


The logic we discovered earlier also applies inside functions. Python never copies arguments when calling a function: references to the objects are passed, as if the code were executed outside the function. The local variables are "local names" attached to existing objects.
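
A compact way to confirm this is to return the `id` seen from inside the function and compare it with the `id` seen by the caller. The helper `get_id` below is hypothetical, written only for this sketch:

```python
def get_id(obj):
    """Return the identity of the object the local name `obj` refers to."""
    return id(obj)

data = [1, 2, 3]     # mutable
label = "hello"      # immutable

same_list = get_id(data) == id(data)
same_string = get_id(label) == id(label)

print(same_list, same_string)
```

Mutable or not, the function receives the very same object; only what can be done to it afterwards differs.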

### Question 10: What about this one which manipulates a list ?

In [None]:
def change_list(x):
    x = [4, 5, 6]
    return x


### Question 11: Explain this behavior

In [None]:
def change_list(x=[]):
    x.append(1)
    return x

change_list()
change_list()
change_list()

### 1.5 Variables scopes
When Python looks up a name, it searches in this order, the LEGB order:

1. Local – inside the current function (including function arguments).
2. Enclosing – outer function scopes (for nested functions, function inside a function).
3. Global – module-level names.
4. Built-in – names such as `len`, `print`, `int`, `range` (keywords like `def` are syntax, not names, and are not looked up).
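
The lookup order above can be sketched with one function per level. All names here (`label`, `outer`, `inner`, `no_local`, `uses_builtin`) are illustrative:

```python
label = "global"

def outer():
    label = "enclosing"

    def inner():
        # No local `label` here, so the lookup continues in outer().
        return label

    return inner()

def no_local():
    # Neither local nor enclosing: the module-level name is found.
    return label

def uses_builtin():
    # `len` is not defined in this module, so the built-in is used.
    return len([1, 2, 3])

print(outer(), no_local(), uses_builtin())
```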

### Question 12: What is the output of the following cells ?


In [None]:
def replace_list(lst):
    lst = [99, 100]

x = [1, 2, 3]
replace_list(x)
print(x)

In [None]:
x = 10

def outer():
    x = 20 

    def inner():
        x = 30
        print(x)

    inner()

outer()

# 2 - Scientific computing
### 2.1 Floating point numbers
Floats in Python are stored in base 2, as binary fractions. A float between 0 and 1, for example, is a finite sum of the form:
$$ \sum_{k=1}^{K} \frac{a_k}{2^k} $$
 with each $a_k$ equal to either 0 or 1, and $K$ finite.

You can check how a number is actually stored (and whether this representation is exact) using the `decimal` module:

In [None]:
from decimal import Decimal

print(Decimal(0.3))
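
To see when the representation is exact, it can help to compare a number that is a finite sum of powers of 1/2 with one that is not. The sketch below uses `float.as_integer_ratio`, which returns the exact fraction stored; the variable names are illustrative:

```python
from decimal import Decimal

# 0.625 = 1/2 + 1/8 is a finite sum of powers of 1/2: stored exactly.
print((0.625).as_integer_ratio())           # (5, 8)
print(Decimal(0.625) == Decimal("0.625"))   # True

# 0.3 has no finite binary expansion: the stored value is only close.
print(Decimal(0.3) == Decimal("0.3"))       # False

# The stored fraction always has a power of two as denominator.
numerator, denominator = (0.3).as_integer_ratio()
is_power_of_two = (denominator & (denominator - 1)) == 0
print(is_power_of_two)                      # True
```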

### Question 13: explain the following results

In [None]:
0.1

In [None]:
0.1 + 0.1 + 0.1


In [None]:
0.1 + 0.1 + 0.1 + 0.1

In [None]:
a = 1e16
b = -1e16
c = 1.0

(a + b) + c 


In [None]:
a + (b + c)  

Always be careful about the order of magnitude of the numbers you're manipulating!
For more information on how floating-point numbers work, see [Python doc -- floats](https://docs.python.org/3/tutorial/floatingpoint.html)
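
The effect of magnitude can be made concrete with the standard `math` module. This sketch assumes Python 3.9 or later, since `math.ulp` was added in that version; `math.fsum` is one possible remedy, not the only one:

```python
import math

# Gap between 1e16 and the next representable float.
print(math.ulp(1e16))                    # 2.0

# Adding 1.0 is below that resolution, so it is lost in the rounding.
print(1e16 + 1.0 == 1e16)                # True
print((1e16 + 1.0) - 1e16)               # 0.0

# math.fsum keeps track of the lost low-order bits while summing.
print(math.fsum([1e16, 1.0, -1e16]))     # 1.0
```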

### 2.2 Numpy vs built-in Python

Calculating a sum with a loop. You can "time" the operation with the magic command `%%time`.


In [None]:
%%time

N = 10_000_000 # underscores in whole numbers are ignored by Python

numbers = []  # create an empty list
for ii in range(N):
    numbers.append(ii) # add the number to the list
total = sum(numbers)

print(f"The total is {total}")

The `%%time` at the beginning of the cell above is an example of *magic commands*. It allows you to measure the time the processor (CPU) took to execute the entire cell. Magic commands with a single percent sign apply only to a single line.

In [None]:
%time print("The total is", sum([ii for ii in range(N)]))

Prefer a list comprehension to an explicit `append` loop when building a list!

Python lists can contain any mix of types. To run the sum above, Python must therefore check the type of each element before computing the `+`. Moreover, everything in Python is an object, even integers, which carry their own attributes and methods; this makes operations on built-in types slower still. `NumPy` circumvents this limitation with `arrays`, whose elements all have the same type.
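
As a brief illustration of this homogeneity (the names `mixed`, `ints` and `floats` are only for the example), a list keeps each element's own type, while an array converts everything to one common `dtype`:

```python
import numpy as np

mixed = [1, 2.5, "three"]            # a list may hold any mix of objects
print([type(v).__name__ for v in mixed])

ints = np.array([1, 2, 3])
floats = np.array([1, 2.5])          # the int is converted to a float
print(ints.dtype, floats.dtype)
print(floats.itemsize)               # bytes used by each element
```

The integer `dtype` printed for `ints` depends on the platform and the NumPy version, so it is not stated here.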

In [None]:
import numpy as np

x = np.arange(5)

We can check the type of its elements:


In [None]:
x.dtype

In [None]:
x = 2 * np.ones(1).astype(int)
x

### Question 14: Using `x` above, run the following cells, explain

In [None]:
y = x ** 62
y

In [None]:
y = x ** 63
y

But using built-in integers:

In [None]:
y = 2 ** 62
y

In [None]:
y = 2 ** 63
y


Back to the sum computation. With numpy, we can run it in one line:

In [None]:
%time print(np.arange(N).sum())

As you can see, `NumPy` is a lot faster than list comprehensions, and the code is much shorter. Prefer `NumPy` arrays over Python’s native lists when working solely with numbers and matrices. `NumPy` vectorizes operations: instead of going through the elements one by one in an interpreted Python `for` loop, the operation is applied to the whole array by a loop written in compiled `C` code.
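
To make "vectorized" concrete, here is a small sketch computing the same squares twice, once element by element and once with a single array expression. The names are illustrative, and the array is kept tiny so that the results can be compared by eye:

```python
import numpy as np

values = np.arange(5)

# Explicit Python loop: one interpreted step per element.
squares_loop = np.array([v * v for v in values])

# Vectorized: a single expression, the loop runs in compiled code.
squares_vec = values * values

print(squares_vec)
print(np.array_equal(squares_loop, squares_vec))
```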


The dot product between two arrays $x$ and $y$ of length $n$ is defined as:
$$
\langle x, y \rangle = x^{\top} y = \sum_{i=1}^n x_i y_i.
$$

### Question 15:
Complete the following cells to compare the speed of dot products using native Python loops and the `numpy.dot` operation:


In [None]:
%%time

N = 10_000_000
x = np.random.randn(N)  # creates an array of N random numbers drawn from the standard Gaussian distribution
y = np.random.randn(N)

result_loops = 0

## To do 



In [None]:
%%time
result_numpy =  ## to do 

#### Numpy slicing

NumPy offers a simple way to select subsets of an array, called slicing. To obtain a *slice* from the 3rd to the 5th element, for example:


In [None]:
x = np.arange(10)
print("All the array: ", x)
print("A slice: ", x[2:5])

Remember that Python starts indexing at 0 and that a slice `[start:end]` includes `start` but excludes `end`. If `start` is omitted, it is set to 0, and if `end` is omitted, the slice runs through the last element of the array. To select the first 6 elements:

In [None]:
x[:6]

To exclude the last element, you can use negative indices. For example:


In [None]:
x[:-1]

In [None]:
x[:-3]

Slices can also have a third parameter called `step`. So far, we have omitted this argument, which defaults to 1. Starting from 0, we can select the even indices by using a step of 2:


In [None]:
x[0:10:2]

We can omit the `start` and `end` arguments since they don’t do anything in this case:


In [None]:
x[::2]
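
The slicing rules seen so far can be summarized on a different array. This sketch uses an array of letters (an arbitrary choice) so that positions and values cannot be confused:

```python
import numpy as np

letters = np.array(list("abcdefgh"))

print(letters[2:5])    # indices 2, 3, 4
print(letters[:3])     # first three elements
print(letters[-2:])    # last two elements
print(letters[1::3])   # every third element, starting at index 1
```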

### Question 16:
Create a slice that selects the odd numbers in reverse order using a single slice:


### Question 17: 
Are NumPy arrays mutable? What happens if you modify a full slice, as we did with lists, using `x[:]`?

### 2.3 Scientific computing with Numpy

By default, a 1-D numpy array is neither a column nor a row: its shape is simply `(n,)`. To make it a row or a column, we must add a new axis of length 1 to change its shape:

In [None]:
a = np.array([1, 2, 3])
a.shape



In [None]:
b = a[np.newaxis, :]
b.shape


In [None]:
c = a[:, np.newaxis]
c.shape

Adding an axis is equivalent to a reshape. For the column version, for example:

In [None]:
b = a.reshape(3, 1)
b.shape
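
The equivalence between `np.newaxis` and `reshape` can be verified for both orientations. In this sketch, `-1` asks NumPy to infer the missing size; `row` and `column` are illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3])

row = a.reshape(1, -1)       # shape (1, 3)
column = a.reshape(-1, 1)    # shape (3, 1)

print(row.shape, column.shape)
print(np.array_equal(row, a[np.newaxis, :]))
print(np.array_equal(column, a[:, np.newaxis]))
print(np.newaxis is None)    # np.newaxis is an alias for None
```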

### Question 18:
We have two vectors $a$ and $b$. We would like to compute the distance matrix $D_{ij} = (a_i - b_j)^2$. Implement this using built-in Python and Numpy. Which is faster ?

In [None]:
a = np.random.randn(10)
b = np.random.randn(20)
D = np.zeros((10, 20))  # one row per element of a, one column per element of b


### Question 19:
We have two datasets $A$ and $B$ both of dimension 5. We would like to compute the distance matrix $D_{ij} = \|A_i - B_j\|^2$. Implement this using built-in Python and Numpy. Look up the function `np.linalg.norm`.


### Question 20:    
Generate a random dataset of 100 samples in 2 dimensions. Center the data, compute the empirical covariance and diagonalize with `np.linalg.eig` or `np.linalg.eigh`. Which one should you prefer ?