# Data manipulation with numpy 🪂
Let's practice NumPy!

In this exercise, you'll initialize your own vectors, matrices and tensors and perform basic operations on them: selecting elements, slicing, masking, reshaping, etc.

The last part of the exercise uses NumPy's ability to draw random values from a known distribution to help you (re-)discover a very famous theorem of statistics 🤓

## Dealing with vectors ⛹️

1. Import numpy

In [40]:
import numpy as np

2. Initialize a numpy array called `vec` which contains values from 0 to 10 with a step of 0.5

In [41]:
vec = np.arange(0, 10.5, 0.5)
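Because `np.arange` excludes its `stop` value, the call above uses 10.5 so that 10 is included. As a hedged alternative sketch (the names `vec_arange` and `vec_linspace` are illustrative and not part of the exercise), `np.linspace` takes the number of points instead of the step and includes the endpoint by default:

```python
import numpy as np

# Step-based construction: stop (10.5) is excluded, so the last value is 10.0
vec_arange = np.arange(0, 10.5, 0.5)

# Count-based construction: 21 evenly spaced points from 0 to 10, endpoint included
vec_linspace = np.linspace(0, 10, 21)

# Both approaches produce the same 21 values
print(vec_arange.shape, vec_linspace.shape)
print(np.allclose(vec_arange, vec_linspace))
```

With a non-integer step, `np.linspace` is often the safer choice because the number of elements does not depend on floating-point rounding.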

3. What is the shape of `vec`?

In [42]:
vec.shape

(21,)

4. Display the 7th value of `vec`

In [43]:
print(f"The value at index 6 is: {vec[6]}")

The value at index 6 is: 3.0


5. Display the 3 first items of `vec`

In [44]:
vec[:3]

array([0. , 0.5, 1. ])

6. Display the 3 last items

In [45]:
vec[-3:]

array([ 9. ,  9.5, 10. ])

7. By using masks, select values of `vec` that are below 7

In [46]:
monMask = vec < 7
monMask

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True, False, False, False, False,
       False, False, False])

In [47]:
vec[monMask]

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5])
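A mask is just a boolean array, so it can be combined with other masks or summed to count matches. The sketch below is an illustration under that assumption; the names `between` and `count_below_7` are hypothetical and not part of the exercise:

```python
import numpy as np

vec = np.arange(0, 10.5, 0.5)

# Combine two conditions element-wise with & (parentheses are required)
between = vec[(vec >= 2) & (vec < 4)]

# True counts as 1, so summing a mask counts the matching elements
count_below_7 = int((vec < 7).sum())

print(between)
print(count_below_7)
```

Note that Python's `and` / `or` keywords do not work element-wise on arrays; use `&`, `|` and `~` instead.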

## Dealing with matrices 🏋️‍♀️
8. Define a function called `my_func` that takes two arguments `x` and `y` and returns $f(x, y) = x^2 + y$.

In [48]:
def my_func(x,y):
    return np.square(x) + y

9. Use `my_func` to initialize a 4x4 matrix:

In [89]:
matrice=np.fromfunction(my_func,(4,4),dtype=int)
matrice

array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 4,  5,  6,  7],
       [ 9, 10, 11, 12]])
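It may help to know that `np.fromfunction` does not call `my_func` once per element: it calls it a single time with arrays of row and column indices. The following sketch reproduces the same matrix by hand with `np.indices` (the names `rows`, `cols`, `manual` and `from_function` are illustrative):

```python
import numpy as np

def my_func(x, y):
    return np.square(x) + y

# rows[i, j] == i and cols[i, j] == j for a 4x4 grid
rows, cols = np.indices((4, 4))

# Calling the function once on the index grids...
manual = my_func(rows, cols)

# ...gives the same result as np.fromfunction
from_function = np.fromfunction(my_func, (4, 4), dtype=int)

print(np.array_equal(manual, from_function))
```

This is why `my_func` has to be written with vectorized operations such as `np.square` and `+`.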

10. Iterate over the matrix's values and, for each value, find a way to compute its [remainder](https://en.wikipedia.org/wiki/Remainder) in the integer division by 2.

Hint: there is an operator (like `+` or `*`) that computes the remainder of an integer division. [Python's doc](https://docs.python.org/3/library/operator.html) may help you 😉

In [90]:
def compute_remainder(x):
    return x % 2

# Collect the remainder of each value instead of discarding it
remainders = [compute_remainder(value) for value in matrice.flat]


10. Once you've found the operator that computes the remainder, use it to create a mask that selects only the even numbers in your matrix. Store these values in an array called `even_numbers`.

In [91]:
monMask2 = matrice % 2 == 0
even_numbers = matrice[monMask2]
even_numbers

array([ 0,  2,  2,  4,  4,  6, 10, 12])
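The `%` operator works element-wise on arrays, so no explicit loop is needed to build the mask. As a small sketch (the matrix is rebuilt here with a lambda so the block is self-contained; `remainders` and `even_mask` are illustrative names), `np.mod` is the function form of the same operator:

```python
import numpy as np

# Same 4x4 matrix as above: value = row**2 + col
matrice = np.fromfunction(lambda x, y: x**2 + y, (4, 4), dtype=int)

# np.mod(a, 2) is equivalent to a % 2, applied to every element at once
remainders = np.mod(matrice, 2)
even_mask = remainders == 0

# Boolean indexing returns a flat 1-D array, read in row-major order
even_numbers = matrice[even_mask]
print(even_numbers.ndim, even_numbers.size)
```

The 2-D structure is lost when indexing with a mask, which is why the next step has to reshape the result.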

11. Reshape `even_numbers` into a 2x4 matrix and apply the `log` function to its elements.

In [92]:
even_numbers_reshaped=even_numbers.reshape(2,4)
even_numbers_reshaped

array([[ 0,  2,  2,  4],
       [ 4,  6, 10, 12]])

In [93]:
log_even_numbers_reshaped=np.log(even_numbers_reshaped)
log_even_numbers_reshaped

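Note: NumPy emits `RuntimeWarning: divide by zero encountered in log` for this cell because the first element is 0, which is why the result below starts with `-inf`.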


array([[      -inf, 0.69314718, 0.69314718, 1.38629436],
       [1.38629436, 1.79175947, 2.30258509, 2.48490665]])
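The `-inf` entry comes from taking the logarithm of 0. Two possible ways of handling it are sketched below, as options rather than as the expected answer to the exercise: silencing the warning with `np.errstate`, or computing the logarithm only where the input is positive (the names `logs` and `safe_logs` are illustrative):

```python
import numpy as np

even_numbers_reshaped = np.array([[0, 2, 2, 4],
                                  [4, 6, 10, 12]])

# Option 1: keep -inf but silence the divide-by-zero warning locally
with np.errstate(divide="ignore"):
    logs = np.log(even_numbers_reshaped)

# Option 2: compute the log only for positive entries, leave NaN elsewhere
safe_logs = np.full(even_numbers_reshaped.shape, np.nan)
np.log(even_numbers_reshaped, out=safe_logs, where=even_numbers_reshaped > 0)

print(logs[0, 0], safe_logs[0, 0])
```

Which option is appropriate depends on whether downstream code should treat the zero as "minus infinity" or as a missing value.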

## From vector to matrix, from matrix to tensor and the way back 🤹
12. Initialize a vector named `vec` containing the first 100 integers (0 to 99):

In [97]:
vec = np.arange(0,100)
vec

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

13. Create a 10x10 `matrix` containing the values of `vec`

In [99]:
reshaped_vec=vec.reshape(10,10)
reshaped_vec

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

14. We'd like to create a `tensor` of rank 3 by reshaping the `matrix` such that it will be structured like several layers of 5x5 matrices. Find a way to do this operation with `.reshape`:

In [107]:
tensored_vec = reshaped_vec.reshape(-1, 5, 5)
tensored_vec

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74]],

       [[75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84],
        [85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94],
        [95, 96, 97, 98, 99]]])
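In the call above, `-1` asks NumPy to infer the number of layers from the total size (100 / 25 = 4). Reshaping never reorders the data: elements are read and written in row-major order. A short sketch of that index arithmetic, with illustrative names `tensor`, `layer`, `row` and `col`:

```python
import numpy as np

vec = np.arange(100)
tensor = vec.reshape(-1, 5, 5)

# The -1 dimension is inferred: 100 elements / (5 * 5) = 4 layers
print(tensor.shape)

# In row-major order, position (layer, row, col) maps to this flat index
layer, row, col = 2, 3, 1
flat_index = layer * 25 + row * 5 + col
print(tensor[layer, row, col] == vec[flat_index])
```

One consequence is that each layer holds 25 consecutive values of `vec`, not a 5x5 block cut out of the 10x10 matrix.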

15. Select the tensor's first layer (the first 5x5 matrix)

In [111]:
tensored_vec[0]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

16. Within this first-layer matrix, select the element at row index 1 and column index 2:

In [112]:
tensored_vec[0,1,2]

np.int64(7)

17. Still within the first-layer matrix, select all the elements of the column at index 2:

In [116]:
tensored_vec[0,:,2]

array([ 2,  7, 12, 17, 22])

18. Still within the first-layer matrix, select all the elements of the row at index 1:

In [119]:
tensored_vec[0,1,:]

array([5, 6, 7, 8, 9])
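One detail worth keeping in mind with the selections above: basic indexing with integers and slices returns a view on the original data, not a copy. A minimal sketch of the consequence (names `tensor` and `column_2` are illustrative):

```python
import numpy as np

tensor = np.arange(100).reshape(4, 5, 5)

# Column at index 2 of the first layer
column_2 = tensor[0, :, 2]
print(column_2)

# The slice shares memory with the tensor...
print(np.shares_memory(tensor, column_2))

# ...so writing through the slice also changes the tensor
column_2[0] = -1
print(tensor[0, 0, 2])
```

Call `.copy()` on the slice when an independent array is needed.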

19. Re-create the initial 10x10 matrix from your `tensor`

In [137]:
de_tensored_vec = tensored_vec.reshape(10,10)
de_tensored_vec


array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

20. Re-create the initial vector of 100 elements from your `tensor`

In [138]:
vec=tensored_vec.reshape(-1)
vec

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
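Besides `.reshape(-1)`, NumPy offers `.ravel()` and `.flatten()` to get back to one dimension. The sketch below contrasts the two under the assumption of a contiguous array such as this one; `flat_view` and `flat_copy` are illustrative names:

```python
import numpy as np

tensor = np.arange(100).reshape(4, 5, 5)

# ravel() returns a view when the memory layout allows it
flat_view = tensor.ravel()

# flatten() always returns an independent copy
flat_copy = tensor.flatten()

print(np.shares_memory(tensor, flat_view))
print(np.shares_memory(tensor, flat_copy))
print(np.array_equal(flat_view, flat_copy))
```

All three approaches give the same 100 values in the same order; they differ only in whether the result is tied to the original array.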

## A famous theorem 🤓

Let's use numpy's random values generator to re-discover one of the most famous theorems of statistics!

21. Generate an array named `uniform_values` containing 10,000 elements that are drawn from a (continuous) uniform distribution in the interval [0, 10].

The plotting cell that follows your answer lets you visualize the distribution of the values.

In [131]:
uniform_values=np.random.uniform(0,10,10000)
uniform_values

array([8.61385171, 9.65702628, 4.8487211 , ..., 1.48653071, 1.98078584,
       4.527756  ], shape=(10000,))
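The values above change on every run because no seed is set. As a hedged alternative to `np.random.uniform`, NumPy's `Generator` API (created with `np.random.default_rng`) accepts a seed for reproducible draws; the seed value 42 and the name `rng` are arbitrary choices for this sketch:

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
uniform_values = rng.uniform(0, 10, size=10_000)

# For Uniform(0, 10), the theoretical mean is 5 and the variance is 100 / 12
print(uniform_values.shape)
print(uniform_values.mean(), uniform_values.var())
```

The sample mean and variance should land close to the theoretical values, without matching them exactly.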

In [139]:
import plotly.express as px

px.histogram(uniform_values)

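ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

The histogram did not render in this environment: Plotly's notebook renderer needs the `nbformat` package (version 4.2.0 or later). Installing it (for example with `pip install nbformat`) and restarting the kernel should resolve the error.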

22. Now, create a loop with 1000 iterations. At each iteration:
* Draw a new sample of 10,000 values from a (continuous) uniform distribution in the interval [0, 10]
* Compute the sample's mean (hint: [numpy arrays](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) have many useful methods to compute basic statistics)
* Store the mean value in a list named `mean_values`.

In the end, you must get a list named `mean_values` containing 1000 elements. The plotting cell that follows your answer will let you visualize the distribution of the elements in `mean_values`.

In [132]:
mean_values = []
for _ in range(1000):
    # Draw a fresh sample and keep only its mean
    sample = np.random.uniform(0, 10, 10000)
    mean_values.append(sample.mean())
print(len(mean_values) == 1000)

True
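The loop is what the exercise asks for, but the same 1000 means can also be computed without a Python loop. The following is a sketch of that vectorized variant, assuming enough memory for a 1000 x 10,000 array of floats (about 80 MB); `rng` and `samples` are illustrative names and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# One row per sample: 1000 samples of 10,000 uniform values each
samples = rng.uniform(0, 10, size=(1000, 10_000))

# Averaging along axis 1 gives one mean per row
mean_values = samples.mean(axis=1).tolist()

print(len(mean_values))
```

The trade-off is memory for speed: the loop only ever holds one sample at a time, while this version holds all of them at once.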


In [140]:
px.histogram(mean_values)

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

23. Do you recognize this curve? Which probability density does it represent?


A bell curve: it represents the normal (Gaussian) distribution.

24. What is the name of the famous theorem that explains why we just got a bell-shaped curve?

The central limit theorem.
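The theorem also predicts how wide the bell should be: for samples of size n drawn from a distribution with mean μ and standard deviation σ, the sample means are approximately normal with mean μ and standard deviation σ/√n. For Uniform(0, 10), μ = 5 and σ = 10/√12. The sketch below compares that prediction with simulated means; all variable names and the seed are assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000      # size of each sample
n_means = 1000  # number of sample means collected

means = np.array([rng.uniform(0, 10, size=n).mean() for _ in range(n_means)])

# Theoretical values for Uniform(0, 10)
mu = (0 + 10) / 2
sigma = (10 - 0) / np.sqrt(12)
predicted_std = sigma / np.sqrt(n)

print(f"mean of means: predicted {mu:.4f}, observed {means.mean():.4f}")
print(f"std of means:  predicted {predicted_std:.4f}, observed {means.std(ddof=1):.4f}")
```

The observed values should be close to the predicted ones, which explains why the histogram of `mean_values` is so narrowly concentrated around 5.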