## Intro to NumPy
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays (https://en.wikipedia.org/wiki/NumPy).

NumPy is fast because its core array operations run in compiled C code rather than in the Python interpreter.

In [None]:
import numpy as np

## Creation of NumPy array
### From a list

In [None]:
a_list = [1, 2, 3]
a_array = np.array(a_list)

In [None]:
print(type(a_list))
print(type(a_array))

<class 'list'>
<class 'numpy.ndarray'>


In [None]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a_2D = np.array(matrix)
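
A nested list becomes a two-dimensional array. One way to confirm this is to inspect the array's `ndim`, `shape`, and `size` attributes. The snippet below is a small sketch that rebuilds the same `a_2D` array so it can run on its own.

```python
import numpy as np

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a_2D = np.array(matrix)

print(a_2D.ndim)   # number of dimensions: 2
print(a_2D.shape)  # (rows, columns): (3, 3)
print(a_2D.size)   # total number of elements: 9
```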

### Using built-in methods

In [None]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
np.arange(0, 10, 2) # np.arange(start, stop, step)

array([0, 2, 4, 6, 8])

In [None]:
np.zeros(3)

array([0., 0., 0.])

In [None]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
np.linspace(0, 10, 20) # np.linspace(start, stop, num); stop is included, num defaults to 50

array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])
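
The two range builders differ in how they treat the end point: `np.arange` takes a step and excludes `stop`, while `np.linspace` takes a number of points and includes `stop` by default. A minimal side-by-side sketch (the variable names `stepped` and `spaced` are only illustrative):

```python
import numpy as np

# arange: fixed step, stop value excluded
stepped = np.arange(0, 10, 2)

# linspace: fixed number of points, stop value included
spaced = np.linspace(0, 10, 5)

print(stepped)  # values 0, 2, 4, 6, 8
print(spaced)   # values 0, 2.5, 5, 7.5, 10
```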

## Random

### uniform
Return an array of the given shape with random samples drawn from a uniform distribution over the half-open interval [low, high).

In [None]:
np.random.uniform(0, 1, 5) # (low, high, size)

array([0.24015207, 0.07544111, 0.96789315, 0.32081534, 0.34331033])

### randn
Return samples from the “standard normal” distribution.

In [None]:
np.random.randn(2,2) # standard normal (Gaussian) distribution of mean 0 and variance 1

array([[-0.12169145, -1.0715006 ],
       [-0.87798852,  1.34773534]])

### normal

In [None]:
mu, sigma = 1, 0.5 # mean and standard deviation
np.random.normal(mu, sigma, size=(3, 2))

array([[0.2944109 , 0.50597265],
       [0.4838797 , 1.1399829 ],
       [1.17712201, 1.11682474]])
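
A normal sample with mean `mu` and standard deviation `sigma` can also be obtained by shifting and scaling standard normal draws: `mu + sigma * z`. The sketch below checks this empirically. The sample count of 100,000 and the fixed seed are assumptions chosen only to make the check stable.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable
mu, sigma = 1, 0.5

z = np.random.randn(100_000)  # standard normal: mean 0, std 1
scaled = mu + sigma * z       # shifted and scaled to mean mu, std sigma

# Both values should land close to mu and sigma respectively
print(scaled.mean(), scaled.std())
```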

### multivariate normal

In [None]:
mean = [1, 2]
cov = [[1, 0], [0, 1]]
np.random.multivariate_normal(mean, cov, size=(3, 2))

array([[[1.22072892, 2.73542753],
        [1.48960286, 1.71097724]],

       [[1.84969535, 1.53060501],
        [0.81885285, 1.96233603]],

       [[1.37394612, 4.30925623],
        [2.17860549, 2.32785531]]])
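
The output has one more dimension than `size` because every sample is itself a vector with one entry per component of `mean`. A short sketch of how the shapes work out (the names `samples` and `flat` are illustrative):

```python
import numpy as np

mean = [1, 2]
cov = [[1, 0], [0, 1]]

# size=(3, 2) gives a 3x2 grid of samples, each a 2-component vector
samples = np.random.multivariate_normal(mean, cov, size=(3, 2))
print(samples.shape)  # (3, 2, 2)

# an integer size gives a plain list of sample vectors
flat = np.random.multivariate_normal(mean, cov, size=5)
print(flat.shape)  # (5, 2)
```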

### randint
Return random integers from low (inclusive) to high (exclusive).

In [None]:
np.random.randint(1, 100, 5) # randint(low, high=None, size=None)

array([14, 47, 88, 89, 63])
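
The upper bound of `randint` is exclusive, which is easy to overlook. A small sketch that draws many values from a narrow range and lists the distinct results (the range 1 to 4 and the draw count are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# low=1 is included, high=4 is excluded, so only 1, 2, 3 can appear
draws = np.random.randint(1, 4, 1000)
print(np.unique(draws))
```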

### seed()
The seed() method initializes the random number generator. Calling it with the same value before drawing samples makes the results reproducible.

In [None]:
np.random.seed(0)
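
To see what seeding buys you, reset the generator to the same seed twice and compare the draws. This is a minimal sketch; the seed value 0 and the names `first` and `second` are arbitrary.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(1, 100, 5)

np.random.seed(0)  # reset to the same starting state
second = np.random.randint(1, 100, 5)

# The two draws are identical because the generator state was the same
print(np.array_equal(first, second))  # True
```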

## Operations

In [None]:
arr = np.arange(0, 10)
arr_r = arr.reshape(2, 5)
print('arr: ')
print(arr)
print('arr_r: ')
print(arr_r)

arr: 
[0 1 2 3 4 5 6 7 8 9]
arr_r: 
[[0 1 2 3 4]
 [5 6 7 8 9]]
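
`reshape` only rearranges existing elements, so the new shape must hold exactly the same number of items. Passing `-1` for one dimension asks NumPy to infer it. A brief sketch (the name `arr_auto` is illustrative):

```python
import numpy as np

arr = np.arange(0, 10)

# -1 lets NumPy work out the missing dimension: 10 elements / 2 rows = 5 columns
arr_auto = arr.reshape(2, -1)
print(arr_auto.shape)  # (2, 5)

# A shape that does not match the element count raises ValueError
try:
    arr.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```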


In [None]:
my_arr = np.random.randint(1, 100, 10)
print(my_arr)

[45 48 65 68 68 10 84 22 37 88]


In [None]:
my_arr.max()

np.int64(88)

In [None]:
my_arr.argmax()

np.int64(9)
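
`argmax` returns the position of the maximum rather than its value, so indexing with it gives back `max()`. The sketch below hard-codes the values printed above instead of drawing new random ones, so the result is deterministic.

```python
import numpy as np

# Same values as the random draw shown above, fixed for repeatability
my_arr = np.array([45, 48, 65, 68, 68, 10, 84, 22, 37, 88])

idx = my_arr.argmax()
print(idx, my_arr[idx])               # 9 88
print(my_arr.argmin(), my_arr.min())  # 5 10
```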

In [None]:
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])

In [None]:
arr_1 + arr_2

array([5, 7, 9])

In [None]:
arr_1 * arr_2

array([ 4, 10, 18])

In [None]:
np.exp(arr_1)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [None]:
np.log(arr_1)

array([0.        , 0.69314718, 1.09861229])
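
These operators act element by element, which is different from how plain Python lists behave. A short comparison sketch (`list_1` and `list_2` are illustrative names):

```python
import numpy as np

list_1, list_2 = [1, 2, 3], [4, 5, 6]
arr_1, arr_2 = np.array(list_1), np.array(list_2)

print(list_1 + list_2)  # lists concatenate: [1, 2, 3, 4, 5, 6]
print(arr_1 + arr_2)    # arrays add element-wise: [5 7 9]
print(arr_1 * 2)        # a scalar applies to every element: [2 4 6]
```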

## Performance: NumPy vs Python

In [None]:
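import time

size = 1000000
a = list(range(size))
b = list(range(size))

# Time element-wise addition with a pure Python list comprehension
start = time.perf_counter()
c = [x + y for x, y in zip(a, b)]
print("Pure Python:", time.perf_counter() - start)

a_np = np.arange(size)
b_np = np.arange(size)

# Time the same addition as a single vectorized NumPy operation
start = time.perf_counter()
c_np = a_np + b_np
print("NumPy:", time.perf_counter() - start)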


Pure Python: 0.11652708053588867
NumPy: 0.005560874938964844
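
A single timed run can be noisy. One possible refinement is to repeat each measurement with the standard library's `timeit` module and keep the best time. This is only a sketch: the smaller `size` and the `number` and `repeat` values are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit

import numpy as np

size = 100_000
a = list(range(size))
b = list(range(size))
a_np = np.arange(size)
b_np = np.arange(size)

# Run each version 5 times per measurement, 3 measurements, keep the fastest
py_time = min(timeit.repeat(lambda: [x + y for x, y in zip(a, b)],
                            number=5, repeat=3))
np_time = min(timeit.repeat(lambda: a_np + b_np, number=5, repeat=3))

print("Pure Python:", py_time)
print("NumPy:", np_time)
```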


## Aggregation in NumPy

Aggregation functions allow you to summarize data along specific dimensions (axes). Examples include sum, mean, std, min, max, etc.

In [None]:
# Basic aggregation

data = np.array([[1, 2], [3, 4]])

# Sum of all elements
print(np.sum(data))        # 10

# Mean of all elements
print(np.mean(data))       # 2.5


10
2.5


In [None]:
# Aggregation along axes
# Use the axis argument to control the dimension along which the function operates:

# Sum across rows (axis=1)
print(np.sum(data, axis=1))  # [3 7]  row-wise (across each row)

# Sum down columns (axis=0)
print(np.sum(data, axis=0))  # [4 6] column-wise (down each column)

[3 7]
[4 6]
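
With a square array it is hard to tell which axis was collapsed. A non-square example makes it clearer: the axis you pass is the one that disappears from the result's shape. A small sketch with an illustrative 2x3 array:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])  # shape (2, 3)

col_sums = np.sum(data, axis=0)  # collapses the 2 rows, leaving 3 values
row_sums = np.sum(data, axis=1)  # collapses the 3 columns, leaving 2 values

print(col_sums, col_sums.shape)  # [5 7 9] (3,)
print(row_sums, row_sums.shape)  # [ 6 15] (2,)
```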


In [None]:
arr = np.random.randint(0, 100, size=(4, 5))

print("Min:", np.min(arr))
print("Mean:", np.mean(arr))
print("Standard deviation:", np.std(arr))


Min: 9
Mean: 59.5
Standard deviation: 26.284025566872362


In [None]:
# Useful tip: keep dimensions with keepdims=True

arr = np.random.rand(3, 4)
mean_per_row = np.mean(arr, axis=1, keepdims=True)  # shape (3, 1)
centered = arr - mean_per_row  # broadcast subtraction
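
The reason `keepdims=True` matters here is broadcasting: a `(3, 1)` column of means lines up with a `(3, 4)` array, while a flat `(3,)` result does not. The sketch below uses a fixed array instead of random values so the outcome is repeatable; the names `mean_keep` and `mean_flat` are illustrative.

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(3, 4)

mean_keep = np.mean(arr, axis=1, keepdims=True)  # shape (3, 1)
mean_flat = np.mean(arr, axis=1)                 # shape (3,)

centered = arr - mean_keep  # (3, 4) - (3, 1) broadcasts row by row
print(np.allclose(centered.mean(axis=1), 0))  # True: each row now has mean 0

# Without keepdims the shapes (3, 4) and (3,) cannot be broadcast together
try:
    arr - mean_flat
except ValueError as err:
    print("broadcast failed:", err)
```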
