<a href="https://colab.research.google.com/github/simonebugo/Big_Data/blob/main/2a_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NumPy

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers.

In [5]:
import numpy as np

# The NumPy ndarray: A Multidimensional Array Object
- NumPy's central feature is its N-dimensional array object, or ndarray, a fast, flexible container for large data sets
- Arrays let you perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations on scalars
- An ndarray is a generic multidimensional container for homogeneous data:
  - all of the elements must be the **same type**,
  - every array has a **shape**, a tuple indicating the size of each dimension,
  - and a **dtype**, an object describing the data type of the array.

## Creating ndarrays
- the ***array*** function accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data


In [1]:
data1 = [1, 3, 5, 7, 9, 11]

In [2]:
data1

[1, 3, 5, 7, 9, 11]

In [3]:
type(data1)

list

In [7]:
arr1 = np.array(data1)

In [8]:
arr1

array([ 1,  3,  5,  7,  9, 11])

In [9]:
arr1.ndim
# returns the number of dimensions

1

In [10]:
arr1.shape


(6,)

In [11]:
arr1.dtype
# the type of the elements in the array

dtype('int64')

In [12]:
arr1.itemsize # the size (in bytes) of each array element

8

In [13]:
arr1.nbytes # the total size (in bytes) of the array

48

In [16]:
data2 = [1, 3.7, 5, 7, 9, 11.4]

In [17]:
data2

[1, 3.7, 5, 7, 9, 11.4]

- data2 contains both integers and floating-point numbers. Can we still create a NumPy array? Yes: the integers are upcast to float64

In [18]:
arr2 = np.array(data2)

In [19]:
arr2

array([ 1. ,  3.7,  5. ,  7. ,  9. , 11.4])

In [20]:
for x in arr2:
  print(x)

1.0
3.7
5.0
7.0
9.0
11.4


In [21]:
arr2[0]

np.float64(1.0)

- What about two dimensions?

In [22]:
data_2 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# two dimensions define a matrix, which can be built from a list of lists

In [23]:
type(data_2)

list

In [24]:
data_2[1][0]

4

In [25]:
data_2

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [26]:
arr_2 = np.array(data_2)

In [27]:
arr_2

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [28]:
arr_2.ndim
# -> 2: the array has rows and columns

2

In [None]:
arr_2.shape

- Other techniques for initializing ndarrays
  - ones
  - zeros
  - empty
  - eye

In [29]:
np.ones(16)
# initialize an array filled with ones; the default dtype is float64

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [30]:
np.ones((16, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [32]:
np.empty((3, 4))
# the values are uninitialized (whatever was in memory); 3 rows and 4 columns

array([[4.18725558e-315, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000],
       [2.22809558e-312, 1.50008929e+248, 4.31174539e-096,
        1.15998412e-028],
       [3.77778426e+180, 1.15998412e-028, 4.19462329e+228,
        1.55535091e+161]])

In [33]:
np.zeros((4, 4, 4))
# 3 dimensions, each of size 4: the result is 4 matrices of 4x4 elements

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [31]:
np.eye(44)
# identity matrix

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

- When constructing an array, you can specify the data type using a string

In [None]:
np.ones(10, dtype='float32')  # Default is numpy.float64
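
As a small illustration of the point above, the sketch below builds arrays with different dtypes and compares their per-element and total sizes. The variable names (`a32`, `a64`, `ai`) are made up for this example.

```python
import numpy as np

# dtype can be given as a string or as a NumPy type object
a32 = np.ones(10, dtype='float32')
a64 = np.ones(10)                  # default dtype is float64
ai = np.zeros(4, dtype=np.int16)

print(a32.dtype, a32.itemsize)     # float32 4
print(a64.dtype, a64.itemsize)     # float64 8
print(ai.dtype, ai.nbytes)         # int16 8
```

Choosing a smaller dtype halves the memory here, at the cost of precision or range.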

In [None]:
# random.randint(low, high=None, size=None, dtype=int)

a1 = np.random.randint(10, size=10)  #one-dimensional array
a2 = np.random.randint(10, size=(10, 4)) # two-dimensional array
a3 = np.random.randint(10, size=(10, 3, 3)) # three-dimensional array

## Basic array manipulations

#### Indexing and slicing arrays
- Getting and setting the value of individual array elements
- Getting and setting smaller subarrays within a larger array

In [34]:
# numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)

arr = np.arange(0,10,1) # arange creates the sequence 0, 1, ..., 9 (start 0, stop 10 excluded, step 1); slicing can then be used to access its values

In [35]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
arr[5]

In [None]:
arr[:]

In [36]:
arr[2:5] # the values from index 2 up to, but not including, index 5

array([2, 3, 4])

In [37]:
arr[2:5] = 111 # data is not copied, and any modifications to the view will be reflected in the source array
# the scalar is assigned to every element of the slice, modifying arr in place

In [38]:
arr

array([  0,   1, 111, 111, 111,   5,   6,   7,   8,   9])
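
The view behaviour can be checked directly. The following is a minimal sketch, with hypothetical names `source`, `view` and `independent`, that contrasts a slice (a view) with an explicit `.copy()`:

```python
import numpy as np

source = np.arange(10)

view = source[2:5]                 # a view: shares memory with source
view[:] = 111                      # writes through to source

independent = source[5:8].copy()   # an explicit copy owns its data
independent[:] = -1                # source is not affected

print(source)                      # [  0   1 111 111 111   5   6   7   8   9]
print(np.shares_memory(source, view))         # True
print(np.shares_memory(source, independent))  # False
```

Use `.copy()` whenever a slice must be modified without touching the original array.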

In [43]:
lista = [0,1,2]
# lista[:2] = 10 is not allowed with lists: it raises a TypeError because the right-hand side of a slice assignment must be an iterable

In [40]:
lista[:2] = [10,10]

In [41]:
lista

[10, 10, 2]

In [44]:
array2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# the same slicing syntax also works on arrays with more dimensions

In [45]:
array2D
array2D[2] # a single index selects a row, as with nested lists

array([7, 8, 9])

In [None]:
array2D[2]

In [46]:
array2D[1,1] # the element at row index 1 and column index 1

np.int64(5)

In [47]:
array2D[1,0]

np.int64(4)

In [48]:
array2D[1][0] # equivalent to array2D[1,0]

np.int64(4)

In [None]:
array2D[1][0] = 44

In [None]:
array2D

In [49]:
array2D[:, 0:1] # all rows and only column 0, since the stop index 1 is excluded; the result stays two-dimensional

array([[1],
       [4],
       [7]])

In [50]:
array2D[2:,1:] # from row index 2 onward (only row 2 here, since the rows are 0, 1, 2) and from column index 1 onward

array([[8, 9]])

In [None]:
array2D[:,1:]

#### Boolean indexing

In [97]:
data = np.random.randn(7,5)
# the values of an array can be compared against a number

In [53]:
data

array([[ 0.86934243, -0.42366701, -0.27151807,  1.03659793,  0.13529045],
       [-1.09044936, -0.87550047,  0.62752675,  0.12787938,  0.49581458],
       [-0.29576356, -0.23402278, -0.65400534, -0.60856177, -0.89981989],
       [-0.17317664,  0.74993107,  0.11053261, -0.20642049,  0.75836111],
       [ 0.04948849, -1.37093237, -0.868576  ,  0.10359199,  1.43761472],
       [ 1.28865933, -0.53384523,  1.27289562, -1.34891404, -1.07086641],
       [ 0.67084655, -0.20970465,  1.7547897 , -1.74853566,  0.79155045]])

In [54]:
data < 0 # True or False for each element, depending on whether the condition holds

array([[False,  True,  True, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True,  True,  True],
       [ True, False, False,  True, False],
       [False,  True,  True, False, False],
       [False,  True, False,  True,  True],
       [False,  True, False,  True, False]])

In [56]:
data[data<0]=0 # every value < 0 is replaced with 0

In [57]:
data

array([[0.86934243, 0.        , 0.        , 1.03659793, 0.13529045],
       [0.        , 0.        , 0.62752675, 0.12787938, 0.49581458],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.74993107, 0.11053261, 0.        , 0.75836111],
       [0.04948849, 0.        , 0.        , 0.10359199, 1.43761472],
       [1.28865933, 0.        , 1.27289562, 0.        , 0.        ],
       [0.67084655, 0.        , 1.7547897 , 0.        , 0.79155045]])

In [None]:
array2D

In [None]:
array2D == 5

In [None]:
array2D[array2D == 5]=0

In [None]:
array2D

In [None]:
(array2D == 3) | (array2D == 8)
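
Boolean masks can be combined with `|` (or), `&` (and) and `~` (not). The sketch below uses a hypothetical array `grid`. The parentheses are required because these operators bind more tightly than comparisons, and Python's `and`/`or` keywords do not work element-wise on arrays.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

either = (grid == 3) | (grid == 8)   # element-wise OR
between = (grid > 2) & (grid < 7)    # element-wise AND
not_even = ~(grid % 2 == 0)          # element-wise NOT

# Indexing with a mask returns the selected elements as a 1-D array
print(grid[either])     # [3 8]
print(grid[between])    # [3 4 5 6]
print(grid[not_even])   # [1 3 5 7 9]
```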

#### Fancy Indexing
- Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
- To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order

In [61]:
arr = np.empty((10, 6))

In [None]:
for i in range(10):
    arr[i] = i

In [62]:
arr

array([[4.10958283e-315, 0.00000000e+000, 6.66346214e-310,
        1.27319747e-313, 4.75093525e-320, 6.66346214e-310],
       [3.60667921e-322, 6.66346198e-310,             nan,
        4.16917268e-315, 6.66346214e-310, 2.33419537e-313],
       [4.78255545e-320, 6.66346214e-310,             nan,
        2.12199579e-314,             nan, 4.16917246e-315],
       [6.66346214e-310, 1.27319747e-313, 4.81417565e-320,
        6.66346214e-310, 3.65608578e-322, 6.66346198e-310],
       [            nan, 4.16917268e-315, 6.66346214e-310,
        2.33419537e-313, 4.84579585e-320, 6.66346214e-310],
       [            nan, 2.12199579e-314,             nan,
        4.16917246e-315, 6.66346214e-310, 1.27319747e-313],
       [4.87741606e-320, 6.66346214e-310, 3.70549234e-322,
        6.66346198e-310,             nan, 4.16917268e-315],
       [6.66346214e-310, 2.33419537e-313, 4.90903626e-320,
        6.66346214e-310,             nan, 2.12199579e-314],
       [            nan, 4.16917246e-315, 6.6634

In [None]:
arr[[2,1,4]]

In [None]:
arr[np.array([1,5,4,3,6,6,7])]
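
To make the selection easy to follow, this sketch fills each row with its own index before applying fancy indexing. The names `rows`, `picked` and `repeated` are made up for the example.

```python
import numpy as np

# Build a 10x6 array whose row i is filled with the value i
rows = np.empty((10, 6))
for i in range(10):
    rows[i] = i

picked = rows[[2, 1, 4]]            # rows 2, 1, 4, in that order
print(picked[:, 0])                 # [2. 1. 4.]

repeated = rows[np.array([6, 6])]   # the same row can be selected twice
repeated[0, 0] = 99                 # fancy indexing returns a copy...
print(rows[6, 0])                   # 6.0  ...so the source is unchanged
```

Unlike basic slicing, fancy indexing always copies the selected data into a new array.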

#### Reshaping of arrays
- Changing the shape of a given array


In [63]:
arr.shape

(10, 6)

In [64]:
arr

array([[4.10958283e-315, 0.00000000e+000, 6.66346214e-310,
        1.27319747e-313, 4.75093525e-320, 6.66346214e-310],
       [3.60667921e-322, 6.66346198e-310,             nan,
        4.16917268e-315, 6.66346214e-310, 2.33419537e-313],
       [4.78255545e-320, 6.66346214e-310,             nan,
        2.12199579e-314,             nan, 4.16917246e-315],
       [6.66346214e-310, 1.27319747e-313, 4.81417565e-320,
        6.66346214e-310, 3.65608578e-322, 6.66346198e-310],
       [            nan, 4.16917268e-315, 6.66346214e-310,
        2.33419537e-313, 4.84579585e-320, 6.66346214e-310],
       [            nan, 2.12199579e-314,             nan,
        4.16917246e-315, 6.66346214e-310, 1.27319747e-313],
       [4.87741606e-320, 6.66346214e-310, 3.70549234e-322,
        6.66346198e-310,             nan, 4.16917268e-315],
       [6.66346214e-310, 2.33419537e-313, 4.90903626e-320,
        6.66346214e-310,             nan, 2.12199579e-314],
       [            nan, 4.16917246e-315, 6.6634

In [65]:
arr.reshape((5,12)) # the number of elements must stay the same: 60 values before, 60 values after

array([[4.10958283e-315, 0.00000000e+000, 6.66346214e-310,
        1.27319747e-313, 4.75093525e-320, 6.66346214e-310,
        3.60667921e-322, 6.66346198e-310,             nan,
        4.16917268e-315, 6.66346214e-310, 2.33419537e-313],
       [4.78255545e-320, 6.66346214e-310,             nan,
        2.12199579e-314,             nan, 4.16917246e-315,
        6.66346214e-310, 1.27319747e-313, 4.81417565e-320,
        6.66346214e-310, 3.65608578e-322, 6.66346198e-310],
       [            nan, 4.16917268e-315, 6.66346214e-310,
        2.33419537e-313, 4.84579585e-320, 6.66346214e-310,
                    nan, 2.12199579e-314,             nan,
        4.16917246e-315, 6.66346214e-310, 1.27319747e-313],
       [4.87741606e-320, 6.66346214e-310, 3.70549234e-322,
        6.66346198e-310,             nan, 4.16917268e-315,
        6.66346214e-310, 2.33419537e-313, 4.90903626e-320,
        6.66346214e-310,             nan, 2.12199579e-314],
       [            nan, 4.16917246e-315, 6.66346214

In [66]:
arr.reshape((-1,15))

array([[4.10958283e-315, 0.00000000e+000, 6.66346214e-310,
        1.27319747e-313, 4.75093525e-320, 6.66346214e-310,
        3.60667921e-322, 6.66346198e-310,             nan,
        4.16917268e-315, 6.66346214e-310, 2.33419537e-313,
        4.78255545e-320, 6.66346214e-310,             nan],
       [2.12199579e-314,             nan, 4.16917246e-315,
        6.66346214e-310, 1.27319747e-313, 4.81417565e-320,
        6.66346214e-310, 3.65608578e-322, 6.66346198e-310,
                    nan, 4.16917268e-315, 6.66346214e-310,
        2.33419537e-313, 4.84579585e-320, 6.66346214e-310],
       [            nan, 2.12199579e-314,             nan,
        4.16917246e-315, 6.66346214e-310, 1.27319747e-313,
        4.87741606e-320, 6.66346214e-310, 3.70549234e-322,
        6.66346198e-310,             nan, 4.16917268e-315,
        6.66346214e-310, 2.33419537e-313, 4.90903626e-320],
       [6.66346214e-310,             nan, 2.12199579e-314,
                    nan, 4.16917246e-315, 6.66346214e

In [None]:
arr.reshape((3,2,10)) # the number of dimensions can change, as long as the number of elements is still 60
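
The same reshapes are easier to read on an array with known contents. This is a minimal sketch on a hypothetical 60-element array `flat`, including a shape that does not fit:

```python
import numpy as np

flat = np.arange(60)

a = flat.reshape((5, 12))
b = flat.reshape((-1, 15))     # -1 lets NumPy infer the size: 60 / 15 = 4
c = flat.reshape((3, 2, 10))
print(a.shape, b.shape, c.shape)   # (5, 12) (4, 15) (3, 2, 10)

# A shape whose product is not 60 is rejected
failed = False
try:
    flat.reshape((7, 9))
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```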

#### Change the data type of an array.

In [None]:
x = np.array([[2, 4, 6], [6, 8, 10]], np.int32)

In [None]:
y= x.astype(float)

In [None]:
y
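
As a sketch of what `astype` does: it returns a new array and leaves the original untouched. Converting floats to integers drops the fractional part instead of rounding. The array `z` is an assumption added for illustration.

```python
import numpy as np

x = np.array([[2, 4, 6], [6, 8, 10]], np.int32)
y = x.astype(float)      # new float64 array; x keeps its int32 dtype

# Float -> int conversion truncates toward zero
z = np.array([1.9, -1.9, 3.5]).astype(np.int32)

print(x.dtype, y.dtype)  # int32 float64
print(z)                 # [ 1 -1  3]
```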

#### Joining and splitting of arrays
- Combining multiple arrays into one, and splitting one array into many
  - np.concatenate takes a tuple or list of arrays as its first argument

In [79]:
x = np.array([1,2,3,4,5])
y = np.array([6,7,8,9,10])
x


array([1, 2, 3, 4, 5])

In [78]:
np.concatenate([x,y]) # joins them one after the other (one-dimensional arrays)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [80]:
x.ndim

1

In [None]:
x.shape

In [69]:
np.concatenate([x,y],axis=1)    # axis is the axis along which the arrays are joined (default 0); if axis is None, arrays are flattened before use
# fails here: x and y have only one dimension, so axis 1 does not exist

AxisError: axis 1 is out of bounds for array of dimension 1

In [70]:
xR=x.reshape(1,-1) # one row, with the number of columns computed automatically (1 row, 5 columns) -> the array now has 2 dimensions
yR=y.reshape(1,-1)

In [74]:
xR

array([[1, 2, 3, 4, 5]])

In [75]:
xR.ndim

2

In [None]:
xR.shape

In [76]:
np.concatenate([xR,yR])    # default axis=0

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [None]:
np.concatenate([xR,yR], axis=1)

- For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions


In [None]:
x = np.array([1,2,3])
y = np.array([[4,5,6],[7,8,9]])

In [None]:
x

In [None]:
y

In [81]:
np.vstack([x,y]) # vertical stack: x becomes a new first row

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [82]:
np.hstack([y,y]) # horizontal stack: the columns are placed side by side

array([[4, 5, 6, 4, 5, 6],
       [7, 8, 9, 7, 8, 9]])

- The opposite of concatenation is splitting, which is implemented by the function np.split (np.hsplit and np.vsplit are axis-specific variants)


In [None]:
z, k = np.split(y,[1])

In [None]:
z

In [None]:
k
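
The second argument of `np.split` is a list of split points. The sketch below, with made-up names `seq` and `mat`, shows that N split points produce N + 1 subarrays and that a 2-D array is split along axis 0 by default.

```python
import numpy as np

seq = np.arange(10)
left, middle, right = np.split(seq, [3, 7])   # cut before index 3 and index 7
print(left, middle, right)   # [0 1 2] [3 4 5 6] [7 8 9]

mat = np.array([[4, 5, 6], [7, 8, 9]])
top, bottom = np.split(mat, [1])              # axis=0: cut before row 1
print(top)      # [[4 5 6]]
print(bottom)   # [[7 8 9]]
```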

## Computation on NumPy Arrays



- Any arithmetic operation between **equal-size arrays** is applied **element-wise**
- Arithmetic operations with a scalar propagate the scalar to each element
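
A minimal sketch of both rules on two small hypothetical arrays `a` and `b`, small enough to check by hand:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b        # element-wise sum
product = a * b      # element-wise product (not a matrix product)
scaled = a * 125     # the scalar is applied to every element
ratio = b / a        # true division always yields floats

print(total)         # [[11 22] [33 44]]
print(product)       # [[ 10  40] [ 90 160]]
print(scaled)        # [[125 250] [375 500]]
print(ratio)         # [[10. 10.] [10. 10.]]
```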

In [98]:
arr1 = np.random.randint(10,size= (10,10))
arr2 = np.random.randint(10,size= (10,10))

In [99]:
arr1

array([[3, 5, 7, 8, 2, 8, 5, 3, 4, 6],
       [5, 7, 3, 2, 0, 7, 4, 2, 3, 9],
       [3, 4, 3, 6, 6, 7, 9, 4, 1, 4],
       [6, 0, 8, 3, 9, 6, 0, 5, 5, 9],
       [4, 8, 1, 2, 7, 6, 6, 4, 5, 7],
       [3, 3, 6, 7, 6, 6, 1, 1, 2, 7],
       [0, 8, 4, 5, 0, 4, 8, 8, 5, 8],
       [3, 9, 9, 9, 6, 7, 4, 9, 3, 0],
       [7, 1, 0, 2, 9, 2, 0, 0, 5, 3],
       [6, 2, 9, 1, 4, 5, 5, 7, 2, 4]])

In [None]:
arr1 * 125

In [None]:
arr1 + arr2    # arr1 and arr2 have the same shape

In [None]:
arr1 * arr2 - arr1 / ( arr2 + 1)

- Transposing arrays and inner matrix product

In [None]:
arr1

In [None]:
arr1.T # returns the transposed matrix

In [None]:
np.dot(arr1,arr2)
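
The difference between the matrix product and the element-wise product is easy to miss on random 10x10 data. This sketch uses two small hypothetical matrices `m` and `n`:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[0, 1], [1, 0]])

elementwise = m * n       # multiplies matching positions
matrix = np.dot(m, n)     # rows of m times columns of n
same = m @ n              # the @ operator is equivalent for 2-D arrays

print(elementwise)        # [[0 2] [3 0]]
print(matrix)             # [[2 1] [4 3]]
print(m.T)                # [[1 3] [2 4]]
```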

### Mathematical and Statistical Methods

- A set of mathematical functions which compute statistics about an entire array or about the data along an axis are accessible as array methods.



Arithmetic operations:

```
+	np.add	Addition (e.g., 1 + 1 = 2)
-	np.subtract	Subtraction (e.g., 3 - 2 = 1)
-	np.negative	Unary negation (e.g., -2)
*	np.multiply	Multiplication (e.g., 2 * 3 = 6)
/	np.divide	Division (e.g., 3 / 2 = 1.5)
//	np.floor_divide	Floor division (e.g., 3 // 2 = 1)
**	np.power	Exponentiation (e.g., 2 ** 3 = 8)
%	np.mod	Modulus/remainder (e.g., 9 % 4 = 1)
```

Trigonometric functions:

```
sin, cos, tan	compute sine, cosine and tangent of angles
arcsin, arccos, arctan	calculate inverse sine, cosine and tangent
hypot	calculate hypotenuse of given right triangle
sinh, cosh, tanh	compute hyperbolic sine, cosine and tangent
arcsinh, arccosh, arctanh	compute inverse hyperbolic sine, cosine and tangent
deg2rad	convert degree into radians
rad2deg	convert radians into degree
```

Statistical functions:

```
amin, amax	returns minimum or maximum of an array or along an axis
ptp	returns range of values (maximum-minimum) of an array or along an axis
percentile(a, p, axis)	calculate pth percentile of array or along specified axis
median	compute median of data along specified axis
mean	compute mean of data along specified axis
std	compute standard deviation of data along specified axis
var	compute variance of data along specified axis
average	compute average of data along specified axis
```
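
The `axis` argument decides which dimension is collapsed. Here is a short sketch on a hypothetical 2x3 array `m`: `axis=0` computes one result per column, `axis=1` one result per row, and no axis reduces the whole array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.mean())            # 3.5        whole array
print(m.mean(axis=0))      # [2.5 3.5 4.5]  one value per column
print(m.mean(axis=1))      # [2. 5.]        one value per row
print(m.max(axis=0))       # [4 5 6]
print(np.ptp(m, axis=1))   # [2 2]      max - min within each row
print(np.median(m))        # 3.5
```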

In [None]:
arr1

In [None]:
np.median(arr1)

In [None]:
np.add(arr1,arr2)    # subtract, multiply, divide

In [None]:
arr1+arr2

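- **np.where** called with only a condition returns the indices where the condition holds; called as np.where(condition, x, y) it takes elements from x where the condition is true and from y elsewhere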

In [85]:
a = np.arange(1,11)

In [86]:
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [87]:
a>5

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [88]:
np.where(a>5) # returns the indices of the array where the value is > 5

(array([5, 6, 7, 8, 9]),)

In [90]:
np.where(a>5,0,10) # where a is > 5 put 0, otherwise put 10

array([10, 10, 10, 10, 10,  0,  0,  0,  0,  0])

In [None]:
b = np.arange(10,20)

In [None]:
b

In [None]:
np.where(a%2==1,a,b)
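
The two calling forms of `np.where` can be sketched side by side. The variable names `idx` and `mixed` are made up for the example.

```python
import numpy as np

a = np.arange(1, 11)       # 1 .. 10
b = np.arange(10, 20)      # 10 .. 19

# One argument: a tuple of index arrays, one per dimension
idx = np.where(a > 5)
print(idx[0])              # [5 6 7 8 9]
print(a[idx])              # [ 6  7  8  9 10]

# Three arguments: take from a where the condition holds, else from b
mixed = np.where(a % 2 == 1, a, b)
print(mixed)               # [ 1 11  3 13  5 15  7 17  9 19]
```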

## Boolean Arrays

- Boolean values are coerced to 1 (True) and 0 (False) in the above methods.
- Sum is often used as a means of counting True values in a boolean array.


In [None]:
bools = np.array([True,True,False,False,True])

In [None]:
bools.sum()

In [None]:
bools.any() # checks whether at least one value in the array is True

In [None]:
bools.all() # checks whether all values are True

In [None]:
bools[:2].all()
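
In practice the boolean array usually comes from a comparison. This sketch counts how many entries of a hypothetical array `values` are positive:

```python
import numpy as np

values = np.array([-2, 5, 0, 7, -1, 3])
positive = values > 0               # [False  True False  True False  True]

print(positive.sum())               # 3    number of True values
print(positive.mean())              # 0.5  fraction of True values
print(np.count_nonzero(positive))   # 3    same count, different function
print(positive.any(), positive.all())   # True False
```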

## Sorting
- Like Python’s built-in list type, NumPy arrays can be sorted in-place using the sort method

In [91]:
arrsort = np.random.randn(4,3)

In [92]:
arrsort

array([[-2.63863533,  0.60305006, -0.3545661 ],
       [ 0.91338736, -0.55943621,  0.80805798],
       [-1.23889178,  1.12965061,  0.50247233],
       [-1.42693153,  0.90802027,  0.20306873]])

In [93]:
arrsort.sort(0)    # axis 0: sorts down the rows, so each column ends up in ascending order

In [94]:
arrsort

array([[-2.63863533, -0.55943621, -0.3545661 ],
       [-1.42693153,  0.60305006,  0.20306873],
       [-1.23889178,  0.90802027,  0.50247233],
       [ 0.91338736,  1.12965061,  0.80805798]])

In [95]:
arrsort.sort(1)    # axis 1: sorts across the columns, so each row ends up in ascending order

In [96]:
arrsort

array([[-2.63863533, -0.55943621, -0.3545661 ],
       [-1.42693153,  0.20306873,  0.60305006],
       [-1.23889178,  0.50247233,  0.90802027],
       [ 0.80805798,  0.91338736,  1.12965061]])
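
The axis semantics are easier to verify on integers. This sketch uses a hypothetical matrix `m` and also contrasts the in-place `sort` method with `np.sort`, which returns a sorted copy.

```python
import numpy as np

m = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

by_row = np.sort(m, axis=1)   # sorted copy: each row ascending, m unchanged
print(by_row)                 # [[1 2 3] [7 8 9] [4 5 6]]

by_col = m.copy()
by_col.sort(axis=0)           # in place: each column ascending
print(by_col)                 # [[3 1 2] [6 4 5] [9 7 8]]
```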

## Unique and Other Set Logic

- NumPy has some basic set operations for one-dimensional ndarrays.

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [None]:
np.unique(names) # returns a sorted array of the names without repetitions
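
Beyond `np.unique`, a few other set-style functions are available. The sketch below shows a small selection; the arrays `p` and `q` are made up for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Unique values together with how often each one occurs
uniq, counts = np.unique(names, return_counts=True)
print(uniq)      # ['Bob' 'Joe' 'Will']
print(counts)    # [2 3 2]

# Membership test, element by element
mask = np.isin(names, ['Bob', 'Will'])
print(mask)      # [ True False  True  True  True False False]

p = np.array([1, 2, 3, 4])
q = np.array([3, 4, 5])
print(np.intersect1d(p, q))   # [3 4]
print(np.union1d(p, q))       # [1 2 3 4 5]
```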