# NumPy

* NumPy is a Python library for numerical computation on scalars, vectors, matrices, and more generally multidimensional tensors.
* NumPy has advanced features for indexing and manipulating data of arbitrary size.
* It is efficient for linear algebra, optimization, sampling of random variables (statistics), and more.
* It is open-source and freely available under the BSD license.
* Together with Pandas, it is the basic tool for representing scientific information and datasets that are then given as input to AI and ML tools.

## Motivations

* Basic Python lists and data structures are convenient but slow.
* Operations performed on NumPy structures are up to 50 times faster than the same operations performed on Python data structures.
* NumPy arrays are stored as a single contiguous block in memory, unlike Python lists, which makes them more efficient to use.
* Arrays, matrices and tensors are widely used in Data Science, where speed and computational resources are important.
* NumPy arrays are called *ndarrays* (n-dimensional arrays).
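
The speed difference mentioned above can be checked with a rough timing sketch like the one below. The array size, the sum-of-squares workload and the helper names `python_version` and `numpy_version` are arbitrary choices made for illustration; the measured ratio depends on the machine and on the operation, so no specific figure is claimed here.

```python
import timeit
import numpy as np

n = 200_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit integers to avoid overflow

def python_version():
    # sum of squares computed element by element in pure Python
    return sum(x * x for x in py_list)

def numpy_version():
    # same computation, vectorized on the ndarray
    return int(np.sum(np_arr * np_arr))

t_py = timeit.timeit(python_version, number=3)
t_np = timeit.timeit(numpy_version, number=3)
print(f"list: {t_py:.4f}s  ndarray: {t_np:.4f}s")
```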

In [2]:
import numpy as np

In [2]:
# create a np array

arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
arr = np.array(['a','b']) # strings are allowed too
arr = np.array(['a',1]) # mixed types are converted to a common dtype (here, strings)

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [3]:
# check dimension with ndim

print(np.array([1, 2, 3, 4, 5]).ndim)
print(np.array([[1, 2],[3, 4]]).ndim)

1
2


In [4]:
# use ndmin to create an array with at least the given number of dimensions

arr = np.array([1, 2, 3, 4], ndmin=5)
arr

array([[[[[1, 2, 3, 4]]]]])

In [5]:
# access array elements 

arr = np.array([[1, 2],[3, 4]])
arr[1,1]

np.int64(4)

## Slicing and indexing

To index the elements of a multidimensional array, use square brackets with one index per axis. For a two-dimensional array the row index comes first, e.g.: variable_name[row_position, column_position]

<img src="../../docs/numpy_indexing.png">
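
As a small illustration of the bracket notation, the sketch below indexes a 2x3 array in a few common ways. The array and the variable names are made up for the example.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

first_row = arr[0]       # a single index selects a whole row
element = arr[1, 2]      # row 1, column 2 (both counted from 0)
last = arr[-1, -1]       # negative indices count from the end
second_col = arr[:, 1]   # ':' means "every row", so this is column 1

print(first_row, element, last, second_col)
```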


In [3]:
# slicing 
# arr[1:3, 1:4]
# the first slice selects the 2nd and 3rd rows
# the second slice selects, for each of those rows, the 2nd to 4th elements

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
arr[1:3, 1:4]

array([[ 7,  8,  9],
       [12, 13, 14]])

In [7]:
arr.dtype

dtype('int64')

In [8]:
arr = np.array(['a',1])
arr.dtype

dtype('<U21')

In [9]:
# casting
arr = np.array([1.1, 2.1, 3.1])
arr = arr.astype('i') # 'i' is the C int type code (int32 here); .astype(int) also works; decimals are truncated
arr

array([1, 2, 3], dtype=int32)

In [10]:
# copy: later changes to the original do not affect the copy
arr = np.array([1.1, 2.1, 3.1])
arr2 = arr.copy()
arr[1] = 3
print(arr)
print(arr2)

[1.1 3.  3.1]
[1.1 2.1 3.1]


In [11]:
# shape
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print('shape of', arr, arr.shape)

arr = np.array([1, 2, 3, 4], ndmin=5)
print('shape of', arr, arr.shape)

shape of [[1 2 3 4]
 [5 6 7 8]] (2, 4)
shape of [[[[[1 2 3 4]]]]] (1, 1, 1, 1, 4)


In [12]:
# reshape
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
# newarr = arr.reshape(5, 3) # error: the new shape must hold exactly the same number of elements (12)
print(newarr)
# note that the new array is a VIEW of the old one: it shares the same data

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
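
The fact that `reshape` returns a view here can be verified with a short sketch. It assumes a contiguous input array, as in the cell above; for non-contiguous inputs NumPy may have to return a copy instead.

```python
import numpy as np

arr = np.arange(1, 13)
newarr = arr.reshape(4, 3)

# both arrays point at the same memory block
shared = np.shares_memory(arr, newarr)

# writing through the view changes the original
newarr[0, 0] = 100
first_after_write = arr[0]

# an explicit copy breaks the link
independent = arr.reshape(4, 3).copy()
independent[0, 0] = -1

print(shared, first_after_write, arr[0])
```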


In [13]:
# unknown dimension
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, -1) # -1 lets NumPy infer this dimension; the total size must still be divisible by 4
newarr

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [14]:
# flatten
arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(-1)
newarr


array([1, 2, 3, 4, 5, 6])
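
Besides `reshape(-1)`, NumPy also provides `flatten()`, which always returns a copy. The sketch below contrasts the two on a contiguous array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_view = arr.reshape(-1)   # a view for a contiguous array like this one
flat_copy = arr.flatten()     # always an independent copy

arr[0, 0] = 99                # modify the original

print(flat_view[0])  # follows the change
print(flat_copy[0])  # keeps the old value
```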

In [15]:
# joining arrays

# concatenate joins arrays along an existing axis (the number of dimensions does not change)

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

[1 2 3 4 5 6]


In [16]:
arr1 = np.zeros((28,28,2))
arr2 = np.ones((28,28,1))
arr = np.concatenate((arr1, arr2), axis=2)
print(arr.shape)

(28, 28, 3)


In [17]:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1) # join side by side (the result has more columns)
print(arr)
arr = np.concatenate((arr1, arr2), axis=0) # join one below the other (the result has more rows)
print(arr)


[[1 2 5 6]
 [3 4 7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [18]:
# stack joins arrays along a new axis (the number of dimensions increases by one)

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1)
print(arr)

arr = np.stack((arr1, arr2), axis=0) # each input array becomes a row
arr = np.vstack((arr1, arr2)) # vertical stack: same result as the line above for 1-D inputs
print(arr)

[[1 4]
 [2 5]
 [3 6]]
[[1 2 3]
 [4 5 6]]
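
The difference between joining along an existing axis and along a new one is easiest to see by comparing shapes. This is a minimal sketch using the same two 1-D arrays as above.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate((arr1, arr2))     # still 1-D
stacked0 = np.stack((arr1, arr2), axis=0) # new leading axis
stacked1 = np.stack((arr1, arr2), axis=1) # new trailing axis

print(joined.shape)    # (6,)
print(stacked0.shape)  # (2, 3)
print(stacked1.shape)  # (3, 2)
```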


In [19]:
# search: np.where returns a tuple with one array of indices per dimension
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

#for i in x[0]:
#    print(i)

(array([3, 5, 6]),)
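
The tuple in the output above has one entry per array dimension. A 2-D sketch makes this visible; the `grid` array is made up for the example.

```python
import numpy as np

grid = np.array([[1, 4],
                 [4, 2]])

# one index array per axis: row indices and column indices
rows, cols = np.where(grid == 4)

# pair them up to get the coordinates of each match
positions = list(zip(rows.tolist(), cols.tolist()))
print(positions)  # [(0, 1), (1, 0)]
```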


In [20]:
# filtering
arr = np.array([41, 42, 43, 44])
filter_arr = arr > 42
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)

[False False  True  True]
[43 44]
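
Boolean masks can also be combined element-wise. The sketch below uses `&` (and) and `~` (not); the parentheses around each comparison are required because of operator precedence. The condition itself is an arbitrary example.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

# keep values that are greater than 41 AND even
mask = (arr > 41) & (arr % 2 == 0)
selected = arr[mask]

# ~ inverts the mask: everything that was not selected
rest = arr[~mask]

print(selected)  # [42 44]
print(rest)      # [41 43]
```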


In [6]:
# vectorized operations (built on ufuncs, Universal Functions)
# axis=0 reduces down the columns, axis=1 reduces across the rows
arr1 = np.array([[1, 2], 
                 [3, 4]])
print(np.mean(arr1, axis=0))
print(np.sum(arr1, axis=0))
print(np.sum(arr1, axis=1))


[2. 3.]
[4 6]
[3 7]


In [22]:
# create a custom ufunc from a Python function
def myadd(x, y):
    return x + y
myadd = np.frompyfunc(myadd, 2, 1) # 2 input arguments, 1 output
print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))

[6 8 10 12]
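
One detail worth knowing is that ufuncs created with `np.frompyfunc` return arrays of dtype `object`, while the built-in `np.add` keeps a numeric dtype. The sketch below shows the difference; `myadd_ufunc` is just an illustrative name.

```python
import numpy as np

def myadd(x, y):
    return x + y

myadd_ufunc = np.frompyfunc(myadd, 2, 1)

custom = myadd_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
builtin = np.add([1, 2, 3, 4], [5, 6, 7, 8])

print(custom.dtype)   # object
print(builtin.dtype)  # an integer dtype

# convert back to a numeric dtype if needed
as_int = custom.astype(np.int64)
```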


In [23]:
# slicing with open-ended ranges: from row 2 and column 2 to the end
arr1 = np.array([[1, 2, 3],[4, 5, 6],[7,8,9]])
arr1[2:,2:]

array([[9]])