# Machine Learning Zoomcamp


## 1.7 Introduction to NumPy


Plan:

* Creating arrays
* Multi-dimensional arrays
* Randomly generated arrays
* Element-wise operations
    * Comparison operations
    * Logical operations
* Summarizing operations

In [2]:
import numpy as np

In [8]:
np

<module 'numpy' from '/home/alexey/.pyenv/versions/3.8.11/lib/python3.8/site-packages/numpy/__init__.py'>

## Creating arrays


In [10]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [11]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [12]:
np.full(10, 2.5)

array([2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5])

In [15]:
a = np.array([1, 2, 3, 5, 7, 12])
a

array([ 1,  2,  3,  5,  7, 12])

In [17]:
a[2] = 10

In [18]:
a

array([ 1,  2, 10,  5,  7, 12])
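The outputs above show `0.` and `1.` for np.zeros and np.ones but plain integers for np.array. This is down to the element type (dtype) NumPy picks for each array. A minimal sketch of how that works; the variable names are only illustrative:

```python
import numpy as np

# np.array infers the dtype from the values it is given
ints = np.array([1, 2, 3])        # all integers -> an integer dtype
floats = np.array([1, 2, 3.5])    # one float is enough -> float64
print(floats.dtype)  # float64

# np.zeros, np.ones and np.full(…, 2.5) produce floats by default;
# the dtype argument overrides that
zeros_int = np.zeros(5, dtype=int)
print(zeros_int)  # [0 0 0 0 0]
```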

np.arange creates a range from n to m, similar to Python's range(). The stop value is exclusive.

In [20]:
np.arange(3, 10)

array([3, 4, 5, 6, 7, 8, 9])
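Like range(), np.arange also accepts a step and a single-argument form. A short sketch (the variable names are just for illustration):

```python
import numpy as np

# Start at 3, stop before 10, step by 2
odds = np.arange(3, 10, 2)
print(odds)  # [3 5 7 9]

# With one argument the range starts at 0, like range(5)
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]
```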

np.linspace creates an array of n evenly spaced numbers between the first and the last number, both inclusive.

In [24]:
np.linspace(0, 100, 11)

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])
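Because both endpoints are included, 11 points over 0 to 100 give 10 intervals of width 10. The sketch below checks that spacing and shows the endpoint argument, which drops the stop value:

```python
import numpy as np

points = np.linspace(0, 100, 11)

# Spacing is (stop - start) / (num - 1) = 100 / 10
step = points[1] - points[0]
print(step)  # 10.0

# endpoint=False excludes the stop value: 10 points from 0 to 90
half_open = np.linspace(0, 100, 10, endpoint=False)
print(half_open[-1])  # 90.0
```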

## Multi-dimensional arrays


np.zeros can also create multi-dimensional arrays when it is given a shape tuple.

In [3]:
# Create an array of zeros with 5 rows and 2 columns, i.e. shape (5, 2)
np.zeros((5, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [10]:
# Create a 3 x 3 two-dimensional array from three Python lists (a list of lists).
# The first dimension (rows) is given by the outer brackets, the second (columns) by the inner brackets.
n = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [9]:
# Print the element at row 0, column 1, then set it to 20
print(n[0, 1])
n[0, 1] = 20

2


In [30]:
n

array([[ 1, 20,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])

In [11]:
# Access a single row (index 2, the third row)
n[2]

array([7, 8, 9])

In [7]:
# Access row 2 (third row) and assign [1, 1, 1]
n[2] = [1, 1, 1]

In [35]:
n

array([[ 1, 20,  3],
       [ 4,  5,  6],
       [ 1,  1,  1]])

Accessing columns of an array:

In [14]:
# Access the second column (column index 1) of all rows.
n[:, 1]

array([20,  5,  1])

In [13]:
# Reassign the entire third column (column index 2)
n[:, 2] = [0, 1, 2]

In [40]:
n

array([[ 1, 20,  0],
       [ 4,  5,  1],
       [ 1,  1,  2]])
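Row and column access generalizes to slices on both axes at once. A small sketch on a fresh array (named `m` here so it does not touch `n`), which also shows that a slice is a view onto the original data rather than a copy:

```python
import numpy as np

m = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# shape is (rows, columns)
print(m.shape)  # (3, 3)

# Rows 0 and 1, columns 1 and 2: the rows [2, 3] and [5, 6]
top_right = m[:2, 1:]

# Writing through the slice changes the original array
top_right[0, 0] = 20
print(m[0, 1])  # 20
```

Call `.copy()` on the slice if an independent array is needed.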

## Randomly generated arrays


In [None]:
# The numbers are pseudo-random: setting a seed ensures we get the same numbers each time we run the code, which is useful for reproducibility and testing.
np.random.seed(2)

# Create a 5 x 2 array of random numbers between 0 and 100. np.random.rand samples uniformly from [0, 1); multiplying by 100 scales each element to the desired range.
100 * np.random.rand(5, 2)

array([[43.59949021,  2.59262318],
       [54.96624779, 43.53223926],
       [42.03678021, 33.0334821 ],
       [20.4648634 , 61.92709664],
       [29.96546737, 26.68272751]])

In [15]:
np.random.seed(2)
# np.random.randn(5, 2) will create a 5 x 2 array of random numbers from a standard normal distribution (mean 0, standard deviation 1)
np.random.randn(5, 2)

array([[-0.41675785, -0.05626683],
       [-2.1361961 ,  1.64027081],
       [-1.79343559, -0.84174737],
       [ 0.50288142, -1.24528809],
       [-1.05795222, -0.90900761]])

In [17]:
np.random.seed(2)
# np.random.normal lets you specify the mean (loc) and standard deviation (scale) of the distribution. It also takes a size argument, a tuple giving the shape of the output array. Use it when you need a normal distribution with a specific mean and standard deviation.
np.random.normal(loc=0, scale=1, size=(5, 2))

array([[-0.41675785, -0.05626683],
       [-2.1361961 ,  1.64027081],
       [-1.79343559, -0.84174737],
       [ 0.50288142, -1.24528809],
       [-1.05795222, -0.90900761]])

In [18]:
np.random.seed(2)
# np.random.randint generates random integers from low (inclusive) to high (exclusive). It also takes a size argument, and accepts a dtype argument to specify the integer type (the default is a long integer).
np.random.randint(low=0, high=100, size=(5, 2))

array([[40, 15],
       [72, 22],
       [43, 82],
       [75,  7],
       [34, 49]])
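The cells above use the global np.random.seed. Newer NumPy versions also provide a Generator object via np.random.default_rng, which keeps the seed local to one object. A sketch of the same three kinds of arrays with that interface; the numbers it produces differ from the outputs shown above, because it uses a different underlying algorithm:

```python
import numpy as np

# A generator with its own seed; the global np.random state is untouched
rng = np.random.default_rng(2)

uniform = 100 * rng.random((5, 2))                     # uniform on [0, 100)
normal = rng.normal(loc=0, scale=1, size=(5, 2))       # standard normal
integers = rng.integers(low=0, high=100, size=(5, 2))  # high is exclusive

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(2)
same_uniform = 100 * rng_again.random((5, 2))
print(np.array_equal(uniform, same_uniform))  # True
```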

## Element-wise operations


In [19]:
# Create a 1-d array with values between 0 and 4
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [20]:
# Add 1 to each element - different from python list which requires iteration over the list
a + 1

array([1, 2, 3, 4, 5])

In [21]:
# Multiply each element by 2. Operations can also be chained, as in the following cells.
a * 2

array([0, 2, 4, 6, 8])

In [22]:
a * 2 + 10

array([10, 12, 14, 16, 18])

In [24]:
(10 + (a * 2)) ** 2

array([100, 144, 196, 256, 324])

In [25]:
b = (10 + (a * 2)) ** 2 / 100

In [26]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

In [27]:
# Add two arrays element-wise
a + b

array([1.  , 2.44, 3.96, 5.56, 7.24])

In [68]:
a / b + 10

array([10.        , 10.69444444, 11.02040816, 11.171875  , 11.2345679 ])
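For comparison, here is the same computation written for a plain Python list, which needs an explicit loop or comprehension, next to the NumPy version. This is only a sketch to show the difference; the names are illustrative:

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# Plain Python: apply the formula to one element at a time
from_list = [(10 + v * 2) ** 2 / 100 for v in values]

# NumPy: the same expression is applied to the whole array at once
arr = np.array(values)
from_array = (10 + arr * 2) ** 2 / 100

print(np.allclose(from_list, from_array))  # True
```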

## Comparison operations

In [70]:
a

array([0, 1, 2, 3, 4])

In [28]:
# Which elements are greater than or equal to 2?
a >= 2

array([False, False,  True,  True,  True])

In [29]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

In [30]:
# Which elements of a are greater than the corresponding elements of b?
a > b

array([False, False,  True,  True,  True])

In [31]:
# Use the boolean array as a mask to index into a. This returns the elements of a (not their indices) for which the condition is true.
a[a > b]

array([2, 3, 4])

In [37]:
# Index b with a list of positions to pick several elements at once
indices_id = [2, 3, 4]
b[indices_id]

array([1.96, 2.56, 3.24])
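The plan also lists logical operations. Boolean arrays can be combined element-wise with `&` (and), `|` (or) and `~` (not). A minimal sketch using the same `a` and `b` as above; the second condition (`a % 2 == 0`) is chosen here only for illustration:

```python
import numpy as np

a = np.arange(5)
b = (10 + (a * 2)) ** 2 / 100

# Parentheses are required: & and | bind tighter than comparisons
both = (a > b) & (a % 2 == 0)
print(a[both])  # [2 4]

either = (a > b) | (a % 2 == 0)
print(a[either])  # [0 2 3 4]

# ~ negates a boolean array
print(a[~(a > b)])  # [0 1]
```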

## Summarizing operations

In [75]:
a

array([0, 1, 2, 3, 4])

In [79]:
# Standard deviation of all elements (population formula by default)
a.std()

1.4142135623730951

In [82]:
# Smallest element of the whole 2-d array
n.min()

0
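Other summarizing methods follow the same pattern, and on multi-dimensional arrays an axis argument summarizes per column or per row instead of over everything. A sketch, assuming a 2-d array named `table` that holds the same values as `n` at this point:

```python
import numpy as np

a = np.arange(5)
print(a.sum())           # 10
print(a.mean())          # 2.0
print(a.min(), a.max())  # 0 4

table = np.array([
    [1, 20, 0],
    [4, 5, 1],
    [1, 1, 2]
])

# axis=0 collapses the rows: one result per column
print(table.sum(axis=0))  # [ 6 26  3]

# axis=1 collapses the columns: one result per row
print(table.min(axis=1))  # [0 1 1]
```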

### Next

Linear algebra refresher