# Machine Learning Zoomcamp

Adapted version of the original numpy notebook [07-numpy.ipynb](https://github.com/joweyel/machine-learning-zoomcamp/blob/master/01-intro/notebooks/07-numpy.ipynb)
- In this version, the shapes of NumPy arrays are explicitly specified.

## 1.7 Introduction to NumPy


Plan:

* Creating arrays
* Multi-dimensional arrays
* Randomly generated arrays
* Element-wise operations
    * Comparison operations
    * Logical operations
* Summarizing operations

In [1]:
import numpy as np
print(np.__version__)

1.26.0


In [2]:
np

<module 'numpy' from '/home/userl/miniconda3/envs/ml-zoomcamp/lib/python3.9/site-packages/numpy/__init__.py'>

## Creating arrays


In [3]:
zeros = np.zeros(10)
print(zeros.shape) # 1D Array
zeros


(10,)


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [4]:
ones = np.ones(10)
print(ones.shape) # 1D Array
ones

(10,)


array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [5]:
full = np.full(10, 2.5)
full.shape # 1D Array
print(full)


[2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]


In [6]:
a = np.array([1, 2, 3, 5, 7, 12])
print(a.shape) # 1D Array
a

(6,)


array([ 1,  2,  3,  5,  7, 12])

Changing a single element of the array `a`. 

In [7]:
a[2] = 10

In [8]:
a

array([ 1,  2, 10,  5,  7, 12])

Creating a range of integers from $3$ up to, but not including, $10$ with the default step size $1$ (the step can be changed via a third argument).

In [9]:
rng = np.arange(3, 10)
print(rng.shape)
rng


(7,)


array([3, 4, 5, 6, 7, 8, 9])
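
As a small illustration of changing the step size, the sketch below passes a third argument to `np.arange`. The variable names `every_second` and `countdown` are made up for this example and are not part of the original notebook.

```python
import numpy as np

# start=3, stop=10 (exclusive), step=2
every_second = np.arange(3, 10, 2)
print(every_second.shape)  # (4,)
print(every_second)        # values 3, 5, 7, 9

# a negative step counts downwards; the stop value 3 is still excluded
countdown = np.arange(10, 3, -1)
print(countdown.shape)     # (7,)
print(countdown)           # values 10, 9, ..., 4
```

In both cases the stop value itself is never part of the result.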

Discretizing the interval $[0, 100]$ into $11$ equidistant points, with both endpoints included.

In [10]:
lnsp = np.linspace(0, 100, 11)
print(lnsp.shape)
lnsp

(11,)


array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])
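
To make the spacing explicit, the following sketch asks `np.linspace` to also return the step size and shows what happens when the right endpoint is excluded. The names `points`, `step` and `half_open` are chosen only for this illustration.

```python
import numpy as np

# retstep=True additionally returns the distance between neighbouring points
points, step = np.linspace(0, 100, 11, retstep=True)
print(points.shape)  # (11,)
print(step)          # 10.0

# endpoint=False leaves out the stop value, so 10 points cover 0, 10, ..., 90
half_open = np.linspace(0, 100, 10, endpoint=False)
print(half_open.shape)  # (10,)
print(half_open)
```

Unlike `np.arange`, `np.linspace` takes the number of points rather than the step size, and it includes the stop value by default.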

## Multi-dimensional arrays


In [11]:
zeros = np.zeros((5, 2))
print(zeros.shape) # 2D Array / 2D Matrix
zeros 

(5, 2)


array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [12]:
n = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(n.shape) # 2D Array
n

(3, 3)


array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Changing a single value in a 2D array requires two indices, which can be written in the following two ways:
- `n[i, j] = newVal`
- `n[i][j] = newVal`

In [13]:
n[0, 1] = 20

In [14]:
n

array([[ 1, 20,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])
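
A minimal sketch comparing the two index notations on a fresh copy of the matrix. Both assignments change the array in place. The second form first takes row `n[1]` (a view into `n`) and then indexes into that row.

```python
import numpy as np

n = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

n[0, 1] = 20   # one indexing operation with a (row, column) pair
n[1][2] = 60   # two steps: select row 1 (a view), then set its element 2

print(n.shape)  # (3, 3)
print(n)
```

The `n[i, j]` form is usually preferred because it is a single indexing operation.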

It is possible to replace entire rows or columns of a NumPy array. For this, only the row index or the column index is required.


`Important`: The new row must have the same number of elements as the row it replaces.



In [15]:
# a row can be replaced by a NumPy array or by a plain list
n[2] = np.array([1, 1, 1])

n[0] = [2, 4, 8]


In [16]:
n

array([[2, 4, 8],
       [4, 5, 6],
       [1, 1, 1]])
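
The length requirement can be checked with a short experiment. The sketch below uses a fresh integer matrix and a made-up variable `error_message`. Assigning a row of the wrong length raises a `ValueError`, while a single scalar is accepted and fills the whole row.

```python
import numpy as np

n = np.zeros((3, 3), dtype=int)

n[1] = 7  # a scalar is broadcast to all three entries of row 1

error_message = None
try:
    n[0] = [1, 2]  # two values cannot fill a row of length 3
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

print(n)  # row 0 is unchanged, row 1 is all sevens
```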

Example of column replacement
- The `:` in the index means that all rows are selected
- The `2` in the index means that the column with index 2 (the 3rd column) is selected
- `Conclusion`: the 3rd value of every row $\Leftrightarrow$ the 3rd column

In [17]:
n[:, 2] = [0, 1, 2]

In [18]:
n

array([[2, 4, 0],
       [4, 5, 1],
       [1, 1, 2]])

## Randomly generated arrays


In [19]:
# Seed the random number generator so that the results are reproducible
np.random.seed(2)

# 5 rows and 2 columns of uniformly distributed values, scaled from [0, 1) to [0, 100)
rand = 100 * np.random.rand(5, 2) 

print(rand.shape) # 2D Array
rand


(5, 2)


array([[43.59949021,  2.59262318],
       [54.96624779, 43.53223926],
       [42.03678021, 33.0334821 ],
       [20.4648634 , 61.92709664],
       [29.96546737, 26.68272751]])
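
The effect of the seed can be seen by drawing twice in a row and then reseeding. This is a small sketch with made-up names `first`, `second` and `repeat`. It also suggests why the cells in this section call `np.random.seed(2)` again before drawing new numbers.

```python
import numpy as np

np.random.seed(2)
first = np.random.rand(3)    # first three numbers of the sequence
second = np.random.rand(3)   # continues the sequence, so the values differ

np.random.seed(2)            # reset the generator to the same state
repeat = np.random.rand(3)   # reproduces `first` exactly

print(np.array_equal(first, repeat))   # True
print(np.array_equal(first, second))   # False
```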

In [20]:
np.random.seed(2)
# 2D array of 5 rows and 2 columns of values drawn from the standard normal distribution
randn = np.random.randn(5, 2)
print(randn.shape)
randn

(5, 2)


array([[-0.41675785, -0.05626683],
       [-2.1361961 ,  1.64027081],
       [-1.79343559, -0.84174737],
       [ 0.50288142, -1.24528809],
       [-1.05795222, -0.90900761]])

In [21]:
np.random.seed(2)

# Random integer in {0, 1, ..., 99} in every cell of the created array (`high` is exclusive)
randint = np.random.randint(low=0, high=100, size=(5, 2))
print(randint.shape)
randint

(5, 2)


array([[40, 15],
       [72, 22],
       [43, 82],
       [75,  7],
       [34, 49]])
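
As an aside, newer NumPy code often uses the `Generator` interface instead of the global `np.random.seed`. The sketch below is an alternative to the cells above, not what the original notebook does. The names `gen`, `uniform`, `normal` and `integers` are illustrative, and the drawn values differ from the outputs shown above because a different generator is used.

```python
import numpy as np

# a local generator object with its own seed (no global state)
gen = np.random.default_rng(seed=2)

uniform = 100 * gen.random((5, 2))                      # uniform values in [0, 100)
normal = gen.standard_normal((5, 2))                    # standard normal values
integers = gen.integers(low=0, high=100, size=(5, 2))   # integers in {0, ..., 99}

print(uniform.shape, normal.shape, integers.shape)  # (5, 2) (5, 2) (5, 2)
```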

## Element-wise operations


Standard math operators such as `+`, `-`, `*`, `/` can be used directly with NumPy arrays.

- **Important to know:**
    - `Scalar & Vector/Matrix`: The scalar is broadcast to every element of the array
    - `Vector/Matrix & Vector/Matrix`: Both arrays must have the same shape (or shapes that are compatible under NumPy's broadcasting rules)
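
The two cases can be tried out directly. In this sketch `v`, `w`, `m` and `error_message` are illustrative names. The last lines additionally show that NumPy accepts shapes that are broadcast-compatible, here a `(2, 3)` matrix and a `(3,)` vector, while incompatible shapes raise a `ValueError`.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([10, 20, 30])

print(v + 5)   # scalar applied to every element: 6, 7, 8
print(v * w)   # same shape, element by element: 10, 40, 90

error_message = None
try:
    v + np.array([1, 2])   # shapes (3,) and (2,) do not match
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

m = np.ones((2, 3))
print((m + v).shape)   # (2, 3): v is added to every row of m
```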

In [22]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [23]:
b = (10 + (a * 2)) ** 2 / 100

In [24]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

In [25]:
a / b + 10

array([10.        , 10.69444444, 11.02040816, 11.171875  , 11.2345679 ])

## Comparison operations
- Also applied element-wise
- Return arrays of `bool` values

In [26]:
a

array([0, 1, 2, 3, 4])

In [27]:
a >= 2

array([False, False,  True,  True,  True])

In [28]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

In [29]:
a > b

array([False, False,  True,  True,  True])

In [30]:
# select the elements of `a` for which the condition holds (boolean mask)
a[a > b]

array([2, 3, 4])
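
The plan at the top also lists logical operations. A minimal sketch of how boolean arrays can be combined element-wise with `&` (and), `|` (or) and `~` (not), reusing `a` and `b` from above. The names `both`, `either` and `negated` are made up for this example. The parentheses are needed because these operators bind more tightly than the comparisons.

```python
import numpy as np

a = np.arange(5)
b = (10 + (a * 2)) ** 2 / 100

both = (a > b) & (a % 2 == 0)     # greater than b AND even
either = (a > b) | (a % 2 == 0)   # greater than b OR even
negated = ~(a > b)                # NOT greater than b

print(a[both])     # 2, 4
print(a[either])   # 0, 2, 3, 4
print(a[negated])  # 0, 1
```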

## Summarizing operations

In [31]:
a

array([0, 1, 2, 3, 4])

**Standard Deviation**

In [32]:
a.std()

1.4142135623730951

**Minimum** (over all elements of the 2D array `n`)

In [33]:
n.min()

0
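
A few more summarizing operations, sketched on the arrays `a` and `n` from above (recreated here so the snippet runs on its own). For 2D arrays, the `axis` argument restricts the operation to columns (`axis=0`) or rows (`axis=1`). Note that `std()` computes the population standard deviation by default; `ddof=1` gives the sample version.

```python
import numpy as np

a = np.arange(5)
print(a.sum(), a.mean(), a.min(), a.max())  # 10 2.0 0 4
print(a.std())         # population standard deviation, about 1.4142
print(a.std(ddof=1))   # sample standard deviation, about 1.5811

n = np.array([
    [2, 4, 0],
    [4, 5, 1],
    [1, 1, 2]
])
print(n.sum())         # 20, over all elements
print(n.min(axis=0))   # minimum of each column: 1, 1, 0
print(n.sum(axis=1))   # sum of each row: 6, 10, 4
```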

### Next

Linear algebra refresher