# Intro To NumPy
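- NumPy is a Python library for fast array computing; its core is written in optimized C, so array operations run at speeds comparable to compiled C and Fortran code. It is used across nearly every field of science and engineering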
- offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more
- foundation of scientific Python and PyData ecosystems such as:
    - Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science packages
- the heart of NumPy is the **ndarray**, a homogeneous n-dimensional array object, with methods to operate on it efficiently
- [Beginners Guide](https://numpy.org/devdocs/user/absolute_beginners.html)
- [NumPy Fundamentals](https://numpy.org/devdocs/user/basics.html)

## Installation
- install with either conda or pip

```bash
conda config --env --add channels conda-forge
conda install numpy
```

```bash
pip install numpy
```

## import NumPy
- import the NumPy library before using it in a Python script; the conventional alias is `np`:

In [1]:
import numpy as np

In [3]:
print(np.__version__)

1.19.1


In [2]:
array = np.arange(6)

In [3]:
array.shape

(6,)

In [4]:
array

array([0, 1, 2, 3, 4, 5])

## Difference between a Python list and a NumPy array
- all elements of a NumPy array share the same data type (homogeneous)
- this makes mathematical operations over whole arrays very fast
- a Python list can contain different data types within a single list (heterogeneous)
    - which makes list operations much slower and less efficient
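A small sketch of the two differences above: a list keeps each element's own type, while `np.array` converts everything to one common dtype, and arithmetic on a NumPy array applies to every element without an explicit Python loop. The variable names are illustrative only.

```python
import numpy as np

# A Python list can hold mixed types; each element keeps its own type
mixed = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed])    # ['int', 'float', 'str']

# NumPy converts every element to one common dtype (here, float64)
arr = np.array([1, 2.5, 3])
print(arr.dtype)                            # float64

# With a list, element-wise math needs an explicit Python loop ...
doubled_list = [2 * x for x in [1, 2, 3]]
# ... while NumPy applies the operation to the whole array in compiled code
doubled_array = 2 * np.array([1, 2, 3])
print(doubled_list, doubled_array.tolist())  # [2, 4, 6] [2, 4, 6]

# Careful: multiplying a list by 2 repeats it instead of doubling its elements
print(2 * [1, 2, 3])                        # [1, 2, 3, 1, 2, 3]
```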

## NumPy array
- central data structure of the NumPy library
- a grid of elements that can be indexed in various ways
- the elements are of the same type, referred to as the array **dtype**
- the **rank** of the array is the number of dimensions
- the **shape** of the array is a tuple of integers giving the size of the array along each dimension
- can initialize NumPy arrays from Python lists

In [5]:
a = np.array([1, 2, 3, 4, 5, 6])

In [6]:
b = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])

In [8]:
b.shape

(3, 4)

In [7]:
# index a NumPy array like a Python list, with 0-based indices
print(a[0])

1


In [10]:
print(b)

[[  1   2   3   4]
 [ 10  20  30  40]
 [100 200 300 400]]


In [9]:
print(b[2, 0])  # row 2, column 0 (equivalent to b[2][0])

100
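The **dtype**, **rank** and **shape** described above are available as array attributes. A short sketch that recreates `a` and `b`; the exact default integer dtype is platform-dependent, so the comment below is hedged rather than exact.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])

print(a.dtype)            # an integer dtype; int64 on most 64-bit Linux/macOS builds
print(a.ndim, b.ndim)     # 1 2  (the rank: number of dimensions)
print(a.shape, b.shape)   # (6,) (3, 4)

# The dtype can also be chosen explicitly when creating the array
c = np.array([1, 2, 3], dtype=np.float32)
print(c.dtype)            # float32
```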


### Types of arrays
- a **1-D** array is also called a **vector**
    - a 1-D array makes no distinction between row and column vectors
- a **2-D** array is also called a **matrix**
- **3-D** and higher-dimensional arrays are commonly called **tensors**
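A sketch of these array types by shape. Transposing a 1-D array leaves it unchanged, which is why it has no row/column orientation; an explicit row or column vector needs a 2-D shape such as `(1, 3)` or `(3, 1)`.

```python
import numpy as np

vector = np.array([1, 2, 3])          # 1-D, shape (3,)
print(vector.shape, vector.T.shape)   # (3,) (3,): transposing a 1-D array does nothing

row = vector.reshape(1, 3)            # explicit 2-D row vector, shape (1, 3)
col = vector.reshape(3, 1)            # explicit 2-D column vector, shape (3, 1)
print(row.shape, col.shape)           # (1, 3) (3, 1)

matrix = np.zeros((2, 3))             # 2-D, shape (2, 3)
tensor = np.zeros((2, 3, 4))          # 3-D, shape (2, 3, 4)
print(matrix.ndim, tensor.ndim)       # 2 3
```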

### Attributes of an array
- an array is usually a fixed-size container of items of the same type and size
- the number of dimensions and items in an array is defined by its shape
- the shape is a tuple of integers that specifies the size of each dimension
- NumPy dimensions are called **axes**
- the *b* NumPy **ndarray** is a 2-D matrix
- the *b* array has 2 axes
- the first axis (rows) has a length of 3 and the second axis (columns) has a length of 4

In [11]:
b

array([[  1,   2,   3,   4],
       [ 10,  20,  30,  40],
       [100, 200, 300, 400]])
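A short sketch of how the two axes of *b* show up in code: `shape` gives one length per axis, and indexing along each axis yields rows or columns. The `.tolist()` calls are only there to make the printed results easy to read.

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])

n_rows, n_cols = b.shape    # length of axis 0 and axis 1
print(n_rows, n_cols)       # 3 4
print(len(b))               # 3: len() reports the length of the first axis
print(b[0].tolist())        # one position along axis 0 is a row: [1, 2, 3, 4]
print(b[:, 0].tolist())     # fixing axis 1 at 0 gives a column: [1, 10, 100]
```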

## Creating a basic array
- there are various ways; the primary one is **np.array()**

In [16]:
a = np.array([1, 2, 3])

In [17]:
a

array([1, 2, 3])

In [18]:
# create and initialize elements with 0s
a = np.zeros(4)

In [19]:
a

array([0., 0., 0., 0.])

In [20]:
# create and initialize elements with 1s
a = np.ones(5)

In [21]:
a

array([1., 1., 1., 1., 1.])

In [22]:
# create an array without initializing it; its values are whatever was already in memory, so fill every element before use
a = np.empty(2)

In [23]:
a

array([2.05833592e-312, 2.33419537e-312])

In [24]:
# use arange(start, stop, step); stop itself is excluded from the result
np.arange(2, 9, 2)

array([2, 4, 6, 8])

In [25]:
# create an array with values that are spaced linearly in a specified interval
np.linspace(0, 10, num=5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [27]:
# specify the data type; the default is np.float64
np.ones(5, dtype=np.int64)

array([1, 1, 1, 1, 1])
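A few more creation helpers, sketched for comparison. `np.full` and `np.eye` are standard NumPy functions not used elsewhere in these notes; the example also contrasts `arange` (stop excluded) with `linspace` (stop included by default).

```python
import numpy as np

zeros_2d = np.zeros((2, 3))   # pass a shape tuple for multi-dimensional arrays
sevens = np.full(4, 7)        # every element set to the given fill value
identity = np.eye(3)          # 3x3 identity matrix (float64)
print(zeros_2d.shape, sevens.tolist())      # (2, 3) [7, 7, 7, 7]

# arange excludes `stop`; linspace includes it by default (endpoint=True)
print(np.arange(0, 10, 2.5).tolist())       # [0.0, 2.5, 5.0, 7.5]
print(np.linspace(0, 10, num=5).tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
```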

## Adding, removing, and sorting elements
- [numpy.sort documentation](https://numpy.org/devdocs/reference/generated/numpy.sort.html#numpy.sort)
- `np.sort(a, axis=-1, kind=None, order=None)` - returns a sorted copy of array `a`; the method `a.sort()` sorts `a` in place instead
    - axis: the axis to sort along; the default -1 sorts along the last axis, and `None` flattens the array first
    - kind: one of {'quicksort', 'mergesort', 'heapsort', 'stable'}; the default is 'quicksort'
    - order: for structured arrays, a field name (str) or list of field names that sets the comparison order

In [29]:
a = np.array([3, 1, 2, 4])

In [30]:
a.sort()

In [31]:
a

array([1, 2, 3, 4])
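A sketch contrasting the function `np.sort` (returns a copy) with the method `a.sort()` (in place, returns `None`), and showing what the `axis` parameter does on a small 2-D array.

```python
import numpy as np

a = np.array([3, 1, 2, 4])
sorted_copy = np.sort(a)                  # new sorted array; a is left unchanged
print(a.tolist(), sorted_copy.tolist())   # [3, 1, 2, 4] [1, 2, 3, 4]

result = a.sort()                         # sorts a in place
print(result, a.tolist())                 # None [1, 2, 3, 4]

m = np.array([[3, 1], [2, 4]])
print(np.sort(m).tolist())                # axis=-1 sorts each row: [[1, 3], [2, 4]]
print(np.sort(m, axis=0).tolist())        # sorts each column: [[2, 1], [3, 4]]
print(np.sort(m, axis=None).tolist())     # flattens, then sorts: [1, 2, 3, 4]
```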

In [33]:
b = np.array([5, 6, 7, 8])

In [34]:
np.concatenate((a, b))

array([1, 2, 3, 4, 5, 6, 7, 8])

In [37]:
np.concatenate((a, b), axis=0)

array([1, 2, 3, 4, 5, 6, 7, 8])

In [38]:
c = np.array([7, 8, 9, 10])

In [39]:
np.concatenate((a, b, c))

array([ 1,  2,  3,  4,  5,  6,  7,  8,  7,  8,  9, 10])

In [45]:
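# wrap each 1-d array in a list (shape (1, 4)) so concatenating along axis 0 stacks them as rows; np.vstack((a, b, c)) gives the same result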
matrix = np.concatenate(([a], [b], [c]))

In [46]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 7,  8,  9, 10]])
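The cells above cover adding (concatenating) and sorting; for removing elements, one option is `np.delete`, which returns a new array without the given indices, and another is boolean filtering. This is a sketch with illustrative values; neither approach modifies the original array.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(np.delete(a, 1).tolist())   # drop index 1 -> [1, 3, 4, 5]
print(a[a % 2 == 1].tolist())     # keep only odd values -> [1, 3, 5]
print(a.tolist())                 # a itself is unchanged -> [1, 2, 3, 4, 5]

matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [7, 8, 9, 10]])
print(np.delete(matrix, 0, axis=0).shape)   # remove the first row -> (2, 4)
print(np.delete(matrix, 0, axis=1).shape)   # remove the first column -> (3, 3)
```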

## know the shape and size of an array
- `ndarray.shape`, `ndarray.size`, `ndarray.ndim`

In [47]:
matrix.shape

(3, 4)

In [48]:
matrix.size
# total number of elements: the product of the entries in the array's shape

12

In [49]:
matrix.ndim
# number of axes or dimensions

2
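The relationships between the three attributes can be checked directly: `size` is the product of the `shape` entries and `ndim` is the number of entries in `shape`. A minimal sketch on an arbitrary 3x4 array:

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)
assert matrix.size == np.prod(matrix.shape)    # 3 * 4 == 12
assert matrix.ndim == len(matrix.shape)        # one shape entry per axis
print(matrix.shape, matrix.size, matrix.ndim)  # (3, 4) 12 2
```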

## Indexing and slicing
- NumPy arrays can be sliced the same way as Python lists

In [50]:
data = np.array([1, 2, 3])

In [51]:
data[1]

2

In [53]:
data[1:]

array([2, 3])

In [54]:
data[-1]

3
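One important difference from Python lists: slicing a list makes a copy, but slicing a NumPy array returns a *view* that shares memory with the original. A sketch of that behavior, plus multi-dimensional slicing with one slice per axis:

```python
import numpy as np

data = np.array([1, 2, 3])
part = data[1:]          # a view: shares memory with data
part[0] = 99
print(data.tolist())     # [1, 99, 3]: changing the slice changed the original

safe = data[1:].copy()   # an independent copy
safe[0] = -1
print(data.tolist())     # still [1, 99, 3]

# Multi-dimensional slicing takes one index or slice per axis, separated by commas
grid = np.arange(1, 13).reshape(3, 4)
print(grid[0:2, 1:3].tolist())   # rows 0-1, columns 1-2 -> [[2, 3], [6, 7]]
```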

In [55]:
# a 3x4 array to filter with boolean conditions
a = np.array([[1 , 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

In [66]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [57]:
# print values in the array that are less than 5 as a 1-d array
print(a[a < 5])

[1 2 3 4]


In [64]:
# comparing an array with a scalar gives a boolean mask with the same shape as the array;
# indexing with that mask (a[five_up]) would return only the selected values as a 1-d array
five_up = a >= 5

In [65]:
five_up

array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [67]:
# select elements that satisfy two conditions with & (and) or | (or);
# each condition needs its own parentheses because & and | bind tighter than comparisons
c = a[(a > 2) & (a < 11)]

In [68]:
c

array([ 3,  4,  5,  6,  7,  8,  9, 10])
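A sketch of the `|` (or) case, and of `np.nonzero`, which returns the *positions* that satisfy a condition rather than the values (one index array per axis).

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# | keeps elements matching either condition; parentheses are required
print(a[(a > 10) | (a < 3)].tolist())   # [1, 2, 11, 12]

# np.nonzero returns the row indices and column indices where the condition holds
rows, cols = np.nonzero(a < 3)
print(rows.tolist(), cols.tolist())     # [0, 0] [0, 1]
```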

## basic operations on arrays
- `+` - add the corresponding elements of two arrays
- `-` - subtract the corresponding elements
- `*` - multiply the corresponding elements
- `/` - divide the corresponding elements (the result is floating point, even for integer arrays)

In [69]:
data = np.array([1, 2])
ones = np.ones(2, dtype=int)

In [71]:
data

array([1, 2])

In [72]:
ones

array([1, 1])

In [70]:
data + ones

array([2, 3])

In [73]:
data - ones

array([0, 1])

In [74]:
data / ones

array([1., 2.])

In [75]:
data.sum()

3
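The `*` operator listed above is element-wise, like `+`, `-` and `/`; it is not matrix multiplication. For the matrix product NumPy provides the `@` operator. A short sketch contrasting the two on a 2x2 array:

```python
import numpy as np

data = np.array([1, 2])
ones = np.ones(2, dtype=int)
print((data * ones).tolist())   # element-wise product -> [1, 2]

m = np.array([[1, 2], [3, 4]])
print((m * m).tolist())         # element-wise -> [[1, 4], [9, 16]]
print((m @ m).tolist())         # matrix product -> [[7, 10], [15, 22]]
```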

In [101]:
# on a 2-d array you can specify the axis to aggregate over
b = np.array([[1, 1], [0.5, 0.5]])

In [102]:
# axis=0 collapses the rows: sum down each column
b.sum(axis=0)

array([1.5, 1.5])

In [103]:
# axis=1 collapses the columns: sum across each row
b.sum(axis=1)

array([2., 1.])

In [104]:
b.min()

0.5

In [105]:
b.max()

1.0

In [106]:
b.sum()

3.0

In [107]:
# find min on each column
b.min(axis=0)

array([0.5, 0.5])

In [108]:
# find min on each row
b.min(axis=1)

array([1. , 0.5])
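The same `axis` rule applies to other aggregations: `axis=0` produces one result per column and `axis=1` one result per row. A sketch with `mean` and `argmin` (the latter returns positions, not values), reusing the values of `b`:

```python
import numpy as np

b = np.array([[1, 1], [0.5, 0.5]])

print(b.mean(axis=0).tolist())    # one mean per column: [0.75, 0.75]
print(b.mean(axis=1).tolist())    # one mean per row: [1.0, 0.5]
print(b.argmin(axis=0).tolist())  # row index of each column's minimum: [1, 1]
```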

## Broadcasting
- broadcasting lets NumPy combine arrays of different shapes; in the simplest case, an operation between an array and a scalar is applied to every element of the array

In [85]:
data = np.array([1.0, 2.0, 3.0])

In [86]:
data * 1.6

array([1.6, 3.2, 4.8])

In [87]:
data + 1.1

array([2.1, 3.1, 4.1])

In [88]:
data / 2

array([0.5, 1. , 1.5])

In [89]:
data - 1

array([0., 1., 2.])
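Broadcasting also works between two arrays: shapes are compared from the last dimension backwards, and each pair of sizes must be equal or one of them must be 1. A sketch with a row vector, a column vector, and an incompatible shape:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,): stretched across both rows
print((matrix + row).tolist())              # [[11, 22, 33], [14, 25, 36]]

col = np.array([[100], [200]])              # shape (2, 1): stretched across all 3 columns
print((matrix + col).tolist())              # [[101, 102, 103], [204, 205, 206]]

# (2, 3) and (2,) cannot broadcast: the trailing sizes 3 and 2 differ and neither is 1
incompatible = False
try:
    matrix + np.array([1, 2])
except ValueError as err:
    incompatible = True
    print("cannot broadcast:", err)
```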

## Transposing and reshaping a matrix

In [111]:
data = np.arange(1, 7, 1)

In [112]:
data

array([1, 2, 3, 4, 5, 6])

In [119]:
# 2x3 matrix
X = data.reshape(2, 3)

In [120]:
X

array([[1, 2, 3],
       [4, 5, 6]])

In [117]:
# 3x2 matrix
data.reshape(3, 2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [121]:
X.transpose()

array([[1, 4],
       [2, 5],
       [3, 6]])

In [122]:
# flatten n-d array to 1-d array
X.flatten()

array([1, 2, 3, 4, 5, 6])
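A few related shortcuts, sketched below: `reshape` accepts `-1` for one dimension and infers it, `.T` is shorthand for `transpose()`, and `ravel()` is an alternative to `flatten()` that returns a view when possible (here the array is contiguous, so it does), whereas `flatten()` always copies.

```python
import numpy as np

data = np.arange(1, 7)
X = data.reshape(2, -1)      # -1 lets NumPy infer the length: 6 / 2 = 3
print(X.shape)               # (2, 3)
print(X.T.shape)             # .T is shorthand for transpose() -> (3, 2)

flat_copy = X.flatten()      # always returns a new array
flat_view = X.ravel()        # returns a view when possible
flat_view[0] = 100           # writes through to X (and data)
print(X[0, 0], flat_copy[0]) # 100 1
```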

## mathematical formulas
- mean squared error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$, where $\hat{Y}_i$ is the $i$-th prediction and $Y_i$ the $i$-th label

In [123]:
predictions = np.ones(3)
labels = np.arange(1, 4)

In [128]:
print(predictions, labels)

[1. 1. 1.] [1 2 3]


In [126]:
error = 1/len(predictions)*np.sum(np.square(predictions-labels))

In [127]:
print(f'supervised ML error= {error}')

supervised ML error= 1.6666666666666665
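Since the $\frac{1}{n}\sum$ in the formula is just an average, the same error can be written more compactly with `np.mean`. A sketch using the same `predictions` and `labels`:

```python
import numpy as np

predictions = np.ones(3)
labels = np.arange(1, 4)

# np.mean divides the sum of squared errors by n, matching the 1/n factor in the formula
mse = np.mean((predictions - labels) ** 2)
print(f'supervised ML error= {mse}')
```

The printed value may differ from the cell above in the last digit, because multiplying by `1/len(predictions)` and dividing by `n` round differently in floating point.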
