# Numpy Lesson

## Introduction

Numpy, short for Numerical Python, is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. 

Many computational and data science packages use Numpy as the main building block. It is a fundamental library for scientific computing in Python.

Some features of Numpy:
- `ndarray`, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting Numpy with libraries written in C, C++, or FORTRAN.

The advantages of using Numpy:

- Numpy internally stores data in a contiguous block of memory, independent of other built-in Python objects. Numpy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences (e.g. lists).
- NumPy operations perform complex computations on entire arrays without the need for Python `for` loops, which can be slow for large sequences. This is called _vectorization_.
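
The storage claims above can be checked directly on an array. This is only a minimal sketch: the variable name is illustrative, and `dtype=np.int64` is set explicitly (an assumption, not part of the lesson) so that the numbers are the same on every platform.

```python
import numpy as np

# One thousand 64-bit integers stored in a single buffer.
arr = np.arange(1000, dtype=np.int64)

print(arr.itemsize)               # bytes per element: 8
print(arr.nbytes)                 # bytes of array data: 8000
print(arr.flags["C_CONTIGUOUS"])  # True: the data is one contiguous block
```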

![numpy_vs_list](../assets/numpy_vs_python_list.png)

You can install Numpy by using `conda` or `pip`:

```bash
conda install numpy
```

```bash
pip install numpy
```

Then, you can import Numpy as follows:

In [None]:
import numpy as np

Here, `np` is the standard alias for `numpy`.

To give you an idea of the performance difference, consider a Numpy array of one million integers, and the equivalent Python list:

In [None]:
my_arr = np.arange(1000000)
my_list = list(range(1000000))
my_list[:10]  # preview only; the full list has one million elements

Let's multiply each sequence by 2. You can use the `%timeit` magic command to measure the execution time of each version:

In [None]:
%timeit my_arr2 = my_arr * 2

In [None]:
%timeit my_list2 = [x * 2 for x in my_list]

Numpy operations and algorithms are generally 10 to 100 times faster than their pure Python counterparts, and use significantly less memory.
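
`%timeit` only works inside IPython or Jupyter. Outside a notebook, a similar comparison could be made with the standard library `timeit` module. The sketch below assumes 10 repetitions per measurement (an arbitrary choice), and the timings you see will depend on your machine.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each approach.
numpy_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_time:.4f} s, list comprehension: {list_time:.4f} s")
```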

## Numpy ndarray

Numpy's `ndarray`, or N-dimensional array, is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements. 

In [None]:
data = np.array([1.5, -0.1, 3])

In [None]:
data

In [None]:
#assignment - multiply each element in the array by 2.
arr = np.array([1, 2, 3, 4, 5])
arr1 = arr*2
arr1

Multiply all of the elements by 10.

In [None]:
data * 10

Add the corresponding values in each "cell" in the array.

In [None]:
data + data

In [None]:
data * data

> Practice applying different arithmetic operations to the array above:
>
> `-`, `/`, `**`, `%`, `//`.

### ndarray illustration

An ndarray is a multidimensional or n-dimensional array of fixed size with homogeneous elements (i.e. all elements must be of the same type). Every array has a `shape`, a tuple indicating the size of each dimension, and a `dtype`, an object describing the data type of the array.

![ndarray](../assets/numpy_ndarray.png)

In [None]:
data.shape

In [None]:
data.dtype
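
Because all elements must share one type, NumPy picks a common dtype when the input mixes types. A small sketch of this behaviour, with illustrative variable names:

```python
import numpy as np

mixed = np.array([1, 2.5, 3])   # two ints and a float
print(mixed.dtype)              # float64: the ints were upcast to float
print(mixed.shape)              # (3,)

as_int = np.array([1, 2, 3])
print(as_int.dtype.kind)        # 'i': a signed integer type (width depends on the platform)
```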

The easiest way to create an array is to use the `array` function.

In [None]:
data1 = [6, 7.5, 8, 0, 1]

arr1 = np.array(data1)

arr1

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array.

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

arr2 = np.array(data2)

arr2

In [None]:
#data3 = 
# np.array()

In [None]:
arr2.dtype

In [None]:
arr2.shape

We can also check the number of dimensions.

In [None]:
arr2.ndim

Besides `array`, there are other functions for creating new arrays. We have seen `arange` above, which is similar to the built-in `range` function but returns an array instead of a `range` object.

`ones` and `zeros` create arrays of 1s and 0s, respectively, with a given length or shape. `empty` creates an array without initializing its values, so its contents are arbitrary. To create a higher-dimensional array with these functions, pass a tuple for the shape.

In [None]:
np.zeros(5)

In [None]:
np.zeros((3,6))

In [None]:
np.zeros((3,6,3))
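
A few related creation functions follow the same pattern. `np.full` and `np.zeros_like` are not covered in the text above; they are included here only as a sketch of what else is available.

```python
import numpy as np

# empty allocates memory without initializing it, so only the shape is predictable.
scratch = np.empty((2, 3))
print(scratch.shape)             # (2, 3)

# full fills a new array with a chosen value.
sevens = np.full((2, 3), 7)
print(sevens)

# zeros_like copies the shape and dtype of an existing array.
template = np.zeros_like(sevens)
print(template.shape, template.dtype == sevens.dtype)
```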

> Create a new array with 3 dimensions using `ones`.

You can also explicitly specify the data type of the array.

In [None]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

arr1.dtype

In [None]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

arr2.dtype

Data types provide a mapping directly onto an underlying disk or memory representation. The numerical data types are named the same way: a type name, like `float` or `int`, followed by a number indicating the number of bits per element. A standard double-precision floating point value (what's used under the hood in Python's `float` object) takes up 8 bytes or 64 bits. Thus, this type is known in Numpy as `float64`. See the following table for a list of the numerical data types.

| Data type | Type code | Description |
| --- | --- | --- |
| int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types |
| int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types |
| int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types |
| int64, uint64 | i8, u8 | Signed and unsigned 64-bit integer types |
| float16 | f2 | Half-precision floating point |
| float32 | f4 or f | Standard single-precision floating point. Compatible with C float |
| float64 | f8 or d | Standard double-precision floating point. Compatible with C double and Python float object |
| float128 | f16 or g | Extended-precision floating point |
| complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively |
| bool | ? | Boolean type storing True and False values |
| object | O | Python object type |
| bytes_ (string_ before NumPy 2.0) | S | Fixed-length ASCII string type (1 byte per character). For example, to create a string dtype with length 10, use 'S10' |
| str_ (unicode_ before NumPy 2.0) | U | Fixed-length Unicode type (number of bytes platform specific). Same specification semantics as bytes_ (e.g. 'U10') |
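
The type codes in the table can be used wherever a dtype is accepted. A short sketch, using only codes listed above:

```python
import numpy as np

print(np.dtype("i4") == np.int32)    # True
print(np.dtype("f8") == np.float64)  # True
print(np.dtype("f8").itemsize)       # 8 bytes, i.e. 64 bits

# Pass a type code as the dtype argument.
small = np.array([1, 2, 3], dtype="u1")
print(small.dtype)                   # uint8
```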

You can explicitly convert or cast an array from one dtype to another using the `astype` method, which returns a new array.

In [None]:
arr = np.array([1, 2, 3, 4, 5])

arr.dtype

In [None]:
float_arr = arr.astype(np.float64)

float_arr

In [None]:
float_arr.dtype

If you cast some floating-point numbers to be of integer dtype, the decimal part will be truncated.

In [None]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

arr.astype(np.int32)
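
Truncation is not the same as rounding. The sketch below contrasts `astype` with `np.floor` and `np.round`, which are not part of the lesson text and are shown only for comparison.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5])

truncated = arr.astype(np.int32)   # drops the decimal part: 3, -1, -2, 0
floored = np.floor(arr)            # rounds down: 3, -2, -3, 0
rounded = np.round(arr)            # rounds to nearest, ties to even: 4, -1, -3, 0

print(truncated)
print(floored)
print(rounded)
```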

You can also convert strings representing numbers to numeric form.

In [None]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0

numeric_strings.astype(float)

In [None]:
shape_float = np.ones((3, 4), dtype=np.float64)  # an array of shape (3, 4)
shape_float.astype(np.float32)

If you pass the Python type `float` instead of `np.float64` (as in `astype(float)` above), NumPy maps it to the equivalent dtype, `float64`.

> Create an array with a shape of (3, 4) and a data type of `float64`. Then convert it to `float32`.

## Arithmetic with ndarrays

Arithmetic operations are applied as batch operations on arrays without any `for` loops. This is called _vectorization_. Any arithmetic operation between equal-size arrays is applied element-wise.

![vectorization](../assets/vectorization.png)

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

arr

In [None]:
arr * arr

In [None]:
arr - arr

Broadcasting is another powerful feature of Numpy. It describes how arithmetic works between arrays of different shapes. For example, you can just think of the smaller array (or scalar value) being replicated multiple times to match the shape of the larger array.

In [None]:
arr + np.array([1, 1, 1])

Here, `[1, 1, 1]` is stretched or broadcast across the larger array `arr` so that it matches the shape.

In [None]:
arr1 = np.array([1, 2, 3, 4])
arr1

In [None]:
arr1 + 4

Under the hood, `4` behaves like `[4, 4, 4, 4]`, and the arithmetic then happens element-wise.

In [None]:
arr1 ** 2

In [None]:
1 / arr1

To find out more about broadcasting, check out the [official documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html).
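
Broadcasting also works along the other axis, and it fails when the shapes cannot be aligned. The sketch below uses illustrative arrays `row` and `col` that are not part of the lesson.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)

row = np.array([10., 20., 30.])                # shape (3,)
col = np.array([[100.], [200.]])               # shape (2, 1)

print(arr + row)   # row is added to every row of arr
print(arr + col)   # col is added to every column of arr

# Shapes that cannot be aligned raise an error instead of being broadcast.
try:
    arr + np.array([1., 2.])                   # shape (2,) against (2, 3)
except ValueError as exc:
    print("broadcast failed:", exc)
```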

Comparisons between arrays of the same size yield boolean arrays.

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

arr2

In [None]:
arr2 > arr



## Indexing and slicing

Indexing and slicing allow you to select subsets of array data.

One-dimensional arrays are simple; on the surface they act similarly to Python lists.

In [22]:
arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Indexing to select a single element.

In [24]:
arr[5]


5

Slicing to select a range of elements.

In [25]:
# from index 5 up to, but not including, index 8
arr[5:8]

array([5, 6, 7])

You can also assign a value to a slice, and it will be propagated to the entire selection.

In [26]:
# 5,6,7 replace by 12
arr[5:8] = 12

arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array (in-place).

In [27]:
arr_slice = arr[5:8]

arr_slice

array([12, 12, 12])
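
The difference between a view and a copy can be made explicit with `np.shares_memory`. This is a separate sketch with its own illustrative names (`base`, `view`, `copied`), so it does not affect the arrays used in the lesson.

```python
import numpy as np

base = np.arange(10)

view = base[5:8]           # a view: shares memory with base
copied = base[5:8].copy()  # an independent copy

view[0] = 99               # changes base as well
copied[1] = -1             # leaves base untouched

print(base)                            # index 5 is now 99, index 6 is still 6
print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
```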

If you want a copy of a slice instead of a view, copy it explicitly with `.copy()`.

In [28]:
arr = np.array([1,2,3,4,5])
copy = arr[1:4].copy()
copy


array([2, 3, 4])

The "bare" slice `[:]` will assign to all values in an array.

In [31]:
arr_slice[:] = 64

arr_slice

array([64, 64, 64])

In [32]:
# Assign to a single element of the slice
arr_slice[1] = 10
arr_slice

array([64, 10, 64])

In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays.

In [33]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arr2d[1]

array([4, 5, 6])

You can index it "twice" to get individual elements. These two expressions are equivalent.

In [34]:
# 1 is the row index, 2 is the column index
arr2d[1][2]

6

In [35]:
arr2d[1, 2]

6

For 2D array indexing the syntax is `arr2d[row_index, col_index]` or `arr2d[axis_0_index, axis_1_index]`. Think of axis 0 as the "rows" of the array and axis 1 as the "columns."

![2d_array_indexing](../assets/ndarray_axis_index.png)

To slice out the first two rows of the `arr2d` array, you can pass `[:2]` as the row index.

In [None]:
#First 2 rows
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

You can pass multiple slices just like you can pass multiple indexes:

In [37]:
# first two rows, last two columns
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

You can mix indexing and slicing.

In [38]:
# row at index 1, first two columns
arr2d[1, :2]

array([4, 5])

Passing `:` by itself as a slice selects the entire axis. For example, to select the first column:

In [None]:
# all rows and first column
arr2d[:, :1]  # arr2d[:, 0] selects the same values but returns a 1D array of shape (3,)

array([[1],
       [4],
       [7]])

In [40]:
# check the shape

arr2d[:, :1].shape

(3, 1)

To select the first row:

In [41]:
arr2d[:1, :]  # arr2d[0, :] selects the same values but returns a 1D array of shape (3,)

array([[1, 2, 3]])

In [42]:
# select the first two rows of the array.
arr2drow = arr2d[:2, :]
arr2drow

array([[1, 2, 3],
       [4, 5, 6]])

In [43]:
# check the shape

arr2d[:1, :].shape

(1, 3)
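
The shape checks above show a general rule: an integer index drops an axis, while a slice keeps it. A minimal sketch that puts the four cases side by side:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[:, 0].shape)    # (3,): an integer index drops the axis
print(arr2d[:, :1].shape)   # (3, 1): a slice keeps the axis
print(arr2d[0].shape)       # (3,)
print(arr2d[:1].shape)      # (1, 3)
```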

> Assign values such that the final array looks like the following:
>
> | 1 | 2 | 3 |
> | --- | --- | --- |
> | 4 | -1 | -1 |
> | 7 | 8 | 9 |


In [44]:
arr = np.arange(1,10).reshape(3,3)
arr[1,1:] = -1
arr

array([[ 1,  2,  3],
       [ 4, -1, -1],
       [ 7,  8,  9]])

In [None]:
arr_alt = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # the same 2D array, built explicitly
arr_alt[1, 1:] = -1
arr_alt

In [55]:
#Create a 2D numpy array of shape (5, 5) filled with the number 1.
#import numpy as np

arrS = np.ones((5, 5), dtype=int)
arrS


array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

## Boolean indexing

Let's consider an example where we have an array of names with duplicates, and an array of scores (for 2 subjects) that correspond to each name.

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
scores = np.array([[75, 80], [85, 90], [95, 100], [100, 77], [85, 92], [95, 80], [72, 80]])
names

In [None]:
scores

Suppose we want to select all the rows corresponding to the name 'Bob'. Like arithmetic operations, comparisons (such as `==`) with arrays are also vectorized. Thus, comparing `names` with the string 'Bob' yields a boolean array.

In [None]:
names == "Bob"

This boolean array can be passed when indexing the array.

In [None]:
scores[names == "Bob"]

You can mix boolean indexing with other slicing and indexing methods.

In [None]:
scores[names == "Bob", 1]

To select everything but 'Bob', you can either use `!=` or negate the condition using `~`.

In [None]:
names != "Bob"

In [None]:
~(names == "Bob")

In [None]:
scores[names != "Bob"]

The `~` operator can be useful when you want to invert a boolean array referenced by a variable.

In [None]:
cond = names == "Bob"

cond

In [None]:
scores[~cond]

> Show the scores for `Joe`.

In [None]:
scores[names == "Joe"]

You can select two or more names by combining multiple boolean conditions. Use the boolean operators `&` (and) and `|` (or), and wrap each condition in parentheses.

In [None]:
mask = (names == "Bob") | (names == "Will")

mask

In [None]:
scores[mask]
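
Conditions on different arrays can be combined in the same way. In the sketch below, `first_subject` is an illustrative 1D array holding the first score of each student. It also shows that the Python keyword `and` does not work on boolean arrays.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
first_subject = np.array([75, 85, 95, 100, 85, 95, 72])  # illustrative 1D scores

# Parentheses are required because & binds more tightly than == and >.
joe_above_80 = (names == "Joe") & (first_subject > 80)
print(joe_above_80)
print(first_subject[joe_above_80])   # Joe's scores above 80: 85 and 95

# The keyword `and` cannot combine boolean arrays.
try:
    (names == "Joe") and (first_subject > 80)
except ValueError as exc:
    print("ValueError:", exc)
```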

In [None]:
scores > 80

You can also set the values based on these boolean arrays. For example, to set all scores less than 80 to 70:

In [None]:
# replace every score below 80 with 70 (modifies scores in place)
scores[scores < 80] = 70
scores

To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order. This is known as _fancy indexing_.

In [None]:
arr = np.zeros((8, 4))

for i in range(8):
    arr[i] = i

arr

In [None]:
arr[[4, 3, 0, 6]]

Negative indices select rows from the end.

In [None]:
#start from the bottom
arr[[-3, -5, -7]]
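
Unlike slicing, selecting with a list of integers copies the data into a new array. The sketch below uses an illustrative 8x4 array; passing two index lists (not covered above) selects individual elements rather than whole rows.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[4, 3, 0]]                # rows 4, 3 and 0, in that order
picked[0, 0] = -1                      # modifies the copy only
print(arr[4, 0])                       # still 16
print(np.shares_memory(arr, picked))   # False

# Two index lists select the elements at (1, 0), (5, 3) and (7, 1).
elements = arr[[1, 5, 7], [0, 3, 1]]
print(elements)                        # values 4, 23, 29
```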

## Reshaping and transposing arrays

Arrays have the `reshape` method to change the shape of a given array to a new shape that has the same number of elements. For example, you can reshape a 1D array of 15 elements into a 2D array with 3 rows and 5 columns.

In [None]:
arr = np.arange(15).reshape((3, 5))

arr

Arrays have the `transpose` method for rearranging data. For a 2D array, `transpose` will return a new view on the data with axes swapped.

In [None]:
arr.transpose()

The `T` attribute is a shortcut for `transpose`.

In [None]:
arr.T
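
Two details are worth trying out: `reshape` can infer one dimension when you pass `-1` (not mentioned above), and the transpose is a view, not a copy. A minimal sketch:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# -1 asks NumPy to infer that dimension from the number of elements.
inferred = np.arange(15).reshape((3, -1))
print(inferred.shape)                 # (3, 5)

print(arr.T.shape)                    # (5, 3)
print(np.shares_memory(arr, arr.T))   # True: the transpose is a view

# The element count must match, otherwise reshape raises ValueError.
try:
    arr.reshape((4, 4))
except ValueError as exc:
    print("reshape failed:", exc)
```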

In [None]:

array_3d = np.arange(24).reshape(2,3,4)
array_3d

> Create an array with 3 dimensions using `arange` and `reshape`.

## Universal functions

A universal function, or `ufunc`, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [None]:
arr = np.arange(10)

arr

In [None]:
# calculate the square root of each element in the array

np.sqrt(arr)

In [None]:
# calculate the exponential of each element in the array
np.exp(arr)

These are referred to as unary ufuncs. Others, such as `add` or `maximum`, take 2 arrays (thus, binary ufuncs) and return a single array as the result.

In [None]:
x = np.array([3, 7, 15, 5, 12])
y = np.array([11, 2, 4, 6, 8])

np.maximum(x, y)
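
Some ufuncs return more than one array. As a sketch, `np.modf` (not used elsewhere in this lesson) splits floating-point values into fractional and integral parts, and `np.add` shows that a binary ufunc also accepts a scalar.

```python
import numpy as np

values = np.array([3.7, -1.2, 0.5])

# modf is a unary ufunc that returns two arrays:
# the fractional parts and the integral parts.
fractional, integral = np.modf(values)
print(fractional)
print(integral)              # 3, -1, 0 (as floats)

# Binary ufuncs such as add broadcast like the arithmetic operators.
x = np.array([3, 7, 15, 5, 12])
print(np.add(x, 1))          # same result as x + 1
```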

You can refer to the [Numpy documentation](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs) for a list of all available universal functions.

> Search for a ufunc that returns element-wise quotient and remainder simultaneously.
>
> Run it on x and y. Note that it will return a tuple of two arrays.

## Conditional Logic

If you want to evaluate all elements in an array based on a condition, you can use `np.where`, a vectorized version of the ternary expression `x if condition else y`.

Suppose we had a boolean array and two arrays of values:

In [None]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])

yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])

To take a value from `xarr` whenever the corresponding value in `cond` is `True`, and otherwise take the value from `yarr`:

In [None]:
np.where(cond, xarr, yarr)

The second and third arguments to `numpy.where` don’t need to be arrays; one or both of them can be scalars. A typical use of `where` in data analysis is to produce a new array of values based on another array. 
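
As a sketch of the scalar case, the example below builds new arrays from a small illustrative array `arr`, first with two scalars and then with a scalar and an array.

```python
import numpy as np

arr = np.array([[1.5, -0.3], [-2.0, 0.7]])

# Replace positive values with 2 and everything else with -2.
signs = np.where(arr > 0, 2, -2)
print(signs)

# Mix a scalar with an array: replace only the positive values.
capped = np.where(arr > 0, 2, arr)
print(capped)
```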

## Array methods

You can generate a random array using the `np.random` module. The `randn` function returns a sample (or samples) from the "standard normal" distribution. A standard normal distribution is a normal distribution with a mean of 0 and standard deviation of 1.

Here we generate a random 3x4 array of samples from the standard normal distribution.

In [58]:
arr = np.random.randn(3, 4)

arr

array([[-0.35839996,  0.18521252,  1.97710322,  1.77654441],
       [-0.85601591, -0.42577347, -1.93836402, -1.73262939],
       [ 0.89759041, -0.24191141, -1.01712853, -1.5806105 ]])

In [None]:
# average
arr.mean()

In [None]:
# you can also use the equivalent top-level function
np.mean(arr)

In [59]:
# sum

arr.sum()

-3.314382637034126

In [61]:
#calculate the sum of all the elements in each row.
arrA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arrSum = np.sum(arrA,axis=1)
arrSum


array([ 6, 15, 24])

In [62]:
# calculate the average of the elements in each row.
arrA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arrAvg = np.average(arrA,axis=1)
arrAvg


array([2., 5., 8.])

You can also provide an optional argument `axis` that specifies the axis along which the statistic is computed, resulting in an array with one fewer dimension.

In [None]:
arr.mean(axis=1)

In [None]:
arr.mean(axis=0)

`axis=1` means "compute across the columns," whereas `axis=0` means "compute down the rows."

Refer to the diagram again for the illustration on axes.

![ndarray](../assets/numpy_ndarray.png)

> Compute the sum across the columns of `arr`.

For boolean arrays, `any` tests whether one or more values in an array are `True`, while `all` checks if every value is `True`.

In [None]:
bools = np.array([False, False, True, False])

In [None]:
bools.any()

In [None]:
bools.all()

Like Python’s built-in list type, NumPy arrays can be sorted with the `sort` method. Note that this method sorts a data array _in-place_, meaning that the array contents are rearranged rather than a new array being created.

In [None]:
arr = np.random.randn(8)

arr

In [None]:
arr.sort()

In [None]:
arr
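
If you need to keep the original order, the top-level `np.sort` function returns a sorted copy instead. A minimal sketch with a small illustrative array:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)   # returns a sorted copy
print(arr)                   # unchanged: 3, 1, 2

result = arr.sort()          # sorts in place and returns None
print(arr)                   # now 1, 2, 3
print(result)                # None
```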

## Unique and Other Set Logic

You can use `unique` to return the sorted unique values of an array.

In [None]:
names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Joe'])

np.unique(names)

In [None]:
np.unique(np.array([3, 3, 3, 2, 2, 1, 1, 4, 4]))

`isin` tests membership of the values in one array in another, returning a boolean array. Older code often uses `in1d` for the same purpose, but it is deprecated in recent NumPy releases.

In [None]:
np.isin([2, 3, 6], [1, 2, 3, 4, 5])

Refer to the [official documentation](https://numpy.org/doc/stable/reference/routines.set.html) for more set operations.

> Search for a set function that finds the common values between two arrays.
>
> Run it on x and y arrays below.

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 5, 6, 7])
np.intersect1d(x, y)  # sorted common values; x[np.isin(x, y)] gives the same result here

## Linear Algebra

Linear algebra operations, like matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of many array libraries. 

Multiplying two two-dimensional arrays with `*` is an element-wise product, while matrix multiplications require either using the `dot` function or the `@` infix operator.

![matrix_multiplication](../assets/matrix_multiplication.png)

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])

x.dot(y)

In [None]:
# you can also use the @ operator

x @ y

In [None]:
# or the dot function

np.dot(x, y)
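
To see the difference between `*` and `@`, the sketch below applies both to two small illustrative square matrices, then checks the result shape for the `x` and `y` arrays used above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a * b)   # element-wise: [[5, 12], [21, 32]]
print(a @ b)   # matrix product: [[19, 22], [43, 50]]

# For @, the inner dimensions must match: (2, 3) @ (3, 2) gives (2, 2).
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
print((x @ y).shape)
```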

You can refer to the [official documentation](https://numpy.org/doc/stable/reference/routines.linalg.html) for more linear algebra operations.

> Search for a linalg function that computes the determinant of a matrix.
>
> Run it on the array below.

In [None]:
a = np.array([[1, 2], [3, 4]])
a