# NumPy

NumPy is the fundamental package for scientific computing in Python, and most other scientific libraries are built on top of it. At the heart of the library is the `ndarray` (N-dimensional array). This notebook focuses mostly on that data structure.

In [None]:
# It is a convention to import numpy as 'np'
import numpy as np

# ndarray
There are several differences between a Python `list` and a NumPy `ndarray`.

- `ndarray` is homogeneous: a single array can contain elements of only one data type.
- NumPy stores data in a contiguous block of memory, independent of any other Python objects.
- NumPy is implemented in C, so it has less overhead and uses less memory.
- Vectorized operations are fast, often 10 to 100 times faster than equivalent pure-Python loops.
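
The first two points can be checked directly. The sketch below is only an illustration; the variable names `mixed` and `py_list` are made up for this example.

```python
import numpy as np

# Mixing ints and a float: NumPy picks one dtype for the whole array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                    # float64

# Every element occupies the same number of bytes in a single buffer.
print(mixed.itemsize)                 # 8
print(mixed.nbytes)                   # 24
print(mixed.flags["C_CONTIGUOUS"])    # True

# A Python list keeps each element as a separate object of its own type.
py_list = [1, 2.5, 3]
print([type(v).__name__ for v in py_list])   # ['int', 'float', 'int']
```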

## Initializing ndarray
- The `array` function takes a Python `list`, an `ndarray`, or any sequence-like object and returns a new `ndarray`. By default it **copies** the elements of the given `list` or `ndarray`.
- `asarray` works like `array` but does not copy the input if it is already an `ndarray` of the requested type. Instead, it returns the input array itself.
- `arange` is like Python's `range` function. It generates evenly spaced numbers in the half-open interval `[start, stop)`.
- NumPy will try to infer a suitable data type for the array from the given data. However, you can change that using the `dtype` parameter.

In [None]:
array1d = np.array([1, 2, 3.5, 4, 5])
array2d = np.asarray([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
], dtype=np.float32)
array = np.arange(10, dtype=np.int16)

print('array1d (vector)\n', array1d)
print('array2d (matrix)\n', array2d)
print('array\n', array)
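
As a small additional sketch of the `arange` interval and of `dtype` inference (the names `evens`, `inferred`, and `forced` are illustrative only):

```python
import numpy as np

# arange uses a half-open interval [start, stop), like range().
evens = np.arange(2, 10, 2)
print(evens)                 # [2 4 6 8]

# Without dtype, the type is inferred from the data.
inferred = np.array([1, 2, 3.5])
print(inferred.dtype)        # float64

# With dtype, the requested type is used instead.
forced = np.array([1, 2, 3], dtype=np.float32)
print(forced.dtype)          # float32
```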

The difference between `array` and `asarray` shows up when the input is already an `ndarray`: `array` copies it by default, while `asarray` returns the input array itself without copying.

In [None]:
array_2 = np.array(array)
array_3 = np.asarray(array)

# array_3 was created with np.asarray. Because the input was already an
# ndarray, np.asarray returned the input object itself, so array and
# array_3 have the same id. array_2 has a different id because np.array
# copies its input by default and returns a fresh array.
# Note that id() reports object identity, not the address of the data.
print(id(array))
print(id(array_2))
print(id(array_3))

In [None]:
# If we change anything through array_3, the change is also visible in
# array, because both names refer to the same array. Prefer np.array
# over np.asarray when you need an independent copy.
array_3[0] = 1000
print('array3\n', array_3)
print('array\n', array)
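
Comparing `id` values works, but `is` and `np.shares_memory` state the same thing more directly. The following is a minimal sketch; the names `source`, `copied`, `same`, and `converted` are chosen for this example.

```python
import numpy as np

source = np.arange(5)

copied = np.array(source)        # a new array by default
same = np.asarray(source)        # already an ndarray: returned as-is
converted = np.asarray(source, dtype=np.float64)  # dtype differs: must copy

print(same is source)                        # True
print(np.shares_memory(copied, source))      # False
print(np.shares_memory(converted, source))   # False
```

Note that `asarray` still copies when it has to convert the data type.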

We can compare how long an operation takes on a Python `list` and on a NumPy `ndarray`. The `%timeit` magic command measures the execution time of an expression. Here, we square every element of a $1000$-element Python `list` and of a NumPy `ndarray`, and compare the timings.

In [None]:
python_list = [5] * 1000
np_array = np.arange(1000)

%timeit np_array_squared = np_array ** 2
%timeit python_list_squared = [x ** 2 for x in python_list]
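
`%timeit` is only available inside IPython and Jupyter. Outside a notebook, a rough equivalent can be sketched with the standard-library `timeit` module. The repeat count of 1000 is an arbitrary choice for this example, and the measured times depend on the machine.

```python
import timeit

import numpy as np

values = list(range(1000))
np_values = np.arange(1000)

# Both approaches must produce the same numbers before timing means anything.
assert [x ** 2 for x in values] == (np_values ** 2).tolist()

# Total time in seconds for 1000 repetitions of each expression.
loop_time = timeit.timeit(lambda: [x ** 2 for x in values], number=1000)
vec_time = timeit.timeit(lambda: np_values ** 2, number=1000)

print(f"list comprehension: {loop_time:.4f} s")
print(f"ndarray:            {vec_time:.4f} s")
```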

There are some other convenient functions to create `ndarray` of different shapes and sizes.

In [None]:
# creates ndarray with 5 rows and 5 columns with all elements
# initialized to zero.
np.zeros((5, 5))

In [None]:
# Similar to np.zeros, but creates an ndarray of the given shape
# with all elements initialized to 1.
np.ones((5, 5))

Each `ndarray` has a `shape` attribute, a tuple containing the size of the array along each dimension. It also has a `dtype` attribute that gives the type of the data the `ndarray` contains. Finally, the `ndim` attribute gives the number of dimensions of the `ndarray`.

In [None]:
print(array1d.shape)
print(array2d.shape)

In [None]:
print(array1d.dtype)
print(array2d.dtype)

In [None]:
print(array1d.ndim)
print(array2d.ndim)

Other convenient functions for creating an `ndarray` include:
- `ones_like`, `zeros_like`
- `empty`, `empty_like`
- `full`, `full_like`
- `eye`
- `identity`

It’s not safe to assume that `numpy.empty` will return an array of all zeros. This function returns uninitialized memory and thus may contain nonzero garbage values. You should use this function only if you intend to populate the new array with data.

In [None]:
# Creates a new array of the given shape without initializing its
# elements, so the contents are arbitrary.
empty_array = np.empty((5, 5))
empty_array

In [None]:
# Creates an ndarray of the given shape and fills it with the
# given value.
filled_array = np.full((5, 5), 1)
filled_array

In [None]:
# Returns an NxN identity matrix: the main diagonal elements
# are 1 and all other elements are 0.
np.identity(5)

In [None]:
# Creates a ndarray that has the shape of the empty_array created earlier
# and fills all the elements with 1.
np.ones_like(empty_array)

In [None]:
# Creates an ndarray with the same shape and dtype as array1d and
# fills each element with 5 (stored as 5.0, since array1d is float64).
np.full_like(array1d, 5)
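
`eye` and `zeros_like` from the list above are not demonstrated in the cells so far. The sketch below shows how `eye` differs from `identity`; the names `shifted`, `template`, and `zeros` are illustrative.

```python
import numpy as np

# eye accepts separate row and column counts and a diagonal offset k.
shifted = np.eye(3, 4, k=1)
print(shifted)
# [[0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

# identity always returns a square matrix with ones on the main diagonal.
print(np.array_equal(np.identity(3), np.eye(3)))   # True

# zeros_like copies shape and dtype from its argument.
template = np.arange(6, dtype=np.int16).reshape(2, 3)
zeros = np.zeros_like(template)
print(zeros.shape, zeros.dtype)   # (2, 3) int16
```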

## Changing Data Types

- We can use the `astype` method to convert an array from one data type to another.
- By default, calling `astype` creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.
- If casting fails for some reason (such as a string that cannot be converted to `float64`), a `ValueError` is raised.

In [None]:
# Different data types available in NumPy, grouped by kind.
# Note: np.sctypes was removed in NumPy 2.0, so this cell needs NumPy 1.x.
np.sctypes

In [None]:
# Checks the inheritance hierarchy of int64
np.int64.mro()

We can use the `astype` function to cast any `ndarray` into another data type.

In [None]:
filled_array.astype(np.float64)

In [None]:
filled_array.astype(np.object_)

In [None]:
filled_array.astype(np.bool_)

If you cast some floating-point numbers to be of integer data type, the decimal part will be truncated.

In [None]:
arr = np.array([3.99, -1.2, -2.6, 0.5, 12.9, 10.1])
arr.astype(np.int32)

If you have an array of strings representing numbers, you can use `astype` to convert them to numeric form. Be cautious when using the fixed-width byte-string type (`numpy.bytes_`, which was also available as `numpy.string_` before NumPy 2.0), as string data in NumPy is fixed size and may truncate input without warning.

In [None]:
# np.bytes_ is the same type as np.string_, which was removed in NumPy 2.0.
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
numeric_strings.astype(np.float32)
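
The copying behavior, the `ValueError` on a failed cast, and the silent truncation of fixed-width strings can each be observed in a few lines. This is a minimal sketch; the names `ints`, `same_type`, `cast_failed`, and `fixed` are made up for this example.

```python
import numpy as np

ints = np.arange(3)

# astype returns a new array by default, even for an unchanged dtype.
same_type = ints.astype(ints.dtype)
print(same_type is ints)                    # False
print(np.shares_memory(same_type, ints))    # False

# A string that is not a number cannot be cast to float.
cast_failed = False
try:
    np.array(["1.5", "abc"]).astype(np.float64)
except ValueError as err:
    cast_failed = True
    print("cast failed:", err)

# Fixed-width strings silently drop characters that do not fit.
fixed = np.array([b"ab"], dtype="S2")
fixed[0] = b"abcdef"
print(fixed)                                # [b'ab']
```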

## Arithmetic With `ndarray`

Arrays are important because they let you express batch operations on data without writing any `for` loops. NumPy users call this vectorization. Arithmetic operations in NumPy are vectorized: any arithmetic operation between two `ndarray`s of the same shape is applied elementwise.

In [None]:
array

In [None]:
array2d

In [None]:
array2 = np.arange(15, dtype=np.float64)
array2

In [None]:
array + 5

In [None]:
array2d - 1

In [None]:
array * 2

In [None]:
array ** 2

In [None]:
1 / array2d

In [None]:
array4 = np.arange(10)
print(array4)
print(array)

In [None]:
array + array4

In [None]:
array - array4

In [None]:
array * array4

Comparisons between arrays of the same shape yield boolean arrays.

In [None]:
array > array4

In [None]:
array < array4

In [None]:
array == array4

## Broadcasting

Broadcasting defines how arithmetic operations work between two arrays of different shapes. The simplest case of broadcasting occurs when we add a scalar value to an `ndarray`.

In [None]:
array = np.arange(10)
array

Below, the scalar value `10` is broadcast to all the elements of `array`.

In [None]:
array + 10

Two arrays are compatible for broadcasting if for each trailing dimension (i.e., starting from the end) the axis lengths match or if either of the lengths is 1. Broadcasting is then performed over the missing or length 1 dimensions.

In [None]:
array1 = np.random.randn(4, 4)
print(array1)
print(array1.shape)

In [None]:
array2 = np.random.randn(4, 1)
print(array2)
print(array2.shape)

In [None]:
array3 = np.random.randn(1, 4)
print(array3)
print(array3.shape)

In [None]:
array1 + array2

In [None]:
array1 - array3

In [None]:
array4 = np.random.randn(3, 5, 5)
array5 = np.random.randn(3, 1, 1)

array6 = array4 + array5
print(array6.shape)
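
The compatibility rule can also be seen from the failing side. The sketch below uses small arrays of zeros and ones (the names `a`, `col`, and `row` are illustrative) and shows one pair of shapes that cannot be broadcast.

```python
import numpy as np

a = np.zeros((4, 4))
col = np.ones((4, 1))     # stretched across columns
row = np.ones((1, 4))     # stretched across rows

print((a + col).shape)    # (4, 4)
print((a + row).shape)    # (4, 4)
print((col + row).shape)  # (4, 4): both length-1 axes are stretched

# Trailing axes of length 3 and 4 neither match nor equal 1.
broadcast_failed = False
try:
    np.zeros((4, 3)) + np.zeros(4)
except ValueError as err:
    broadcast_failed = True
    print("not broadcastable:", err)
```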

## Indexing & Slicing

NumPy array indexing is a deep topic, as there are many ways you may want to select a subset of your data or individual elements.

In [None]:
array2d

In [None]:
array2d[2, 1]

In [None]:
array2d[0]

In [None]:
array2d[0, 0:2]

In [None]:
array2d[:, 2]

In [None]:
array2d[1:, 1:]

Slicing an array returns a view of the same memory, not a copy. Be extra careful when modifying a sliced array, because the change also appears in the original.

In [None]:
sliced = array2d[0, :]
sliced

In [None]:
sliced[0] = 100
sliced

In [None]:
array2d

If you want a copy of a slice of an `ndarray` instead of a view, you will need to explicitly copy the array—for example, `array2d[0, :].copy()`.

In [None]:
sliced = array2d[0, :].copy()
sliced

In [None]:
sliced[0] = 500

In [None]:
print(array2d)
print(sliced)
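
Whether two arrays share memory can be checked with `np.shares_memory`. This is a small sketch on a fresh array; `grid`, `view`, and `copy` are names chosen for this example.

```python
import numpy as np

grid = np.zeros((3, 3))

view = grid[0, :]            # basic slicing: a view onto grid's memory
copy = grid[0, :].copy()     # explicit copy: independent memory

print(np.shares_memory(view, grid))   # True
print(np.shares_memory(copy, grid))   # False

view[0] = 100     # visible through grid
copy[1] = 500     # not visible through grid
print(grid[0, 0], grid[0, 1])         # 100.0 0.0
```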

### Assignments

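If you assign a scalar value to a slice, as in `array2d[0:1, 0:3] = 12`, the value is propagated (or broadcast) to the entire selection. You can also assign a sequence whose shape is compatible with the selection, in which case the values are assigned element by element.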

In [None]:
array2d[0:1, 0:2] = [5, 5]
array2d
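
Both forms of assignment, a scalar and a sequence, can be sketched side by side on a small integer array (`board` is an illustrative name):

```python
import numpy as np

board = np.zeros((3, 3), dtype=np.int64)

# Scalar on the right-hand side: broadcast to every selected element.
board[0:1, 0:3] = 12

# Sequence on the right-hand side: its shape must be broadcastable
# to the selection, here (2,) -> (1, 2).
board[2:3, 0:2] = [5, 7]

print(board)
# [[12 12 12]
#  [ 0  0  0]
#  [ 5  7  0]]
```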

#### Boolean Indexing

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

In [None]:
data = np.random.randn(7, 4)
data

In [None]:
idx = (names == 'Bob')
idx

In [None]:
names[idx]

In [None]:
data[idx]

In [None]:
idx = (names != 'Bob')
idx

In [None]:
names[idx]

In [None]:
mask = (names == 'Bob') | (names == 'Will')
mask

In [None]:
names[mask]

In [None]:
a = np.arange(10)
a

In [None]:
a > 5

In [None]:
a[a > 5]

#### Fancy Indexing

In [None]:
data

In [None]:
data[0, 0]

In [None]:
data[0][0]

In [None]:
data[[0, 0, 1, 2, 3], [0, 1, 1, 2, 3]]
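
When two index lists are passed, as in the last cell, they are paired element by element rather than combined into a grid. A minimal sketch on a small array with known values (`table`, `picked`, and `rows` are illustrative names):

```python
import numpy as np

table = np.arange(12).reshape(3, 4)

# Two index lists are paired element by element:
# (0, 1), (1, 2), (2, 3) -> one value per pair.
picked = table[[0, 1, 2], [1, 2, 3]]
print(picked)                 # [ 1  6 11]

# Selecting whole rows with an index list returns a copy, not a view.
rows = table[[0, 2]]
rows[0, 0] = -1
print(table[0, 0])            # 0
```

Unlike basic slicing, fancy indexing copies the selected data, so modifying the result leaves the original array unchanged.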

### Reshape, Transpose, & Swap Axis

In [None]:
array = np.arange(10)
array

In [None]:
array.shape

In [None]:
array.ndim

In [None]:
array.reshape(2, 5)

In [None]:
array.reshape(5, 2)

In [None]:
array.reshape(2, -1)

In [None]:
array.reshape(-1, 10)
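
A `-1` in `reshape` asks NumPy to infer that dimension from the total number of elements. The sketch below also shows that the reshaped array here is a view, and what happens when the size cannot be divided evenly; the names `flat` and `two_rows` are made up for this example.

```python
import numpy as np

flat = np.arange(10)

# -1 asks NumPy to infer that axis from the total size: 10 / 2 = 5.
two_rows = flat.reshape(2, -1)
print(two_rows.shape)         # (2, 5)

# reshape returns a view here, so writes show up in the original.
two_rows[0, 0] = 99
print(flat[0])                # 99

# 10 elements cannot be split evenly into 3 rows.
reshape_failed = False
try:
    flat.reshape(3, -1)
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```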

In [None]:
array2d

In [None]:
array2d.T

In [None]:
array2d.swapaxes(1, 0)

### Functions

In [None]:
array1d = np.arange(10)
print(array1d)

In [None]:
array2d = np.random.randn(5, 6)
print(array2d)

In [None]:
np.sqrt(array1d)

In [None]:
np.sum(array1d)

In [None]:
np.sum(array2d, axis=0)

In [None]:
np.mean(array2d, axis=1)

In [None]:
np.var(array2d, axis=1)

In [None]:
np.std(array2d, axis=1)

In [None]:
np.quantile(array2d, .50, axis=0)

In [None]:
np.min(array2d, axis=0)

In [None]:
np.max(array2d, axis=1)

In [None]:
array1d.argmax()

In [None]:
array1d.max()

In [None]:
array2d.argmax(axis=0)

In [None]:
array1d.argmin()

In [None]:
np.cumsum(array1d)

In [None]:
x = np.array([2, 3, 4, 5])
np.cumprod(x)

In [None]:
array1d

In [None]:
array1d > 5

In [None]:
(array1d > 5).sum()
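
The `axis` argument used in several cells above names the dimension that is collapsed by the reduction. A small sketch with values that are easy to verify by hand (`m` is an illustrative name):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
# m is [[0 1 2]
#       [3 4 5]]

print(m.sum(axis=0))      # [3 5 7]   collapses rows: one sum per column
print(m.sum(axis=1))      # [ 3 12]   collapses columns: one sum per row
print(m.sum())            # 15        no axis: sum over all elements

# Summing a boolean array counts the True values.
print((m > 2).sum())      # 3
```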

### Linear Algebra

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
x

In [None]:
y = np.array([[6., 23.], [-1, 7], [8, 9]])
y

In [None]:
# Matrix Multiplication
x.dot(y)

In [None]:
# The @ operator is a shorthand for matrix multiplication
x @ y

In [None]:
# The multiplication operator * yields elementwise multiplication
x * np.full_like(x, 2)

The `numpy.linalg` module contains standard linear algebra functions such as matrix inversion, determinants, and decompositions. You can explore the available functions [here](https://numpy.org/doc/stable/reference/routines.linalg.html).

In [None]:
from numpy import linalg

x = np.random.randn(5, 5)
x_inv = linalg.inv(x)
x

In [None]:
# Round before casting: astype truncates, so 0.9999999 would become 0.
np.round(x @ x_inv).astype(np.int16)

In [None]:
linalg.det(x)
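
Because of floating-point round-off, the product of a matrix and its inverse is only approximately the identity, so `np.allclose` is a more robust check than casting to integers. The sketch below uses a small fixed matrix so the results can be verified by hand; `mat`, `rhs`, and `sol` are illustrative names.

```python
import numpy as np
from numpy import linalg

mat = np.array([[4.0, 3.0], [6.0, 3.0]])
mat_inv = linalg.inv(mat)

# Compare with a tolerance instead of exact equality.
print(np.allclose(mat @ mat_inv, np.eye(2)))    # True

# det = 4*3 - 3*6 = -6
print(np.isclose(linalg.det(mat), -6.0))        # True

# solve(mat, rhs) finds sol with mat @ sol == rhs without forming the inverse.
rhs = np.array([10.0, 12.0])
sol = linalg.solve(mat, rhs)
print(np.allclose(sol, [1.0, 2.0]))             # True
```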

### File I/O

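`np.save` and `np.load` are the two workhorse functions for efficiently saving array data to disk and loading it back. By default, arrays are saved in an uncompressed raw binary format with the file extension `.npy`. `np.savez` stores several arrays in a single uncompressed `.npz` archive, keyed by the keyword names you pass.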

In [None]:
x

In [None]:
y

In [None]:
# Assumes the ./Data directory already exists; np.save does not create it.
np.save('./Data/matrix.npy', x)

In [None]:
loaded_data = np.load('./Data/matrix.npy')
loaded_data

In [None]:
np.savez('./Data/matrices.npz', matrix1=x, matrix2=y)

In [None]:
loaded_data = np.load('./Data/matrices.npz')

In [None]:
loaded_data['matrix1']

In [None]:
loaded_data['matrix2']
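
The same round trip can be sketched without depending on a `./Data` directory by writing into a temporary directory. The file names and the arrays `first` and `second` are made up for this example.

```python
import os
import tempfile

import numpy as np

first = np.arange(6.0).reshape(2, 3)
second = np.eye(2)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array: .npy file.
    npy_path = os.path.join(tmp_dir, "matrix.npy")
    np.save(npy_path, first)
    restored = np.load(npy_path)

    # Several arrays: .npz archive keyed by the keyword names.
    npz_path = os.path.join(tmp_dir, "matrices.npz")
    np.savez(npz_path, matrix1=first, matrix2=second)
    # An .npz archive loads lazily; the with block closes the file handle.
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        restored_second = archive["matrix2"]

print(np.array_equal(restored, first))   # True
print(names)                             # ['matrix1', 'matrix2']
```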

# Reference

> McKinney, Wes. *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython*. O'Reilly Media, Inc., 2012. Chapter 4.