# Introduction to NumPy

## What is NumPy?

`NumPy`, short for `Numerical Python`, is one of the most important packages for numerical computing in `Python`. `NumPy` is designed for efficient operations on large arrays of data.

## NumPy `ndarray` vs Python `list`: What Is the Difference?

The core component of `NumPy` is the `ndarray`, an `N-dimensional array` object. It is a high-performance container designed specifically for math operations, linear algebra, and probability calculations.

NumPy `ndarray` data structures perform better in:

- Memory - a NumPy `ndarray` stores its values in one contiguous block with a single `dtype`, so it typically uses less memory than a Python `list` holding the same numbers. It also offers more numeric `data types` (for example fixed-width `int8` or `float32`) than built-in Python.
- Performance - NumPy performs computations on an entire `ndarray` without the need for Python `for` loops.
- Functionality - `NumPy` (and `SciPy`, which builds on it) has optimized functions, such as linear algebra operations, built in.

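Python `list` data structures perform better in:

- Flexibility - a `list` may contain items of different types, such as `int`, `float`, `str`, or any other `object`.
- Resizing - the `list` methods `append` and `pop` (from the end) change the list size in place and run in amortized `O(1)` time. A NumPy `ndarray` has a fixed size, so adding an element creates a new array.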
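
As a rough illustration of this trade-off, the sketch below shows a `list` keeping each item's own type while `np.array` coerces its input to a single `dtype`. The variable names are illustrative and do not come from the original notebook.

```python
import numpy as np

# A Python list can hold items of different types side by side.
mixed_list = [1, 2.5, "three"]
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)        # ['int', 'float', 'str']

# np.array picks one dtype that can represent every item.
# Mixing int and float promotes all values to float64.
promoted = np.array([1, 2.5, 3])
print(promoted.dtype)    # float64

# append grows the list in place.
mixed_list.append(None)
print(len(mixed_list))   # 4
```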

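`NumPy` implements its core algorithms in compiled `C`, so an operation on a whole array avoids the per-element overhead of the Python interpreter and is often orders of magnitude faster than an equivalent Python loop.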

Where the hardware supports it, the `NumPy` library also takes advantage of a processor feature called `Single Instruction Multiple Data` (SIMD) to process several values per instruction.

## The `ndarray`

The fundamental object of `NumPy` is the `ndarray`, which stands for `N-dimensional array` and provides `vectorized` arithmetic operations. The term `N-dimensional` refers to the fact that an `ndarray` can have any number of dimensions: one for a vector, two for a matrix, and so on.

- The type of the `values` in the `ndarray` is specified by a separate `data-type` object, exposed as the `dtype` attribute.

- The number of dimensions of an `ndarray` is the length of its `shape`, a `tuple` of non-negative integers that specify the size of each dimension.

<img src="images/ndarray_shape.png" />
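
A minimal sketch of how `shape`, `ndim`, and `dtype` relate to one another. The array values and the names `grid` and `scalar_like` are chosen only for illustration.

```python
import numpy as np

# A 2 x 3 array of 64-bit floats.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]], dtype=np.float64)

print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid.ndim)    # 2, the same as len(grid.shape)
print(grid.dtype)   # float64
print(grid.size)    # 6, the product of the shape entries

# A zero-dimensional array has an empty shape tuple.
scalar_like = np.array(7)
print(scalar_like.shape, scalar_like.ndim)   # () 0
```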

## Import `NumPy`

In [3]:
import numpy as np

In [4]:
# Check version
np.__version__

'1.20.3'

## NumPy `ndarray` Object

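New NumPy `ndarray` objects are usually created with factory functions such as `np.array`, `np.arange`, `np.zeros`, `np.ones`, or the `np.random` routines shown below, rather than by calling the low-level `np.ndarray()` constructor directly.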

### Creating a NumPy `ndarray` with `np.arange`

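Create an `ndarray` using the `np.arange` function, which is similar to Python's built-in `range` function but returns an `ndarray` and also accepts floating-point arguments. The default integer `dtype` depends on the platform and NumPy version: the outputs below show `int32`, while many systems report `int64`.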

In [22]:
# int range
a = np.arange(1, 15)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14]
<class 'numpy.ndarray'>
1
(14,)
int32
4


In [23]:
# float range
a = np.arange(1.0, 15.0)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
<class 'numpy.ndarray'>
1
(14,)
float64
8


In [24]:
# step range parameter
a = np.arange(1, 15, 2)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1  3  5  7  9 11 13]
<class 'numpy.ndarray'>
1
(7,)
int32
4


### Creating a NumPy `ndarray` from a Python `list` with `np.array`

Create an `ndarray` from a Python `list`.

In [8]:
b = [i for i in range(15)]
print(b)
print(type(b))

c = np.array(b)
print(c)
print(type(c))
print(c.ndim)
print(c.shape)
print(c.dtype)
print(c.itemsize)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
<class 'list'>
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
<class 'numpy.ndarray'>
1
(15,)
int32
4


### Creating a NumPy `ndarray` with `np.zeros`

The `np.zeros` function creates an `ndarray` of the given shape filled with `0`. The default `dtype` is `float64`.

In [15]:
# 1 Dimension np.arrays of zeros
One_dim_array_of_zeros = np.zeros(10)
print(One_dim_array_of_zeros)
print(type(One_dim_array_of_zeros))
print(One_dim_array_of_zeros.ndim)
print(One_dim_array_of_zeros.shape)
print(One_dim_array_of_zeros.dtype)
print(One_dim_array_of_zeros.itemsize)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
<class 'numpy.ndarray'>
1
(10,)
float64
8


In [16]:
# 2 Dimensions np.arrays of zeros
Two_dim_array_of_zeros = np.zeros((3, 5))
print(Two_dim_array_of_zeros)
print(type(Two_dim_array_of_zeros))
print(Two_dim_array_of_zeros.ndim)
print(Two_dim_array_of_zeros.shape)
print(Two_dim_array_of_zeros.dtype)
print(Two_dim_array_of_zeros.itemsize)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
<class 'numpy.ndarray'>
2
(3, 5)
float64
8


### Creating a NumPy `ndarray` with `np.ones`

The `np.ones` function creates an `ndarray` of the given shape filled with `1`. The default `dtype` is `float64`.

In [18]:
# 1 Dimension np.arrays of ones
One_dim_array_of_ones = np.ones(10)
print(One_dim_array_of_ones)
print(type(One_dim_array_of_ones))
print(One_dim_array_of_ones.ndim)
print(One_dim_array_of_ones.shape)
print(One_dim_array_of_ones.dtype)
print(One_dim_array_of_ones.itemsize)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>
1
(10,)
float64
8


In [19]:
# 2 Dimensions np.arrays of ones
Two_dim_array_of_ones = np.ones((3, 5))
print(Two_dim_array_of_ones)
print(type(Two_dim_array_of_ones))
print(Two_dim_array_of_ones.ndim)
print(Two_dim_array_of_ones.shape)
print(Two_dim_array_of_ones.dtype)
print(Two_dim_array_of_ones.itemsize)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
<class 'numpy.ndarray'>
2
(3, 5)
float64
8
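
Both `np.zeros` and `np.ones` accept a `dtype` argument when `float64` is not wanted. The sketch below is illustrative only: the variable names are assumptions, and `np.ones_like` is a related NumPy function that the notebook itself does not use.

```python
import numpy as np

# Request integers instead of the default float64.
int_zeros = np.zeros((2, 3), dtype=np.int32)
print(int_zeros.dtype)       # int32

# ones_like copies the shape and dtype of an existing array.
matching_ones = np.ones_like(int_zeros)
print(matching_ones.shape)   # (2, 3)
print(matching_ones.dtype)   # int32
```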


### Creating a NumPy `ndarray` with `np.random.randint`

The `np.random.randint` function creates an `ndarray` filled with random integers drawn from `low` (inclusive) to `high` (exclusive).

In [20]:
# 1 Dimension np.arrays of random int
One_dim_random_values = np.random.randint(1, 100, size=15)
print(One_dim_random_values)
print(type(One_dim_random_values))
print(One_dim_random_values.ndim)
print(One_dim_random_values.shape)
print(One_dim_random_values.dtype)
print(One_dim_random_values.itemsize)

[36 72 30 74 50 39 12 87 85 54 94 59 55 94 49]
<class 'numpy.ndarray'>
1
(15,)
int32
4


In [21]:
# 2 Dimension np.arrays of random int
two_dim_random_values = np.random.randint(1, 100, size=(3, 5))
print(two_dim_random_values)
print(type(two_dim_random_values))
print(two_dim_random_values.ndim)
print(two_dim_random_values.shape)
print(two_dim_random_values.dtype)
print(two_dim_random_values.itemsize)

[[79 25  6 28 56]
 [36 87 57 11 49]
 [14 54 91 68 54]]
<class 'numpy.ndarray'>
2
(3, 5)
int32
4


### Exploring Built-in `np.random` Methods

In [13]:
random_methods = [m for m in dir(np.random) if not m.startswith("_")]
print(random_methods)

['BitGenerator', 'Generator', 'MT19937', 'PCG64', 'Philox', 'RandomState', 'SFC64', 'SeedSequence', 'beta', 'binomial', 'bit_generator', 'bytes', 'chisquare', 'choice', 'default_rng', 'dirichlet', 'exponential', 'f', 'gamma', 'geometric', 'get_state', 'gumbel', 'hypergeometric', 'laplace', 'logistic', 'lognormal', 'logseries', 'mtrand', 'multinomial', 'multivariate_normal', 'negative_binomial', 'noncentral_chisquare', 'noncentral_f', 'normal', 'pareto', 'permutation', 'poisson', 'power', 'rand', 'randint', 'randn', 'random', 'random_integers', 'random_sample', 'ranf', 'rayleigh', 'sample', 'seed', 'set_state', 'shuffle', 'standard_cauchy', 'standard_exponential', 'standard_gamma', 'standard_normal', 'standard_t', 'test', 'triangular', 'uniform', 'vonmises', 'wald', 'weibull', 'zipf']
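
The listing above includes `default_rng`, the entry point to NumPy's `Generator` interface. The sketch below draws the same kind of random integers as `np.random.randint`. The seed `42` is an arbitrary assumption used only to make the draws repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)               # arbitrary seed
one_dim = rng.integers(1, 100, size=15)       # high (100) is exclusive
two_dim = rng.integers(1, 100, size=(3, 5))
print(one_dim.shape, two_dim.shape)           # (15,) (3, 5)

# Re-creating the generator with the same seed repeats the same draws.
rng_again = np.random.default_rng(42)
repeat = rng_again.integers(1, 100, size=15)
print(np.array_equal(one_dim, repeat))        # True
```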


### The NumPy `ndarray` Data Types `dtype`

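The `dtype` determines how the `data` in memory is interpreted: as `floating point`, `integer`, `boolean`, and so on.

A NumPy `ndarray` contains elements of a single `data-type`. The most common one-character type codes are:

- `'?' Boolean`
- `'b' signed byte`
- `'B' unsigned byte`
- `'i' signed integer`
- `'u' unsigned integer`
- `'f' floating point`
- `'c' complex floating point`
- `'m' timedelta`
- `'M' datetime`
- `'U' Unicode string`
- `'V' raw data`

A numeric suffix gives the item size in bytes. Integer codes accept `1`, `2`, `4`, and `8` (for example `i4` is `int32`), while float codes start at `2` and accept `2`, `4`, and `8` (for example `f8` is `float64`).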

In [26]:
# Int 32 data type
dt_int32 = np.array(range(15), dtype='i4')
print(dt_int32)
print(dt_int32.dtype)

dt_int32 = np.array(range(15), dtype=np.int32)
print(dt_int32)
print(dt_int32.dtype)

# Int 64 data type
dt_int64 = np.array(range(15), dtype='i8')
print(dt_int64)
print(dt_int64.dtype)

# Int 64 data type
dt_int64 = np.array(range(15), dtype=np.int64)
print(dt_int64)
print(dt_int64.dtype)

# float 32 data type
dt_float32 = np.array(range(15), dtype='f4')
print(dt_float32)
print(dt_float32.dtype)

dt_float32 = np.array(range(15), dtype=np.float32)
print(dt_float32)
print(dt_float32.dtype)

# float 64 data type
dt_float64 = np.array(list(range(15)), dtype='f8')
print(dt_float64)
print(dt_float64.dtype)

dt_float64 = np.array(range(15), dtype=np.float64)
print(dt_float64)
print(dt_float64.dtype)

# boolean data type
dt_bool = np.array(range(15), dtype='?')
print(dt_bool)
print(dt_bool.dtype)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int32
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int32
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int64
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int64
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float32
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float32
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float64
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float64
[False  True  True  True  True  True  True  True  True  True  True  True
  True  True  True]
bool
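
To see how a type code and its size suffix resolve to a concrete `dtype`, the following sketch prints the name and item size for a handful of codes. The selection of codes is illustrative, not exhaustive.

```python
import numpy as np

# Map a few type codes to the dtype NumPy resolves them to.
for code in ["i1", "i2", "i4", "i8", "f2", "f4", "f8", "?"]:
    resolved = np.dtype(code)
    print(code, resolved.name, resolved.itemsize)

# The representable integer range follows from the item size.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)   # -128 127
```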


### Casting an `ndarray` `dtype` with `astype`

`astype` returns a copy of the `ndarray` cast to the specified type.

In [27]:
d_int = np.array(list(range(15)), dtype='i4')
print(d_int)
print(d_int.dtype)

d_float = d_int.astype('f4')
print(d_float)
print(d_float.dtype)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int32
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float32
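
Casting in the other direction, from float to integer, discards the fractional part rather than rounding. The sketch below shows this and confirms that `astype` leaves the source array untouched. The sample values are illustrative.

```python
import numpy as np

prices = np.array([1.7, -1.7, 2.0])   # float64 by default
as_int = prices.astype("i4")          # fractional part is discarded
print(as_int)                         # [ 1 -1  2]

# astype returns a new array, so changing it does not affect the source.
as_int[0] = 99
print(prices[0])                      # 1.7
```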


### NumPy `ndim`

`ndim` is the number of dimensions (axes) of the `ndarray`.

In [28]:
One_dim_array_of_zeros.ndim

1

In [29]:
Two_dim_array_of_zeros.ndim

2

### NumPy `itemsize`

`itemsize` is the size of one element in bytes.

In [32]:
itemsize = One_dim_array_of_zeros.itemsize
print(f"The size in bytes of each element: {itemsize}")

The size in bytes of each element: 8


### NumPy `dtype`

`dtype` is the data type of the array items.

In [33]:
One_dim_array_of_zeros.dtype

dtype('float64')

### NumPy `shape`

`shape` holds the size of the `ndarray` along each dimension. It is returned as a `tuple`.

In [34]:
shape = One_dim_array_of_zeros.shape
print(shape)
print(type(shape))

(10,)
<class 'tuple'>


In [35]:
shape = Two_dim_array_of_zeros.shape
print(shape)
print(type(shape))

(3, 5)
<class 'tuple'>


### Exploring Built-in `ndarray` Methods

In [36]:
ndarray_methods = [method for method in dir(a) if not method.startswith("__")]
print(ndarray_methods)

['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']


### NumPy `reshape`

`reshape` gives an `ndarray` a new shape without changing its `values`. The new shape must contain the same total number of elements.

In [37]:
a = np.arange(15)
a = a.reshape(3, 5)
print(a)
print(a.ndim)
print(a.size)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
2
15
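
Two details of `reshape` are worth a short sketch: a dimension given as `-1` is inferred from the total size, and the shape must account for every element. The names `flat` and `inferred` are illustrative.

```python
import numpy as np

flat = np.arange(15)

# -1 asks NumPy to infer that dimension from the total size (15 / 5 = 3).
inferred = flat.reshape(-1, 5)
print(inferred.shape)                     # (3, 5)

# For a contiguous array like this one, reshape returns a view,
# so both names refer to the same underlying data.
print(np.shares_memory(flat, inferred))   # True

# The new shape must hold exactly the same number of elements.
reshape_failed = False
try:
    flat.reshape(4, 4)                    # 16 != 15
except ValueError as err:
    reshape_failed = True
    print(err)
```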


### NumPy `indexing`

In a 1D `ndarray` you can access the `value` by specifying the desired `index` in square brackets, just like a Python `list`.

In [41]:
one_dimension = np.array([1, 2, 3, 4, 5])
print(one_dimension)

print(one_dimension[2])

[1 2 3 4 5]
3


In a 2D or multidimensional `ndarray` you can access `values` using a comma-separated tuple of indices.

In [42]:
two_dimension = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
print(two_dimension)

print(two_dimension[1, 2])


[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
8
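
A few further indexing forms, sketched on the same 3 x 5 values: negative indices, selecting a whole row, and chained indexing. The variable names are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10],
                 [11, 12, 13, 14, 15]])

last_value = grid[-1, -1]   # negative indices count from the end
second_row = grid[1]        # a single index selects a whole row
same_value = grid[1][2]     # chained form; grid[1, 2] avoids the temporary row

print(last_value)           # 15
print(second_row)           # [ 6  7  8  9 10]
print(same_value)           # 8
```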


### NumPy `slicing`

The `values` of a subarray can be accessed using slice notation, marked by the colon `:`, in the form `start:stop:step` for each dimension.

In [43]:
v = np.arange(1, 16).reshape(3, 5)
print(v)
print("-------")
print(v[:2, :2])
print("-------")
print(v[0:, 0:1])
print("-------")
print(v[0:, 4:])

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
-------
[[1 2]
 [6 7]]
-------
[[ 1]
 [ 6]
 [11]]
-------
[[ 5]
 [10]
 [15]]
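
Unlike slicing a Python `list`, basic slicing of an `ndarray` returns a view onto the same data rather than a copy. A minimal sketch of the consequence, with illustrative names:

```python
import numpy as np

v = np.arange(1, 16).reshape(3, 5)

corner = v[:2, :2]          # view onto the top-left 2 x 2 block
corner[0, 0] = 100          # writes through to v
print(v[0, 0])              # 100

safe = v[:2, :2].copy()     # independent copy
safe[0, 0] = -1
print(v[0, 0])              # still 100

every_other = v[0, ::2]     # step of 2 along the first row
print(every_other)          # [100   3   5]
```

Calling `.copy()` is the usual way to get an independent array when the original must not change.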


## Add new values to a NumPy `ndarray`

### Using the `np.append` function

`np.append` returns a new `ndarray` with the given values added to the end. An `ndarray` has no `append` method, and the original array is not modified.

In [44]:
e = np.arange(10)
print(e)

[0 1 2 3 4 5 6 7 8 9]


In [45]:
# np.append returns a new ndarray; the result is discarded here,
# so e itself is unchanged
np.append(e, 10)
print(e)

[0 1 2 3 4 5 6 7 8 9]
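
The output above shows `e` unchanged because the return value of `np.append` was discarded. A minimal sketch of keeping the result instead. The names `e_extended` and `e_more` are illustrative.

```python
import numpy as np

e = np.arange(10)

# np.append builds and returns a new array, so keep the return value.
e_extended = np.append(e, 10)
print(e)            # [0 1 2 3 4 5 6 7 8 9]
print(e_extended)   # [ 0  1  2  3  4  5  6  7  8  9 10]

# Several values can be appended at once.
e_more = np.append(e, [10, 11, 12])
print(e_more.size)  # 13
```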


### Using the `np.insert` function

`np.insert` inserts values along the given axis before the given indices.

In [29]:
f = np.array(range(1, 10))
print(f)

[1 2 3 4 5 6 7 8 9]


`np.insert` returns a new `ndarray`; it doesn't modify the `ndarray` passed as an argument.

In [30]:
new_f = np.insert(f, 0, 0)
print(f)      # f is unmodified after np.insert
print(new_f)  # the new ndarray is assigned to new_f

[1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]


In [31]:
new_f = np.insert(new_f, 10, 10)
print(new_f)

[ 0  1  2  3  4  5  6  7  8  9 10]
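
The cells above use one-dimensional arrays. On a 2D array the `axis` argument decides whether a row or a column is inserted. The sketch below uses a small illustrative array named `table`.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 inserts a new row before row index 1.
with_row = np.insert(table, 1, [7, 8, 9], axis=0)
print(with_row)
# [[1 2 3]
#  [7 8 9]
#  [4 5 6]]

# axis=1 inserts a new column before column index 1.
with_column = np.insert(table, 1, 0, axis=1)
print(with_column)
# [[1 0 2 3]
#  [4 0 5 6]]

# Without axis, the input is flattened first.
flattened = np.insert(table, 1, 0)
print(flattened)   # [1 0 2 3 4 5 6]
```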


## NumPy Performance

In [32]:
numpy_array = np.arange(10**6)
print(numpy_array[:10])
print(type(numpy_array))

python_list = list(range(10**6))
print(python_list[:10])
print(type(python_list))

[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'list'>


In [33]:
%time for _ in range(10): numpy_array_double = numpy_array * 2

Wall time: 37 ms


In [34]:
print(numpy_array_double[-10:])

[1999980 1999982 1999984 1999986 1999988 1999990 1999992 1999994 1999996
 1999998]


In [35]:
%time numpy_array_double = np.append(numpy_array_double, 2000000)
print(numpy_array_double[10:])

Wall time: 4 ms
[     20      22      24 ... 1999996 1999998 2000000]


In [36]:
%time for _ in range(10): python_list_double = [i * 2 for i in python_list]

Wall time: 1.58 s


In [37]:
%time python_list_double.append(2000000)
print(python_list_double[-10:])

Wall time: 0 ns
[1999982, 1999984, 1999986, 1999988, 1999990, 1999992, 1999994, 1999996, 1999998, 2000000]
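
`%time` is an IPython magic and is not available in a plain Python script. A portable sketch of the same comparison using the standard-library `timeit` module follows. The array size is reduced to `10**5`, an assumption made only to keep the run short, and absolute timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 10**5   # smaller than the notebook's 10**6 to keep the run short
numpy_array = np.arange(n)
python_list = list(range(n))

# Time 10 repetitions of each approach.
numpy_seconds = timeit.timeit(lambda: numpy_array * 2, number=10)
list_seconds = timeit.timeit(lambda: [i * 2 for i in python_list], number=10)

print(f"NumPy: {numpy_seconds:.4f} s for 10 runs")
print(f"list : {list_seconds:.4f} s for 10 runs")

# Both approaches compute the same values.
same = (numpy_array * 2).tolist() == [i * 2 for i in python_list]
print(same)   # True
```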


## Memory Consumption

### Python `list`

In [52]:
from sys import getsizeof

lst_empty = list()
lst = list(range(5))

size_of_list_object = getsizeof(lst_empty)
size_of_single_item = getsizeof(lst[0])
size_of_all_items = len(lst) * size_of_single_item
total_list_size = size_of_list_object + size_of_all_items

print(f"Object type : {type(lst_empty)}")
print(f"Size of an empty list (init): {size_of_list_object} bytes")
print(f"Size of an single item: {size_of_single_item} bytes")
print(f"Size of all the elements: {size_of_all_items} bytes")
print(f"Total size of list, including list object: {total_list_size} bytes")

Object type : <class 'list'>
Size of an empty list (init): 56 bytes
Size of an single item: 24 bytes
Size of all the elements: 120 bytes
Total size of list, including list object: 176 bytes


### NumPy `ndarray`

In [53]:
nd_array_empty = np.array([])
nd_array = np.arange(5)

size_of_list_object = getsizeof(nd_array_empty)
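# Caution: nd_array[0] is a NumPy scalar object, so getsizeof reports the
# size of that Python wrapper, not the bytes the element occupies inside
# the array buffer (that figure is nd_array.itemsize).
size_of_single_item = getsizeof(nd_array[0])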
size_of_all_items = nd_array.size * size_of_single_item
total_list_size = size_of_list_object + size_of_all_items

print(f"Object type : {type(nd_array_empty)}")
print(f"Size of an empty ndarray (init): {size_of_list_object} bytes")
print(f"Size of an single item: {size_of_single_item} bytes")
print(f"Size of all the elements: {size_of_all_items} bytes")
print(f"Total size of list, including list object: {total_list_size} bytes")

Object type : <class 'numpy.ndarray'>
Size of an empty ndarray (init): 104 bytes
Size of an single item: 28 bytes
Size of all the elements: 140 bytes
Total size of list, including list object: 244 bytes
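
The cell above multiplies `getsizeof(nd_array[0])` by the element count, but indexing an `ndarray` produces a NumPy scalar object whose size includes Python object overhead. Inside the array, each element occupies only `itemsize` bytes. The sketch below reads the storage figures from the array's own attributes. Exact numbers depend on the platform's default integer `dtype`, so none are asserted in the comments.

```python
from sys import getsizeof

import numpy as np

nd_array = np.arange(5)

bytes_per_element = nd_array.itemsize          # 4 for int32, 8 for int64
data_bytes = nd_array.nbytes                   # itemsize * number of elements
scalar_object_bytes = getsizeof(nd_array[0])   # Python-level scalar wrapper

print(f"dtype: {nd_array.dtype}")
print(f"Bytes per stored element: {bytes_per_element}")
print(f"Bytes in the data buffer: {data_bytes}")
print(f"Bytes of one extracted scalar object: {scalar_object_bytes}")
```

With the values shown earlier (`int32`, item size `4`), the five elements occupy 20 bytes of buffer, far less than the per-item cost of the Python `int` objects in the `list`.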


## `NumPy` Concepts

### What is Vectorization?

`Vectorization` is the technique of replacing explicit `for` loops with `array expressions`, which NumPy evaluates internally in a `low-level` compiled language.

Vectorized operations in `NumPy` use highly optimized `C` and `Fortran` functions, making for cleaner and faster Python code.

https://en.wikipedia.org/wiki/Array_programming
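
A minimal sketch contrasting an explicit loop with the equivalent array expression. The names `squared_loop` and `squared_vectorized` are illustrative.

```python
import numpy as np

values = np.arange(1, 6)   # [1 2 3 4 5]

# Explicit loop: one Python-level operation per element.
squared_loop = np.empty_like(values)
for index in range(values.size):
    squared_loop[index] = values[index] ** 2

# Vectorized: one array expression, the loop runs inside NumPy.
squared_vectorized = values ** 2

print(squared_vectorized)                                 # [ 1  4  9 16 25]
print(np.array_equal(squared_loop, squared_vectorized))   # True
```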

#### Examples of vectorized functions

In [None]:
p = np.power(a, 2)
print(p)

In [None]:
m = np.multiply(a, 2)
print(m)

In [None]:
s = np.sin(a)
print(s)

In [None]:
c = np.cos(a)
print(c)

### What is Broadcasting?

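`Broadcasting` describes how `NumPy` treats `arrays` with different `shapes` during arithmetic operations: the smaller array is virtually stretched to match the larger one so that a `vectorized` calculation can be performed between them. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is `1`.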
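
A short sketch of broadcasting with a row, a column, and a shape that does not fit. The array names are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of the matrix.
plus_row = matrix + row
print(plus_row)
# [[10 21 32]
#  [13 24 35]]

# The column is stretched across all three columns.
plus_column = matrix + column
print(plus_column)
# [[100 101 102]
#  [203 204 205]]

# Shapes (2, 3) and (2,) do not line up on the last dimension.
incompatible = False
try:
    matrix + np.array([1, 2])
except ValueError:
    incompatible = True
print(incompatible)   # True
```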

`Machine learning` is one domain that can frequently take advantage of `vectorization` and `broadcasting`.

## Arithmetic with NumPy Arrays

Any arithmetic operation between equal-size arrays is applied element-wise:

In [None]:
a = np.arange(1, 21).reshape((5, 4))
print(a)

In [None]:
b = np.arange(1, 21).reshape((5, 4))
print(b)

In [None]:
a + b

In [None]:
a * b

In [None]:
a / b

In [None]:
# shuffle reorders the rows of each array in place
np.random.shuffle(a)
np.random.shuffle(b)

a > b
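
Since the cells above are shown without output, here is a smaller sketch with results that are easy to check by hand. The arrays `x` and `y` are illustrative, and the `@` matrix product is included only for contrast with element-wise `*`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[10, 20], [30, 40]])

total = x + y       # [[11 22] [33 44]]
product = x * y     # element-wise: [[10 40] [90 160]]
ratio = x / y       # true division always yields floats
greater = x > y     # element-wise comparison gives a boolean array
matmul = x @ y      # matrix product: [[70 100] [150 220]]

print(total)
print(product)
print(ratio.dtype)  # float64
print(greater)
print(matmul)
```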

### Exploring `NumPy` Built-in Methods

In [None]:
numpy_methods = [method for method in dir(np) if not method.startswith("_")]
print(numpy_methods)

### NumPy `ufuncs`

There are currently more than `60` universal functions (`ufuncs`) defined in `numpy`. Each one operates element-wise on arrays of one or more types, and together they cover a wide variety of operations.


https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs
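
Functions such as `np.add` and `np.multiply` are `ufunc` objects, which also carry methods like `reduce`, `accumulate`, and `outer`. A brief sketch with illustrative variable names:

```python
import numpy as np

print(isinstance(np.add, np.ufunc))   # True
print(np.add.nin, np.add.nout)        # 2 1: two inputs, one output

values = np.array([1, 2, 3, 4])
total = np.add.reduce(values)               # same result as values.sum()
running = np.add.accumulate(values)         # same result as values.cumsum()
table = np.multiply.outer(values, values)   # 4 x 4 multiplication table

print(total)         # 10
print(running)       # [ 1  3  6 10]
print(table[3, 3])   # 16
```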

### Other Subpackages

`numpy.fft` Fast Fourier Transform

`numpy.polynomial` Efficient Polynomials

`numpy.linalg` Linear Algebra

`numpy.math` Alias of Python's standard `math` module in older NumPy releases (deprecated in NumPy 1.25 and removed in 2.0)

`numpy.random` Random Number Generation
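
One small call from each of the main subpackages, as a rough sketch. The inputs and the seed `0` are arbitrary choices for illustration.

```python
import numpy as np

# numpy.linalg: solve the linear system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)                 # approximately [0.8 1.4]

# numpy.fft: a constant signal has all of its energy in bin 0.
spectrum = np.fft.fft(np.ones(4))
print(spectrum)

# numpy.polynomial: evaluate p(t) = 1 + 2t + 3t^2 at t = 2.
p = np.polynomial.Polynomial([1, 2, 3])
value_at_two = p(2)
print(value_at_two)      # 17.0

# numpy.random: draw three samples from a seeded generator.
samples = np.random.default_rng(0).normal(size=3)
print(samples.shape)     # (3,)
```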

### Conclusion

`NumPy` provides a wide variety of functions capable of performing operations on `arrays` of data. Its use of `vectorization` makes these functions much faster than the analogous computations performed in pure Python.