# Introduction To NumPy

## What is NumPy

`NumPy`, short for `Numerical Python`, is one of the most important packages for numerical computing in `Python`. `NumPy` is designed for efficiency on large `arrays` of `data`.

## NumPy `array` vs Python `list`

The core component of `NumPy` is the `ndarray`, or `n-dimensional array`. In programming, an `array` is a collection of elements, similar to a `list`.

`NumPy arrays` are faster and more compact than `Python lists`. 

`NumPy arrays` use less memory to store `data` and support more numeric `data types` than built-in Python. `NumPy` provides a `dtype` parameter to define the `data type` (`int`, `float`, etc.).
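
As a rough illustration of the memory difference, the sketch below compares a list of 1,000 integers with an array holding the same values. The variable names and the explicit `int64` dtype are choices made for this example; the exact list size depends on the Python build, so only the array size is noted in the comments.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores references to separate Python int objects,
# so count the list itself plus every object it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values in one contiguous buffer:
# n items * 8 bytes per int64 item.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```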

`NumPy` uses algorithms written in `C`, so operations on whole arrays avoid the per-element overhead of the Python interpreter and are often orders of magnitude faster than equivalent Python loops.

The `NumPy` library takes advantage of a processor feature called `Single Instruction Multiple Data` (SIMD) to process data faster.

`NumPy` operations perform complex computations on entire `arrays` without the need for Python `for loops`.

## The `ndarray`

The fundamental object of `NumPy` is the `ndarray`, which stands for `n-dimensional array` and provides `vectorized` arithmetic operations. The word `n-dimensional` refers to the fact that `ndarrays` can have any number of dimensions.

- The type of `items` in the `array` is specified by a separate `data-type object` parameter named `dtype`.

- The number of dimensions in an `array` is defined by its `shape`, which is a `tuple` of `n` non-negative integers that specify the size of each dimension.
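
A small sketch of how these two attributes relate, using a hypothetical 3-dimensional array named `cube` (the name and sizes are assumptions for illustration):

```python
import numpy as np

# A 3-dimensional array with an explicit dtype.
cube = np.zeros((2, 3, 4), dtype=np.float32)

print(cube.dtype)   # float32
print(cube.shape)   # (2, 3, 4)
print(cube.ndim)    # 3, the length of the shape tuple
print(cube.size)    # 24, the product of the shape entries
```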

## Import `NumPy`

In [3]:
import numpy as np

In [4]:
np.__version__

'1.20.3'

## NumPy `ndarrays` Object

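New `ndarray` objects are normally created with helper functions such as `np.array`, `np.arange`, `np.zeros` and `np.ones`, rather than by calling the low-level `np.ndarray()` constructor directly.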

### Creating a NumPy Array from a Range of Values with `np.arange`

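Create a one-dimensional `ndarray` using the `arange` function, which is similar to Python's built-in `range` function. The default integer `dtype` is platform dependent, so `int32` in the outputs below may appear as `int64` on other systems.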

In [5]:
# int range
a = np.arange(1, 15)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14]
<class 'numpy.ndarray'>
1
(14,)
int32
4


In [6]:
# float range
a = np.arange(1.0, 15.0)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
<class 'numpy.ndarray'>
1
(14,)
float64
8


In [7]:
# step range parameter
a = np.arange(1, 15, 2)
print(a)
print(type(a))
print(a.ndim)
print(a.shape)
print(a.dtype)
print(a.itemsize)

[ 1  3  5  7  9 11 13]
<class 'numpy.ndarray'>
1
(7,)
int32
4


### Creating a NumPy Array from Python List with `np.array`

Create an `ndarray` from a Python `list`.

In [8]:
b = list(range(15))
print(b)
print(type(b))

c = np.array(b)
print(c)
print(type(c))
print(c.ndim)
print(c.shape)
print(c.dtype)
print(c.itemsize)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
<class 'list'>
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
<class 'numpy.ndarray'>
1
(15,)
int32
4
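
`np.array` also accepts nested lists. The sketch below, with illustrative names `nested`, `matrix` and `mixed`, shows a list of lists becoming a two-dimensional array, and integers being upcast when a float is present so that the array keeps a single `dtype`.

```python
import numpy as np

# Each inner list becomes one row of the array.
nested = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(nested)
print(matrix.shape)   # (2, 3)

# Mixing ints and floats produces a float array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64
```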


### Creating a NumPy Array of Zeros with `np.zeros`

The `np.zeros` function creates an `array` of the given `shape` filled with `zeros`.

In [9]:
# 1 Dimension np.arrays of zeros
One_dim_array_of_zeros = np.zeros(10)
print(One_dim_array_of_zeros)
print(type(One_dim_array_of_zeros))
print(One_dim_array_of_zeros.ndim)
print(One_dim_array_of_zeros.shape)
print(One_dim_array_of_zeros.dtype)
print(One_dim_array_of_zeros.itemsize)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
<class 'numpy.ndarray'>
1
(10,)
float64
8


In [10]:
# 2 Dimensions np.arrays of zeros
Two_dim_array_of_zeros = np.zeros((3, 5))
print(Two_dim_array_of_zeros)
print(type(Two_dim_array_of_zeros))
print(Two_dim_array_of_zeros.ndim)
print(Two_dim_array_of_zeros.shape)
print(Two_dim_array_of_zeros.dtype)
print(Two_dim_array_of_zeros.itemsize)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
<class 'numpy.ndarray'>
2
(3, 5)
float64
8


### Creating a NumPy Array of Ones with `np.ones`

In [11]:
# 1 Dimension np.arrays of ones
One_dim_array_of_ones = np.ones(10)
print(One_dim_array_of_ones)
print(type(One_dim_array_of_ones))
print(One_dim_array_of_ones.ndim)
print(One_dim_array_of_ones.shape)
print(One_dim_array_of_ones.dtype)
print(One_dim_array_of_ones.itemsize)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>
1
(10,)
float64
8


In [12]:
# 2 Dimensions np.arrays of ones
Two_dim_array_of_ones = np.ones((3, 5))
print(Two_dim_array_of_ones)
print(type(Two_dim_array_of_ones))
print(Two_dim_array_of_ones.ndim)
print(Two_dim_array_of_ones.shape)
print(Two_dim_array_of_ones.dtype)
print(Two_dim_array_of_ones.itemsize)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
<class 'numpy.ndarray'>
2
(3, 5)
float64
8


### Creating a NumPy Array of randint values with `np.random.randint`

In [14]:
# 1 Dimension np.arrays of random int
One_dim_random_values = np.random.randint(1, 100, size=15)
print(One_dim_random_values)
print(type(One_dim_random_values))
print(One_dim_random_values.ndim)
print(One_dim_random_values.shape)
print(One_dim_random_values.dtype)
print(One_dim_random_values.itemsize)

[59 68 48 43 51 64 47 79 43 11 34 16 96 90 42]
<class 'numpy.ndarray'>
1
(15,)
int32
4


In [15]:
# 2 Dimension np.arrays of random int
two_dim_random_values = np.random.randint(1, 100, size=(3, 5))
print(two_dim_random_values)
print(type(two_dim_random_values))
print(two_dim_random_values.ndim)
print(two_dim_random_values.shape)
print(two_dim_random_values.dtype)
print(two_dim_random_values.itemsize)

[[92 93 57 41 34]
 [33 79 90 94 62]
 [40 60 29 74 23]]
<class 'numpy.ndarray'>
2
(3, 5)
int32
4
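
The random outputs above change on every run. As a hedged sketch of one way to get reproducible draws, the example below uses `np.random.default_rng` with a fixed seed; the seed value and variable names are arbitrary choices for illustration. Like `randint`, `Generator.integers` excludes the upper bound by default.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draw_a = rng_a.integers(1, 100, size=15)
draw_b = rng_b.integers(1, 100, size=15)

print(draw_a)
print(np.array_equal(draw_a, draw_b))            # True
print(draw_a.min() >= 1 and draw_a.max() < 100)  # True
```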


### Exploring Built-in `np.random` Methods

In [19]:
random_methods = [m for m in dir(np.random) if not m.startswith("_")]
print(random_methods)

['BitGenerator', 'Generator', 'MT19937', 'PCG64', 'Philox', 'RandomState', 'SFC64', 'SeedSequence', 'beta', 'binomial', 'bit_generator', 'bytes', 'chisquare', 'choice', 'default_rng', 'dirichlet', 'exponential', 'f', 'gamma', 'geometric', 'get_state', 'gumbel', 'hypergeometric', 'laplace', 'logistic', 'lognormal', 'logseries', 'mtrand', 'multinomial', 'multivariate_normal', 'negative_binomial', 'noncentral_chisquare', 'noncentral_f', 'normal', 'pareto', 'permutation', 'poisson', 'power', 'rand', 'randint', 'randn', 'random', 'random_integers', 'random_sample', 'ranf', 'rayleigh', 'sample', 'seed', 'set_state', 'shuffle', 'standard_cauchy', 'standard_exponential', 'standard_gamma', 'standard_normal', 'standard_t', 'test', 'triangular', 'uniform', 'vonmises', 'wald', 'weibull', 'zipf']


### The NumPy Array Data Type `dtype`

The `dtype` determines how the bytes of each item are interpreted: as a `floating point` number, an `integer`, a `boolean`, etc.

A NumPy `array` may contain only a single `data-type`.

- `'?' Boolean`
- `'b' byte`
- `'B' unsigned byte`
- `'i' integer`
- `'u' unsigned integer`
- `'f' floating point`
- `'c' complex floating point`
- `'m' timedelta`
- `'M' datetime`
- `'U' unicode string`
- `'V' raw data`

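The type character can be followed by the item size in bytes: `1`, `2`, `4` or `8` for `int` (for example `'i4'` is `int32`) and `2`, `4` or `8` for `float` (for example `'f8'` is `float64`). A `16`-byte `float` is available only on some platforms.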
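
The sketch below prints the type name and item size that a few type codes map to. The selection of codes is only for illustration.

```python
import numpy as np

# Map each type code to its dtype name and size in bytes.
for code in ("i1", "i2", "i4", "i8", "f2", "f4", "f8"):
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize)
```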

In [12]:
d = np.array(list(range(15)), dtype='i4')
print(d)
print(d.dtype)

d = np.array(list(range(15)), dtype='i8')
print(d)
print(d.dtype)

e = np.array(list(range(15)), dtype='f4')
print(e)
print(e.dtype)

e = np.array(list(range(15)), dtype='f8')
print(e)
print(e.dtype)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int32
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int64
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float32
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float64


### Casting an Array `dtype` with `astype`

In [13]:
d_int = np.array(list(range(15)), dtype='i4')
print(d_int)
print(d_int.dtype)

d_float = d_int.astype('f4')
print(d_float)
print(d_float.dtype)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
int32
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.]
float32
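
Casting in the other direction loses information. A minimal sketch, with illustrative names `floats` and `ints`: converting floats to integers discards the fractional part, and `astype` returns a new array rather than modifying the original.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])

# The fractional part is dropped (truncation toward zero), not rounded.
ints = floats.astype("i4")

print(ints)     # [ 1 -1  2]
print(floats)   # the original array is unchanged
```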


### NumPy `ndim`

`ndim` gives the number of dimensions of the `ndarray`.

In [14]:
One_dim_array_of_zeros.ndim

1

In [15]:
Two_dim_array_of_zeros.ndim

2

### NumPy `itemsize`

`itemsize` gives the `size` of one array `item` in `bytes`.

In [16]:
One_dim_array_of_zeros.itemsize

8

### NumPy `dtype`

`dtype` gives the data `type` of the array items.

In [17]:
One_dim_array_of_zeros.dtype

dtype('float64')

### NumPy `shape`

`shape` gives the size of the `ndarray` along each dimension. It returns a `tuple`.

In [18]:
shape = One_dim_array_of_zeros.shape
print(shape)
print(type(shape))

(10,)
<class 'tuple'>


In [19]:
shape = Two_dim_array_of_zeros.shape
print(shape)
print(type(shape))

(3, 5)
<class 'tuple'>


### Exploring Built-in `ndarray` Methods

In [20]:
ndarray_methods = [method for method in dir(a) if not method.startswith("__")]
print(ndarray_methods)

['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']


### NumPy `reshape`

Give a new shape to an `ndarray` without changing its `values`.

In [21]:
a = np.arange(15)
a = a.reshape(3, 5)
print(a)
print(a.ndim)
print(a.size)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
2
15
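
Two further details of `reshape`, sketched with hypothetical names `flat` and `grid`: one dimension may be given as `-1` so that NumPy infers it, and the total number of elements has to stay the same.

```python
import numpy as np

flat = np.arange(15)

# -1 asks NumPy to infer the second dimension (15 / 3 = 5).
grid = flat.reshape(3, -1)
print(grid.shape)   # (3, 5)

# 15 elements cannot fill a 4 x 4 array, so this raises ValueError.
try:
    flat.reshape(4, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```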


### NumPy `indexing`

In a one-dimensional `array` you can access a `value` by specifying the desired `index` in square brackets, just as with a Python `list`.

In [22]:
one_dimension = np.array([1, 2, 3, 4, 5])
print(one_dimension)

print(one_dimension[2])

[1 2 3 4 5]
3


In a two-dimensional or multidimensional `array` you can access `values` using a comma-separated tuple of indices.

In [23]:
two_dimension = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
print(two_dimension)

print(two_dimension[1, 2])


[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
8
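
A few more indexing forms on the same data, as a short sketch (the name `grid` is only for this example): chained brackets, negative indices, and a single index that selects a whole row.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

print(grid[1, 2])     # 8, row 1 and column 2
print(grid[1][2])     # 8, same element via two indexing steps
print(grid[-1, -1])   # 15, negative indices count from the end
print(grid[0])        # [1 2 3 4 5], a single index selects a row
```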


### NumPy `slicing`

The `values` of a `subarray` can be accessed using slice notation, marked by the colon `:`.

In [24]:
v = np.arange(1, 16).reshape(3, 5)
print(v)
print("-------")
print(v[:2, :2])
print("-------")
print(v[0:, 0:1])
print("-------")
print(v[0:, 4:])

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
-------
[[1 2]
 [6 7]]
-------
[[ 1]
 [ 6]
 [11]]
-------
[[ 5]
 [10]
 [15]]
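
One point the outputs above do not show is that a basic slice is a view onto the original data, not a copy. The sketch below, using illustrative names `corner` and `independent`, demonstrates this along with a slice that uses a step.

```python
import numpy as np

v = np.arange(1, 16).reshape(3, 5)

# A basic slice shares memory with the array it came from.
corner = v[:2, :2]
corner[0, 0] = 100
print(v[0, 0])              # 100, the original array changed too

# Call copy() when an independent array is needed.
independent = v[:2, :2].copy()
independent[0, 0] = -1
print(v[0, 0])              # still 100

# A step can be given after a second colon: every second column.
print(v[:, ::2].shape)      # (3, 3)
```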


## NumPy Performance

In [25]:
numpy_array = np.arange(10**6)
print(numpy_array[:10])
print(type(numpy_array))

python_list = list(range(10**6))
print(python_list[:10])
print(type(python_list))

[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'list'>


In [26]:
%time for _ in range(10): numpy_array_double = numpy_array * 2

Wall time: 29 ms


In [27]:
print(numpy_array_double[-10:])

[1999980 1999982 1999984 1999986 1999988 1999990 1999992 1999994 1999996
 1999998]


In [28]:
%time numpy_array_double = np.append(numpy_array_double, 2000000)
print(numpy_array_double[10:])

Wall time: 3 ms
[     20      22      24 ... 1999996 1999998 2000000]


In [29]:
%time for _ in range(10): python_list_double = [i * 2 for i in python_list]

Wall time: 1.68 s


In [30]:
%time python_list_double.append(2000000)
print(python_list_double[-10:])

Wall time: 999 µs
[1999982, 1999984, 1999986, 1999988, 1999990, 1999992, 1999994, 1999996, 1999998, 2000000]
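
The `%time` magic only works inside IPython or Jupyter. As a sketch of how a similar comparison could be run in a plain script, the example below uses the standard `timeit` module. The array size and repeat count are arbitrary choices, and no timings are quoted because they depend on the machine.

```python
import timeit
import numpy as np

n = 100_000
numpy_array = np.arange(n)
python_list = list(range(n))

# Time 20 repetitions of each approach.
numpy_seconds = timeit.timeit(lambda: numpy_array * 2, number=20)
list_seconds = timeit.timeit(lambda: [i * 2 for i in python_list], number=20)

print(f"NumPy: {numpy_seconds:.4f} s, list: {list_seconds:.4f} s")

# Both approaches compute the same values.
same_values = (numpy_array * 2).tolist() == [i * 2 for i in python_list]
print(same_values)
```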


## `NumPy` Concepts

### What is Vectorization?

`Vectorization` is the technique of replacing explicit `for-loops` with `array expressions`, which are evaluated internally by compiled `low-level` code.

Vectorized operations in `NumPy` use highly optimized `C` and `Fortran` functions, making for cleaner and faster Python code.

https://en.wikipedia.org/wiki/Array_programming
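
To make the definition concrete, the sketch below computes the same result twice: once with an explicit loop and once with an array expression. The variable names are illustrative.

```python
import numpy as np

values = np.arange(5)

# Explicit loop: one Python-level operation per element.
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized: a single array expression, with the loop running in compiled code.
squares_vec = values ** 2

print(squares_loop)   # [ 0  1  4  9 16]
print(squares_vec)    # [ 0  1  4  9 16]
print(np.array_equal(squares_loop, squares_vec))   # True
```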

#### Examples of vectorized functions

In [31]:
p = np.power(a, 2)
print(p)

[[  0   1   4   9  16]
 [ 25  36  49  64  81]
 [100 121 144 169 196]]


In [32]:
m = np.multiply(a, 2)
print(m)

[[ 0  2  4  6  8]
 [10 12 14 16 18]
 [20 22 24 26 28]]


In [33]:
s = np.sin(a)
print(s)

[[ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
 [-0.95892427 -0.2794155   0.6569866   0.98935825  0.41211849]
 [-0.54402111 -0.99999021 -0.53657292  0.42016704  0.99060736]]


In [34]:
c = np.cos(a)
print(c)

[[ 1.          0.54030231 -0.41614684 -0.9899925  -0.65364362]
 [ 0.28366219  0.96017029  0.75390225 -0.14550003 -0.91113026]
 [-0.83907153  0.0044257   0.84385396  0.90744678  0.13673722]]


### What is Broadcasting?

`Broadcasting` describes how `NumPy` operates on `arrays` with different `shapes` during arithmetic operations, stretching the smaller `array` across the larger one to perform a `vectorized` calculation between them.

`Machine learning` is one domain that can frequently take advantage of `vectorization` and `broadcasting`.
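
A minimal sketch of broadcasting with a scalar, a row and a column, plus one combination of shapes that cannot be broadcast. The array names are assumptions for this example.

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,)
column = np.array([[100], [200]])        # shape (2, 1)

# A scalar is applied to every element.
print(matrix + 1)

# The row is added to each row of the matrix.
print(matrix + row)      # [[11 22 33] [14 25 36]]

# The column is added to each column of the matrix.
print(matrix + column)   # [[101 102 103] [204 205 206]]

# Shapes (2, 3) and (2,) do not line up on the last axis.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("incompatible shapes:", err)
```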

## Arithmetic with NumPy Arrays

Any arithmetic operation between arrays of equal shape is applied element-wise:

In [35]:
a = np.arange(1, 21).reshape((5, 4))
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [36]:
b = np.arange(1, 21).reshape((5, 4))
print(b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [37]:
a + b

array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [18, 20, 22, 24],
       [26, 28, 30, 32],
       [34, 36, 38, 40]])

In [38]:
a * b

array([[  1,   4,   9,  16],
       [ 25,  36,  49,  64],
       [ 81, 100, 121, 144],
       [169, 196, 225, 256],
       [289, 324, 361, 400]])

In [39]:
a / b

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [40]:
np.random.shuffle(a)
np.random.shuffle(b)

a > b

array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [False, False, False, False]])

### Exploring `NumPy` Built-in Methods

In [41]:
numpy_methods = [method for method in dir(np) if not method.startswith("_")]
print(numpy_methods)



### NumPy `ufuncs`

A universal function (`ufunc`) operates on `arrays` element by element. There are currently more than `60` universal functions defined in `numpy` on one or more types, covering a wide variety of operations.


https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs
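
As a brief sketch, `np.add` and `np.maximum` are two such universal functions. Besides being called directly, a binary ufunc also offers methods such as `reduce` and `accumulate`. The input array `x` is an arbitrary example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

print(isinstance(np.add, np.ufunc))   # True
print(np.add(x, 10))                  # [11 12 13 14]
print(np.add.reduce(x))               # 10, same result as x.sum()
print(np.add.accumulate(x))           # [ 1  3  6 10], a running total
print(np.maximum(x, 2))               # [2 2 3 4]
```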

### Other Subpackages

`numpy.fft` Fast Fourier Transform

`numpy.polynomial` Efficient Polynomials

`numpy.linalg` Linear Algebra

`numpy.math` Alias for Python's standard `math` module in older releases (removed in NumPy 2.0)

`numpy.random` Random Number Generation
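
A short, hedged tour of three of these subpackages. The small system of equations, the constant signal and the polynomial are made-up inputs chosen so the results are easy to check by hand.

```python
import numpy as np
from numpy.polynomial import Polynomial

# numpy.linalg: solve 2x + y = 5 and x + 3y = 10.
coeffs = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])
solution = np.linalg.solve(coeffs, rhs)
print(solution)   # x = 1, y = 3

# numpy.fft: a constant signal has all its energy in the first bin.
signal = np.ones(4)
spectrum = np.fft.fft(signal)
print(spectrum)

# numpy.polynomial: evaluate 1 + 2x + 3x^2 at x = 2.
poly = Polynomial([1, 2, 3])
print(poly(2))    # 17.0
```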

### Conclusion

`NumPy` provides a wide variety of functions capable of performing operations on `arrays` of data. Its use of `vectorization` makes these functions much faster than the analogous computations performed in pure Python.