## Introduction to NumPy

**NumPy** stands for **'Numerical Python'**. It is best known for its efficient **arrays**. Because NumPy's core is implemented in C, array operations typically run much faster than equivalent code written with Python lists. NumPy stores data in `ndarray` objects. An `ndarray` is a collection of items of the same type, and items in the collection can be accessed using a zero-based index. NumPy supports efficient operations on the data structures often used in machine learning: vectors, matrices, and tensors.

## Installing NumPy

The installation is simple. Use **pip** to install the package as `pip install numpy`. To install a specific version, the command has to be modified as `pip install numpy==x.x.x` where x.x.x is the version you want. For example, `pip install numpy==1.22.3` will install the **1.22.3** version of NumPy.

## Basic Usage

### Importing NumPy

To import NumPy in a Python project, use the `import numpy` statement. After the import, we can check the installed NumPy version with `numpy.version.version`.

In [1]:
import numpy

In [2]:
numpy.version.version

'1.22.3'

Alternatively, the package can be imported under an alias for easier usage. For example, NumPy is conventionally imported as `import numpy as np`. Now, each time we call a NumPy function, we can write `np.function_name` instead of `numpy.function_name`.

In [3]:
import numpy as np

In [4]:
np.version.version

'1.22.3'

### Creating NumPy Arrays

To create arrays in numpy, we use `numpy.array()`. The syntax is as follows:  

`numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)`
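
As a quick illustration of the signature above, the sketch below exercises three of the keyword arguments (`dtype`, `ndmin` and `order`). The variable names are made up for this example.

```python
import numpy as np

# dtype forces the element type instead of letting NumPy infer it
as_float = np.array([1, 2, 3], dtype=np.float64)

# ndmin pads the shape with leading axes of length 1
as_2d = np.array([1, 2, 3], ndmin=2)

# order='F' requests column-major (Fortran-style) memory layout
col_major = np.array([[1, 2], [3, 4]], order='F')

print(as_float.dtype)                   # float64
print(as_2d.shape)                      # (1, 3)
print(col_major.flags['F_CONTIGUOUS'])  # True
```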

#### Creating One Dimensional Array

In [5]:
vector_a = np.array([1,2,3])
vector_a

array([1, 2, 3])

In [6]:
type(vector_a)

numpy.ndarray

In [7]:
vector_a.dtype

dtype('int64')

In [8]:
vector_b = np.array([1,2,'a'])
vector_b

array(['1', '2', 'a'], dtype='<U21')

In [9]:
vector_c = np.array([1,2,3.2])
vector_c

array([1. , 2. , 3.2])

In [10]:
vector_c.dtype

dtype('float64')

In [11]:
vector_d = np.array(['a','b','c'], dtype=str)
vector_d

array(['a', 'b', 'c'], dtype='<U1')

In [12]:
vector_d.dtype

dtype('<U1')
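
The cells above show that an array holds a single element type: when the input mixes types, NumPy picks one type that can represent every element. A minimal sketch of that behaviour, using `dtype.kind` so the result does not depend on the platform's default integer width (variable names are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_float = np.array([1, 2, 3.2])   # the integers are promoted to float
mixed_str = np.array([1, 2, 'a'])     # every element becomes a string

print(ints.dtype.kind)         # i (signed integer)
print(mixed_float.dtype.kind)  # f (floating point)
print(mixed_str.dtype.kind)    # U (unicode string)
```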

#### Check the shape of the array

In [13]:
vector_d.shape

(3,)

The shape is a tuple of integers; the integer at each index gives the number of elements along the corresponding dimension.
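
To make the relationship between shape, number of dimensions and element count concrete, here is a small sketch with a vector, a matrix and a three-dimensional tensor (the names are chosen for this example only):

```python
import numpy as np

vector = np.zeros(3)           # 1 axis
matrix = np.zeros((2, 3))      # 2 axes
tensor = np.zeros((2, 3, 4))   # 3 axes

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # shape has one entry per axis; size is the product of those entries
    print(name, arr.shape, arr.ndim, arr.size)
```

This should print `(3,) 1 3`, `(2, 3) 2 6` and `(2, 3, 4) 3 24` after the respective names.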

#### Creating Multidimensional Array

In [14]:
multi_a = np.array([[1,2,3],[3,4,5]])
multi_a

array([[1, 2, 3],
       [3, 4, 5]])

In [15]:
multi_a.shape

(2, 3)

In [18]:
multi_b = np.array([[1,2,3],[3,4,5],[4,5,6]])
multi_b

array([[1, 2, 3],
       [3, 4, 5],
       [4, 5, 6]])

In [19]:
multi_b.shape

(3, 3)

In [20]:
multi_c = np.array([[1,2,3],[3,4,5],[4,5,6]])
multi_c

array([[1, 2, 3],
       [3, 4, 5],
       [4, 5, 6]])

In [21]:
multi_c.shape

(3, 3)

In [22]:
multi_d = np.array([1, 2, 3], ndmin=5)
multi_d

array([[[[[1, 2, 3]]]]])

In [23]:
multi_d.shape

(1, 1, 1, 1, 3)

#### Using NumPy Matrices

In [24]:
mat_a = numpy.mat([[1,2],[3,4]])
mat_a

matrix([[1, 2],
        [3, 4]])

In [25]:
mat_a.size

4

In [26]:
mat_b = np.mat('1 2; 3 4')
mat_b

matrix([[1, 2],
        [3, 4]])

In [29]:
mat_c = np.mat('1 2; 3 4; 5 6')
mat_c

matrix([[1, 2],
        [3, 4],
        [5, 6]])

In [30]:
mat_c.ndim

2
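
One practical difference between `matrix` and `ndarray` is the meaning of `*`. The sketch below assumes `np.asmatrix` as the constructor, because to our knowledge the `np.mat` alias is no longer available in NumPy 2.x; on the 1.22 release used in this notebook both spellings work.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.asmatrix(arr)   # matrix interpretation of the same data

print(arr * arr)   # element-wise product, values [[1, 4], [9, 16]]
print(mat * mat)   # matrix product, values [[7, 10], [15, 22]]
print(arr @ arr)   # matrix product on plain arrays, same values as mat * mat
```

Since `@` gives plain arrays the same matrix product, regular `ndarray` objects are usually sufficient.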

#### Creating arrays with Arange
The `arange` function is similar to Python's `range` function. This can be used to create numpy arrays.

In [27]:
arange_a = np.arange(20)
arange_a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [28]:
arange_b = np.arange(2,20)
arange_b

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19])

In [29]:
arange_c = np.arange(2,20,2)
arange_c

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])
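
The similarity to `range` can be checked directly, and `arange` additionally accepts a non-integer step. A short sketch (variable names are illustrative):

```python
import numpy as np

# Same start/stop/step semantics as range: stop is excluded
assert np.arange(2, 20, 2).tolist() == list(range(2, 20, 2))

# Unlike range, arange accepts a non-integer step
quarters = np.arange(0, 1, 0.25)
print(quarters)   # [0.   0.25 0.5  0.75]

# A negative step counts down
countdown = np.arange(5, 0, -1)
print(countdown)  # [5 4 3 2 1]
```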

#### Creating arrays of ones, zeros and empty
Arrays of ones and zeros of any shape can be created with NumPy.  
`zeros(shape)` returns an array of the given `shape` initialised with 0. Note that `shape` should be an `int` (for a one-dimensional array) or a `tuple` of ints; passing the dimensions as separate arguments fails, as the first cell below shows.

`zeros_like(array)` returns an array of zeros with the same shape and dtype as `array`.

`ones` and `ones_like` work the same way, except of course that the initialisation is done with ones.

`empty` and `empty_like` also work the same way, but they create arrays without initialising them (hence, they are faster). The values in the resulting array are whatever happened to be in memory, so they should be treated as garbage until overwritten.


In [33]:
zeros_demo = np.zeros(3,2)
zeros_demo

TypeError: Cannot interpret '2' as a data type
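
The error above arises because the second positional argument of `np.zeros` is the dtype, so `2` is read as a data type rather than a dimension. A minimal sketch of the failing call and the corrected one (`zeros_ok` is a name made up here):

```python
import numpy as np

try:
    np.zeros(3, 2)           # 2 is interpreted as the dtype argument
except TypeError as err:
    print("TypeError:", err)

zeros_ok = np.zeros((3, 2))  # the shape goes in as a single tuple
print(zeros_ok.shape)        # (3, 2)
```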

In [34]:
zeros_a = np.zeros((3))
zeros_a

array([0., 0., 0.])

In [35]:
zeros_b = np.zeros((3,4))
zeros_b

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [36]:
zeros_c = np.zeros((4,4), dtype=int)
zeros_c

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [37]:
zeros_like_a = np.zeros_like(zeros_a)
zeros_like_a

array([0., 0., 0.])

In [38]:
zeros_like_c = np.zeros_like(zeros_c)
zeros_like_c

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [39]:
ones_a = np.ones((7,3))
ones_a

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

#### Task: Create an array of shape(4,3) with all the elements '7' and dtype 'int' using the above concept

In [36]:
task = 7*np.ones((4,3), dtype = int)
task

array([[7, 7, 7],
       [7, 7, 7],
       [7, 7, 7],
       [7, 7, 7]])
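
As an aside, the same result can be obtained without the multiplication by using `np.full`, which takes the shape and the fill value directly. A small sketch comparing the two approaches:

```python
import numpy as np

task_full = np.full((4, 3), 7, dtype=int)    # shape, fill value, dtype
task_ones = 7 * np.ones((4, 3), dtype=int)   # the approach used above

print(np.array_equal(task_full, task_ones))  # True
```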

In [37]:
empty_a = np.empty((4,5))
empty_a

array([[ 1.37538943e-316,  0.00000000e+000,  1.37874196e-316,
         8.65931655e+003,  6.90863336e-310],
       [ 1.37878860e-316, -3.09844609e+026,  6.90863336e-310,
         1.37883524e-316,  1.21325445e-165],
       [ 6.90863336e-310,  1.37888188e-316, -6.99222808e+101,
         6.90863339e-310,  1.37892852e-316],
       [ 6.76590670e+041,  6.90863336e-310,  6.90863336e-310,
         2.93873763e+294,  6.90863336e-310]])

In [38]:
empty_like_a = np.empty_like(empty_a)
empty_like_a

array([[ 1.37538943e-316,  0.00000000e+000,  1.37874196e-316,
         8.65931655e+003,  6.90863336e-310],
       [ 1.37878860e-316, -3.09844609e+026,  6.90863336e-310,
         1.37883524e-316,  1.21325445e-165],
       [ 6.90863336e-310,  1.37888188e-316, -6.99222808e+101,
         6.90863339e-310,  1.37892852e-316],
       [ 6.76590670e+041,  6.90863336e-310,  6.90863336e-310,
         2.93873763e+294,  6.90863336e-310]])

### Accessing and Slicing NumPy Arrays
To access an element in an array, we simply pass the index of the desired element.

In [39]:
access_a = np.array([1,2,3,5])
access_a

array([1, 2, 3, 5])

In [40]:
access_a[3]

5

In [41]:
access_b = np.array([[1,2,3],[3,4,5]])
access_b

array([[1, 2, 3],
       [3, 4, 5]])

In [42]:
access_b[1]

array([3, 4, 5])

Slicing refers to extracting the array elements from one given index to another given index.

We pass a slice instead of an index, like this: `[start:end]`.

We can also define the step, like this: `[start:end:step]`.

If we don't pass `start`, it is taken to be 0.

If we don't pass `end`, it is taken to be the length of the array in that dimension.

If we don't pass `step`, it is taken to be 1.
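
The defaults listed above can be verified directly. The sketch below also shows negative indices and a negative step, which follow the same conventions as Python lists:

```python
import numpy as np

a = np.arange(10)

# Omitted parts fall back to the defaults listed above
assert np.array_equal(a[:4], a[0:4:1])
assert np.array_equal(a[6:], a[6:len(a):1])

print(a[1:8:3])   # [1 4 7]
print(a[-3:])     # [7 8 9]  (negative indices count from the end)
print(a[::-1])    # [9 8 7 6 5 4 3 2 1 0]  (a negative step reverses)
```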

In [43]:
slice_a = np.arange(10)
slice_a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [44]:
slice_a[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [45]:
slice_a[:4]

array([0, 1, 2, 3])

In [46]:
slice_a[1:7]

array([1, 2, 3, 4, 5, 6])

In [47]:
slice_a[::2]

array([0, 2, 4, 6, 8])

Slicing NumPy arrays is similar to slicing Python lists. One main distinction between a Python list and a NumPy array is that a slice of an array __is not a copy, but a view of the original array. Hence, any operation on the slice will be reflected in the original array.__

In [48]:
slice_b = np.arange(20)
slice_b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [49]:
slice_b[10:15] = 7

In [50]:
slice_b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  7,  7,  7,  7,  7, 15, 16,
       17, 18, 19])
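
To see the difference from Python lists more directly, the following sketch stores a slice in its own variable and modifies it. The names `arr_slice` and `lst_slice` are illustrative; `np.shares_memory` is used only to confirm that the slice and the array use the same buffer.

```python
import numpy as np

arr = np.arange(10)
lst = list(range(10))

arr_slice = arr[2:5]   # a view onto arr's memory
lst_slice = lst[2:5]   # a new, independent list

arr_slice[0] = 99
lst_slice[0] = 99

print(arr[2])                            # 99, the original array changed
print(lst[2])                            # 2, the original list did not
print(np.shares_memory(arr, arr_slice))  # True
```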

To avoid the above scenario, we can use `copy()`.

In [51]:
slice_c = np.arange(20)
slice_c

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [52]:
copy = slice_c.copy()
copy[10:15] = 7

In [53]:
copy

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  7,  7,  7,  7,  7, 15, 16,
       17, 18, 19])

In [54]:
slice_c

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])


In [55]:
slice_multi_dim_a = np.array([[1,2,3],[4,5,6]])
slice_multi_dim_a

array([[1, 2, 3],
       [4, 5, 6]])

In [56]:
slice_multi_dim_a[:]

array([[1, 2, 3],
       [4, 5, 6]])

In [57]:
slice_multi_dim_a[:,0]

array([1, 4])

In [58]:
slice_multi_dim_a[:,0:2]

array([[1, 2],
       [4, 5]])

In [59]:
slice_multi_dim_b = np.array([[1,2,3],[4,5,6]])
slice_multi_dim_b

array([[1, 2, 3],
       [4, 5, 6]])

In [60]:
slice_multi_dim_b[0,:]

array([1, 2, 3])
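
In the cells above, the index before the comma selects rows and the index after it selects columns. A slightly larger sketch, using a hypothetical 3x3 `grid`, combines both:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(grid[1, 2])       # 6, the element at row 1, column 2
print(grid[:2, 1:])     # rows 0-1, columns 1-2, values [[2, 3], [5, 6]]
print(grid[::2, ::2])   # every other row and column, values [[1, 3], [7, 9]]
print(grid[:, -1])      # last column, values [3, 6, 9]
```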

#### Boolean Indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition.

In [61]:
bool_a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (bool_a > 2)

In [62]:
bool_idx

array([[False, False],
       [ True,  True],
       [ True,  True]])

In [63]:
bool_a[bool_idx]

array([3, 4, 5, 6])

In [64]:
bool_b = np.array(['Python', 'Java', 'Ruby', 'Rust'])
bool_b == 'Python'

array([ True, False, False, False])

To combine multiple boolean conditions, we use the element-wise operators **|** and **&**, not the Python keywords **or** and **and**.

In [65]:
(bool_b == 'Python') | (bool_b == 'Ruby')

array([ True, False,  True, False])
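
The sketch below shows `&` and `~` on a numeric array, and what happens when the Python keyword `and` is used instead. Each condition needs its own parentheses because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

in_range = (values > 2) & (values < 6)   # element-wise AND
print(values[in_range])                  # [3 4 5]
print(values[~in_range])                 # [1 2 6]  (~ is element-wise NOT)

try:
    (values > 2) and (values < 6)        # `and` needs a single truth value
except ValueError as err:
    print("ValueError:", err)
```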

#### Masking and Indexing

In [66]:
mask_array_a = np.linspace(5, 50, 24, dtype=int).reshape(4, -1)

In [67]:
np.info(np.linspace)

 linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None,
          axis=0)

Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

.. versionchanged:: 1.16.0
    Non-scalar `start` and `stop` are now supported.

.. versionchanged:: 1.20.0
    Values are rounded towards ``-inf`` instead of ``0`` when an
    integer ``dtype`` is specified. The old behavior can
    still be obtained with ``np.linspace(start, stop, num).astype(int)``

Parameters
----------
start : array_like
    The starting value of the sequence.
stop : array_like
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples

In [68]:
mask_array_a

array([[ 5,  6,  8, 10, 12, 14],
       [16, 18, 20, 22, 24, 26],
       [28, 30, 32, 34, 36, 38],
       [40, 42, 44, 46, 48, 50]])

In [69]:
mask = mask_array_a % 2 == 0

In [70]:
mask

array([[False,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]])

In [71]:
masked_array = mask_array_a[mask]
masked_array

array([ 6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
       40, 42, 44, 46, 48, 50])
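
Besides selecting elements, a mask can be used to replace them. The following sketch, with made-up variable names, shows `np.where` for building a new array and mask assignment for changing an array in place:

```python
import numpy as np

data = np.arange(1, 11)
even = data % 2 == 0

# np.where(condition, a, b) picks from a where the mask is True, else from b
halved = np.where(even, data // 2, data)
print(halved)          # [1 1 3 2 5 3 7 4 9 5]

# Assigning through a mask changes only the selected positions, in place
clipped = data.copy()
clipped[clipped > 7] = 7
print(clipped)         # [1 2 3 4 5 6 7 7 7 7]
```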

### Array Mathematics

#### Arithmetic Operations

In [72]:
x = np.array([1, 2, 3])
y = np.array([-1, -2, -3])

output_1 = np.add(x, y) #element wise addition of array x and y
output_2 = x + y
assert np.array_equal(output_1, output_2)
print(f'out1: {output_1} and out2: {output_2}')

out1: [0 0 0] and out2: [0 0 0]


In [73]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1])

output_1 = np.add(x, y) #adding y to each element of x
output_2 = x + y 
print(f'out1: {output_1} and out2: {output_2}')

out1: [2 3 4 5 6] and out2: [2 3 4 5 6]
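
Adding a one-element array to every element of `x` is an instance of what NumPy calls broadcasting: axes of length 1 are stretched to match the other operand. A minimal sketch with a column and a row (the names are illustrative):

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)

# The size-1 axes are stretched so both operands behave like shape (3, 4)
table = column + row
print(table.shape)   # (3, 4)
print(table[2])      # [21 22 23 24]
```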


*Note: Subtraction, multiplication and division can be carried out in the same fashion using `np.subtract`, `np.multiply` and `np.divide`.  
For floor division, `np.floor_divide` can be used.*
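
For completeness, here is a short sketch of those functions next to their operator equivalents:

```python
import numpy as np

x = np.array([7, 8, 9])
y = np.array([2, 2, 2])

print(np.subtract(x, y))      # [5 6 7], same as x - y
print(np.multiply(x, y))      # [14 16 18], same as x * y
print(np.divide(x, y))        # [3.5 4.  4.5], same as x / y
print(np.floor_divide(x, y))  # [3 4 4], same as x // y
```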

In [74]:
x = np.array([1, 2, .2])

output_1 = np.reciprocal(x)
output_2 = 1/x
assert np.array_equal(output_1, output_2)
print(f'out1: {output_1} and out2: {output_2}')

out1: [1.  0.5 5. ] and out2: [1.  0.5 5. ]


#### Trigonometric Operations

Calculation of sine, cosine and tangent, element-wise. Note that these functions interpret their input as radians, not degrees.

In [75]:
x = np.array([0., 1., 30, 90, 120, 150, 180])
print("sine:", np.sin(x))
print("cosine:", np.cos(x))
print("tangent:", np.tan(x))

sine: [ 0.          0.84147098 -0.98803162  0.89399666  0.58061118 -0.71487643
 -0.80115264]
cosine: [ 1.          0.54030231  0.15425145 -0.44807362  0.81418097  0.69925081
 -0.59846007]
tangent: [ 0.          1.55740772 -6.4053312  -1.99520041  0.71312301 -1.02234624
  1.33869021]
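
The values above are computed in radians, so `np.sin(30)` is the sine of 30 radians. If the inputs are meant as degrees, they need to be converted first. A small sketch, with comparisons done via `np.allclose` to absorb floating-point noise:

```python
import numpy as np

degrees = np.array([0., 30., 90., 180.])
radians = np.deg2rad(degrees)

expected_sin = np.array([0.0, 0.5, 1.0, 0.0])
expected_cos = np.array([1.0, np.sqrt(3) / 2, 0.0, -1.0])

print(np.allclose(np.sin(radians), expected_sin))  # True
print(np.allclose(np.cos(radians), expected_cos))  # True
```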


Calculation of inverse sine, inverse cosine, and inverse tangent, element-wise.

In [76]:
x = np.array([-1, 0, 1])
print("arcsin:", np.arcsin(x))
print("arccos:", np.arccos(x))
print("arctan:", np.arctan(x))

arcsin: [-1.57079633  0.          1.57079633]
arccos: [3.14159265 1.57079633 0.        ]
arctan: [-0.78539816  0.          0.78539816]


Convert angles from radians to degrees.

In [77]:
x = np.array([-np.pi, -np.pi/2, np.pi/2, np.pi])

output_1 = np.degrees(x)
output_2 = np.rad2deg(x)
assert np.array_equiv(output_1, output_2)
print(f'out1: {output_1} and out2: {output_2}')

out1: [-180.  -90.   90.  180.] and out2: [-180.  -90.   90.  180.]


Convert angles from degrees to radians.

In [78]:
x = np.array([-180.,  -90.,   90.,  180.])

output_1 = np.radians(x)
output_2 = np.deg2rad(x)
assert np.array_equiv(output_1, output_2)
print(f'out1: {output_1} and out2: {output_2}')

out1: [-3.14159265 -1.57079633  1.57079633  3.14159265] and out2: [-3.14159265 -1.57079633  1.57079633  3.14159265]


#### Statistics and Linear Algebra

In [79]:
x = np.array([[1, 2, 3, 4],[5, 6, 7, 8]])
x

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [80]:
x.T

array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])

In [81]:
sum_a = x.sum()
sum_a

36

In [82]:
sum_across_col = x.sum(axis=0)
sum_across_col

array([ 6,  8, 10, 12])

In [83]:
sum_across_row = x.sum(axis=1)
sum_across_row

array([10, 26])

In [84]:
mean_a = x.mean()
mean_a

4.5

In [85]:
mean_across_col = x.mean(axis=0)
mean_across_col

array([3., 4., 5., 6.])

In [86]:
mean_across_row = x.mean(axis=1)
mean_across_row

array([2.5, 6.5])
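
The `axis` argument works the same way for other reductions: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row. A brief sketch on the same `x`:

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

print(x.max(axis=0))                       # [5 6 7 8], one value per column
print(x.min(axis=1))                       # [1 5], one value per row
print(x.sum(axis=1, keepdims=True).shape)  # (2, 1), reduced axis kept with length 1
print(np.isclose(x.std(), np.sqrt(5.25)))  # True
```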

In [87]:
array_a = np.array([[1, 2, 3, 4, 5],[1, 3, 4, 6, 7]])
array_b = np.array([[2, 4, 6, 8, 10], [1, 3, 5, 7, 9]])

In [88]:
array_a

array([[1, 2, 3, 4, 5],
       [1, 3, 4, 6, 7]])

In [89]:
array_a * 3

array([[ 3,  6,  9, 12, 15],
       [ 3,  9, 12, 18, 21]])

In [90]:
array_a.dot(array_b.T)

array([[110,  95],
       [156, 135]])
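
The same matrix product can also be written with the `@` operator or with `np.matmul`. The sketch below checks that, for these two-dimensional arrays, all three spellings agree:

```python
import numpy as np

array_a = np.array([[1, 2, 3, 4, 5], [1, 3, 4, 6, 7]])
array_b = np.array([[2, 4, 6, 8, 10], [1, 3, 5, 7, 9]])

via_dot = array_a.dot(array_b.T)
via_operator = array_a @ array_b.T          # matrix-multiplication operator
via_matmul = np.matmul(array_a, array_b.T)  # function form of @

print(np.array_equal(via_dot, via_operator))  # True
print(np.array_equal(via_dot, via_matmul))    # True
```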

### Comparing the speed of NumPy vs Python loops and lists

In [9]:
from datetime import datetime
import numpy as np

In [10]:
def calculate_sum_from_numpy(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b
    return c

In [12]:
def calculate_sum_from_list_loops(n):
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return c

In [13]:
start = datetime.now()
c = calculate_sum_from_list_loops(100000)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("Python loop and list elapsed time (h:mm:ss.ffffff):", delta)

The last 2 elements of the sum [999950000799996, 999980000100000]
Python loop and list elapsed time (h:mm:ss.ffffff): 0:00:00.031631


In [14]:
start_numpy = datetime.now()
c = calculate_sum_from_numpy(100000)
delta_numpy = datetime.now() - start_numpy
print("The last 2 elements of the sum", c[-2:])
print("NumPy elapsed time (h:mm:ss.ffffff):", delta_numpy)

The last 2 elements of the sum [999950000799996 999980000100000]
NumPy elapsed time (h:mm:ss.ffffff): 0:00:00.004323
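
A single `datetime` measurement can be noisy, so one possible refinement is to repeat each call with the standard library's `timeit` module. The sketch below is an assumption-laden variant of the comparison above, not a rigorous benchmark: the function names, `n` and the number of runs are chosen for illustration, and the absolute timings will differ between machines.

```python
import timeit
import numpy as np

def sum_with_numpy(n):
    idx = np.arange(n, dtype=np.int64)   # int64 leaves room for i ** 3
    return idx ** 2 + idx ** 3

def sum_with_loop(n):
    return [i ** 2 + i ** 3 for i in range(n)]

n = 10_000
# Both versions should agree before their timings are compared
assert sum_with_numpy(n).tolist() == sum_with_loop(n)

loop_time = timeit.timeit(lambda: sum_with_loop(n), number=20)
numpy_time = timeit.timeit(lambda: sum_with_numpy(n), number=20)
print(f"loop:  {loop_time:.4f} s for 20 runs")
print(f"numpy: {numpy_time:.4f} s for 20 runs")
```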


In [2]:
import numpy as np

In [3]:
a = np.array([1,2,3,4,5])

In [7]:
b = a.astype('float')

In [8]:
b

array([1., 2., 3., 4., 5.])

In [5]:
dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__class_getitem__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__o