# NumPy

NumPy is the workhorse of numerical and scientific computing in Python. It is the de facto, go-to library for modern ML/DL tasks.

We will discuss the basic and most essential NumPy operations and usage for ML tasks and model building.

In [2]:
import numpy as np

In [2]:
print(np.__version__)

1.19.2


## Creating ndarray

In [6]:
# `ndarray` (n-dimensional array) is the most fundamental data structure in NumPy
arr = np.array([1, 2, 3, 4, 5]) # created from a Python list of ints
print(arr) # looks like a Python list, but note that there are no commas when printed
print(type(arr))  
print(arr.dtype)  

[1 2 3 4 5]
<class 'numpy.ndarray'>
int64


In [11]:
# As shown above, NumPy arrays are of type `ndarray`,
# which requires all elements to share one data type (int64 above).
# If we mix data types, NumPy 'upcasts' them to a common type.
# Created from a Python list of mixed types; every element is converted to a string
arr = np.array([1, 2, 'hello', 4.0, 5]) 
print(arr)
print(arr.dtype)

['1' '2' 'hello' '4.0' '5']
<U21


In [13]:
# another example of upcast
arr = np.array([1, 2, 3, 4.0, 5]) # All int except one float
print(arr)
print(arr.dtype)

[1. 2. 3. 4. 5.]
float64
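
The two upcasting examples above can be checked without building full arrays. The following is a minimal sketch, not part of the original notebook: it uses `np.result_type` to ask which dtype NumPy would pick, and `dtype=object` as one way to keep mixed Python values unconverted. The variable names are illustrative only.

```python
import numpy as np

# np.result_type reports the common dtype NumPy would choose for these inputs
int_and_float = np.result_type(np.int64, np.float64)
print(int_and_float)

# Mixing numbers and text falls back to a Unicode string dtype (kind 'U')
mixed = np.array([1, 2.5, 'hello'])
print(mixed.dtype.kind)

# dtype=object stores the original Python objects instead of converting them
as_objects = np.array([1, 2.5, 'hello'], dtype=object)
print([type(x).__name__ for x in as_objects])
```

Object arrays keep the original types but lose most of NumPy's speed advantage, so they are usually a last resort.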


In [12]:
# explicitly set the data type of the resulting array
arr = np.array([1, 2, 3, 4, 5], dtype='float32') # created from a Python list with an explicit float type
print(arr)
print(type(arr))  
print(arr.dtype)

[1. 2. 3. 4. 5.]
<class 'numpy.ndarray'>
float32


In [16]:
# A nested list creates a multi-dimensional array
md_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(md_arr)
print(md_arr.ndim)  # number of dimensions
print(md_arr.shape) # number of elements (size) along each dimension of the array

[[1 2 3]
 [4 5 6]
 [7 8 9]]
2
(3, 3)


In [17]:
# quick `zeros`-initialised array of the desired shape, useful as a placeholder
z_arr = np.zeros(10, dtype=int)
print(z_arr)

[0 0 0 0 0 0 0 0 0 0]


In [21]:
# quick `ones`-initialised array of the desired shape, useful as a placeholder
o_arr = np.ones((5, 2), dtype=float)
print(o_arr)
print(o_arr.dtype)

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
float64


In [22]:
# quick array of the desired shape, initialised with a user-defined value
f_arr = np.full((2, 4), 2.333)
print(f_arr)

[[2.333 2.333 2.333 2.333]
 [2.333 2.333 2.333 2.333]]


In [32]:
# quick `uninitialised` empty array of the desired shape
# (its values are whatever was already in memory, so they can differ between runs)
e_arr = np.empty((2, 5))
print(e_arr)

[[5.e-324 5.e-324 5.e-324 5.e-324 5.e-324]
 [5.e-324 5.e-324 5.e-324 5.e-324 5.e-324]]


In [34]:
# quick `identity matrix` (all zeros except the diagonal)
i_arr = np.eye(4) # only one dimension is needed here because an identity matrix is square
print(i_arr)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [24]:
# quick array using a range
r_arr = np.arange(0, 20, 3)  # looks similar to Python's range, but it is not lazy: it actually creates the array
print(r_arr)

[ 0  3  6  9 12 15 18]


In [27]:
# create an array of values evenly spaced between the start and end values (both included)
ls_arr = np.linspace(0, 10, 20)
print(ls_arr)
print(ls_arr[1] - ls_arr[0])  # even spacing between numbers
print(ls_arr[5] - ls_arr[4])

[ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]
0.5263157894736842
0.5263157894736841
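
The spacing printed above follows from the formula `(stop - start) / (num - 1)`, since `linspace` includes both endpoints by default. A minimal sketch, assuming the same arguments as the cell above; `retstep=True` asks `linspace` to also return the step it used.

```python
import numpy as np

start, stop, num = 0, 10, 20

# retstep=True returns the array together with the spacing between samples
values, step = np.linspace(start, stop, num, retstep=True)

# With both endpoints included there are num - 1 gaps between num points
expected_step = (stop - start) / (num - 1)
print(step, expected_step)

# Every consecutive difference matches the step up to floating-point rounding
print(np.allclose(np.diff(values), step))
```

The tiny difference in the last digit of the two spacings printed above is floating-point rounding, not uneven spacing.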


In [39]:
# Create an array of random values drawn from a uniform distribution over [0, 1): every value in that range is equally likely
r_array = np.random.random((3, 4))  # the range is implicit: 0 is included, 1 is excluded
print(r_array)

[[0.8607811  0.37286783 0.39088372 0.3822406 ]
 [0.24945776 0.02530805 0.8052618  0.12220215]
 [0.74086759 0.67743261 0.80215903 0.25980809]]
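
The properties of the uniform distribution can be checked empirically. The sketch below is an illustration rather than part of the original notebook: it uses the newer `np.random.default_rng` generator with an arbitrarily chosen seed so the run is reproducible, and a large sample so the statistics settle near their theoretical values (mean 1/2, variance 1/12).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility
samples = rng.random(1_000_000)  # uniform over [0, 1)

# All samples fall inside the half-open interval [0, 1)
print(samples.min() >= 0.0, samples.max() < 1.0)

# Theoretical mean is 0.5 and theoretical variance is 1/12 (about 0.0833)
print(samples.mean())
print(samples.var())
```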


In [43]:
# Create an array of random values drawn from a normal (Gaussian) distribution; the values are not limited to [0, 1]
n_array = np.random.normal(0, 3, (3, 4))  # mean (loc) 0, standard deviation (scale) 3, output shape (3, 4)
print(n_array)
n_array.mean()

[[ 0.07901206  4.56892336  2.296798    4.84769034]
 [ 1.11590889  2.72336096 -0.42421936  2.88766413]
 [-2.25109723  0.6815135   1.6462751   2.90453473]]


1.7563637052977878
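
The sample mean printed above is far from 0 because only 12 values were drawn. A minimal sketch, assuming the same `loc=0` and `scale=3` as the cell above and an arbitrarily chosen seed, shows the sample statistics approaching those parameters as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility

# A small sample can have a mean and standard deviation far from loc and scale
small = rng.normal(loc=0, scale=3, size=(3, 4))
print(small.mean(), small.std())

# A large sample lands close to mean 0 and standard deviation 3
large = rng.normal(loc=0, scale=3, size=1_000_000)
print(large.mean(), large.std())
```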

In [44]:
# Create an array of random integers from 0 (inclusive) to 10 (exclusive)
ri_arr = np.random.randint(0, 10, (2, 4))
print(ri_arr)

[[3 8 3 5]
 [0 8 3 4]]


## Vectorized Operations

NumPy supports vectorized operations. In a nutshell, this means we avoid explicit loops in Python; the loops run in compiled C code instead. This gives a massive performance gain and is one of the primary reasons NumPy is so popular: write in Python (enjoy!) but get C-like computation performance.
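
To make the idea concrete, here is a small sketch (the array and the expression `x**2 + 1` are chosen only for illustration) that computes the same result twice: once with an explicit Python loop and once with a single vectorized expression.

```python
import numpy as np

values = np.arange(5, dtype=float)

# Explicit Python loop: one interpreter-level iteration per element
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2 + 1

# Vectorized: the same arithmetic expressed once, with the loop run inside NumPy
squared_vec = values ** 2 + 1

print(squared_vec)
print(np.array_equal(squared_loop, squared_vec))
```

Both versions give identical results; the vectorized form is shorter and, on large arrays, much faster.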

### Performance check

In [8]:
# big array
big_array = np.random.rand(1000000)

In [10]:
%%timeit
# Let's try to sum using Python's built-in sum function
sum(big_array)  # it took 258 milliseconds

258 ms ± 20.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
%%timeit
# Let's try to sum using a Python loop
# (note: this loop adds the integers produced by range, not the elements of big_array)
n = 0
for i in range(1000000):  # it took 81.9 milliseconds, about 3x faster than sum
    n += i

81.9 ms ± 2.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [12]:
%%timeit 
# Let's try to sum using NumPy's sum function
np.sum(big_array) # 585 microseconds, about 140x faster than the Python loop and about 440x faster than the built-in sum

585 µs ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
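
Outside a notebook, a like-for-like comparison can be sketched with the standard-library `timeit` module. This is an illustration under stated assumptions: `loop_sum` is a hypothetical helper that iterates over the array elements themselves, the array is smaller than the one above to keep the run short, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

big_array = np.random.rand(100_000)  # smaller than above to keep the run short


def loop_sum(arr):
    """Sum the elements of arr with an explicit Python loop."""
    total = 0.0
    for x in arr:
        total += x
    return total


# Time each approach over the same data, 5 repetitions each
t_loop = timeit.timeit(lambda: loop_sum(big_array), number=5)
t_builtin = timeit.timeit(lambda: sum(big_array), number=5)
t_numpy = timeit.timeit(lambda: np.sum(big_array), number=5)

print(f"python loop : {t_loop:.4f} s")
print(f"built-in sum: {t_builtin:.4f} s")
print(f"np.sum      : {t_numpy:.4f} s")

# All three approaches agree on the result up to floating-point rounding
print(np.isclose(loop_sum(big_array), np.sum(big_array)))
```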


### Operations and functions

All functions and operations are vectorized.

In [15]:
# random floats between 0 and 1
a = np.random.random((3, 4))
b = np.random.random((3, 4))
print(a)
print()
print(b)

[[0.64069855 0.07055229 0.02104335 0.84929808]
 [0.831861   0.33351103 0.22312071 0.90730051]
 [0.22165777 0.26176534 0.90284218 0.57197157]]

[[0.52710636 0.85154614 0.10975757 0.34636116]
 [0.53343638 0.47743996 0.49190189 0.22230756]
 [0.96606617 0.63587878 0.11569593 0.89204096]]


In [17]:
# we can apply +, -, *, / with a scalar in one shot, element-wise (no loops)
print(a * 2)
print()
print(b + 10)

[[1.28139709 0.14110457 0.0420867  1.69859617]
 [1.663722   0.66702205 0.44624143 1.81460101]
 [0.44331554 0.52353069 1.80568436 1.14394314]]

[[10.52710636 10.85154614 10.10975757 10.34636116]
 [10.53343638 10.47743996 10.49190189 10.22230756]
 [10.96606617 10.63587878 10.11569593 10.89204096]]


In [19]:
# operate on two arrays with +, -, *, /, element-wise (no loops)
print(a - b)
print()
print(a / b)

[[ 0.11359218 -0.78099386 -0.08871423  0.50293692]
 [ 0.29842462 -0.14392893 -0.26878117  0.68499295]
 [-0.7444084  -0.37411344  0.78714626 -0.32006939]]

[[1.21550145 0.08285198 0.1917257  2.45205921]
 [1.55943808 0.69854025 0.45358784 4.08128495]
 [0.22944367 0.41165919 7.80357795 0.64119429]]


In [21]:
# or apply any standard math function, element-wise (no loops)
print(np.sin(a)) # trig functions are all available in the np module
print()
print(np.cos(b))
print()
print(np.log(a)) # log function
print()
print(np.exp(b)) # exponential function 

[[0.5977556  0.07049377 0.02104179 0.75081697]
 [0.73918604 0.3273626  0.22127405 0.78784406]
 [0.21984714 0.25878616 0.78509047 0.54129087]]

[[0.86426629 0.65882077 0.99398268 0.94061424]
 [0.86106478 0.88817419 0.88143619 0.97539127]
 [0.56854011 0.80455011 0.99331469 0.62782474]]

[[-0.44519622 -2.6514012  -3.86117079 -0.16334506]
 [-0.18408992 -1.09807935 -1.50004234 -0.09728156]
 [-1.50662067 -1.34030681 -0.10220751 -0.55866599]]

[[1.69402332 2.34326708 1.11600749 1.41391318]
 [1.70478053 1.61194248 1.63542365 1.24895545]
 [2.62758763 1.88868115 1.12265445 2.44010473]]


In [23]:
# Aggregate functions - along any dimension

print(np.sum(a, axis=0)) # sum down the rows: one total per column
print()
print(np.sum(a, axis=1)) # sum across the columns: one total per row

[1.69421732 0.66582866 1.14700624 2.32857016]

[1.58159226 2.29579325 1.95823686]
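
The `axis` argument is easier to follow on a small array with known values. A minimal sketch, using a hypothetical 2x3 array `m`: the axis you name is the one that gets collapsed, and `keepdims=True` keeps it as a length-1 dimension.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, leaving one total per column
col_sums = m.sum(axis=0)
print(col_sums, col_sums.shape)

# axis=1 collapses the columns, leaving one total per row
row_sums = m.sum(axis=1)
print(row_sums, row_sums.shape)

# keepdims=True keeps the collapsed axis with length 1
row_sums_2d = m.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)
```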


In [24]:
print(np.max(a, axis=0)) # max down the rows: one maximum per column
print()
print(np.min(a, axis=1)) # min across the columns: one minimum per row

# mean, std, var and many more work similarly

[0.831861   0.33351103 0.90284218 0.90730051]

[0.02104335 0.22312071 0.22165777]


In [30]:
# argmax and argmin
# very useful functions that we encounter regularly in ML
# In mathematics, max gives the largest value a function produces (a value from its codomain),
# whereas argmax gives the value from the function's domain (the valid inputs to the function) that
# maximizes the function's output. argmin is the counterpart for the minimum.
# NumPy implementation: returns the 'index' of the maximum (or minimum) 'value' of the supplied array

a = np.random.randint(1, 100, 10)
print(a)
print(np.max(a))
print(np.argmax(a))
print(np.min(a))
print(np.argmin(a))

[80 14 86 47 39 22 76 30 37 81]
86
2
14
1
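
On multi-dimensional arrays, `argmax` without an `axis` returns an index into the flattened array. The sketch below uses a made-up 2x3 `scores` array, standing in for per-class scores of two samples, to show how to recover the row and column with `np.unravel_index` and how to get one index per row with `axis=1`.

```python
import numpy as np

# Hypothetical scores: 2 samples (rows) by 3 classes (columns)
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])

# Without axis, argmax indexes into the flattened array
flat_index = np.argmax(scores)
print(flat_index)

# Convert the flat index back into (row, column) coordinates
row, col = np.unravel_index(flat_index, scores.shape)
print(row, col)

# With axis=1, argmax returns the best column for each row
predicted = np.argmax(scores, axis=1)
print(predicted)
```

Taking `argmax` along the class axis is a common way to turn a row of scores into a predicted class label.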


In [None]:
# TODO
# important operations (reshape, size, expand_dims, squeeze)
# Broadcasting
# Indexing and slicing
# Boolean operations: compare, filter
# Linear algebra (dot, norm, eigenvectors)