## NumPy 

NumPy (https://numpy.org/doc/stable) is a library that extends the base capabilities of Python with a richer set of data types, including more numeric types, vectors, matrices and many matrix functions. NumPy and Python work together fairly seamlessly: Python arithmetic operators work on NumPy data types, and many NumPy functions accept Python data types.

* NumPy's basic data type is the `ndarray`: an indexable, n-dimensional array whose elements all have the same type (inspect the element type with the `dtype` attribute).
* A vector, in this context, is a 1-D array: an ordered list of numbers. Mathematically, the number of elements is the vector's dimension; in NumPy, however, "dimension" (`ndim`, historically also called "rank") counts axes, so a 4-element vector has `ndim == 1` and `shape == (4,)`.
* Some NumPy routines for creating vectors take the number of elements as an argument, while others take a shape tuple.
* NumPy provides a very complete set of indexing and slicing capabilities for accessing vector elements.
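The `ndarray` attributes `shape`, `ndim`, `size` and `dtype` expose these properties directly. The sketch below (arrays chosen arbitrarily for illustration) also contrasts `np.random.rand`, which takes one integer per axis, with `np.random.random_sample`, which takes a shape tuple:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0])  # a vector with 4 elements
print(v.shape)  # (4,)     one axis of length 4
print(v.ndim)   # 1        number of axes, not number of elements
print(v.size)   # 4        total number of elements
print(v.dtype)  # float64  type shared by every element

# np.zeros accepts either an element count or a shape tuple
print(np.zeros(4).shape, np.zeros((4,)).shape)  # (4,) (4,)

# np.random.rand takes one integer per axis; np.random.random_sample takes a shape tuple
r1 = np.random.rand(2, 3)
r2 = np.random.random_sample((2, 3))
print(r1.shape, r1.ndim, r1.size)  # (2, 3) 2 6
print(r2.shape, r2.ndim, r2.size)  # (2, 3) 2 6
```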

```python
import numpy as np

# NumPy routines that allocate memory and fill arrays with values.
# Except for the empty np.zeros(0), all examples have shape (4,),
# i.e. a 1-D array with 4 elements, for comparison.

v = np.zeros(0)  # empty array, shape (0,)
print("np.zeros(0) =", v, v.shape, v.dtype, "\n")
v = np.zeros((4,))  # shape given as a tuple
print("np.zeros((4,)) =", v, v.shape, v.dtype, "\n")
v = np.random.random_sample((4,))  # uniform distribution over [0, 1), shape tuple
print("np.random.random_sample((4,)) =", v, v.shape, v.dtype, "\n")
v = np.random.rand(4)  # uniform distribution over [0, 1), number of elements
print("np.random.rand(4) =", v, v.shape, v.dtype, "\n")
v = np.arange(4.)  # evenly spaced values 0., 1., 2., 3.; the float argument gives float64
print("np.arange(4.) =", v, v.shape, v.dtype, "\n")
v = np.arange(0., 4, 1)  # start [optional, default = 0], stop (exclusive), step [optional, default = 1]
print("np.arange(0., 4, 1) =", v, v.shape, v.dtype, "\n")
v = np.array([1, 2, 3, 4])  # default integer dtype is platform-dependent (int32 on Windows before NumPy 2.0, int64 on most other platforms)
print("np.array([1, 2, 3, 4]) =", v, v.shape, v.dtype, "\n")
v = np.array([1., 2., 3, 4])  # mixing floats and ints upcasts every element to float64
print("np.array([1., 2., 3, 4]) =", v, v.shape, v.dtype, "\n")

# Indexing refers to one element by position; slicing refers to a subset selected by a range of indices.
# An index must lie within the vector's range, otherwise IndexError is raised.

a = np.arange(10)
print("a =", a, "\n",
      "a[2] =", a[2], "\n",
      "a[-1] =", a[-1], "\n",        # last element
      "a[2:7:1] =", a[2:7:1], "\n",  # start:stop:step (stop is exclusive)
      "a[2:7:2] =", a[2:7:2], "\n",  # start:stop:step
      "a[2:] =", a[2:], "\n",        # all elements from index 2 onward
      "a[:2] =", a[:2], "\n",        # all elements before index 2
      "a[:] =", a[:], "\n",          # all elements
     )
```

```
np.zeros(0) = [] (0,) float64 

np.zeros((4,)) = [0. 0. 0. 0.] (4,) float64 

np.random.random_sample((4,)) = [0.75670966 0.51098636 0.82451066 0.11226307] (4,) float64 

np.random.rand(4) = [0.85153783 0.87392441 0.84253932 0.99492498] (4,) float64 

np.arange(4.) = [0. 1. 2. 3.] (4,) float64 

np.arange(0., 4, 1) = [0. 1. 2. 3.] (4,) float64 

np.array([1, 2, 3, 4]) = [1 2 3 4] (4,) int32 

np.array([1., 2., 3, 4]) = [1. 2. 3. 4.] (4,) float64 

a = [0 1 2 3 4 5 6 7 8 9] 
 a[2] = 2 
 a[-1] = 9 
 a[2:7:1] = [2 3 4 5 6] 
 a[2:7:2] = [2 4 6] 
 a[2:] = [2 3 4 5 6 7 8 9] 
 a[:2] = [0 1] 
 a[:] = [0 1 2 3 4 5 6 7 8 9] 
```
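An index must lie within the vector's range, but slices behave differently at the boundaries. The sketch below (variable names are illustrative) shows both cases, plus one behaviour that often surprises newcomers: a basic slice is a view that shares memory with the original array, so writing to the slice changes the original unless `.copy()` is used.

```python
import numpy as np

a = np.arange(10)

# Indexing outside the valid range [-10, 9] raises IndexError
try:
    a[10]
except IndexError as err:
    print("IndexError:", err)

# Slicing is more forgiving: out-of-range bounds are clipped, not an error
print(a[8:20])  # [8 9]
print(a[20:])   # []

# A basic slice is a view on the same data, so writing to it changes `a`
s = a[2:5]
s[0] = 100
print(a)        # [  0   1 100   3   4   5   6   7   8   9]

# .copy() returns an independent array; changing it leaves `a` untouched
c = a[2:5].copy()
c[0] = -1
print(a[2])     # 100
```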



## Vector operations

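* NumPy arrays store elements of a single, fixed-size type in one block of memory rather than as separate Python objects, which makes them more memory-efficient than Python lists. Vectorization also provides a large speed-up: the loops run in compiled code instead of the Python interpreter, and NumPy makes better use of the parallelism available in the underlying hardware. GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, which apply one instruction to several data elements in parallel. This has proven critical in ML, where datasets are often very large.
* Most Python arithmetic, logical and comparison operators also apply to NumPy vectors and work element-wise (on an element-by-element basis). An operation between a vector and a scalar applies the scalar to every element; an operation between two vectors requires them to have the same shape (or, more generally, shapes that are compatible under broadcasting).
* The dot product of two vectors $a$ and $b$, each with $n$ elements, returns a scalar value: $x = \sum_{i=0}^{n-1} a_i b_i$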
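The dot product formula can be checked against NumPy with an explicit loop. The helper `loop_dot` is a hypothetical name used only for this sketch; it also gives a rough feel for the vectorization speed-up described above. Timings depend on the hardware, the NumPy build and the vector size, so treat them as indicative only.

```python
import time

import numpy as np


def loop_dot(a, b):
    """Dot product with an explicit Python loop: x = sum of a[i] * b[i]."""
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")
    x = 0
    for i in range(a.shape[0]):
        x += a[i] * b[i]  # accumulate a_i * b_i
    return x


a = np.arange(10)
b = np.arange(10, 20, 1)
print(loop_dot(a, b), np.dot(a, b))  # 735 735

# Rough timing on larger vectors; absolute numbers depend on the machine
rng = np.random.default_rng(0)
big_a = rng.random(100_000)
big_b = rng.random(100_000)

start = time.perf_counter()
loop_result = loop_dot(big_a, big_b)
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_result = np.dot(big_a, big_b)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, np.dot: {vec_time:.6f} s")
# Results can differ in the last bits because the summation order differs
print(np.isclose(loop_result, vec_result))  # True
```

On most machines the vectorized `np.dot` call is expected to be much faster than the Python loop, but the exact ratio varies.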

```python
a = np.arange(10)
print("a =", a, "\n",
      "-a =", -a, "\n",
      "np.sum(a) =", np.sum(a), "\n",    # sum of all elements
      "np.mean(a) =", np.mean(a), "\n",  # arithmetic mean
      "a**2 =", a**2, "\n",              # element-wise square
      "a*2 =", a*2, "\n",                # scalar applied to every element
      "a-2 =", a-2, "\n",
     )

b = np.arange(10, 20, 1)
print("b =", b, "\n",
      "a + b =", a + b, "\n",            # element-wise addition (same shape)
      "a.dot(b) =", a.dot(b), "\n",      # dot product of two vectors
      "np.dot(a, b) =", np.dot(a, b), "\n",
     )
```

```
a = [0 1 2 3 4 5 6 7 8 9] 
 -a = [ 0 -1 -2 -3 -4 -5 -6 -7 -8 -9] 
 np.sum(a) = 45 
 np.mean(a) = 4.5 
 a**2 = [ 0  1  4  9 16 25 36 49 64 81] 
 a*2 = [ 0  2  4  6  8 10 12 14 16 18] 
 a-2 = [-2 -1  0  1  2  3  4  5  6  7] 

b = [10 11 12 13 14 15 16 17 18 19] 
 a + b = [10 12 14 16 18 20 22 24 26 28] 
 a.dot(b) = 735 
 np.dot(a, b) = 735 
```



## Broadcasting

> The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation. 
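A brief sketch of the rule NumPy applies: shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1 (a missing leading dimension is treated as 1). The arrays below are arbitrary examples.

```python
import numpy as np

v = np.arange(4)     # shape (4,)
print(v * 2)         # [0 2 4 6]  the scalar is broadcast to every element

col = np.array([[0], [10], [20]])  # shape (3, 1)
grid = col + v                     # (3, 1) and (4,) broadcast to (3, 4)
print(grid.shape)    # (3, 4)
print(grid)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Incompatible trailing dimensions (3 vs 4) raise an error instead of broadcasting
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("ValueError:", err)
```

Note that `col + v` never materializes copies of `col` or `v` to match the (3, 4) shape; only the result array is allocated.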