# **Module 2: Python Fundamentals**
### **Notebook 3**: NumPy





## **Learning Outcomes**

In this notebook, we discuss the fundamental data structures of scientific computing (vectors, matrices, and multidimensional arrays) using the NumPy library. Topics include:

- Importing the library
- Creating arrays
- Accessing/Slicing arrays
- Basic operations on arrays
- Array reshaping and resizing
- Basic math operations
- Basic aggregation functions
- Working with random numbers in NumPy arrays
- Matrix and vector operations



## **NumPy**

NumPy is a fundamental external library for scientific computing in Python. It performs mathematical, statistical, and data-manipulation operations on arrays much faster than equivalent code on regular Python objects such as lists. This difference matters when you are performing large calculations.


## **Importing NumPy**

In [1]:
import numpy as np # np is the conventional alias for numpy

## **Arrays**

- The central data structure of NumPy is the array object class. 

- Arrays are similar to lists in basic Python, except that every element of an array must be of the same type, typically a numeric type like float or int. 

- Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists.

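The single-type rule above can be seen by building an array from a list that mixes integers and floats. This is a small illustrative sketch; the names mixed_list, arr, and ints are made up for the example.

```python
import numpy as np

# A list may mix types freely; an array stores a single dtype.
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)      # the integers are upcast to float

print(arr.dtype)                # float64
print(arr.tolist())             # [1.0, 2.5, 3.0]

# An all-integer list gives an integer array
# (the exact width, e.g. int64, depends on the platform).
ints = np.array([1, 2, 3])
print(ints.dtype.kind)          # i
```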

## **Creating arrays**



In [2]:
a = np.array([1, 4, 5, 8]) # using list
a

array([1, 4, 5, 8])

In [3]:
type(a)

numpy.ndarray

In [4]:
a = np.array([1, 4, 5, 8], float)
a

array([1., 4., 5., 8.])

In [5]:
a.ndim # number of dimensions

1

## **Multi dimensional arrays**

Arrays can be multidimensional. Unlike nested lists, which need one pair of brackets per level (a[0][1]), the axes of an array are indexed with commas inside a single pair of brackets (a[0, 1]). Here is an example with a two-dimensional array (e.g., a matrix):



In [6]:
a = np.array([[1, 2, 3], [4, 5, 6]], dtype = float)                  
a

array([[1., 2., 3.],
       [4., 5., 6.]])

In [7]:
a.ndim

2

## **Accessing/slicing elements of arrays**

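Elements of a one-dimensional array are accessed and sliced just like elements of a list. Unlike a list slice, an array slice is a view of the original data, so modifying the slice also modifies the original array.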

In [8]:
a = np.array([1, 4, 5, 8], float)
a

array([1., 4., 5., 8.])

In [9]:
a[2]

5.0

In [10]:
a[:3]

array([1., 4., 5.])

Array slicing works with multiple dimensions in the same way as usual, applying each slice
specification as a filter to a specified dimension. Use of a single ":" in a dimension indicates the
use of everything along that dimension.

In [14]:
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype = float)                  
b

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [15]:
b[0,0] # row 0, column 0; a nested list would require b[0][0]

1.0

In [16]:
b[1,0:3]

array([4., 5., 6.])

In [17]:
b[1,:3]

array([4., 5., 6.])

In [21]:
#slicing the array along the first axis (rows), starting from index 1 and including all rows after index 1.
b[1:] 

array([[4., 5., 6.],
       [7., 8., 9.]])

In [22]:
b[1,:]

array([4., 5., 6.])

In [23]:
b[0]

array([1., 2., 3.])

In [24]:
b[0,1:3]

array([2., 3.])

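A few more slicing patterns on the same 3 x 3 values may be useful, in particular selecting a column, which the cells above do not show. This is a sketch; the names grid, second_column, last_row, corners, and block are chosen only for this example.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

second_column = grid[:, 1]    # all rows, column 1  -> 2, 5, 8
last_row = grid[-1]           # negative indices count from the end -> 7, 8, 9
corners = grid[::2, ::2]      # every second row and column -> [[1, 3], [7, 9]]
block = grid[:2, 1:]          # rows 0-1, columns 1-2 -> [[2, 3], [5, 6]]

print(second_column)
print(last_row)
print(corners)
print(block)
```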
##  **Alternative ways of creating arrays**


Summary of some basic NumPy functions for generating arrays:

**Function Name**   | **Type of Array**
--------------------|--------------------------
**np.array()**          |  Creates an array whose elements are given by an array-like structure (such as a list or tuple)
**np.zeros()**          |  Creates an array with zeros as entries
**np.ones()**           |  Creates an array with ones as entries
**np.diag()**           |  Creates a diagonal array with specified values along the diagonal and zeros elsewhere
**np.arange()**         |  Creates an array with evenly spaced values from a start value up to (but not including) an end value, using a specified increment
**np.linspace()**       |  Creates an array with evenly spaced values between specified start and end values, using a specified number of elements (the end value is included by default)
**np.logspace()**       |  Creates an array with logarithmically spaced values between the specified start and end exponents (base 10 by default)
**np.meshgrid()**       |  Generates coordinate matrices (and higher-dimensional coordinate arrays) from one-dimensional coordinate vectors
**np.random.rand()**    |  Generates an array with random numbers that are uniformly distributed in the half-open interval [0, 1)



In [25]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [26]:
3*np.ones((2,4))

array([[3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [29]:
#np.arange(0, 10, 1)
np.arange(0.0, 10.0, 2) # start, end (not included), and step size

array([0., 2., 4., 6., 8.])

In [30]:
np.linspace(0.0, 10.0, 11) #third argument is number of elements

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [31]:
np.logspace(0, 2, 5)  # 5 data points in log scale between 10**0=1 to 10**2=100

array([  1.        ,   3.16227766,  10.        ,  31.6227766 ,
       100.        ])

In [32]:
x = np.array([-1, 0, 1])
y = np.array([-2, 0, 2])
X, Y = np.meshgrid(x, y)
#help(np.meshgrid)

In [33]:
X

array([[-1,  0,  1],
       [-1,  0,  1],
       [-1,  0,  1]])

In [34]:
Y

array([[-2, -2, -2],
       [ 0,  0,  0],
       [ 2,  2,  2]])

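A typical use of these coordinate matrices is to evaluate a function of two variables on every grid point at once. The following is a minimal sketch; the function x**2 + y**2 and the name Z are chosen only for illustration.

```python
import numpy as np

x = np.array([-1, 0, 1])
y = np.array([-2, 0, 2])
X, Y = np.meshgrid(x, y)

# Z[i, j] is the function value at the point (x[j], y[i]).
Z = X**2 + Y**2
print(Z)
# Rows correspond to y = -2, 0, 2:
# [[5, 4, 5],
#  [1, 0, 1],
#  [5, 4, 5]]
```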
## **Creating arrays with properties of other arrays**

In [35]:
x = np.array([3,4,6])
z = np.ones_like(x)
z

array([1, 1, 1])

In [36]:
a = np.array([[1, 2, 3], [4, 5, 6]], float)
np.zeros_like(a)

array([[0., 0., 0.],
       [0., 0., 0.]])

## **Creating matrices (2D arrays)**


In [37]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [38]:
np.eye(3, k = 0)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [39]:
np.eye(3, k = 1)

array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

In [40]:
np.eye(3, k = -1)

array([[0., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.]])

In [41]:
np.diag(np.arange(0, 40, 5))

array([[ 0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  5,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 10,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 15,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 20,  0,  0,  0],
       [ 0,  0,  0,  0,  0, 25,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 30,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 35]])

## **Some basic operations on arrays**

 Assume a is an array.
 
- **a.shape**: Array dimensions (the length of each axis)
- **len(a)**: Length of the first axis (the number of rows of a 2D array)
- **a.ndim**: Number of array dimensions
- **a.size**: Number of array elements
- **a.dtype**: Data type of array elements
- **a.astype(int)**: Converts the elements to integers and returns a new array; other types, such as str, can be passed as well


In [42]:
a = np.array([[4,5,6], [4,5.1,6]])
a
#len(a)
#a.shape
#a.ndim
#a.dtype
#a.size

array([[4. , 5. , 6. ],
       [4. , 5.1, 6. ]])

In [43]:
a.shape

(2, 3)

In [44]:
a.ndim

2

In [45]:
a.dtype

dtype('float64')

In [46]:
a.size

6


## **Array reshaping and resizing**

We can re-organize given data by reshaping and resizing. The following are some of the methods to accomplish these tasks.

**Function Name**     | **Description**
----------------------|--------------------------
**np.reshape()**, **np.ndarray.reshape()**        | Reshapes an N-dimensional array without changing its data. Returns a view when possible.
**np.ndarray.flatten()**     | Creates a one-dimensional copy of an N-dimensional array by collapsing all dimensions into one.
**np.ravel()**, **np.ndarray.ravel()**          | Returns the N-dimensional array interpreted as one-dimensional. The result is a view when possible, otherwise a copy.
**np.newaxis**          | Index object that adds a new axis (dimension) when used inside brackets
**np.transpose()**, **np.ndarray.T**       | Transposes the array
**np.vstack()**      | Stacks a list of arrays vertically (along axis = 0)
**np.hstack()**      | Stacks a list of arrays horizontally (along axis = 1 for 2D arrays)
**np.dstack()**      | Stacks a list of arrays depth-wise (along axis = 2)
**np.concatenate()** | Creates a new array by appending arrays after each other, along a given axis
**np.resize()**      | Creates a new copy of the original array with the requested size, repeating the elements if the new size is larger.
**np.append()**      | Appends elements to an array. Creates a new copy of the array.
**np.insert()**      | Inserts new elements at a given position. Creates a new copy of the array.
**np.delete()**      | Deletes elements from a given position. Creates a new copy of the array.

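Several functions in the table are not exercised in the cells of this notebook. The sketch below tries a few of them on a small vector; the variable names are illustrative only. It also shows that none of these calls modifies the original array.

```python
import numpy as np

v = np.array([1, 2, 3])

col = v[:, np.newaxis]            # shape (3,) becomes (3, 1)
print(col.shape)                  # (3, 1)

m = np.arange(6).reshape(2, 3)
print(m.T.shape)                  # (3, 2)

appended = np.append(v, 4)        # 1, 2, 3, 4
inserted = np.insert(v, 1, 10)    # 1, 10, 2, 3  (10 placed before index 1)
deleted = np.delete(v, 0)         # 2, 3         (element at index 0 removed)
resized = np.resize(v, (2, 3))    # v repeated to fill a 2 x 3 array

print(appended, inserted, deleted)
print(resized)
print(v)                          # still 1, 2, 3: each call returned a new array
```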


In [47]:
a = np.arange(0,10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
a.reshape((2, 5)) # returns a reshaped view of the data; the shape of a itself is unchanged


array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [49]:
b = a.reshape((2, 5)) #the reshaped array is assigned to b
b

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [50]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
a.shape

(10,)

In [52]:
b

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [53]:
b.shape

(2, 5)

In [54]:
b.flatten() # returns a one-dimensional copy of a multidimensional array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [55]:
b

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [56]:
b.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [57]:
b

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

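The outputs of flatten() and ravel() look identical, but they differ in whether they share memory with the original array. The following sketch makes the difference visible with np.shares_memory; the names base, grid, flat_copy, and flat_view are chosen for this example so that the notebook's a and b are left untouched.

```python
import numpy as np

base = np.arange(10)
grid = base.reshape((2, 5))     # a view of base with a different shape

flat_copy = grid.flatten()      # always a copy
flat_view = grid.ravel()        # a view here, because grid is contiguous

print(np.shares_memory(base, grid))        # True
print(np.shares_memory(grid, flat_copy))   # False
print(np.shares_memory(grid, flat_view))   # True

# Writing through the view changes the original data; the copy is unaffected.
flat_view[0] = 99
print(base[0])        # 99
print(flat_copy[0])   # 0
```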
In [58]:
np.vstack((b,b))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [59]:
np.hstack((b,b))

array([[0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9, 5, 6, 7, 8, 9]])

Two or more arrays can be concatenated together using the concatenate function with a
tuple of the arrays to be joined.



In [91]:
x = np.array([0,1,2], float)
y = np.array([3,4,5,6], float)
z = np.array([7,8,9,10], float)
np.concatenate((x, y, z))

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

If an array has more than one dimension, it is possible to specify the axis along which multiple
arrays are concatenated. By default (without specifying the axis), NumPy concatenates along
the first dimension.



In [60]:
a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7,8]], float)
c = np.concatenate((a,b)) # concatenates along axis = 0, i.e., vertically
c

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

In [61]:
d = np.concatenate((a,b), axis = 0)
d

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

In [62]:
d = np.concatenate((a,b), axis = 1) # axis = 1: concatenate horizontally
d

array([[1., 2., 5., 6.],
       [3., 4., 7., 8.]])

In [63]:
a = np.array([[1, 2, 3], [3, 4, 5]], float)
b = np.array([[5, 6], [7,8]], float)
c = np.concatenate((a,b), axis = 1) # horizontally; both arrays must have the same number of rows
c

array([[1., 2., 3., 5., 6.],
       [3., 4., 5., 7., 8.]])

In [66]:
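# Make sure the shapes are compatible. Here the rows of b have different lengths,
# so np.array() itself raises the error below before np.concatenate() is reached.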

a = np.array([[1, 2, 3], [3, 4, 5]], float)
b = np.array([[5, 6], [7,8, 9]], float)


c = np.concatenate((a,b), axis = 1) #axis = 1, horizontally; axis = 0,vertically
c

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

## **Basic mathematical functions on NumPy**


**Mathematical Functions**    | **Description**
------------------------------|--------------------------
**np.cos()**, **np.sin()**, **np.tan()**        | Trigonometric functions
**np.arccos()**, **np.arcsin()**, **np.arctan()**    | Inverse trigonometric functions
**np.sqrt()** | Square root
**np.exp()**      | Exponential
**np.log()**, **np.log2()**, **np.log10()**     | Logarithms of base e, 2, and 10, respectively

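Each function in the table is applied to every element of its input array. A short sketch with made-up input values follows; results are compared with np.allclose because floating-point values such as sin(pi) are only approximately zero.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)                    # approximately 0, 1, 0
print(np.allclose(sines, [0.0, 1.0, 0.0]))   # True

roots = np.sqrt(np.array([1.0, 4.0, 9.0]))   # 1, 2, 3
logs = np.log10(np.array([1.0, 10.0, 100.0]))  # 0, 1, 2
print(roots, logs)

# exp and log are inverse functions of each other.
values = np.array([0.5, 1.0, 2.0])
print(np.allclose(np.log(np.exp(values)), values))   # True
```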




## **Additional  NumPy  functions**

**NumPy Functions**    | **Description**
------------------|--------------------------
**np.add()**, **np.subtract()**, **np.multiply()**, **np.divide()**        | Addition, subtraction, multiplication, division
**np.power()**   | Raises the first input argument to the power of the second input argument (applied elementwise)
**np.remainder()**  | The remainder of division
**np.reciprocal()**       | The reciprocal (inverse) of each element
**np.real()**, **np.imag()**, **np.conj()** | The real part, imaginary part, and complex conjugate of the elements in the input arrays
**np.abs()**     | The absolute value
**np.floor()**, **np.ceil()**      | Round down/up to the nearest integer (for float input the result keeps a floating-point dtype)
**np.round()**     | Round to a given number of decimals

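The functions in this table also work elementwise. Here is a brief sketch on two small arrays whose values were picked only to make the results easy to check by hand.

```python
import numpy as np

a = np.array([7.0, -2.5, 9.0])
b = np.array([2.0, 2.0, 4.0])

total = np.add(a, b)          # same as a + b  -> 9, -0.5, 13
powers = np.power(b, 2)       # same as b ** 2 -> 4, 4, 16
rem = np.remainder(a, b)      # same as a % b  -> 1, 1.5, 1 (sign follows the divisor)
recip = np.reciprocal(b)      # 0.5, 0.5, 0.25

down = np.floor(a)            # 7, -3, 9
up = np.ceil(a)               # 7, -2, 9
print(down.dtype)             # float64: the values are whole numbers, but still floats

rounded = np.round(np.array([3.14159, 2.71828]), 2)   # 3.14, 2.72
print(total, powers, rem, recip, up, rounded)
```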

In [67]:
np.sin(x) ** 2 + np.cos(x) ** 2

array([1., 1., 1.])

## **Vectorized implementation**

In [68]:
x = np.array([[1, 2], [3, 4]]) 
y = np.array([[5, 6], [7, 8]])

In [69]:
x + y # elementwise addition

array([[ 6,  8],
       [10, 12]])

In [70]:
x * y # elementwise multiplication

array([[ 5, 12],
       [21, 32]])

In [71]:
y / x

array([[5.        , 3.        ],
       [2.33333333, 2.        ]])

If a function that you have written cannot operate element by element on a NumPy array, you can wrap it with **np.vectorize()** to obtain a version that can. For example:

In [72]:
def heaviside(x):
    return 1 if x > 0 else 0

In [73]:
heaviside(-1)

0

In [74]:
heaviside(1.5)

1

In [75]:
x = np.linspace(-5, 5, 11)
x

array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.,  5.])

In [76]:
heaviside(x)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [77]:
heaviside = np.vectorize(heaviside)
heaviside(x)

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

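According to the NumPy documentation, np.vectorize() is provided for convenience rather than performance, since it essentially loops over the elements. When the function can be expressed with array operations, that is usually the better choice. The sketch below rewrites the same step function in two alternative ways; the names xs, step_where, and step_mask are illustrative.

```python
import numpy as np

xs = np.linspace(-5, 5, 11)

# Option 1: choose 1 or 0 depending on an elementwise condition.
step_where = np.where(xs > 0, 1, 0)

# Option 2: convert the Boolean mask directly to integers.
step_mask = (xs > 0).astype(int)

print(step_where)   # six zeros followed by five ones
print(np.array_equal(step_where, step_mask))   # True
```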
## **Aggregate functions**

NumPy provides a set of functions for calculating aggregates of NumPy arrays. They take an array as input and by default return a scalar as output. Examples include the mean (average), standard deviation, and variance, as well as the sum and product of the elements of an array.



**NumPy Functions for Calculating Aggregates**   | **Description**
------------------|--------------------------
**np.mean()**      | Arithmetic mean
**np.std()**   | Standard deviation (divides by N by default; pass ddof = 1 for the sample version)
**np.var()**  |  Variance (divides by N by default; pass ddof = 1 for the sample version)
**np.corrcoef()** | Correlation coefficients
**np.sum()**     | Sum of all elements
**np.prod()** | Product of all elements
**np.cumsum()**  | Cumulative sum of all elements
**np.cumprod()**    | Cumulative product of all elements
**np.min()**, **np.max()**| The minimum/maximum value in an array
**np.argmin()**, **np.argmax()** | The index of the minimum/maximum value in an array
**np.all()** | Returns True if all elements in the argument array are nonzero
**np.any()** | Returns True if any of the elements in the argument array is nonzero


**Note**: By default, the functions above aggregate over the entire input array. The **axis** keyword argument, accepted by these functions and by the corresponding ndarray methods, controls the axis along which the aggregation is carried out.

In [78]:
a = np.array([[1, 2], [3, 1], [3, 5]], float)
a

array([[1., 2.],
       [3., 1.],
       [3., 5.]])

In [79]:
a.mean() # mean of all entries

2.5

In [81]:
a.mean(axis=0) # mean along the vertical axis (column means)

array([2.33333333, 2.66666667])

In [82]:
a.mean(axis = 1) # mean along the horizontal axis (row means)

array([1.5, 2. , 4. ])

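The same axis argument applies to the other aggregates in the table. The sketch below reuses the 3 x 2 array from the cells above and also illustrates the ddof argument of var(); the name first_col is made up for the example.

```python
import numpy as np

a = np.array([[1, 2], [3, 1], [3, 5]], float)

print(a.sum())          # 15.0, sum of all entries
print(a.sum(axis=0))    # column sums: 7, 8
print(a.max(axis=1))    # row maxima: 2, 3, 5
print(a.argmax())       # 5, the position of the maximum in the flattened array
print(a.cumsum())       # running total over the flattened array: 1, 3, 6, 7, 10, 15

# var() and std() divide by N by default (ddof=0).
# ddof=1 divides by N - 1, which gives the usual sample estimate.
first_col = a[:, 0]                 # 1, 3, 3
print(first_col.var())              # 8/9, about 0.889
print(first_col.var(ddof=1))        # 4/3, about 1.333
```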
## **Comparison operators and value testing**

Boolean comparisons can be used to compare members elementwise on arrays of equal size.
The return value is an array of Boolean True/False values.

In [83]:
a = np.array([1, 3, 0, 4], float)
b = np.array([0, 3, 2, 5], float)
a > b

array([ True, False, False, False])

In [84]:
a == b

array([False,  True, False, False])

The results of a Boolean comparison can be stored in an array:


In [85]:
c = a > b
c

array([ True, False, False, False])

## **Array item selection and manipulation**

As we have seen before, individual elements and slices of arrays can be selected
using bracket notation. Unlike lists, however, arrays also permit selection using other arrays.
That means, we can use array selectors to filter for specific subsets of elements of other arrays.
Boolean arrays can be used as array selectors.


In [86]:
a = np.array([[6, 4], [5, 9]], float)
a >= 6

array([[ True, False],
       [False,  True]])

In [87]:
a[a >= 6]

array([6., 9.])

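Boolean selectors can also be combined and used on the left-hand side of an assignment, and integer arrays can serve as selectors too. A minimal sketch follows; the names zeroed, in_range, v, and picked are chosen only for this example.

```python
import numpy as np

a = np.array([[6, 4], [5, 9]], float)

# Assign through a Boolean mask: every entry >= 6 is replaced by 0.
zeroed = a.copy()               # work on a copy so that a is unchanged
zeroed[zeroed >= 6] = 0.0
print(zeroed)                   # [[0, 4], [5, 0]]

# Combine conditions with & (and) or | (or); the parentheses are required.
in_range = a[(a > 4) & (a < 9)]
print(in_range)                 # 6, 5

# An integer array selects elements by position; repeats are allowed.
v = np.array([10, 20, 30, 40])
picked = v[np.array([0, 0, 3])]
print(picked)                   # 10, 10, 40
```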
## **Random numbers**

An important part of any simulation is the ability to draw random numbers. For this purpose, we use NumPy's built-in pseudorandom number generators. The numbers are pseudorandom in the sense that they are generated deterministically from a seed number, yet their distribution is statistically similar to that of truly random numbers. Setting the same seed therefore reproduces the same sequence.

In [88]:
np.random.seed(2933)

In [89]:
np.random.random() # a single random number in the half-open interval [0, 1)

0.4125913068449133

In [90]:
np.random.seed(2933)
np.random.random() # same seed as before, so the same number is drawn again

0.4125913068449133

An array of random numbers in the half-open interval [0.0, 1.0) can be generated as follows.
                                                      

In [91]:
np.random.rand(5) # five random numbers in [0, 1)

array([0.69435094, 0.76119926, 0.15583373, 0.18966773, 0.25296952])

In [92]:
np.random.rand(3,2) # random numbers in [0, 1), arranged in an array of shape (3, 2)

array([[0.37343881, 0.12970704],
       [0.35741684, 0.83353529],
       [0.9907141 , 0.7709841 ]])

In [93]:
np.random.rand(6).reshape((3,2))

array([[0.22110935, 0.80496913],
       [0.5640127 , 0.00174327],
       [0.63568097, 0.02981465]])

To generate random integers in the range [min, max), use randint(min, max):

In [94]:
np.random.randint(4, 10) # a random integer from 4 (inclusive) to 10 (exclusive)

7

In each of the above examples, we drew random numbers from uniform distributions. NumPy also
includes generators for many other distributions, including the normal, Poisson, beta, binomial, chi-square,
Dirichlet, exponential, F, gamma, geometric, and hypergeometric distributions.


In [95]:
np.random.poisson(5.0) # lambda = 5

5

In [96]:
np.random.normal(1.5, 4.0) # normal distribution with mean = 1.5, std = 4

12.237774592644868

In [97]:
np.random.normal() # defaults: mean = 0, std = 1

-0.6419500455919834

In [98]:
np.random.normal(1.5, 4, size = 5) # five draws with mean = 1.5, std = 4

array([ 0.04494024, -2.9338295 ,  1.25458097, -3.78999846, -1.50852255])

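The cells above use the legacy np.random functions with a global seed. NumPy also offers a Generator interface, created with np.random.default_rng(), which the NumPy documentation recommends for new code. The sketch below is an alternative to, not a replacement for, the notebook's calls; a Generator seeded with 2933 does not reproduce the numbers shown above, because it uses a different algorithm.

```python
import numpy as np

rng = np.random.default_rng(2933)        # an independent, seeded generator

u = rng.random(5)                        # five uniform numbers in [0, 1)
k = rng.integers(4, 10, size=3)          # integers from 4 (inclusive) to 10 (exclusive)
n = rng.normal(1.5, 4.0, size=5)         # normal draws with mean 1.5 and std 4
p = rng.poisson(5.0)                     # one Poisson draw with lambda = 5

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(2933)
print(np.array_equal(u, rng_again.random(5)))   # True
```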
## **NumPy functions for conditional and logical expressions**

**NumPy Functions**  | **Description**
---------------------|--------------------------
**np.where()**    | Chooses values from two arrays depending on the value of a condition array
**np.choose()**   |  Chooses values from a list of arrays depending on the value of a given index array
**np.select()**   | Chooses values from a list of arrays depending  on the list of conditions

In [99]:
x = np.linspace(-4, 4, 9)
x

array([-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])

In [100]:
np.where(x < 0, x**2, x**3) # squares the entries where x < 0 and cubes the rest

array([16.,  9.,  4.,  1.,  0.,  1.,  8., 27., 64.])

In [101]:
np.select([x < -1, x < 2, x >= 2],
          [x**2  , x**3 , x**4]) # x**2 where x < -1, x**3 where -1 <= x < 2,
                                 # and x**4 elsewhere (the first matching condition wins)

array([ 16.,   9.,   4.,  -1.,   0.,   1.,  16.,  81., 256.])

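The table also lists np.choose(), which is not used in the cells above. The sketch below shows it on small made-up arrays, together with the default argument of np.select(), which supplies the value used where no condition matches.

```python
import numpy as np

# np.choose: index[i] says which of the choice arrays supplies element i.
index = np.array([0, 2, 1, 0])
choices = [np.array([10, 11, 12, 13]),
           np.array([20, 21, 22, 23]),
           np.array([30, 31, 32, 33])]
picked = np.choose(index, choices)
print(picked)     # 10, 31, 22, 13

# np.select: entries matching no condition receive the default value.
values = np.array([-2.0, 0.0, 3.0])
signs = np.select([values < 0, values > 0], [-1, 1], default=0)
print(signs)      # -1, 0, 1
```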
In [102]:
# display the built-in documentation for np.where
help(np.where)

Help on function where in module numpy:

where(...)
    where(condition, [x, y])
    
    Return elements chosen from `x` or `y` depending on `condition`.
    
    .. note::
        When only `condition` is provided, this function is a shorthand for
        ``np.asarray(condition).nonzero()``. Using `nonzero` directly should be
        preferred, as it behaves correctly for subclasses. The rest of this
        documentation covers only the case where all three arguments are
        provided.
    
    Parameters
    ----------
    condition : array_like, bool
        Where True, yield `x`, otherwise yield `y`.
    x, y : array_like
        Values from which to choose. `x`, `y` and `condition` need to be
        broadcastable to some shape.
    
    Returns
    -------
    out : ndarray
        An array with elements from `x` where `condition` is True, and elements
        from `y` elsewhere.
    
    See Also
    --------
    choose
    nonzero : The function that is called when x and y

## **Matrix and vector operations**

**NumPy Functions**  | **Description**
------------------|--------------------------
**np.dot()**     | Dot product (matrix multiplication for 2D arrays)
**np.inner()**  | Scalar product (inner product) between two arrays representing vectors
**np.cross()**  | Cross product between two arrays representing vectors
**np.tensordot()** | Dot product along specified axes of multidimensional arrays
**np.outer()**  | Outer product (tensor product) between two arrays representing vectors

In [103]:
A = np.arange(1, 7).reshape(2, 3)
A

array([[1, 2, 3],
       [4, 5, 6]])

In [104]:
B = np.arange(1, 7).reshape(3, 2)
B

array([[1, 2],
       [3, 4],
       [5, 6]])

In [105]:
np.dot(A, B) #Matrix multiplication of A and B

array([[22, 28],
       [49, 64]])

In [106]:
np.dot(B, A) #Matrix multiplication of B and A

array([[ 9, 12, 15],
       [19, 26, 33],
       [29, 40, 51]])

In [107]:
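# np.matrix still works, but the NumPy documentation no longer recommends it;
# regular 2D arrays are preferred.
A = np.matrix([[1,2,3],[3,2,1]])
B = np.matrix([[0,2],[1,-1],[0,1]])
A  # not displayed: a cell shows only the value of its last expression
B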

matrix([[ 0,  2],
        [ 1, -1],
        [ 0,  1]])

In [108]:
np.dot(A,B)

matrix([[2, 3],
        [2, 5]])

In [109]:
np.dot(B,A)

matrix([[ 6,  4,  2],
        [-2,  0,  2],
        [ 3,  2,  1]])

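The remaining functions from the table can be tried on two short vectors. The sketch below also uses the @ operator, which performs matrix multiplication on regular arrays. The names u, v, M, and N are chosen for this example; M and N hold the same values as the arrays A and B created with np.arange above.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(np.inner(u, v))    # 32 = 1*4 + 2*5 + 3*6
print(np.cross(u, v))    # -3, 6, -3
print(np.outer(u, v))    # 3 x 3 array with entries u[i] * v[j]

M = np.arange(1, 7).reshape(2, 3)
N = np.arange(1, 7).reshape(3, 2)

product = M @ N                            # same result as np.dot(M, N)
print(product)                             # [[22, 28], [49, 64]]

# With axes=1, tensordot contracts the last axis of M with the first axis of N,
# which is ordinary matrix multiplication.
print(np.array_equal(np.tensordot(M, N, axes=1), product))   # True
```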
## **Efficiency of NumPy arrays**


In [110]:
%%timeit
list1 = list(range(5000000))
#list1

89.7 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [111]:
%%timeit
array1 = np.arange(5000000)

2.53 ms ± 43.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

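The timings above compare only the creation of a list and an array. The same pattern applies to arithmetic: one array expression replaces an explicit Python loop. The sketch below checks that both approaches give the same result; it is an illustration rather than a benchmark, and the size n and the variable names are made up. In a notebook, each version can be timed with %timeit.

```python
import numpy as np

n = 100_000
values = list(range(n))
array = np.arange(n, dtype=np.int64)   # explicit 64-bit integers to avoid overflow

# Pure Python: the interpreter handles one element at a time.
squares_list = [value * value for value in values]

# NumPy: a single vectorized expression over the whole array.
squares_array = array * array

print(squares_array.tolist() == squares_list)        # True
print(int(squares_array.sum()) == sum(squares_list)) # True
```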

## **References**:

See the NumPy documentation here: https://numpy.org/doc/stable/user/quickstart.html#the-basics

Pandas tutorial for beginners: https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
