## Foundations of Numerical Computing with NumPy

In the previous sections, you've built a strong foundation in Python programming, including an understanding of Object-Oriented Programming (OOP) principles. Now, we're entering the world of data and array manipulation, a fundamental skill for any data scientist.

In this module, we will introduce you to NumPy, short for Numerical Python. NumPy is a powerful Python library that forms the cornerstone of scientific computing and data manipulation. With NumPy, you can perform a wide range of operations on data, from basic mathematical computations to advanced data transformations.


## NumPy

NumPy, short for **Numerical Python**, is the foundational package for scientific computing in Python. It provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more. A NumPy array is called an __ndarray__ (N-dimensional array), and all of its elements share the same type and therefore the same size in memory.
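
As a small illustration of the "same type and size" point, the sketch below builds an array from mixed numbers and inspects it. The variable names are made up for this example.

```python
import numpy as np

# Mixing integers and a float: NumPy upcasts everything to one float type.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)      # float64

# Every element occupies the same number of bytes.
print(mixed.itemsize)   # 8 (bytes per float64 element)

# A Python list, by contrast, can hold unrelated types side by side.
mixed_list = [1, "two", 3.5]
print([type(item).__name__ for item in mixed_list])  # ['int', 'str', 'float']
```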

### Why NumPy?

You might wonder, "Why do we need NumPy when we already have Python lists?" The answer lies in efficiency and functionality. A NumPy array stores elements of a single type in a compact block of memory, and its operations run in compiled code, so it handles large datasets far more efficiently than a list of Python objects. NumPy also comes with a multitude of functions for data manipulation and analysis. In data science, it is an essential tool for tasks like data cleaning, exploration, and transformation.
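
One rough way to see the efficiency difference is to time the same operation on a list and on an array. The sketch below uses the standard-library `timeit` module. The array size and repetition count are arbitrary choices, and the timings depend on your machine, so no specific numbers are claimed.

```python
import timeit
import numpy as np

values = list(range(100_000))
array = np.array(values)

# Doubling every element: a Python-level loop vs. one vectorized expression.
loop_result = [v * 2 for v in values]
vectorized_result = array * 2

# Both approaches produce the same numbers.
print(np.array_equal(loop_result, vectorized_result))  # True

# Time each approach over 20 repetitions.
loop_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)
vector_time = timeit.timeit(lambda: array * 2, number=20)
print(f"list comprehension: {loop_time:.4f}s, NumPy: {vector_time:.4f}s")
```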

### Key Concepts in NumPy

As we delve into NumPy, here are some key concepts we'll explore:

- **Creating Arrays:** Learn how to create arrays from scratch or from existing data.
- **Operations with Arrays:** Perform element-wise operations, mathematical computations, and more.
- **Indexing and Slicing:** Access and modify specific elements or sections of arrays.
- **Universal Functions (UFuncs):** Discover functions that operate element-wise on arrays.
- **Aggregation and Statistics:** Calculate statistics and aggregate data within arrays.
- **Broadcasting:** Understand how NumPy handles operations on arrays with different shapes.
- **Array Shape and Reshaping:** Manipulate the shape and structure of arrays.

### Getting Started

Before we dive into the details, make sure you have NumPy installed. If it's not already installed, you can typically install it using:


`!pip install numpy`

In [58]:
#!pip install numpy

### Using NumPy
An array is a collection of elements that can have one or more dimensions. A one-dimensional array is called a __vector__, and a two-dimensional array is called a __matrix__. We can create NumPy arrays in several ways. <br>We can think of a one-dimensional array as a single row or column of a table. A two-dimensional array resembles an Excel spreadsheet, with rows and columns: the rows run along one dimension (axis) and the columns along the other.


<img src ="https://miro.medium.com/max/436/1*5aZagBBjt1gUClPQOjqj4A.png" alt = "1D array"/>
<br>
<img src ="https://miro.medium.com/max/301/1*DGOyuKMZE4ff9rYpZ6_NbQ.png" alt = "2D array"/>
<br> 
Source: https://medium.com/fintechexplained/why-should-we-use-numpy-c14a4fb03ee9

##### Basic ndarray

In [62]:
import numpy as np

# Note: It is common practice to import numpy under the alias np,
# so that we have less to type when referencing it further down the line.

# The easiest way to create an array is to use the np.array function.
# Here we create a NumPy array from a Python list.
age_list = [24, 18, 16, 30, 45]
age_list_array = np.array(age_list)
print(age_list_array)
print(type(age_list))
print(type(age_list_array))

[24 18 16 30 45]
<class 'list'>
<class 'numpy.ndarray'>


In [63]:
# We can specify the data type using the dtype argument
points_scored = np.array([2,3,6,9,5], dtype = np.float64)
points_scored

array([2., 3., 6., 9., 5.])

In [64]:
# We can as well give a multidimensional array
points_upgraded = np.array([[2,4,6,7,8], [1, 3, 5, 4, 2]])
points_upgraded

array([[2, 4, 6, 7, 8],
       [1, 3, 5, 4, 2]])

##### Evenly spaced arrays

You can quickly create an evenly spaced array of numbers using the `np.arange()` function:

In [65]:
spaced_array = np.arange(6)
spaced_array

array([0, 1, 2, 3, 4, 5])

The start, stop, and step size of the interval can be defined explicitly by passing three numbers as arguments, in that order. The stop value itself is not included in the result.

In [66]:
spaced_array_stepped = np.arange(2, 10, 2)
spaced_array_stepped

array([2, 4, 6, 8])

<img src = "https://cdn.analyticsvidhya.com/wp-content/uploads/2020/04/np_arange.png" alt = "arange description"/>

A similar function is `np.linspace()`, but instead of a step size, it takes the number of samples to generate from the interval. By default, both the start and the stop values are included.

In [67]:
another_spaced_array = np.linspace(0,1, 6)
another_spaced_array

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
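
To make the contrast between the two functions concrete, here is a short sketch that covers the same interval both ways. The `endpoint` and `retstep` keyword arguments are part of `np.linspace`. The specific intervals are chosen only for illustration.

```python
import numpy as np

# arange: fixed step, stop value excluded -> values 0.0, 0.25, 0.5, 0.75
by_step = np.arange(0, 1, 0.25)
print(by_step)

# linspace: fixed number of samples, stop value included by default
# -> values 0.0, 0.25, 0.5, 0.75, 1.0
by_count = np.linspace(0, 1, 5)
print(by_count)

# endpoint=False drops the stop value, matching arange's behaviour.
no_endpoint = np.linspace(0, 1, 4, endpoint=False)
print(no_endpoint)

# retstep=True also returns the spacing that linspace chose.
samples, step = np.linspace(0, 1, 6, retstep=True)
print(step)   # 0.2
```

As a rule of thumb, `np.arange` is convenient when you know the step, and `np.linspace` when you know how many points you want.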

##### Array of random numbers
We can use the `np.random.rand()` function to create an array of a given shape filled with random values drawn uniformly from the half-open interval [0, 1), so 1 itself is never returned.

In [68]:
random_array = np.random.rand(2,3)
random_array

array([[0.32728381, 0.19138463, 0.13038947],
       [0.29034522, 0.14587734, 0.60543195]])
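
Because the values are random, your output will differ from the one shown above. If you need reproducible numbers, one option is NumPy's `default_rng` generator with a fixed seed. This is an alternative to `np.random.rand`, not what the cell above uses, and the seed value 42 is arbitrary.

```python
import numpy as np

# Create a generator with a fixed seed so results can be reproduced.
rng = np.random.default_rng(42)
random_array = rng.random((2, 3))   # uniform samples in [0, 1)
print(random_array.shape)           # (2, 3)

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(42)
repeated = rng_again.random((2, 3))
print(np.array_equal(random_array, repeated))  # True
```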

##### Arrays of specific numbers
We can use special functions like `np.ones()`, `np.zeros()`, and `np.full()` to create arrays filled with specific numbers.

In [69]:
array_zero = np.zeros((2,2))
array_zero

array([[0., 0.],
       [0., 0.]])

In [70]:
array_one = np.ones(3)
array_one

array([1., 1., 1.])

In [71]:
array_one_multi = np.ones((4,3)) # we want an array of ones with shape (4, 3)
array_one_multi

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [72]:
choice_array = np.full((2,3), 4) # we want an array of fours with shape (2, 3)
choice_array

array([[4, 4, 4],
       [4, 4, 4]])

### Getting Basic Info about the array

#### Dimensions of NumPy Arrays
We can easily determine the number of dimensions, or axes, of a NumPy array using the `ndim` attribute.

In [73]:
choice_array_dimension = choice_array.ndim
print(f"Choice array has a dimension of {choice_array_dimension}")

Choice array has a dimension of 2


#### Shape of NumPy Arrays
The `shape` attribute of a NumPy array is a tuple giving the number of elements along each dimension. For a 2D array, this is (rows, columns).

In [75]:
choice_array_shape = choice_array.shape
print(f"Choice array has a shape of {choice_array_shape}")

Choice array has a shape of (2, 3)


#### Size of NumPy Arrays
You can determine how many values there are in the array using the `size` attribute.

In [76]:
choice_array_size = choice_array.size
print(choice_array)
print(f"Choice array contains {choice_array_size} elements")

[[4 4 4]
 [4 4 4]]
Choice array contains 6 elements
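
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. The sketch below checks this on a hypothetical 3D array created only for illustration.

```python
import numpy as np

cube = np.zeros((2, 3, 4))   # illustrative 3-D array

print(cube.ndim)    # 3 -> number of axes
print(cube.shape)   # (2, 3, 4) -> length of each axis
print(cube.size)    # 24 -> total number of elements

# ndim is the length of shape; size is the product of the shape entries.
print(cube.ndim == len(cube.shape))        # True
print(cube.size == np.prod(cube.shape))    # True
```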


### Operations between Arrays and Scalars
Arrays are important because they enable you to express batch operations on data
without writing any for loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays is applied elementwise, and an operation between an array and a scalar applies the scalar to every element.

In [78]:
list1 = [2,3, 8, 11, 20]

# multiplying 2 with every element in a Python list
'''
new_list = []
for element in list1:
    new_list.append(element*2)
    
print(new_list)

'''

array1 = np.array(list1) # Let's create our array from the list
new_array = array1*2     # multiplies every element of the array by 2
print(new_array)

[ 4  6 16 22 40]


In [79]:
array2 = np.arange(5)     # creates another array
sub_array = array1-array2 # element-wise subtraction of array1 and array2
print(sub_array)

[ 2  2  6  8 16]


In [80]:
mult_array = array1*array2  # element-wise multiplication of array1 and array2
print(mult_array)

[ 0  3 16 33 80]


In [81]:
add_array = array2 + 3 # adds 3 to every element in array2
print(add_array)

[3 4 5 6 7]
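
The scalar operations above are the simplest case of broadcasting, which was listed among the key concepts: when two arrays have different but compatible shapes, NumPy stretches the smaller one to match the larger. The following is a minimal sketch with made-up arrays.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
bonus = np.array([10, 20, 30])        # shape (3,)

# bonus is stretched across both rows: (2, 3) and (3,) are compatible.
row_wise = scores + bonus
print(row_wise)
# [[11 22 33]
#  [14 25 36]]

# A column of shape (2, 1) is stretched across the three columns instead.
weights = np.array([[1], [100]])
col_wise = scores * weights
print(col_wise)
# [[  1   2   3]
#  [400 500 600]]

# Incompatible shapes raise an error rather than guessing.
try:
    scores + np.array([1, 2])         # (2, 3) and (2,) do not line up
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("ValueError:", error)
```

Shapes are compared from the last axis backwards, and two axis lengths are compatible when they are equal or when one of them is 1.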


### Indexing and Slicing
These techniques allow us to select a subset of our data or individual elements.

In [82]:
# Let's create a new array here 
another_array = np.arange(2,40,4) 
print(another_array)

[ 2  6 10 14 18 22 26 30 34 38]


In [83]:
another_array[2] #gets the element at index 2

10

In [84]:
another_array[2:6] # gets the elements from index 2 up to, but not including, index 6

array([10, 14, 18, 22])

In [85]:
# Let's create a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [86]:
arr2d[0] # fetches the row at index zero

array([1, 2, 3])

In [89]:
arr2d[1]

array([4, 5, 6])

In [25]:
arr2d[2]

array([7, 8, 9])

In [90]:
arr2d[2][1] # gets the row at index 2 and then the second element of that row

8
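
NumPy also accepts a comma-separated index for multidimensional arrays, as well as slices along each axis and boolean conditions. The sketch below reuses the same 3x3 array. Note that basic slices are views onto the original data, so writing through them changes the original array.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[2, 1])      # 8, same element as arr2d[2][1]
print(arr2d[:, 0])      # [1 4 7], the first column
print(arr2d[:2, 1:])    # rows 0-1, columns 1-2
# [[2 3]
#  [5 6]]

# Boolean indexing keeps only the elements where the condition is True.
large_values = arr2d[arr2d > 5]
print(large_values)      # [6 7 8 9]

# A slice is a view: writing through it changes the original array.
first_row = arr2d[0]
first_row[0] = 100
print(arr2d[0, 0])       # 100

# Use .copy() when an independent array is needed.
safe_row = arr2d[1].copy()
safe_row[0] = -1
print(arr2d[1, 0])       # 4 (unchanged)
```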

### Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufuncs). They operate elementwise on an array, producing an array as output.

In [91]:
# Let's create an array here and apply some universal functions on them
our_array = np.arange(10)
our_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [92]:
np.sqrt(our_array) # get the square root of each element

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [93]:
np.exp(our_array) # computes e raised to the power of each element

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [94]:
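np.max(our_array) # max, sum, and min are aggregations: they reduce the array to a single value rather than working elementwise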

9

In [95]:
np.sum(our_array)

45

In [96]:
np.min(our_array)

0
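
Ufuncs can also take two arrays and combine them element by element, which differs from aggregations such as `np.max` that collapse an array to one value. Here is a small sketch of the distinction, using made-up arrays.

```python
import numpy as np

a = np.array([1, 4, 9])
b = np.array([3, 2, 10])

# Binary ufuncs combine two arrays element by element.
added = np.add(a, b)
print(added)               # [ 4  6 19], same as a + b
larger = np.maximum(a, b)
print(larger)              # [ 3  4 10], element-wise larger value

# np.max is a reduction: it collapses one array to a single value.
print(np.max(a))           # 9

# Ufuncs are instances of np.ufunc; np.max is an ordinary function.
print(isinstance(np.sqrt, np.ufunc))   # True
print(isinstance(np.max, np.ufunc))    # False
```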

### Mathematical and Statistical Methods
A set of mathematical functions that compute statistics about an entire array or about
the data along an axis are accessible as array methods. Aggregations (often called
reductions) like `sum`, `mean`, and standard deviation (`std`) can be used either by calling the
array instance method or by using the top-level NumPy function.


In [97]:
arr = np.random.randn(5, 4) # normally-distributed data
arr

array([[ 2.11680621, -0.90924039,  0.39158561, -1.80273441],
       [ 2.38230282,  2.00928676,  0.93573575, -0.95656741],
       [ 0.35065067,  0.94837139,  1.64350293,  0.74446022],
       [ 0.86993566,  1.23246124,  0.01012588,  0.24554177],
       [ 0.10536033, -0.41342285, -0.06289723, -0.65986709]])

In [98]:
arr.mean()  # gets the mean of the whole array

0.4590698933034485

In [99]:
np.sum(arr) # gets the sum of the whole array

9.18139786606897

In [100]:
arr.mean(axis=1) # gets the mean of each row (averaging across the columns)

array([-0.05089574,  1.09268948,  0.9217463 ,  0.58951614, -0.25770671])

In [37]:
np.mean(arr, axis=0) # gets the mean of each column (averaging down the rows)

array([ 0.4242656 ,  0.10430457, -0.61348887, -0.19928287])

In [101]:
arr.max()

2.3823028206705095

In [102]:
arr.std()

1.0778294300519962
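
The `axis` argument is easier to follow on a small array whose values can be checked by hand. The sketch below uses a made-up 2x3 table. The `keepdims` argument is an optional extra that keeps the reduced axis with length 1.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])     # 2 rows, 3 columns

print(table.sum())          # 21 -> every element
print(table.sum(axis=0))    # [5 7 9] -> collapses the rows, one value per column
print(table.sum(axis=1))    # [ 6 15] -> collapses the columns, one value per row
print(table.mean(axis=1))   # [2. 5.] -> mean of each row

# keepdims=True keeps the reduced axis, which helps with broadcasting.
row_means = table.mean(axis=1, keepdims=True)
print(row_means.shape)      # (2, 1)
centred = table - row_means # each row centred on its own mean
print(centred)
```

A helpful way to remember it: the axis you name is the one that disappears from the result's shape.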

### Unique & Sorting
NumPy has some basic set operations for one-dimensional ndarrays. Probably the most
commonly used one is np.unique, which returns the sorted unique values in an array.

In [103]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [104]:
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [105]:
np.sort(names)

array(['Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Will', 'Will'], dtype='<U4')
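
Note that `np.sort` returns a sorted copy and leaves the original array untouched, whereas the array's own `sort` method sorts in place. The sketch below shows both, along with `np.argsort`, which returns the indices that would sort the array. The `scores` array is made up for illustration.

```python
import numpy as np

scores = np.array([30, 10, 20])

sorted_copy = np.sort(scores)   # returns a new, sorted array
print(sorted_copy)              # [10 20 30]
print(scores)                   # [30 10 20] (original untouched)

order = np.argsort(scores)      # indices that would sort the array
print(order)                    # [1 2 0]
print(scores[order])            # [10 20 30]

scores.sort()                   # the method sorts in place and returns None
print(scores)                   # [10 20 30]
```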

### Linear Algebra
Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other
square matrix math, are an important part of any array library.

In [109]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x)
print(y)
print(f"Dimension of array x: {x.shape}")
print(f"Dimension of array y: {y.shape}")

[[1. 2. 3.]
 [4. 5. 6.]]
[[ 6. 23.]
 [-1.  7.]
 [ 8.  9.]]
Dimension of array x: (2, 3)
Dimension of array y: (3, 2)


In [107]:
np.dot(x,y)

array([[ 28.,  64.],
       [ 67., 181.]])

***
__Difference between dot product and Matrix Multiplication__
***
The dot product is the sum of the products of corresponding values in two same-sized vectors, and its output is a scalar. Matrix multiplication is the matrix version of the dot product: its output is a matrix in which each element is the dot product of a row of the first matrix and a column of the second. When both inputs are 2D arrays, as in the example above, `np.dot` performs matrix multiplication.<br><br>
Dot Product $$\begin{bmatrix} a_{1} & a_{2}\end{bmatrix} \begin{bmatrix} b_{1}\\b_{2}\end{bmatrix} = a_{1}b_{1} + a_{2}b_{2}$$

Matrix Multiplication $$\begin{bmatrix} a_{11} & a_{12}\\a_{21} & a_{22}\end{bmatrix} \begin{bmatrix} b_{11} & b_{12}\\b_{21} & b_{22}\end{bmatrix} =  \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21}  & a_{11}b_{12} + a_{12}b_{22}\\a_{21}b_{11} + a_{22}b_{21}  & a_{21}b_{12} + a_{22}b_{22}\end{bmatrix}$$
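
The two formulas can be checked numerically. In the sketch below, `np.dot` on two 1D arrays gives a scalar, while on two 2D arrays it gives the matrix product, which can also be written with the `@` operator. The small arrays are made up so the arithmetic can be verified by hand.

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])
print(np.dot(u, v))     # 11 -> 1*3 + 2*4, a scalar

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
matrix_product = np.dot(A, B)
print(matrix_product)   # matrix product
# [[19 22]
#  [43 50]]
print(np.array_equal(A @ B, matrix_product))   # True: @ gives the same result

elementwise = A * B     # element-wise product, NOT matrix multiplication
print(elementwise)
# [[ 5 12]
#  [21 32]]
```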

In [110]:
X = np.arange(2,18,2).reshape(4, 2) # create a 1D array and change it to 2D with shape (4 by 2)
X

array([[ 2,  4],
       [ 6,  8],
       [10, 12],
       [14, 16]])

In [46]:
X_transpose = X.T # get the transpose of the array, i.e. rows become columns and columns become rows
X_transpose

array([[ 2,  6, 10, 14],
       [ 4,  8, 12, 16]])

In [47]:
from numpy.linalg import inv

X_transpose_dot = X_transpose.dot(X) # find the matrix product of the transpose and X
X_transpose_dot

array([[336, 400],
       [400, 480]])

In [48]:
inv(X_transpose_dot) # calculates the inverse; read up on how a matrix inverse is calculated

array([[ 0.375 , -0.3125],
       [-0.3125,  0.2625]])
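
A quick way to sanity-check an inverse is to multiply it by the original matrix and compare the result with the identity matrix. The sketch below rebuilds the same matrix as above and uses `np.allclose` for the comparison, since floating-point round-off makes an exact `==` comparison unreliable. The name `gram` is chosen here for illustration only.

```python
import numpy as np
from numpy.linalg import inv, det

X = np.arange(2, 18, 2).reshape(4, 2)
gram = X.T @ X                        # same matrix as X_transpose_dot above

gram_inverse = inv(gram)

# Multiplying a matrix by its inverse should give the identity matrix.
is_identity = np.allclose(gram @ gram_inverse, np.eye(2))
print(is_identity)                    # True

# A determinant of zero would mean no inverse exists; here it is 1280.
print(np.isclose(det(gram), 1280))    # True
```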

### Shape Manipulation

In [111]:
a = np.random.random((3,4))
a

array([[0.68871511, 0.46346999, 0.96459236, 0.56397997],
       [0.67491938, 0.47149256, 0.55719734, 0.10674738],
       [0.14924297, 0.44825097, 0.22753075, 0.20784481]])

In [116]:
print(a.shape) # Let's check the shape
print(a.ndim)

(3, 4)
2


In [115]:
a_flattened = a.ravel()  # this flattens the array
print(a_flattened)
print(f"Dimension of flattened a: {a_flattened.ndim}")

[0.68871511 0.46346999 0.96459236 0.56397997 0.67491938 0.47149256
 0.55719734 0.10674738 0.14924297 0.44825097 0.22753075 0.20784481]
Dimension of flattened a: 1


In [119]:
a_reshaped = a.reshape(6,2) # this returns the array with a modified shape
print(f"The modified shape is now {a_reshaped.shape}")
a_reshaped

The modified shape is now (6, 2)


array([[0.68871511, 0.46346999],
       [0.96459236, 0.56397997],
       [0.67491938, 0.47149256],
       [0.55719734, 0.10674738],
       [0.14924297, 0.44825097],
       [0.22753075, 0.20784481]])

In [53]:
# If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:
a_reshaped = a.reshape(4, -1)
a_reshaped

array([[0.48357879, 0.84253715, 0.6051085 ],
       [0.79104015, 0.34433375, 0.57710734],
       [0.55372131, 0.73670158, 0.57713076],
       [0.06650873, 0.22199996, 0.67193253]])
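
The `-1` placeholder works because the total number of elements is fixed, so NumPy can infer the missing dimension. The sketch below shows this on a small made-up array, and also contrasts `ravel`, which returns a view when possible, with `flatten`, which always returns a copy.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

print(grid.reshape(4, -1).shape)    # (4, 3): 12 elements / 4 rows = 3 columns
print(grid.reshape(-1, 6).shape)    # (2, 6)
print(grid.reshape(-1).shape)       # (12,): same as flattening

# The element count must match, otherwise reshape raises ValueError.
try:
    grid.reshape(5, -1)             # 12 is not divisible by 5
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("ValueError:", error)

# ravel returns a view when it can; flatten always returns a copy.
flat_view = grid.ravel()
flat_copy = grid.flatten()
flat_view[0] = 99
print(grid[0, 0])                   # 99: the view shares memory with grid
flat_copy[1] = -1
print(grid[0, 1])                   # 1: the copy does not
```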

### Stacking arrays

We can create a new array from existing arrays using the `np.vstack()` function, which stacks them vertically and so increases the number of rows in the resulting array. Alternatively, `np.hstack()` combines the arrays horizontally (i.e., along the columns), increasing the number of columns in the resulting array.

In [120]:
a = np.full((3,3),7)
a

array([[7, 7, 7],
       [7, 7, 7],
       [7, 7, 7]])

In [125]:
b = np.ones((3,3))
print(b)
print(b.shape)
print(b.size)
print(b.ndim)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
(3, 3)
9
2


In [127]:
vertical_stack = np.vstack((a,b))
print(vertical_stack.size)
vertical_stack

18


array([[7., 7., 7.],
       [7., 7., 7.],
       [7., 7., 7.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [129]:
horizontal_stack = np.hstack((a,b))
print(horizontal_stack.shape)
print(horizontal_stack.ndim)
horizontal_stack

(3, 6)
2


array([[7., 7., 7., 1., 1., 1.],
       [7., 7., 7., 1., 1., 1.],
       [7., 7., 7., 1., 1., 1.]])
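
Two details in the outputs above are worth spelling out: the integer sevens became floats because stacking an integer array with a float array upcasts the result, and the shapes must agree on every axis except the one being joined. The sketch below illustrates both, and also shows that for 2D arrays the two functions match `np.concatenate` along axis 0 and axis 1. The array `c` is made up for the shape check.

```python
import numpy as np

a = np.full((3, 3), 7)     # integer array
b = np.ones((3, 3))        # float array

# For 2-D arrays, vstack and hstack match concatenate along axis 0 and 1.
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))  # True
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))  # True

# Mixing int and float inputs upcasts the result to float.
stacked = np.vstack((a, b))
print(a.dtype.kind, b.dtype.kind, stacked.dtype.kind)  # i f f

# Shapes must agree on every axis except the one being joined.
c = np.ones((2, 3))
print(np.vstack((a, c)).shape)   # (5, 3): column counts match
try:
    np.hstack((a, c))            # row counts differ: 3 vs 2
    hstack_failed = False
except ValueError as error:
    hstack_failed = True
    print("ValueError:", error)
```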

### NumPy Drill -- DIY

In [1]:
# Uncomment the next line if you need to install NumPy
# !pip install numpy --upgrade

#### 1. Import the numpy package under the name `np` 

#### 2. Create a null vector of size 10

#### 3. Create a vector with values ranging from 10 to 30 

#### 4. Reverse a vector (first element becomes last)

#### 5. Create a 3x3 matrix with values ranging from 0 to 8 

#### 6. Create a 10x10 array with random values and find the minimum and maximum values

#### 7. Create a random vector of size 10 and replace the maximum value by 0

#### 8. Create an array and find the most frequent value in the array

#### 9. Find the total value of the elements in the array created in 8

#### 10. Add 3 to every element of the array created in 8 and find the lowest value in the resulting array

The array below represents the distance covered by racers in meters

In [3]:
distance_covered = np.random.randint(200, 550, 30)  # 30 random integers from 200 to 549
print(distance_covered)

[440 429 203 499 532 338 425 355 399 216 283 500 430 202 207 387 270 280
 416 368 436 360 305 291 349 472 544 285 521 467]


#### 11. What is the average distance covered by the racers?

#### 12. What is the total distance covered in `kilometers`?

#### 13. How many meters were covered by the racer with the lowest distance travelled?

#### 14. How many racers are included altogether?

#### 15. Given an average travel time of 85 minutes, calculate the speed of each racer in `km/h`

#### 16. Which racer travelled with the maximum speed?