### Introduction

- NumPy is a Python library for performing numerical computations on large datasets. The name stands for Numerical Python, and it is popular among data scientists, especially for machine learning problems.

- NumPy is useful for pre-processing data before you train a machine learning model on it.

- Working with n-dimensional arrays is easier in NumPy than with Python lists. NumPy arrays are also faster than Python lists because, unlike lists, they store their elements in one contiguous block of memory. This lets the processor perform computations efficiently.
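
A rough way to check the speed claim is to time the same element-wise operation on a list and on an array. This sketch uses the standard-library `timeit` module; the array size and repeat count are arbitrary choices, and timings vary by machine, so no numbers are shown. Part of the speedup also comes from NumPy running the loop in compiled code rather than in the Python interpreter.

```python
import timeit

import numpy as np

n = 10_000
py_list = list(range(n))
np_arr = np.arange(n)

# Square every element with a Python-level loop over the list.
list_time = timeit.timeit(lambda: [x * x for x in py_list], number=100)

# Square every element with one vectorized NumPy operation.
array_time = timeit.timeit(lambda: np_arr * np_arr, number=100)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")

# Both approaches compute the same values.
same = [x * x for x in py_list] == (np_arr * np_arr).tolist()
print(same)
```

On typical hardware the array version is expected to be noticeably faster, but the exact ratio depends on the machine and NumPy version.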

In [1]:
# Importing NumPy

import numpy as np

In [2]:
# Converting a Python list to a NumPy array

arr = [1,2,3,4]
np.array(arr)

array([1, 2, 3, 4])

In [3]:
# Converting a nested (multi-dimensional) list to a NumPy array

mul_arr = [[1,2],[3,4],[5,6]]
np.array(mul_arr)

array([[1, 2],
       [3, 4],
       [5, 6]])
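
Every NumPy array has a fixed number of dimensions and a single element type. A short sketch (the variable names are illustrative) that inspects both through the `ndim` and `dtype` attributes; mixing integers and floats in the input list converts every element to float.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]])
print(matrix.ndim)    # 2: rows and columns

mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64: the integers were converted to floats
```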

#### Arange Function

- When working with data, you will often come across use cases where you need to generate data.

- NumPy has an `arange()` method with which you can generate a range of values between two numbers. It takes a start value, an end value (which is excluded from the result), and an optional step parameter that sets the distance between consecutive values.

In [5]:
# without distance parameter

print(np.arange(0,10))

[0 1 2 3 4 5 6 7 8 9]


In [6]:
# with distance parameter

print(np.arange(0,10,2))

[0 2 4 6 8]
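
Two more details about `arange()`: the end value is never included, and the step can be negative or a float. A small sketch; note that with float steps, rounding can occasionally produce an unexpected extra element, which is one reason `linspace()` is often preferred for non-integer spacing.

```python
import numpy as np

# The end value (10) is excluded from the result.
print(np.arange(0, 10, 3))     # [0 3 6 9]

# A negative step counts down from start towards end.
print(np.arange(10, 0, -3))    # [10  7  4  1]

# A float step produces a float array.
print(np.arange(0, 1, 0.25))   # values 0.0, 0.25, 0.5, 0.75
```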


#### Zeros and Ones Function

- You can also generate an array or matrix of zeros or ones using `np.zeros()` and `np.ones()`.

In [7]:
print(np.zeros(3))

[0. 0. 0.]


In [8]:
print(np.ones(3))

[1. 1. 1.]


In [9]:
# Both functions also support n-dimensional arrays. Pass the shape as a tuple of (rows, columns).

print(np.zeros((4,5)))

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [10]:
print(np.ones((4,5)))

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
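
The `0.` and `1.` in the outputs show that `zeros()` and `ones()` return floating-point arrays by default. A sketch of overriding that with the `dtype` parameter, plus `np.full()`, which fills an array with any constant value.

```python
import numpy as np

int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)
# [[0 0 0]
#  [0 0 0]]

sevens = np.full((2, 2), 7)
print(sevens)
# [[7 7]
#  [7 7]]

print(np.ones(3).dtype)   # float64
```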


#### Identity Matrix

- You can also generate an identity matrix using a built-in NumPy function called “eye”.

In [11]:
np.eye(6)

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])
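
Multiplying a matrix by an identity matrix of matching size leaves it unchanged, which is what makes `eye()` useful in linear algebra. A minimal check of that property; the matrix `A` is an arbitrary example, and `@` is Python's matrix-multiplication operator.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
I = np.eye(3)

# A @ I has the same values as A (stored as floats, since eye() returns floats by default).
print(np.array_equal(A @ I, A))   # True

# The optional k parameter shifts the diagonal: k=1 puts the ones one step above the main diagonal.
print(np.eye(3, k=1))
```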

#### Linspace Function

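NumPy has a `linspace()` method that generates a given number of evenly spaced points between two numbers. It takes the start value, the end value, and the number of points; unlike `arange()`, the end value is included by default.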

In [13]:
print(np.linspace(0,30,6))

[ 0.  6. 12. 18. 24. 30.]


In [15]:
print(np.linspace(0,30,60))

[ 0.          0.50847458  1.01694915  1.52542373  2.03389831  2.54237288
  3.05084746  3.55932203  4.06779661  4.57627119  5.08474576  5.59322034
  6.10169492  6.61016949  7.11864407  7.62711864  8.13559322  8.6440678
  9.15254237  9.66101695 10.16949153 10.6779661  11.18644068 11.69491525
 12.20338983 12.71186441 13.22033898 13.72881356 14.23728814 14.74576271
 15.25423729 15.76271186 16.27118644 16.77966102 17.28813559 17.79661017
 18.30508475 18.81355932 19.3220339  19.83050847 20.33898305 20.84745763
 21.3559322  21.86440678 22.37288136 22.88135593 23.38983051 23.89830508
 24.40677966 24.91525424 25.42372881 25.93220339 26.44067797 26.94915254
 27.45762712 27.96610169 28.47457627 28.98305085 29.49152542 30.        ]
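
In the second example, the spacing is 30 / 59 ≈ 0.508 rather than 0.5, because 60 points that include both endpoints leave only 59 gaps. A sketch of two optional parameters that help here: `endpoint=False` excludes the end value, and `retstep=True` also returns the spacing.

```python
import numpy as np

# With endpoint=False, 60 points from 0 towards 30 are spaced exactly 0.5 apart (30 is excluded).
pts = np.linspace(0, 30, 60, endpoint=False)
print(pts[:4])    # values 0.0, 0.5, 1.0, 1.5

# retstep=True returns a (samples, step) tuple.
samples, step = np.linspace(0, 30, 6, retstep=True)
print(step)       # 6.0
```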


#### Random Number Generation

- When you are working on machine learning problems, you will often need to generate random numbers. NumPy has built-in functions for that as well.

- Let's look at two common types of distributions.

- **Normal Distribution**
    - The normal distribution is a very important concept in statistics since it is seen in many natural phenomena. Its graph has a bell shape, which is why it is also called a “bell curve”.

- **Uniform Distribution**
    - If every value in the distribution's range is equally likely, the distribution is called a uniform distribution.
    - For example, a fair coin toss has a uniform distribution, since heads and tails are equally likely.




* To generate random numbers from a uniform distribution over the interval [0, 1), use the `rand()` function from `np.random`.

In [16]:
print(np.random.rand(10))

[0.21108322 0.60483049 0.73548205 0.4836799  0.23785066 0.15370758
 0.2581757  0.06696842 0.78365945 0.87163195]


In [17]:
print(np.random.rand(3,4))

[[0.78037849 0.3385571  0.51362921 0.98781439]
 [0.73109742 0.90091227 0.25317502 0.91790698]
 [0.44513724 0.74946739 0.9547816  0.43686479]]
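
For a range other than [0, 1), `np.random.uniform(low, high, size)` draws uniformly from [low, high). A minimal sketch; the range 5 to 10 is arbitrary, and the call to `np.random.seed()` only makes the run repeatable.

```python
import numpy as np

np.random.seed(0)
samples = np.random.uniform(5, 10, size=1000)

# Every value lies in [5, 10), and the mean lands close to the midpoint 7.5.
print(samples.min() >= 5, samples.max() < 10)
print(samples.mean())
```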


* To generate random numbers from the standard normal distribution (mean 0, standard deviation 1), use the `randn()` function from `np.random`.

In [18]:
print(np.random.randn(10))

[ 1.07268951 -0.92752933  0.14184022 -1.2061683   1.13766092  0.99211944
 -1.34729821  1.57551891  0.53501719 -0.38049955]


In [19]:
print(np.random.randn(3,4))

[[ 0.15138121  1.35466735  0.04538869 -2.16632895]
 [ 1.13555758  0.55443082  0.30059231 -1.23546029]
 [-0.74491879 -0.36497527 -0.38852684  0.33473563]]
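
To sample a normal distribution with a different mean and standard deviation, `np.random.normal(loc, scale, size)` takes them directly as `loc` and `scale`. A sketch with an arbitrary mean of 50 and standard deviation of 5:

```python
import numpy as np

np.random.seed(0)
samples = np.random.normal(loc=50, scale=5, size=10_000)

# With many samples, the sample mean and standard deviation land close to loc and scale.
print(samples.mean())
print(samples.std())
```

This draws from the same distribution as `50 + 5 * np.random.randn(10_000)`, since shifting and scaling a standard normal variable gives a normal variable with that mean and standard deviation.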


* To generate random integers between a low and a high value, use the `randint()` function from `np.random`. The low value is included, but the high value is not.

In [28]:
print(np.random.randint(1,100,10))

[40 16 37 66  2 44 16 87 84 59]


In [29]:
print(np.random.randint(1,100,(2,3)))

[[26 67 75]
 [36 97 57]]
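
Because `randint(0, 2, size)` returns only 0s and 1s, it is a convenient way to simulate the coin tosses from the uniform distribution example. Treating 1 as heads is an arbitrary convention for this sketch.

```python
import numpy as np

np.random.seed(1)
tosses = np.random.randint(0, 2, size=1000)   # each value is 0 or 1 (the high value 2 is excluded)

# Both outcomes are equally likely, so roughly half the tosses come up heads.
print(tosses.sum() / tosses.size)
```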


* Set a seed value when you want your random numbers to be the same each time the code runs.
* With the same seed, the same array is generated on every run.

In [31]:
np.random.seed(23)
print(np.random.rand(4))

[0.51729788 0.9469626  0.76545976 0.28239584]
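
`np.random.seed()` sets global state that is shared by every `np.random` call in a program. Newer NumPy code (version 1.17 and later) often uses a `Generator` object instead, created with `np.random.default_rng(seed)`, which keeps that state local to the object. A sketch showing that two generators with the same seed produce identical numbers; the values differ from the `np.random.seed(23)` output above because the two APIs use different underlying algorithms.

```python
import numpy as np

rng_a = np.random.default_rng(23)
rng_b = np.random.default_rng(23)

a = rng_a.random(4)    # uniform floats in [0, 1), like rand(4)
b = rng_b.random(4)
print(np.array_equal(a, b))   # True: same seed, same sequence

# Generator counterparts of randn() and randint():
print(rng_a.standard_normal(3))
print(rng_a.integers(1, 100, size=5))   # the high value (100) is excluded by default
```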


#### Reshaping Arrays

- As a data scientist, you will often need to reshape datasets for different types of computations.



- To get the shape of an array, use its `shape` attribute.

In [32]:
arr = np.random.rand(2,2)
print(arr)
print(arr.shape)

[[0.22104536 0.68622209]
 [0.1671392  0.39244247]]
(2, 2)


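- To reshape an array, use the `reshape()` method. It returns a reshaped array and leaves the original unchanged.
- `reshape()` only works when the new shape holds the same number of elements as the original. A 2x2 array (4 elements) can become 1x4 or 4x1, but not 3x1.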

In [34]:
print(arr.reshape(1,4))
print("-------------")
print(arr.reshape(4,1))

[[0.22104536 0.68622209 0.1671392  0.39244247]]
-------------
[[0.22104536]
 [0.68622209]
 [0.1671392 ]
 [0.39244247]]
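
Two more `reshape()` details: passing `-1` for one dimension asks NumPy to infer it from the total number of elements, and a shape with the wrong number of elements raises a `ValueError` rather than padding or truncating the data. A short sketch using a fresh 2x3 array:

```python
import numpy as np

data = np.arange(6).reshape(2, 3)

# -1 lets NumPy infer that dimension: 6 elements / 2 columns = 3 rows.
print(data.reshape(-1, 2).shape)   # (3, 2)

# reshape(-1) flattens the array to one dimension.
print(data.reshape(-1).shape)      # (6,)

# 6 elements cannot fill a 4x2 array (8 elements), so NumPy raises ValueError.
try:
    data.reshape(4, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```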


#### Slicing Data

- Let's look at fetching data from NumPy arrays. Indexing and slicing NumPy arrays works much like it does for Python lists.


In [35]:
myarr = np.arange(0,11)
print(myarr)

[ 0  1  2  3  4  5  6  7  8  9 10]


In [36]:
sliced = myarr[0:5]
print(sliced)

[0 1 2 3 4]


In [37]:
sliced[:] = 99
print(sliced)

[99 99 99 99 99]


In [38]:
print(myarr)

[99 99 99 99 99  5  6  7  8  9 10]


- In the example above, even though we assigned a slice of “myarr” to the variable “sliced”, changing the values in “sliced” also changed the original array. This is because slicing a NumPy array returns a view: “sliced” refers to the same data in memory as “myarr” instead of holding its own copy. (Slicing a Python list, by contrast, creates a copy.)

- To make an independent copy of part of an array, use the `copy()` method.

In [40]:
sliced = myarr[0:5].copy()  # copy just the slice; changing sliced no longer affects myarr
print(sliced)

[99 99 99 99 99]
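
`np.shares_memory()` offers a direct way to check whether two arrays overlap in memory. A sketch contrasting a plain slice with a copied slice (the variable names are illustrative):

```python
import numpy as np

base = np.arange(0, 11)
view = base[0:5]                 # a view onto base's data
independent = base[0:5].copy()   # a separate copy of the same values

print(np.shares_memory(base, view))          # True
print(np.shares_memory(base, independent))   # False

independent[:] = -1
print(base[:5])   # [0 1 2 3 4]: edits to the copy leave base unchanged
```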


- Slicing multi-dimensional arrays works similarly to slicing one-dimensional arrays.

In [41]:
my_matrix = np.random.randint(1,30,(3,3))
print(my_matrix)

[[14 13 18]
 [ 3 12 22]
 [ 6  1 13]]


In [42]:
# print a single row
print(my_matrix[0])

[14 13 18]


In [44]:
# print a single value: row 0, column 0
print(my_matrix[0][0])

# alternate way: give the row and column inside one pair of brackets
print(my_matrix[0,0])

14
14
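
The `[row, col]` form also accepts slices, which lets you select whole columns or sub-matrices. Because `my_matrix` above is random, this sketch uses a fixed 3x3 array so the results are predictable:

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

print(grid[:, 1])      # second column: [2 5 8]
print(grid[:2, 1:])    # first two rows, last two columns
# [[2 3]
#  [5 6]]
```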


#### Array Computations

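- NumPy is known for its speed when performing computations on large multi-dimensional arrays. Arithmetic operators work element-wise: an operation with a single number is applied to every element, and an operation between two arrays of the same shape pairs up corresponding elements.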

In [45]:
new_arr = np.arange(1,11)
print(new_arr)

[ 1  2  3  4  5  6  7  8  9 10]


In [46]:
# Addition

print(new_arr + 5)

[ 6  7  8  9 10 11 12 13 14 15]


In [47]:
# Subtraction

print(new_arr - 5)

[-4 -3 -2 -1  0  1  2  3  4  5]


In [48]:
# Array Addition

print(new_arr + new_arr)

[ 2  4  6  8 10 12 14 16 18 20]


In [49]:
# Array Division

print(new_arr / new_arr)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
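
The example above only divides by non-zero values. A quick way to see how NumPy handles zero divisors is to divide by an array that contains zeros; this sketch uses `np.errstate` to silence the `RuntimeWarning` that NumPy would otherwise print.

```python
import numpy as np

numerators = np.array([0.0, 1.0, -1.0, 4.0])
denominators = np.array([0.0, 0.0, 0.0, 2.0])

# Suppress the "divide by zero" and "invalid value" warnings inside this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    result = numerators / denominators

print(result)   # nan, inf, -inf, 2.0
```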


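- Dividing by zero does not raise an error in NumPy. Instead, 0/0 produces NaN (not a number), dividing a non-zero value by zero produces `inf` or `-inf`, and NumPy emits a `RuntimeWarning`.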

- NumPy also has built-in functions to calculate values like the sum, square root, mean, variance, and standard deviation.

   - Sum — np.sum()
   - Square Root — np.sqrt()
   - Mean — np.mean()
   - Variance — np.var()
   - Standard Deviation — np.std()
   
- When working with 2D arrays, you will often need row-wise or column-wise sums, means, variances, and so on. The optional `axis` parameter controls this: `axis=0` aggregates down each column, and `axis=1` aggregates across each row.

In [52]:
arr2d = np.arange(25).reshape(5,5)
print(arr2d)

# sum of all elements
print(arr2d.sum())

# sum of columns
print("sum of columns ",arr2d.sum(axis=0))

# sum of rows
print("sum of rows ",arr2d.sum(axis=1))

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
300
sum of columns  [50 55 60 65 70]
sum of rows  [ 10  35  60  85 110]
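
The `axis` parameter works the same way for the other functions listed earlier. A sketch applying `np.mean()` to the same 5x5 array, plus `np.var()` and `np.std()` on a small array to show the `ddof` parameter: by default NumPy computes the population variance (dividing by n), and `ddof=1` gives the sample variance (dividing by n - 1).

```python
import numpy as np

arr2d = np.arange(25).reshape(5, 5)
print(np.mean(arr2d, axis=0))   # column means: 10, 11, 12, 13, 14
print(np.mean(arr2d, axis=1))   # row means: 2, 7, 12, 17, 22

values = np.array([1, 2, 3, 4])
print(np.var(values))           # 1.25 (population variance, ddof=0)
print(np.var(values, ddof=1))   # sample variance: 5/3, about 1.667
print(np.std(values))           # square root of 1.25, about 1.118
```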


#### Conditional Operations

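- You can also filter data conditionally using square-bracket notation. A comparison such as `arr > 4` returns a boolean array, and passing that array inside square brackets keeps only the elements where it is `True`.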

In [53]:
arr = np.arange(0,10)

print(arr > 4)

print(arr[arr > 4])

[False False False False False  True  True  True  True  True]
[5 6 7 8 9]
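
Conditions can be combined with `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. `np.where(condition, x, y)` builds a new array that takes values from `x` where the condition holds and from `y` elsewhere. A sketch on the same `0..9` range:

```python
import numpy as np

arr = np.arange(0, 10)

# Elements strictly between 2 and 7.
print(arr[(arr > 2) & (arr < 7)])   # [3 4 5 6]

# Elements below 2 or above 7.
print(arr[(arr < 2) | (arr > 7)])   # [0 1 8 9]

# Keep values greater than 4 and replace the rest with 0.
print(np.where(arr > 4, arr, 0))    # [0 0 0 0 0 5 6 7 8 9]
```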
