
# Python Numerical Computation using Numpy

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

**The Main Benefit of NumPy**

The main benefit of NumPy is that it allows for extremely fast data generation and handling. NumPy has its own built-in data structure called an array which is similar to the normal Python list, but can store and operate on data much more efficiently.
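The following small sketch shows the difference in practice. It applies the same operators to a Python list and to a NumPy array (the values are only illustrative):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

# On a list, * repeats the list and + concatenates lists
print(py_list * 2)             # [1, 2, 3, 1, 2, 3]
print(py_list + [10, 20, 30])  # [1, 2, 3, 10, 20, 30]

# On an array, the same operators act element by element
print(arr * 2)                 # [2 4 6]
print(arr + [10, 20, 30])      # [11 22 33]
```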

**How to work with numerical data in Python**

1. How to turn Python lists into Numpy arrays
2. Multi-dimensional Numpy arrays and their benefits
3. Array operations, broadcasting, indexing, and slicing
4. How to work with CSV data files using Numpy

Much of the data in data science is numerical: stock prices, sales figures, sports scores, sensor readings, and so on.

The NumPy library provides specialized data structures, functions and other tools for numerical computing in Python.


Ref [Tutorial](https://www.freecodecamp.org/news/exploratory-data-analysis-with-numpy-pandas-matplotlib-seaborn/)

For example, suppose we want to assess how suitable the climate of a region is for growing apples.
We could model the annual yield of apples (tons per hectare) as a linear function of climatic conditions such as the average temperature (in degrees Fahrenheit), rainfall (in millimeters) and average relative humidity (in percent):

yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity

This simple linear model can be used to make predictions. Suppose that, based on historical data and statistical analysis, we have found the following values for w1, w2 and w3:

w1, w2, w3 = 0.3, 0.2, 0.5

We can define some variables to record the climate data for a region:

temperature = 73
rainfall = 67
humidity = 43

Substituting these values into the linear equation gives:

yield of apples = .3*73 + .2*67 + .5*43 = 56.8

So the expected yield of apples in this region is 56.8 tons per hectare.

To make predictions for multiple regions, we can represent the climate data of each region as a vector, that is, a list of numbers:

kinnur = [73, 67, 43]
manali = [91, 88, 64]
shimla = [87, 134, 58]
sissu = [102, 43, 37]
kullu = [69, 96, 70]

weights = [w1, w2, w3]

The yield for each region can then be calculated with the function below:
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result
    
crop_yield(kinnur, weights)
56.8

crop_yield(manali, weights)
76.9

crop_yield(kullu, weights)
74.9

The calculation performed by crop_yield (element-wise multiplication of two vectors followed by summing the results) is called the dot product.

The NumPy library provides a built-in function, np.dot, to compute the dot product of two vectors.

NumPy is a Python library for numerical computing and mathematical functions.

To create a NumPy array, pass a Python list to np.array:


In [108]:
import numpy as np
kinnur = np.array([73, 67, 43])
print(kinnur)
print(kinnur.shape)

[73 67 43]
(3,)


In [109]:
weights = np.array([.3,.2,.5])
type(weights)

numpy.ndarray

In [110]:
kinnur[1]

67

# How to operate on Numpy arrays

**dot product of two arrays**


The * operator performs element-wise multiplication of two arrays of the same shape. Summing that element-wise product gives the same result as np.dot:

In [111]:
# Both lines compute the dot product; the notebook displays only the value of the last expression
np.dot(kinnur, weights)
(kinnur*weights).sum()

56.8

In [112]:
arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr1*arr2

array([ 4, 10, 18])

In [113]:
arr2.sum()

15
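This sketch checks that the three approaches agree. It restates the tutorial's own crop_yield function and compares it with np.dot and (a * b).sum(). The comparison uses np.isclose because floating-point sums computed in different ways can differ in the last bits.

```python
import numpy as np

def crop_yield(region, weights):
    # Pure-Python dot product, as defined earlier in this tutorial
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

kinnur = np.array([73, 67, 43])
weights = np.array([0.3, 0.2, 0.5])

loop_result = crop_yield(kinnur, weights)
dot_result = np.dot(kinnur, weights)
sum_result = (kinnur * weights).sum()

# Compare with a tolerance rather than exact equality
print(np.isclose(loop_result, dot_result), np.isclose(dot_result, sum_result))  # True True
```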

# Benefits of using Numpy

1. Easy to use
2. Performance: NumPy operations and functions are implemented internally in C, which makes them much faster than equivalent Python statements and loops.
3. The speed advantage is especially noticeable on large datasets.
4. Multidimensional arrays can be represented and manipulated with very little code.
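A rough timing sketch can check the performance claim. The data is an assumption that is not part of the original notebook: 500,000 random values per vector, drawn with np.random.default_rng. Absolute timings depend on the machine, so treat the printed numbers as indicative only.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(500_000)
b = rng.random(500_000)

# Dot product with a plain Python loop
start = time.perf_counter()
loop_total = 0.0
for x, y in zip(a, b):
    loop_total += x * y
loop_seconds = time.perf_counter() - start

# The same dot product with a single vectorized NumPy call
start = time.perf_counter()
numpy_total = np.dot(a, b)
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"numpy: {numpy_seconds:.4f} s")
print(np.isclose(loop_total, numpy_total))  # the two results agree
```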


# Vectors and Matrices

Two frequently used kinds of NumPy arrays are vectors and matrices, although NumPy arrays can have any number of dimensions.

Vectors are one-dimensional NumPy arrays, and look like this:

In [114]:
my_vector = np.array(['this', 'is', 'a', 'vector'])
my_vector

array(['this', 'is', 'a', 'vector'], dtype='<U6')

Matrices are two-dimensional arrays. They are created by passing a list of lists to the np.array() function:

In [115]:
my_matrix = [[1, 2, 3],[4, 5, 6],[7, 8, 9]]
print(np.array(my_matrix))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
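The ndim and shape attributes make the difference between vectors and matrices explicit. This short sketch reuses the two arrays above. The dtype='<U6' shown for my_vector means Unicode strings of up to 6 characters, which is the length of the longest element, 'vector'.

```python
import numpy as np

my_vector = np.array(['this', 'is', 'a', 'vector'])
my_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ndim is the number of dimensions (axes); shape gives the length along each axis
print(my_vector.ndim, my_vector.shape)  # 1 (4,)
print(my_matrix.ndim, my_matrix.shape)  # 2 (3, 3)

# A matrix element is selected with [row, column]
print(my_matrix[1, 2])  # 6
```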


# Multidimensional Numpy Arrays

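NumPy arrays are not limited to one or two dimensions; below are 2D and 3D examples. The shape attribute gives the length of the array along each dimension.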

In [116]:
# 2D array with 5 rows and 3 columns
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

climate_data.shape

(5, 3)

In [117]:
#3D array

arr3 = np.array([[[11,22,33],
                 [44,55,66]],
                [[77,88,99],
                 [1,2,3]]])
arr3.shape

(2, 2, 3)

In [118]:
arr3.dtype

dtype('int64')

In [119]:
# The single float (1.0) below makes NumPy upcast the whole array to float64
arr4 = np.array([[[11,22,33],
                 [44,55,66]],
                [[77,88,99],
                 [1.0,2,3]]])
arr4.dtype

dtype('float64')

In [120]:
#matrix multiplication
climate_data@weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
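For a 2D array, climate_data @ weights computes one dot product per row, which applies crop_yield to every region at once. This sketch checks that result against an explicit loop over the rows:

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# (5, 3) @ (3,) -> (5,): one dot product per row (region)
matmul_yields = climate_data @ weights

# The same computation, written as an explicit loop over rows
loop_yields = np.array([np.dot(row, weights) for row in climate_data])

print(matmul_yields.shape)                      # (5,)
print(np.allclose(matmul_yields, loop_yields))  # True
```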

# Work with CSV Data Files
We have a climate data file in CSV format that contains 10,000 climate measurements (temperature, rainfall and humidity), one measurement per row.
In [121]:
climate_data = np.genfromtxt('/content/sample_data/Climate_data_numpy.txt', delimiter = ',' , skip_header=1)
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [122]:
climate_data.shape

(10000, 3)
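np.genfromtxt can also read from any file-like object, which helps when the data file is not available. This sketch uses io.StringIO with a hypothetical three-row sample to show what skip_header and delimiter do. The header names in the sample are an assumption.

```python
import io
import numpy as np

# Hypothetical sample standing in for the real file
csv_text = """temperature,rainfall,humidity
25,76,99
39,65,70
59,45,77
"""

# skip_header=1 drops the column-name line; delimiter=',' splits each row into columns
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(data)
print(data.shape)  # (3, 3)
```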

In [123]:
#matrix multiplication
weights = np.array([0.3, 0.2, 0.5])
yields = climate_data@weights
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [124]:
yields.shape

(10000,)

We want to add yields to climate_data as a fourth column using NumPy's concatenate function.

The arrays being concatenated must have the same number of dimensions, and the same length along every axis except the one used for concatenation. The axis argument specifies that axis (axis=1 means along the columns).

We therefore use the reshape method to change the shape of yields from (10000,) to (10000, 1).

In [125]:
climate_results = np.concatenate((climate_data, yields.reshape(10000,1)), axis =1)
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])

In [126]:
climate_results.shape

(10000, 4)
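Hard-coding 10000 in the reshape call ties the code to this particular file. The sketch below shows alternatives that infer the length automatically. It uses a small randomly generated stand-in for climate_data, which is an assumption and not the real file.

```python
import numpy as np

rng = np.random.default_rng(0)
climate_data = rng.integers(0, 100, size=(5, 3)).astype(float)  # stand-in data
yields = climate_data @ np.array([0.3, 0.2, 0.5])               # shape (5,)

# Two equivalent ways to turn a (n,) vector into an (n, 1) column
col_a = yields.reshape(-1, 1)   # -1 lets NumPy infer the number of rows
col_b = yields[:, np.newaxis]   # insert a new axis of length 1
results_a = np.concatenate((climate_data, col_a), axis=1)

# np.column_stack treats 1D inputs as columns, so no reshape is needed
results_b = np.column_stack((climate_data, yields))

print(results_a.shape)                       # (5, 4)
print(np.array_equal(results_a, results_b))  # True
```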

In [127]:
np.savetxt('climate_result.txt',
           climate_results,
           fmt = '%.2f',
           delimiter = ',',
           header='temperature,rainfall,humidity,yield_apples',
           comments='')
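To check what np.savetxt wrote, you can read the file back with np.loadtxt. This sketch writes a small two-row array, copied from the first rows of the output above, to a temporary directory instead of the working directory. By default savetxt puts a '# ' prefix in front of the header. Setting comments='' suppresses that prefix, so the header is a plain first line and skiprows=1 is enough to skip it when reading.

```python
import os
import tempfile
import numpy as np

climate_results = np.array([[25.0, 76.0, 99.0, 72.2],
                            [39.0, 65.0, 70.0, 59.7]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'climate_result.txt')
    # comments='' writes the header line without the default '# ' prefix
    np.savetxt(path, climate_results, fmt='%.2f', delimiter=',',
               header='temperature,rainfall,humidity,yield_apples', comments='')
    with open(path) as f:
        first_line = f.readline().strip()
    # skiprows=1 skips the header line when reading the numbers back
    loaded = np.loadtxt(path, delimiter=',', skiprows=1)

print(first_line)                            # temperature,rainfall,humidity,yield_apples
print(np.allclose(loaded, climate_results))  # True
```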

# Numpy Operations

Numpy provides hundreds of functions for performing operations on arrays. Here are some commonly used functions:

1. Mathematics: np.sum, np.exp, np.round, arithmetic operators.
2. Array manipulation: np.reshape, np.stack, np.concatenate, np.split
3. Linear algebra: np.matmul, np.dot, np.transpose, np.linalg.eigvals
4. Statistics: np.mean, np.median, np.std, np.max
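Here is a short sketch that calls one or two functions from each category above. The input arrays are illustrative values and are not taken from the tutorial's data.

```python
import numpy as np

# Mathematics
print(np.round(np.exp([0.0, 1.0]), 3))

# Array manipulation: split a 1D array into 3 equal parts, then stack them as rows
parts = np.split(np.arange(6), 3)
stacked = np.stack(parts)
print(stacked)

# Linear algebra: eigvals lives in the np.linalg submodule
a = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigs = np.linalg.eigvals(a)
print(eigs)
print(np.transpose(a) @ a)

# Statistics
values = np.array([1, 2, 3, 4, 100])
print(np.mean(values), np.median(values), values.max())
```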


In [128]:
x = np.ones(4)
y = np.arange(1,5)
print(x)
print(y)
print(np.vstack((x, y)))  # stack the 1D arrays as the rows of a 2D array

[1. 1. 1. 1.]
[1 2 3 4]
[[1. 1. 1. 1.]
 [1. 2. 3. 4.]]


In [129]:
#Transpose
print(np.vstack((x, y)).T)

[[1. 1.]
 [1. 2.]
 [1. 3.]
 [1. 4.]]


In [130]:
print(np.hstack((x, y)))  # join the 1D arrays end to end

[1. 1. 1. 1. 1. 2. 3. 4.]


In [131]:
# Join several 1D arrays into a single 1D array
x = np.arange(1, 3)
y = np.arange(3, 5)
z = np.arange(5, 7)
np.concatenate([x, y, z])


array([1, 2, 3, 4, 5, 6])

# Numpy Arithmetic Operations, Broadcasting, and Comparison

Numpy arrays support arithmetic operators like +, -, *, etc. You can perform an arithmetic operation with a single number (also called a scalar) or with another array of the same shape.

In [132]:
arr_1 = np.array([[1,2,3,4],
                  [5,6,7,8],
                  [9,1,2,3]])

arr_2 = np.array([[11, 12, 13, 14],
                  [15, 16, 17, 18],
                  [19, 11, 12, 13]])

# Add a number to each element
print(arr_1+3)

# Element-wise subtraction
print(arr_1-arr_2)

# Division by a number
print(arr_2/2)

# Element-wise multiplication
print(arr_1 * arr_2)

# Modulus with scalar
print(arr_2 % 4)

[[ 4  5  6  7]
 [ 8  9 10 11]
 [12  4  5  6]]
[[-10 -10 -10 -10]
 [-10 -10 -10 -10]
 [-10 -10 -10 -10]]
[[5.5 6.  6.5 7. ]
 [7.5 8.  8.5 9. ]
 [9.5 5.5 6.  6.5]]
[[ 11  24  39  56]
 [ 75  96 119 144]
 [171  11  24  39]]
[[3 0 1 2]
 [3 0 1 2]
 [3 3 0 1]]


# Numpy Array Broadcasting

Broadcasting allows arithmetic operations between arrays of different shapes, including arrays with different numbers of dimensions, as long as their shapes are compatible.

In [133]:
a1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 1, 2, 3]])

a1.shape

(3, 4)

In [134]:
a2 = np.array([4,5,6,7])
a2.shape

(4,)

In [135]:
print(a1 + a2)

[[ 5  7  9 11]
 [ 9 11 13 15]
 [13  6  8 10]]


When the expression a1 + a2 is evaluated, a2 (shape (4,)) is conceptually replicated three times to match the shape (3, 4) of a1. NumPy does this without actually creating three copies of the smaller array, which saves both time and memory.

Broadcasting only works when the shapes are compatible. NumPy compares the two shapes starting from the trailing (rightmost) dimension: each pair of dimensions must be equal, or one of them must be 1, and missing leading dimensions are treated as 1. Otherwise NumPy raises a ValueError.

In [138]:
a3 = np.array([7,8])
a3.shape


(2,)

In [150]:
# a1 has shape (3, 4) and a3 has shape (2,): the trailing dimensions 4 and 2 differ and neither is 1
print(a1 + a3)

ValueError: operands could not be broadcast together with shapes (3,4) (2,) 
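You can explore the compatibility rule directly. The sketch below uses np.broadcast_shapes, which is available in NumPy 1.20 and later, to compute result shapes without allocating arrays. It then shows how reshaping a vector into a column makes it broadcast along the other axis. The column values are illustrative.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 1, 2, 3]])  # shape (3, 4)
a3 = np.array([7, 8])          # shape (2,)

# np.broadcast_shapes applies the broadcasting rules to shapes only
print(np.broadcast_shapes((3, 4), (4,)))    # (3, 4)
print(np.broadcast_shapes((3, 1), (1, 4)))  # (3, 4)

# A (3, 1) column is compatible with (3, 4): each row of a1 gets its own offset
col = np.array([10, 20, 30]).reshape(3, 1)
result = a1 + col
print(result)

# a3 has trailing length 2, which matches neither 4 nor 1, so this fails
failed = False
try:
    a1 + a3
except ValueError as err:
    failed = True
    print("ValueError:", err)
```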

# Numpy Array Comparison

NumPy arrays also support comparison operators such as ==, !=, > and so on. The comparison is element-wise, and the result is an array of booleans. When booleans are used in arithmetic operations, True counts as 1 and False as 0.
In [140]:
a4 = np.array([[1, 2, 3], [3, 4, 5]])
a5 = np.array([[2, 2, 3], [1, 2, 5]])

print(a4 == a5)

print(a4 != a5)

print(a4 >= a5)

print(a4 < a5)

[[False  True  True]
 [False False  True]]
[[ True False False]
 [ True  True False]]
[[False  True  True]
 [ True  True  True]]
[[ True False False]
 [False False False]]


A common use of array comparison is counting the number of equal elements in two arrays with the sum method:

In [141]:
print((a4 == a5).sum())

3
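Besides sum, a few other functions are useful for summarizing comparisons. This short sketch reuses a4 and a5:

```python
import numpy as np

a4 = np.array([[1, 2, 3], [3, 4, 5]])
a5 = np.array([[2, 2, 3], [1, 2, 5]])

matches = a4 == a5
print(np.count_nonzero(matches))  # 3, same as matches.sum()
print(matches.mean())             # 0.5, the fraction of equal elements

# Whole-array checks return a single bool
print(np.array_equal(a4, a5))           # False
print(np.allclose([0.1 + 0.2], [0.3]))  # True despite float rounding
```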


# Numpy Array Indexing and Slicing

NumPy arrays support Python's list-style indexing and slicing. In addition, they accept a comma-separated list of indices or ranges, one per dimension, to select a specific element or subarray.


In [142]:
a6 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],

    [[15, 16, 17, 21],
     [63, 92, 36, 18]],

    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])

print(a6.shape)
print("-------------")
# Single element
print(a6[2,1,2])
print("-------------")
#subarray using ranges
print(a6[1:, 0:1, :2])
print("-------------")
# Mixing indices and ranges
print(a6[1:, 1, 3])
print("-------------")
print(a6[1:, 1, :3])
print("-------------")
# Using fewer indices
print(a6[1])
print("-------------")
print(a6[:2,1])
print("-------------")
# Using too many indices raises an IndexError
print(a6[1,3,2,1])

(3, 2, 4)
-------------
19.5
-------------
[[[15. 16.]]

 [[98. 32.]]]
-------------
[18. 43.]
-------------
[[63.  92.  36. ]
 [17.  18.  19.5]]
-------------
[[15. 16. 17. 21.]
 [63. 92. 36. 18.]]
-------------
[[13. 14. 15. 19.]
 [63. 92. 36. 18.]]
-------------


IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
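NumPy slicing differs from Python lists in one important way. Slicing a list creates a new list, but basic slicing of a NumPy array returns a view that shares memory with the original. A small sketch (the array values are illustrative):

```python
import numpy as np

a = np.arange(10)

# Basic slicing returns a view that shares memory with the original array
view = a[2:5]
view[0] = 100
print(a[2])  # 100: the original changed too

# .copy() creates independent data
b = np.arange(10)
independent = b[2:5].copy()
independent[0] = 100
print(b[2])  # 2: the original is unchanged

print(np.shares_memory(a, view), np.shares_memory(b, independent))  # True False
```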

# How to Create Numpy Arrays – Other Methods




In [143]:
# All zeros
print(np.zeros((3, 2)))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# All ones
print(np.ones([2, 2, 3]))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Identity matrix
print(np.eye(3))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Random vector - uniformly distributed random numbers in [0, 1)
print(np.random.rand(5))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Random matrix - samples from the standard normal distribution (mean 0, standard deviation 1)
print(np.random.randn(2, 3))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Random integers from 10 (inclusive) to 20 (exclusive)
print(np.random.randint(10, 20, 6))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Fixed value
print(np.full([2, 3], 42))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# Range with start, end (exclusive) and step
print(np.arange(10, 90, 3))
print("~~~~~~~~~~~~~~~~~~~~~~~~~")

# 9 equally spaced numbers from 3 to 27 (both endpoints included)
print(np.linspace(3, 27, 9))


[[0. 0.]
 [0. 0.]
 [0. 0.]]
~~~~~~~~~~~~~~~~~~~~~~~~~
[[[1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]]]
~~~~~~~~~~~~~~~~~~~~~~~~~
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
~~~~~~~~~~~~~~~~~~~~~~~~~
[0.09664555 0.19643125 0.67754233 0.13226249 0.60488857]
~~~~~~~~~~~~~~~~~~~~~~~~~
[[ 0.82169992  1.77118708  1.15275089]
 [-1.30309696  0.6665522  -0.08004355]]
~~~~~~~~~~~~~~~~~~~~~~~~~
[13 16 13 14 16 13]
~~~~~~~~~~~~~~~~~~~~~~~~~
[[42 42 42]
 [42 42 42]]
~~~~~~~~~~~~~~~~~~~~~~~~~
[10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76 79
 82 85 88]
~~~~~~~~~~~~~~~~~~~~~~~~~
[ 3.  6.  9. 12. 15. 18. 21. 24. 27.]
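The np.random.rand, randn and randint functions used above belong to NumPy's legacy random interface. For new code, the NumPy documentation recommends the Generator API. Below is a minimal sketch with a fixed seed so the results are reproducible; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded Generator makes "random" results reproducible across runs
rng = np.random.default_rng(seed=42)
uniform = rng.random(5)               # uniform in [0, 1)
normal = rng.standard_normal((2, 3))  # mean 0, standard deviation 1
ints = rng.integers(10, 20, size=6)   # 10 <= value < 20

# A second generator with the same seed produces the same sequence
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random(5))
print(same)  # True
```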


# Other Numpy methods

Below are some other popular NumPy functions and methods:

In [146]:
# 10 equally spaced numbers from 0 to 1 (both endpoints included)
print(np.linspace(0, 1, 10))
print("-----------------------------")

# Identity matrix: a square matrix with 1s on the diagonal and 0s elsewhere
print(np.eye(5))
print("-----------------------------")

# Maximum and minimum values of a NumPy array
print(a6.max())
print(a6.min())
print("-----------------------------")

# Indices of the max and min values in the flattened array
print(a6.argmax())
print(a6.argmin())

[0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]
-----------------------------
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
-----------------------------
98.0
11.0
-----------------------------
16
0
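argmax and argmin return an index into the flattened array, which is why the output above shows 16 and 0. np.unravel_index converts a flat index back into a position in the 3D array. This short sketch rebuilds a6 from the indexing section:

```python
import numpy as np

a6 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],
    [[15, 16, 17, 21],
     [63, 92, 36, 18]],
    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])

flat_index = a6.argmax()  # 16, an index into the flattened array
position = np.unravel_index(flat_index, a6.shape)
print(tuple(int(i) for i in position))  # (2, 0, 0)
print(a6[position])                     # 98.0

# With axis=2, argmax works along the last dimension instead of the flattened array
print(a6.argmax(axis=2))
```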


In [148]:
# Square root, exponential, natural logarithm and sine of every element in an array
print(np.sqrt(a6))
print(np.exp(a6))
print(np.log(a6))
print(np.sin(a6))

[[[3.31662479 3.46410162 3.60555128 3.74165739]
  [3.60555128 3.74165739 3.87298335 4.35889894]]

 [[3.87298335 4.         4.12310563 4.58257569]
  [7.93725393 9.59166305 6.         4.24264069]]

 [[9.89949494 5.65685425 9.         4.79583152]
  [4.12310563 4.24264069 4.41588043 6.55743852]]]
[[[5.98741417e+04 1.62754791e+05 4.42413392e+05 1.20260428e+06]
  [4.42413392e+05 1.20260428e+06 3.26901737e+06 1.78482301e+08]]

 [[3.26901737e+06 8.88611052e+06 2.41549528e+07 1.31881573e+09]
  [2.29378316e+27 9.01762841e+39 4.31123155e+15 6.56599691e+07]]

 [[3.63797095e+42 7.89629602e+13 1.50609731e+35 9.74480345e+09]
  [2.41549528e+07 6.56599691e+07 2.94267566e+08 4.72783947e+18]]]
[[[2.39789527 2.48490665 2.56494936 2.63905733]
  [2.56494936 2.63905733 2.7080502  2.94443898]]

 [[2.7080502  2.77258872 2.83321334 3.04452244]
  [4.14313473 4.52178858 3.58351894 2.89037176]]

 [[4.58496748 3.4657359  4.39444915 3.13549422]
  [2.83321334 2.89037176 2.97041447 3.76120012]]]
[[[-0.99999021 -0.5365

In [149]:
# Random array, then round each element to 2 decimal places
arr = np.random.rand(4)
print(arr)
print(np.round(arr, 2))
print("----------------------------------")

# Boolean indexing: keep only the elements greater than 0.7
print(arr[arr > 0.7])

[0.91981576 0.53267625 0.3916807  0.13476192]
[0.92 0.53 0.39 0.13]
----------------------------------
[0.91981576]
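The last line of the cell above indexes the array with a boolean mask. This sketch extends the idea with np.where and combined conditions. To keep the output repeatable, it uses the four values printed above as fixed input.

```python
import numpy as np

arr = np.array([0.91981576, 0.53267625, 0.3916807, 0.13476192])

mask = arr > 0.7  # boolean array, one entry per element
print(mask)       # [ True False False False]
print(arr[mask])  # keep only elements where mask is True

# np.where(condition, x, y) picks from x where condition holds, otherwise from y
print(np.where(mask, 1.0, 0.0))  # [1. 0. 0. 0.]

# Conditions can be combined with & (and) or | (or); the parentheses are required
print(arr[(arr > 0.2) & (arr < 0.6)])
```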


**That's all for now!**