
In [2]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from skimage.io import imread 

# NumPy - A multidimensional array object

In [3]:
# Generate some random data
data = np.random.randn(2,3)
print(data,'\n')
print(data * 10)

[[ 0.23742729 -0.52931968  0.50686927]
 [ 1.50572329 -0.58569508 -0.85955582]] 

[[ 2.37427294 -5.29319679  5.06869272]
 [15.05723287 -5.85695085 -8.5955582 ]]


In [4]:
# Every array has a shape, a tuple indicating the size of each dimension,
# and a dtype, an object describing the data type of the array:
print("data shape = ", data.shape)
print("data dtype = ", data.dtype)

data shape =  (2, 3)
data dtype =  float64


# Creating ndarrays

In [5]:
data1 = [1, 2.5, 3, 7.5, 4]
arr1 = np.array(data1)
print(arr1)
arr1

[1.  2.5 3.  7.5 4. ]


array([1. , 2.5, 3. , 7.5, 4. ])

# Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:


In [6]:
# Both inner lists must have the same length to form a 2D array
data2 = [[1, 2, 3, 4, 5.5], [2.5, 1.2, 3.2, 3, 1.9]]
arr2 = np.array(data2)
print(arr2)
print("dimension of arr2=", arr2.ndim)
print("shape of arr2=", arr2.shape)

[[1.  2.  3.  4.  5.5]
 [2.5 1.2 3.2 3.  1.9]]
dimension of arr2= 2
shape of arr2= (2, 5)


  


In [7]:
# Arrays of zeros and identity matrices
print(np.zeros(10),'\n')
print(np.zeros((3, 6)),'\n')
print(np.eye(3))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]] 

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [8]:
# arange is an array-valued version of the built-in Python range function
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]
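
Like range, arange also accepts start, stop, and step arguments. The following is a minimal sketch; the variable names evens and halves are illustrative and do not come from the original notebook.

```python
import numpy as np

# start=2, stop=10 (exclusive), step=2
evens = np.arange(2, 10, 2)
print(evens)

# Unlike range, arange also accepts a floating-point step
halves = np.arange(0, 2, 0.5)
print(halves)
```

For non-integer steps, np.linspace is often the safer choice because the number of elements is given explicitly.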


# Data Types for ndarrays

In [9]:
arr1 = np.array([1,2,3,4], dtype=np.float64)
arr2 = np.array([1,3,2,5], dtype=np.int32)
print("data type of arr1=", arr1.dtype)
print("data type of arr2=", arr2.dtype)

data type of arr1= float64
data type of arr2= int32


In [10]:
#You can explicitly convert or cast an array from one dtype to another using ndarray’s
#astype method:
arr = np.array([1,3,4,5])
print(arr.dtype,'\n')

float_arr = arr.astype(np.float64)
print(float_arr.dtype)

int64 

float64


In [11]:
#If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr.astype(np.int32))

[ 3 -1 -2  0 12 10]
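
To make the truncation behaviour concrete, the sketch below compares astype with explicit rounding functions on the same values. The names truncated, floored, and rounded are illustrative only.

```python
import numpy as np

vals = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# astype drops the fractional part, i.e. it rounds toward zero
truncated = vals.astype(np.int32)

# np.floor rounds toward negative infinity before the cast
floored = np.floor(vals).astype(np.int32)

# np.rint rounds to the nearest integer (ties go to the even neighbour)
rounded = np.rint(vals).astype(np.int32)

print(truncated)
print(floored)
print(rounded)
```

Note how the negative values differ: truncation maps -2.6 to -2, while floor maps it to -3.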


In [12]:
numeric_string = np.array(['1.23', '4.56', '5.65'])
print(numeric_string)
print(numeric_string.astype(np.float64))

['1.23' '4.56' '5.65']
[1.23 4.56 5.65]


# Arithmetic with NumPy Arrays

In [13]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
print(arr)
print('\n',arr.dtype)

[[1. 2. 3.]
 [4. 5. 6.]]

 float64


Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization. Any arithmetic
operation between equal-size arrays is applied element-wise:

In [14]:
print(arr * arr, '\n' )
print(arr + arr, '\n')
print(1 / arr)

[[ 1.  4.  9.]
 [16. 25. 36.]] 

[[ 2.  4.  6.]
 [ 8. 10. 12.]] 

[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]


In [15]:
#Comparisons between arrays of the same size yield boolean arrays:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2>arr

array([[False,  True, False],
       [ True, False,  True]])

# Basic Indexing and Slicing

In [16]:
arr = np.arange(10)
print(arr)
print(arr[5])
print(arr[5:8])
arr[5:8] = 12
print(arr)


[0 1 2 3 4 5 6 7 8 9]
5
[5 6 7]
[ 0  1  2  3  4 12 12 12  8  9]
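
One consequence of slice assignment worth illustrating: a basic slice is a view on the original array rather than a copy, so writing through the slice changes the source. The sketch below assumes nothing beyond NumPy itself; the names view and copied are illustrative.

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: it shares memory with arr
view = arr[5:8]
view[1] = 12345          # this write is visible in arr

# .copy() gives an independent array
copied = arr[5:8].copy()
copied[:] = 0            # arr is not affected

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copied))
```

Use .copy() whenever a slice needs to be modified without touching the original data.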


In [17]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[2])
print(arr2d[0][2])
print(arr2d[0,2])

[7 8 9]
3
3


# Indexing with slices

In [18]:
print(arr2d)
print()
print(arr2d[:2])

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[1 2 3]
 [4 5 6]]
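
Slices can also be given for several axes at once, and mixed with integer indexes. This is a small sketch on the same 3 x 3 array; the result names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# First two rows, columns from index 1 onward
top_right = arr2d[:2, 1:]

# Integer index on axis 0, slice on axis 1: the result is 1D
second_row_head = arr2d[1, :2]

# A bare colon selects the whole axis; slicing keeps the result 2D
first_col = arr2d[:, :1]

print(top_right)
print(second_row_head)
print(first_col)
```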


# Transposing Arrays and Swapping Axes

In [19]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [20]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [21]:
# For higher dimensional arrays, transpose will accept a tuple of axis numbers to
# permute the axes (for extra mind bending):
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [22]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
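
The heading also mentions swapping axes. As a hedged sketch, ndarray.swapaxes takes a pair of axis numbers and exchanges them, which is equivalent to a transpose with the corresponding permutation. The names swapped and permuted are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

# Exchange axes 1 and 2: shape (2, 2, 4) becomes (2, 4, 2)
swapped = arr.swapaxes(1, 2)
print(swapped.shape)

# The same result expressed as a full permutation of the axes
print(np.array_equal(swapped, arr.transpose((0, 2, 1))))

# For transpose((1, 0, 2)), element [i, j, k] comes from arr[j, i, k]
permuted = arr.transpose((1, 0, 2))
print(permuted[1, 0, 2] == arr[0, 1, 2])
```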

# Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [26]:
arr = np.arange(10)
print(np.sqrt(arr))
print(arr)
print(np.exp(arr))

[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]
[0 1 2 3 4 5 6 7 8 9]
[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
 5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03
 2.98095799e+03 8.10308393e+03]


These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays
(thus, binary ufuncs) and return a single array as the result:

In [31]:
x = np.random.randn(10)
print(x)
y = np.random.randn(10)
print()
print(y)

[ 0.89505663  1.70766924 -1.77607996 -0.4789596   0.68579852 -1.15229331
 -1.28117146 -1.17213992 -0.9349124  -0.21458391]

[ 1.11916836 -0.96887216  0.79238701  0.23692975 -0.89300529 -1.1606944
  0.57142673 -1.05891984 -1.01106954  0.20875752]


Applied to these arrays, np.maximum(x, y) computes the element-wise maximum of the elements in x and y.
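
A minimal sketch of two binary ufuncs, using small fixed arrays (chosen here for illustration instead of random data) so the results are easy to check by hand:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0, -3.5])
y = np.array([1.5, -2.0, 2.0, 0.0])

# Element-wise maximum of the two arrays
largest = np.maximum(x, y)
print(largest)

# Element-wise sum; equivalent to x + y
total = np.add(x, y)
print(total)
```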

# Array-Oriented Programming with Arrays

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations are often one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in numerical computations.
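
As an illustration of what replacing a loop with an array expression looks like, the sketch below computes v * v + 1 for every element both ways and checks that the results agree. The expression and the names are made up for this example; actual speedups depend on the machine and array size, so no timings are shown.

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)

# Explicit Python loop: one interpreted iteration per element
loop_result = []
for v in values:
    loop_result.append(v * v + 1.0)

# Vectorized form: a single array expression, no Python-level loop
vec_result = values * values + 1.0

print(np.allclose(loop_result, vec_result))
```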

# Expressing Conditional Logic as Array Operations

In [32]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.5,2.6,2.7,2.8,2.9])
cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in
cond is True, and otherwise take the value from yarr. A list comprehension doing
this might look like:

In [33]:
result = [(x if c else y)
          for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.6, 1.3, 1.4, 2.9]

This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With np.where you can write this very concisely:

In [34]:
result = np.where(cond, xarr, yarr)
result

array([1.1, 2.6, 1.3, 1.4, 2.9])

The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with –2. This is very easy to do with np.where:

In [39]:
arr = np.random.randn(4,4)
print(arr)
print()
print(arr>0)
print()
print(np.where(arr > 0, 2, -2))

[[-1.80909063  0.25080459 -0.19943397 -0.26299628]
 [ 1.2372832  -1.12086635  0.45183309 -0.54264353]
 [ 1.66280753 -1.87749533 -2.63102169  1.05572336]
 [-2.09762885  0.35086834  0.91668062  0.48385779]]

[[False  True False False]
 [ True False  True False]
 [ True False False  True]
 [False  True  True  True]]

[[-2  2 -2 -2]
 [ 2 -2  2 -2]
 [ 2 -2 -2  2]
 [-2  2  2  2]]
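
Scalars and arrays can also be combined in np.where. As a small sketch with fixed values (chosen for illustration), the call below replaces only the positive entries with 2 and keeps every other entry as it is:

```python
import numpy as np

arr = np.array([[-1.5, 0.25], [2.0, -0.75]])

# Where arr > 0 take the scalar 2, otherwise take the value from arr itself
result = np.where(arr > 0, 2, arr)
print(result)
```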


# Mathematical and Statistical Methods

You can use aggregations (often called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or by using the top-level NumPy function:

In [40]:
arr = np.random.randn(5, 4)
print(arr)

[[ 1.25040319  0.85199599  0.54991495  0.72825115]
 [ 1.41105493  0.40559748 -0.02199548 -1.62269658]
 [-0.63356295 -0.57489727  1.18343599 -1.04675327]
 [-1.05190334 -0.43675853  0.79296819  0.13268241]
 [-2.1828166   0.5601189  -0.14147502  0.18565952]]


In [46]:
print("mean:", arr.mean())
print("mean:", np.mean(arr))
print("sum:", arr.sum())
print("sum:", np.sum(arr))

mean: 0.016961183187938093
mean: 0.016961183187938093
sum: 0.3392236637587619
sum: 0.3392236637587619


In [48]:
print(arr.mean(axis=1))
print(arr.sum(axis=0))

[ 0.84514132  0.04299009 -0.26794438 -0.14075282 -0.3946283 ]
[-1.20682477  0.80605657  2.36284863 -1.62285677]


Here, arr.mean(axis=1) means “compute the mean across the columns” (one value per row), whereas arr.sum(axis=0) means “compute the sum down the rows” (one value per column).
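
The axis argument is easier to follow on a small array with known values. This is a minimal sketch using a fixed 2 x 3 array in place of random data; the result names are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])

# axis=1 collapses the columns: one mean per row
row_means = arr.mean(axis=1)
print(row_means)

# axis=0 collapses the rows: one sum per column
col_sums = arr.sum(axis=0)
print(col_sums)
```

The axis that is passed is the one that disappears from the result's shape.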

# Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. Unlike some languages such as MATLAB, multiplying two two-dimensional arrays with * is an element-wise product instead of a matrix dot product. Thus, there is a function dot, both an array method and a function in the numpy namespace, for matrix multiplication.
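
The difference between * and dot can be sketched with two small matrices. The values and the names m1 and m2 are chosen for illustration only.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# * multiplies matching elements
elementwise = m1 * m2
print(elementwise)

# dot performs matrix multiplication; method and function forms agree
matrix_product = m1.dot(m2)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(m1, m2)))

# The @ operator is another spelling of matrix multiplication
print(np.array_equal(matrix_product, m1 @ m2))
```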

# NumPy offers a separate module for linear algebra named linalg

In [52]:
mat1 = np.arange(4).reshape(2,2)
mat2 = (np.arange(4)*2).reshape(2,2)
mat3 = (np.arange(4)*3).reshape(2,2)
print(np.linalg.multi_dot( [mat1, mat2, mat3] ))

[[ 36  66]
 [132 234]]


# Performing multiple dot products in one go


As the example above shows, a chain of dot products can be computed by passing the matrices as a list to the multi_dot function, which illustrates how convenient the linalg module is.

The linalg module can also solve a system of linear equations, for example 3X + Y = 9 and X + 2Y = 8:

In [56]:
a = np.array([[3, 1], [1, 2]])   # coefficient matrix
b = np.array([9, 8])             # right-hand side
# Check whether the system has a unique solution: the determinant must be non-zero
print(np.linalg.det(a))
# The determinant is approximately 5 (non-zero), so a unique solution exists
print(np.linalg.solve(a, b))
# A non-zero determinant also means the matrix is invertible
print(np.linalg.inv(a))
# Rank of the matrix
print(np.linalg.matrix_rank(a))

5.000000000000001
[2. 3.]
[[ 0.4 -0.2]
 [-0.2  0.6]]
2
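
A quick way to sanity-check these results is to substitute the solution back into the system. The sketch below does that with np.allclose, which tolerates the small floating-point error visible in the determinant above; the names x and a_inv are illustrative.

```python
import numpy as np

a = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(a, b)

# Substituting the solution back should reproduce b
print(np.allclose(a @ x, b))

# The inverse times the original matrix should be the identity
a_inv = np.linalg.inv(a)
print(np.allclose(a_inv @ a, np.eye(2)))

# Multiplying the inverse by b gives the same solution as solve
print(np.allclose(a_inv @ b, x))
```

In practice solve is generally preferred over computing the inverse explicitly, since it avoids an extra matrix product.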
