## What is NumPy?
* NumPy is an extension library (package) for Python.
* NumPy is written specifically to work with homogeneous multidimensional arrays.
* NumPy is designed for scientific computing.

## Basics
* NumPy's main class is `ndarray`, which is also known by the alias `array`.
* Useful attributes of an `ndarray` object are:

    * `ndarray.ndim`: the number of dimensions of the array.

    * `ndarray.shape`: a tuple of integers giving the size of the array in each dimension.

    * `ndarray.size`: the total number of elements in the array.

    * `ndarray.dtype`: the type of the array elements.

## Importing NumPy
* The first thing to do before we can work with NumPy is to import the library into the workspace.

In [2]:
import numpy as np

## Creating one dimensional arrays
* Arrays can be created from Python sequences such as a list or a tuple. The type of the resulting array depends on the type of the elements in the sequence.
* The easiest way to create an array is to use the <b>array</b> function.


In [3]:
data = [1,2,3,4,5,6]  # this is a python list
arr = np.array(data)
arr

array([1, 2, 3, 4, 5, 6])

In [4]:
data1 = np.arange(10)
data1
print(type(data1))

<class 'numpy.ndarray'>
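The way the element types determine the array type can be checked directly. The following is a minimal sketch; the variable names are only illustrative, and the exact integer dtype (for example int32 or int64) depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])          # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])       # one float promotes every element to float64
from_tuple = np.array((1.0, 2.0))   # tuples work the same way as lists

print(ints.dtype, mixed.dtype, from_tuple.dtype)
```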


## Creating multidimensional arrays
* Nested sequences, such as a list of equal-length lists, are converted into a multidimensional array.

In [5]:
mulData = [[1, 2, 3, 4], [5, 6, 7, 8]]
mulArr = np.array(mulData)
mulArr

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

## Checking the properties of multidimensional arrays

In [6]:
print("The number of dimensions in mulArr : ", mulArr.ndim)
print("The shape of mulArr : ", mulArr.shape)
print("The size of mulArr : ", mulArr.size)
print("The data type of mulArr : ", mulArr.dtype)

The number of dimensions in mulArr :  2
The shape of mulArr :  (2, 4)
The size of mulArr :  8
The data type of mulArr :  int32


## Other arrays
* Zero array/matrix

In [7]:
zeroArr = np.zeros(10)
zeroArr

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [8]:
zeroMat = np.zeros((10,10))
zeroMat

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

* Identity matrix

In [9]:
identityArr = np.eye(5)
identityArr

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

* Empty array (its memory is not initialized, so it contains arbitrary garbage values)

In [10]:
emptyMat = np.empty((2,2,2))
emptyMat

array([[[0.000e+000, 0.000e+000],
        [0.000e+000, 0.000e+000]],

       [[0.000e+000, 6.225e-321],
        [0.000e+000, 0.000e+000]]])
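Because `np.empty` leaves its contents undefined, a few other constructors are handy when defined starting values are needed. This is a short sketch using standard NumPy functions; the variable names are illustrative only.

```python
import numpy as np

ones = np.ones((2, 3))            # filled with 1.0
sevens = np.full((2, 3), 7)       # filled with a chosen constant
like = np.zeros_like(sevens)      # same shape and dtype as an existing array, filled with 0

print(ones)
print(sevens)
print(like.shape, like.dtype == sevens.dtype)   # (2, 3) True
```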

## Other array creation functions
![title](ArrayCreationFunctions.PNG)

## Converting data types of ndarrays

In [12]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int32')

In [17]:
convertedArray = arr.astype(np.float64)
print(convertedArray)
print(convertedArray.dtype)

[1. 2. 3. 4. 5.]
float64


* By default, calling astype creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.
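The copy behaviour of `astype` can be verified with `np.shares_memory`. The sketch below also shows that converting floats to integers truncates toward zero; the values are made up for illustration.

```python
import numpy as np

a = np.array([1.7, -2.3, 3.9])
same = a.astype(a.dtype)        # same dtype, but still a fresh copy by default
as_int = a.astype(np.int64)     # float -> int truncates toward zero

print(np.shares_memory(a, same))  # False
print(as_int)                     # [ 1 -2  3]
```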

## Other data types
![title](NumpyDataType.PNG)


## Operations between Arrays and Scalars
* Arrays are important because they enable you to express batch operations on data without writing any for loops. 
* This is usually called <b>vectorization</b>. 
* Any arithmetic operation between equal-size arrays is applied elementwise.

In [19]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [20]:
arr * arr 

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [22]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [24]:
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [26]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
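To make the idea of vectorization concrete, here is a small sketch that computes the same result with an explicit loop and with a single array expression. The array contents are arbitrary.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit Python loop: one element at a time
looped = np.empty_like(values)
for i in range(values.size):
    looped[i] = values[i] * 2 + 1

# Vectorized: the same arithmetic applied to the whole array at once
vectorized = values * 2 + 1

print(vectorized)                          # [3. 5. 7. 9.]
print(np.array_equal(looped, vectorized))  # True
```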

## Broadcasting
Broadcasting is NumPy's terminology for performing mathematical operations between arrays with different shapes.

#### Broadcasting Rules

1. If the input arrays do not all have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.

2. If an array has a size of 1 along a particular dimension it acts as if it has the size of the largest shape along that dimension by copying along that dimension.

![title](broadcasting.png)

In [31]:
a = np.array([[[1,3,4],[9,8, 5]],[[2,1,5],[8,6,2]]])
b = np.array([6,5,3])
print("Shape of a :", np.shape(a))
print("Shape of b :",np.shape(b))
c = a + b
print(c) # Elementwise addition with broadcasting
print("Shape of c :", np.shape(c))

Shape of a : (2, 2, 3)
Shape of b : (3,)
[[[ 7  8  7]
  [15 13  8]]

 [[ 8  6  8]
  [14 11  5]]]
Shape of c : (2, 2, 3)
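The two rules can also be seen with a column and a row. In this sketch (names and shapes chosen only for illustration) a `(3, 1)` array is combined with a `(4,)` array, and a second case shows what happens when the shapes are not compatible.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4) by rule 1

table = col + row                  # both size-1 axes are stretched (rule 2)
print(table.shape)                 # (3, 4)
print(table)

# Shapes that differ in a dimension where neither size is 1 cannot be broadcast
try:
    np.ones((2, 3)) + np.ones((2, 4))
except ValueError as err:
    print("Incompatible shapes:", err)
```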


## Indexing and Slicing of ndarrays
* One-dimensional arrays can be indexed and sliced like Python lists

In [34]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [36]:
arr[5]

5

In [38]:
arr[5:8]

array([5, 6, 7])

## Array slices are views of an array, so any changes made on the slice will be reflected back in the main array

In [41]:
arr[5:8] = 24
arr

array([ 0,  1,  2,  3,  4, 24, 24, 24,  8,  9])

In [44]:
arr_slice = arr[5:8]
arr_slice

array([24, 24, 24])

In [46]:
arr_slice[:] = 200
print(arr_slice)
print(arr)

[200 200 200]
[  0   1   2   3   4 200 200 200   8   9]


* The main reason for not copying data when selecting a slice is that NumPy is meant to be used with extremely large datasets. If every slicing and indexing action made a copy of the data, memory use would grow quickly. If you need a copy of the slice, you must explicitly call the copy method.

In [53]:
arr_copied_slice = arr[5:8].copy()
arr_copied_slice

array([200, 200, 200])

In [54]:
arr_copied_slice[:] = 100
arr_copied_slice

array([100, 100, 100])

In [56]:
arr

array([  0,   1,   2,   3,   4, 200, 200, 200,   8,   9])
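Whether two arrays share the same underlying data can be checked rather than inferred. The following sketch uses `np.shares_memory` and the `base` attribute; the variable names are illustrative.

```python
import numpy as np

base = np.arange(10)
view = base[2:5]             # basic slice: shares memory with base
copied = base[2:5].copy()    # explicit copy: owns its own data

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(view.base is base)               # True
```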

* <b>Indexing on multidimensional arrays</b>

In [100]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d)
print("Shape of arr2d is : ", arr2d.shape)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape of arr2d is :  (3, 3)


In [101]:
# Get the first row from the matrix
arr2d[0]

array([1, 2, 3])

In [102]:
# Get all the rows except the first one
arr2d[1:]

array([[4, 5, 6],
       [7, 8, 9]])

In [103]:
# Get the first column from the matrix
arr2d[:,0]

array([1, 4, 7])

In [104]:
# Get all columns except the first one
arr2d[:,1:]

array([[2, 3],
       [5, 6],
       [8, 9]])

In [105]:
# Get a specific value from the array
arr2d[1,1]

5

In [107]:
arr2d[0:2, 0:1]

array([[1],
       [4]])

In [108]:
arr2d[:,1:3]

array([[2, 3],
       [5, 6],
       [8, 9]])

![title](SlicingandIndexing.PNG)

## In multidimensional arrays, if you omit later indices, the returned object will be a lower-dimensional ndarray consisting of all the data along the higher dimensions. So in the 2 × 2 × 3 array arr3d, arr3d[0] is a 2 × 3 array

In [88]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3d)
print("Shape of arr3d is : ", arr3d.shape)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
Shape of arr3d is :  (2, 2, 3)


In [90]:
print(arr3d[0])
print(arr3d[0].shape)

[[1 2 3]
 [4 5 6]]
(2, 3)


In [109]:
# Get a specific value in the array
# Suppose 8
print("Method 1 : ", arr3d[1][0][1]) # One way to specify the indices
print("Method 2 : ", arr3d[1,0,1] ) # Other way to specify the indices

Method 1 :  8
Method 2 :  8


## Boolean Indexing and Fancy Indexing

In [123]:
data = np.random.randn(7, 4)
data

array([[-0.44167523, -1.97200878,  0.73650949, -0.91808634],
       [ 0.48332645, -2.13185797, -1.6858922 , -0.74864627],
       [ 0.93635429,  0.71244322,  1.48521273,  0.34951415],
       [-1.29792783, -0.13529247, -0.33548242,  0.32998531],
       [-1.06636579,  0.59646383,  0.59677129,  0.13626078],
       [ 1.4119929 , -0.667308  ,  0.30486825, -0.32670151],
       [-0.38431003,  1.62260944, -0.55842758, -0.52550116]])

In [124]:
# Boolean values based on the condition
data>0

array([[False, False,  True, False],
       [ True, False, False, False],
       [ True,  True,  True,  True],
       [False, False, False,  True],
       [False,  True,  True,  True],
       [ True, False,  True, False],
       [False,  True, False, False]])

In [125]:
data[data>0]

array([0.73650949, 0.48332645, 0.93635429, 0.71244322, 1.48521273,
       0.34951415, 0.32998531, 0.59646383, 0.59677129, 0.13626078,
       1.4119929 , 0.30486825, 1.62260944])

In [127]:
data[data<0] = 0
data

array([[0.        , 0.        , 0.73650949, 0.        ],
       [0.48332645, 0.        , 0.        , 0.        ],
       [0.93635429, 0.71244322, 1.48521273, 0.34951415],
       [0.        , 0.        , 0.        , 0.32998531],
       [0.        , 0.59646383, 0.59677129, 0.13626078],
       [1.4119929 , 0.        , 0.30486825, 0.        ],
       [0.        , 1.62260944, 0.        , 0.        ]])
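Boolean masks can be combined before they are used for indexing. This sketch uses a small made-up array so the results are easy to follow; `np.where` is shown as a way to get the zeroed result without modifying the original array.

```python
import numpy as np

values = np.array([-3, -1, 0, 2, 5, 8])

in_range = (values > 0) & (values < 6)   # parentheses are required around each comparison
print(values[in_range])                  # [2 5]

outside = ~in_range                      # ~ negates a boolean mask
print(values[outside])                   # [-3 -1  0  8]

clipped = np.where(values < 0, 0, values)  # non-mutating alternative to values[values < 0] = 0
print(clipped)                             # [0 0 0 2 5 8]
```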

In [139]:
arr = np.empty((10,10))

for i in range(10):
    arr[i] = i
    
arr

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

In [141]:
arr[[7,8,9]]

array([[7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

In [143]:
arr[[-3,-2,-1]] # example of fancy indexing

array([[7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

In [149]:
arr[-3:,:] # basic slicing, not fancy indexing: this returns a view

# Fancy indexing (e.g. arr[[-3,-2,-1]]), by contrast, makes a copy of the data

array([[7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

In [153]:
arr_copy = arr[[-3,-2,-1]] # fancy indexing returns a copy
arr_copy[:] = 56 # modifies only the copy
arr # the original array is unchanged

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
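The difference between fancy indexing and slicing can be checked on a small array. In this sketch (the array `grid` is made up for illustration) passing two index lists selects individual elements pairwise, and `np.shares_memory` shows which result is a copy.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

rows = grid[[2, 0]]             # pick rows 2 and 0, in that order
pairs = grid[[0, 2], [1, 3]]    # pick elements (0, 1) and (2, 3)

print(rows)
print(pairs)                             # [ 1 11]
print(np.shares_memory(grid, rows))      # False: fancy indexing copies
print(np.shares_memory(grid, grid[1:]))  # True: slicing returns a view
```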

## Shape Manipulation
* The function `reshape` changes the shape of an array.

In [155]:
a=np.linspace(1,6,6) # Create a 1-D array of 6 evenly spaced values from 1 to 6
b=a.reshape(3,2) # Reshape the array into a 3 by 2 array
print("a = ",a)
print("b = ",b)

a =  [1. 2. 3. 4. 5. 6.]
b =  [[1. 2.]
 [3. 4.]
 [5. 6.]]


In [157]:
a = np.array([[1,2,3],[9,8,7]]) # create a 2 by 3 array
b=a.ravel()
print("a = ",a)
print("b = ",b)

a =  [[1 2 3]
 [9 8 7]]
b =  [1 2 3 9 8 7]
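Two related details are worth a quick sketch: `reshape` accepts `-1` for one dimension that NumPy should infer, and `ravel` returns a view when it can while `flatten` always copies. Variable names are illustrative.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, -1)      # -1 lets NumPy infer the missing dimension (3 here)
print(b.shape)            # (2, 3)

flat_view = b.ravel()     # returns a view when possible
flat_copy = b.flatten()   # always returns a copy

print(np.shares_memory(b, flat_view))   # True
print(np.shares_memory(b, flat_copy))   # False
```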


## Universal Functions: Fast Element-wise Array Functions
* A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [159]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [161]:
np.sqrt(arr) # Unary functions because they take only one input

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [162]:
np.square(arr)

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

In [169]:
a = np.arange(10)
b = np.arange(11,21)
print(a)
print(b)
np.add(a,b) # Binary funcs

[0 1 2 3 4 5 6 7 8 9]
[11 12 13 14 15 16 17 18 19 20]


array([11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
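A few more ufunc features, sketched with small made-up arrays: a binary ufunc (`np.maximum`), a ufunc method (`reduce`), and the optional `out` argument that writes the result into an existing array.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])

print(np.maximum(x, y))       # elementwise maximum: [ 2.  4. 10.]
print(np.add.reduce(x))       # ufunc method, same as x.sum(): 14.0

out = np.empty_like(x)
np.sqrt(x, out=out)           # write the result into an existing array
print(out)                    # [1. 2. 3.]
```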

## More Unary and Binary Funcs
![title](UnaryFuncs.PNG)
![title](BinaryFuncs.PNG)

## Transposing Arrays
* Transposing is a special form of reshaping that likewise returns a view of the underlying data without copying anything. Arrays have the transpose method and also the special T attribute.

In [173]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [175]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [178]:
arr.transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
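That the transpose is a view can be demonstrated by writing through it. This is a small sketch with an illustrative array `m`; it also shows a typical use of `T` in a matrix product.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
t = m.T

print(np.shares_memory(m, t))   # True: no data was copied
t[0, 1] = 99                    # writes through to m[1, 0]
print(m[1, 0])                  # 99

gram = m.T @ m                  # matrix product of the transpose of m with m
print(gram.shape)               # (3, 3)
```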

## Mathematical and Statistical Methods
* Mathematical functions that compute statistics over an entire array, or over the data along one axis, are available as array methods. Aggregations (often called reductions) such as sum, mean, and std (standard deviation) can be called either as array instance methods or as top-level NumPy functions.

In [180]:
arr = np.random.randn(5, 4)
arr

array([[-0.16729736,  1.60096175,  1.17475752, -0.0229628 ],
       [ 0.42084669, -0.20722808,  1.42739655, -0.17164631],
       [ 1.47865594,  0.23397392,  0.35566835, -1.55927152],
       [ 1.10564572,  0.24653163,  0.72569172,  0.4056782 ],
       [-0.14831486,  0.62073961, -1.31847265, -0.59474493]])

In [181]:
arr.mean()

0.2803304549175599

In [182]:
np.mean(arr)

0.2803304549175599

In [184]:
arr.sum()

5.606609098351198

In [186]:
arr[0,:].sum() # Sum of the first row (all column values in row 0); equal to the first entry of arr.sum(axis = 1)

2.585459117175821

In [188]:
arr.sum(axis = 1) 

array([ 2.58545912,  1.46936886,  0.50902669,  2.48354726, -1.44079283])

In [191]:
arr.sum(axis = 0) 

array([ 2.68953612,  2.49497884,  2.36504149, -1.94294735])
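Because the random data above makes the axis behaviour hard to verify by eye, here is a sketch with a small fixed array. The `keepdims` and `cumsum` calls are extra illustrations beyond the examples above.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))                  # collapse rows, one value per column: [5 7 9]
print(m.sum(axis=1))                  # collapse columns, one value per row: [ 6 15]
print(m.mean(axis=1, keepdims=True))  # keeps the reduced axis with size 1, shape (2, 1)
print(m.cumsum(axis=1))               # running totals along each row
```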

## More statistical methods
![title](Stats.PNG)

## Random Number Generation
* The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

In [192]:
samples = np.random.normal(size=(4, 4)) # Normal distribution
samples

array([[-1.64170625, -2.17406365,  0.33684239, -0.24918339],
       [ 2.49938082, -0.02790401,  1.44144598, -1.24432311],
       [-0.53293175, -0.4650931 ,  0.99184992, -0.94799895],
       [-0.12765985, -0.95166656, -0.44826878,  0.16545961]])

In [196]:
samples = np.random.randn(4,4) #  Normal distribution with mean 0 and standard deviation 1
samples

array([[-0.13538805,  0.81632488,  1.27598777, -0.55188616],
       [ 0.15316133, -0.62950222, -1.05543509,  1.93330572],
       [ 0.87048813, -0.06435712,  0.16863916, -0.35082557],
       [ 0.16646026,  0.78495259,  0.64916174,  0.0648836 ]])
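For reproducible results, a seeded generator can be used. The sketch below relies on `np.random.default_rng`, which is available in NumPy 1.17 and later and is an alternative to the `np.random.randn` style shown above; the seed value and distribution parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)                  # the seed value 42 is arbitrary
normal = rng.standard_normal((4, 4))                  # mean 0, standard deviation 1
shifted = rng.normal(loc=10.0, scale=2.0, size=1000)  # mean 10, standard deviation 2
dice = rng.integers(1, 7, size=5)                     # integers from 1 to 6 inclusive

# A second generator with the same seed reproduces the same first draw
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(normal, rng_again.standard_normal((4, 4))))  # True
```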

## More Random Generator Functions
![title](RAndom.PNG)

## Linear Algebra
* The numpy.linalg package provides built-in linear algebra functions.

In [204]:
from numpy.linalg import inv, qr
X = np.random.randn(3, 3)
X

array([[-1.20100954,  0.02563032, -0.19475529],
       [ 0.59588279, -0.2812012 ,  0.65247839],
       [ 0.56961918, -0.01305011, -1.59246025]])

In [205]:
mat = X.T.dot(X)
mat

array([[ 2.12196623, -0.2057788 , -0.28439229],
       [-0.2057788 ,  0.07990133, -0.16768757],
       [-0.28439229, -0.16768757,  2.99958732]])

In [206]:
inv(mat)

array([[ 0.71518659,  2.24794269,  0.19347515],
       [ 2.24794269, 21.24459947,  1.40077699],
       [ 0.19347515,  1.40077699,  0.43003107]])

In [207]:
mat.dot(inv(mat)) # Get an identity matrix

array([[ 1.00000000e+00,  1.17959106e-15,  1.87617434e-17],
       [-3.67596671e-17,  1.00000000e+00, -4.54966527e-17],
       [ 2.91996619e-17,  5.95734177e-16,  1.00000000e+00]])
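Since the product above equals the identity only up to rounding error, `np.allclose` is a convenient way to compare. The sketch also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly; the matrix and right-hand side are made up for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)      # solves A @ x = b without forming inv(A)
print(x)                       # [2. 3.]
print(np.allclose(A @ x, b))   # True

# The inverse times the matrix matches the identity up to rounding error
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))  # True
```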

## More Linear Algebra built in functions
![title](LinAlg.png)

## File Input and Output with Arrays
* NumPy is able to save and load data to and from disk either in text or binary format.
* np.save writes the array to a binary .npy file on disk (the .npy extension is appended if it is missing)

In [211]:
arr = np.arange(20).reshape(5,4)
print(arr)
np.save('some_array', arr)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]


In [214]:
np.load('some_array.npy')

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
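The text format and multi-array archives can be sketched as follows, using `np.savetxt`, `np.loadtxt`, and `np.savez`. The file names are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:       # temporary folder, removed afterwards
    txt_path = os.path.join(tmp, "arr.csv")      # hypothetical file names
    npz_path = os.path.join(tmp, "arrays.npz")

    np.savetxt(txt_path, arr, delimiter=",", fmt="%d")   # plain-text format
    from_text = np.loadtxt(txt_path, delimiter=",", dtype=int)

    np.savez(npz_path, first=arr, second=arr * 2)        # several arrays in one binary file
    with np.load(npz_path) as archive:
        second = archive["second"]

print(np.array_equal(arr, from_text))   # True
print(second)
```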