### Numpy

#### 1. Introduction
- What is Numpy?
    - NumPy, short for Numerical Python, is a Python library used for scientific computing.
- What does it offer?
    - It helps us create and work with arrays and matrices of various dimensions more efficiently than built-in Python lists.
    - It provides various mathematical functions, such as Fourier transforms and linear algebra routines, which allow us to perform complex calculations.
    - Broadcasting: This is a special feature of NumPy which allows us to perform operations between arrays of different shapes.
- How is it useful?
    - Its core is written in C, which makes array operations much faster than equivalent pure-Python loops.
    - It is also integrated into other scientific python packages like scipy, pandas, which makes it an essential package to learn to work scientifically with python.
    - It is very concise, which leads to readable code.

#### 2. Numpy setup
- Install Numpy using the command `py -m pip install numpy` (on macOS/Linux, `python3 -m pip install numpy`)
- Import numpy to use in our program

In [155]:
import numpy as np
print(np.__version__)

2.2.1


#### 3. Creating Arrays

In [156]:
array1d = np.array([1,2,3,4])
print(array1d)

array2d = np.array([[1,2,3],[4,5,6]])
print(array2d)

array3d = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(array3d)

[1 2 3 4]
[[1 2 3]
 [4 5 6]]
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


##### Creating random arrays

In [157]:
#generate a random array with values in range [0,1) (1 exclusive) of provided size 
randArr = np.random.rand(3,3)
print(randArr)

#generate random integer array with provided range
randIntArr = np.random.randint(low=23, high=45, size=(2,2))
print(randIntArr)

#generate random floats array with provided range
randRangeArr = np.random.uniform(low=30, high=35, size=(10))
print(randRangeArr)

[[0.98674506 0.05530423 0.78053462]
 [0.18425695 0.59656217 0.16614951]
 [0.68657949 0.61472462 0.29051474]]
[[32 23]
 [39 33]]
[33.02289866 32.82901884 32.78387586 33.31930172 34.66213353 33.93813409
 34.88302631 30.44533435 34.34459006 34.28632771]
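
The functions above draw from NumPy's global random state, so the numbers differ on every run. As a hedged sketch, the same three kinds of arrays can be produced with a seeded `np.random.default_rng` generator, which gives repeatable results. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded Generator produces the same sequence on every run (42 is an arbitrary seed)
rng = np.random.default_rng(seed=42)

rand_floats = rng.random(size=(3, 3))                    # floats in [0, 1)
rand_ints = rng.integers(low=23, high=45, size=(2, 2))   # integers in [23, 45), high is exclusive
rand_range = rng.uniform(low=30, high=35, size=10)       # floats in [30, 35)

print(rand_floats)
print(rand_ints)
print(rand_range)
```

Re-running this block with the same seed should print the same values each time.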


##### Creating arrays with incremental values using arange method

In [180]:
print(np.arange(start=1, stop=10, step=2)) #stop is exclusive
print(np.arange(start=1, stop=10, step=1)) #stop is exclusive
print(np.arange(12)) 

[1 3 5 7 9]
[1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4  5  6  7  8  9 10 11]



##### Other inbuilt methods to generate array

In [158]:
print(np.ones(shape=(5,5)))
print(np.zeros(shape=(10)))

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
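
A few more constructors follow the same pattern as `ones` and `zeros`. The sketch below shows `np.full`, `np.eye` and `np.linspace`; the shapes and values are picked only for illustration.

```python
import numpy as np

filled = np.full(shape=(2, 3), fill_value=7)   # 2x3 array where every element is 7
identity = np.eye(3)                           # 3x3 identity matrix (float values)
spaced = np.linspace(start=0, stop=1, num=5)   # 5 evenly spaced values, stop is inclusive here

print(filled)
print(identity)
print(spaced)  # [0.   0.25 0.5  0.75 1.  ]
```

Unlike `arange`, `linspace` takes the number of points rather than a step, and includes the stop value by default.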


#### 4. Array Properties

In [159]:
print(array1d)
print(array2d)

#shape of the array
print(array1d.shape)
print(array2d.shape)

#size of the array - no of elements
print(array1d.size)
print(array2d.size)

#data type of the elements in the array
print(array1d.dtype)

# when data types are mixed, NumPy picks one common dtype that can hold every element
print(np.array([True,1,'a']).dtype) #<U21 - everything is converted to a string
print(np.array(['a']).dtype) #<U1

# bool and int together are promoted to int, since True/False can be stored as 1/0
print(np.array(True).dtype) #bool
print(np.array([True, 2]).dtype) #this returns int64 - True is stored as 1


[1 2 3 4]
[[1 2 3]
 [4 5 6]]
(4,)
(2, 3)
4
6
int64
<U21
<U1
bool
int64
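
The dtype results above come from NumPy's type promotion: when elements of different types are combined, NumPy chooses a single dtype able to represent all of them. A minimal sketch of this, using `np.result_type` to query the promoted type and `astype` to convert explicitly (variable names are illustrative):

```python
import numpy as np

# bool and int together: the bool is promoted to the integer type
mixed_bool_int = np.array([True, 2])
print(mixed_bool_int, mixed_bool_int.dtype)   # the values are stored as [1 2]

# int and float together: the int is promoted to float
mixed_int_float = np.array([1, 2.5])
print(mixed_int_float, mixed_int_float.dtype)

# np.result_type reports the promoted dtype without building an array
print(np.result_type(np.bool_, np.int64))     # int64
print(np.result_type(np.int64, np.float64))   # float64

# astype converts explicitly when the automatic choice is not the one you want
print(mixed_bool_int.astype(bool))            # [ True  True]
```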


#### 5. Array Indexing

In [160]:
print(array1d,array1d[0])

print(array2d, array2d[1,1])

print(array3d, array3d[0,1,1])

#Negative indices work as well (-1 is the last element)
print(array1d, array1d[-1])


[1 2 3 4] 1
[[1 2 3]
 [4 5 6]] 5
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]] 5
[1 2 3 4] 4


#### 6. Array Slicing

In [161]:
print(array1d)
print(array1d[1:3]) #start = 1, end = 3 (end index is exclusive)


#Getting rows and columns, and other sub arrays by slicing higher dimensional arrays
array2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(array2d)
print(array2d[1,:]) #second row - slice the second row with all the columns
print(array2d[:,0]) #first column - slice all rows of the 0 index column
print(array2d[-2:,1]) #last 2 rows of the middle column - slice the last 2 rows, keep only column index 1

print(array3d)
print(array3d[:,0,1]) #capture from 3d array the second column element of the first row, from all the available matrices
print(array3d[:,-1,:]) #capture final rows from all the matrices stored in 3d array
print(array3d[:,0,1:3]) #capture 2nd and 3rd col of the first row from all the matrices stored in 3d array

[1 2 3 4]
[2 3]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[4 5 6]
[1 4 7]
[5 8]
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
[2 2]
[[4 5 6]
 [4 5 6]]
[[2 3]
 [2 3]]
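
One detail worth knowing when slicing: a basic slice is a view onto the original array, not a copy, so writing through the slice changes the original. The sketch below illustrates this with a separate array (the names are illustrative) and shows `.copy()` as the way to get an independent array.

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: it shares memory with the original array
first_row_view = original[0, :]
first_row_view[0] = 100
print(original[0, 0])      # 100, the original changed too

# .copy() gives an independent array
first_col_copy = original[:, 0].copy()
first_col_copy[1] = -1
print(original[1, 0])      # 4, the original is untouched

print(np.shares_memory(original, first_row_view))  # True
print(np.shares_memory(original, first_col_copy))  # False
```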


#### 7. Array Operations

##### Arithmetic Operations
- We can perform element-wise arithmetic operations like addition, subtraction, multiplication and division between arrays



In [162]:
arr1 = np.array([1,2,3])
arr2 = np.array([3,4,5])

"""
if the arrays are of the same shape, the corresponding elements at the same index in the two arrays
are multiplied, and in return we get an array of the same shape holding the products
"""
print(arr1 * arr2)

arr2d1 = np.array([[1,2,3],[4,5,6]])
print(arr2d1 * arr2d1)

#Similarly we can do rest of the operations on the arrays

#For the sake of experimentation, let's see how arithmetic operations behave on arrays of different shapes
arr1 = np.array([1,2])
arr2 = np.array([4,5,6,7])

# print(arr1 * arr2) #operands could not be broadcast together with shapes (2,) (4,) 

[ 3  8 15]
[[ 1  4  9]
 [16 25 36]]



##### Scalar Operations
- Operations between an array and a scalar (a single number); the scalar is applied to every element

In [163]:

print(arr1)
print(arr1 * 8)

print(arr2)
print(arr2 - 10)

#For the sake of experimentation, let's try to do a scalar addition on an array of strings
strarr = np.array(['a','b','c'])
print(strarr + 'c') #Hey it works - it concatenates

[1 2]
[ 8 16]
[4 5 6 7]
[-6 -5 -4 -3]
['ac' 'bc' 'cc']


#### 8. numpy operations on string arrays (To be covered later)

#### 9. Ufuncs (Universal functions)
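- NumPy provides a large number of universal functions (ufuncs) that apply a mathematical operation element-wise to arrays
- The sections below group them by purpose. Some of the functions shown (for example the statistical and set functions) are regular NumPy functions rather than ufuncs in the strict sense, but they are called the same way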

##### Arithmetic uFuncs


In [164]:
#we can provide the arguments as plain Python lists or as numpy arrays
print(np.add([1,2], [3,4]))
print(np.add(np.array([1,2]), np.array([3,4])))

#modulus
print(np.mod([5,34],5))

#fmod also computes the remainder, but the result takes the sign of the dividend (np.mod uses the sign of the divisor)
print(np.fmod([5.3345,34.4353525],5))

#power (np.pow is an alias of np.power in NumPy 2.x)
print(np.pow([1,2],8))

#more such functions are available for addition, subtraction etc.

[4 6]
[4 6]
[0 4]
[0.3345    4.4353525]
[  1 256]
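
The difference between `np.mod` and `np.fmod` only shows up with negative inputs. A small sketch, with illustrative values not taken from the cell above:

```python
import numpy as np

dividends = np.array([-5, 5])

mod_result = np.mod(dividends, 3)    # remainder takes the sign of the divisor (like Python's %)
fmod_result = np.fmod(dividends, 3)  # remainder takes the sign of the dividend (like C's fmod)

print(mod_result)   # [1 2]
print(fmod_result)  # [-2  2]
```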



##### Trigonometric uFuncs


In [165]:
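#inputs are in radians
print(np.sin([0, np.pi/2]))
print(np.cos([0, np.pi/2])) #cos(pi/2) prints as ~6e-17 instead of 0 because pi/2 cannot be represented exactly as a float
print(np.tan([0, np.pi/2])) #for the same reason tan(pi/2) is a very large number instead of infinity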

[0. 1.]
[1.000000e+00 6.123234e-17]
[0.00000000e+00 1.63312394e+16]



##### Exponentiation and Logarithmic uFuncs (not that important, but just know it's there)

##### Comparison uFuncs

In [166]:
# checks whether each element of the first array is greater than the corresponding element of the second
print(np.greater([1, 5], [3, 4]))

print(np.less([[1, 2], [3, 4]], [[5, 6], [7, 1]]))

print(np.equal([1,2],[1,2]))

print(np.greater_equal([4,5],[4,3]))

#and some more useful comparison operators are available

[False  True]
[[ True  True]
 [ True False]]
[ True  True]
[ True  True]
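
The comparison functions have operator equivalents (`>`, `<`, `==`, `>=`), and the boolean array they return can be used to select elements. A minimal sketch, using made-up `scores` data:

```python
import numpy as np

scores = np.array([35, 72, 58, 90])

# The > operator is shorthand for np.greater and returns a boolean array
passed = scores > 50
print(passed)             # [False  True  True  True]

# A boolean array can be used as a mask to select the matching elements
print(scores[passed])     # [72 58 90]

# np.where picks between two values based on the condition
labels = np.where(passed, "pass", "fail")
print(labels)
```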



##### Statistical uFuncs
- Used to extract statistical information from an array, or to reduce an array to a single value (e.g. sum, product)

In [167]:
print(np.prod([1,2,3,4]))
print(np.mean([1,2,3,4]))
print(np.median([1,2,3,4]))
print(np.max([1,2,3,4]))
print(np.min([1,2,3,4]))
print(np.sum([1,2,3,4]))

#And more such statistical functions are available

24
2.5
2.5
4
1
10
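
On multi-dimensional arrays these functions accept an `axis` argument to reduce along rows or columns instead of over every element. A short sketch with an illustrative 2x3 array:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(matrix))           # 21, all elements
print(np.sum(matrix, axis=0))   # [5 7 9], one sum per column
print(np.sum(matrix, axis=1))   # [ 6 15], one sum per row
print(np.mean(matrix, axis=0))  # [2.5 3.5 4.5]
print(np.max(matrix, axis=1))   # [3 6]
```

`axis=0` collapses the rows (giving one result per column), while `axis=1` collapses the columns (giving one result per row).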


##### Set uFuncs
- Perform set-related operations on arrays


In [168]:
#Returns the sorted unique values of an array (duplicates removed)
print(np.unique([1,2,3,4,1]))
#for a higher-dimensional array, unique flattens the array first (when no axis is given) and then removes duplicates
print(np.unique([[1,2,3,4,1],[1,2,3,56,42]]))

#union for 1dimensional array
print(np.union1d([1,2,3],[4,5,6]))

#intersection for 1d array
print(np.intersect1d([1,2,3],[4,5,6,1]))

[1 2 3 4]
[ 1  2  3  4 42 56]
[1 2 3 4 5 6]
[1]


##### Other useful uFuncs

In [169]:
#Absolute value of each element (drops the negative sign)
print(np.abs([-1.4455,345.34534,-4]))

#Sign of each element: 1 for positive, -1 for negative, 0 for zero
print(np.sign([-1,1,98,-9]))


#clip the array to the provided min and max range
#if below, clips to min
#if above, clips to max
print(np.clip(a=[1,45,49,99], min=40, max=50)) 

[  1.4455  345.34534   4.     ]
[-1  1  1 -1]
[40 45 49 50]


#### 10. Broadcasting
- Broadcasting is NumPy automatically stretching arrays to a common shape during a mathematical operation when the arrays involved have different shapes

In [170]:
arr1 = [1,2,3]
arr2 = [10]

print(np.add(arr1, arr2)) #output: [11, 12, 13] 
# the second array [10] is implicitly stretched to [10,10,10] during the operation

[11 12 13]


##### Rules of broadcasting
1. Shape compatibility:
    - The arrays do not need to have the same rank (i.e. number of dimensions). If the ranks differ, the shape of the lower-rank array is padded with 1s on the left until the ranks match
    - The shapes are then compared dimension by dimension, starting from the rightmost. Two dimensions are compatible when
        - They are equal, or
        - One of them has size 1
2. Implicit expansion:
    - Wherever one array has size 1 in a dimension and the other array is larger, the size-1 dimension is stretched by repeating its values until the sizes match.

Note: During broadcasting, the smaller array is stretched, but its values are not actually copied in memory. NumPy reuses the same values for every repeated position (internally, the stretched dimension gets a stride of 0), so the calculation stays efficient.
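
Broadcasting compares shapes from the rightmost dimension, treating missing leading dimensions as size 1. As a hedged sketch, compatibility can be checked without doing any arithmetic using `np.broadcast_shapes`, and `np.broadcast_to` makes the "stretched but not copied" behaviour visible. The shapes and the `row` array are illustrative.

```python
import numpy as np

# np.broadcast_shapes returns the result shape, or raises ValueError if the shapes are incompatible
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((3, 1), (1, 3)))  # (3, 3)

try:
    np.broadcast_shapes((2, 3), (2,))       # trailing dimensions are 3 and 2: not compatible
except ValueError as err:
    print("incompatible:", err)

# np.broadcast_to exposes the stretched array as a read-only view
row = np.array([10, 20, 30])
stretched = np.broadcast_to(row, (2, 3))
print(stretched)
print(stretched.strides[0])                 # 0: both rows point at the same data
print(np.shares_memory(row, stretched))     # True, nothing was copied
```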

##### More examples of broadcasting

In [171]:

arr1 = [[1, 2, 3], [4, 5, 6]]
arr2 = [10, 20, 30]

"""
For the above arrays, let's examine if they adhere to the rules of broadcasting
- shape of arr1: (2,3)
- shape of arr2: (3,) - this can also be imagined as (1,3), 1 row with 3 columns

There's a tip to reimagine the shape of an array in a higher number of dimensions,
- if you want to reimagine a shape of (3,) as a 2 dimensional array, 
- keep the available dimension size in the rightmost place, and pad the remaining places on the left with 1
- so (3,) in 2 dimensions becomes (1,3)
- similarly (3,) in 3 dimensions becomes (1,1,3)

Let's first check if the arrays are compatible based on the broadcasting rules:
 - Shape compatibility:
    - Are the ranks the same? 
        - After padding they are: (2,3) and (1,3)
    - Are the dimensions equal?
        - The last dimension is (3 and 3). The first is not (2 and 1), so is one of the two sizes 1?
            - Yes, arr2 has size 1 in that dimension.

- Implicit expansion:
    - The first dimension is not equal, so arr2 is stretched along it, since that is where it has size 1.
    - arr2, imagined as [[10, 20, 30]], is stretched by repeating its only row,
    - so [[10,20,30]] becomes [[10,20,30],[10,20,30]]
"""

print(np.add(arr1, arr2))

# Now let's check the compatibility of another array with arr1
arr3 = [100, 200]  # shape is (2,) - can be imagined in 2 dimensions as (1,2)

# As per the rules of broadcasting arr1 (2,3) and arr3 (1,2) are not compatible: the last dimensions are 3 and 2, which are neither equal nor 1, so an error is thrown
#print(np.add(arr1, arr3)) #ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

[[11 22 33]
 [14 25 36]]


In [172]:
arr1 = [[1, 2], [2, 3], [4, 5]]  # shape (3,2)
arr2 = [[10], [20], [30]]  # shape (3,1) - compatible with arr1
arr3 = [10, 20, 30]  # shape (3,) imagined as (1,3) - incompatible with arr1

# so now arr2 is implicitly expanded from [[10],[20],[30]] to [[10,10],[20,20],[30,30]]
# as the expansion happens in the dimension of size 1, which is the column dimension,
# by repeating the single value in each row

#so implicitly addition takes place between [[1, 2], [2, 3], [4, 5]] and [[10,10],[20,20],[30,30]]
print(np.add(arr1, arr2))

#and let's confirm the incompatibility between arr1 and arr3
# print(np.add(arr1, arr3)) #ValueError: operands could not be broadcast together with shapes (3,2) (3,) 

"""
Now let's see the shapes of arr2 (3,1) and arr3 (1,3) - they are compatible
But here the stretching will happen in both arr2 and arr3, as they are both matched to the shape of (3,3)
in arr2, the stretching will happen along the columns
    - arr2 [[10], [20], [30]] will become [[10,10,10],[20,20,20],[30,30,30]]
in arr3, the stretching will happen along the rows
    - arr3 [10,20,30] will become [[10,20,30],[10,20,30],[10,20,30]]
"""

#So, here the addition takes place between  [[10,10,10],[20,20,20],[30,30,30]] and [[10,20,30],[10,20,30], [10,20,30]]
print(np.add(arr2, arr3))

[[11 12]
 [22 23]
 [34 35]]
[[20 30 40]
 [30 40 50]
 [40 50 60]]


#### 11. Array reshaping
- NOTE: The `reshape()` and `ravel()` methods are available on numpy arrays, not on plain Python lists, so convert a list with `np.array()` first
- Below are the 2 main functions used for array reshaping
- `reshape()`:
    - Used to reshape an array
    - the argument provided is the new shape, e.g. `reshape(2,3)`
    - only rule: the size of the array (i.e total no of elements) must remain the same post reshaping
    - returns the reshaped array, does not change the shape of the array on which this function is called
- `ravel()`:
    - used to flatten any array into a 1d array

##### reshape


In [176]:
arr1 = np.array([10,20,30])
print(arr1.reshape(3,1)) #returns the reshaped array, arr1 is not changed

arr2 = np.array([[1,2],[3,4],[5,6]])
print(arr2.reshape(2,3))


[[10]
 [20]
 [30]]
[[1 2 3]
 [4 5 6]]
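
Because the total number of elements must stay the same, one dimension can be left for NumPy to work out by passing `-1`. A minimal sketch with an illustrative 12-element array, including what happens when the sizes do not match:

```python
import numpy as np

values = np.arange(12)              # 12 elements: 0..11

print(values.reshape(3, 4).shape)   # (3, 4)
print(values.reshape(2, -1).shape)  # (2, 6), -1 means "work out this dimension"
print(values.reshape(-1, 3).shape)  # (4, 3)

# 12 elements cannot fill a (5, 3) array, so this raises ValueError
try:
    values.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)
```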


##### ravel

In [177]:
print(arr2)
print(arr2.ravel())

[[1 2]
 [3 4]
 [5 6]]
[1 2 3 4 5 6]
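
A related method, `flatten()`, produces the same 1d result but always returns a copy, whereas `ravel()` returns a view when it can. The sketch below, using an illustrative `grid` array, shows the practical difference:

```python
import numpy as np

grid = np.array([[1, 2], [3, 4], [5, 6]])

flat_view = grid.ravel()     # returns a view when possible (as it is for this array)
flat_copy = grid.flatten()   # always returns a copy

flat_view[0] = 99
print(grid[0, 0])            # 99, the change is visible through grid

flat_copy[1] = -1
print(grid[0, 1])            # 2, grid is unaffected
```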


