References:
1. https://numpy.org/doc/stable/
2. https://numpy.org/doc/stable/reference/index.html#reference
3. https://www.learndatasci.com/tutorials/applied-introduction-to-numpy-python-tutorial/
4. https://www.analyticsvidhya.com/blog/2020/04/the-ultimate-numpy-tutorial-for-data-science-beginners/
5. https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html
6. https://www.dataquest.io/blog/numpy-tutorial-python/
7. https://github.com/rougier/numpy-100
8. https://github.com/santoshswansi/Data-Analysis-With-Python/blob/main/NUMPY%20MODULE.ipynb
9. https://github.com/nrbnbs/Python_libraries/blob/main/PythonLibraries.ipynb

## NumPy Tutorial Based on the Official [NumPy Documentation](https://numpy.org/doc/stable/)

### NumPy is a short form of <u>Num</u>erical <u>Py</u>thon

- NumPy is the [fundamental library for scientific computing](https://numpy.org/doc/stable/) in Python. 
- It provides a **multidimensional array object**, various **derived objects** (such as masked arrays and matrices), and many **routines** for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
- At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. NumPy fully supports an object-oriented approach, starting with the class "ndarray".

**Differences between ndarrays and python sequences:**

1. NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

2. All elements in a NumPy array must be of the same data type, and therefore the same size in memory. The exception is an array of Python objects, which can hold objects of different sizes.

3. Operations on large NumPy arrays typically run faster and need less code than the same operations on Python's built-in sequences.

4. Many advanced scientific and mathematical Python packages use NumPy arrays for efficient data manipulation.
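
The first two differences can be checked directly. The following is a minimal sketch that assumes only a standard NumPy install; the variable names are illustrative.

```python
import numpy as np

# A Python list grows in place; an ndarray does not.
values = [1, 2, 3]
values.append(4)                 # same list object, now longer

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)       # returns a NEW array; arr is unchanged
print(arr.size, bigger.size)     # 3 4

# All elements share one dtype, so mixed input is converted to a common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64

# An object array stores references, so the referenced objects may differ in size.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype, objs.shape)    # object (3,)
```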



**Why is NumPy Fast?**

NumPy is fast due to **vectorization**.
Vectorization means that user code contains no explicit looping or indexing. The loops and indexing take place behind the scenes in optimized, pre-compiled C code. As a result, user code is more concise, easier to read and less prone to bugs.
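
As a rough illustration of vectorization, the sketch below multiplies two small arrays first with an explicit Python loop and then with a single vectorized expression. The arrays `a` and `b` are made up for the example.

```python
import numpy as np

a = np.arange(5)          # [0 1 2 3 4]
b = np.arange(5, 10)      # [5 6 7 8 9]

# Explicit Python loop: one multiplication per iteration, run by the interpreter.
looped = []
for i in range(len(a)):
    looped.append(a[i] * b[i])

# Vectorized form: the loop runs inside NumPy's compiled code.
vectorized = a * b

print(vectorized)         # [ 0  6 14 24 36]
```

Both forms give the same values; the vectorized one is shorter and, for large arrays, usually much faster.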

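NumPy is also fast due to **broadcasting**.
Broadcasting is the implicit element-by-element behavior of operations: arithmetic, logical, bit-wise and functional operations all act element by element. The operands do not need identical shapes, as long as the smaller one can be expanded unambiguously to the shape of the larger one (a scalar combined with an array is the simplest case).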
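
A small sketch of broadcasting with operands of different shapes. The names `matrix` and `row` are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# The scalar is treated as if it had been stretched to shape (2, 3).
doubled = matrix * 2
print(doubled)

# The 1-D row is treated as if it were repeated for each row of the matrix.
shifted = matrix + row
print(shifted)
```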


**NumPy's array class: ndarray**

**ndarray** or **array** is the class that encapsulates a multidimensional array. Some of its important attributes are:

1. ndarray.ndim - the number of axes (dimensions) of the array.
2. ndarray.shape - the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension; for an array with n rows and m columns, shape is (n, m). The length of the shape tuple is therefore the number of axes, ndim.
3. ndarray.size - the total number of elements of the array. 
4. ndarray.dtype - an object describing the type of the elements in the array. 
5. ndarray.itemsize - the size in bytes of each element of the array. 
6. ndarray.data - the buffer containing the actual elements of the array. 
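
The attributes listed above can be inspected on any array. In this sketch the dtype is set explicitly to int64 so that the printed values do not depend on the platform default.

```python
import numpy as np

a = np.arange(15, dtype=np.int64).reshape(3, 5)

print(a.ndim)      # 2
print(a.shape)     # (3, 5)
print(a.size)      # 15
print(a.dtype)     # int64
print(a.itemsize)  # 8
```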


### Creating a NumPy array
There are several ways to create a NumPy array.
The examples assume that NumPy is already installed in your Python environment.

In [174]:
# Import NumPy and create an ndarray with the array() function
import numpy as np
# Let's create a 1-dimensional array
a = np.array([2, 3, 4])
a

array([2, 3, 4])

In [175]:
# 2-D array, two rows, three columns
b = np.array([(1.5, 2, 3), (4, 5, 6)])
b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [2]:
# Creating ndarray using another numpy function arange()
import numpy as np
a = np.arange(15).reshape(3, 5)  # Reshape the 15 values produced by arange(15) into 3 rows and 5 columns
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [6]:
np.zeros((3, 4)) # Create a 2D array of all zeros arranged in three rows, four columns

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [176]:
# Create a 3D array made of two 2D arrays, each with 3 rows and 4 columns. Each element is a 16-bit integer
np.ones((2, 3, 4), dtype=np.int16) 

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [177]:
# Create an uninitialized array with two rows and three columns; its contents are arbitrary (whatever was already in memory)
np.empty((2, 3))

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [180]:
# Create a 3x4 array containing -1 as elements in all positions
np.full((3,4),-1)

array([[-1, -1, -1, -1],
       [-1, -1, -1, -1],
       [-1, -1, -1, -1]])

In [181]:
# Create an array of 5 random integers from 10 (inclusive) to 20 (exclusive)
a=np.random.randint(10,20,5)
a

array([13, 11, 19, 10, 17])

### Universal Functions "ufuncs"
NumPy provides familiar mathematical functions such as add, subtract, sqrt, exp, sin, cos, ceil, floor, conj, invert, maximum and minimum. These are called universal functions (ufuncs): they operate elementwise on an array and produce an array as output. NumPy also provides many other array routines that are not ufuncs, some of which reduce or reorder an array instead of working elementwise: all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, clip, corrcoef, cov, cross, cumprod, cumsum, diff, dot, inner, lexsort, max, mean, median, min, nonzero, outer, prod, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where etc.
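
One way to check whether a given NumPy function is a ufunc is to test it against the `np.ufunc` type. A minimal sketch, using an illustrative array `x`:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

# np.sqrt is a ufunc object and works element by element.
print(isinstance(np.sqrt, np.ufunc))   # True
print(np.sqrt(x))                      # [1. 2. 3.]

# np.sum is an ordinary function that reduces the array to one value.
print(isinstance(np.sum, np.ufunc))    # False
print(np.sum(x))                       # 14.0
```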

In [182]:
# arange() is a NumPy function similar to Python's built-in range(), but it returns an array
b = np.arange(3)
b

array([0, 1, 2])

In [183]:
np.exp(b) # e raised to the power of each element of b

array([1.        , 2.71828183, 7.3890561 ])

In [184]:
c = np.array([2., -1., 4.])
np.add(c, b) #add arrays c and b

array([2., 0., 6.])

In [187]:
# Set operations - intersection and difference on two arrays
a=np.array([1, 2, 3, 4, 5])
b=np.array([4, 5, 6, 7, 8])
print(np.intersect1d(a,b))
print(np.setdiff1d(a,b))

[4 5]
[1 2 3]


In [192]:
# Mean,  median, standard deviation
print(a.mean())
print(np.mean(a))
print(np.median(a))
print(np.std(a))

3.0
3.0
3.0
1.4142135623730951


In [195]:
# Saving an ndarray to a file and loading an ndarray from a file
# np.save() appends the .npy extension if the file name does not already have it
np.save('a_arr_file',a)
a_fromfile = np.load('a_arr_file.npy')
a_fromfile

array([1, 2, 3, 4, 5])

#### Using magic command %load to display contents of a text file in jupyter notebook

In [None]:
# %load data_file.txt
1, 2, 3, 4, 5, 6, 7

#### Generating a numpy array by loading data from a text file

In [227]:
# Generating a numpy array by loading data from a text file
file_text = np.genfromtxt('data_file.txt', delimiter=',')
file_text = file_text.astype('int16')  # genfromtxt returns floats by default; convert to int16
print(file_text)

[1 2 3 4 5 6 7]
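
`np.genfromtxt` also accepts a file-like object, which makes it easy to try without a file on disk. In the sketch below, the `io.StringIO` object is a hypothetical stand-in for `data_file.txt` and holds the same single line of text.

```python
import io
import numpy as np

# Hypothetical stand-in for data_file.txt, with the same single line of text.
text = io.StringIO("1, 2, 3, 4, 5, 6, 7")

values = np.genfromtxt(text, delimiter=',')
print(values.dtype)              # float64

as_int = values.astype('int16')  # convert the floats to 16-bit integers
print(as_int)                    # [1 2 3 4 5 6 7]
```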


#### Copying an array - using copy() function and using assignment operator

In [199]:
# Copying an array - using copy() function and using assignment operator
# Deep copy Using copy()
a = np.array([[1,2,3], [4,5,6]])
b = a.copy()
print('a: \n', a)
print('b: \n',b)
b[0] = 89
print('a: \n', a) # Change in b doesn't change a
print('b: \n',b)


# Plain assignment makes no copy at all: 'b' is just another name for the same array object as 'a'
a = np.array([[1,2,3], [4,5,6]])
b = a
print('a: \n', a)
print('b: \n',b)
b[0] = 67
print('a: \n', a) # Change in b changes a
print('b: \n',b)

a: 
 [[1 2 3]
 [4 5 6]]
b: 
 [[1 2 3]
 [4 5 6]]
a: 
 [[1 2 3]
 [4 5 6]]
b: 
 [[89 89 89]
 [ 4  5  6]]
a: 
 [[1 2 3]
 [4 5 6]]
b: 
 [[1 2 3]
 [4 5 6]]
a: 
 [[67 67 67]
 [ 4  5  6]]
b: 
 [[67 67 67]
 [ 4  5  6]]
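
Between plain assignment and `copy()` there is a third case, the view (often called a shallow copy): a new array object that shares its data with the original. The sketch below compares the three, using `np.shares_memory` to check whether two arrays use the same data. Variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

same = a            # no copy: a second name for the same object
view = a.view()     # shallow copy: new array object, shared data buffer
deep = a.copy()     # deep copy: new array object with its own data

print(same is a)                   # True
print(view is a)                   # False
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, deep))   # False

view[0, 0] = 99     # visible through a, because the data is shared
deep[0, 1] = -1     # not visible through a
print(a[0, 0], a[0, 1])            # 99 2
```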


#### Indexing, Slicing 
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas.

In [2]:
# This creates an array with 10 elements
import numpy as np
a = np.array([   0,    1,   16,   81,  256,  625, 1296, 2401, 4096, 6561])
a

array([   0,    1,   16,   81,  256,  625, 1296, 2401, 4096, 6561])

In [3]:
# print element at index 5
a[5]

625

In [4]:
# Slice of the elements at indices 3, 4, 5 and 6; note that the stop index 7 is excluded
a[3:7]

array([  81,  256,  625, 1296])

In [5]:
# Set every second element from index 0 up to (but not including) index 6 to -78
a[:6:2] = -78
a

array([ -78,    1,  -78,   81,  -78,  625, 1296, 2401, 4096, 6561])

In [6]:
# The elements of a in reverse order (a itself is not modified)
a[::-1]  

array([6561, 4096, 2401, 1296,  625,  -78,   81,  -78,    1,  -78])

In [229]:
# Multidimensional array indexing
b = np.array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

b

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

In [230]:
# Element in row 2, col 2
b[2, 2]

6

In [44]:
# Elements in rows 0, 1, and 2 in column 1
b[0:3, 1]  

array([3, 4, 5])

In [45]:
# All columns of row 1; slicing with 1:2 keeps the result two-dimensional
b[1:2, :]  

array([[3, 4, 5]])

In [231]:
# second last row
b[-2]  

array([3, 4, 5])

In [232]:
#  last row
b[-1]  

array([4, 5, 6])

In [113]:
# Another example of indexing in a multidimensional array
a = np.array([[3, 22, 33, 34], [35, 26, 27,38], [39, 12, 13, 1], [4, 5, 6, 7]])
print("\n Array a:\n",  a)
print("\n Shape of a:", a.shape)
print("\n Element at index [0, 1] is ", a[0, 1])


 Array a:
 [[ 3 22 33 34]
 [35 26 27 38]
 [39 12 13  1]
 [ 4  5  6  7]]

 Shape of a: (4, 4)

 Element at index [0, 1] is  22


In [117]:
# Subsetting rows and columns of 'a' into another array 'b'

b = a[0:2, 0:2]
print(b, '\n')

b = a[1:2, 1:3]
print(b, '\n')

b = a[:, :]
print(b, '\n')

[[ 3 22]
 [35 26]] 

[[26 27]] 

[[ 3 22 33 34]
 [35 26 27 38]
 [39 12 13  1]
 [ 4  5  6  7]] 



In [124]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print('a : \n', a , '\n')
print("Elements at index (0,2) and (1, 2) in a: ", a[[0, 1],[2, 2]])


a : 
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]] 

Elements at index (0,2) and (1, 2) in a:  [3 6]


In [140]:
# Boolean Indexing to identify elements that satisfy some conditions
a = np.array([[1, 2, 3], [1, 3, 4], [3, 5, 6]])
# Condition is that the element in a should be greater than 4 or less than 3
cond = np.logical_or(a > 4, a<3)
print('Boolean Mask of Elements that satisfy the condition a > 4 or a<3:\n', cond)

print('\n\nElements that satisfy the cond a>4 or a<3:\n', a[cond])
print('\n\nAlso elements that satisfy the cond a>4 or a<3:\n',a[np.logical_or(a>4, a<3)])

Boolean Mask of Elements that satisfy the condition a > 4 or a<3:
 [[ True  True False]
 [ True False False]
 [False  True  True]]


Elements that satisfy the cond a>4 or a<3:
 [1 2 1 5 6]


Also elements that satisfy the cond a>4 or a<3:
 [1 2 1 5 6]
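
The same mask can also be written with the operators `|`, `&` and `~`, which act elementwise on boolean arrays. A brief sketch, reusing the array from the cell above:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 3, 4], [3, 5, 6]])

# | is the element-wise "or" for boolean arrays.
# The parentheses are required because | binds tighter than the comparisons.
mask = (a > 4) | (a < 3)
selected = a[mask]
print(selected)                      # [1 2 1 5 6]

# ~ negates a mask: elements that are neither > 4 nor < 3.
rest = a[~mask]
print(rest)                          # [3 3 4 3]
```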


### numpy [where()](https://numpy.org/doc/stable/reference/generated/numpy.where.html)
numpy.where(condition, [x, y, ]/)

Return elements chosen from x or y depending on condition.

#### Called with x and y, where() outputs an array with elements from x where condition is True, and elements from y elsewhere

#### Called with only a condition, where() returns the indices of the elements that satisfy it, as a tuple with one index array per axis
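
A short sketch of both calling forms of `np.where`. The replacement value 0 and the name `clipped` are arbitrary choices for the illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 5], [14, 15, 16, 17]])

# Three arguments: take from x where the condition holds, otherwise from y.
# Here x is the array itself and y is the scalar 0 (broadcast to a's shape).
clipped = np.where(a > 15, a, 0)
print(clipped)

# One argument: a tuple of index arrays, one per axis.
rows, cols = np.where(a > 15)
print(rows, cols)      # [1 1] [2 3]
```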

In [26]:
# With only a condition, where() returns the indices of the elements that satisfy it

a = np.array([[1, 2, 3, 5], [14, 15, 16, 17]])
print(a, '\n a.shape\n', a.shape )
print("indices where elements are >15 :", np.where(a>15))

b = np.where(a>15)
print('indices: ', b)
  
print("\nElements which are >15" , a[b])
print('\n')

# Printing index of every element, iterating row by row
n_rows = a.shape[0]  # number of rows in a
for s in range(n_rows): # for each row
    for i in a[s]:  # for each column in that row
        print('Index of {} is {}'.format(i, np.where(a==i)))# print the index of the element
    

[[ 1  2  3  5]
 [14 15 16 17]] 
 a.shape
 (2, 4)
indices where elements are >15 : (array([1, 1], dtype=int64), array([2, 3], dtype=int64))
indices:  (array([1, 1], dtype=int64), array([2, 3], dtype=int64))

Elements which are >15 [16 17]


Index of 1 is (array([0], dtype=int64), array([0], dtype=int64))
Index of 2 is (array([0], dtype=int64), array([1], dtype=int64))
Index of 3 is (array([0], dtype=int64), array([2], dtype=int64))
Index of 5 is (array([0], dtype=int64), array([3], dtype=int64))
Index of 14 is (array([1], dtype=int64), array([0], dtype=int64))
Index of 15 is (array([1], dtype=int64), array([1], dtype=int64))
Index of 16 is (array([1], dtype=int64), array([2], dtype=int64))
Index of 17 is (array([1], dtype=int64), array([3], dtype=int64))


### Changing the shape of an array
An array has a shape given by the number of elements along each axis:

In [47]:
a = np.arange(10)**4
a

array([   0,    1,   16,   81,  256,  625, 1296, 2401, 4096, 6561],
      dtype=int32)

In [48]:
a.shape

(10,)

In [55]:
a = a.reshape(5, 2) # Changes the shape of your array to 5 rows, 2 columns
a

array([[   0,    1],
       [  16,   81],
       [ 256,  625],
       [1296, 2401],
       [4096, 6561]], dtype=int32)

In [56]:
a.shape

(5, 2)

In [57]:
a= a.ravel()  # flattens array to single dimension

In [58]:
a.shape

(10,)

In [63]:
a = a.reshape(2, 5)# 2 rows, 5 columns
a

array([[   0,    1,   16,   81,  256],
       [ 625, 1296, 2401, 4096, 6561]], dtype=int32)

In [64]:
a_transpose = a.T
a_transpose

array([[   0,  625],
       [   1, 1296],
       [  16, 2401],
       [  81, 4096],
       [ 256, 6561]], dtype=int32)

The reshape function returns its argument with a modified shape and leaves the original array unchanged, whereas the ndarray.resize method modifies the array itself.

In [66]:
a.resize((5, 2))
a

array([[   0,    1],
       [  16,   81],
       [ 256,  625],
       [1296, 2401],
       [4096, 6561]], dtype=int32)
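
The difference between the two is easiest to see side by side. A minimal sketch with small illustrative arrays:

```python
import numpy as np

a = np.arange(6)

b = a.reshape(2, 3)        # returns an array with the new shape
print(a.shape, b.shape)    # (6,) (2, 3)

c = np.arange(6)
result = c.resize((2, 3))  # changes c itself and returns None
print(result, c.shape)     # None (2, 3)
```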

Several arrays can be stacked together along different axes:

In [73]:
# Let us create 2 arrays, of different shapes
a = np.arange(10).reshape(5,2)
print("a\n", a)
b = np.arange(16).reshape(8,2)
print("\n\n b\n", b)

a
 [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


 b
 [[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]]


In [74]:
# Both arrays have the same number of columns, so they can be stacked vertically without error
np.vstack((a, b))

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

#### Since a and b have a different number of rows, they cannot be stacked horizontally
np.hstack((a, b)) will raise a ValueError.
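
The failure can be observed safely by catching the exception. In this sketch, `c` is an extra array introduced only to show a case that does work.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)   # 5 rows
b = np.arange(16).reshape(8, 2)   # 8 rows

try:
    np.hstack((a, b))
    failed = False
except ValueError as err:
    failed = True
    print("hstack failed:", err)

# Arrays with the same number of rows stack horizontally without error.
c = np.arange(15).reshape(5, 3)
stacked = np.hstack((a, c))
print(stacked.shape)              # (5, 5)
```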

#### Mathematical Operations on NumPy arrays

In [215]:
a = np.array([[1, 2], [3, 4], [5,6], [7,8]])
b = np.arange(11, 19).reshape(4, 2)
print('Sum of a and b: \n', a + b)
print('Sum of a and b using ufunc add: \n',np.add(a, b))

print('Division of a and b using ufunc divide: \n',np.divide(a, b))

print(' Sum of elements in array a: \n', np.sum(a))
print(' Sum of each column of a (axis=0 sums down the rows): \n',np.sum(a, axis=0))
print(' Sum of each row of a (axis=1 sums across the columns): \n',np.sum(a, axis=1))


Sum of a and b: 
 [[12 14]
 [16 18]
 [20 22]
 [24 26]]
Sum of a and b using ufunc add: 
 [[12 14]
 [16 18]
 [20 22]
 [24 26]]
Division of a and b using ufunc divide: 
 [[0.09090909 0.16666667]
 [0.23076923 0.28571429]
 [0.33333333 0.375     ]
 [0.41176471 0.44444444]]
 Sum of elements in array a: 
 36
 Sum of each column of a (axis=0 sums down the rows): 
 [16 20]
 Sum of each row of a (axis=1 sums across the columns): 
 [ 3  7 11 15]
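
The `axis` argument names the axis that is collapsed by the reduction. A small sketch on a non-square array, where the two results have different lengths and are easy to tell apart:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)

# axis=0 collapses the rows: one sum per column.
col_sums = np.sum(a, axis=0)
print(col_sums)                    # [5 7 9]

# axis=1 collapses the columns: one sum per row.
row_sums = np.sum(a, axis=1)
print(row_sums)                    # [ 6 15]
```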


In [213]:
# Multiplying two arrays: np.matmul() and np.dot() compute the matrix product and need compatible dimensions, while the * operator multiplies elementwise (with broadcasting)
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([2, 3])
print(' Multiplication of two arrays using np.matmul() :\n', np.matmul(m1,m2))
print(' Multiplication of two arrays using * operator:\n', m1*m2)

#Dot product
print(' Dot product:\n', np.dot(m1,m2))

# Determinant of matrix (two dimensional array)
print(' Determinant of matrix :\n', np.linalg.det(m1)) 


 Multiplication of two arrays using np.matmul() :
 [ 8 18]
 Multiplication of two arrays using * operator:
 [[ 2  6]
 [ 6 12]]
 Dot product:
 [ 8 18]
 Determinant of matrix :
 -2.0000000000000004
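
The `@` operator is a shorthand for `np.matmul`. The sketch below contrasts it with `*` on two square matrices; the matrix `m2` here is chosen for the illustration and differs from the one in the cell above.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[0, 1], [1, 0]])

elementwise = m1 * m2    # multiplies matching positions
print(elementwise)

product = m1 @ m2        # matrix product, same as np.matmul(m1, m2)
print(product)
```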


In [218]:
# Minimum and maximum of an array
a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
# minimum of the entire array
print('Min of a: ', np.min(a))
# minimum of row 1 (index 0)
print('Min of row1: ', np.min(a[0,:]))
# minimum of col 2 (index 1)
print('Min of col2: ', np.min(a[:,1]))

print('Max of entire a: ', np.max(a))
# maximum of row 2 (index 1)
print('Max of row2: ', np.max(a[1,:]))


Min of a:  1
Min of row1:  1
Min of col2:  2
Max of entire a:  10
Max of row2:  10


In [216]:
# Operations on single array
a =np.array([[1,2,3,4,5], [6, 7, 8, 9, 10]])
print(a**5)   # every element of a is raised to power 5
print(a//5)   # floor division: integer quotient of every element divided by 5
print(a%5)    # remainder of every element divided by 5
print(np.sin(a))  # sine of all elements in a
print(np.cos(a))  # cosine of all the elements in a 

[[     1     32    243   1024   3125]
 [  7776  16807  32768  59049 100000]]
[[0 0 0 0 1]
 [1 1 1 1 2]]
[[1 2 3 4 0]
 [1 2 3 4 0]]
[[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
 [-0.2794155   0.6569866   0.98935825  0.41211849 -0.54402111]]
[[ 0.54030231 -0.41614684 -0.9899925  -0.65364362  0.28366219]
 [ 0.96017029  0.75390225 -0.14550003 -0.91113026 -0.83907153]]


### Applying NumPy functions to the titanic.csv dataset
#### Based on https://www.learndatasci.com/tutorials/applied-introduction-to-numpy-python-tutorial/
##### The data is available at https://github.com/datasciencedojo/datasets/blob/master/titanic.csv

In [35]:
# Example on a dataset
# Since the input file is a CSV, we use the csv module
import csv
data = [] # an empty list

with open("titanic.csv", 'r') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',') # delimiter for file is comma, file is read row by row
    for row in file_reader:
        data.append(row) # append rows to data list
        
data = np.array(data) #convert the data to ndarray



In [36]:
print("Number of rows is {}, number of columns is {}".format(data.shape[0], data.shape[1]))

Number of rows is 892, number of columns is 12


In [38]:
print("Headers of all columns :", data[0, :]) #first row is the header or column names row

Headers of all columns : ['PassengerId' 'Survived' 'Pclass' 'Name' 'Sex' 'Age' 'SibSp' 'Parch'
 'Ticket' 'Fare' 'Cabin' 'Embarked']


In [44]:
print("Only names: ", data[1:11, 3]) # 10 rows of column 3, i.e. name column

Only names:  ['Braund, Mr. Owen Harris'
 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)'
 'Heikkinen, Miss. Laina' 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'
 'Allen, Mr. William Henry' 'Moran, Mr. James' 'McCarthy, Mr. Timothy J'
 'Palsson, Master. Gosta Leonard'
 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)'
 'Nasser, Mrs. Nicholas (Adele Achem)']


In [70]:
# Subset the first 25 rows of columns 3, 4 and 1 (name, sex and survived); the first row holds the column names
namesexsurvived25 = data[0:25, [3, 4, 1]]
namesexsurvived25

array([['Name', 'Sex', 'Survived'],
       ['Braund, Mr. Owen Harris', 'male', '0'],
       ['Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'female',
        '1'],
       ['Heikkinen, Miss. Laina', 'female', '1'],
       ['Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'female', '1'],
       ['Allen, Mr. William Henry', 'male', '0'],
       ['Moran, Mr. James', 'male', '0'],
       ['McCarthy, Mr. Timothy J', 'male', '0'],
       ['Palsson, Master. Gosta Leonard', 'male', '0'],
       ['Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)', 'female',
        '1'],
       ['Nasser, Mrs. Nicholas (Adele Achem)', 'female', '1'],
       ['Sandstrom, Miss. Marguerite Rut', 'female', '1'],
       ['Bonnell, Miss. Elizabeth', 'female', '1'],
       ['Saundercock, Mr. William Henry', 'male', '0'],
       ['Andersson, Mr. Anders Johan', 'male', '0'],
       ['Vestrom, Miss. Hulda Amanda Adolfina', 'female', '0'],
       ['Hewlett, Mrs. (Mary D Kingcome) ', 'female', '1'],
       ['Rice, Ma

In [71]:
namesexsurvived25.shape

(25, 3)

In [72]:
namesexsurvived25[0]

array(['Name', 'Sex', 'Survived'], dtype='<U82')

In [81]:
# Males in the first 25 rows who survived

sex25 = namesexsurvived25[:,1]
survived25 = namesexsurvived25[:,2]
names25 = namesexsurvived25[:,0]

# Comparisons already return boolean masks, so np.where(..., True, False) is not needed
onlymales_mask = sex25 == 'male'
onlysurvived_mask = survived25 == '1'
maleandsurvived = np.logical_and(onlysurvived_mask, onlymales_mask)
namesofmaleswhosurvived = names25[maleandsurvived]
print("Males who survived in first 25 rows", namesofmaleswhosurvived)

Males who survived in first 25 rows ['Williams, Mr. Charles Eugene' 'Beesley, Mr. Lawrence'
 'Sloper, Mr. William Thompson']


In [95]:
# Total number of passengers who survived
# Since the elements of data are strings, we extract the Survived column (skipping the header row) and cast it to int
survived = data[1:, 1].astype(int)
survived
print("Total number of passengers who survived" , survived.sum())

Total number of passengers who survived 342
