# Week-1 Assignment

*Welcome to your first assignment for the SimuTech Winter Project 2022! I hope you are excited to implement and test everything you have learned up until now. There is an interesting set of questions for you to refine your acquired skills as you delve into hands-on coding and deepen your understanding of numpy, pandas, and data visualization libraries.*

# Section 0 : Importing Libraries

*Let's begin by importing numpy, pandas and matplotlib.*

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Section 1 : Playing with Python and Numpy

### Q1. Matrix Multiplication

##### (i) Check if matrix multiplication is valid

In [2]:
def isValid(A,B):
  # A·B is defined only when A's column count equals B's row count.
  return A.shape[1]==B.shape[0]

##### (ii) Using loops (without using numpy)

In [3]:
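def matrix_multiply(A,B):
  # np.zeros only allocates the (float) result; the product itself is computed with plain loops.
  R=np.zeros((A.shape[0],B.shape[1]))
  for i in range(A.shape[0]):
    for j in range(B.shape[1]):
      # R[i][j] is the dot product of row i of A and column j of B.
      for k in range(A.shape[1]):
        R[i][j]+=A[i][k]*B[k][j]
  return R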

##### (iii) Using numpy

In [4]:
def matrix_multiply_2(A,B):
    return(np.matmul(A,B)) 
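
*The loop version above still relies on `np.zeros` to allocate its result. As a hedged sketch, here is one way the same triple loop could be written on plain nested lists and compared against numpy. The name `matmul_pure` and the small matrices `P` and `Q` are illustrative assumptions, not part of the assignment.*

```python
import numpy as np

def matmul_pure(A, B):
    """Multiply two matrices given as nested lists, using only Python loops."""
    rows, inner, cols = len(A), len(B), len(B[0])
    if len(A[0]) != inner:
        raise ValueError("columns of A must equal rows of B")
    # Start from an all-zero result of shape rows x cols.
    R = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                R[i][j] += A[i][k] * B[k][j]
    return R

# Illustrative inputs (assumed for this sketch only).
P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]

loop_result = matmul_pure(P, Q)
numpy_result = np.array(P) @ np.array(Q)  # @ is the operator form of np.matmul

print(loop_result)
print(np.array_equal(loop_result, numpy_result))
```

*Comparing the two results with `np.array_equal` is a quick way to confirm that a hand-written loop agrees with the library routine.*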

##### (iv) Testing your code

Run the following cell to check if your functions are working properly.

*Expected output:*
[ [102 108 114]
 [246 261 276]
 [390 414 438]
 [534 567 600] ]

In [5]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

B = np.array([
    [13, 14, 15],
    [16, 17, 18],
    [19, 20, 21],
])

if isValid(A,B):
    print(f"Result using loops:\n{matrix_multiply(A,B)}")
    print(f"Result using numpy:\n{matrix_multiply_2(A,B)}")
else:
    print("Matrix multiplication is not valid")

Result using loops:
[[102. 108. 114.]
 [246. 261. 276.]
 [390. 414. 438.]
 [534. 567. 600.]]
Result using numpy:
[[102 108 114]
 [246 261 276]
 [390 414 438]
 [534 567 600]]


### Q2. Z-Score Normalisation

Z-score normalization refers to the process of normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.

We use the following formula to perform a z-score normalization on every value in a dataset:

New value = (x – μ) / σ

where:

x: Original value

μ: Mean of data

σ: Standard deviation of data
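
*To make the formula concrete, the sketch below applies it to the sample values used later in this question and checks that the result has mean 0 and standard deviation 1. It assumes σ is the population standard deviation (divide by N), which is also numpy's default for `np.std`.*

```python
import numpy as np

# Sample values taken from the test cell of this question.
x = np.array([4, 7, 7, 15, 32, 47, 63, 89, 102, 131], dtype=float)

mu = x.mean()
sigma = x.std()          # population standard deviation (ddof=0)
z = (x - mu) / sigma     # New value = (x - μ) / σ, applied element-wise

# After normalisation the mean should be ~0 and the standard deviation ~1.
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
```

*If the sample standard deviation (divide by N - 1) were wanted instead, `x.std(ddof=1)` would be the call to use, and the normalised values would differ slightly.*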

##### (i) Without using numpy

In [6]:
def mean(x):
    return sum(x)/len(x)

In [7]:
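def standard_deviation(x):
    m = mean(x)
    # Use a name other than `sum` so the built-in function is not shadowed.
    total = 0
    for i in x:
        total += (i - m) ** 2
    # Population standard deviation: divide by N, not N - 1.
    return (total / len(x)) ** 0.5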


In [8]:
def zscore_normalisation(x):
  m = mean(x)
  std = standard_deviation(x)
  # Build a new list so the caller's data is not modified in place.
  return [(v - m) / std for v in x]


##### (ii) Using numpy

Numpy has built-in functions for calculating the mean and standard deviation.

In [9]:
def zscore_normalisation_2(x):
  # Convert to float so that integer input is not truncated on assignment.
  x=np.array(x, dtype=float)
  m=np.mean(x)
  std=np.std(x)
  # Broadcasting normalises every element at once; no loop is needed.
  return (x-m)/std


##### (iii) Testing your code

Run the following cell to check if your functions are working properly.

*Expected Output:* [-1.06753267 -0.99745394 -0.99745394 -0.81057732 -0.41346451 -0.06307086
  0.31068237  0.91803138  1.22170588  1.89913361]

In [10]:
x = [4, 7, 7, 15, 32, 47, 63, 89, 102, 131]
print(f"Result without using numpy:\n{zscore_normalisation(x)}")
print(f"Result using numpy:\n{zscore_normalisation_2(x)}")

Result without using numpy:
[-1.0675326683028088, -0.9974539373420117, -0.9974539373420117, -0.8105773214465528, -0.41346451266870277, -0.06307085786471743, 0.3106823739262003, 0.9180313755864415, 1.2217058764165623, 1.8991336090376005]
Result using numpy:
[-1.06753267 -0.99745394 -0.99745394 -0.81057732 -0.41346451 -0.06307086
  0.31068237  0.91803138  1.22170588  1.89913361]


### Q3. Sigmoid fn and its derivative

The sigmoid function is a mathematical function that maps any input value to a value between 0 and 1.

It is defined mathematically as s(x) = 1/(1+e^(-x)).
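
*Part (ii) below asks for the derivative, so here is a short derivation of the standard identity s'(x) = s(x)(1 - s(x)), obtained with the chain rule.*

```latex
\begin{aligned}
s(x)  &= \left(1 + e^{-x}\right)^{-1} \\
s'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
       = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
       = s(x)\,\bigl(1 - s(x)\bigr)
\end{aligned}
```

*Since 0 < s(x) < 1 for every x, the derivative is always positive, and its largest value is 1/4 at x = 0.*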

##### (i) Write a fn to implement sigmoid fn

In [11]:
def sigmoidfn(x):
  s=1/(1+np.exp(-x))
  return s


##### (ii) Write a fn to implement derivative of sigmoid fn

In [12]:
def derivative(x):
  # Evaluate the sigmoid once and reuse it: s'(x) = s(x) * (1 - s(x)).
  s=sigmoidfn(x)
  der=s*(1-s)
  return der
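
*A quick way to gain confidence in an analytic derivative is to compare it with a central finite difference. The sketch below does that for the sigmoid. The names `sigmoid` and `sigmoid_derivative`, the test points, and the step size `h` are assumptions chosen for illustration.*

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Illustrative test points and step size (assumed for this sketch).
pts = np.array([-2.0, 0.0, 2.0, 5.0])
h = 1e-5

# Central difference: (f(x + h) - f(x - h)) / (2h) approximates f'(x).
numeric = (sigmoid(pts + h) - sigmoid(pts - h)) / (2 * h)
max_err = np.max(np.abs(numeric - sigmoid_derivative(pts)))

print(max_err < 1e-8)
```

*If the two disagreed by more than a tiny amount, that would point to a mistake in the analytic formula or its implementation.*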

##### (iii) Test your code

Run the following cell to check if your functions are working properly.

*Expected output:*

x on applying sigmoid activation fn is: [ [0.99987661 0.88079708 0.99330715 0.5        0.5       ]
 [0.99908895 0.99330715 0.5        0.5        0.5       ] ]

x on applying derivative of sigmoid activation fn is: [ [1.23379350e-04 1.04993585e-01 6.64805667e-03 2.50000000e-01
  2.50000000e-01]
 [9.10221180e-04 6.64805667e-03 2.50000000e-01 2.50000000e-01
  2.50000000e-01] ]

In [13]:
x = np.array([
    [9,2,5,0,0],
    [7,5,0,0,0]
])
print(f"x on applying sigmoid activation fn is:\n{sigmoidfn(x)}")
print(f"x on applying derivative of sigmoid activation fn is:\n{derivative(x)}")

x on applying sigmoid activation fn is:
[[0.99987661 0.88079708 0.99330715 0.5        0.5       ]
 [0.99908895 0.99330715 0.5        0.5        0.5       ]]
x on applying derivative of sigmoid activation fn is:
[[1.23379350e-04 1.04993585e-01 6.64805667e-03 2.50000000e-01
  2.50000000e-01]
 [9.10221180e-04 6.64805667e-03 2.50000000e-01 2.50000000e-01
  2.50000000e-01]]


# Section 2: Exploring Pandas

*You have been provided with a dataset which includes information about properties of superheated vapor.*

*The dataset consists of the thermophysical properties: specific volume, specific internal energy, specific enthalpy, specific entropy of superheated vapor.*

*Pressure is in kPa and temperature is in degrees Celsius. In the dataframe, the column labels 75, 100, 125, etc. are temperatures.*

### Read the csv file


In [14]:
data=pd.read_csv('superheated_vapor_properties.csv')
data.head()

Unnamed: 0,Pressure,Property,Liq_Sat,Vap_Sat,75,100,125,150,175,200,...,425,450,475,500,525,550,575,600,625,650
0,1.0,V,1.0,129200.0,160640.0,172180.0,183720.0,195270.0,206810.0,218350.0,...,,333730.0,,356810.0,,379880.0,,402960.0,,426040.0
1,1.0,U,29.334,2385.2,2480.8,2516.4,2552.3,2588.5,2624.9,2661.7,...,,3049.9,,3132.4,,3216.7,,3302.6,,3390.3
2,1.0,H,29.335,2514.4,2641.5,2688.6,2736.0,2783.7,2831.7,2880.1,...,,3383.6,,3489.2,,3596.5,,3705.6,,3816.4
3,1.0,S,0.106,8.9767,9.3828,9.5136,9.6365,9.7527,9.8629,9.9679,...,,10.82,,10.9612,,11.0957,,11.2243,,11.3476
4,10.0,V,1.01,14670.0,16030.0,17190.0,18350.0,19510.0,20660.0,21820.0,...,,33370.0,,35670.0,,37980.0,,40290.0,,42600.0


### Display the shape of data frame


In [15]:
data.shape

(544, 37)

### Return an array containing names of all the columns

In [16]:
arr=np.array(data.columns)
print(arr)

['Pressure' 'Property' 'Liq_Sat' 'Vap_Sat' '75' '100' '125' '150' '175'
 '200' '220' '225' '240' '250' '260' '275' '280' '290' '300' '320' '325'
 '340' '350' '360' '375' '380' '400' '425' '450' '475' '500' '525' '550'
 '575' '600' '625' '650']


### Display the number of null values in each column of the dataframe



In [17]:
# isnull() marks missing cells; sum() counts them per column.
for i, n in data.isnull().sum().items():
    print(i,": ",n)


Pressure :  0
Property :  0
Liq_Sat :  0
Vap_Sat :  0
75 :  528
100 :  508
125 :  488
150 :  448
175 :  384
200 :  320
220 :  408
225 :  400
240 :  408
250 :  344
260 :  384
275 :  340
280 :  380
290 :  488
300 :  60
320 :  480
325 :  136
340 :  476
350 :  68
360 :  476
375 :  204
380 :  476
400 :  0
425 :  204
450 :  0
475 :  204
500 :  0
525 :  272
550 :  0
575 :  340
600 :  0
625 :  476
650 :  0


### Create a column which contains the Pressure and Property columns, separated with 'at' (e.g. V at 1, H at 101.325). Using this, print the following:
- Enthalpy at 75 kPa and 573 K
- Entropy at 493 K and 250 kPa



In [18]:
data['Property at Pressure']=data['Property']+' at '+data['Pressure'].astype(str)
data.head()

Unnamed: 0,Pressure,Property,Liq_Sat,Vap_Sat,75,100,125,150,175,200,...,450,475,500,525,550,575,600,625,650,Property at Pressure
0,1.0,V,1.0,129200.0,160640.0,172180.0,183720.0,195270.0,206810.0,218350.0,...,333730.0,,356810.0,,379880.0,,402960.0,,426040.0,V at 1.0
1,1.0,U,29.334,2385.2,2480.8,2516.4,2552.3,2588.5,2624.9,2661.7,...,3049.9,,3132.4,,3216.7,,3302.6,,3390.3,U at 1.0
2,1.0,H,29.335,2514.4,2641.5,2688.6,2736.0,2783.7,2831.7,2880.1,...,3383.6,,3489.2,,3596.5,,3705.6,,3816.4,H at 1.0
3,1.0,S,0.106,8.9767,9.3828,9.5136,9.6365,9.7527,9.8629,9.9679,...,10.82,,10.9612,,11.0957,,11.2243,,11.3476,S at 1.0
4,10.0,V,1.01,14670.0,16030.0,17190.0,18350.0,19510.0,20660.0,21820.0,...,33370.0,,35670.0,,37980.0,,40290.0,,42600.0,V at 10.0
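
*The cell above builds the new column but does not print the two requested values. The sketch below shows one way the lookup could work. It runs on a toy dataframe whose numbers are placeholders, NOT values from `superheated_vapor_properties.csv`. The helper name `lookup` is hypothetical, and it assumes that 573 K and 493 K correspond to the 300 and 220 columns (K - 273.15, rounded).*

```python
import pandas as pd

# Toy stand-in for the real dataset: all numbers are placeholders,
# not real steam-table values.
toy = pd.DataFrame({
    "Pressure": [75.0, 75.0, 250.0, 250.0],
    "Property": ["H", "S", "H", "S"],
    "220": [1.0, 2.0, 3.0, 4.0],
    "300": [5.0, 6.0, 7.0, 8.0],
})
toy["Property at Pressure"] = toy["Property"] + " at " + toy["Pressure"].astype(str)

def lookup(df, prop, pressure_kpa, temp_kelvin):
    """Return the tabulated value of `prop` at a pressure (kPa) and temperature (K)."""
    # Column labels are temperatures in degrees Celsius, stored as strings.
    temp_col = str(round(temp_kelvin - 273.15))
    # Pressure is a float column, so its string form looks like "75.0".
    key = f"{prop} at {float(pressure_kpa)}"
    row = df.loc[df["Property at Pressure"] == key]
    return row[temp_col].iloc[0]

h = lookup(toy, "H", 75, 573)    # enthalpy at 75 kPa and 573 K
s = lookup(toy, "S", 250, 493)   # entropy at 250 kPa and 493 K
print(h, s)
```

*On the real data, the same call with `data` in place of `toy` should return the tabulated values, provided the requested pressure and temperature exist in the table and the cell is not NaN.*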


### Find out the column with the highest number of missing values

In [19]:
# Count the missing values per column once, then report every column that ties for the maximum.
null_counts=data.isnull().sum()
m=null_counts.max()
for i in null_counts[null_counts==m].index:
    print("Column with highest number of missing values: ",i)

Column with highest number of missing values:  75


### What is the average enthalpy of Sat. Liq. at all different pressures in the dataset?

In [20]:
e=data[data['Property']=='H']
print('Average enthalpy of Sat.Liq. at all diff pressures: ',e['Liq_Sat'].mean())

Average enthalpy of Sat.Liq. at all diff pressures:  936.9707720588235


### Separate out the V,U,H,S data from the dataset into V_data, U_data, H_data, S_data

In [21]:
# Each subset keeps only the rows whose Property matches the given symbol.
V_data=data[data['Property']=='V']
print('V_data:\n',V_data)
U_data=data[data['Property']=='U']
print('U_data:\n',U_data)
H_data=data[data['Property']=='H']
print('H_data:\n',H_data)
S_data=data[data['Property']=='S']
print('S_data:\n',S_data)

V_data:
      Pressure Property  Liq_Sat     Vap_Sat        75       100       125  \
0         1.0        V    1.000  129200.000  160640.0  172180.0  183720.0   
4        10.0        V    1.010   14670.000   16030.0   17190.0   18350.0   
8        20.0        V    1.017    7649.800    8000.0    8584.7    9167.1   
12       30.0        V    1.022    5229.300    5322.0    5714.4    6104.6   
16       40.0        V    1.027    3993.400       NaN    4279.2    4573.3   
..        ...      ...      ...         ...       ...       ...       ...   
524   10600.0        V    1.474      16.778       NaN       NaN       NaN   
528   10800.0        V    1.481      16.385       NaN       NaN       NaN   
532   11000.0        V    1.489      16.006       NaN       NaN       NaN   
536   11200.0        V    1.496      15.639       NaN       NaN       NaN   
540   11400.0        V    1.504      15.284       NaN       NaN       NaN   

          150       175       200  ...         450     475        

# Section 4 : Conclusion




*Congratulations on reaching this point! I hope you had fun solving your first assignment and have also built confidence in applying these libraries. If you are wondering, we will cover more about z-score normalization in Week 2, and the sigmoid function will be used in Week 3. After completing this assignment, you are now prepared to learn about machine learning techniques and implement your own machine learning models.*