# Week-1 Assignment

*Welcome to your first assignment for the SimuTech Winter Project 2022! I hope you are excited to implement and test everything you have learned up until now. There is an interesting set of questions for you to refine your acquired skills as you delve into hands-on coding and deepen your understanding of numpy, pandas, and data visualization libraries.*

# Section 0 : Importing Libraries

*Let's begin by importing numpy, pandas and matplotlib.*

In [1]:
#your code here
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Section 1 : Playing with Python and Numpy

### Q1. Matrix Multiplication

##### (i) Check if matrix multiplication is valid

In [2]:
def isValid(A,B):
    # A @ B is defined only when the number of columns of A equals the number of rows of B
    return A.shape[1] == B.shape[0]


##### (ii) Using loops (without using numpy)

In [3]:
def matrix_multiply(A,B):
    n_rows, n_inner = A.shape
    n_cols = B.shape[1]
    C = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                # accumulate the dot product of row i of A and column j of B
                C[i][j] += A[i][k]*B[k][j]
    return C
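The heading asks for a version without numpy, while the cell above still relies on `np.zeros` and `.shape`. A minimal sketch of the same triple loop on plain lists of lists could look like the following; the names `matmul_pure`, `A_list`, `B_list` and `product` are illustrative and not part of the assignment.

```python
def matmul_pure(A, B):
    """Multiply two matrices given as lists of lists, using only plain Python."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    # Every row of A must have as many entries as B has rows.
    if any(len(row) != n_inner for row in A):
        raise ValueError("columns of A must equal rows of B")
    C = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                C[i][j] += A[i][k] * B[k][j]
    return C


# Same matrices as the test cell of this question, written as nested lists.
A_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B_list = [[13, 14, 15], [16, 17, 18], [19, 20, 21]]
product = matmul_pure(A_list, B_list)
print(product)
```

Because the inputs are integers and the accumulator starts at `0`, the result stays integer-valued, unlike the float array produced by `np.zeros`.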

  

##### (iii) Using numpy

In [4]:
def matrix_multiply_2(A,B):
    return np.matmul(A,B)
     

##### (iv) Testing your code

Run the following cell to check if your functions are working properly.

*Expected output:*
[ [102 108 114]
 [246 261 276]
 [390 414 438]
 [534 567 600] ]

In [5]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

B = np.array([
    [13, 14, 15],
    [16, 17, 18],
    [19, 20, 21]
])

if isValid(A,B):
  print(f"Result using loops: {matrix_multiply(A,B)}")
  print(f"Result using numpy: {matrix_multiply_2(A,B)}")
else:
  print(f"Matrix multiplication is not valid")

Result using loops: [[102. 108. 114.]
 [246. 261. 276.]
 [390. 414. 438.]
 [534. 567. 600.]]
Result using numpy: [[102 108 114]
 [246 261 276]
 [390 414 438]
 [534 567 600]]


### Q2. Z-Score Normalisation

Z-score normalization refers to the process of normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.

We use the following formula to perform a z-score normalization on every value in a dataset:

New value = (x – μ) / σ

where:

x: Original value

μ: Mean of data

σ: Standard deviation of data
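As a small worked example of the formula, the sketch below normalises a hypothetical list of eight values (not taken from the assignment) using only the standard library. It assumes σ is the population standard deviation (dividing by N), which is also what `np.std` computes by default.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample values
mu = statistics.mean(data)             # mean of the data
sigma = statistics.pstdev(data)        # population standard deviation (divides by N)
z = [(v - mu) / sigma for v in data]   # apply (x - mu) / sigma to every value
print(mu, sigma)
print(z)
```

For this data the mean is 5 and the standard deviation is 2, so the value 9 maps to (9 - 5) / 2 = 2 and the value 2 maps to -1.5.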

##### (i) Without using numpy

In [6]:
def mean(x):
  #your code here
  # built-in sum keeps this part free of numpy
  return sum(x)/len(x)


In [7]:
import math
def standard_deviation(x):
    # population standard deviation: square root of the mean squared deviation
    mu = mean(x)
    total = 0
    for i in range(0,len(x)):
        total += (x[i] - mu)**2
    return math.sqrt(total/len(x))

In [8]:
def zscore_normalisation(x):
  mean1 = mean(x)
  sd = standard_deviation(x)
  result = np.zeros(len(x))
  for i in range(0,len(x)):
    result[i] = (x[i] - mean1)/sd
  return result
  

##### (ii) Using numpy

NumPy has built-in functions for calculating the mean and standard deviation.

In [9]:
def zscore_normalisation_2(x):
  #your code here
  x = np.asarray(x, dtype=float)
  # vectorised: numpy broadcasts the scalar mean and standard deviation over the array
  return (x - np.mean(x))/np.std(x)

##### (iii) Testing your code

Run the following cell to check if your functions are working properly.

*Expected Output:* [-1.06753267 -0.99745394 -0.99745394 -0.81057732 -0.41346451 -0.06307086
  0.31068237  0.91803138  1.22170588  1.89913361]

In [10]:
x = [4, 7, 7, 15, 32, 47, 63, 89, 102, 131]
print(f"Result without using numpy: {zscore_normalisation(x)}")
print(f"Result using numpy: {zscore_normalisation_2(x)}")

Result without using numpy: [-1.06753267 -0.99745394 -0.99745394 -0.81057732 -0.41346451 -0.06307086
  0.31068237  0.91803138  1.22170588  1.89913361]
Result using numpy: [-1.06753267 -0.99745394 -0.99745394 -0.81057732 -0.41346451 -0.06307086
  0.31068237  0.91803138  1.22170588  1.89913361]


### Q3. Sigmoid function and its derivative

The sigmoid function is a mathematical function that maps any input value to a value between 0 and 1.

It is defined mathematically as s(x) = 1/(1+e^(-x)).

##### (i) Write a function to implement the sigmoid function

In [11]:
import math
def sigmoidfn(x):
  # math.e is a plain float, so the expression also works element-wise on numpy arrays
  return (1/(1+math.e**(-x)))
sigmoidfn(2)

0.8807970779778823

##### (ii) Write a function to implement the derivative of the sigmoid function

In [12]:
def derivative(x):
  #your code here
  sigd = sigmoidfn(x)
  result = sigd*(1-sigd)
  return result
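The cell above uses the identity s'(x) = s(x)(1 - s(x)), which follows from differentiating 1/(1+e^(-x)). One way to gain confidence in it is a central finite-difference check, sketched below. The names `sigmoid`, `sigmoid_derivative`, `xs`, `h` and `max_error` are illustrative, and `np.exp` is used here in place of the notebook's own implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)


xs = np.array([-2.0, 0.0, 2.0, 5.0])   # arbitrary test points
h = 1e-5                                # step size for the finite difference
# Central difference approximation of the derivative at each test point.
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)
max_error = np.max(np.abs(numeric - sigmoid_derivative(xs)))
print(max_error)
```

The analytic and numerical values should agree closely, and since s(x) lies strictly between 0 and 1, the derivative is positive for every input.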

##### (iii) Test your code

Run the following cell to check if your functions are working properly.

*Expected output:*

x on applying sigmoid activation fn is: [ [0.99987661 0.88079708 0.99330715 0.5        0.5       ]
 [0.99908895 0.99330715 0.5        0.5        0.5       ] ]

x on applying derivative of sigmoid activation fn is: [ [1.23379350e-04 1.04993585e-01 6.64805667e-03 2.50000000e-01
  2.50000000e-01]
 [9.10221180e-04 6.64805667e-03 2.50000000e-01 2.50000000e-01
  2.50000000e-01] ]

In [13]:
x = np.array([
    [9,2,5,0,0],
    [7,5,0,0,0]
])
print(f"x on applying sigmoid activation fn is: {sigmoidfn(x)}")
print(f"x on applying derivative of sigmoid activation fn is: {derivative(x)}")

x on applying sigmoid activation fn is: [[0.99987661 0.88079708 0.99330715 0.5        0.5       ]
 [0.99908895 0.99330715 0.5        0.5        0.5       ]]
x on applying derivative of sigmoid activation fn is: [[1.23379350e-04 1.04993585e-01 6.64805667e-03 2.50000000e-01
  2.50000000e-01]
 [9.10221180e-04 6.64805667e-03 2.50000000e-01 2.50000000e-01
  2.50000000e-01]]


# Section 2: Exploring Pandas

*You have been provided with a dataset which includes information about properties of superheated vapor.*

*The dataset consists of the thermophysical properties: specific volume, specific internal energy, specific enthalpy, specific entropy of superheated vapor.*

*Pressure is in kPa and temperature is in degrees Celsius. The columns named 75, 100, 125, etc. are temperatures.*

### Read the csv file


In [19]:
#your code here
prop_shv = pd.read_csv('D:/Repositories/VisionCraft-TheWinter-Challenge/Soumya_241034/superheated_vapor_properties.csv')


### Display the shape of data frame


In [20]:
#your code here
prop_shv.shape

(1089, 37)

### Return an array containing names of all the columns

In [21]:
#your code here
prop_shv.head(6)

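| | Pressure | Property | Liq_Sat | Vap_Sat | 75 | 100 | 125 | 150 | 175 | 200 | ... | 425 | 450 | 475 | 500 | 525 | 550 | 575 | 600 | 625 | 650 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | V | 1.0 | 129200.0 | 160640.0 | 172180.0 | 183720.0 | 195270.0 | 206810.0 | 218350.0 | ... | NaN | 333730.0 | NaN | 356810.0 | NaN | 379880.0 | NaN | 402960.0 | NaN | 426040.0 |
| 1 | 1 | U | 29.334 | 2385.2 | 2480.8 | 2516.4 | 2552.3 | 2588.5 | 2624.9 | 2661.7 | ... | NaN | 3049.9 | NaN | 3132.4 | NaN | 3216.7 | NaN | 3302.6 | NaN | 3390.3 |
| 2 | 1 | H | 29.335 | 2514.4 | 2641.5 | 2688.6 | 2736.0 | 2783.7 | 2831.7 | 2880.1 | ... | NaN | 3383.6 | NaN | 3489.2 | NaN | 3596.5 | NaN | 3705.6 | NaN | 3816.4 |
| 3 | 1 | S | 0.106 | 8.9767 | 9.3828 | 9.5136 | 9.6365 | 9.7527 | 9.8629 | 9.9679 | ... | NaN | 10.82 | NaN | 10.9612 | NaN | 11.0957 | NaN | 11.2243 | NaN | 11.3476 |
| 4 | 10 | V | 1.01 | 14670.0 | 16030.0 | 17190.0 | 18350.0 | 19510.0 | 20660.0 | 21820.0 | ... | NaN | 33370.0 | NaN | 35670.0 | NaN | 37980.0 | NaN | 40290.0 | NaN | 42600.0 |
| 5 | 10 | U | 191.822 | 2438.0 | 2479.7 | 2515.6 | 2551.6 | 2588.0 | 2624.5 | 2661.4 | ... | NaN | 3049.8 | NaN | 3132.3 | NaN | 3216.6 | NaN | 3302.6 | NaN | 3390.3 |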
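The cell above previews the first rows rather than returning the column names as an array. One possible way to get them is the `columns` attribute of the DataFrame, sketched here on a tiny stand-in frame. The frame `demo` and its values are placeholders; only the column names mimic the dataset.

```python
import pandas as pd

# Tiny stand-in frame: column names mimic the dataset, values are placeholders.
demo = pd.DataFrame({"Pressure": [1, 1], "Property": ["V", "U"], "Liq_Sat": [0.0, 0.0]})

# DataFrame.columns is an Index; to_numpy() converts it to a numpy array.
column_names = demo.columns.to_numpy()
print(column_names)
```

Applied to the notebook's frame, the same call would be `prop_shv.columns.to_numpy()`.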


### Display the number of null values in each column of the dataframe



In [22]:
#your code here
null_values = prop_shv.isnull().sum()
print(null_values)

Pressure       0
Property       1
Liq_Sat        1
Vap_Sat        1
75          1056
100         1016
125          976
150          896
175          768
200          640
220          816
225          800
240          816
250          688
260          768
275          680
280          760
290          976
300          120
320          960
325          272
340          953
350          136
360          953
375          408
380          953
400            1
425          409
450            1
475          409
500            1
525          545
550            1
575          681
600            1
625          953
650            1
dtype: int64


### Create a column which contains the Pressure and Property columns, separated by 'at' (e.g. V at 1, H at 101.325). Using this, print the following:
- Enthalpy at 75 kPa and 573 K
- Entropy at 493 K and 250 kPa



In [23]:
#your code here
prop_shv['Pressure_Property'] = prop_shv['Property'] + " at " + prop_shv['Pressure'].astype(str)
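The cell above builds the combined column but does not yet print the two requested values. A hedged sketch of the lookup follows, on a placeholder frame whose numbers are made up and are not steam-table data. It assumes the temperature columns are labelled with strings in degrees Celsius, that kelvin is converted by subtracting 273 (so 573 K maps to column `300` and 493 K to column `220`), and that pressures are stored as the strings `75` and `250`. The helper `lookup` is hypothetical.

```python
import pandas as pd

# Placeholder frame: column names follow the dataset, numeric values are made up.
demo = pd.DataFrame({
    "Pressure": ["75", "75", "250", "250"],
    "Property": ["H", "S", "H", "S"],
    "220": [1.0, 2.0, 3.0, 4.0],
    "300": [5.0, 6.0, 7.0, 8.0],
})
demo["Pressure_Property"] = demo["Property"] + " at " + demo["Pressure"].astype(str)


def lookup(df, label, temp_kelvin):
    """Return the value for a 'Property at Pressure' label at a temperature in kelvin."""
    temp_col = str(round(temp_kelvin - 273))   # 573 K -> '300', 493 K -> '220'
    rows = df.loc[df["Pressure_Property"] == label, temp_col]
    return rows.iloc[0]


h_val = lookup(demo, "H at 75", 573)    # enthalpy at 75 kPa and 573 K
s_val = lookup(demo, "S at 250", 493)   # entropy at 250 kPa and 493 K
print(h_val, s_val)
```

With the real data, `lookup(prop_shv, "H at 75", 573)` would follow the same pattern, provided the labels in `Pressure_Property` match exactly.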


### Find out the column with the highest number of missing values

In [24]:
#your code here
null_values = prop_shv.isnull().sum()
print(null_values.sort_values(ascending = False))

75                   1056
100                  1016
290                   976
125                   976
320                   960
340                   953
380                   953
625                   953
360                   953
150                   896
220                   816
240                   816
225                   800
175                   768
260                   768
280                   760
250                   688
575                   681
275                   680
200                   640
525                   545
475                   409
425                   409
375                   408
325                   272
350                   136
300                   120
Liq_Sat                 1
Property                1
Vap_Sat                 1
550                     1
400                     1
450                     1
500                     1
650                     1
600                     1
Pressure_Property       1
Pressure                0
dtype: int64
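The sorted listing above shows that column `75` has the most missing values (1056). If only the column name is needed, `Series.idxmax` returns it directly. The sketch below uses a small made-up frame; `demo`, `null_counts` and `worst_column` are illustrative names.

```python
import numpy as np
import pandas as pd

# Small made-up frame with a different number of missing values per column.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 3.0],
    "c": [1.0, 2.0, 3.0],
})

null_counts = demo.isnull().sum()       # missing values per column
worst_column = null_counts.idxmax()     # label of the column with the largest count
print(worst_column, null_counts.max())
```

If several columns tie for the maximum, `idxmax` returns the first of them.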

In [25]:
prop_shv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1089 entries, 0 to 1088
Data columns (total 38 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Pressure           1089 non-null   object 
 1   Property           1088 non-null   object 
 2   Liq_Sat            1088 non-null   object 
 3   Vap_Sat            1088 non-null   object 
 4   75                 33 non-null     float64
 5   100                73 non-null     float64
 6   125                113 non-null    float64
 7   150                193 non-null    float64
 8   175                321 non-null    float64
 9   200                449 non-null    float64
 10  220                273 non-null    float64
 11  225                289 non-null    float64
 12  240                273 non-null    float64
 13  250                401 non-null    float64
 14  260                321 non-null    float64
 15  275                409 non-null    float64
 16  280                329 n

### What is the average enthalpy of Sat. Liq. at all different pressures in the dataset?

In [26]:
#your code here
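# Liq_Sat is read as text, so convert it; values that cannot be parsed become NaN
prop_shv['Liq_Sat'] = pd.to_numeric(prop_shv['Liq_Sat'], errors='coerce')
# missing values are replaced with 0 here, which lowers the mean of any group that contains them
prop_shv['Liq_Sat'] = prop_shv['Liq_Sat'].fillna(0)
# mean of Liq_Sat over all rows (V, U, H and S) for each pressure
print(prop_shv.groupby(['Pressure'])['Liq_Sat'].mean())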



Pressure
1            14.943750
10           96.328325
100         209.315675
1000        381.837050
10000       701.578375
               ...    
9600        693.017625
975         379.419375
9800        697.323025
Pressure      0.000000
Name: Liq_Sat, Length: 138, dtype: float64
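The grouped result above averages `Liq_Sat` over the V, U, H and S rows of each pressure. If the question is read as the mean saturated-liquid enthalpy across all pressures, one possible approach is to keep only the rows where `Property` is `H` and take a single mean. The sketch below uses a placeholder frame with made-up values; the last row imitates the non-numeric row that appears as the `Pressure` group in the output above.

```python
import pandas as pd

# Placeholder frame: values are made up; the last row imitates a stray non-numeric row.
demo = pd.DataFrame({
    "Pressure": ["1", "1", "10", "10", "Pressure"],
    "Property": ["U", "H", "U", "H", None],
    "Liq_Sat": ["10.0", "20.0", "30.0", "40.0", None],
})

# Convert text to numbers; anything unparseable becomes NaN, which mean() skips.
liq_sat = pd.to_numeric(demo["Liq_Sat"], errors="coerce")

# Keep only the enthalpy rows, then average over all pressures.
avg_h_liq_sat = liq_sat[demo["Property"] == "H"].mean()
print(avg_h_liq_sat)
```

Leaving missing values as NaN, rather than filling them with 0, keeps them from lowering the average.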


### Separate out the V,U,H,S data from the dataset into V_data, U_data, H_data, S_data

In [27]:
#your code here
V_data = prop_shv[prop_shv['Property'] == 'V']
U_data = prop_shv[prop_shv['Property'] == 'U']
H_data = prop_shv[prop_shv['Property'] == 'H']
S_data = prop_shv[prop_shv['Property'] == 'S']
V_data.head()

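| | Pressure | Property | Liq_Sat | Vap_Sat | 75 | 100 | 125 | 150 | 175 | 200 | ... | 450 | 475 | 500 | 525 | 550 | 575 | 600 | 625 | 650 | Pressure_Property |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | V | 1.0 | 129200.0 | 160640.0 | 172180.0 | 183720.0 | 195270.0 | 206810.0 | 218350.0 | ... | 333730.0 | NaN | 356810.0 | NaN | 379880.0 | NaN | 402960.0 | NaN | 426040.0 | V at 1 |
| 4 | 10 | V | 1.01 | 14670.0 | 16030.0 | 17190.0 | 18350.0 | 19510.0 | 20660.0 | 21820.0 | ... | 33370.0 | NaN | 35670.0 | NaN | 37980.0 | NaN | 40290.0 | NaN | 42600.0 | V at 10 |
| 8 | 20 | V | 1.017 | 7649.8 | 8000.0 | 8584.7 | 9167.1 | 9748.0 | 10320.0 | 10900.0 | ... | 16680.0 | NaN | 17830.0 | NaN | 18990.0 | NaN | 20140.0 | NaN | 21300.0 | V at 20 |
| 12 | 30 | V | 1.022 | 5229.3 | 5322.0 | 5714.4 | 6104.6 | 6493.2 | 6880.8 | 7267.5 | ... | 11120.0 | NaN | 11890.0 | NaN | 12660.0 | NaN | 13430.0 | NaN | 14190.0 | V at 30 |
| 16 | 40 | V | 1.027 | 3993.4 | NaN | 4279.2 | 4573.3 | 4865.8 | 5157.2 | 5447.8 | ... | 8340.1 | NaN | 8917.6 | NaN | 9494.9 | NaN | 10070.0 | NaN | 10640.0 | V at 40 |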
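The four boolean filters above do the job. As an alternative sketch, a single `groupby` can produce all four subsets at once. The frame `demo` and the names `by_property` and `V_demo` are illustrative, and the values are placeholders.

```python
import pandas as pd

# Placeholder frame with the same Property labels as the dataset.
demo = pd.DataFrame({
    "Pressure": [1, 1, 1, 1, 10, 10, 10, 10],
    "Property": ["V", "U", "H", "S"] * 2,
    "Liq_Sat": [0.0] * 8,
})

# Iterating over a groupby yields (label, sub-frame) pairs; collect them in a dict.
by_property = {name: group for name, group in demo.groupby("Property")}
V_demo = by_property["V"]
print(V_demo)
```

Each sub-frame keeps the original row index, just as the filtered `V_data` above keeps rows 0, 4, 8 and so on.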


# Section 4 : Conclusion




*Congratulations on reaching this point! I hope you had fun solving your first assignment and have also built confidence in applying these libraries. If you are wondering, we will cover more about z-score normalization in Week 2, and the sigmoid function will be used in Week 3. After completing this assignment, you are now prepared to learn about machine learning techniques and implement your own machine learning models.*