# **HW1: Regression**
In *assignment 1*, you need to finish:

1.  Basic Part: Implement two regression models to predict the Systolic blood pressure (SBP) of a patient. You will need to implement **both Matrix Inversion and Gradient Descent**.


> *   Step 1: Split Data
> *   Step 2: Preprocess Data
> *   Step 3: Implement Regression
> *   Step 4: Make Prediction
> *   Step 5: Train Model and Generate Result

2.  Advanced Part: Implement one regression model to predict the SBP of multiple patients in a different way than the basic part. You can choose **either** of the two methods for this part.

# **1. Basic Part (55%)**
In the first part, you need to implement regression to predict SBP from the given diastolic blood pressure (DBP).


## 1.1 Matrix Inversion Method (25%)


*   Save the prediction result in a csv file **hw1_basic_mi.csv**
*   Print your coefficient


### *Import Packages*

> Note: You **cannot** import any other package

In [1873]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv
import math
import random

### *Global attributes*
Define the global attributes

In [1874]:
training_dataroot = 'hw1_basic_training.csv' # Training data file named 'hw1_basic_training.csv'
testing_dataroot = 'hw1_basic_testing.csv'   # Testing data file named 'hw1_basic_testing.csv'
output_dataroot = 'hw1_basic_mi.csv' # Output file will be named 'hw1_basic_mi.csv'

training_datalist =  [] # Training datalist, saved as numpy array
testing_datalist =  [] # Testing datalist, saved as numpy array

output_datalist =  [] # Your prediction, should be 20 * 1 matrix and saved as numpy array
                      # The format of each row should be ['sbp']

You can add your own global attributes here


In [1875]:
training_dataset = []
validation_dataset = []
w = 0

### *Load the Input File*
First, load the basic input files **hw1_basic_training.csv** and **hw1_basic_testing.csv**.

The input data is stored in *training_datalist* and *testing_datalist*.

In [1876]:
# Read input csv to datalist
with open(training_dataroot, newline='') as csvfile:
  training_datalist = np.array(list(csv.reader(csvfile)))

with open(testing_dataroot, newline='') as csvfile:
  testing_datalist = np.array(list(csv.reader(csvfile)))

### *Implement the Regression Model*

> Note: It is recommended to use the functions we defined, but you can also define your own functions.


#### Step 1: Split Data
Split data in *training_datalist* into training dataset and validation dataset
* Validation dataset is used to validate your own model without the testing data
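
One possible way to do the split is sketched below. It is only an illustration: the helper name `split_rows`, the 80/20 ratio, the fixed seed, and the toy values are assumptions, not part of the assignment template.

```python
import numpy as np

def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle data rows and split them into (training, validation) arrays."""
    rows = np.asarray(rows)
    rng = np.random.default_rng(seed)      # fixed seed keeps the split reproducible
    order = rng.permutation(len(rows))     # shuffled row indices
    n_train = int(len(rows) * train_fraction)
    return rows[order[:n_train]], rows[order[n_train:]]

# Toy rows in a (dbp, sbp) layout; the values are made up for illustration.
toy_rows = np.array([[70, 110], [80, 125], [90, 140], [60, 95], [85, 130]])
train_part, valid_part = split_rows(toy_rows)
print(train_part.shape, valid_part.shape)
```

Shuffling before slicing avoids a split that depends on the order of the rows in the file, and keeping the two parts disjoint means the validation error is measured on rows the model has not seen.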



In [1877]:
def SplitData():
    global training_dataset
    global validation_dataset
    global training_datalist
    training_dataset = training_datalist[1:301]
    validation_dataset = training_datalist[301:]

SplitData()


#### Step 2: Preprocess Data
Handle the unreasonable data
> Hint: Outlier and missing data can be handled by removing the data or adding the values with the help of statistics  
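
The hint above could be applied roughly as follows. This is a minimal sketch under stated assumptions: the helper name `clean_column_pairs`, the choice of median imputation, the z-score rule, and the toy rows are all illustrative and not prescribed by the assignment.

```python
import numpy as np

def clean_column_pairs(raw_rows, z_limit=3.0):
    """Convert string rows to floats, impute blanks with the column median, drop z-score outliers."""
    # Empty strings become NaN so they can be detected numerically.
    values = np.array([[float(v) if v != '' else np.nan for v in row] for row in raw_rows])
    # Impute each missing entry with the median of its column.
    medians = np.nanmedian(values, axis=0)
    missing = np.isnan(values)
    values[missing] = np.take(medians, np.nonzero(missing)[1])
    # Keep rows whose every column lies within z_limit standard deviations of the mean.
    z = np.abs((values - values.mean(axis=0)) / values.std(axis=0))
    return values[(z <= z_limit).all(axis=1)]

# Made-up (dbp, sbp) strings: one blank DBP and one implausible SBP of 400.
raw = [['70', '110'], ['72', '115'], ['75', '118'], ['', '120'],
       ['78', '122'], ['80', '125'], ['74', '117'], ['76', '400']]
cleaned = clean_column_pairs(raw, z_limit=2.0)   # a tiny sample needs a tighter limit
print(cleaned.shape)
```

With only a handful of rows a single extreme value cannot reach a z-score of 3, which is why the toy call uses a limit of 2; on a few hundred rows a limit of 2 to 3 is a common starting point.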

In [1878]:
def PreprocessData():
    global training_dataset
    global validation_dataset
    training_dbp = training_dataset[:, 0].astype(float)
    training_sbp = training_dataset[:, 1].astype(float)
    validation_dbp = validation_dataset[:, 0].astype(float)
    validation_sbp = validation_dataset[:, 1].astype(float)
    training_dbp_mean = np.mean(training_dbp)
    training_dbp_std = np.std(training_dbp)
    training_sbp_mean = np.mean(training_sbp)
    training_sbp_std = np.std(training_sbp)
    validation_dbp_mean = np.mean(validation_dbp)
    validation_dbp_std = np.std(validation_dbp)
    validation_sbp_mean = np.mean(validation_sbp)
    validation_sbp_std = np.std(validation_sbp)
    training_dataset = training_dataset[(training_sbp >= training_sbp_mean - 1 * training_sbp_std) &
                                         (training_sbp <= training_sbp_mean + 1 * training_sbp_std) &
                                         (training_dbp >= training_dbp_mean - 1 * training_dbp_std) &
                                         (training_dbp <= training_dbp_mean + 1 * training_dbp_std)]
    validation_dataset = validation_dataset[(validation_sbp >= validation_sbp_mean - 1 * validation_sbp_std) & 
                                             (validation_sbp <= validation_sbp_mean + 1 * validation_sbp_std) & 
                                             (validation_dbp >= validation_dbp_mean - 1 * validation_dbp_std) & 
                                             (validation_dbp <= validation_dbp_mean + 1 * validation_dbp_std)]
    
PreprocessData()


#### Step 3: Implement Regression
> use Matrix Inversion to finish this part
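
For reference, the matrix inversion method solves the least-squares problem in closed form as w = (XᵀX)⁻¹Xᵀy. The sketch below shows this on made-up points; the helper name `fit_normal_equation` and the added bias column (intercept w0) are assumptions for illustration and are not required by the template.

```python
import numpy as np

def fit_normal_equation(x, y):
    """Least-squares fit of y ~ w0 + w1 * x via w = (X^T X)^(-1) X^T y."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    X = np.hstack([np.ones_like(x), x])   # design matrix: bias column, then the feature
    return np.linalg.inv(X.T @ X) @ X.T @ y

# Made-up points that lie exactly on y = 2x + 1.
w_demo = fit_normal_equation([1, 2, 3, 4], [3, 5, 7, 9])
print(w_demo.ravel())   # approximately [1. 2.]
```

Without the column of ones, the fitted line is forced through the origin; whether an intercept helps on the SBP data is something to check on the validation set.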




In [1879]:
def MatrixInversion():
    global training_dataset
    global validation_dataset
    global w

    training_dpt = []
    training_spt = []

    for i, j in training_dataset:
        training_dpt.append([float(i)])
        training_spt.append([float(j)])

    training_dpt = np.array(training_dpt)
    training_spt = np.array(training_spt)

    valid_dpt = []
    valid_spt = []

    for i, j in validation_dataset:
        valid_dpt.append([float(i)])
        valid_spt.append([float(j)])

    valid_dpt = np.array(valid_dpt)
    valid_spt = np.array(valid_spt)

    training_dpt_transpose = np.transpose(training_dpt)
    tmp = np.dot(training_dpt_transpose, training_dpt)
    tmp_inv = np.linalg.inv(tmp)
    w = np.dot(np.dot(tmp_inv,training_dpt_transpose),training_spt)

    w = np.array(w)
    prediction = np.dot(valid_dpt, w)
    mape = np.mean(np.abs((valid_spt - prediction) / valid_spt)) * 100
    print(mape)

MatrixInversion()

5.298458938732301
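
The number printed above is the mean absolute percentage error (MAPE) on the validation set, in percent. A small standalone version of that metric might look like this; the function name `mape` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Each prediction is off by 10% of its true value, so the mean is about 10.
print(mape([100, 200], [110, 180]))
```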


#### Step 4: Make Prediction
Make prediction of testing dataset and store the value in *output_datalist*
The final *output_datalist* should look something like this 
> [ [100], [80], ... , [90] ] where each row contains the predicted SBP
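
As a rough illustration of the expected shape, the sketch below turns a column of inputs into a list of one-element rows. The helper name `predict_rows`, the weight layout `[w0, w1]`, and the numbers are hypothetical.

```python
import numpy as np

def predict_rows(x, w):
    """Return predictions as a list of one-element rows, e.g. [[100.0], [80.0]]."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    X = np.hstack([np.ones_like(x), x])                 # bias column, then the feature
    w = np.asarray(w, dtype=float).reshape(-1, 1)       # assumed layout: [w0, w1]
    return (X @ w).tolist()

rows = predict_rows([60, 80], [10.0, 1.5])   # hypothetical weights w0 = 10, w1 = 1.5
print(rows)   # [[100.0], [130.0]]
```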

In [1880]:
def MakePrediction():
    global testing_datalist
    global w
    global output_datalist

    testing_dpt = []
    for i in testing_datalist[1:]:
        testing_dpt.append([float(i[0])])
    
    testing_dpt = np.array(testing_dpt)
    testing_spt = np.dot(testing_dpt, w)
    output_datalist = testing_spt.tolist()
    print(output_datalist)

MakePrediction()

[[146.25555726049816], [130.69645542427494], [129.1405452406526], [157.1469285458544], [96.46643138458388], [154.03510817860976], [141.5878267096312], [141.5878267096312], [132.25236560789727], [122.91690450616335], [121.36099432254102], [126.02872487340798], [110.46962303718477], [121.36099432254102], [144.69964707687583], [136.92009615876424], [164.92647946396602], [146.25555726049816], [149.3673776277428], [160.25874891309905]]


#### Step 5: Train Model and Generate Result

> Notice: **Remember to output the coefficients of the model here**, otherwise 5 points would be deducted
* If your regression model is *3x^2 + 2x^1 + 1*, your output would be:
```
3 2 1
```
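
A small helper such as the following could produce that format. It assumes the weights are stored in ascending order of degree, `[w0, w1, ..., wn]`; the function name `format_coefficients` is illustrative.

```python
def format_coefficients(weights_low_to_high):
    """Format [w0, w1, ..., wn] as 'wn ... w1 w0', highest-degree term first."""
    return ' '.join(str(w) for w in reversed(weights_low_to_high))

# 3x^2 + 2x + 1 is stored as [1, 2, 3] under the assumed ordering.
print(format_coefficients([1, 2, 3]))   # 3 2 1
```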





In [1881]:
print(w.item())  # w is a 1x1 array; .item() extracts its single value

1.5559101836223208


### *Write the Output File*
Write the prediction to output csv
> Format: 'sbp'




In [1882]:
with open(output_dataroot, 'w', newline='', encoding="utf-8") as csvfile:
  writer = csv.writer(csvfile)
  for row in output_datalist:
    writer.writerow(row)

## 1.2 Gradient Descent Method (30%)


*   Save the prediction result in a csv file **hw1_basic_gd.csv**
*   Output your coefficient update in a csv file **hw1_basic_coefficient.csv**
*   Print your coefficient





### *Global attributes*

In [1883]:
output_dataroot = 'hw1_basic_gd.csv' # Output file will be named 'hw1_basic_gd.csv'
coefficient_output_dataroot = 'hw1_basic_coefficient.csv'

training_datalist =  [] # Training datalist, saved as numpy array
testing_datalist =  [] # Testing datalist, saved as numpy array

# Read input csv to datalist
with open(training_dataroot, newline='') as csvfile:
  training_datalist = np.array(list(csv.reader(csvfile)))

with open(testing_dataroot, newline='') as csvfile:
  testing_datalist = np.array(list(csv.reader(csvfile)))

output_datalist =  [] # Your prediction, should be 20 * 1 matrix and saved as numpy array
                      # The format of each row should be ['sbp']

coefficient_output = [] # Your coefficient updates during gradient descent
                   # Should be a (number of iterations * number of coefficients) matrix
                   # The format of each row should be ['w0', 'w1', ..., 'wn']

Your own global attributes

In [1884]:
training_dataset = []
validation_dataset = []
w = 0

### *Implement the Regression Model*


#### Step 1: Split Data

In [1885]:
def SplitData():
    global training_dataset
    global validation_dataset
    global training_datalist
    training_dataset = training_datalist[1:301]
    validation_dataset = training_datalist[301:]

SplitData()


#### Step 2: Preprocess Data

In [1886]:
def PreprocessData():
    global training_dataset
    global validation_dataset
    training_dbp = training_dataset[:, 0].astype(float)
    training_sbp = training_dataset[:, 1].astype(float)
    validation_dbp = validation_dataset[:, 0].astype(float)
    validation_sbp = validation_dataset[:, 1].astype(float)
    training_dbp_mean = np.mean(training_dbp)
    training_dbp_std = np.std(training_dbp)
    training_sbp_mean = np.mean(training_sbp)
    training_sbp_std = np.std(training_sbp)
    validation_dbp_mean = np.mean(validation_dbp)
    validation_dbp_std = np.std(validation_dbp)
    validation_sbp_mean = np.mean(validation_sbp)
    validation_sbp_std = np.std(validation_sbp)
    training_dataset = training_dataset[(training_sbp >= training_sbp_mean - 1 * training_sbp_std) &
                                         (training_sbp <= training_sbp_mean + 1 * training_sbp_std) &
                                         (training_dbp >= training_dbp_mean - 1 * training_dbp_std) &
                                         (training_dbp <= training_dbp_mean + 1 * training_dbp_std)]
    validation_dataset = validation_dataset[(validation_sbp >= validation_sbp_mean - 1 * validation_sbp_std) & 
                                             (validation_sbp <= validation_sbp_mean + 1 * validation_sbp_std) & 
                                             (validation_dbp >= validation_dbp_mean - 1 * validation_dbp_std) & 
                                             (validation_dbp <= validation_dbp_mean + 1 * validation_dbp_std)]
    
PreprocessData()


#### Step 3: Implement Regression
> use Gradient Descent to finish this part
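
Gradient descent repeatedly moves the weights against the gradient of the mean squared error, w ← w − η · ∂MSE/∂w, where ∂MSE/∂w = −(2/N) Xᵀ(y − Xw). The sketch below is a minimal version on made-up data; the helper name `gradient_descent`, the bias column, the feature scaling, and the hyperparameters are assumptions for illustration.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, n_iterations=2000):
    """Minimise mean squared error for y ~ w0 + w1 * x; return final weights and their history."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    X = np.hstack([np.ones_like(x), x])              # bias column, then the feature
    w = np.zeros((X.shape[1], 1))
    history = [w.ravel().copy()]                     # one row [w0, w1] per step
    for _ in range(n_iterations):
        gradient = -2 * X.T @ (y - X @ w) / len(X)   # d(MSE)/dw
        w = w - learning_rate * gradient
        history.append(w.ravel().copy())
    return w, np.array(history)

# Made-up data on y = 2x + 1; x is scaled to [0, 1], so the fit is y = 1 + 6 * x_scaled.
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
w_gd, w_history = gradient_descent(x_demo / x_demo.max(), 2 * x_demo + 1)
print(w_gd.ravel(), w_history.shape)
```

Scaling the feature to a small range is what allows a learning rate as large as 0.1 here; on raw blood pressure values of around 100, a much smaller rate is needed to keep the updates from diverging.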

In [1887]:
def GradientDescent():
    global training_dataset
    global validation_dataset
    global w
    global coefficient_output

    training_dpt = []
    training_spt = []

    for i, j in training_dataset:
        training_dpt.append([float(i)])
        training_spt.append([float(j)])

    training_dpt = np.array(training_dpt)
    training_spt = np.array(training_spt)

    valid_dpt = []
    valid_spt = []

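    for i, j in validation_dataset:
        valid_dpt.append([float(i)])
        valid_spt.append([float(j)])

    # Convert to arrays so the validation data matches the training data layout
    valid_dpt = np.array(valid_dpt)
    valid_spt = np.array(valid_spt)
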
    learning_rate = 0.0001
    n_iterations = 1000

    w = 0
    coefficient_output.append([w])

    for iteration in range(n_iterations):
        # Calculate predictions
        spt_pred = training_dpt.dot(w)
        
        # Calculate the gradient of the Mean Squared Error
        gradient = -2 * training_dpt.T.dot(training_spt - spt_pred) / training_dpt.shape[0]
        
        # Update the coefficients
        w -= learning_rate * gradient
        coefficient_output.append(w.flatten())

    print(w.item())  # w is a 1x1 array; .item() extracts its single value

    w = np.array(w)
    prediction = np.dot(valid_dpt, w)
    mape = np.mean(np.abs((valid_spt - prediction) / valid_spt)) * 100
    print("mape:", mape)

    print("coefficient_output:", coefficient_output)

    

GradientDescent()

1.555910183622321
mape: 5.298458938732301
coefficient_output: [[0], array([2.1581473]), array([1.32280576]), array([1.64613656]), array([1.52098679]), array([1.56942778]), array([1.55067801]), array([1.55793537]), array([1.55512631]), array([1.55621359]), array([1.55579274]), array([1.55595564]), array([1.55589259]), array([1.55591699]), array([1.55590755]), array([1.5559112]), array([1.55590979]), array([1.55591034]), array([1.55591012]), array([1.55591021]), array([1.55591017]), array([1.55591019]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.55591018]), array([1.5

#### Step 4: Make Prediction

Make prediction of testing dataset and store the values in *output_datalist*
The final *output_datalist* should look something like this 
> [ [100], [80], ... , [90] ] where each row contains the predicted SBP

Remember to also store your coefficient update in *coefficient_output*
The final *coefficient_output* should look something like this
> [ [1, 0, 3, 5], ... , [0.1, 0.3, 0.2, 0.5] ] where each row contains the [w0, w1, ..., wn] of your coefficient





In [1888]:
def MakePrediction():
    global testing_datalist
    global w
    global output_datalist

    testing_dpt = []
    for i in testing_datalist[1:]:
        testing_dpt.append([float(i[0])])
    
    testing_dpt = np.array(testing_dpt)
    testing_spt = np.dot(testing_dpt, w)
    output_datalist = testing_spt.tolist()
    print(output_datalist)

MakePrediction()

[[146.25555726049816], [130.69645542427497], [129.14054524065264], [157.14692854585442], [96.4664313845839], [154.0351081786098], [141.5878267096312], [141.5878267096312], [132.25236560789727], [122.91690450616336], [121.36099432254105], [126.02872487340801], [110.46962303718479], [121.36099432254105], [144.69964707687586], [136.92009615876424], [164.92647946396602], [146.25555726049816], [149.36737762774283], [160.25874891309905]]


#### Step 5: Train Model and Generate Result

> Notice: **Remember to output the coefficients of the model here**, otherwise 5 points would be deducted
* If your regression model is *3x^2 + 2x^1 + 1*, your output would be:
```
3 2 1
```



In [1889]:
print(w.item())  # w is a 1x1 array; .item() extracts its single value

1.555910183622321


### *Write the Output File*

Write the prediction to output csv
> Format: 'sbp'

**Write the coefficient update to csv**
> Format: 'w0', 'w1', ..., 'wn'
>*   The number of columns equals the number of coefficients
>*   The number of rows equals the number of iterations
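
To get one row per iteration and one column per coefficient, the recorded weight vectors can first be flattened into plain lists of floats. The helper below is a hypothetical sketch; the name `history_to_rows` and the demo values are assumptions.

```python
import numpy as np

def history_to_rows(history):
    """Turn per-iteration weight vectors into plain rows [w0, w1, ..., wn] of floats."""
    return [[float(v) for v in np.ravel(weights)] for weights in history]

# Two recorded steps of a two-coefficient model, stored as column vectors.
demo_history = [np.zeros((2, 1)), np.array([[0.5], [1.25]])]
coefficient_rows = history_to_rows(demo_history)
print(coefficient_rows)   # [[0.0, 0.0], [0.5, 1.25]]
```

The resulting rows can then be passed to `csv.writer(csvfile).writerows(...)`, which writes each inner list as one line of the file.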

In [1890]:
with open(output_dataroot, 'w', newline='', encoding="utf-8") as csvfile:
  writer = csv.writer(csvfile)
  for row in output_datalist:
    writer.writerow(row)

with open(coefficient_output_dataroot, 'w', newline='', encoding="utf-8") as csvfile:
  writer = csv.writer(csvfile)
  for row in coefficient_output:
    writer.writerow(row)

# **2. Advanced Part (40%)**
In the second part, you need to implement regression in a different way than in the basic part to predict the SBP of multiple patients.

You can choose **either** Matrix Inversion or Gradient Descent method.

The training data will be in **hw1_advanced_training.csv** and the testing data will be in **hw1_advanced_testing.csv**.

Output your prediction in **hw1_advanced.csv**

Notice:
> You cannot import any packages other than those given
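
One way to handle multiple patients is to fit a separate model for each patient id and keep the weights in a dictionary. The sketch below is only an illustration under stated assumptions: the helper name `fit_per_patient`, the toy ids and values, and the use of `np.linalg.lstsq` as a stand-in for the matrix inversion solution are not part of the assignment template.

```python
import numpy as np

def fit_per_patient(patient_ids, features, targets):
    """Fit one least-squares model per patient id; returns {patient_id: weight column}."""
    patient_ids = np.asarray(patient_ids)
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float).reshape(-1, 1)
    X = np.hstack([np.ones((len(features), 1)), features])   # prepend a bias column
    models = {}
    for pid in np.unique(patient_ids):
        mask = patient_ids == pid                             # rows belonging to this patient
        # lstsq solves the least-squares problem and tolerates rank-deficient inputs
        models[pid] = np.linalg.lstsq(X[mask], targets[mask], rcond=None)[0]
    return models

# Made-up data: patient 'a' follows y = 1 + 2x, patient 'b' follows y = 10 - x.
ids = ['a', 'a', 'a', 'b', 'b', 'b']
feats = [[0], [1], [2], [0], [1], [2]]
sbp = [1, 3, 5, 10, 9, 8]
models = fit_per_patient(ids, feats, sbp)
print({pid: w.ravel() for pid, w in models.items()})
```

Keying the weights by patient id keeps the code the same length no matter how many patients appear in the data, and prediction becomes a dictionary lookup followed by a dot product.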



### Input the training and testing dataset

In [1891]:
training_dataroot = 'hw1_advanced_training.csv' # Training data file named 'hw1_advanced_training.csv'
testing_dataroot = 'hw1_advanced_testing.csv'   # Testing data file named 'hw1_advanced_testing.csv'
output_dataroot = 'hw1_advanced.csv' # Output file will be named 'hw1_advanced.csv'

training_datalist =  [] # Training datalist, saved as numpy array
testing_datalist =  [] # Testing datalist, saved as numpy array

output_datalist =  [] # Your prediction, should be 220 * 1 matrix and saved as numpy array
                      # The format of each row should be ['sbp']

### Your Implementation

In [1892]:
# Read input csv to datalist
with open(training_dataroot, newline='') as csvfile:
  training_datalist = np.array(list(csv.reader(csvfile)))

with open(testing_dataroot, newline='') as csvfile:
  testing_datalist = np.array(list(csv.reader(csvfile)))

def SplitData():
    global training_dataset
    global validation_dataset
    global training_datalist
    
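    training_datalist = training_datalist[1:]  # Drop the header row
    np.random.shuffle(training_datalist)       # Shuffle the rows in place
    training_dataset = training_datalist[1:]
    # Note: these validation rows are also part of training_dataset,
    # so the validation MAPE is measured on data the model has already seen.
    validation_dataset = training_datalist[200:500]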

def PreprocessData():
   global training_dataset
   global validation_dataset
   global testing_datalist
   testing_datalist = testing_datalist[1:]
   #avg_spt = np.mean(training_dataset[:, 6].astype(float))
   for i, data in enumerate(training_dataset):
      if (data[2] == '' or (float(data[2]) >= 98 or float(data[2]) <= 96)):
         data[2] = 97
      if (data[3] == '' or (float(data[3]) >= 98 or float(data[3]) <= 72)):
         data[3] = 85
      if (data[4] == '' or (float(data[4]) >= 19.5 or float(data[4]) <= 14.5)):
         data[4] = 17
      if (data[5] == '' or (float(data[5]) >= 101 or float(data[5]) <= 93)):
         data[5] = 97
      training_dataset[i] = data

   for i, data in enumerate(validation_dataset):
      if (data[2] == '' or (float(data[2]) >= 106 or float(data[2]) <= 90)):
         data[2] = 97
      if (data[3] == '' or (float(data[3]) >= 145 or float(data[3]) <= 40)):
         data[3] = 85
      if (data[4] == '' or (float(data[4]) >= 24 or float(data[4]) <= 10)):
         data[4] = 17
      if (data[5] == '' or (float(data[5]) >= 110 or float(data[5]) <= 85)):
         data[5] = 97
      validation_dataset[i] = data

SplitData()
PreprocessData()

def sigmoid(x):
   return 1 / (1 + np.exp(-x))

def GradientDescent():
   global training_dataset

   training_phi1 = []
   training_phi2 = []
   training_phi3 = []
   training_phi4 = []
   training_phi5 = []
   training_phi6 = []
   training_phi7 = []
   training_phi8 = []
   training_phi9 = []
   training_phi10 = []
   training_phi11 = []
   valid_phi = []
   # valid_phi2 = []
   # valid_phi3 = []
   # valid_phi4 = []
   # valid_phi5 = []
   # valid_phi6 = []
   # valid_phi7 = []
   # valid_phi8 = []
   # valid_phi9 = []
   # valid_phi10 = []
   # valid_phi11 = []

   training_spt1 = []
   training_spt2 = []
   training_spt3 = []
   training_spt4 = []
   training_spt5 = []
   training_spt6 = []
   training_spt7 = []
   training_spt8 = []
   training_spt9 = []
   training_spt10 = []
   training_spt11 = []
   valid_spt = []


   temp_avg = np.mean(training_dataset[:, 2].astype(float))
   temp_std = np.std(training_dataset[:, 2].astype(float))
   heart_avg = np.mean(training_dataset[:, 3].astype(float))
   heart_std = np.std(training_dataset[:, 3].astype(float))
   resp_avg = np.mean(training_dataset[:, 4].astype(float))
   resp_std = np.std(training_dataset[:, 4].astype(float))
   o2_avg = np.mean(training_dataset[:, 5].astype(float))
   o2_std = np.std(training_dataset[:, 5].astype(float))
   training_dataset_num = len(training_dataset)
   validation_dataset_num = len(validation_dataset)

   for i in range(training_dataset_num):
      if (training_dataset[i][0] == '11526383'):
         training_phi1.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt1.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '12923910'):
         training_phi2.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt2.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '14699420'):
         training_phi3.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt3.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '15437705'):
         training_phi4.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt4.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '15642911'):
         training_phi5.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt5.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '16298357'):
         training_phi6.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt6.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '17331999'):
         training_phi7.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt7.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '17593883'):
         training_phi8.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt8.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '18733920'):
         training_phi9.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt9.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '18791093'):
         training_phi10.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt10.append([float(training_dataset[i][6])])
      elif (training_dataset[i][0] == '19473413'):
         training_phi11.append([1, sigmoid((float(training_dataset[i][2]) - temp_avg) / temp_std),
                                 sigmoid((float(training_dataset[i][3]) - heart_avg) / heart_std), 
                                 sigmoid((float(training_dataset[i][4]) - resp_avg) / resp_std),
                                 sigmoid((float(training_dataset[i][5]) - o2_avg) / o2_std)])
         training_spt11.append([float(training_dataset[i][6])])

   training_spt1, training_spt2, training_spt3, training_spt4, training_spt5, training_spt6, training_spt7, training_spt8, training_spt9, training_spt10, training_spt11 = np.array(training_spt1), np.array(training_spt2), np.array(training_spt3), np.array(training_spt4), np.array(training_spt5), np.array(training_spt6), np.array(training_spt7), np.array(training_spt8), np.array(training_spt9), np.array(training_spt10), np.array(training_spt11)

   for i in range (validation_dataset_num):
      valid_phi.append([validation_dataset[i][0], 1, 
                        sigmoid((float(validation_dataset[i][2]) - temp_avg) / temp_std),
                        sigmoid((float(validation_dataset[i][3]) - heart_avg) / heart_std), 
                        sigmoid((float(validation_dataset[i][4]) - resp_avg) / resp_std),
                        sigmoid((float(validation_dataset[i][5]) - o2_avg) / o2_std)])
      valid_spt.append([float(validation_dataset[i][6])])
   
   training_phi1, training_phi2, training_phi3, training_phi4, training_phi5, training_phi6, training_phi7, training_phi8, training_phi9, training_phi10, training_phi11 = np.array(training_phi1), np.array(training_phi2), np.array(training_phi3), np.array(training_phi4), np.array(training_phi5), np.array(training_phi6), np.array(training_phi7), np.array(training_phi8), np.array(training_phi9), np.array(training_phi10), np.array(training_phi11)
   valid_phi = np.array(valid_phi).astype(float)
   valid_spt = np.array(valid_spt).astype(float)
   learning_rate = 0.1
   n_iterations = 100000

   w1 = [[0], [0], [0], [0], [0]]
   w2 = [[0], [0], [0], [0], [0]]
   w3 = [[0], [0], [0], [0], [0]]
   w4 = [[0], [0], [0], [0], [0]]
   w5 = [[0], [0], [0], [0], [0]]
   w6 = [[0], [0], [0], [0], [0]]
   w7 = [[0], [0], [0], [0], [0]]
   w8 = [[0], [0], [0], [0], [0]]
   w9 = [[0], [0], [0], [0], [0]]
   w10 = [[0], [0], [0], [0], [0]]
   w11 = [[0], [0], [0], [0], [0]]

   for iteration in range(n_iterations):
      
      spt_pred1 = training_phi1.dot(w1)
      gradient = -2 * training_phi1.T.dot(training_spt1 - spt_pred1) / training_phi1.shape[0]
      w1 -= learning_rate * gradient

      spt_pred2 = training_phi2.dot(w2)
      gradient = -2 * training_phi2.T.dot(training_spt2 - spt_pred2) / training_phi2.shape[0]
      w2 -= learning_rate * gradient

      spt_pred3 = training_phi3.dot(w3)
      gradient = -2 * training_phi3.T.dot(training_spt3 - spt_pred3) / training_phi3.shape[0]
      w3 -= learning_rate * gradient

      spt_pred4 = training_phi4.dot(w4)
      gradient = -2 * training_phi4.T.dot(training_spt4 - spt_pred4) / training_phi4.shape[0]
      w4 -= learning_rate * gradient

      spt_pred5 = training_phi5.dot(w5)
      gradient = -2 * training_phi5.T.dot(training_spt5 - spt_pred5) / training_phi5.shape[0]
      w5 -= learning_rate * gradient

      spt_pred6 = training_phi6.dot(w6)
      gradient = -2 * training_phi6.T.dot(training_spt6 - spt_pred6) / training_phi6.shape[0]
      w6 -= learning_rate * gradient

      spt_pred7 = training_phi7.dot(w7)
      gradient = -2 * training_phi7.T.dot(training_spt7 - spt_pred7) / training_phi7.shape[0]
      w7 -= learning_rate * gradient

      spt_pred8 = training_phi8.dot(w8)
      gradient = -2 * training_phi8.T.dot(training_spt8 - spt_pred8) / training_phi8.shape[0]
      w8 -= learning_rate * gradient

      spt_pred9 = training_phi9.dot(w9)
      gradient = -2 * training_phi9.T.dot(training_spt9 - spt_pred9) / training_phi9.shape[0]
      w9 -= learning_rate * gradient

      spt_pred10 = training_phi10.dot(w10)
      gradient = -2 * training_phi10.T.dot(training_spt10 - spt_pred10) / training_phi10.shape[0]
      w10 -= learning_rate * gradient

      spt_pred11 = training_phi11.dot(w11)
      gradient = -2 * training_phi11.T.dot(training_spt11 - spt_pred11) / training_phi11.shape[0]
      w11 -= learning_rate * gradient

   w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11 = np.array(w1), np.array(w2), np.array(w3), np.array(w4), np.array(w5), np.array(w6), np.array(w7), np.array(w8), np.array(w9), np.array(w10), np.array(w11)
   
   mape = 0
   for i, data in enumerate(valid_phi):
      if (data[0] == 11526383):
         prediction = np.dot(data[1:], w1)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 12923910):
         prediction = np.dot(data[1:], w2)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 14699420):
         prediction = np.dot(data[1:], w3)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 15437705):
         prediction = np.dot(data[1:], w4)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 15642911):
         prediction = np.dot(data[1:], w5)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 16298357):
         prediction = np.dot(data[1:], w6)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 17331999):
         prediction = np.dot(data[1:], w7)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 17593883):
         prediction = np.dot(data[1:], w8)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 18733920):
         prediction = np.dot(data[1:], w9)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 18791093):
         prediction = np.dot(data[1:], w10)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])
      elif (data[0] == 19473413):
         prediction = np.dot(data[1:], w11)
         mape += np.abs((valid_spt[i] - prediction) / valid_spt[i])

   mape /= validation_dataset_num
   mape *= 100
   print("mape:", mape)

   for i, data in enumerate(testing_datalist):
      d = [1, sigmoid((float(testing_datalist[i][2]) - temp_avg) / temp_std),
               sigmoid((float(testing_datalist[i][3]) - heart_avg) / heart_std), 
               sigmoid((float(testing_datalist[i][4]) - resp_avg) / resp_std),
               sigmoid((float(testing_datalist[i][5]) - o2_avg) / o2_std)]
      if (data[0] == '11526383'):
         output_datalist.append(np.dot(d[:], w1)) 
      elif (data[0] == '12923910'):
         output_datalist.append(np.dot(d[:], w2)) 
      elif (data[0] == '14699420'):
         output_datalist.append(np.dot(d[:], w3)) 
      elif (data[0] == '15437705'):
         output_datalist.append(np.dot(d[:], w4)) 
      elif (data[0] == '15642911'):
         output_datalist.append(np.dot(d[:], w5)) 
      elif (data[0] == '16298357'):
         output_datalist.append(np.dot(d[:], w6)) 
      elif (data[0] == '17331999'):
         output_datalist.append(np.dot(d[:], w7)) 
      elif (data[0] == '17593883'):
         output_datalist.append(np.dot(d[:], w8)) 
      elif (data[0] == '18733920'):
         output_datalist.append(np.dot(d[:], w9)) 
      elif (data[0] == '18791093'):
         output_datalist.append(np.dot(d[:], w10)) 
      elif (data[0] == '19473413'):
         output_datalist.append(np.dot(d[:], w11)) 
      else:
         print('Unknown patient id:', data[0])  # No model was trained for this id


GradientDescent()

mape: [11.83468206]


### Output your Prediction

> your filename should be **hw1_advanced.csv**

In [1893]:
with open(output_dataroot, 'w', newline='', encoding="utf-8") as csvfile:
  writer = csv.writer(csvfile)
  for row in output_datalist:
    writer.writerow(row)

# Report *(5%)*

Report should be submitted as a pdf file **hw1_report.pdf**

*   Briefly describe the difficulty you encountered
*   Summarize your work and your reflections
*   No more than one page






# Save the Code File
Please save your code and submit it as an ipynb file! (**hw1.ipynb**)