# **Example Solutions for Lab 1**

## **Problem 1**
In this problem, we explore basic linear regression: $y_n=w_0 +w_1x_n$, where $n=1,\dots, N$ indexes the data samples. Your task is to determine appropriate values of $w_0$ and $w_1$ for the data samples given in Lab1_1.csv.

Requirements:
*   You are required to use the gradient descent algorithm to complete this problem.
*   You need to include the following four components in your lab report: (1) the code, (2) the obtained values of $w_0$ and $w_1$, (3) the obtained training error, and (4) the obtained testing error.
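
For reference, the gradient-descent update itself can be sketched in plain NumPy. This is a minimal illustration rather than the lab's reference solution: it assumes a mean-squared-error loss, and the function name `fit_line_gd`, the learning rate, the step count, and the synthetic data are all hypothetical choices made for the demo.

```python
import numpy as np

def fit_line_gd(x, y, lr=0.05, n_steps=5000):
    """Fit y ~ w0 + w1*x by batch gradient descent on the mean squared error."""
    w0, w1 = 0.0, 0.0
    for _ in range(n_steps):
        residual = (w0 + w1 * x) - y            # prediction error per sample
        grad_w0 = 2.0 * residual.mean()         # d(MSE)/d(w0)
        grad_w1 = 2.0 * (residual * x).mean()   # d(MSE)/d(w1)
        w0 -= lr * grad_w0                      # step against the gradient
        w1 -= lr * grad_w1
    return w0, w1

# Synthetic stand-in for the lab data (hypothetical, not the contents of the CSV file)
rng = np.random.default_rng(0)
x_demo = rng.uniform(-2.0, 2.0, size=200)
y_demo = -3.0 + 5.0 * x_demo + rng.normal(0.0, 0.1, size=200)

w0_hat, w1_hat = fit_line_gd(x_demo, y_demo)
mse = np.mean((w0_hat + w1_hat * x_demo - y_demo) ** 2)
print(w0_hat, w1_hat, mse)
```

The same function could be applied to `x_train` and `y_train` once the data is loaded, although the learning rate would likely need tuning to the scale of the real inputs.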

In [None]:
# Some useful Python libraries (feel free to import other libraries)
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models, optimizers
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split 

# Obtain your data samples
data=pd.read_csv('sample_data/Lab1_1.csv') # you may need to change the path
x_data = data['x'].values
y_data = data['y'].values

# Split the samples into training data (70%) and testing data (30%). You may use a different split as long as test_size <= 0.3.
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.3) 

In [None]:
# Example Code
# Gradient-descent solution
# ground truth: -3, 5
model = models.Sequential()
model.add(layers.Normalization(input_shape=(1,), axis=None))
model.add(layers.Dense(1))
model.summary()
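# Build the learning model with a gradient-descent-based optimizer (Adam)
adam = optimizers.Adam(learning_rate=0.2)
# Pass the configured optimizer object; the string 'adam' would ignore learning_rate=0.2
model.compile(optimizer=adam,
              loss='mean_absolute_error',
              metrics=['mean_absolute_error'])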

history = model.fit(x_train, y_train, epochs=1200, 
                    validation_data=(x_test, y_test))
# Visualize performance evaluation
plt.plot(history.history['mean_absolute_error'], label='training_error')
plt.plot(history.history['val_mean_absolute_error'], label = 'testing_error')
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.xlim([0, 1200])
plt.ylim([0, 8])
plt.legend(loc='lower right')

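# evaluate() returns [loss, metric]; both are the mean absolute error here
test_loss, test_mae = model.evaluate(x_test,  y_test, verbose=2)
print(test_mae)
# Dense-layer weights as [kernel, bias]
W = model.layers[1].get_weights()
print(W)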

## **Problem 2**
In this problem, we explore an extended linear regression: $y_n=w_0 +w_1x_n+w_2x_n^2$, where $n=1,\dots, N$ indexes the data samples. Your task is to determine appropriate values of $w_0$, $w_1$, and $w_2$ for the data samples given in Lab1_2.csv.

Requirements:
*   You are required to use the gradient descent algorithm to complete this problem.
*   You need to include the following four components in your lab report: (1) the code, (2) the obtained values of $w_0$, $w_1$, and $w_2$, (3) the obtained training error, and (4) the obtained testing error.
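
Because the model is still linear in the weights, the quadratic case can be handled by stacking $1$, $x_n$, and $x_n^2$ into a design matrix and applying a vectorized gradient step. The sketch below is only one possible way to do this: it assumes a mean-squared-error loss, and the function name `fit_gd`, the hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gd(X, y, lr=0.05, n_steps=20000):
    """Batch gradient descent on the MSE for y ~ X @ w (X includes a column of ones)."""
    n_samples, n_params = X.shape
    w = np.zeros(n_params)
    for _ in range(n_steps):
        # Gradient of mean((X @ w - y)**2) with respect to w
        grad = (2.0 / n_samples) * (X.T @ (X @ w - y))
        w -= lr * grad
    return w

# Synthetic stand-in for the lab data (hypothetical, not the contents of the CSV file)
rng = np.random.default_rng(1)
x_demo = rng.uniform(-2.0, 2.0, size=300)
y_demo = 2.0 + 5.0 * x_demo - 3.0 * x_demo**2 + rng.normal(0.0, 0.1, size=300)

# Columns: 1 (for w_0), x (for w_1), x^2 (for w_2)
X_demo = np.column_stack([np.ones_like(x_demo), x_demo, x_demo**2])
w_hat = fit_gd(X_demo, y_demo)
print(w_hat)  # estimates of [w_0, w_1, w_2]
```

Putting the intercept into the design matrix keeps the update to a single line, at the cost of treating $w_0$ like any other weight.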

In [None]:
# Example Code
# Ground truth: 2, 5, -3
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models, optimizers
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split 

# Obtain your data samples
data=pd.read_csv('sample_data/Lab1_2.csv') # you may need to change the path
x_1 = data['x'].values
x_2 = np.power(x_1,2)
x_data = np.stack((x_1, x_2), axis=1)
y_data = data['y'].values

# Split the samples into training data (70%) and testing data (30%). You may use a different split as long as test_size <= 0.3.
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.3) 

# Gradient-descent solution
model = models.Sequential()
model.add(layers.Normalization(input_shape=(2,), axis=None))
model.add(layers.Dense(1))
model.summary()

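# Build the learning model with a gradient-descent-based optimizer (Adam)
adam = optimizers.Adam(learning_rate=0.5)
# Pass the configured optimizer object; the string 'adam' would ignore learning_rate=0.5
model.compile(optimizer=adam,
              loss='mean_absolute_error',
              metrics=['mean_absolute_error'])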

history = model.fit(x_train, y_train, epochs=2000, 
                    validation_data=(x_test, y_test))

# Visualize performance evaluation
plt.plot(history.history['mean_absolute_error'], label='training_error')
plt.plot(history.history['val_mean_absolute_error'], label = 'testing_error')
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.xlim([0, 2000])
plt.ylim([0, 8])
plt.legend(loc='lower right')
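# evaluate() returns [loss, metric]; both are the mean absolute error here
test_loss, test_mae = model.evaluate(x_test,  y_test, verbose=2)
print(test_mae)

# Dense-layer weights as [kernel, bias]
W = model.layers[1].get_weights()
print(W)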

## **Problem 3**
In this problem, we use the extended linear regression $y_n=w_0 +\sum_{k=1}^Kw_kx_{n,k}$ to solve a real-world stock forecasting problem. Your task is to predict the Close value based on the Open, High, and Low values given in Lab1_3.csv.

Requirements:
*   You are required to use the gradient descent algorithm to complete this problem.
*   You need to include the following four components in your lab report: (1) the code, (2) the obtained values of $w_0$, $w_1$, ..., $w_K$, (3) the obtained training error, and (4) the obtained testing error.
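
With several input features on different scales, gradient descent usually behaves better if each feature is standardized first. The fitted weights then have to be mapped back to the original units before reporting $w_0, \dots, w_K$. The following is a minimal sketch of that idea under stated assumptions: it uses a mean-squared-error loss, the name `fit_standardized_gd` is hypothetical, and the data is synthetic and noise-free rather than real stock prices.

```python
import numpy as np

def fit_standardized_gd(X, y, lr=0.1, n_steps=5000):
    """Gradient descent on standardized features; returns (w0, w) in original units."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    Z = (X - mu) / sigma                  # zero mean, unit variance per column
    n_samples, n_features = Z.shape
    v = np.zeros(n_features)              # weights on the standardized features
    b = 0.0                               # intercept on the standardized features
    for _ in range(n_steps):
        residual = Z @ v + b - y
        v -= lr * (2.0 / n_samples) * (Z.T @ residual)
        b -= lr * 2.0 * residual.mean()
    w = v / sigma                         # undo the scaling
    w0 = b - np.sum(v * mu / sigma)       # undo the centering
    return w0, w

# Synthetic features on very different scales (hypothetical values)
rng = np.random.default_rng(2)
X_demo = rng.normal(loc=[100.0, 50.0, 10.0], scale=[20.0, 5.0, 2.0], size=(500, 3))
true_w0, true_w = 1.5, np.array([0.2, -0.4, 0.9])
y_demo = true_w0 + X_demo @ true_w

w0_hat, w_hat = fit_standardized_gd(X_demo, y_demo)
print(w0_hat, w_hat)
```

Real Open, High, and Low columns are strongly correlated with one another, so convergence on the actual data may be considerably slower than in this independent-feature demo.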

In [None]:
# Some useful hints. Feel free to write your own code without them.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models, optimizers
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split 
# Import Data
data=pd.read_csv('sample_data/Lab1_3.csv')
x_data = data[['Open','High','Low']]
y_data = data['Close']
# Split the samples into training data (70%) and testing data (30%). You may use a different split as long as test_size <= 0.3.
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.3) 

In [None]:
# Example Code
# Gradient-descent solution
model = models.Sequential()
model.add(layers.Normalization(input_shape=(3,), axis=None))
model.add(layers.Dense(1))
model.summary()

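# Build the learning model with a gradient-descent-based optimizer (Adam)
adam = optimizers.Adam(learning_rate=0.3)
# Pass the configured optimizer object; the string 'adam' would ignore learning_rate=0.3
model.compile(optimizer=adam,
              loss='mean_absolute_error',
              metrics=['mean_absolute_error'])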

history = model.fit(x_train, y_train, epochs=3000, 
                    validation_data=(x_test, y_test))

# Visualize performance evaluation
plt.plot(history.history['mean_absolute_error'], label='training_error')
plt.plot(history.history['val_mean_absolute_error'], label = 'testing_error')
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.xlim([0, 3000])
plt.ylim([0, 20])
plt.legend(loc='lower right')
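# evaluate() returns [loss, metric]; both are the mean absolute error here
test_loss, test_mae = model.evaluate(x_test,  y_test, verbose=2)
print(test_mae)
# Dense-layer weights as [kernel, bias]
W = model.layers[1].get_weights()
print(W)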