<h1>
<center>
Assignment 2: Classification and locally weighted regression
</center>
</h1>
<center>
CS 4262/5262 - Foundations of Machine Learning<br>
Vanderbilt University, Spring 2023<br>
Due: Check Brightspace
</center>
<hr>
<br>This assignment will focus on logistic regression (for binary classification) and locally weighted linear regression. For each algorithm, we have provided a class framework as a suggestion, but you are not required to use those in your implementation. Please use good programming practices - include informative comments and vectorize operations whenever possible. In addition to programming tasks, there are short-answer questions throughout the notebook. 

Contact Quan Liu (quan.liu@vanderbilt.edu) with any clarifying questions.

### Please enter your name:  Manda Li
---

In [None]:
import csv
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
import scipy as sp
from sklearn import datasets
import pandas as pd

--- 
## Part 0: Data


You will be applying binary classification to two different datasets: the [Iris](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset) dataset and the wine quality dataset (data source: https://archive.ics.uci.edu/ml/datasets/Wine+Quality). The Iris dataset is smaller and simpler, and therefore may be useful for debugging. This dataset consists of measurements (sepal and petal length and width) of 50 samples from each of 3 species of Iris flower. The wine quality dataset is more complex, and the classification task is to predict whether a sample is red wine or white wine given its features.

**Task 1**
- Load the Iris dataset from scikit-learn. (refer to [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html))
- Here, we will represent each sample by 2 of the 4 available features: petal width and petal length. 
- Display a scatterplot of the data, such that: 
    * the x- and y- axes correspond to the two features (petal width, petal length)
    * the axes are labelled 
    * points are colored according to class membership
    * the legend describes which iris type (class) is represented by each color

**Question 1:  Which classes appear to be linearly separable in this feature space?**

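Response: The setosa class is linearly separable from the other two classes in this feature space. Versicolor and virginica overlap slightly, so they are not linearly separable from each other.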

In [None]:
#TODO - Iris dataset
from sklearn.datasets import load_iris
iris_data = load_iris()

# columns 2 and 3 of the data matrix are petal length and petal width (cm)
iris_xy = iris_data.data[:, 2:]
z = iris_data.target
species_types = iris_data.target_names
colors = ['pink', 'blue', 'purple']
markers = ['o', 'x', '*']

figure, ax = plt.subplots()
for species, color, marker in zip(range(3), colors, markers):
    mask = (z == species)  # boolean mask selecting the samples of one class
    ax.scatter(iris_xy[mask, 0], iris_xy[mask, 1], c=color, marker=marker,
               label=species_types[species])
ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
ax.legend(title="Iris type")
plt.show()

**Task 2**
- Load the wine dataset provided on Brightspace.
    * we have 1600 lines of white wine data and 1599 lines of red wine data
    * white/red wine is labeled as 0/1
    * each sample has 11 dimensions of features with the same order as [fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol] and one dimension of label
- Here, we will represent each sample by 3 features: "volatile acidity", "fixed acidity", and "residual sugar".
- Similar to the Iris dataset, display a scatterplot of the data (using mpl_toolkits.mplot3d.Axes3D) such that:
    * the x-, y-, and z- axes correspond to the features
    * the axes are labelled 
    * the sample point is colored based on the class
    * the legend specifies the label associated with each color

In [None]:
#TODO - wine quality dataset
%matplotlib notebook
from mpl_toolkits.mplot3d import Axes3D

data_wine = pd.read_csv("assignment2-wine_quality.csv")
features_3d = ["volatile acidity", "fixed acidity", "residual sugar"]

figure = plt.figure()
wine = figure.add_subplot(111, projection='3d')
# one scatter call per class, so that each class gets its own legend entry
# (white wine is labeled 0 and red wine is labeled 1)
for label, name, color in [(0, "white", "gold"), (1, "red", "darkred")]:
    subset = data_wine[data_wine["label"] == label]
    wine.scatter(subset[features_3d[0]], subset[features_3d[1]],
                 subset[features_3d[2]], c=color, label=name)
wine.set_xlabel(features_3d[0])
wine.set_ylabel(features_3d[1])
wine.set_zlabel(features_3d[2])
wine.legend(title="wine type")
plt.show()

**Task 3**

The datasets have many feature dimensions. Use `sns.PairGrid()` to plot the pairwise feature relationships for both the Iris and wine datasets.

In [None]:
#TODO pairwise plot on 2 dataset
import seaborn as sns

iris = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris.head()
iris=iris[["petal length (cm)","petal width (cm)"]]
g1 = sns.PairGrid(iris)
g1.map_upper(sns.scatterplot)
g1.map_diag(sns.histplot)
g1.map_lower(sns.lineplot)

datawine=data_wine[["volatile acidity","fixed acidity","residual sugar"]]
w1 = sns.PairGrid(datawine)
w1.map(sns.scatterplot)




**Question 2:**

 2.1 Comment on the plots from the wine dataset, compared to the kinds of plots you saw in the Iris dataset. What similarities or differences do you see? What does the PairGrid visualization help to do?

Response: The two features in the Iris dataset appear to be strongly dependent on each other, while it is hard to see any clear relationship among the wine features. The PairGrid shows all pairwise feature plots side by side, which helps identify which features are most strongly related.

 2.2 Discuss separability of the wine dataset, based on what you have seen so far. Is it separable in two features? Do you think multiple features would change this outcome?

Response: It is not separable using two features. Adding more features could change the outcome, depending on which features are included.

---
## Part 1: Logistic Regression

The first classification algorithm you will implement is Logistic Regression (for binary classification). You do not have to use the class framework provided below, but please make sure to organize and comment your code clearly. 

**Task 4**
Write a LogisticRegression class such that:
 - parameters ($\theta$) are optimized using gradient descent 
 - there is an `evaluate` method that returns the model's accuracy on a given set of data
 - there is a `learning_curve` method that plots the cost function against the number of iterations
 - there is a `decision_boundary` method that renders a plot of the training data with the decision boundary overlaid (note: this code is provided for you below - make sure you understand how it works) 
 - please vectorize operations as much as possible
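
As a rough illustration of the vectorized updates this task asks for, here is a minimal sketch on synthetic two-cluster data. The function names (`sigmoid`, `cross_entropy`, `gradient`), the toy data, and the learning rate are assumptions made for this example only; they are not part of the provided framework.

```python
import numpy as np

def sigmoid(z):
    # logistic function, applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(X, y, theta):
    # mean cross-entropy cost J(theta); probabilities are clipped to avoid log(0)
    h = np.clip(sigmoid(X @ theta), 1e-7, 1 - 1e-7)
    return float(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)))

def gradient(X, y, theta):
    # vectorized gradient of J: X^T (h - y) / n
    return X.T @ (sigmoid(X @ theta) - y) / y.size

# hypothetical toy data: two Gaussian clusters in 2D, plus an intercept column
rng = np.random.default_rng(0)
n = 100
x0 = rng.normal(loc=-2.0, scale=1.0, size=(n, 2))
x1 = rng.normal(loc=2.0, scale=1.0, size=(n, 2))
X = np.hstack([np.ones((2 * n, 1)), np.vstack([x0, x1])])
y = np.concatenate([np.zeros(n), np.ones(n)])

theta = np.zeros(X.shape[1])
alpha = 0.1  # assumed learning rate for this toy problem
costs = [cross_entropy(X, y, theta)]
for _ in range(500):
    theta -= alpha * gradient(X, y, theta)
    costs.append(cross_entropy(X, y, theta))

# threshold the predicted probabilities at 0.5 to obtain class labels
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == (y == 1))
print("final cost:", costs[-1], "training accuracy:", accuracy)
```

Note that the gradient uses the predicted probabilities rather than the rounded class labels; the rounding only happens when reporting accuracy.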

In [None]:
#TODO - implement LogisticRegression class

class LogisticRegression():
    
    def __init__(self, X, y, theta, alpha):
        self.X = X
        # store labels and parameters as column vectors; np.array copies theta,
        # so the caller's initial array is never modified in place
        self.y = np.asarray(y).reshape(-1, 1)
        self.theta = np.array(theta, dtype=float).reshape(-1, 1)
        self.alpha = alpha
    
    # logistic function, applied elementwise
    def sigmoid(self, z): 
        return 1 / (1 + np.exp(-z))
    
    #  h (hypothesis): returns p(y=1|x) on inputs contained in the design matrix X
    def h(self, X):
        return self.sigmoid(np.dot(X, self.theta))
    
    # return predictions of class membership (0,1) of the datapoints in an input matrix X
    def predict(self, X):
        return np.round(self.h(X))
    
    # per-example cross-entropy losses; the cost J() is their mean
    # (computed from the probabilities h, not from the rounded predictions)
    def cost(self):
        h = np.clip(self.h(self.X), 1e-7, 1 - 1e-7)
        return -self.y * np.log(h) - (1 - self.y) * np.log(1 - h)
    
    # gradient of the cost with respect to theta (used to update theta)
    def gradient_descent_step(self):
        return np.dot(self.X.T, self.h(self.X) - self.y) / self.y.size
    
    # convergence criterion: a fixed budget of 10000 iterations
    # run gradient descent until that budget is reached
    def run_gradient_descent(self):
        cost = []
        print("start descent")
        for iteration in range(10000):
            gradient = self.gradient_descent_step()
            self.theta -= self.alpha * gradient
            cost.append(self.cost())
        accuracy = self.evaluate(self.X, self.y)
        print("Training accuracy:", accuracy)
        return self.theta, np.array(cost)
    
    # return the model's accuracy on an input (X,y) dataset 
    def evaluate(self, X, y):
        y_pred = self.predict(X)
        # flatten both arrays so that (n,1) and (n,) inputs are compared elementwise
        return np.mean(y_pred.ravel() == np.ravel(y))
    
    # plot cost function over num gradient descent steps
    def learning_curve(self, losses):
        # average the per-example losses at each step to obtain the cost
        losses = np.asarray(losses)
        losses = losses.reshape(losses.shape[0], -1).mean(axis=1)
        plt.figure()
        plt.plot(losses)
        plt.xlabel('Number of gradient descent steps')
        plt.ylabel('Cost')
        plt.title('Learning curve')
        plt.show()
        return
    
    # plot decision boundary, based on current model parameters
    # (assumes exactly two features plus the intercept column)
    def decision_boundary(self, dset):
        X = self.X[:, 1:]
        theta = self.theta.ravel()
        y = np.reshape(self.y, (-1))
        # the boundary is the line theta0 + theta1*x1 + theta2*x2 = 0
        xax = np.array([np.min(X[:, 0]), np.max(X[:, 0])])
        yax = -1.0 * (theta[0] + theta[1] * xax) / theta[2]
        # class names in label order (0, 1); white/red wine is labeled 0/1
        names = {'wine': ['white', 'red'],
                 'iris': ['setosa', 'versicolor']}.get(dset, ['class 0', 'class 1'])
        plt.figure()
        plt.scatter(x=X[y==0,0], y=X[y==0,1], c='orange', edgecolor='black', label=names[0])
        plt.scatter(x=X[y==1,0], y=X[y==1,1], c='blue', edgecolor='black', label=names[1])
        plt.plot(xax, yax, label='decision boundary')
        plt.legend()
        if dset == 'wine':
            plt.xlabel('fixed acidity')  # name it as the your input x- and y-
            plt.ylabel('residual sugar')
            plt.title('Wine')
        elif dset == 'iris':
            plt.xlabel('petal length')
            plt.ylabel('petal width')
            plt.title("Iris Dataset")            
        plt.show()

**Task 5**
Verify that your method works on the Iris dataset. The Iris dataset is originally a 3-class dataset, but for this purpose, please select two of the 3 classes on which to perform binary classification (and again, use the 2 features "petal length" and "petal width"). You do not have to split this dataset further into training and testing sets.
 - Display the decision boundary, superimposed on the scatterplot of the data
 - Add/modify the `decision_boundary` function if needed to accommodate changes in plotting for the Iris dataset.

In [None]:
# TODO Task 5
# attach the class labels to the petal-feature DataFrame
target = pd.DataFrame(z).rename(columns={0: 'species'})
iris_fin = pd.concat([iris, target], axis=1)
# keep only setosa (0) and versicolor (1) for binary classification
iris_final = iris_fin[iris_fin['species'] <= 1]
features = iris_final[["petal length (cm)", "petal width (cm)"]]
x_input = np.array(features)
# prepend a column of ones for the intercept term
x_input = np.hstack((np.ones((x_input.shape[0], 1)), x_input))
y_input = np.array(iris_final[["species"]])
theta = np.zeros(x_input.shape[1])
iris_lr = LogisticRegression(x_input, y_input, theta, 0.01)
final_theta, cost = iris_lr.run_gradient_descent()
iris_lr.decision_boundary("iris")
print(cost.shape)
# keep one row of per-example losses per gradient descent step
cost = cost.reshape(cost.shape[0], -1)
iris_lr.learning_curve(cost)


**Task 6**
Explore your method on the wine dataset, expanding from 2 dimensions into multiple dimensions.
 - Split the wine dataset into a training set and a test set (80/20 split). We recommend shuffling the data first.
 - Then, perform feature scaling (standardizing to mean = 0 and variance = 1) on both the training and test sets. Please write your own function to perform this standardization, rather than using a module from scikit-learn. Note that it is recommended to calculate the scaling parameters (mean and variance) from the training set, and then apply those same parameters to scale the test set, so that the test set does not influence the training in any way. 
 - **we are not expecting to get 100% accuracy on any of the feature combinations**, but an empirical lower bound for the accuracy is given. That is to say, your approach is probably right, as long as your performance on the test set is higher than the number.
 - Train your model on the wine training data with the following 4 [feature combinations] : percentage to beat during test
   * [fixed acidity, volatile acidity, residual sugar] : 85%
   * [density, pH, alcohol] : 75%
   * [fixed acidity, volatile acidity, chlorides] : 85%
   * [all 11 features]: 95%
   * note that the features list is: fixed acidity/volatile acidity/citric acid/residual sugar /chlorides/free sulfur dioxide/total sulfur dioxide/density/pH/sulphates/alcohol
 - Display the decision boundary plots (plot in 2d, so please just choose any 2 of your features as x- and y-). 
 - Display plots of the learning curve 
 - Report the model's final accuracy on the test set

In [None]:
#TODO - Task 6, apply your method to the wine dataset
from sklearn.model_selection import train_test_split

data_wine = data_wine.sample(frac=1).reset_index(drop=True)
#data_wine.tail()
train_set, test_set = train_test_split(data_wine, test_size=0.2, random_state=42)
train_3features=train_set[["fixed acidity","volatile acidity", "residual sugar","label"]]
train_3featuresDPA=train_set[["density","pH", "alcohol","label"]]
train_3featuresFVC=train_set[["fixed acidity","volatile acidity", "chlorides","label"]]
train_2features=train_set[["fixed acidity", "residual sugar","label"]]
#test_set.head()
train_x3 = train_3features.drop("label", axis=1)
train_x3DPA = train_3featuresDPA.drop("label", axis=1)
train_x3FVC = train_3featuresFVC.drop("label", axis=1)
train_x2=train_2features.drop("label", axis=1)
#train_x3.head()
train_x11=train_set.drop("label", axis=1)
#train_x11.head()
train_y= train_set["label"]
test_x11 = test_set.drop("label", axis=1)
test_y = test_set["label"]
test_3features=test_set[["fixed acidity","volatile acidity", "residual sugar","label"]]
test_3featuresDPA=test_set[["density","pH", "alcohol","label"]]
test_3featuresFVC=test_set[["fixed acidity","volatile acidity", "chlorides","label"]]
test_2features=test_set[["fixed acidity", "residual sugar","label"]]

test_x3 = test_3features.drop("label", axis=1)
test_x3DPA = test_3featuresDPA.drop("label", axis=1)
test_x3FVC = test_3featuresFVC.drop("label", axis=1)
test_x2 = test_2features.drop("label", axis=1)
#test_x11.head()
#test_y.head()
#train_y.head()
#test_x3.head()
train_x11_mean = np.mean(train_x11, axis=0)
train_x11_std = np.std(train_x11, axis=0)
train_x11 = (train_x11 - train_x11_mean) / train_x11_std

test_x11 = (test_x11 - train_x11_mean) / train_x11_std

train_x3_mean = np.mean(train_x3, axis=0)
train_x3_std = np.std(train_x3, axis=0)
train_x3 = (train_x3 - train_x3_mean) / train_x3_std

test_x3 = (test_x3 - train_x3_mean) / train_x3_std

train_x3DPA_mean = np.mean(train_x3DPA, axis=0)
train_x3DPA_std = np.std(train_x3DPA, axis=0)
train_x3DPA = (train_x3DPA - train_x3DPA_mean) / train_x3DPA_std

test_x3DPA = (test_x3DPA - train_x3DPA_mean) / train_x3DPA_std

train_x3FVC_mean = np.mean(train_x3FVC, axis=0)
train_x3FVC_std = np.std(train_x3FVC, axis=0)
train_x3FVC = (train_x3FVC - train_x3FVC_mean) / train_x3FVC_std

test_x3FVC = (test_x3FVC - train_x3FVC_mean) / train_x3FVC_std



train_x2_mean = np.mean(train_x2, axis=0)
train_x2_std = np.std(train_x2, axis=0)
train_x2 = (train_x2 - train_x2_mean) / train_x2_std

test_x2 = (test_x2 - train_x2_mean) / train_x2_std


In [None]:
train_x3=np.array(train_x3)
train_x3DPA=np.array(train_x3DPA)
train_x3FVC=np.array(train_x3FVC)
train_x11=np.array(train_x11)
train_x2=np.array(train_x2)
train_x3 = np.hstack((np.ones((train_x3.shape[0], 1)), train_x3))
train_x3DPA = np.hstack((np.ones((train_x3DPA.shape[0], 1)), train_x3DPA))
train_x3FVC = np.hstack((np.ones((train_x3FVC.shape[0], 1)), train_x3FVC))
train_x11 = np.hstack((np.ones((train_x11.shape[0], 1)), train_x11))
train_x2 = np.hstack((np.ones((train_x2.shape[0], 1)), train_x2))
test_x3=np.array(test_x3)
test_x3 = np.hstack((np.ones((test_x3.shape[0], 1)), test_x3))
test_x3DPA=np.array(test_x3DPA)
test_x3DPA = np.hstack((np.ones((test_x3DPA.shape[0], 1)), test_x3DPA))
test_x3FVC=np.array(test_x3FVC)
test_x3FVC = np.hstack((np.ones((test_x3FVC.shape[0], 1)), test_x3FVC))
test_x11=np.array(test_x11)
test_x11 = np.hstack((np.ones((test_x11.shape[0], 1)), test_x11))
test_x2=np.array(test_x2)
test_x2 = np.hstack((np.ones((test_x2.shape[0], 1)), test_x2))
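train_y=np.array(train_y)
train_y=train_y.reshape(-1,1)
# keep the test labels as a column too, so they compare elementwise with the (n,1) predictions
test_y=np.array(test_y).reshape(-1,1)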

theta3 = np.zeros(train_x3.shape[1])
theta11 = np.zeros(train_x11.shape[1])
theta2 = np.zeros(train_x2.shape[1])

print(theta3.shape)
print(theta11.shape)
print(theta2.shape)
#print(train_y)
wine_lr3= LogisticRegression(train_x3,train_y,theta3,0.01)
final_theta3, cost3=wine_lr3.run_gradient_descent()
accuracy=wine_lr3.evaluate(test_x3,test_y)
print("Test Accuracy fixed acidity, volatile acidity, residual sugar:",accuracy)
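# use a fresh zero vector so the starting point cannot be affected by earlier training runs
wine_lr3DPA= LogisticRegression(train_x3DPA,train_y,np.zeros(train_x3DPA.shape[1]),0.01)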
final_theta3DPA, cost3DPA=wine_lr3DPA.run_gradient_descent()
accuracy=wine_lr3DPA.evaluate(test_x3DPA,test_y)
print("Test Accuracy density, pH, alcohol:",accuracy)
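# use a fresh zero vector so the starting point cannot be affected by earlier training runs
wine_lr3FVC= LogisticRegression(train_x3FVC,train_y,np.zeros(train_x3FVC.shape[1]),0.01)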
final_theta3FVC, cost3FVC=wine_lr3FVC.run_gradient_descent()
accuracy=wine_lr3FVC.evaluate(test_x3FVC,test_y)
print("Test Accuracy fixed acidity, volatile acidity, chlorides:",accuracy)
wine_lr11= LogisticRegression(train_x11,train_y,theta11,0.01)
final_theta11, cost11=wine_lr11.run_gradient_descent()
accuracy=wine_lr11.evaluate(test_x11,test_y)
print("Test Accuracy 11 FEATURES:",accuracy)
#wine_lr3.decision_boundary("wine")
wine_lr2= LogisticRegression(train_x2,train_y,theta2,0.01)
final_theta2, cost2=wine_lr2.run_gradient_descent()
accuracy=wine_lr2.evaluate(test_x2,test_y)
print("Test Accuracy 2 features:",accuracy)
wine_lr2.decision_boundary("wine")
# keep one row of per-example losses per gradient descent step (no hard-coded training-set size)
cost2=cost2.reshape(cost2.shape[0],-1)
wine_lr2.learning_curve(cost2)


**Question 3:**

 3.1. Describe the convergence condition you selected.

Response: I initialized theta to all zeros, used a learning rate of 0.01, and stopped gradient descent after a fixed budget of 10000 iterations.
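
A fixed iteration budget is one option. As a possible alternative, gradient descent can stop once the cost changes by less than a tolerance between steps. The sketch below illustrates that idea on a simple quadratic; the function `run_until_converged`, the tolerance `tol`, and the toy objective are assumptions for this example, not part of the assignment framework.

```python
import numpy as np

def run_until_converged(cost_fn, grad_fn, theta, alpha=0.1, tol=1e-8, max_iters=10000):
    # stop when the cost improves by less than tol, or when the budget runs out
    costs = [cost_fn(theta)]
    for _ in range(max_iters):
        theta = theta - alpha * grad_fn(theta)
        costs.append(cost_fn(theta))
        if abs(costs[-2] - costs[-1]) < tol:
            break
    return theta, costs

# hypothetical convex objective with its minimum at `target`
target = np.array([1.0, -2.0])

def cost_fn(t):
    return float(np.sum((t - target) ** 2))

def grad_fn(t):
    return 2.0 * (t - target)

theta, costs = run_until_converged(cost_fn, grad_fn, np.zeros(2))
print("steps taken:", len(costs) - 1)
```

With a criterion like this, the number of steps adapts to the problem instead of always running the full budget.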

 3.2. What was the model's training accuracy on the Iris dataset (for the two classes you selected)?

Response: The training accuracy on the Iris dataset was 1.

 3.3. What was the model's training and test accuracy on the wine quality dataset? Which one gives the best performance? Does that live up to your expectation and why?

Response: The training accuracy was over 0.85 in most cases, except for the density, pH, alcohol case, where it was around 0.75. The test accuracy was about 0.5 after 10000 iterations. The model with all 11 features gives the best performance, which matches my expectation: I assumed that the more features are considered, the more accurate the result would be.
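
When a test accuracy sits well below the training accuracy, one thing that may be worth ruling out is a NumPy shape mismatch in the accuracy computation. The small sketch below, with made-up labels, shows how comparing an `(n, 1)` column of predictions with an `(n,)` vector of labels broadcasts to an `(n, n)` matrix, so the mean no longer measures per-sample accuracy.

```python
import numpy as np

# hypothetical predictions (column vector) and labels (flat vector)
y_pred = np.array([[1.0], [0.0], [1.0], [1.0]])   # shape (4, 1)
y_true = np.array([1, 0, 1, 1])                   # shape (4,)

# broadcasting compares every prediction with every label
pairwise = (y_pred == y_true)
naive_accuracy = np.mean(pairwise)

# flattening both arrays restores the intended elementwise comparison
safe_accuracy = np.mean(y_pred.ravel() == y_true.ravel())

print(pairwise.shape, naive_accuracy, safe_accuracy)
```

Here every prediction is correct, yet the naive computation reports 0.625 because it averages over all 16 prediction-label pairs.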

---
## Part 2: Locally Weighted Linear Regression 

In this second part, you will write a locally weighted linear regression class, and apply it to a synthetic dataset. This dataset is included as a text file on Brightspace, and is called 'LWR_samples.npy'. Each line of the text file represents one training example in the format $x^{(i)},y^{(i)}$ (i.e. the delimiter is a comma). 

**Task 7**
- Load the synthetic data, from the file `assignment2_LWR_samples.npy`
- Interpret the $(x^{(i)},y^{(i)})$ pairs, and plot them with a scatter plot.
- Implement a LocallyWeightedLR class (example framework below). To make a prediction at input $x$, weight each training example according to the function we discussed in lecture: 
$$ w^{(i)} = \exp\big(-\frac{(x^{(i)} - x)^2}{2\tau^2} \big), $$
where $\tau$ is a bandwidth parameter that you will experiment with.
- To compute the local linear regression parameters ($\theta$) at each query point, use the closed-form solution. The formula is:
$$ \theta = (X^TWX)^{-1} X^TWy, $$
where $X$ is the design matrix formed by your training inputs (make sure to include the intercept term), $W$ is a diagonal matrix whose $i^{th}$ diagonal entry is the weight of the $i^{th}$ training example (which depends on the point at which you are making a prediction), and $y$ is a column vector containing the target labels of the training examples.

- Run this regression model to make predictions at the specific input points x = 4, x = 0.5, and x = -3. Use $\tau$ = 0.5. Report the values of the local regression parameters $\theta$ obtained for each of these 3 points.
- Now, generate an array of predictions corresponding to equally spaced input points (in the range of [-4.5, 4.5] in steps of 0.05), again using $\tau$ = 0.5. Generate a plot showing the predictions from Locally Weighted Linear Regression on each of these input points, superimposed on (and colored differently from) the training data.
- Repeat the previous step, now using bandwidth parameters $\tau = 0.1$ and $\tau = 1.5$. Plot the results, again superimposed on the training data (and in a different color).
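
The weighting function and closed-form solution above can be sketched in a few lines. This is a minimal illustration for one-dimensional inputs, using a hypothetical function `lwlr_predict` and noiseless synthetic data on the line $y = 2x + 1$; it is not the provided class framework.

```python
import numpy as np

def lwlr_predict(x_train, y_train, x_query, tau):
    # design matrix with an intercept column
    X = np.column_stack([np.ones_like(x_train), x_train])
    # Gaussian weights centered on the query point
    w = np.exp(-(x_train - x_query) ** 2 / (2 * tau ** 2))
    # X.T * w scales column i of X.T by w_i, which equals X.T @ np.diag(w)
    XtW = X.T * w
    # solve (X^T W X) theta = X^T W y instead of forming the inverse explicitly
    theta = np.linalg.solve(XtW @ X, XtW @ y_train)
    return theta[0] + theta[1] * x_query, theta

# hypothetical noiseless data on the line y = 2x + 1
x_train = np.linspace(-3, 3, 61)
y_train = 2.0 * x_train + 1.0

y_hat, theta = lwlr_predict(x_train, y_train, x_query=0.5, tau=0.5)
print("theta:", theta, "prediction:", y_hat)
```

Because the toy data lie exactly on a line, the local fit recovers the intercept 1 and slope 2 at any query point. On curved data, $\theta$ would change with the query point.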

In [None]:
#TODO - Implement Locally-Weighted Linear Regression class

class LocallyWeightedLR():
    
    def __init__(self, X, y, tau):
        self.X = X
        self.y = y
        self.tau = tau 
        
    # use bandwidth variable tau to compute weights for each training point.  
    # return a diagonal matrix with w_i on the diagonal (for vectorization)
    # note that the values of w_i depend upon the value of the input query point x.
    def compute_weights(self, x):
        # squared distance from every training input to the query point (vectorized)
        diff = self.X - x
        sq_dist = np.sum(diff * diff, axis=1)
        return np.diag(np.exp(-sq_dist / (2 * self.tau * self.tau)))
    
    # analytical solution for the local linear regression parameters at the input query point x.
    # this should involve calling the above method compute_weights.
    def compute_theta(self,x):
        W = self.compute_weights(x)
        # design matrix with the intercept column prepended
        X_tilde = np.hstack((np.ones((self.X.shape[0], 1)), self.X))
        XtW = X_tilde.T.dot(W)
        # solve (X^T W X) theta = X^T W y rather than forming the inverse explicitly
        theta = np.linalg.solve(XtW.dot(X_tilde), XtW.dot(self.y))
        return theta
    
    # prediction for an input x
    # also return the local linear regression parameters (theta) for this x.
    def predict(self, x):
        theta = self.compute_theta(x)
        x_tilde = np.hstack((1, x))
        y_pred = x_tilde.dot(theta)
        return y_pred,theta

In [None]:
#TODO - Read in the artificial dataset, plot it, and run the code according to the above instructions.
data=np.load("assignment2-LWR_samples.npy")
xi=data[:,0]
yi=data[:,1]
#print(data)

plt.figure()
plt.scatter(xi,yi)
plt.xlabel('xi')
plt.ylabel('yi')
plt.title('Data')
plt.show()
#print(xi.shape)
xi=xi.reshape(-1,1)
yi=yi.reshape(-1,1)
#print(yi.shape)
lwlr=LocallyWeightedLR(xi,yi,0.5)
y4,theta4=lwlr.predict(4)
print("y prediction",y4,"theta when x=4",theta4)
y5,theta5=lwlr.predict(0.5)
print("y prediction",y5,"theta when x=0.5",theta5)
y6,theta6=lwlr.predict(-3)
print("y prediction",y6,"theta when x=-3",theta6)
# evaluate each bandwidth on an evenly spaced grid and overlay the
# predictions on the training data
x_query = np.linspace(-4.5, 4.5, 181)  # steps of 0.05
plt.figure()
plt.scatter(xi, yi, c='lightgray', label='training data')
for tau, color in zip([0.5, 0.1, 1.5], ['red', 'green', 'blue']):
    model = LocallyWeightedLR(xi, yi, tau)
    # predict returns (y_pred, theta); keep the scalar prediction only
    predictions = np.array([np.ravel(model.predict(x)[0])[0] for x in x_query])
    plt.plot(x_query, predictions, c=color, label='tau = ' + str(tau))
plt.xlabel('x')
plt.ylabel('y_prediction')
plt.title('Prediction')
plt.legend()
plt.show()


**Question 4**: 
 - Do the local linear regression parameters $\theta$ returned for the 3 input points (4, 0.5, -3) agree with what you expect, based on the training data in the neighborhood of those points? Why or why not?
 
Response: Yes, because the trend of the slope was clear from the neighboring points: increasing at x=0.5 and decreasing at x=4 and x=-3.


**Question 5:**  
 - Based on your observations, describe the effect of increasing and decreasing $\tau$, in the context of over/underfitting.
 
Response: When tau decreases, the model fits the training data more closely but risks overfitting. When tau increases, the predictions tend to be smoother and more general but risk underfitting.
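
One way to make this trade-off concrete is to count how many training points effectively contribute to a prediction, by summing the weights $w^{(i)}$ at a query point. The sketch below does this for an assumed evenly spaced grid of training inputs; the helper `effective_neighbors` and the grid are hypothetical, introduced only for illustration.

```python
import numpy as np

def effective_neighbors(x_train, x_query, tau):
    # sum of Gaussian weights: a rough count of points that influence the fit
    w = np.exp(-(x_train - x_query) ** 2 / (2 * tau ** 2))
    return float(w.sum())

# hypothetical training inputs, evenly spaced on [-4.5, 4.5]
x_train = np.linspace(-4.5, 4.5, 181)

counts = {tau: effective_neighbors(x_train, 0.0, tau) for tau in (0.1, 0.5, 1.5)}
for tau, count in counts.items():
    print("tau =", tau, "effective neighbors ~", round(count, 1))
```

A small tau leaves only a handful of points with meaningful weight, so the fit can chase noise. A large tau averages over many points, which smooths the curve and can hide local structure.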

---
## Part 3: Submission 

Please upload a clean version of your work to Brightspace by the deadline.

Below, please acknowledge your collaborators as well as any resources/references (beyond guides to Python syntax) that you have used in this assignment: