# Project 2 Report - Nick Vega

# Task 2A

In this task, we will use a perceptron to deal with a 2D classification problem.

Code was provided for generating the training set, test set, and plot.

![Given Data]()

## Task 2A.1 Perceptron

In this section, we are going to implement a binary classifier with the Perceptron class.

`class Perceptron(object)`

The methods in this class include the following:
- `__init__(self, T=1)`: In this method, the number of iterations T is defined.

- `fit(self, X, y)`: In this method, we train the perceptron on data X with labels y for T iterations. We are given initialization code for the number of samples, the number of features, a weight vector, and a bias variable. In the code we implemented, the outer loop runs T times and the inner loop visits every sample. For each sample, we check whether the predicted class for X[i] differs from y[i]. If it does, we add y[i] * X[i] to the weight vector and y[i] to the bias.

- `project(self, X)`: In this method, we project data X onto the learned hyperplane with weights w and bias b. The code works by computing the dot product of X and the weight vector and adding the bias to the result.

- `predict(self, X)`: In this method, we predict class labels for the samples in X. The code calls the project method above, which returns a projection vector. We then loop through each element of that vector: if the value is greater than or equal to 0, we set the prediction to 1, otherwise we set it to -1. We then return the predictions.
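The three methods described above can be sketched as follows. This is a minimal reconstruction from the description, not the submitted code: the zero initialization of `w` and `b` is an assumption (the report only says initialization code was provided), and `predict` is vectorized with `np.where` instead of an explicit loop.

```python
import numpy as np


class Perceptron(object):
    def __init__(self, T=1):
        self.T = T  # number of passes over the training set

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # Assumed initialization: zero weights and zero bias.
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.T):
            for i in range(n_samples):
                # Update only when the current prediction for X[i] is wrong.
                if self.predict(X[i:i + 1])[0] != y[i]:
                    self.w += y[i] * X[i]
                    self.b += y[i]

    def project(self, X):
        # Signed score of each row of X relative to the hyperplane.
        return np.dot(X, self.w) + self.b

    def predict(self, X):
        # Scores >= 0 map to +1, everything else to -1.
        return np.where(self.project(X) >= 0, 1, -1)
```

Labels are expected to be in {-1, +1}, which is what makes the `y[i] * X[i]` update move the hyperplane toward the misclassified sample.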


Using the Perceptron class defined above, we set T=5 and fit the model on the given Xtrain and ytrain sets. We then predict on the given Xtest. We achieve an accuracy of 100.00% and plot the decision boundary below.

![Initial Perceptron]()


This dataset is linearly separable, and the perceptron classifies the test set with 100% accuracy.
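For reference, the accuracy figures in this report are the percentage of test labels predicted correctly. A small sketch of that computation is shown here; the helper name `accuracy` is hypothetical and the example labels are made up for illustration.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Percentage of positions where the prediction matches the label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)


# Toy labels (not from the report): three of four predictions match.
print(f"Accuracy: {accuracy([1, -1, 1, -1], [1, -1, -1, -1]):.2f}%")  # Accuracy: 75.00%
```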

## Task 2A.2 Kernel Trick

In this section, we will build kernel functions and a kernel perceptron. We will then plot the decision boundary of the kernel perceptron for each kernel function.

The decision function for the Kernel Perceptron is given by 

$$
f(\mathbf{x}) = \text{sign}\left(\sum_{i=1}^{n} \alpha_i y_i k(\mathbf{x}_i, \mathbf{x})\right)
$$

where k is the kernel function, y_i are the labels, and alpha_i are the learned weights.

The kernel (Gram) matrix induced by the kernel function k over n data points is defined as 

$$
\mathbf{K}=
\left(\begin{array}{ccc} 
k(\mathbf{x}_1,\mathbf{x}_1) & \dots & k(\mathbf{x}_1,\mathbf{x}_n)\\
\vdots & \ddots & \vdots \\
k(\mathbf{x}_n,\mathbf{x}_1) & \dots & k(\mathbf{x}_n,\mathbf{x}_n)
\end{array}\right)
$$ 
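As an illustration of the definition above, the Gram matrix can be filled entry by entry from any callable kernel. The function name `gram_matrix` and the linear kernel used in the demo are assumptions made for this sketch.

```python
import numpy as np


def gram_matrix(kernel, X):
    """Return the n x n matrix with entries K[i, j] = kernel(X[i], X[j])."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K


# Demo with a plain dot-product kernel on three toy points.
linear = lambda a, b: float(np.dot(a, b))
X_demo = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
K_demo = gram_matrix(linear, X_demo)
```

Because the kernels used in this report are symmetric in their two arguments, the resulting matrix is symmetric, so only the upper triangle strictly needs to be evaluated.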

Given a test data point **x**, the predicted label is

$$
\hat{y} = \text{sign}\left(\sum_{i=1}^{n} \alpha_i y_i k(\mathbf{x}_i, \mathbf{x})\right)
$$

We are given the generated dataset below:

![Given Data 2]()


### KernelPerceptron

We first created a Kernel Perceptron class described below:

`class KernelPerceptron(object)`

The methods in this class include the following:

`__init__(self, kernel=PolynomialKernel(p = 1), T=1)`: In this method, we define the kernel, the number of iterations T, alpha, Xtrain, and ytrain.

`fit(self, X, y)`: In this method, we fill the Gram matrix defined below
$$
\mathbf{K}=
\left(\begin{array}{ccc} 
k(\mathbf{x}_1,\mathbf{x}_1) & \dots & k(\mathbf{x}_1,\mathbf{x}_n)\\
\vdots & \ddots & \vdots \\
k(\mathbf{x}_n,\mathbf{x}_1) & \dots & k(\mathbf{x}_n,\mathbf{x}_n)
\end{array}\right)
$$ 
by calling the kernel object that is passed to the KernelPerceptron at initialization.

We then set the alpha values using an outer loop over the T iterations and an inner loop over the samples. For each sample X[i], we check if the function defined below

$$
f(\mathbf{x}) = \text{sign}\left(\sum_{i=1}^{n} \alpha_i y_i k(\mathbf{x}_i, \mathbf{x})\right)
$$

evaluated at X[i] is not equal to y[i]. If this is the case, we add 1 to alpha[i].

`project(self, X)`: In this method, we are creating a vector that stores the following result

$$
\text{projection} = \left(\sum_{i=1}^{n} \alpha_i y_i k(\mathbf{x}_i, \mathbf{x})\right)
$$

In the code, we achieve this using two nested loops: the outer loop runs over the rows of X and the inner loop runs over the rows of Xtrain. In each inner iteration we compute the following

$$
\alpha_i y_i k(\mathbf{x}_i, \mathbf{x})
$$

and add the result to the projection entry for the current row of X.

`predict(self, X)`: In this method, we call the project method we have just described above with X. We then take the resulting projection vector and iterate through each element. For each element, we check if the projection of that element is greater than or equal to 0 and if so set the prediction for this element in our y_hat vector to be 1. Otherwise, we set the prediction for this element in our y_hat vector to be -1. 
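Putting the four methods together, a minimal sketch of the class might look like the following. It is reconstructed from the description above and is not the submitted code. Two details are assumptions: alpha is initialized to zeros, and a score of exactly 0 is treated as class +1 during training, matching the rule used in `predict`. A small `PolynomialKernel` is included only so the block runs on its own.

```python
import numpy as np


class PolynomialKernel(object):
    def __init__(self, p=1):
        self.p = p

    def __call__(self, x, y):
        return (1 + np.dot(x, y)) ** self.p


class KernelPerceptron(object):
    def __init__(self, kernel=PolynomialKernel(p=1), T=1):
        self.kernel = kernel
        self.T = T
        self.alpha = None
        self.Xtrain = None
        self.ytrain = None

    def fit(self, X, y):
        n = X.shape[0]
        self.Xtrain, self.ytrain = X, y
        self.alpha = np.zeros(n)  # assumed initialization
        # Gram matrix: K[i, j] = k(x_i, x_j).
        K = np.array([[self.kernel(X[i], X[j]) for j in range(n)]
                      for i in range(n)])
        for _ in range(self.T):
            for i in range(n):
                score = np.sum(self.alpha * y * K[:, i])
                y_hat = 1 if score >= 0 else -1
                if y_hat != y[i]:
                    self.alpha[i] += 1  # count one more mistake on sample i

    def project(self, X):
        projection = np.zeros(X.shape[0])
        for i in range(X.shape[0]):
            for j in range(self.Xtrain.shape[0]):
                projection[i] += (self.alpha[j] * self.ytrain[j]
                                  * self.kernel(self.Xtrain[j], X[i]))
        return projection

    def predict(self, X):
        return np.where(self.project(X) >= 0, 1, -1)
```

As a quick check, the four XOR points (±1, ±1) labeled by the sign of the product of their coordinates are not linearly separable, but this sketch with `PolynomialKernel(p=2)` and `T=10` classifies all four correctly.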

### Kernel Functions

We will now create three kernel classes: `PolynomialKernel`, `GaussianKernel`, and `LaplaceKernel`. Each kernel and its methods are described below.

`PolynomialKernel`. The polynomial kernel function has two methods:

`__init__(self, p=1)`: In this method, p is the degree of the polynomial and is set to 1 when not specified.

`__call__(self, x, y)`: In this method, we implement the polynomial kernel function which is defined below

$$
k_{\text{poly}}(\mathbf{x},\mathbf{x}', d) = (1+\mathbf{x}^\top \mathbf{x}')^d
$$

In the code, the dot product of x and y is added to 1, and the sum is raised to the power p (the degree d in the formula above).
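A minimal sketch of this kernel, assuming x and y are 1-D NumPy arrays of equal length:

```python
import numpy as np


class PolynomialKernel(object):
    def __init__(self, p=1):
        self.p = p  # polynomial degree

    def __call__(self, x, y):
        # (1 + x^T y)^p
        return (1 + np.dot(x, y)) ** self.p
```

For example, with x = (1, 2), y = (3, 4), and p = 2, the dot product is 11 and the kernel value is (1 + 11)^2 = 144.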

`GaussianKernel`. The Gaussian kernel function has two methods:

`__init__(self, sigma=1)`: In this method, sigma is the bandwidth parameter of the RBF kernel and defaults to 1, as in the signature above.

`__call__(self, x, y)`: In this method, we implement the Gaussian kernel function which is defined below

$$
k_{\text{RBF}}(\mathbf{x},\mathbf{x}', \sigma) = \exp\left(-\frac{\lVert \mathbf{x}-\mathbf{x'} \rVert^2_2}{2\sigma^2}\right)
$$

In the code, np.linalg.norm is applied to the difference x - y to get the Euclidean distance. Following the formula above, the squared norm is divided by 2 * sigma ** 2, and np.exp of the negated result is returned.
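A minimal sketch that follows the formula above, assuming x and y are 1-D NumPy arrays:

```python
import numpy as np


class GaussianKernel(object):
    def __init__(self, sigma=1):
        self.sigma = sigma  # bandwidth of the RBF kernel

    def __call__(self, x, y):
        # Squared Euclidean distance between x and y.
        sq_dist = np.linalg.norm(x - y) ** 2
        return np.exp(-sq_dist / (2 * self.sigma ** 2))
```

The kernel equals 1 when x and y coincide and decays toward 0 as they move apart; a larger sigma makes the decay slower and the decision boundary smoother.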

`LaplaceKernel`. The Laplace Kernel function has two methods: 

`__init__(self, sigma=1)`: In this method, sigma is the scale parameter of the Laplace kernel and defaults to 1, as in the signature above.

`__call__(self, x, y)`: In this method, we implement the Laplace kernel function which is defined below

$$
k_{\text{laplace}}(\mathbf{x},\mathbf{x}', \sigma) = \exp\left(-\frac{\lVert \mathbf{x}-\mathbf{x'} \rVert _1}{\sigma}\right)
$$

In the code, np.linalg.norm with ord=1 is applied to the difference x - y to get the L1 distance. The norm is then divided by sigma, and np.exp of the negated result is returned.
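A minimal sketch that follows the formula above, assuming x and y are 1-D NumPy arrays:

```python
import numpy as np


class LaplaceKernel(object):
    def __init__(self, sigma=1):
        self.sigma = sigma  # scale of the Laplace kernel

    def __call__(self, x, y):
        # L1 (Manhattan) distance between x and y.
        l1_dist = np.linalg.norm(x - y, ord=1)
        return np.exp(-l1_dist / self.sigma)
```

Unlike the Gaussian kernel, the distance here is not squared, so the kernel value drops off more sharply near x = y.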

### Kernel Perceptron Results

#### Is the data linearly separable?

We will now check whether the given data is linearly separable by using the KernelPerceptron class with the PolynomialKernel function where p=1. We train the KernelPerceptron on Xtrain and ytrain and predict on Xtest.

Our model achieved an accuracy of 47.50%, which is close to chance for a two-class problem and suggests that the data is not linearly separable.

We then plot the decision boundary on the dataset as seen below

![Is Lin Sep?]()

#### More powerful kernels

We will now perform classification with the Polynomial Kernel, Gaussian Kernel, and Laplace Kernel functions.

**Polynomial Kernel**

We called the Kernel Perceptron class with the Polynomial Kernel function and set p = 2. The accuracy was 92.50% and the decision boundary is below:

![Poly p 2]()

We called the Kernel Perceptron class with the Polynomial Kernel function and set p = 3. The accuracy was 97.50% and the decision boundary is below:

![Poly p 3]()

We called the Kernel Perceptron class with the Polynomial Kernel function and set p = 4. The accuracy was 97.50% and the decision boundary is below:

![Poly p 4]()

**Gaussian Kernel**

We called the Kernel Perceptron class with the Gaussian Kernel function and set sigma = 1. The accuracy was 97.50% and the decision boundary is below:

![Gaus s 1]()

We called the Kernel Perceptron class with the Gaussian Kernel function and set sigma = 3. The accuracy was 97.50% and the decision boundary is below:


![Gaus s 3]()

We called the Kernel Perceptron class with the Gaussian Kernel function and set sigma = 5. The accuracy was 97.50% and the decision boundary is below:


![Gaus s 5]()

**Laplace Kernel**

We called the Kernel Perceptron class with the Laplace Kernel function and set sigma = 0.001. The accuracy was 97.50% and the decision boundary is below:

![Lap s 0.001]()

We called the Kernel Perceptron class with the Laplace Kernel function and set sigma = 0.0015. The accuracy was 97.50% and the decision boundary is below:


![Lap s 0.0015]()

We called the Kernel Perceptron class with the Laplace Kernel function and set sigma = 0.01. The accuracy was 97.50% and the decision boundary is below:

![Lap s 0.01]()

# Task 2B: Real-World Data Analysis: Seoul Bike Rental Data

In this task, we will analyze the SeoulBikeData.csv dataset, which provides information about bike rentals in Seoul. The dataset includes:
- **6 Features**: Weather-related conditions like temperature, humidity, and wind speed.
- **1 Time Feature**: Hour of the day.
- **Target**: The number of rented bikes, with the objective of predicting whether `Rented Bike Count > 500`.

## Step 1

1. **Load and Explore the Dataset**:
   - Load the `SeoulBikeData.csv` file using `pandas`.
   - Display descriptive statistics and visualize feature distributions (e.g., histograms, pair plots).


In this step, we loaded the SeoulBikeData.csv file using pandas.
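A hedged sketch of this loading step is shown below. The helper name `load_bike_data`, the `label` column, and its 0/1 encoding are assumptions made for illustration; only the `Rented Bike Count` column name and the 500 threshold come from the task description. The demo uses a tiny in-memory CSV with made-up rows so that it runs without the real file.

```python
import io

import pandas as pd


def load_bike_data(path_or_buffer, threshold=500):
    """Read the CSV and add a binary target for Rented Bike Count > threshold."""
    df = pd.read_csv(path_or_buffer)
    # Assumed encoding: 1 if the count exceeds the threshold, else 0.
    df["label"] = (df["Rented Bike Count"] > threshold).astype(int)
    return df


# Toy rows for illustration only; these values are not from SeoulBikeData.csv.
toy_csv = io.StringIO("Rented Bike Count,Hour\n254,0\n930,18\n500,12\n")
toy = load_bike_data(toy_csv)
print(toy.describe())  # descriptive statistics for every numeric column
```

On the real file, the same call would be `load_bike_data("SeoulBikeData.csv")`, and `df.hist()` or `pandas.plotting.scatter_matrix(df)` could then be used for the histograms and pair plots, assuming matplotlib is installed.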