# Gradient Descent Visualizer for Multi-Linear Regression

## Introduction

In this project, we build a visualizer for gradient descent with multilinear regression. To complete this project, we must:

- Obtain a dataset. To keep the tool simple to use, we generate a dataset based on how many features the user wishes to visualize. There are 100 training examples by default, but the user may choose a different number.

- Gather values for the parameters used in our equations, such as the constant bias, the factor (explained later), and the variability of the factor.

- Create and animate various graphs for the user. To watch a single step, the user may pause the animation and go step by step. The graphs shown are the cost function graph, the linear regression plot, the topological cost graph, and an individual graph for each variable with the other variable held constant (more on that later).
    
When plotting the current $w$ and $b$ values on the cost function graph, we display the current point as a red ball to show how gradient descent works, much like a ball rolling down a hill.

Our goal is to allow anyone to easily visualize how gradient descent is carried out.

## Generating a dataset

We begin by generating a dataset to perform gradient descent on. Given the number of features the user desires, we generate a dataset with that many inputs. To generate the inputs, we draw random numbers from a range given by the user (or from a randomly generated range if no user input is given). We also generate random values for $\vec{w}$, a vector of length $j$, where $j$ is the number of features in our dataset.

Once we generate our factors, we create an $i\times j$ matrix, where $i$ is the number of training examples and $j$ is the number of features. For each feature of each training example, we generate a random x value and place it in our dataset. Additionally, we generate a constant bias between 0 and 100 if no user input for it is given. Each output is then computed as $y=\vec{w} \cdot \vec{x} + b$, where every training example uses a copy of $\vec{w}$ perturbed according to the variability.

In [61]:
import pandas as pd
import numpy as np
import sklearn as sk
import sympy as smp
import random as rd

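def generate_current_vector_w (vector_w, variability):
    """
    Returns a noisy copy of vector_w, with one normally distributed draw per factor
    """
    current_vector_w = []
    for w in vector_w:
        ## The spread scales with the size of the factor; abs() keeps sigma non-negative
        mu, sigma = w, abs(w) * variability
        ## Draw a scalar (not a length-1 array) so the later dot product returns a scalar
        new_factor = np.random.normal(mu, sigma)
        current_vector_w.append(new_factor)
    return np.array(current_vector_w)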

def generate_dataset (number_training_examples, number_features, variability, bias):
    ## We begin by generating a range for us to generate random x-values with, as well as generating a vector for
    ## all of our w values, and by generating an empty dataset
    
    ranges = np.round(np.random.uniform(1, 10000, number_features))
    vector_w = np.round(np.random.uniform(0, 100, number_features), 1)
    dataset = np.empty((number_training_examples, number_features))
    
    ## We now generate the values for our training examples and assign them into our dataset
    for i in range (number_training_examples):
        for j in range (number_features):
            dataset[i][j] = rd.uniform(0, ranges[j])
        
    ## We now create our vector for our outputs
    vector_y = []
    
    for training_example in dataset:
        current_vector_w = generate_current_vector_w (vector_w, variability)
        y = np.dot(training_example, current_vector_w) + bias
        vector_y.append(y)
        
    vector_y = np.array(vector_y)
        
    return dataset, vector_y
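
As a rough cross-check of the steps described above, the same dataset could also be produced without explicit loops. This is only a sketch under stated assumptions, not the notebook's implementation: the name `generate_dataset_vectorized`, the `seed` parameter, and the use of `np.random.default_rng` are introduced here purely for illustration.

```python
import numpy as np

def generate_dataset_vectorized(number_training_examples, number_features,
                                variability, bias, seed=None):
    # Hypothetical helper (not part of the notebook): loop-free dataset generation
    rng = np.random.default_rng(seed)
    # Upper bound of each feature's range, and the underlying factors
    ranges = np.round(rng.uniform(1, 10000, number_features))
    vector_w = np.round(rng.uniform(0, 100, number_features), 1)
    # Column j is drawn uniformly from [0, ranges[j])
    shape = (number_training_examples, number_features)
    dataset = rng.uniform(0, ranges, size=shape)
    # One noisy copy of vector_w per training example
    noisy_w = rng.normal(vector_w, np.abs(vector_w) * variability, size=shape)
    # Row-wise dot product plus the bias
    vector_y = np.sum(dataset * noisy_w, axis=1) + bias
    return dataset, vector_y

dataset, vector_y = generate_dataset_vectorized(100, 3, 0.05, 10.0, seed=0)
print(dataset.shape, vector_y.shape)  # (100, 3) (100,)
```

Passing a fixed `seed` makes the generated data reproducible, which may be convenient when comparing animations across runs.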

## Regression Line and Cost Function

We must now create our algorithm for our cost function. Our linear regression line equation is as follows: <h3><center>$$f_{\vec{w},b}(\vec{x})=\vec{w}\cdot\vec{x} + b$$</center></h3>

For the cost, we use the squared-error cost function, where $i$ indexes the current training example and $m$ is the total number of training examples: <h1><center>$$J(\vec{w},b)=\frac{1}{2m}\sum_{i=1}^{m}(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^2$$</center></h1>

In [60]:
def linear_regression_line (w, x, b):
    """
    Returns our predicted value of y
    """
    return np.dot(w, x) + b

def cost_function(w, b, dataset, output):
    """
    Calculates the cost of our current w and b
    """
    cost = 0
    ## m is the total number of training examples
    m = dataset.shape[0]
    for i in range (m):
        cost = cost + (linear_regression_line(w, dataset[i], b) - output[i]) ** 2
    ## Divide by 2m to match the squared-error cost formula
    return cost / (2 * m)
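
To make the formula for $J(\vec{w},b)$ concrete, here is a small worked check. With two training examples $\vec{x}^{(1)}=(1,2)$ and $\vec{x}^{(2)}=(3,4)$, targets $3$ and $8$, $\vec{w}=(1,1)$ and $b=0$, the predictions are $3$ and $7$, the errors are $0$ and $-1$, and so $J=\frac{1}{2\cdot 2}(0^2+(-1)^2)=0.25$. The sketch below reproduces that number; `cost_function_vectorized` is a hypothetical name used only for this illustration and assumes `dataset` and `output` are NumPy arrays.

```python
import numpy as np

def cost_function_vectorized(w, b, dataset, output):
    # Hypothetical vectorized counterpart of the squared-error cost J(w, b)
    m = dataset.shape[0]
    # f(x^(i)) - y^(i) for every training example at once
    errors = dataset @ w + b - output
    return np.sum(errors ** 2) / (2 * m)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([3.0, 8.0])
w = np.array([1.0, 1.0])
cost = cost_function_vectorized(w, 0.0, X, y)
print(cost)  # 0.25
```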



## Gradient Descent

We must now create our algorithm for gradient descent in order to find optimal values for $\vec{w}$ and $b$. The update rules are as follows, where $\alpha$ is our learning rate and $w_{j}$ is the $w$ value of the $j^{th}$ feature:

<h1><center>$$w_{j}=w_{j}-\alpha\frac{\partial}{\partial w_j}J(\vec{w},b)$$</center></h1> 

<h1><center>$$b=b-\alpha\frac{\partial}{\partial b}J(\vec{w},b)$$</center></h1> 

Substituting our cost function into the formulas above and evaluating the partial derivatives, we get:

<h1><center>$$w_{j}=w_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}$$</center></h1> 

<h1><center>$$b=b-\alpha\frac{1}{m}\sum_{i=1}^{m}(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})$$</center></h1>
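
Read as code, a single update of these two rules might look like the following sketch. It is not the notebook's `gradient_descent` function: the helper `gradient_step`, its `alpha` argument, and the tiny one-feature dataset are assumptions made for illustration. Both gradients are computed from the same errors before either parameter changes, so $\vec{w}$ and $b$ are updated simultaneously.

```python
import numpy as np

def gradient_step(vector_w, bias, dataset, output, alpha):
    # Hypothetical helper: one simultaneous update of w and b
    m = dataset.shape[0]
    # f(x^(i)) - y^(i) for every training example
    errors = dataset @ vector_w + bias - output
    # (1/m) * sum of errors * x_j, for every feature j at once
    grad_w = dataset.T @ errors / m
    # (1/m) * sum of errors
    grad_b = np.sum(errors) / m
    return vector_w - alpha * grad_w, bias - alpha * grad_b

# Toy data generated from y = 2x + 1 (an assumption for this example)
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0])
w, b = np.zeros(1), 0.0
for _ in range(5000):
    w, b = gradient_step(w, b, X, y, alpha=0.1)
print(w, b)  # approaches w = [2.], b = 1
```

The learning rate of 0.1 suits this small toy dataset only. Features with much larger ranges would likely need a far smaller $\alpha$ (or feature scaling) for the updates to converge.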

We are now able to program this function.

In [None]:
def gradient_descent (vector_w, bias, dataset, output):
    