<a href="https://colab.research.google.com/github/iacisme/Colab_Notebooks/blob/main/Regression_Notebook_P1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An introduction to regression

The purpose of this notebook is to gain an understanding of how `regression` works. Understanding `regression` gives you foundational knowledge that carries over to many other machine learning models, because many models used for both `classification` and `regression` build on the same principles.

`Machine learning` courses often start with `classification` models. This notebook starts with regression instead. Once we have a good understanding of regression, we will look at classification.

# Example 1 - Drawing a line segment

## Python Libraries

The following Python libraries are required to run this example:

In [None]:
# Library to handle array and array-like objects
import numpy as np

In [None]:
# Library to create and use interactive tools
import ipywidgets as widgets

In [None]:
# Plotly Express graphing library
import plotly.express as px

## Helper functions

Helper functions are used to create interactive controls and a layout that encourages user interaction.

### Interactive Controls

The following code is for a series of sliders that will be created, so the user can generate points to create a line.

In [None]:
# Slider widget to change the value of X1
x1 = widgets.IntSlider(value = 0,
                       min = -10,
                       max = 10,
                       step = 1,
                       description = 'X1:',
                       disabled = False,
                       continuous_update = False,
                       orientation = 'horizontal',
                       readout = True,
                       readout_format = 'd'
                      )

In [None]:
# Slider widget to change the value of Y1
y1 = widgets.IntSlider(value = 0,
                       min = -10,
                       max = 10,
                       step = 1,
                       description = 'Y1',
                       disabled = False,
                       continuous_update = False,
                       orientation = 'vertical',
                       readout = True,
                       readout_format = 'd'
                      )

In [None]:
# Slider widget to change the value of X2
x2 = widgets.IntSlider(value = 0,
                       min = -10,
                       max = 10,
                       step = 1,
                       description = 'X2:',
                       disabled = False,
                       continuous_update = False,
                       orientation = 'horizontal',
                       readout = True,
                       readout_format = 'd'
                      )

In [None]:
# Slider widget to change the value of Y2
y2 = widgets.IntSlider(value = 0,
                       min = -10,
                       max = 10,
                       step = 1,
                       description = 'Y2',
                       disabled = False,
                       continuous_update = False,
                       orientation = 'vertical',
                       readout = True,
                       readout_format = 'd'
                      )

### Organizing controls

These objects organize the sliders, so it's clear to the user which slider does what.

In [None]:
# Bundle controls for X1 and Y1 together
ui_1 = widgets.VBox([x1, y1])

In [None]:
# Bundle controls for X2 and Y2 together
ui_2 = widgets.VBox([x2, y2])

In [None]:
# Combine these controls into one common grouping
ui = widgets.HBox([ui_1, ui_2])

### Functions

This function will generate a graph, and allow the user to interact with it by allowing them to change the values of some points.

In [None]:
# Function draws a line segment based on the inputs provided by the user
def draw_simple_line(x_1, y_1, x_2, y_2):
    
    # Store the x variables in a list
    x = [x_1, x_2]
    # Store the y variables in a list
    y = [y_1, y_2]
    
    # Graph the line
    fig = px.line(x = x,
                  y = y,
                  
                  markers = True,
                 )
    
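    # Mark the line points with a dot.
    # No selector is needed: the figure has a single trace, and a
    # selector of mode = 'markers' would not match a line-with-markers trace.
    fig.update_traces(marker = dict(size = 12,
                                    line = dict(width = 2,
                                                color = 'DarkSlateGrey'
                                               )
                                   )
                     )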
    
    # Set the range of the X-axis
    fig.update_xaxes(range = [-10, 10])
    
    # Set the range of the Y-axis
    fig.update_yaxes(range = [-10, 10])
    
    # Annotate X1, Y1 point
    fig.add_annotation(x = x_1 - 1,
                       y = y_1,
                       text = "X1, Y1",
                       showarrow = False,
                       arrowhead = 5,
                      )
    
    # Annotate X2, Y2 point
    fig.add_annotation(x = x_2 + 1, 
                       y = y_2,
                       text = "X2, Y2",
                       showarrow = False,
                       arrowhead = 5,
                      )
    
    # Update the figure size, grid spacing, and title
    fig.update_layout(height = 600,
                      width = 800,
                      
                      xaxis = dict(tickmode = 'linear',
                                   tick0 = -10,
                                   dtick = 1
                                  ),
                      
                      yaxis = dict(tickmode = 'linear',
                                   tick0 = -10,
                                   dtick = 1
                                  ),
                      
                      title_text = 'The shortest path between two points is a straight line segment',
                      
                      dragmode = 'drawline',
                     )
    
    # Add annotation capabilities to the visualization
    fig.show(config = {'modeBarButtonsToAdd':['drawline',
                                              'drawopenpath',
                                              'drawclosedpath',
                                              'drawcircle',
                                              'drawrect',
                                              'eraseshape'
                                             ]
                      }
            )

### Creating an output widget

The output widget organizes the interactive controls and graph for user interaction.

In [None]:
# Object requires the function, and a dictionary that maps the controls
# to the inputs needed in the function
# https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html?highlight=interactive_output#More-control-over-the-user-interface:-interactive_output
out_line_segment = widgets.interactive_output(draw_simple_line, {'x_1' : x1, 'y_1' : y1, 'x_2' : x2, 'y_2' : y2})

## What is a line?

In `Euclidean` geometry, a line is defined as follows:

> A line has length but no width or thickness. 

This notebook considers straight lines only.

### A line segment

A line segment is a straight line between two points. Run the code below to display an interactive graph that creates line segments based on user inputs.

To define a line, you need at least two points:

A = ($x_1$, $y_1$)

B = ($x_2$, $y_2$)

A line segment can be defined as the shortest path between these two points.
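
The length of that segment follows from the Pythagorean theorem. As a minimal sketch (the function name `segment_length` is hypothetical and not part of this notebook's helper functions), it could be computed like this:

```python
import math


def segment_length(x_1, y_1, x_2, y_2):
    """Return the Euclidean length of the segment from (x_1, y_1) to (x_2, y_2)."""
    # math.hypot computes sqrt(dx**2 + dy**2)
    return math.hypot(x_2 - x_1, y_2 - y_1)


# Example: the segment from (0, 0) to (3, 4)
length = segment_length(0, 0, 3, 4)
print(length)
```

Any values the sliders can produce (integers from -10 to 10) are valid inputs here.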

The following cell shows how to draw a line using Python:

## Example 1 Drawing a line segment

In [None]:
# Display the graph where a user can enter points and it displays a line
display(ui, out_line_segment)

### Observable pattern

If we know the ratio of the line's `rise` to its `run`, better known as `the slope` of the line, and the y-intercept (where the line crosses the y-axis), we can predict which points fall on this line.
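
As a rough illustration, the slope and y-intercept can be recovered from the two points chosen with the sliders. The helper name `slope_and_intercept` is an assumption made for this sketch, and the vertical-line check is a design choice rather than something the notebook prescribes:

```python
def slope_and_intercept(x_1, y_1, x_2, y_2):
    """Return (m, b) for the line through (x_1, y_1) and (x_2, y_2)."""
    # A vertical line (or two identical points) has no defined slope
    if x_1 == x_2:
        raise ValueError("slope is undefined when x_1 equals x_2")

    rise = y_2 - y_1      # change in y
    run = x_2 - x_1       # change in x
    m = rise / run        # slope
    b = y_1 - m * x_1     # y-intercept, from y_1 = m * x_1 + b
    return m, b


# Example: the line through (1, 3) and (3, 7)
m, b = slope_and_intercept(1, 3, 3, 7)
print(m, b)
```

Note that both sliders start at 0, so the two default points coincide and the function would raise an error until one of them is moved horizontally.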

Let's first understand the line equation.

# Example 2 - Drawing a line using the line equation

## Python Libraries

The following Python libraries are required to run this example:

In [None]:
# Library to handle array and array-like objects
import numpy as np

In [None]:
# Library to create and use interactive tools
import ipywidgets as widgets

In [None]:
# Plotly Express graphing library
import plotly.express as px

## Helper functions

### Interactive Controls

The following `widgets` will be used for Example \#2, to create a line from a slope and a y-intercept:

In [None]:
# Slider widget to change the value of the slope
slope_slider = widgets.FloatSlider(value = 0,
                                   min = -10,
                                   max = 10,
                                   step = 0.5,
                                   description = 'Slope: ',
                                   disabled = False,
                                   continuous_update = True,
                                   orientation = 'horizontal',
                                   readout = True,
                                   readout_format = '.1f'
                                  )

In [None]:
# Slider widget to change the value of the y-intercept
intercept_slider = widgets.FloatSlider(value = 0,
                                       min = -10,
                                       max = 10,
                                       step = 0.5,
                                       description = 'Intercept: ',
                                       disabled = False,
                                       continuous_update = True,
                                       orientation = 'vertical',
                                       readout = True,
                                       readout_format = '.1f'
                                      )

### Organizing controls

In [None]:
# Combine these controls into one common grouping
ui_line = widgets.VBox([slope_slider, intercept_slider])

### Function

In [None]:
# Function will generate a line 
# based on the value of m and b
def generate_a_line(m, b):
    
    # Generate 100 points on a scale between -10 and 10
    x = np.linspace(-10, 10, 100)
    
    # Calculate the value of y
    y = m * x + b
    
    # Graph the line
    fig = px.line(x = x,
                  y = y,
                  
                  #markers = True,
                 )
    
    # Set the range of the X-axis
    fig.update_xaxes(range = [-10, 10])
    
    # Set the range of the Y-axis
    fig.update_yaxes(range = [-10, 10])
    
    # Update the figure size, grid spacing, and title
    fig.update_layout(height = 600,
                      width = 800,
                      
                      xaxis = dict(tickmode = 'linear',
                                   tick0 = -10,
                                   dtick = 1
                                  ),
                      
                      yaxis = dict(tickmode = 'linear',
                                   tick0 = -10,
                                   dtick = 1
                                  ),
                      
                      title_text = 'Using the equation of a line',
                      
                      dragmode = 'drawline',
                     )
    
    # Add annotation capabilities to the visualization
    fig.show(config = {'modeBarButtonsToAdd':['drawline',
                                              'drawopenpath',
                                              'drawclosedpath',
                                              'drawcircle',
                                              'drawrect',
                                              'eraseshape'
                                             ]
                      }
            )

### Creating an output widget

In [None]:
# Object requires the function, and a dictionary that maps the controls
# to the inputs needed in the function
# https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html?highlight=interactive_output#More-control-over-the-user-interface:-interactive_output
out_line = widgets.interactive_output(generate_a_line, {'m' : slope_slider, 'b' : intercept_slider})

## Example 2 Drawing a line

### Equation of a line

The best way to understand regression is to first have a basic review of the equation of a line. 

#### Line Equation

A line is defined by the following equation:

$y = mx + b$

Where the variables are:

$y$ = dependent variable. The value of this variable depends on the value of $x$

$m$ = slope of the line

$x$ = independent variable (any value at which the line is evaluated)

$b$ = y-intercept (where the line crosses the y-axis)

In [None]:
# Display the graph where a user can set the slope and intercept and it displays a line
display(ui_line, out_line)

## Observations

1. You need two points in order to define a line segment
2. Once you have a line segment, you can calculate the slope and y-intercept
3. The slope and y-intercept are coefficients that can be used to generate the value of `y` for any valid `x` on that line
4. Once you know the slope and y-intercept, you can calculate any point along that line, or predict whether a point is on the line or not
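
The last observation can be sketched in a few lines. The names `predict_y` and `is_on_line` and the tolerance value are assumptions made for illustration; a tolerance is used because floating-point results are rarely exactly equal:

```python
import math


def predict_y(m, b, x):
    """Evaluate the line equation y = m * x + b."""
    return m * x + b


def is_on_line(m, b, x, y, tol=1e-9):
    """Return True if the point (x, y) lies on the line y = m * x + b."""
    return math.isclose(predict_y(m, b, x), y, abs_tol=tol)


# Example with slope 2 and y-intercept 1
print(predict_y(2, 1, 3))       # y value of the line at x = 3
print(is_on_line(2, 1, 3, 7))   # is (3, 7) on the line?
print(is_on_line(2, 1, 3, 8))   # is (3, 8) on the line?
```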

**Congratulations!** You've identified your first regression pattern. How would you use this information to predict which points lie on this line?

## Additional Resources

The following are some additional resources to further investigate lines, and the line equation:


* [Equation of a Straight Line](https://www.mathsisfun.com/equation_of_line.html)
* [Explore the Properties of a Straight Line Graph](https://www.mathsisfun.com/data/straight_line_graph.html)
* [Equation of a Line](https://byjus.com/maths/general-equation-of-a-line/)
* [How to Find the Equation of a Line](https://www.wikihow.com/Find-the-Equation-of-a-Line)

# Example 3 - An Introduction to linear regression

## Python Libraries

The following Python libraries are required to run this notebook:

In [None]:
# Library to handle array and array-like objects
import numpy as np

In [None]:
# Library to handle tabular data
import pandas as pd

In [None]:
# Library to create and use interactive tools
import ipywidgets as widgets

In [None]:
# Plotly Express graphing library
import plotly.express as px

In [None]:
# Library used to generate a linear regression dataset
from sklearn.datasets import make_regression

## Helper functions

### Interactive Controls

The following `widgets` will be used for Example \#3, to generate a dataset and fit a line to its points:

#### Sliders to Control Dataset Generation

In [None]:
# Slider to control the number of samples
samples_slider = widgets.IntSlider(value = 100,
                                   min = 100,
                                   max = 500,
                                   step = 5,
                                   description = '# of Samples: ',
                                   disabled = False,
                                   continuous_update = False,
                                   orientation = 'horizontal',
                                   readout = True,
                                   readout_format = 'd'
                                  )

In [None]:
# Slider to control the amount of "noise" in the dataset
noise_slider = widgets.FloatSlider(value = 0,
                                   min = 0,
                                   max = 100,
                                   step = 0.1,
                                   description = 'Noise: ',
                                   disabled = False,
                                   continuous_update = False,
                                   orientation = 'horizontal',
                                   readout = True,
                                   readout_format = '.1f',
                                  )

In [None]:
# Checkbox that shows a trendline when selected
trend_line_selector = widgets.Checkbox(value = False,
                                       description = 'Trendline',
                                       disabled = False,
                                       indent = False
                                      )

### Organizing controls

In [None]:
# Combine these controls into one common grouping
ui_regression = widgets.VBox([samples_slider, noise_slider, trend_line_selector])

### Function

In [None]:
# Function to generate a dataset and plot it;
# the user selects the number of samples and the noise
def linear_regression(noise, samples, trend_line):
    
    data, target, coefficients = make_regression(n_samples = samples,        # The number of samples
                                                 n_features = 1,         # The number of features
                                                 n_informative = 1,      # The number of informative features, i.e., the number of features used to build the linear model that generates the output (at most n_features)
                                                 n_targets = 1,          # The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar
                                                 bias = 0,               # The bias term in the underlying linear model
                                                 effective_rank = None,
                                                 tail_strength = 0.5,    # The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. When a float, it should be between 0 and 1
                                                 noise = noise,            # The standard deviation of the gaussian noise applied to the output
                                                 shuffle = True,         # Shuffle the samples and the features
                                                 coef = True,            # If True, the coefficients of the underlying linear model are returned
                                                 random_state = 42       # Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls
                                                )       
    
    # Convert the dataset into a dataframe
    df = pd.DataFrame({'data': data[:, 0],
                       'Coefficient': coefficients,
                       'target': target
                      }
                     )
    # Display an OLS trend line when selected
    # (the 'ols' trendline requires the statsmodels package)
    trend = 'ols' if trend_line else None
    
    # Plot the dataset
    fig = px.scatter(df,
                     x = 'data',
                     y = 'target',
                     trendline = trend, # 'ols', 'rolling' 
                     opacity = 0.5,
                     color_discrete_sequence = ["black"],
                     )
    
    # Update the figure size and title
    fig.update_layout(height = 600,
                      width = 800,
                                        
                      title = 'Linear Regression',   
                      
                      dragmode = 'drawline',
                     )
    
    # Add annotation capabilities to the visualization
    fig.show(config = {'modeBarButtonsToAdd':['drawline',
                                              'drawopenpath',
                                              'drawclosedpath',
                                              'drawcircle',
                                              'drawrect',
                                              'eraseshape'
                                             ]
                      }
            )

### Creating an output widget

In [None]:
# Object requires the function, and a dictionary that maps the controls
# to the inputs needed in the function
# https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html?highlight=interactive_output#More-control-over-the-user-interface:-interactive_output
out_regression = widgets.interactive_output(linear_regression, {'noise' : noise_slider, 'samples' : samples_slider, 'trend_line' : trend_line_selector})

## Example 3 Linear Regression

The purpose of `linear regression` is to find a **line of best fit**. Once the `y-intercept` and `slope` of this **line** have been determined, it can be used to predict values of $\hat{y}$ (y-hat) for new values of $x$.

### Regression Equation

$\hat{y} = b_0 + b_1x$

Where:

$\hat{y}$ = Predicted target value (dependent variable)

$b_0$ = Intercept of the regression line

$b_1$ = Slope of the regression line

$x$ = Input feature value (independent variable)

$b_0$ and $b_1$ are the coefficients that are determined in order to find the **line of best fit**.

In [None]:
# Display the graph where a user can set the number of samples and the noise, and toggle a trendline
display(ui_regression, out_regression)
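
To make the roles of the two sliders concrete, here is a minimal NumPy sketch of a dataset of this kind. It is an illustration only, not the `make_regression` implementation: it assumes a single standard-normal feature, and the function name `make_noisy_line` and the default `slope` of 37.5 are hypothetical.

```python
import numpy as np


def make_noisy_line(n_samples, noise, slope=37.5, intercept=0.0, seed=42):
    """Generate points scattered around the line y = slope * x + intercept.

    `noise` is the standard deviation of the Gaussian noise added to y.
    The default slope is a hypothetical value chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    y = slope * x + intercept + noise * rng.standard_normal(n_samples)
    return x, y


# With noise = 0 every point lies exactly on the line
x_clean, y_clean = make_noisy_line(n_samples=200, noise=0.0)

# With noise > 0 the points scatter around the line
x_noisy, y_noisy = make_noisy_line(n_samples=200, noise=10.0)
```

Raising `noise` widens the vertical scatter around the line, which is the effect the noise slider has on the plot.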

## How does linear regression work?

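Linear regression typically uses the `ordinary least squares` (`OLS`) method to determine the line of best fit. `OLS` chooses the `coefficients` that minimize the sum of the squared vertical distances between the data points and the line. The formulas that compute the `coefficients` from the data are called `estimators`. The `Trendline` checkbox shown with the graph uses `OLS` to draw the best-fit line.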
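
For a single feature, the least squares coefficients have a well-known closed form: $b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below applies these formulas with NumPy. The function name `ols_fit` and the sample values are assumptions made for illustration, and this is not necessarily how the plotting library computes its trendline.

```python
import numpy as np


def ols_fit(x, y):
    """Return (b_0, b_1) of the least squares line y_hat = b_0 + b_1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    x_mean, y_mean = x.mean(), y.mean()

    # Slope: co-variation of x and y divided by the variation of x
    b_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b_0 = y_mean - b_1 * x_mean
    return b_0, b_1


# Points that lie exactly on y = 2x + 1 (hypothetical values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

b_0, b_1 = ols_fit(x, y)
print(b_0, b_1)
```

With noisy data the same function returns the intercept and slope of the line that minimizes the sum of squared residuals.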

## Additional Resources

* [An Introduction to Linear Regression Analysis - statisticfun](https://youtu.be/zPG4NjIkCjc)
* [Linear Regression and Linear Models - StatQuest](https://youtube.com/playlist?list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU)
* [How to perform simple linear regression in python - Statology](https://www.statology.org/simple-linear-regression-in-python/)
* [Intuitions on linear models - scikit-learn video](https://inria.github.io/scikit-learn-mooc/linear_models/linear_models_slides.html)
* [Ordinary Least Square (OLS) Method for Linear Regression - Medium Article](https://medium.com/analytics-vidhya/ordinary-least-square-ols-method-for-linear-regression-ef8ca10aadfc)

# Example 4 - Linear regression using `scikit-learn`

## Python Libraries

The following Python libraries are required to run this notebook:

In [None]:
# Library to handle array and array-like objects
import numpy as np

In [None]:
# Library to handle tabular data
import pandas as pd

In [None]:
# Library to create and use interactive tools
import ipywidgets as widgets

In [None]:
# Plotly Express graphing library
import plotly.express as px

In [None]:
# Plotly graph objects library
import plotly.graph_objects as go

In [None]:
# Display scikit-learn estimators and pipelines as diagrams
from sklearn import set_config

set_config(display = "diagram")

In [None]:
# Library used to generate a linear regression dataset
from sklearn.datasets import make_regression

In [None]:
# Library used in ML pipeline building
from sklearn.pipeline import make_pipeline

In [None]:
# Library for feature pre-processing
from sklearn.preprocessing import StandardScaler

In [None]:
# Library used to create a linear regression line
from sklearn.linear_model import LinearRegression

## Helper functions

### Interactive Controls

The following `widgets` will be used for Example \#4, to generate a dataset and fit a line to its points:

#### Sliders to Control Dataset Generation

In [None]:
# Slider to adjust the sample size
samples_slider1 = widgets.IntSlider(value = 100,
                                   min = 100,
                                   max = 500,
                                   step = 5,
                                   description = '# of Samples: ',
                                   disabled = False,
                                   continuous_update = False,
                                   orientation = 'horizontal',
                                   readout = True,
                                   readout_format = 'd'
                                  )

In [None]:
# Slider to adjust the amount of noise in the dataset
noise_slider1 = widgets.FloatSlider(value = 0,
                                   min = 0,
                                   max = 100,
                                   step = 0.1,
                                   description = 'Noise: ',
                                   disabled = False,
                                   continuous_update = False,
                                   orientation = 'horizontal',
                                   readout = True,
                                   readout_format = '.1f',
                                  )

In [None]:
# Allows the user to select seeing a trendline
trend_line_selector1 = widgets.Checkbox(value = False,
                                       description = 'Trendline',
                                       disabled = False,
                                       indent = False
                                      )

#### Regression trendline

In [None]:
# Allows the user to select seeing a trendline
# when displaying the regression lines
trend_line_selector2 = widgets.Checkbox(value = False,
                                       description = 'Trendline',
                                       disabled = False,
                                       indent = False
                                      )

### Organizing controls

In [None]:
# Combine these controls into one common grouping
ui_regression = widgets.VBox([samples_slider1, noise_slider1, trend_line_selector1])

### Function

In [None]:
# Function will generate a dataset that a user can 
# interact with by changing the number of samples
# or the noise.
def linear_regression(noise, samples, trend_line):
    # Create a dataset based on user input on samples and noise
    data, target, coefficients = make_regression(n_samples = samples,        # The number of samples
                                                 n_features = 1,         # The number of features
                                                 n_informative = 1,      # The number of informative features, i.e., the number of features used to build the linear model that generates the output (at most n_features)
                                                 n_targets = 1,          # The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar
                                                 bias = 0,               # The bias term in the underlying linear model
                                                 effective_rank = None,
                                                 tail_strength = 0.5,    # The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. When a float, it should be between 0 and 1
                                                 noise = noise,            # The standard deviation of the gaussian noise applied to the output
                                                 shuffle = True,         # Shuffle the samples and the features
                                                 coef = True,            # If True, the coefficients of the underlying linear model are returned
                                                 random_state = 42       # Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls
                                                )       
    
    # Convert the dataset into a dataframe
    df = pd.DataFrame({'data': data[:, 0],
                       'Coefficient': coefficients,
                       'target': target
                      }
                     )
    
    # Display an OLS trend line when selected
    # (the 'ols' trendline requires the statsmodels package)
    trend = 'ols' if trend_line else None
     
    # Generate a scatterplot
    fig = px.scatter(df,
                     x = 'data',
                     y = 'target',
                     trendline = trend, # 'ols', 'rolling'
                     title = 'Linear Regression',   
                     opacity = 0.5,
                     color_discrete_sequence = ["black"],
                     height = 600,
                     width = 800,
                    )

    fig.show()

### Creating an output widget

In [None]:
# Object requires the function, and a dictionary that maps the controls
# to the inputs needed in the function
# https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html?highlight=interactive_output#More-control-over-the-user-interface:-interactive_output
out_regression = widgets.interactive_output(linear_regression, {'noise' : noise_slider1, 'samples' : samples_slider1, 'trend_line' : trend_line_selector1})

## Example 4 Linear Regression using `scikit-learn`

### Generating a Dataset

The purpose of `linear regression` is to find a **line of best fit**. Once the `y-intercept` and `slope` of this **line** have been determined, it can be used to predict values of $\hat{y}$ (y-hat) for new values of $x$. Use the controls below to choose the dataset the model will be trained on.


In [None]:
# Display the graph where a user can set the number of samples and the noise, and toggle a trendline
display(ui_regression, out_regression)

Once you are happy with the generated dataset, run the code below to regenerate it with the current slider values and use it to train a machine learning model.

In [None]:
# Generate the dataset that will be used to train a model
data, target, coefficients = make_regression(n_samples = samples_slider1.value,        # The number of samples
                                             n_features = 1,         # The number of features
                                             n_informative = 1,      # The number of informative features, i.e., the number of features used to build the linear model that generates the output (at most n_features)
                                             n_targets = 1,          # The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar
                                             bias = 0,               # The bias term in the underlying linear model
                                             effective_rank = None,
                                             tail_strength = 0.5,    # The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. When a float, it should be between 0 and 1
                                             noise = noise_slider1.value,            # The standard deviation of the gaussian noise applied to the output
                                             shuffle = True,         # Shuffle the samples and the features
                                             coef = True,            # If True, the coefficients of the underlying linear model are returned
                                             random_state = 42       # Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls
                                            )   

### Convert data into a dataframe

This step lets you inspect the data. The model does not need the dataframe, but the scatterplot at the end of this example reads from it.

In [None]:
# Convert the dataset into a dataframe
df = pd.DataFrame({'data': data[:, 0],
                   'Coefficient': coefficients,
                   'target': target
                  }
                 )

df

|     | data      | Coefficient | target     |
| --- | --------- | ----------- | ---------- |
| 0   | -0.309212 | 37.558295   | 2.947671   |
| 1   | -0.074446 | 37.558295   | -9.750643  |
| 2   | 0.812526  | 37.558295   | 32.113964  |
| 3   | 0.227460  | 37.558295   | 17.683108  |
| 4   | -0.562288 | 37.558295   | 32.634748  |
| ... | ...       | ...         | ...        |
| 210 | -0.192361 | 37.558295   | -2.638739  |
| 211 | 1.083051  | 37.558295   | 37.593307  |
| 212 | -0.115648 | 37.558295   | -25.326865 |
| 213 | 2.190456  | 37.558295   | 66.917071  |


### Build a regression model

In [None]:
# Build a linear regression model
lr_rgrs_model = make_pipeline(StandardScaler(),
                              LinearRegression()
                              )

lr_rgrs_model

### Train a Regression Model

In [None]:
# Fit the model to the data and target
lr_rgrs_model.fit(data, target)

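### Get model coefficient and intercept

In [None]:
# Get the coefficient calculated by the model. We only have one feature, therefore there is only one coefficient.
# The pipeline scales the feature before the regression step, so this coefficient applies to the scaled feature.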
model_coefficient = lr_rgrs_model.named_steps.linearregression.coef_[0]

model_coefficient

35.376856157675476

In [None]:
# Get the intercept calculated by the model.
model_intercept = lr_rgrs_model.named_steps.linearregression.intercept_

model_intercept

3.2601074088143323
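
Because the pipeline standardizes the feature before fitting, the coefficient and intercept above describe a line in standardized units, not in the original units of `data`. The sketch below illustrates the conversion between the two. It is a NumPy-only illustration on hypothetical synthetic data, and it uses `np.polyfit` in place of the scikit-learn pipeline; the standardization uses the population standard deviation, as `StandardScaler` does.

```python
import numpy as np

# Hypothetical synthetic data: y = 3x + 1 plus noise, with x NOT centred at 0
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Standardize x (mean 0, population standard deviation 1)
mean, scale = x.mean(), x.std()
z = (x - mean) / scale

# Fit a straight line on the standardized feature
slope_z, intercept_z = np.polyfit(z, y, deg=1)

# Convert the line back to the original units of x
slope_x = slope_z / scale
intercept_x = intercept_z - slope_z * mean / scale

# Reference: fit directly on the raw feature
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)

print(slope_x, slope_ref)
print(intercept_x, intercept_ref)
```

On standardized data the fitted intercept equals the mean of the target, and dividing the slope by the feature's standard deviation recovers the slope in the original units.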

### Generate an array of x and y values

In [None]:
# Create an array of evenly spaced x-values spanning the range of the feature
x_range = np.linspace(data.min(), # Set the min number
                      data.max(), # Set the max number
                      num = samples_slider1.value,   # Total number of data points
                     )

You can now use these x-values to calculate two lines: the line of best fit found by the linear regression model, and the line defined by the coefficient that the dataset generator returned.

#### Linear Regression Model

In [None]:
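# Store the model's predicted target values in an object that can be used in plotting results.
# The pipeline standardizes x before the regression, so x_range must be scaled the same way.
scaler = lr_rgrs_model.named_steps.standardscaler
y_hat_model = (model_coefficient * (x_range - scaler.mean_[0]) / scaler.scale_[0] + model_intercept)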

#### Dataset-generated coefficient

In [None]:
# Store the target values implied by the generator's coefficient (the bias was set to 0)
y_hat_dataset = (coefficients * x_range + 0)

### Graphing the regression lines

The following graph lets you compare the line generated by the linear regression model with the line generated using the coefficient provided by the dataset generator.

In [None]:
# Graph that displays each line of best fit, for both the regression model 
# and the coefficient provided by the dataset generator
@widgets.interact
def regressor_outputs(trend_line = trend_line_selector2):
    # Show the OLS trendline only if the checkbox is selected
    trend = 'ols' if trend_line else None
    
    # Create a scatterplot of the data
    fig = px.scatter(df,
                     x = 'data',
                     y = 'target',
                     trendline = trend, # None, 'ols', 'lowess', 'rolling', 'expanding', 'ewm'
                     #trendline_color_override = 'darkblue',
                     color = None,
                     opacity = 0.5,
                     color_discrete_sequence = ["black"],
                     height = 800,
                     width = 1200,
                     title = "Lines generated using Co-efficients",
                    )

    # Add the best fit based on the regression model coefficient and intercept
    fig.add_trace(go.Scatter(x = np.ravel(x_range),
                             y = np.ravel(y_hat_model),
                             mode = 'lines',
                             name = 'Model Generated Regression Line',
                             showlegend = True,
                             line = dict(color = "Red", width = 3)
                             )
                 )

    # Add the best fit based on the dataset generated co-efficient
    fig.add_trace(go.Scatter(x = np.ravel(x_range),
                             y = np.ravel(y_hat_dataset),
                             mode = 'lines',
                             name = 'Dataset generated coefficient',
                             showlegend = True,
                             line = dict(color = "Green", width = 3)
                             )
                 )


    fig.show()

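
The two lines rarely coincide exactly, and noise in the target is one reason. The sketch below illustrates this with NumPy only: it fits a line with `np.polyfit` to hypothetical data generated from a known slope at several noise levels. The slope `true_slope`, the sample count, and the noise levels are assumptions chosen for illustration, not values taken from this notebook's sliders.

```python
import numpy as np

rng = np.random.default_rng(42)

true_slope = 37.5                 # hypothetical coefficient of the generating model
x = rng.standard_normal(214)      # hypothetical feature values

fitted = {}
for noise_std in (0.0, 5.0, 20.0):
    # Target = line through the origin plus Gaussian noise
    y = true_slope * x + rng.normal(scale=noise_std, size=x.size)

    # Least-squares straight line through the noisy points
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted[noise_std] = (slope, intercept)
    print(f"noise={noise_std:>4}: slope={slope:.3f}, intercept={intercept:.3f}")
```

With zero noise the fit recovers the generating slope and an intercept of zero. Once noise is added, the fitted values generally land near the generating ones without matching them exactly.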

# Conclusion

The goal of `regression` is to find the `line of best fit` through the data that satisfies the following equation:

$\hat{y} = b_0 + b_1x$

$\hat{y}$ = Predicted target value (dependent variable)

$b_0$ = Intercept of the regression line

$b_1$ = Slope of regression line

$x$ = Feature value (independent variable)

This is done by determining the values of two coefficients:

> $b_0$ = Intercept of the regression line

> $b_1$ = Slope of regression line

In the next notebook, we will further explore how a regression model works.
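
As a small preview, for a single feature the ordinary least squares values of $b_0$ and $b_1$ can be written in closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the line pass through the point of means. The sketch below applies these formulas in plain Python to hypothetical data that lies exactly on $y = 1 + 2x$; it is an illustration, not necessarily the approach the next notebook takes.

```python
# Hypothetical data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Slope: co-variation of x and y divided by the variation of x
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
denominator = sum((xi - x_mean) ** 2 for xi in x)
b1 = numerator / denominator

# Intercept: forces the line through the point (x_mean, y_mean)
b0 = y_mean - b1 * x_mean

# Predicted target values from y_hat = b0 + b1 * x
y_hat = [b0 + b1 * xi for xi in x]

print(b0, b1)  # 1.0 2.0
```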

# Additional Resources

* [Linear Models - scikit-learn](https://inria.github.io/scikit-learn-mooc/linear_models/linear_models_module_intro.html)
* [Crash Course Regression - CrashCourse](https://www.youtube.com/watch?v=WWqE7YHR4Jc)
* [Introduction to Linear Regression - CFA](https://youtu.be/6ouI_27iDVM)
* [The Main Ideas of Fitting a Line to Data (The Main Ideas of Least Squares and Linear Regression) - Statquest](https://youtu.be/PaFPbb66DxQ)