# Linear Regression 
## Part 1 - What is it?
<hr>

<img style="float:left" src="images/LinearRegressionMeme.jpg">

## Introduction

Understanding linear regression is a prerequisite to understanding more complex machine learning models.  
This is because linear regression is the simplest model-based machine learning method.  
But what exactly is a "model"?

### What are models?  

A model is an (often simplified) representation of a phenomenon that occurs in the real world.  
A statistical or machine learning model usually requires data about the phenomenon that it is trying to model.

### What is regression?  

Simply put, regression is the study of dependence between variables.  
More often than not, we are most interested in one specific variable, which we call the **response variable**.  
We use one or more **explanatory variables** to study their relationship to the response variable.  

### Example

Suppose I am interested in understanding why trees have a specific volume.  
The relationship between the volume and other properties of the tree can be modeled.  
Many forces of nature, such as gravity, carbon dioxide levels, and minerals in the soil, affect what a tree's volume might be. In "modeling" this relationship, we greatly simplify the true relationship by describing a tree's volume using only the data that we have at our disposal.  


In this case, since I am specifically interested in the volume, I would choose it as the **response variable**.  
The **explanatory variables** would be whatever other data I have about the tree, such as its height or girth.  

### What does regression help us do?  

Regression helps us understand the relationship between the response variable and the explanatory variables.  
After we have created the linear regression model, we can use it to predict the response variable from the explanatory variables.
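
As a rough illustration of both uses, the sketch below fits a straight line to a handful of tree measurements and then predicts the volume of a new tree. The numbers are made up for this example, and `np.polyfit` is just one convenient way to fit a line; it is an assumption here, not necessarily the method used elsewhere in this series.

```python
import numpy as np

# Made-up measurements for illustration only (not real data).
girth = np.array([8.0, 10.0, 12.0, 14.0, 16.0])      # explanatory variable
volume = np.array([10.5, 15.2, 22.4, 27.6, 34.3])    # response variable

# Fit a straight line (a degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(girth, volume, deg=1)

# Understanding: the slope says how much volume changes per unit of girth.
print("slope:", slope, "intercept:", intercept)

# Prediction: estimate the volume of a new tree with girth 13.
predicted_volume = slope * 13.0 + intercept
print("predicted volume:", predicted_volume)
```

The fitted slope is the "understanding" part (how volume changes with girth), and the last two lines are the "prediction" part.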

<br>

## What is Linear?

Now we understand that models help us create simpler versions of things that exist in the real world using data.  
We also understand that linear regression is the simplest model-based method that we can use.  
But what does the **"linear"** part in linear regression mean?  

<small><i>You can find a much more detailed explanation of the distinction on Wikipedia, but we will briefly talk about it here.</i></small>

### Linear Functions

In the simplest of terms, linear functions are functions whose graph is a straight line.  
This means that the function will have a fixed "slope" or derivative as the x value changes.  

<small><a href="https://en.wikipedia.org/wiki/Linear_function" target="_blank">Read more</a></small>  

<img src="images/LinearFunctionsExamples.png">

Here are some examples of linear functions.  
You can see that all these functions essentially are straight lines.  
The amount that the y value changes with respect to the x value is constant throughout the entire graph. 
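
A quick way to check this constant rate of change is to step through a few x values and look at how much y moves each time. The function `f` below is a hypothetical example chosen for illustration.

```python
def f(x):
    # A hypothetical linear function with slope 2 and intercept 1.
    return 2 * x + 1

xs = [0, 1, 2, 3, 4]

# Change in y for each one-unit step in x.
steps = [f(b) - f(a) for a, b in zip(xs, xs[1:])]
print(steps)  # [2, 2, 2, 2]
```

Every step changes y by the same amount, which is exactly what makes the graph a straight line.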

<br>

### NonLinear Functions

For lack of a better definition, nonlinear functions are all functions that are not linear.  
More precisely, they are functions where the change in the output variable y is not proportional to the change in the input variable x.  
This means that the change in y per change in x differs at different points of the graph, resulting in a graph that does not look like a straight line.  

<small><a href="https://en.wikipedia.org/wiki/Nonlinear_system" target="_blank">Read more</a></small>  

<img src="images/NonLinearFunctionsExamples.png">
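
To see the changing rate numerically, the sketch below steps through a few x values for a hypothetical nonlinear function `g` (a simple square, chosen only for illustration) and records how much y moves at each step.

```python
def g(x):
    # A hypothetical nonlinear function: y is the square of x.
    return x ** 2

xs = [0, 1, 2, 3, 4]

# Change in y for each one-unit step in x.
steps = [g(b) - g(a) for a, b in zip(xs, xs[1:])]
print(steps)  # [1, 3, 5, 7]
```

The step size keeps growing, so no single slope describes the whole graph.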

<br>

### What does this mean for Linear Regression?

This means that we can use linear regression to model relationships between variables that are approximately linear.  
Of course, in practice we never see data that is perfectly linear. However, it is usually not hard to tell whether a linear model is feasible for your use case: plot the data and check whether the points roughly follow a straight line.

<br>

## Lines

You might remember the equation of a line from middle school.  
It has the following formula:

### y = mx + b

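The **(m)** represents the slope of the line.  
This describes how steep the line is.  
<pre>
0 is a horizontal line (not steep at all)
1 is a line that rises 1 unit in y for every 1 unit in x (a 45-degree angle)
larger values are steeper; a vertical line has no defined slope
</pre>  
Lines with negative slopes are flipped with respect to the x axis, so they fall from left to right instead of rising.    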

<img src="images/SlopeExamples.png">
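
The slope can also be computed directly from any two points on a line as "rise over run". The helper below is a minimal sketch; the name `slope_between` and the sample points are made up for illustration.

```python
def slope_between(x1, y1, x2, y2):
    """Return the slope (rise over run) of the line through two points."""
    if x2 == x1:
        # A vertical line has no defined slope (division by zero).
        raise ValueError("a vertical line has no defined slope")
    return (y2 - y1) / (x2 - x1)

print(slope_between(0, 3, 4, 3))  # horizontal line: 0.0
print(slope_between(0, 0, 4, 4))  # rises 1 per unit of x: 1.0
print(slope_between(0, 4, 4, 0))  # falls 1 per unit of x: -1.0
print(slope_between(0, 0, 1, 5))  # much steeper: 5.0
```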

This formula is deeply important when we are talking about regression because any model we end up creating with linear regression will use this formula with the **parameters** it has learned. The process of learning those parameters is called **training** the model.   

Models have parameters which they use to calculate the predicted response.  
In the case of linear regression with a single explanatory variable, we have 2 parameters: the slope (m) and the intercept (b), which is the value of y when x is 0.  
These are also the 2 constant values we need in our well-known equation y = mx + b.
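
In code, such a model can be sketched as a function that takes the two parameters and an x value and returns the predicted response. The function name `predict` and the parameter values below are hypothetical, chosen only to illustrate the idea.

```python
def predict(x, m, b):
    """Return the predicted response y = m * x + b."""
    return m * x + b

# Hypothetical parameter values, chosen only for illustration.
m, b = 2.0, 1.0

print(predict(3.0, m, b))  # 7.0
print(predict(0.0, m, b))  # 1.0, the intercept
```

Once m and b are fixed, the model is fully determined: every x value maps to exactly one predicted y.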



**Parameters**

Parameters are tricky because they mean different things in different parts of the process.  
When **training** the model,
<pre>The parameters are values we want to find using the explanatory and response variables.</pre>