# Single Variable Regression

In this assignment you will learn how to:
1. Use professional data science tools to perform a single variable linear regression
2. Use matrix math to write your own version of a single variable linear regression

### Run each code box as you go through the notebook
- Click the Run arrow
- Or press Shift+Enter in the box
- If things don't look right or there are errors, use the menu Kernel -> Restart & Clear Output to start over

## Using the sklearn Library
To use functions from other libraries in our code, we first need to import those libraries into our program:
- NumPy, so we can use matrices
- sklearn (scikit-learn), so we can do the linear regression
- matplotlib, so we can plot the data



In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
%matplotlib inline 

- Here is some sample data that we are going to use in our program
- The data is stored in regular python lists

In [None]:
x = [0,1,2,3,4,5,6]
y = [4.2,7.5,9,12.3,14,19.2, 22]
print(x)
print(y)


It is usually a good idea to look at the data before you do anything else, to decide whether it is even a good candidate for a linear regression.

We can use the plot function from matplotlib for this. Give it two lists of the same length; for a scatter plot, pass 'o' as the format string after the lists so each point is drawn as a dot instead of being joined by lines.



In [None]:
plt.plot(x,y, 'o')

Looking at the plot above, the points fall roughly along a straight line, so a line of best fit should give reasonably good results when predicting future values.


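Before we can use the linear regression functions from the sklearn library, we need to convert the data from a list to a column matrix: sklearn expects x to be 2-D, with one row per data point and one column per feature.

np.array(list) converts a list to a 1-D array, and .reshape(-1, 1) turns that into a column with one row per value (the -1 tells NumPy to work out the number of rows). np.matrix(list).T also produces a column matrix, but recent versions of sklearn reject np.matrix input, so we use a regular NumPy array here.



In [None]:
x = np.array(x).reshape(-1, 1)
#Keeping y as a column too means lm.intercept_ and lm.coef_ come back as arrays that are indexed below
y = np.array(y).reshape(-1, 1)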
print(x)
print(y)
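A quick way to confirm the data is laid out as a column is to look at its `.shape`. This sketch compares a plain 1-D NumPy array with a reshaped (m x 1) column, which is the 2-D (samples x features) layout sklearn's `fit` expects for x. The variable names are only for this illustration.

```python
import numpy as np

sample = [0, 1, 2, 3, 4, 5, 6]

flat = np.array(sample)
print(flat.shape)    # (7,)  one dimension: fine for plotting, not for sklearn's x

# -1 lets NumPy work out the number of rows; 1 asks for a single column
column = flat.reshape(-1, 1)
print(column.shape)  # (7, 1)  seven rows (samples), one column (feature)
```

Passing the 1-D version to `LinearRegression.fit` as x raises an error asking for a 2-D array, which is why the column form is used.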

We create a linear regression model from the LinearRegression class and store it in a variable called lm. We can then use lm to fit the model to the data and to make predictions.

In [None]:
lm = LinearRegression()

Now we can fit the line of best fit to the data and read off the theta values: lm.intercept_ holds theta0 and lm.coef_ holds theta1.

In [None]:
lm.fit(x,y)
theta0 = lm.intercept_[0]
theta1 = lm.coef_[0][0]
print("The equation that best models the data set is:")
print("y=",theta0, "+", theta1,"X")
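sklearn's `fit` chooses the theta values that minimize the sum of squared errors. For a single variable, that least-squares line has a closed form, sketched here with NumPy on the sample data as a cross-check. The slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)², and the intercept makes the line pass through the point (mean x, mean y). `np.polyfit` with degree 1 serves only as an independent comparison; it returns the slope first, then the intercept. The printed values should agree with theta0 and theta1 from `lm.fit`.

```python
import numpy as np

# Sample data from the start of this notebook
x_data = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y_data = np.array([4.2, 7.5, 9, 12.3, 14, 19.2, 22])

x_mean = x_data.mean()
y_mean = y_data.mean()

# Least-squares slope: sum of (x - mean x)(y - mean y) over sum of (x - mean x)^2
theta1_check = np.sum((x_data - x_mean) * (y_data - y_mean)) / np.sum((x_data - x_mean) ** 2)
# The least-squares line passes through (mean x, mean y), which fixes the intercept
theta0_check = y_mean - theta1_check * x_mean
print("y =", theta0_check, "+", theta1_check, "X")

# Independent comparison: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x_data, y_data, 1)
print(np.isclose(theta1_check, slope), np.isclose(theta0_check, intercept))
```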

The following code draws the line of best fit on top of the data.
- It builds lists of x and y values from the equation y = theta0 + theta1*x

We need to know the range of our data to build those lists, which is why we find the minimum and maximum x values in our data set.

In [None]:
lbfY = []
lbfX = []
xMin = int(np.min(x))
xMax = int(np.max(x))
for i in range(xMin,xMax+1):
    lbfY.append(theta0 + theta1*i)
    lbfX.append(i)
plt.plot(x,y,'o')
plt.plot(lbfX,lbfY)
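The loop above evaluates the line only at whole-number x values, because `int()` truncates the minimum and maximum. A vectorized alternative is sketched here, assuming NumPy arrays. `example_theta0` and `example_theta1` are hypothetical stand-ins for the fitted values. Because the model is a straight line, evaluating it at the two endpoints would already be enough to draw it. `np.linspace` simply makes the same approach work for curved models too.

```python
import numpy as np

# Hypothetical stand-ins for the fitted theta0 and theta1
example_theta0 = 3.8
example_theta1 = 2.9

# The sample x data from earlier in the notebook
x_data = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)

# 50 evenly spaced x values from the smallest to the largest data point, no int() truncation
line_x = np.linspace(x_data.min(), x_data.max(), 50)

# Evaluate y = theta0 + theta1*x at every x value in one step
line_y = example_theta0 + example_theta1 * line_x

# In the notebook, plt.plot(line_x, line_y) would draw this line over the scatter plot
```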


Make some predictions about future data using the model:

In [None]:
#Find the y when x = 7
p1 = lm.predict([[7]])
print(p1)

#Find the y when x = 3
p2 = lm.predict([[3]])
print(p2)
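For a linear model, `lm.predict` evaluates theta0 + theta1*x at each new x. This sketch does the same by hand, using hypothetical theta values in place of the fitted ones. With your own theta0 and theta1 plugged in, the output should match `lm.predict` up to floating-point rounding. Note that x = 7 lies outside the range of the sample data (0 to 6), so that prediction is an extrapolation and deserves less trust than the one at x = 3.

```python
import numpy as np

# Hypothetical fitted parameters; replace with the theta0 and theta1 printed earlier
example_theta0 = 3.8
example_theta1 = 2.9

# New x values as a column, the same layout lm.predict expects
x_new = np.array([[7], [3]], dtype=float)

# Evaluate the line equation at each new x
predictions = example_theta0 + example_theta1 * x_new
print(predictions)
```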

# Question #1

Fill in the code block below so it accomplishes the following tasks:

1.  Perform a linear regression on the data and print out the equation (y = theta0 + theta1x)
2.  Plot the original data as a scatter plot and then the line of best fit on the same axes
3.  Comment in a few words on how well you think your model does
4.  Predict some future values of y when x = 30, x = 55, x = 75
  

In [None]:
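#Code to read data from a text file
#Make sure that text file is in the same directory as this notebook file
import os
#os.path.join builds the path with the right separator on Windows, macOS, and Linux
path = os.path.join(os.getcwd(), 'data.txt')

x=[]
y=[]
#"with" closes the file automatically when the block ends
with open(path, "r") as file:
    for line in file:
        #skip blank lines, such as an empty line at the end of the file
        if not line.strip():
            continue
        values = line.split(",")
        #float() ignores surrounding whitespace, including the trailing newline
        x.append(float(values[0]))
        y.append(float(values[1]))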

#---Type your solution here-------------------------------

#Code for linear Regression



#Code for Graphs



"""
Comments on how well the model does


"""

#Code for Predictions



# Question #2 - Food Truck Data

Writing our own version of a linear regression algorithm using matrices

Make sure you have **looked at the presentation** online and also **read the PDF file** that you downloaded with this assignment, which explains the matrix math in a little more detail.

Follow the steps below and complete the code when asked.


About the data set:

Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has food trucks in various cities, and you have data for profits and populations from those cities. Populations are given in thousands of people and profits in thousands of dollars.

### Instructions


Fill two lists named x and y with the data from **FoodTruck.txt**
- x represents Population
- y represents Profit
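Instead of the readline loop from Question 1, NumPy's `np.loadtxt` can read a comma-separated, two-column file in one call. This sketch assumes FoodTruck.txt has one "population,profit" pair per line, like data.txt. That layout is an assumption, and the sample lines are made up. In the notebook you would pass the file path instead of `sample_file`.

```python
import io
import numpy as np

# Made-up sample lines standing in for FoodTruck.txt; the "population,profit"
# layout with one pair per line is an assumption, not taken from the real file
sample_file = io.StringIO("5.0,10.0\n6.5,12.5\n8.0,16.0\n")

# np.loadtxt parses each line into a row of floats; ndmin=2 keeps the result
# 2-D (m x 2) even if the file has only one line
data = np.loadtxt(sample_file, delimiter=",", ndmin=2)

# First column -> population, second column -> profit, converted back to plain lists
x = data[:, 0].tolist()
y = data[:, 1].tolist()
print(x)
print(y)
```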



In [None]:
#---Type your solution here-------------------------------



Make a scatter plot of the data

In [None]:
#---Type your solution here-------------------------------


Make an initial guess for the values of theta0 and theta1 by looking at your plot

In [None]:
#--- Model Parameters ---------------------------------------------------
#THESE VALUES YOU CAN CHANGE TO SEE HOW THE RESULTS OF YOUR REGRESSION CHANGE

#Initial Guess for Weights in hypothesis function
theta0 = 
theta1 = 

#learning rate 
alpha = 0.01

#convergence threshold so we know when to stop: stop once the cost changes by no more than conv (0.1 -> converged to 1 decimal place, 0.01 -> converged to 2 decimal places)
conv = 0.001

#-----------------------------------------------------------------------

Find the length of the x list using the len(listName) command.  Store the result in a variable called m

In [None]:
#---Type your solution here-------------------------------
m = 

Convert the x and y lists to column matrices

In [None]:
#---Type your solution here-------------------------------
x = 
y =

Create a (2 x 1) column matrix containing the values from theta0 and theta1.  Call that matrix theta

In [None]:
#---Type your solution here-------------------------------
theta = 

Create an (m x 1) column matrix filled with ones.  Call the matrix ones.
- You can use the function np.ones( (#ofRows, #ofColumns) ) to accomplish this

In [None]:
#---Type your solution here-------------------------------
ones =

Stack the ones column matrix to the left of the x matrix to make a new (m x 2) matrix called x
- You can use the function np.hstack((col1MatrixName, col2MatrixName)) to accomplish this; note that the two matrices are passed together as a single tuple
- Note: If you run this code more than once, you will keep stacking extra columns of 1's into the matrix
- Print x after you run it to make sure everything is ok
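Here is a small sketch of the `np.ones` and `np.hstack` steps on made-up data, using plain NumPy arrays. The same calls also work when the inputs are `np.matrix` objects. The shape check is one possible guard, not part of the assignment, against the repeated-stacking problem described in the note.

```python
import numpy as np

# Hypothetical (m x 1) column of x values
x_demo = np.array([[2.0], [4.0], [6.0]])
m_demo = len(x_demo)

# np.ones takes the shape as a single tuple: (rows, columns)
ones_demo = np.ones((m_demo, 1))

# Only stack while x_demo still has one column, so re-running this cell
# does not keep adding extra columns of ones
if x_demo.shape[1] == 1:
    x_demo = np.hstack((ones_demo, x_demo))

print(x_demo.shape)  # (3, 2)
print(x_demo)
```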

In [None]:
#---Type your solution here-------------------------------
x = 

This next section is going to do the gradient descent algorithm to minimize the cost.  Fill in the missing code under the comments
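For reference, the formulas that the comments in the gradient descent cell describe are written out here. X is the (m x 2) matrix whose first column is all ones, θ = [θ0, θ1]ᵀ is the theta matrix, and α is the learning rate alpha. Both theta values are updated from the same error vector e before theta is rebuilt, so the two updates happen simultaneously.

```latex
h = X\theta, \qquad e = h - y

J(\theta) = \frac{1}{2m}\, e^{\mathsf{T}} e
          = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right)^2

\theta_0 := \theta_0 - \frac{\alpha}{m}\, e^{\mathsf{T}} X_{:,0}
          = \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right)

\theta_1 := \theta_1 - \frac{\alpha}{m}\, e^{\mathsf{T}} X_{:,1}
          = \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right) x_i

\text{stop when } \left|J_{\text{previous}} - J\right| \le \text{conv}
```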

In [None]:
previousCost = 0
numSteps = 0
while True:
    
    #Counts how many times the gradient descent takes to reach the minimum cost
    numSteps = numSteps + 1
    
    #create the hypothesis matrix
    #It's the dot product between the x matrix and the theta matrix
    hypothesis = 
    
    #create the error matrix
    #It's the hypothesis matrix minus the y matrix
    error = 
    
    #Calculate the sum of the errors squared
    #It's the dot product between the transpose of the error matrix and the non-transposed error matrix
    totalError = 
    
    #Calculate the cost
    #It's 1/(2m) multiplied by the total error
    cost = 
    
    #Check whether the cost changed by no more than the convergence parameter conv
    if abs(previousCost - cost) <= conv:
        break
    else:
        #Calculate new alpha error for theta0
        #It's the dot product of the transposed error matrix with the first column of the x matrix, x[:,0]
        alphaError0 = 
        
        #Calculate new alpha error for theta1
        #It's the dot product of the transposed error matrix with the second column of the x matrix, x[:,1]
        alphaError1 = 
        
        #Calculate the new theta0 value
        #It's the theta0 value minus alpha/m multiplied by alphaError0
        theta0 = 
        
        #Calculate the new theta1 value
        #It's the theta1 value minus alpha/m multiplied by alphaError1
        theta1 =
    
        #Put the new theta values into the theta matrix
        theta = np.matrix([[theta0[0,0]],[theta1[0,0]]])
        
        #reset the previous cost for next comparison
        previousCost = cost
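One way to sanity-check the cost formula is a known case: with θ0 = θ1 = 0 every prediction is 0, so the cost is just the sum of the squared y values divided by 2m. This sketch assumes plain NumPy arrays and the `@` operator rather than `np.matrix`, uses the sample data from the first part of this notebook, and introduces a hypothetical helper `compute_cost`. It also shows that Xᵀe bundles both alpha errors (alphaError0 and alphaError1) into one (2 x 1) result, and that one small gradient step lowers the cost.

```python
import numpy as np

# Sample data from the first part of this notebook, as plain arrays
x_values = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y_demo = np.array([4.2, 7.5, 9, 12.3, 14, 19.2, 22]).reshape(-1, 1)
m_demo = len(y_demo)

# (m x 2) design matrix: a column of ones, then the x values
X_demo = np.column_stack((np.ones(m_demo), x_values))


def compute_cost(X, y, theta):
    """Hypothetical helper: J = 1/(2m) * e^T e, where e = X theta - y."""
    error = X @ theta - y
    return (error.T @ error).item() / (2 * len(y))


# With theta = [0, 0] the cost reduces to sum(y^2) / (2m)
theta_start = np.zeros((2, 1))
cost_start = compute_cost(X_demo, y_demo, theta_start)
print(cost_start, np.sum(y_demo ** 2) / (2 * m_demo))

# X^T e stacks e^T X[:,0] and e^T X[:,1], i.e. both alpha errors, into a (2 x 1) column
alpha_demo = 0.01
error_start = X_demo @ theta_start - y_demo
theta_next = theta_start - (alpha_demo / m_demo) * (X_demo.T @ error_start)

# With a small enough learning rate, the cost goes down after one step
cost_next = compute_cost(X_demo, y_demo, theta_next)
print(cost_next < cost_start)
```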

Print the value of:  
-  numSteps
-  theta0
-  theta1
-  cost

In [None]:
#---Type your solution here-------------------------------





Make comments on what happens when you change the parameters at the start of the program
- What happens when you change theta0 and theta1?
- What happens when you increase alpha?  decrease alpha?
- What happens when you increase conv?  decrease conv?

You will have to rerun all the code cells in Question 2 to see the results of your changes

In [None]:
"""
---Type your comments here-------------------------------



"""

Choose acceptable initial parameters and then make a prediction of the profit for a population of 55,000 people
- Remember the population values in the data set are in 1000's of people
- Remember the model is theta0 + theta1*x

Print the answer
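Here is a sketch of the unit conversion, with hypothetical theta values standing in for the ones your gradient descent produces. Because x is measured in thousands of people, 55,000 people becomes x = 55. Because profit is in thousands of dollars, the model output is multiplied by 1000 to get dollars. If your theta0 and theta1 are 1 x 1 matrices from the loop, theta0[0,0] and theta1[0,0] give the plain numbers.

```python
# Hypothetical parameters; substitute the theta0 and theta1 from your own run
example_theta0 = -4.0
example_theta1 = 1.2

population_people = 55000
# The data set measures population in thousands of people
x_new = population_people / 1000

# The model's output is profit in thousands of dollars
profit_thousands = example_theta0 + example_theta1 * x_new
profit_dollars = profit_thousands * 1000
print("Predicted profit:", profit_thousands, "thousand dollars =", profit_dollars, "dollars")
```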

In [None]:
#---Type your solution here-------------------------------





You've finished your first regression coding assignment!
- I recommend going to the Kernel menu -> Restart & Run All
    - This makes sure there are no odd errors caused by cells that were run out of order, which sometimes happens in Jupyter notebooks
- Save and upload this document to be graded