# Machine Learning Tutorial: Linear Regression Prediction

In this tutorial I predict home prices using linear regression. The training data contains home areas in square feet and the corresponding prices, and I train a linear regression model using sklearn's `LinearRegression` class.

Later, I use a linear regression model to predict the per capita income in Canada in 2020!

In [None]:
from urllib.request import urlretrieve
urlretrieve("https://raw.githubusercontent.com/codebasics/py/master/ML/1_linear_reg/homeprices.csv", "homeprices.csv")

In [None]:
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('homeprices.csv')

In [None]:
df

In [None]:
%matplotlib inline
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='o')

In [None]:
# help(df.drop)

In [None]:
new_df = df.drop('price',axis='columns')
new_df
# This is my predictor variable. 

In [None]:
price = df.price
price

In [None]:
# Pass the class itself; there is no need to create an instance just to read the docs
help(linear_model.LinearRegression)

In [None]:
# help(reg.fit)

In [None]:
# Create linear regression object
reg = linear_model.LinearRegression()
# The fit signature is fit(X, y, sample_weight=None),
# where X is the training data and y holds the target values
reg.fit(new_df,price)

#### Now let's do a few examples: 
Predict the price of a home with area of 3300 square feet:

In [None]:
help(reg.predict)
reg.predict([[3300]])

For a house of 3300 square feet, the model predicts a price of about $628,715.
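To see where this number comes from, recall that a one-feature linear regression is just a straight line: price = slope * area + intercept. The sketch below refits a model on a small inline table and rebuilds the prediction by hand from `coef_` and `intercept_`. The five rows are an assumption standing in for homeprices.csv (chosen so that they agree with the prediction quoted above), and `demo_df` and `demo_reg` are hypothetical names; with the real file, use `df` and `reg` instead.

```python
import pandas as pd
from sklearn import linear_model

# Assumed stand-in for homeprices.csv (area in sq ft, price in US$)
demo_df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

demo_reg = linear_model.LinearRegression()
demo_reg.fit(demo_df[["area"]], demo_df["price"])

slope = demo_reg.coef_[0]        # price change per extra square foot
intercept = demo_reg.intercept_  # value of the fitted line at area = 0

# The model's prediction is the fitted line evaluated at area = 3300
manual_prediction = slope * 3300 + intercept
model_prediction = demo_reg.predict(pd.DataFrame({"area": [3300]}))[0]

print(slope, intercept)
print(manual_prediction, model_prediction)
```

Both printed predictions should agree, which confirms that `predict` does nothing more than evaluate the fitted line.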

### Predict the price of a home with area = 5000 sq ft

In [None]:
reg.predict([[5000]])
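One practical note: because the model was fitted on a DataFrame, recent scikit-learn versions (1.0 and later, as far as I know) warn that X does not have valid feature names when `predict` receives a plain nested list. The prediction itself is unaffected. A minimal sketch of one way to avoid the warning is to pass a DataFrame with the same column name used during fitting. The names `demo_df`, `demo_reg`, and `query` are hypothetical, and the inline rows are an assumed stand-in for homeprices.csv.

```python
import warnings

import pandas as pd
from sklearn import linear_model

# Assumed stand-in for homeprices.csv
demo_df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})
demo_reg = linear_model.LinearRegression().fit(demo_df[["area"]], demo_df["price"])

# Same column name as at fit time, so the feature names match
query = pd.DataFrame({"area": [5000]})
pred_from_frame = demo_reg.predict(query)[0]

# Plain nested list: same number, but it may trigger the feature-name warning,
# which is silenced here only to keep the output clean
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    pred_from_list = demo_reg.predict([[5000]])[0]

print(pred_from_frame, pred_from_list)
```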

### Predict prices of homes for an array of multiple areas

In [None]:
# First read this in:
urlretrieve("https://raw.githubusercontent.com/codebasics/py/master/ML/1_linear_reg/areas.csv", "areas.csv")
areas = pd.read_csv('areas.csv')

In [None]:
areas

In [None]:
p = reg.predict(areas)
p

This is messy to look at, so let's put the predictions in a new column next to the areas. The resulting table can then be saved as a CSV:

In [None]:
# Insert a new column with the predicted prices of these data:
areas['prices'] = p
areas
# If I wanted to continue on to save these: 
# areas.to_csv("prediction.csv")
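If you do save the table, note that `to_csv` writes the row index as an extra unnamed column by default; passing `index=False` leaves it out. Here is a small sketch of the round trip, assuming a throwaway temporary directory and a hypothetical `demo_areas` table with illustrative numbers rather than the real predictions.

```python
import os
import tempfile

import pandas as pd

# Illustrative values only, not the actual model output
demo_areas = pd.DataFrame({
    "area": [1000, 1500, 2300],
    "prices": [316404.0, 384298.0, 492928.0],
})

# Write to a temporary location so nothing in the working directory is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "prediction.csv")
demo_areas.to_csv(out_path, index=False)  # index=False skips the row-number column

# Read the file back to confirm that only the two data columns were written
round_trip = pd.read_csv(out_path)
print(round_trip)
```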

### Exercise
Predict Canada's per capita income in the year 2020. There is an exercise folder on GitHub at the same level as this notebook; download it and you will find the canada_per_capita_income.csv file. Using this file, build a regression model and predict the per capita income for Canadian citizens in the year 2020.

In [None]:
urlretrieve("https://raw.githubusercontent.com/codebasics/py/master/ML/1_linear_reg/Exercise/canada_per_capita_income.csv", "canada_per_capita_income.csv")


In [None]:
canada_per_capita_income = pd.read_csv("canada_per_capita_income.csv")
# Now let's take a quick look:
canada_per_capita_income

In [None]:
# One of the column names contains spaces, which makes that column awkward to access,
# so replace the spaces with underscores:
canada_per_capita_income.columns = [c.replace(' ', '_') for c in canada_per_capita_income.columns]
canada_per_capita_income.columns

# This is still difficult to work with....

In [None]:
canada_per_capita_income.rename(columns={'per_capita_income_(US$)': 'income'}, inplace=True)
canada_per_capita_income.columns
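The two renaming steps above can also be collapsed into one. The sketch below assumes the original column is named `per capita income (US$)` (inferred from the underscore version used above) and uses a hypothetical two-row `demo` frame with made-up values, since only the column names matter here.

```python
import pandas as pd

# Two made-up rows; only the column names are of interest
demo = pd.DataFrame({
    "year": [1970, 1971],
    "per capita income (US$)": [1.0, 2.0],
})

# Rename directly from the original name; rename returns a new DataFrame
demo = demo.rename(columns={"per capita income (US$)": "income"})

print(demo.columns.tolist())
```

Assigning the result back, rather than using `inplace=True`, is a matter of taste; both leave `demo` with the short column name.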

In [None]:
# A quick visualization, now that the columns are easy to access:
%matplotlib inline
plt.xlabel('year')
plt.ylabel('per capita income (US$)')
plt.scatter(canada_per_capita_income.year,canada_per_capita_income.income,color='red',marker='o')

In [None]:
NEW_canada = canada_per_capita_income.drop('income',axis='columns')
# Now NEW_canada should be a DataFrame with only the years. Let's see the first 3 entries:
NEW_canada.head(3)

In [None]:
income = canada_per_capita_income.income
# First three entries of the new variable, 'income'
income.head(3)

### Recall that my goal is to predict income in the year 2020 using linear regression:

In [None]:
help(reg.fit)
# The first argument is the training data (here, the years)
# and the second argument is the target I am trying to predict (income)

In [None]:
# Create a linear regression object
reg_canada = linear_model.LinearRegression()
reg_canada.fit(NEW_canada,income)

In [None]:
reg_canada.predict([[2020]])

### Answer: 
The linear model predicts that the per capita income in Canada in 2020 will be about $41,288.
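One way to judge how much weight to give a prediction like this is to check how well a straight line fits the training data, for example with `score`, which returns the coefficient of determination R². The sketch below uses synthetic, made-up data (a linear trend plus random noise), not the Canadian income file; `demo` and `model` are hypothetical names. With the real data you would call `reg_canada.score(NEW_canada, income)`.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic data for illustration only: linear trend plus Gaussian noise
rng = np.random.default_rng(0)
years = np.arange(1970, 2017)
incomes = 3000.0 + 800.0 * (years - 1970) + rng.normal(0, 2000, size=years.size)
demo = pd.DataFrame({"year": years, "income": incomes})

model = linear_model.LinearRegression().fit(demo[["year"]], demo["income"])

# R^2 on the training data: values near 1 mean the line explains most of the variance
r_squared = model.score(demo[["year"]], demo["income"])
print(round(r_squared, 3))
```

A high R² on the training years says nothing definite about years outside that range, so a prediction for a later year is still an extrapolation.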