# Linear Regression with Python

Your neighbor is a real estate agent and wants some help predicting housing prices for regions in the USA. It would be great if you could create a model for her that lets her enter a few features of a house and returns an estimate of what the house would sell for.

She has asked you if you could help her out with your new data science skills. You say yes, and decide that Linear Regression might be a good path to solve this problem!

Your neighbor then gives you some information about a number of houses in regions of the United States. It is all in the data set USA_Housing.csv.

The data contains the following columns:

- 'Avg. Area Income': Average income of residents of the city the house is located in
- 'Avg. Area House Age': Average age of houses in the same city
- 'Avg. Area Number of Rooms': Average number of rooms for houses in the same city
- 'Avg. Area Number of Bedrooms': Average number of bedrooms for houses in the same city
- 'Area Population': Population of the city the house is located in
- 'Price': Price the house sold at
- 'Address': Address of the house

### Data Acquisition and Clean-up

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("USA_Housing.csv")
df.head()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.columns

### Exploratory Data Analysis

In [None]:
# palette only takes effect together with a hue variable, so it is omitted here
sns.pairplot(df)

In [None]:
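# distplot is deprecated in recent seaborn; histplot with kde=True draws a histogram plus KDE
sns.histplot(df['Price'], kde=True)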

In [None]:
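# numeric_only=True skips the text 'Address' column (pandas >= 2.0 raises an error otherwise)
sns.heatmap(df.corr(numeric_only=True), annot=True)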
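The heatmap is easier to read alongside a plain ranking of how strongly each numeric column correlates with the target. The sketch below builds a small synthetic stand-in for USA_Housing.csv (the values and the `df_demo` name are hypothetical, not the real data) so it runs on its own; on the real notebook you would apply the same `corr(numeric_only=True)["Price"]` line to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for USA_Housing.csv (NOT the real data)
rng = np.random.default_rng(0)
n = 200
income = rng.normal(68_000, 10_000, n)
rooms = rng.normal(7, 1, n)
price = 20 * income + 100_000 * rooms + rng.normal(0, 50_000, n)

df_demo = pd.DataFrame({
    "Avg. Area Income": income,
    "Avg. Area Number of Rooms": rooms,
    "Price": price,
    "Address": [f"{i} Example St" for i in range(n)],  # non-numeric column, like the real file
})

# Correlation of every numeric predictor with Price, strongest first;
# numeric_only=True leaves out the string 'Address' column
price_corr = (
    df_demo.corr(numeric_only=True)["Price"]
    .drop("Price")
    .sort_values(ascending=False)
)
print(price_corr)
```

In this synthetic example income was given the larger effect, so it ranks first; the ranking on the real data is whatever the real correlations turn out to be.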

### Training the Linear Regression Model
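For reference, scikit-learn's `LinearRegression` fits an ordinary least squares model. With predictors $x_1, \dots, x_p$, the model and the fitting criterion can be written as follows (a standard formulation, stated here as a summary rather than taken from the original notebook):

$$
\widehat{\text{Price}} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
$$

$$
(\hat\beta_0, \dots, \hat\beta_p) = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
$$

Here $p = 5$ (every column except `Price` and `Address`), $\hat\beta_0$ is what `lm.intercept_` reports, and $\hat\beta_1, \dots, \hat\beta_p$ are the entries of `lm.coef_`.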

In [None]:
columns = df.columns
columns

##### Setting the target and predictors

In [None]:
# Set the target column
target = 'Price'
y = df[target]


In [None]:
# Get the predictor columns
droplist = [target, 'Address']
predictors = df.drop(droplist, axis=1).columns
X = df[predictors]
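`Address` is dropped because `LinearRegression` only accepts numeric features. An alternative that avoids listing text columns by hand is to keep only numeric dtypes and then drop the target. This is a sketch on a tiny made-up frame (`df_demo`, `X_demo` and `y_demo` are hypothetical names); it is not how the notebook itself selects predictors.

```python
import pandas as pd

# Tiny hypothetical frame mimicking the notebook's layout (values are made up)
df_demo = pd.DataFrame({
    "Avg. Area Income": [61_000.0, 79_000.0, 70_500.0],
    "Area Population": [23_000.0, 40_000.0, 36_000.0],
    "Price": [1.06e6, 1.51e6, 1.26e6],
    "Address": ["1 A St", "2 B Ave", "3 C Rd"],
})

target = "Price"

# Keep numeric columns only, then remove the target;
# the string 'Address' column is excluded automatically
X_demo = df_demo.select_dtypes(include="number").drop(columns=target)
y_demo = df_demo[target]

print(list(X_demo.columns))
```

One trade-off: this silently drops any other non-numeric column too, which is convenient here but could hide a column you meant to encode.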


In [None]:
X.head()

##### Splitting the data into training and test data

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=100)
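`test_size=0.20` holds out 20% of the rows for testing, and `random_state=100` fixes the shuffle so the same split comes back on every run. A small sketch on hypothetical random data with the same five-feature width makes both points concrete:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 5 numeric features (same width as the notebook's X)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = rng.normal(size=100)

# 20% of the rows go to the test set, 80% to the training set
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=100)
print(X_tr.shape, X_te.shape)  # (80, 5) (20, 5)

# Calling again with the same random_state reproduces the identical split
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.20, random_state=100)
print(np.array_equal(X_te, X_te2))  # True
```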

In [None]:
X_train

In [None]:
y_train

In [None]:
X_test

In [None]:
y_test

##### Build and training the Linear Regression Model

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lm = LinearRegression()

In [None]:
lm.fit(X_train, y_train)
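`fit` solves an ordinary least squares problem. One way to see that is to solve the same problem directly with NumPy, adding a column of ones for the intercept, and compare. The sketch below uses hypothetical noiseless data generated from known coefficients, so both approaches should recover those coefficients up to floating-point rounding.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noiseless data built from a known intercept (4.0) and coefficients
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = 4.0 + X_demo @ true_coef

# scikit-learn's fit (returns the fitted estimator itself)
lm_demo = LinearRegression().fit(X_demo, y_demo)

# Same least squares problem solved by hand:
# prepend a column of ones so the first solution entry is the intercept
X_design = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)

print(lm_demo.intercept_, lm_demo.coef_)
print(beta)  # [intercept, coef_1, coef_2, coef_3]
```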

In [None]:
lm.get_params(deep=True)

In [None]:
lm.intercept_

In [None]:
lm.coef_

In [None]:
coeff_df = pd.DataFrame(lm.coef_, predictors, columns=['Coefficients'])
coeff_df
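Each coefficient says how much the predicted price changes when that feature rises by one unit while the others are held fixed, and every prediction is just the intercept plus the dot product of the features with the coefficients. The sketch below checks both claims on hypothetical synthetic data; it borrows two column names from the notebook, but the values and the `*_demo` names are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical two-feature data (column names borrowed from the notebook, values synthetic)
rng = np.random.default_rng(2)
X_demo = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, 100),
    "Avg. Area House Age": rng.normal(6, 1, 100),
})
y_demo = (21.0 * X_demo["Avg. Area Income"]
          + 160_000 * X_demo["Avg. Area House Age"]
          + rng.normal(0, 1_000, 100))

lm_demo = LinearRegression().fit(X_demo, y_demo)
coeff_demo = pd.DataFrame(lm_demo.coef_, index=X_demo.columns, columns=["Coefficients"])

# A prediction is intercept + dot(features, coefficients)
manual = lm_demo.intercept_ + X_demo.to_numpy() @ lm_demo.coef_
print(np.allclose(manual, lm_demo.predict(X_demo)))  # True

# Raising one feature by 1 unit (others unchanged) shifts the prediction by its coefficient
row = X_demo.iloc[[0]].copy()
bumped = row.copy()
bumped["Avg. Area House Age"] += 1
delta = lm_demo.predict(bumped)[0] - lm_demo.predict(row)[0]
print(delta, coeff_demo.loc["Avg. Area House Age", "Coefficients"])
```

This "one unit, others fixed" reading is only as meaningful as the units of each column, so coefficients for features on very different scales (income versus house age) should not be compared by size alone.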

##### Testing the model by using the test data for predictions

In [None]:
predictions = lm.predict(X_test)


In [None]:
plt.scatter(y_test, predictions)

In [None]:
sns.displot((y_test-predictions), bins=50, kde=True)
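A residual histogram centered near zero is expected for a least squares fit with an intercept: on the training data the residuals average to exactly zero (up to rounding), while on held-out test data the mean is only approximately zero. A quick check on hypothetical noisy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy linear data
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.5 + X_demo @ np.array([3.0, -2.0]) + rng.normal(0, 0.5, 200)

lm_demo = LinearRegression().fit(X_demo, y_demo)

# Residuals on the data the model was trained on
train_resid = y_demo - lm_demo.predict(X_demo)

# With an intercept, OLS training residuals average to zero (up to floating-point rounding)
print(train_resid.mean())
```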

##### Evaluation metrics

In [None]:
from sklearn import metrics

In [None]:
print('MAE   --> ', metrics.mean_absolute_error(y_test, predictions))
print('MSE   --> ', metrics.mean_squared_error(y_test, predictions))
print('RMSE  --> ', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
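The three metrics can also be computed by hand, which shows what each one measures: MAE is the average absolute error, MSE averages squared errors (so large misses weigh more), and RMSE takes the square root to return to the units of `Price`. A small sketch with made-up true and predicted values:

```python
import numpy as np
from sklearn import metrics

# Small hypothetical true/predicted values
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 380.0])

errors = y_true - y_pred          # [-10, 10, -30, 20]
mae = np.mean(np.abs(errors))     # (10 + 10 + 30 + 20) / 4 = 17.5
mse = np.mean(errors ** 2)        # (100 + 100 + 900 + 400) / 4 = 375.0
rmse = np.sqrt(mse)               # about 19.36, in the same units as the target

print(mae, metrics.mean_absolute_error(y_true, y_pred))
print(mse, metrics.mean_squared_error(y_true, y_pred))
print(rmse)
```

Here the single 30-unit miss contributes 900 of the 1500 total squared error, which is why RMSE (about 19.36) sits above MAE (17.5).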