# Linear Regression

### USA housing data
##### The data contains the following columns:
 - 'Avg. Area Income': average income of residents of the city the house is located in
 - 'Avg. Area House Age': average age of houses in the same city
 - 'Avg. Area Number of Rooms': average number of rooms for houses in the same city
 - 'Avg. Area Number of Bedrooms': average number of bedrooms for houses in the same city
 - 'Area Population': population of the city the house is located in
 - 'Price': price the house sold at
 - 'Address': address of the house

### Import data from CSV file

In [None]:
import pandas as pd
# Expects USA_Housing.csv in the current working directory
USAhousing = pd.read_csv('USA_Housing.csv')

In [None]:
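from IPython.display import display
display(USAhousing.head())      # first five rows
USAhousing.info()               # column dtypes and non-null counts (prints directly)
display(USAhousing.describe())  # summary statistics of the numeric columns
print(USAhousing.columns)       # column names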

### Visualize data

In [None]:
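import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(USAhousing)            # pairwise scatter plots of the numeric columns
plt.figure()                        # new figure so the next plot is not drawn onto the pairplot
sns.histplot(USAhousing['Price'])   # distribution of the target (distplot is deprecated)
plt.figure()
sns.heatmap(USAhousing.corr(numeric_only=True), annot=True)  # 'Address' is text, so exclude it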
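Each cell of the heatmap is a Pearson correlation coefficient between two columns, which is what `DataFrame.corr()` computes by default. A minimal NumPy sketch of that coefficient, run on small made-up arrays rather than the housing data and checked against `np.corrcoef`; `pearson_r` is a hypothetical helper:

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each variable
    yc = y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Made-up example values, not taken from USA_Housing.csv
income = np.array([50.0, 60.0, 65.0, 80.0, 95.0])
price = np.array([0.9, 1.1, 1.2, 1.5, 1.7])

r = pearson_r(income, price)
print(r, np.corrcoef(income, price)[0, 1])
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, though a non-linear one may still exist.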

### Training a Linear Regression Model

##### define training data set and test data set

In [None]:
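from sklearn.model_selection import train_test_split
# Features: every numeric column ('Address' is free text and is left out)
X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population']]
y = USAhousing['Price']  # target
# Hold out 40% of rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)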
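Conceptually, the split is a seeded shuffle of the row indices followed by a cut. The sketch below mimics that with NumPy; `simple_split` is a hypothetical helper, not scikit-learn's implementation, so the same seed will not reproduce the rows `train_test_split` picks. It rounds the test count up, which is what scikit-learn does for a fractional `test_size`.

```python
import math
import numpy as np

def simple_split(n_rows, test_size=0.4, seed=101):
    """Hypothetical helper: shuffle row indices and hold out a fraction for testing."""
    rng = np.random.default_rng(seed)          # seeded so the split is reproducible
    indices = rng.permutation(n_rows)          # random order of 0..n_rows-1
    n_test = math.ceil(test_size * n_rows)     # round the test count up
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = simple_split(100)
print(len(train_idx), len(test_idx))
```

The returned index arrays could then select rows with `USAhousing.iloc[train_idx]` and `USAhousing.iloc[test_idx]`.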

##### create model

In [None]:
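from sklearn.linear_model import LinearRegression
lm = LinearRegression()   # ordinary least squares; fits an intercept by default
lm.fit(X_train, y_train)  # estimates intercept_ and coef_ from the training data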
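`LinearRegression` performs ordinary least squares: it chooses the intercept and coefficients that minimise the sum of squared differences between predicted and actual values. A minimal NumPy sketch of the same fit, assuming a design matrix with a leading column of ones for the intercept and using made-up, noise-free data so the true parameters can be recovered:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))             # 50 rows, 3 made-up features
true_intercept = 2.0
true_coef = np.array([1.5, -0.5, 3.0])
y = true_intercept + X @ true_coef       # noise-free target

# Prepend a column of ones so the first parameter acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # least-squares solution

intercept, coef = params[0], params[1:]
print(intercept, coef)
```

Applied to `X_train` and `y_train`, the same least-squares solve should agree with `lm.intercept_` and `lm.coef_` up to floating-point error.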

### View the result

In [None]:
print(lm.intercept_)

In [None]:
print(lm.coef_)

In [None]:
coeff_df = pd.DataFrame(lm.coef_, index=X.columns, columns=['Coefficient'])  # one row per feature

In [None]:
coeff_df
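Each coefficient is the change in predicted price for a one-unit increase in that feature while the other features stay fixed, and a prediction is the intercept plus the dot product of the coefficients with the feature values. A sketch with made-up parameters (not the values fitted in this notebook); `predict` here is a hypothetical stand-in for `lm.predict`:

```python
import numpy as np

# Made-up parameters for illustration only
intercept = 100.0
coef = np.array([2.0, -1.0, 0.5, 3.0, 0.1])  # one per feature, same order as X.columns

def predict(row):
    """Linear model prediction: intercept + sum of coefficient * feature value."""
    return intercept + np.dot(coef, row)

house = np.array([10.0, 5.0, 7.0, 3.0, 200.0])
bumped = house.copy()
bumped[0] += 1.0  # raise the first feature by one unit, everything else unchanged

delta = predict(bumped) - predict(house)
print(delta)  # equals coef[0] up to rounding
```

Because each coefficient is measured in price per unit of its own feature, their raw magnitudes are not directly comparable across features on different scales (dollars of income versus number of rooms).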

##### predict

In [None]:
predictions = lm.predict(X_test)

In [None]:
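import matplotlib.pyplot as plt
plt.scatter(y_test, predictions)  # points near the line y = x indicate accurate predictions
plt.xlabel('Actual price (y_test)')
plt.ylabel('Predicted price')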

In [None]:
sns.histplot(y_test - predictions, bins=10)  # residuals: actual minus predicted (distplot is deprecated)
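The histogram shows the residuals, actual minus predicted price. For a least-squares fit with an intercept, the residuals on the training data sum to zero by construction, so a residual histogram on test data centred well away from zero would suggest systematic bias. A NumPy sketch of that property on made-up noisy data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=200)  # made-up linear data plus noise

X_design = np.column_stack([np.ones_like(x), x])     # intercept column + feature
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ params  # actual minus fitted, on the training data
print(residuals.mean(), residuals.std())
```

The mean is zero up to rounding, and the spread roughly matches the noise scale used to generate the data.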

##### calculate error

In [None]:
import numpy as np
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
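MAE averages the absolute errors, MSE averages the squared errors (so large misses weigh more), and RMSE is the square root of MSE, which puts it back in the same units as `Price`. A NumPy sketch of the three definitions on a tiny made-up example; `regression_errors` is a hypothetical helper, and on the notebook's data it should reproduce the `metrics` values printed above:

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Hypothetical helper: return (MAE, MSE, RMSE) computed from their definitions."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(err))  # mean absolute error
    mse = np.mean(err ** 2)     # mean squared error
    rmse = np.sqrt(mse)         # root mean squared error
    return mae, mse, rmse

mae, mse, rmse = regression_errors([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(mae, mse, rmse)  # MAE = 0.5, MSE = 0.375
```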