## Linear regression practice with scikit-learn

Load and explore the dataset `housing.data.txt`, available in the `data` folder. The dataset description is available at: https://archive.ics.uci.edu/ml/datasets/Housing

In [None]:
import pandas as pd
import numpy as np

In [None]:
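# The file is whitespace-delimited and has no header row; a raw string avoids an invalid-escape warning
df = pd.read_csv('data/housing.data.txt', header=None, sep=r'\s+')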

In [None]:
df.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']

In [None]:
df.head()

### Exploratory data analysis

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
import matplotlib.pyplot as plt

Install Seaborn, a Python plotting library built on matplotlib.

In [None]:
!pip install seaborn

In [None]:
import seaborn as sns

In [None]:
# 'size' was renamed to 'height' in seaborn 0.9
sns.pairplot(df, height=2.5)

In [None]:
columnas = ['LSTAT', 'INDUS', 'NOX', 'RM', 'MEDV']

In [None]:
# 'size' was renamed to 'height' in seaborn 0.9
sns.pairplot(df[columnas], height=2.5)

In [None]:
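# distplot is deprecated since seaborn 0.11; histplot with kde=True draws a comparable plot
sns.histplot(df['MEDV'], kde=True)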

In [None]:
sns.heatmap(df.corr())

In [None]:
cm = np.corrcoef(df[columnas].values.T)

In [None]:
sns.set(font_scale=1.5)

In [None]:
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 15}, yticklabels=columnas, xticklabels=columnas)

### Fit the estimator and obtain the coefficients with scikit-learn

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
rl = LinearRegression()

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x = df.iloc[:,:-1].values

In [None]:
columnas = ['LSTAT', 'INDUS', 'NOX', 'RM']

In [None]:
x = df[columnas]

In [None]:
y = df['MEDV']

In [None]:
type(y)

In [None]:
y = df['MEDV'].values

In [None]:
type(y)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)

In [None]:
rl.fit(X_train, y_train)

In [None]:
predicciones = rl.predict(X_test)

In [None]:
list(zip(columnas, rl.coef_))

In [None]:
columnas

In [None]:
df_coeficientes = pd.DataFrame(rl.coef_, index=columnas, columns=['Coeficiente'])
df_coeficientes

How are the coefficients interpreted?
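
One common reading, sketched here under stated assumptions: in a linear model, each coefficient is the change in the prediction when that feature increases by one unit while the other features stay fixed. The example below uses synthetic data rather than the housing dataset, and the names `X_demo`, `y_demo`, `demo_model`, `base`, and `plus_one` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration): y = 3*a - 2*b + 5
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# Take one observation and raise only the first feature by one unit
base = np.array([[1.0, 1.0]])
plus_one = base + np.array([[1.0, 0.0]])

# The change in the prediction should match the first coefficient
difference = demo_model.predict(plus_one)[0] - demo_model.predict(base)[0]
print('first coefficient:', demo_model.coef_[0])
print('change in prediction:', difference)
```

This reading holds only within the model: it describes how the fitted prediction moves, not a causal effect, and correlated features (such as `INDUS` and `NOX` here) can make individual coefficients hard to interpret in isolation.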

### Model evaluation

In [None]:
from sklearn import metrics

In [None]:
print('MAE:', metrics.mean_absolute_error(y_test, predicciones))
print('MSE:', metrics.mean_squared_error(y_test, predicciones))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predicciones)))
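
To make the three metrics concrete, the sketch below computes them by hand with NumPy on a tiny made-up example. The arrays `y_true_demo` and `y_pred_demo` are hypothetical values, not results from the housing model.

```python
import numpy as np

# Made-up targets and predictions (an assumption for illustration)
y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true_demo - y_pred_demo   # per-observation residuals

mae = np.mean(np.abs(errors))        # mean absolute error
mse = np.mean(errors ** 2)           # mean squared error
rmse = np.sqrt(mse)                  # root mean squared error, in the units of the target

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
```

MAE and RMSE are expressed in the same units as the target, while MSE is in squared units and penalizes large errors more heavily.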

Implementing simple linear regression

In [None]:
x = df[['RM']]

In [None]:
lr = LinearRegression()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)

In [None]:
# Fit on the training split only, so the test split stays unseen for evaluation
lr.fit(X_train, y_train)

In [None]:
y_train_pred = lr.predict(X_train)

In [None]:
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(X_train, y_train, color='green')
plt.plot(X_train, y_train_pred, color='black', linewidth=4)
plt.title('Training data')
plt.show()

In [None]:
y_test_pred = lr.predict(X_test)

In [None]:
plt.figure()
plt.scatter(X_test, y_test, color='green')
plt.plot(X_test, y_test_pred, color='black', linewidth=4)
plt.title('Test data')
plt.show()
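
With a single feature, the fitted model is a straight line, and its slope and intercept can be read from the estimator's `coef_` and `intercept_` attributes. The sketch below uses synthetic data rather than the `RM` column, and the names `x_demo`, `y_demo`, and `line_model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic single-feature data (an assumption for illustration): y = 2x + 1
x_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * x_demo.ravel() + 1.0

line_model = LinearRegression().fit(x_demo, y_demo)

slope = line_model.coef_[0]
intercept = line_model.intercept_

# Rebuild the predictions by hand from the line equation and compare with predict()
manual = intercept + slope * x_demo.ravel()
same = np.allclose(manual, line_model.predict(x_demo))
print('slope:', slope, 'intercept:', intercept, 'matches predict:', same)
```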

### Exercise

- Implement a linear regression model for the "USA_Housing.csv" dataset
- Predict the value of a house for a single observation
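
As a hint for the second item, scikit-learn estimators expect a 2D input of shape `(n_samples, n_features)`, so a single observation has to be reshaped into one row before calling `predict`. The sketch below shows this on synthetic data; it does not use "USA_Housing.csv", and the names `X_demo`, `y_demo`, `model`, and `single` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with three features (an assumption for illustration)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(100, 3))
y_demo = X_demo @ np.array([1.5, -0.5, 2.0]) + 4.0

model = LinearRegression().fit(X_demo, y_demo)

# A single observation is a 1D array; reshape it into one row with three columns
single = np.array([2.0, 4.0, 6.0])
single_2d = single.reshape(1, -1)

# predict returns an array with one value per row, so take the first element
prediction = model.predict(single_2d)[0]
print('prediction:', prediction)
```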

### Useful links

- https://seaborn.pydata.org/