# Multiple linear regression with TensorFlow
<img src="https://raw.githubusercontent.com/fhernanb/fhernanb.github.io/master/docs/logo_unal_color.png" alt="drawing" width="200"/>

This notebook shows an example of how to use TensorFlow to fit a multiple linear regression model.

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

### The data

The data used in this example come from the book [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/). The dependent variable is `sales`, the sales of a business that spent money on advertising in three media: `TV`, `radio`, and `newspaper`.

The goal of the exercise is to fit the following regression model

\begin{equation} \label{eq1}
\begin{split}
sales_i &\sim N(\mu_i, \sigma^2) \\
\mu_i &= \beta_0 + \beta_1 TV_i + \beta_2 radio_i + \beta_3 newspaper_i
\end{split}
\end{equation}
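For reference, the coefficients of a model of this form can also be estimated by ordinary least squares. The block below is a minimal sketch on synthetic data: the sample size, the coefficient values in `beta_true`, and the noise level are made up for illustration and are not taken from the Advertising data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors standing in for TV, radio and newspaper (hypothetical values)
n = 200
X = rng.uniform(0, 100, size=(n, 3))

# Hypothetical true coefficients: beta_0, beta_1, beta_2, beta_3
beta_true = np.array([3.0, 0.05, 0.2, 0.0])

# Design matrix with a leading column of ones for the intercept
X_design = np.column_stack([np.ones(n), X])

# Response generated as mu_i plus Gaussian noise with sigma = 1
y = X_design @ beta_true + rng.normal(0, 1.0, size=n)

# Least-squares estimate of the four coefficients
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)
```

The estimates in `beta_hat` should land close to `beta_true`, which gives a simple baseline to compare against the network fitted later.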

In [0]:
dataset = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)

In [0]:
print(dataset.shape)  # show the dimensions
dataset.head()

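With `index_col=0` the row-number column becomes the index, so the DataFrame holds only the variables of interest: `TV`, `radio`, `newspaper`, and `sales`.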

### Cleaning the data

Check whether there are any missing values (NAs)

In [0]:
dataset.isna().sum()

### Creating the training and test datasets

In [0]:
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)

In [0]:
print(train_dataset.shape)
print(test_dataset.shape)

In [0]:
test_dataset.head()
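The `sample`/`drop` pattern used above can be checked on a small example. This is a minimal sketch with a toy DataFrame (the names `toy`, `toy_train`, and `toy_test` and their values are hypothetical); it shows that the two pieces have the expected sizes and share no rows.

```python
import pandas as pd

# Toy DataFrame with 10 rows (hypothetical values)
toy = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Draw 80% of the rows at random; random_state makes the draw reproducible
toy_train = toy.sample(frac=0.8, random_state=0)

# The rows that were not drawn form the test set
toy_test = toy.drop(toy_train.index)

print(toy_train.shape, toy_test.shape)

# Row labels that appear in both sets (should be empty)
overlap = set(toy_train.index) & set(toy_test.index)
print(overlap)
```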

### Inspecting the data

A quick look at the data.

In [0]:
sns.pairplot(train_dataset[["sales", "TV", "radio", "newspaper"]], diag_kind="kde")

Exploring some summary statistics:

In [0]:
train_stats = train_dataset.describe()
train_stats.pop("sales")
train_stats = train_stats.transpose()
train_stats

### Separating the response variable $y$

In [0]:
train_labels = train_dataset.pop('sales')
test_labels = test_dataset.pop('sales')

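### Normalizing the data

The features must be normalized so that they are on comparable scales, which helps the neural network train well. Note that the test set is normalized with the mean and standard deviation of the training set.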

In [0]:
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
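To see what this normalization does, here is a minimal sketch on toy data (the DataFrames, the `stats` table, and the helper `toy_norm` are hypothetical and only mirror the cell above). After normalization each training column has mean 0 and standard deviation 1, while the test row is scaled with the training statistics.

```python
import pandas as pd

# Toy training and test features (hypothetical values)
toy_train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                          "b": [10.0, 20.0, 30.0, 40.0]})
toy_test = pd.DataFrame({"a": [2.5], "b": [50.0]})

# The statistics come from the training set only
stats = toy_train.describe().transpose()

def toy_norm(x):
    # Subtract the training mean and divide by the training std, column by column
    return (x - stats["mean"]) / stats["std"]

normed_toy_train = toy_norm(toy_train)
normed_toy_test = toy_norm(toy_test)

print(normed_toy_train.mean())
print(normed_toy_train.std())
print(normed_toy_test)
```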

## The model

### Building the model

Let's build the model. We use a `Sequential` model with two hidden layers, each with three units and ReLU activation, and a single output neuron.

In [0]:
def build_model():
  model = keras.Sequential([
    layers.Dense(3, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(3, activation=tf.nn.relu),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mean_squared_error',
                optimizer=optimizer,
                metrics=['mean_absolute_error', 'mean_squared_error'])
  return model

In [0]:
model = build_model()

### Inspect the model

Use the `.summary` method to print a simple description of the model:

In [0]:
model.summary()
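The parameter count reported by the summary can be worked out by hand. The sketch below assumes three input features (`TV`, `radio`, `newspaper`) and the layer sizes used in `build_model()`; the helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

n_features = 3           # TV, radio, newspaper
layer_sizes = [3, 3, 1]  # units in each Dense layer of build_model()

total = 0
n_inputs = n_features
for n_units in layer_sizes:
    total += dense_params(n_inputs, n_units)
    n_inputs = n_units   # the output of one layer feeds the next

print(total)  # 12 + 12 + 4 = 28
```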

### Training the model

Train the model for 500 epochs and store the training and validation metrics in the `history` object

In [0]:
# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')

EPOCHS = 500

history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0,
  callbacks=[PrintDot()])

Let's look at the training progress by inspecting the last values stored in `history`

In [0]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [0]:
def plot_history(history):
  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [sales]')
  plt.plot(hist['epoch'], hist['mean_absolute_error'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mean_absolute_error'],
           label = 'Val Error')
  plt.ylim([0, 15])
  plt.legend()

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Square Error [$sales^2$]')
  plt.plot(hist['epoch'], hist['mean_squared_error'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mean_squared_error'],
           label = 'Val Error')
  plt.ylim([0, 250])
  plt.legend()
  plt.show()


plot_history(history)

In [0]:
model = build_model()

# The patience parameter is the number of epochs without improvement to wait before stopping
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])

plot_history(history)
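The idea behind the `patience` parameter can be sketched in plain Python. This is a simplified illustration, not Keras's actual implementation; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: count this epoch and stop once patience runs out
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 2
losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 2.9, 3.0]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)  # 5: three epochs in a row without improving on 3.0
```

With a larger `patience`, the same sequence would have continued long enough to reach the better value 2.9, which is the trade-off this parameter controls.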

In [0]:
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

print("Testing set Mean Abs Error: {:5.2f} sales".format(mae))
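The two metrics returned by `model.evaluate` can be reproduced by hand. The following is a minimal sketch with NumPy on made-up values (`y_true` and `y_pred` are hypothetical, not results from this model).

```python
import numpy as np

# Hypothetical observed and predicted values
y_true = np.array([10.0, 12.0, 15.0])
y_pred = np.array([11.0, 12.0, 13.0])

errors = y_pred - y_true

# Mean absolute error: average size of the errors, in the units of y
mae_manual = np.mean(np.abs(errors))

# Mean squared error: average squared error, in squared units of y
mse_manual = np.mean(errors ** 2)

print(mae_manual, mse_manual)
```

Because the MAE is expressed in the same units as `sales`, it is the easier of the two to interpret.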

### Making predictions

Finally, predict $y$ values using data in the testing set:

In [0]:
test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [y]')
plt.ylabel('Predictions [y]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])


In [0]:
import numpy as np
np.corrcoef(test_labels, test_predictions)
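`np.corrcoef` returns a 2 x 2 correlation matrix, so the correlation between observed and predicted values is the off-diagonal entry. A minimal sketch with made-up values (`y_true` and `y_pred` are hypothetical):

```python
import numpy as np

# Hypothetical observed and predicted values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

corr_matrix = np.corrcoef(y_true, y_pred)

# The diagonal holds each variable's correlation with itself (1.0);
# the off-diagonal entries hold the correlation between the two variables
r = corr_matrix[0, 1]
print(corr_matrix.shape, r)
```

A value of `r` close to 1 indicates that the predictions follow the observed values closely along a straight line.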