## Implementación de una regresión “stepwise” con eliminación hacia atrás.

#### Utilizando como referencia el código del algoritmo “stepwise” con selección hacia adelante (Fordward Stepwise Regression) que se encuentra en el archivo “CIF005_02_06_Stepwise.ipynb” realizar una implementación del del algoritmo con eliminación hacia atrás (Backward Stepwise Regression). 
#### En este caso la selección de las variables se realiza empezando con un modelo que utiliza todas la variables disponibles para ir eliminando en cada paso la produce el modelo menos significativo.

Importamos las librerías que vamos a necesitar:

In [2]:
import pandas as pd
import numpy as np
import math
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import train_test_split

Creamos la función de regresión con eliminación hacia atrás.

In [57]:
def backward_regression(x, y):
    # Obtencion del conjunto de datos para validación
    x_train, x_test, y_train, y_test = train_test_split(x, y)

    # Modelo para realizar los ajustes
    model = LinearRegression()

    # Variable para almecena los índices de la lista de atributos usados
    feature_list = list(x.columns)
    feature_order = range(len(feature_list))
    feature_error = []
    feature_names = []

    # Iteración sobre todas las variables
    for i in range(len(feature_list)-1):
        idx_try = [val for val in range(len(feature_list)) if val in feature_order]
        iter_error = []

        for i_try in idx_try:
            useRow = feature_order[:]
            useRow.append(i_try)

            use_train = x_train[x_train.columns[useRow]]
            use_test = x_test[x_train.columns[useRow]]

            model.fit(use_train, y_train)
            rmsError = np.linalg.norm((y_test - model.predict(use_test)), 2)/math.sqrt(len(y_test))
            iter_error.append(rmsError)

        pos_best = np.argmin(iter_error)
        
        feature_order.remove(idx_try[pos_best])
        feature_names.append(idx_try[pos_best])
        feature_error.append(iter_error[pos_best])
        
        print "Paso", len(feature_error), "Eliminamos variable", feature_list[idx_try[pos_best]], "con RMS", iter_error[pos_best]
        
    return feature_names, feature_order, feature_error

Preparamos los datos que vamos a usar en el modelo

In [7]:
wine = pd.read_csv('winequality-white.csv', sep = ';')
wine.describe()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0
mean,6.854788,0.278241,0.334192,6.391415,0.045772,35.308085,138.360657,0.994027,3.188267,0.489847,10.514267,5.877909
std,0.843868,0.100795,0.12102,5.072058,0.021848,17.007137,42.498065,0.002991,0.151001,0.114126,1.230621,0.885639
min,3.8,0.08,0.0,0.6,0.009,2.0,9.0,0.98711,2.72,0.22,8.0,3.0
25%,6.3,0.21,0.27,1.7,0.036,23.0,108.0,0.991723,3.09,0.41,9.5,5.0
50%,6.8,0.26,0.32,5.2,0.043,34.0,134.0,0.99374,3.18,0.47,10.4,6.0
75%,7.3,0.32,0.39,9.9,0.05,46.0,167.0,0.9961,3.28,0.55,11.4,6.0
max,14.2,1.1,1.66,65.8,0.346,289.0,440.0,1.03898,3.82,1.08,14.2,9.0


In [8]:
# Separación de la variable objetivo y las explicativas
target = 'quality'
features = list(wine.columns)
features.remove('quality')

x = wine[features]
y = wine[target]

# Obtencion del conjunto de datos para validación
x_train, x_test, y_train, y_test = train_test_split(x, y)

Probamos la regresión backward stepwise que hemos creado

In [56]:
backward_regression(x,y)

Paso 1 Eliminamos variable residual sugar con RMS 0.770455861154
Paso 2 Eliminamos variable density con RMS 0.779193141125
Paso 3 Eliminamos variable chlorides con RMS 0.783659181401
Paso 4 Eliminamos variable total sulfur dioxide con RMS 0.784867265279
Paso 5 Eliminamos variable fixed acidity con RMS 0.784934413738
Paso 6 Eliminamos variable free sulfur dioxide con RMS 0.784938445501
Paso 7 Eliminamos variable citric acid con RMS 0.793297062448
Paso 8 Eliminamos variable volatile acidity con RMS 0.793131963239
Paso 9 Eliminamos variable pH con RMS 0.812245177509
Paso 10 Eliminamos variable sulphates con RMS 0.812330591092


([3, 7, 4, 6, 0, 5, 2, 1, 8, 9],
 [10],
 [0.77045586115442499,
  0.77919314112464799,
  0.78365918140119184,
  0.78486726527931461,
  0.78493441373754336,
  0.78493844550051428,
  0.7932970624483473,
  0.79313196323929591,
  0.81224517750863878,
  0.81233059109196593])