# Instituto Tecnológico y de Estudios Superiores de Occidente
## Master's in Data Science
## Convex Optimization
## HW 04: Constrained optimization and regularization
## Symbolic Regressor
### Professor: 
- Dr. Juan Diego Sanchez Torres
### Team: 
- María Elisa Vaca Gómez 
- Alejandra Paola Galindo Hernández
- Jesús Rodrigo Ponce González
- Aldo Emmanuel Villareal Palomino

# Example 2: Symbolic Transformer

This example demonstrates using the **SymbolicTransformer** to generate new non-linear features automatically.

We start by importing the tools needed to load the Boston housing dataset and shuffle it randomly:

In [11]:
from gplearn.genetic import SymbolicTransformer
from sklearn.utils import check_random_state
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
import numpy as np

We’ll use **Ridge Regression** for this example: we train the regressor on the first 300 samples and evaluate it on the remaining 206 unseen samples. The benchmark to beat is simply Ridge running on the dataset as-is. First, we load the data and shuffle it with a fixed random seed:

In [26]:
rng = check_random_state(0)
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

The Boston dataset has 506 observations and **13 variables**.

In [27]:
boston.data.shape

(506, 13)
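
The shuffle and the 300/206 split used throughout this example can be sketched on stand-in data. This is a minimal illustration, not the notebook's own cell: the arrays `X` and `y` are hypothetical placeholders, and we assume (as far as we know) that `check_random_state(0)` is equivalent to `np.random.RandomState(0)`.

```python
import numpy as np

n_samples, n_train = 506, 300  # sizes taken from the text

# Stand-in data (hypothetical): row i holds the value i, so rows are easy to track.
X = np.arange(n_samples).reshape(-1, 1)
y = np.arange(n_samples)

# A seeded generator makes the shuffle reproducible.
rng = np.random.RandomState(0)
perm = rng.permutation(n_samples)

# Index features and targets with the same permutation so rows stay aligned.
X, y = X[perm], y[perm]

# First 300 rows for training, the remaining 206 for testing.
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

print(X_train.shape, X_test.shape)
```

Applying one permutation to both arrays is what keeps each observation paired with its own target after shuffling.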

The baseline Ridge model obtains **$R^2 = 0.76$** on the held-out samples:

In [57]:
from sklearn.linear_model import Ridge

# Baseline: fit on the first 300 shuffled samples, score on the remaining 206.
est = Ridge()
est.fit(boston.data[:300, :], boston.target[:300])
R1 = est.score(boston.data[300:, :], boston.target[300:])  # R^2 on held-out data
print(R1)

0.7593194530498838
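
For regressors such as `Ridge`, the `score` method returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes it by hand on toy values; the function name `r2_score_manual` and the numbers are hypothetical and only illustrate the formula.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not taken from the Boston data).
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]    # predictions equal to the targets
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_true

print(r2_score_manual(y_true, perfect))    # 1.0
print(r2_score_manual(y_true, mean_only))  # 0.0
```

A score of 0.76 therefore means the model explains about 76% of the variance of the held-out targets relative to always predicting their mean.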


So now we’ll train our transformer on the same first 300 samples to generate some new features. Let’s use a large population of 2000 individuals over 20 generations. We’ll select the best 100 of these for the `hall_of_fame`, and then use the least-correlated 10 as our new features. A little parsimony should control bloat, but we’ll leave the rest of the evolution options at their defaults. The default `metric='pearson'` is appropriate here since we are using a linear model as the estimator. If we were going to use a tree-based estimator, the Spearman correlation might be interesting to try out too:
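
To see why the choice between Pearson and Spearman matters, the following sketch compares both on a linear and on a monotonic but non-linear relationship. It uses only NumPy; the helper names `pearson` and `spearman` are ours, and the rank-based Spearman shown here assumes there are no tied values.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation: strength of the linear relationship."""
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Spearman correlation: Pearson applied to ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)


x = np.linspace(0.1, 5.0, 50)
y_linear = 2.0 * x + 1.0  # linear in x
y_monotone = np.exp(x)    # monotonic but strongly non-linear in x

print(pearson(x, y_linear), spearman(x, y_linear))
print(pearson(x, y_monotone), spearman(x, y_monotone))
```

Both measures are essentially 1 for the linear case, while for the exponential case only Spearman stays at 1. A feature that a linear model can exploit directly is one with high Pearson correlation, whereas a tree-based model only needs the ordering that Spearman captures.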

In other words, the `SymbolicTransformer` evolves a population of 2,000 candidate programs (each one a candidate feature) over 20 generations, then keeps the best 100 and returns the 10 least-correlated of those as new features.
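
The step of picking the least-correlated features from the hall of fame can be sketched as a greedy pruning loop. This is our reading of the idea, not necessarily the library's exact implementation: the function `select_least_correlated` and the synthetic data are hypothetical, and we assume the candidate columns are ordered from most to least fit.

```python
import numpy as np


def select_least_correlated(features, n_components):
    """Greedy pruning sketch.

    `features` has one column per candidate, ordered from most to least fit.
    Repeatedly find the most correlated pair and drop its less fit member.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    np.fill_diagonal(corr, 0.0)  # ignore self-correlation
    keep = list(range(features.shape[1]))
    while len(keep) > n_components:
        sub = corr[np.ix_(keep, keep)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        # Columns are sorted by fitness, so the larger position is the less fit one.
        keep.pop(int(max(i, j)))
    return keep


rng = np.random.RandomState(0)
base = rng.normal(size=(200, 3))
features = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.01 * rng.normal(size=200),  # near-duplicate of column 0
    base[:, 1],
    base[:, 2],
])

print(select_least_correlated(features, n_components=3))  # [0, 2, 3]
```

The near-duplicate column is discarded first, which is the point of the step: ten features that all encode the same signal would add little to the linear model.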

The algorithm builds these features by combining the input variables with the following functions:

`function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log',
                'abs', 'neg', 'inv', 'max', 'min']`
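
Several of these functions (`div`, `sqrt`, `log`, `inv`) are undefined for some inputs, so genetic programming libraries usually replace them with "protected" versions that always return a finite number. The sketch below shows one common way to do this with NumPy. The threshold `EPS = 0.001` and the fallback values reflect our understanding of gplearn's behavior and should be treated as assumptions to check against its documentation.

```python
import numpy as np

EPS = 0.001  # assumed "too close to zero" threshold


def protected_div(x1, x2):
    """x1 / x2, returning 1 where the denominator is near zero."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(x2) > EPS, np.divide(x1, x2), 1.0)


def protected_sqrt(x):
    """Square root of the absolute value, so negatives are allowed."""
    return np.sqrt(np.abs(x))


def protected_log(x):
    """Log of the absolute value, returning 0 near zero."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(x) > EPS, np.log(np.abs(x)), 0.0)


def protected_inv(x):
    """1 / x, returning 0 near zero."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(x) > EPS, 1.0 / x, 0.0)


x = np.array([-4.0, 0.0, 4.0])
print(protected_div(np.ones(3), x))
print(protected_sqrt(x))
print(protected_log(x))
print(protected_inv(x))
```

Because every function is closed over the real numbers, any randomly generated expression tree can be evaluated on any row of the data without producing `nan` or `inf`.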

In [28]:
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log',
                'abs', 'neg', 'inv', 'max', 'min']
gp = SymbolicTransformer(generations=20, population_size=2000,
                         hall_of_fame=100, n_components=10,
                         function_set=function_set,
                         parsimony_coefficient=0.0005,
                         max_samples=0.9, verbose=1,
                         random_state=0)

gp.fit(boston.data[:300, :], boston.target[:300])
gp_features = gp.transform(boston.data)

    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
   0    11.04         0.339876        6         0.822502         0.675124     28.38s
   1     6.91         0.593562        7         0.836993         0.602468     27.51s
   2     5.07         0.730093        8          0.84063         0.704017     26.97s
   3     5.22         0.735525        5         0.847019         0.628351     27.51s
   4     6.24         0.734679       10         0.856612         0.565138     25.03s
   5     8.23         0.721433       18          0.85677         0.728095     27.27s
   6    10.20         0.717937       14         0.875233         0.619693     22.76s
   7    11.84         0.720667       14         0.875927         0.609363     20.67s
   8    12.56         0.733019       27         0.881705         0.390121  
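
The `Length` columns in the log are what the parsimony coefficient acts on. As we understand it, selection compares a penalized fitness in which longer programs pay a cost proportional to their length. The function `penalized_fitness` below is a hypothetical sketch of that idea, applied to the best individuals of generations 0 and 8 from the log.

```python
def penalized_fitness(raw_fitness, length, parsimony_coefficient,
                      greater_is_better=True):
    """Raw fitness minus a penalty proportional to program length."""
    sign = 1.0 if greater_is_better else -1.0
    return raw_fitness - parsimony_coefficient * length * sign


coef = 0.0005  # parsimony_coefficient used in this example

# Best individuals of generation 0 (length 6) and generation 8 (length 27).
gen0 = penalized_fitness(0.822502, 6, coef)
gen8 = penalized_fitness(0.881705, 27, coef)

print(gen0, gen8)
```

With this small coefficient the penalties are about 0.003 and 0.0135, so the longer program of generation 8 still wins (roughly 0.868 against 0.820). A larger coefficient would push the search toward shorter expressions and limit bloat more aggressively.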

As a result, we have a new array with 10 new features for each of the 506 observations.

In [34]:
gp_features.shape

(506, 10)

Next, we combine the original dataset (506 observations and 13 variables) with the features produced by the `SymbolicTransformer` (506 observations and 10 variables). We store the result in an array named `new_boston` (506 observations and 23 variables):

In [17]:
new_boston = np.hstack((boston.data, gp_features))
new_boston.shape

(506, 23)

We now train a **Ridge Regression** on this new dataset, using the first 300 observations for training and the remaining 206 for testing.

In [56]:
# Same split as the baseline, now with the 10 evolved features appended.
est = Ridge()
est.fit(new_boston[:300, :], boston.target[:300])
R2 = est.score(new_boston[300:, :], boston.target[300:])  # R^2 on held-out data
print(R2)

0.841837210518192


As a result, we obtain **$R^2 = 0.84$**. Adding the features created by the `SymbolicTransformer` therefore raises the score by about **0.08** over the Ridge model trained on the original data.

In [58]:
R2-R1

0.08251775746830825