Dimensionality reduction using SelectFromModel() --- 3:33 min
===

* 3:33 min | Last modified: October 11, 2021 | [YouTube](https://youtu.be/QnU0A914CW8)

Linear models
---

Linear models penalized with an L1 norm tend to drive many of the feature coefficients to exactly zero, so they can be used for dimensionality reduction (feature selection): features whose coefficients are zero are discarded. The following model types are recommended:

* Lasso()

* LogisticRegression()

* LinearSVC()
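The sparsity effect described above can be checked directly on the fitted coefficients. The following is a minimal sketch, assuming a synthetic regression problem built with `make_regression` (10 features, 3 of them informative) and an arbitrary `alpha=1.0`; neither the dataset nor these values come from the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration only):
# 10 features, of which only 3 actually influence the target.
X_demo, y_demo = make_regression(
    n_samples=100,
    n_features=10,
    n_informative=3,
    random_state=0,
)

# Lasso is a linear regression with an L1 penalty; alpha controls its strength.
lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)

# Count the coefficients that the penalty set exactly to zero.
n_zero = int(np.sum(lasso.coef_ == 0))
print(lasso.coef_.round(2))
print("coefficients equal to zero:", n_zero)
```

Features with a zero coefficient contribute nothing to the prediction, which is why they are natural candidates for removal.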

In [1]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X.shape

(150, 4)

In [2]:
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

#
# Create and fit an estimator
#
linearSVC = LinearSVC(
    C=0.01,
    penalty="l1",
    dual=False,
)

linearSVC.fit(X, y)

#
# Selector
#
model = SelectFromModel(
    # -------------------------------------------------------------------------
    # The base estimator from which the transformer is built. This can be both
    # a fitted (if prefit is set to True) or a non-fitted estimator.
    estimator=linearSVC,
    # -------------------------------------------------------------------------
    # The threshold value to use for feature selection. Features whose
    # importance is greater or equal are kept while the others are discarded.
    threshold=None,
    # -------------------------------------------------------------------------
    # Whether a prefit model is expected to be passed into the constructor
    # directly or not.
    prefit=True,
    # -------------------------------------------------------------------------
    # Order of the norm used to filter the vectors of coefficients below
    # threshold in the case where the coef_ attribute of the estimator is of
    # dimension 2.
    norm_order=1,
    # -------------------------------------------------------------------------
    # The maximum number of features to select.
    max_features=None,
)

X_new = model.transform(X)
X_new.shape

(150, 3)
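The shape alone does not say which columns were kept. A possible way to find out is `get_support()`, which returns a boolean mask over the input features. The sketch below assumes the same LinearSVC settings as the cell above, but lets `SelectFromModel` fit the estimator itself (no `prefit`); the names `selector`, `mask` and `kept` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# With prefit left at its default (False), fit() trains the estimator internally.
selector = SelectFromModel(
    LinearSVC(C=0.01, penalty="l1", dual=False),
).fit(X_iris, y_iris)

# Boolean mask: True for every feature that passes the threshold.
mask = selector.get_support()

# Map the mask back to the feature names of the dataset.
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(kept)
print(selector.transform(X_iris).shape)
```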

Using trees
---

In [3]:
from sklearn.ensemble import ExtraTreesClassifier

treeClassifier = ExtraTreesClassifier(n_estimators=50)
treeClassifier.fit(X, y)
treeClassifier.feature_importances_

array([0.09897853, 0.05853639, 0.3586455 , 0.48383957])

In [4]:
from sklearn.feature_selection import SelectFromModel

model = SelectFromModel(
    estimator=treeClassifier,
    prefit=True,
)

X_new = model.transform(X)
X_new.shape

(150, 2)
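With `threshold=None` and an estimator that has no L1 penalty, such as the tree ensemble above, `SelectFromModel` uses the mean of the importances as the cutoff. The sketch below shows two other ways to control how many features survive: `threshold="median"`, and `max_features` combined with `threshold=-np.inf` so that only the count limit applies. The fixed `random_state=0` and the variable names are assumptions added for reproducibility, not part of the original cells.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X_iris, y_iris = load_iris(return_X_y=True)

# Keep the features whose importance is at least the median importance.
median_selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    threshold="median",
).fit(X_iris, y_iris)
X_median = median_selector.transform(X_iris)

# Keep only the single most important feature: the threshold is disabled
# with -inf, so max_features alone decides how many columns remain.
top1_selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    threshold=-np.inf,
    max_features=1,
).fit(X_iris, y_iris)
X_top1 = top1_selector.transform(X_iris)

print(X_median.shape, X_top1.shape)
```

The numeric cutoff that was actually applied is available after fitting as `median_selector.threshold_`.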