# digits with discriminative classifiers

As with iris, sklearn makes it easy to train and evaluate discriminative classifiers on digits.

In [6]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import make_pipeline

Reading the digits corpus:

In [7]:
digits = load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=23)

Let's look at Perceptron with L1, L2 and elastic-net penalties as a function of alpha and l1_ratio. A result from the search below:

The accuracy of Perceptron(alpha=0.00013621602035512732, l1_ratio=0.9, penalty='elasticnet') is 96.3%

In [None]:
for penalty in ['l1', 'l2', 'elasticnet']:
    for alpha in np.logspace(-3, -4, 150):
        # l1_ratio is only used when penalty='elasticnet'
        for l1_ratio in [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
            clf = Perceptron(penalty=penalty, alpha=alpha, l1_ratio=l1_ratio, random_state=0).fit(X_train, y_train)
            acc = accuracy_score(y_test, clf.predict(X_test))
            print('The accuracy of {0!s} is {1:.1%}'.format(clf, acc))
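
The loop above picks hyperparameters by looking at test accuracy, which tends to give an optimistic estimate. A minimal sketch of an alternative, assuming cross-validation on the training split is acceptable here: `GridSearchCV` selects the settings on the training data only, and the test set is scored once at the end. The grid below is a deliberately smaller, hypothetical one (not the 150-point grid used above) so that the sketch runs quickly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=23)

# Hypothetical reduced grid; l1_ratio is only varied for 'elasticnet',
# since it is ignored by the other penalties.
param_grid = [
    {'penalty': ['l1', 'l2'], 'alpha': np.logspace(-4, -3, 5)},
    {'penalty': ['elasticnet'], 'alpha': np.logspace(-4, -3, 5),
     'l1_ratio': [0.2, 0.5, 0.9]},
]

# 5-fold cross-validation on the training split only
search = GridSearchCV(Perceptron(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

# The test set is used once, for the final estimate
test_acc = search.score(X_test, y_test)
print('Best parameters: {0}'.format(search.best_params_))
print('Cross-validation accuracy: {0:.1%}'.format(search.best_score_))
print('Test accuracy: {0:.1%}'.format(test_acc))
```

The accuracies obtained this way need not match the figure quoted above, because the selection criterion and the grid are different.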

Let's look at logistic regression with several solvers (and max_iter=10000).

In [8]:
for solver in ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga']:
    clf = LogisticRegression(solver=solver, max_iter=10000).fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print('The accuracy of {0!s} is {1:.1%}'.format(clf, acc))

The accuracy of LogisticRegression(max_iter=10000) is 95.9%
The accuracy of LogisticRegression(max_iter=10000, solver='liblinear') is 94.1%
The accuracy of LogisticRegression(max_iter=10000, solver='newton-cg') is 95.9%
The accuracy of LogisticRegression(max_iter=10000, solver='sag') is 96.5%
The accuracy of LogisticRegression(max_iter=10000, solver='saga') is 96.3%


**Exercise:** Besides polynomial features, feature preprocessing can include standardization (StandardScaler) and dimensionality reduction (PCA). Try to improve the accuracy of logistic regression using pipelines with **feature engineering**.

In [9]:
def lr_exp(standardize=True, degree=1, n_components=0):
  # Build the pipeline from only the steps that are enabled
  steps = []
  if standardize:
    steps.append(StandardScaler())
  steps.append(PolynomialFeatures(degree=degree))
  if n_components > 0:
    steps.append(PCA(n_components=n_components))
  steps.append(LogisticRegression(solver='sag', max_iter=10000, n_jobs=8))
  clf = make_pipeline(*steps).fit(X_train, y_train)
  return accuracy_score(y_test, clf.predict(X_test))

# n_components=0 disables the PCA step
for n_components in [8, 16, 32, 64, 128, 256, 512, 1024, 0]:
  for standardize in [False, True]:
    acc = lr_exp(standardize=standardize, degree=2, n_components=n_components)
    print('standardize {0:} components {1:}: {2:.1%} acc'.format(standardize, n_components, acc))

standardize False components 8: 86.9% acc
standardize True components 8: 28.0% acc
standardize False components 16: 94.1% acc
standardize True components 16: 67.2% acc
standardize False components 32: 95.6% acc
standardize True components 32: 89.4% acc
standardize False components 64: 98.0% acc
standardize True components 64: 94.6% acc
standardize False components 128: 97.8% acc
standardize True components 128: 96.1% acc
standardize False components 256: 98.0% acc
standardize True components 256: 96.9% acc
standardize False components 512: 98.1% acc
standardize True components 512: 96.9% acc
standardize False components 1024: 98.3% acc
standardize True components 1024: 96.9% acc
standardize False components 0: 98.1% acc
standardize True components 0: 97.0% acc
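
To see why PCA can keep as many as 1024 components in this experiment, it may help to count the features that PolynomialFeatures(degree=2) produces from the 64 pixel inputs. A minimal sketch, assuming the default PolynomialFeatures settings (bias column and interaction terms included); `n_poly_features` is a hypothetical helper written for this illustration, not part of sklearn:

```python
from math import comb

def n_poly_features(n_inputs, degree):
    # Number of monomials of total degree <= degree in n_inputs variables,
    # counting the constant (bias) term.
    return comb(n_inputs + degree, degree)

n_features = n_poly_features(64, 2)
print(n_features)  # 2145
```

With a 70% training split of the 1797 digits samples (1257 rows), PCA accepts at most min(1257, 2145) = 1257 components, so 1024 is close to the upper limit of what this setup allows.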
