# Gradient and discrete variables

Gradient-based optimization methods rely on a differentiable error function, which should preferably be applied to real-valued random variables. This notebook explores a few ideas around that constraint.

In [None]:
from jyquickhelper import add_notebook_menu
add_notebook_menu()

## A small, simple problem

We use the *iris* dataset available in [scikit-learn](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html).

In [None]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target

We fit a logistic regression. We do not separate training and test data here because that is not the point of this notebook.

In [None]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X, Y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

Then we compute the confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix
pred = clf.predict(X)
confusion_matrix(Y, pred)

array([[49,  1,  0],
       [ 2, 21, 27],
       [ 1,  4, 45]])
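
To read the matrix above as a single number, the short sketch below copies its values by hand and derives the misclassification rate. Rows are true classes and columns are predicted classes, so the 27 in the second row counts class-1 observations predicted as class 2. The variable names are only illustrative.

```python
import numpy

# Confusion matrix copied from the output above
# (rows: true classes, columns: predicted classes).
cm = numpy.array([[49, 1, 0],
                  [2, 21, 27],
                  [1, 4, 45]])

correct = int(numpy.trace(cm))      # diagonal = correctly classified rows
total = int(cm.sum())               # 150 observations
error_rate = (total - correct) / total

print(correct, total, round(error_rate, 4))   # 115 150 0.2333
```

Most of the 35 errors come from the confusion between classes 1 and 2.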

## Multiplying the observations

The parameter ``multi_class='ovr'`` means that the model actually hides the estimation of 3 binary logistic regressions. Let's try to fit a single one by adding the label ``Y`` to the features. Let $(X_i \in \mathbb{R}^d, Y_i \in \mathbb{N})$ be a pair corresponding to one observation of a multi-class problem. Since there are $C$ classes, we replicate this row $C$ times, once per candidate class $c$, to obtain:

$$\forall c \in \{1, \dots, C\}, \; \left\{ \begin{array}{l} X_i' = (X_{i,1}, \dots, X_{i,d}, Y_{i,1}, \dots, Y_{i,C}) \\ Y_i' = \mathbb{1}_{Y_i = c} \\ Y_{i,k} = \mathbb{1}_{c = k} \end{array} \right.$$

Let's see what this gives on an example:

In [None]:
import numpy
import pandas

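def multiplie(X, Y, classes=None):
    # Replicates every row once per candidate class, appends the one-hot
    # encoding of the candidate (labels are assumed to be 0, 1, 2) and
    # returns the binary target "is the candidate the true class?".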
    if classes is None:
        classes = numpy.unique(Y)
    XS = []
    YS = []
    for i in classes:
        X2 = numpy.zeros((X.shape[0], 3))
        X2[:,i] = 1
        Yb = Y == i
        XS.append(numpy.hstack([X, X2]))
        Yb = Yb.reshape((len(Yb), 1))
        YS.append(Yb)

    Xext = numpy.vstack(XS)
    Yext = numpy.vstack(YS)
    return Xext, Yext

x, y = multiplie(X[:1,:], Y[:1], [0, 1, 2])
df = pandas.DataFrame(numpy.hstack([x, y]))
df.columns = ["X1", "X2", "Y0", "Y1", "Y2", "Y'"]
df

| | X1 | X2 | Y0 | Y1 | Y2 | Y' |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.0 | 0.0 | 0.0 | 1.0 |
| 1 | 5.1 | 3.5 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2 | 5.1 | 3.5 | 0.0 | 0.0 | 1.0 | 0.0 |


Three columns were added on the $X$ side and the row was replicated 3 times. The last column is $Y'$, which equals 1 only when the 1 in the added columns sits at the position of the true class. The classification problem, which was to predict the correct class, becomes: is the class to predict $k$? Applying this to every row of the dataset gives:

In [None]:
Xext, Yext = multiplie(X, Y)
df = pandas.DataFrame(numpy.hstack([Xext, Yext]))
df.columns = ["X1", "X2", "Y0", "Y1", "Y2", "Y'"]
df.iloc[numpy.random.permutation(df.index), :].head(n=10)

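| | X1 | X2 | Y0 | Y1 | Y2 | Y' |
|---|---|---|---|---|---|---|
| 239 | 5.5 | 2.5 | 0.0 | 1.0 | 0.0 | 1.0 |
| 402 | 7.1 | 3.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 40 | 5.0 | 3.5 | 1.0 | 0.0 | 0.0 | 1.0 |
| 56 | 6.3 | 3.3 | 1.0 | 0.0 | 0.0 | 0.0 |
| 408 | 6.7 | 2.5 | 0.0 | 0.0 | 1.0 | 1.0 |
| 145 | 6.7 | 3.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 184 | 4.9 | 3.1 | 0.0 | 1.0 | 0.0 | 0.0 |
| 71 | 6.1 | 2.8 | 1.0 | 0.0 | 0.0 | 0.0 |
| 105 | 7.6 | 3.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 427 | 6.1 | 3.0 | 0.0 | 0.0 | 1.0 | 1.0 |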


In [None]:
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf.fit(Xext, Yext.ravel())

GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=100, presort='auto', random_state=None,
              subsample=1.0, verbose=0, warm_start=False)

In [None]:
pred = clf.predict(Xext)
confusion_matrix(Yext, pred)

array([[278,  22],
       [ 24, 126]])
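
The binary model answers "is candidate class $k$ the right one?", so a multi-class prediction still has to be rebuilt from it. One possible way, sketched below and not part of the original notebook, is to score every candidate class for each observation and keep the most probable one. The helper `extend` is hypothetical and only mirrors the layout produced by `multiplie`; `random_state=0` is an arbitrary choice made for reproducibility.

```python
import numpy
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target
classes = numpy.unique(Y)
C = len(classes)

def extend(X, c, C):
    # Hypothetical helper: append the one-hot encoding of candidate class c.
    onehot = numpy.zeros((X.shape[0], C))
    onehot[:, c] = 1
    return numpy.hstack([X, onehot])

# Same layout as multiplie: one block of rows per candidate class.
Xext = numpy.vstack([extend(X, c, C) for c in classes])
Yext = numpy.concatenate([(Y == c).astype(int) for c in classes])

clf = GradientBoostingClassifier(random_state=0)
clf.fit(Xext, Yext)

# Column c holds the probability that candidate c is the true class.
scores = numpy.column_stack(
    [clf.predict_proba(extend(X, c, C))[:, 1] for c in classes])
pred_multi = classes[scores.argmax(axis=1)]

print("multi-class accuracy on the training data:", (pred_multi == Y).mean())
```

The accuracy is measured on the data used for training, as in the rest of this section, so it says nothing yet about generalization.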

## Introducing noise

One issue with this method is that it adds binary variables to a problem solved with gradient-based optimization. That is not ideal. No matter, let's change things a little.

In [None]:
def multiplie_bruit(X, Y, classes=None):
    if classes is None:
        classes = numpy.unique(Y)
    XS = []
    YS = []
    for i in classes:
        # X2 = numpy.random.randn((X.shape[0]* 3)).reshape(X.shape[0], 3) * 0.1
        X2 = numpy.random.random((X.shape[0], 3)) * 0.2
        X2[:,i] += 1
        Yb = Y == i
        XS.append(numpy.hstack([X, X2]))
        Yb = Yb.reshape((len(Yb), 1))
        YS.append(Yb)

    Xext = numpy.vstack(XS)
    Yext = numpy.vstack(YS)
    return Xext, Yext

x, y = multiplie_bruit(X[:1,:], Y[:1], [0, 1, 2])
df = pandas.DataFrame(numpy.hstack([x, y]))
df.columns = ["X1", "X2", "Y0", "Y1", "Y2", "Y'"]
df

| | X1 | X2 | Y0 | Y1 | Y2 | Y' |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.124579 | 0.190073 | 0.17895 | 1.0 |
| 1 | 5.1 | 3.5 | 0.185584 | 1.152029 | 0.016291 | 0.0 |
| 2 | 5.1 | 3.5 | 0.066786 | 0.026165 | 1.011093 | 0.0 |


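The problem is the same as before, except that the variables $Y_{i,k}$ are now real-valued. Instead of being exactly zero, they take a value $Y_{i,k} < 0.2$ (the code draws uniform noise in $[0, 0.2)$), and the column of the candidate class takes a value in $[1, 1.2)$.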
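
The noise drawn in `multiplie_bruit` is `numpy.random.random(...) * 0.2`. The small check below looks at its range and its mean on a large sample; the seed and the sample size are arbitrary choices made for this illustration.

```python
import numpy

rng = numpy.random.RandomState(0)                 # fixed seed, for reproducibility only
noise = rng.random_sample((100000, 3)) * 0.2      # same scaling as in multiplie_bruit

print(noise.min() >= 0, noise.max() < 0.2)        # True True
print(noise.mean())                               # close to 0.1, not 0
```

The noise is therefore not centered: on average it shifts every added column upward by about 0.1.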

In [None]:
Xextb, Yextb = multiplie_bruit(X, Y)
df = pandas.DataFrame(numpy.hstack([Xextb, Yextb]))
df.columns = ["X1", "X2", "Y0", "Y1", "Y2", "Y'"]
df.iloc[numpy.random.permutation(df.index), :].head(n=10)

| | X1 | X2 | Y0 | Y1 | Y2 | Y' |
|---|---|---|---|---|---|---|
| 306 | 4.6 | 3.4 | 0.159336 | 0.171828 | 1.132019 | 0.0 |
| 370 | 5.9 | 3.2 | 0.05353 | 0.158136 | 1.173998 | 0.0 |
| 75 | 6.6 | 3.0 | 1.032254 | 0.014031 | 0.107197 | 0.0 |
| 349 | 5.0 | 3.3 | 0.143188 | 0.052256 | 1.047003 | 0.0 |
| 147 | 6.5 | 3.0 | 1.043403 | 0.125372 | 0.123491 | 0.0 |
| 366 | 5.6 | 3.0 | 0.016536 | 0.108395 | 1.04663 | 0.0 |
| 187 | 4.9 | 3.1 | 0.138196 | 1.048607 | 0.035669 | 0.0 |
| 397 | 6.2 | 2.9 | 0.184663 | 0.083806 | 1.131662 | 0.0 |
| 2 | 4.7 | 3.2 | 1.167549 | 0.116552 | 0.103541 | 1.0 |
| 393 | 5.0 | 2.3 | 0.070453 | 0.033981 | 1.062573 | 0.0 |


In [None]:
from sklearn.ensemble import GradientBoostingClassifier
clfb = GradientBoostingClassifier()
clfb.fit(Xextb, Yextb.ravel())

GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=100, presort='auto', random_state=None,
              subsample=1.0, verbose=0, warm_start=False)

In [None]:
predb = clfb.predict(Xextb)
confusion_matrix(Yextb, predb)

array([[298,   2],
       [ 16, 134]])

This is slightly better.

## Comparing several models

We now compare, for several models, the gain obtained by introducing noise.

In [None]:
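def error(model, x, y):
    # Misclassification rate of a binary classifier: off-diagonal cells of
    # the 2x2 confusion matrix divided by the number of rows.
    p = model.predict(x)
    cm = confusion_matrix(y, p)
    return (cm[1,0] + cm[0,1]) / cm.sum()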

def comparaison(model, X, Y):

    if isinstance(model, tuple):
        clf = model[0](**model[1])
        clfb = model[0](**model[1])
        model = model[0]
    else:        
        clf = model()
        clfb = model()
    
    Xext, Yext = multiplie(X, Y)
    clf.fit(Xext, Yext.ravel())
    err = error(clf, Xext, Yext)
    
    Xextb, Yextb = multiplie_bruit(X, Y)
    clfb.fit(Xextb, Yextb.ravel())
    errb = error(clfb, Xextb, Yextb)
    return dict(model=model.__name__, err1=err, err2=errb)

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
from sklearn.svm import SVC, LinearSVC

models = [LinearSVC, LogisticRegression, GradientBoostingClassifier,
          RandomForestClassifier, DecisionTreeClassifier, ExtraTreeClassifier, ExtraTreesClassifier,
          (MLPClassifier, dict(activation="logistic")),
          GaussianNB, KNeighborsClassifier,  SVC,
          (AdaBoostClassifier, dict(base_estimator=LogisticRegression(), algorithm="SAMME"))]

res = [comparaison(model, X, Y) for model in models]
df = pandas.DataFrame(res)
df.sort_values("model")

| | err1 | err2 | model |
|---|---|---|---|
| 11 | 0.333333 | 0.333333 | AdaBoostClassifier |
| 4 | 0.048889 | 0.0 | DecisionTreeClassifier |
| 5 | 0.048889 | 0.0 | ExtraTreeClassifier |
| 6 | 0.048889 | 0.0 | ExtraTreesClassifier |
| 8 | 0.333333 | 0.333333 | GaussianNB |
| 2 | 0.102222 | 0.048889 | GradientBoostingClassifier |
| 9 | 0.104444 | 0.106667 | KNeighborsClassifier |
| 0 | 0.333333 | 0.333333 | LinearSVC |
| 1 | 0.333333 | 0.333333 | LogisticRegression |
| 7 | 0.333333 | 0.333333 | MLPClassifier |


Adding noise does not seem to degrade performance and improves it in some cases. It is a lead worth following. It remains to be seen whether the models are simply learning the noise.
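
One plausible reason why tree-based models reach a zero error in column *err2* is that continuous noise makes every row unique, so a tree can isolate each of them. The sketch below illustrates this on toy data (not iris) where two identical rows carry different labels; the helper `extend` is a hypothetical, compact stand-in for `multiplie` and `multiplie_bruit`.

```python
import numpy

rng = numpy.random.RandomState(0)

# Toy data (not iris): the first two rows are identical but have different labels.
X = numpy.array([[5.1, 3.5], [5.1, 3.5], [6.0, 3.0]])
Y = numpy.array([0, 1, 2])
C = 3

def extend(X, Y, C, noise=0.0):
    # One block of rows per candidate class c, with optional uniform noise.
    blocks, labels = [], []
    for c in range(C):
        extra = rng.random_sample((X.shape[0], C)) * noise
        extra[:, c] += 1
        blocks.append(numpy.hstack([X, extra]))
        labels.append(Y == c)
    return numpy.vstack(blocks), numpy.concatenate(labels)

Xext, Yext = extend(X, Y, C)                 # binary columns
Xextb, Yextb = extend(X, Y, C, noise=0.2)    # noisy columns

n_unique = len(numpy.unique(Xext, axis=0))
n_unique_b = len(numpy.unique(Xextb, axis=0))
print(len(Xext), n_unique, n_unique_b)       # 9 6 9
```

Without noise, rows 0 and 1 of `Xext` are identical while their labels differ, so no classifier can get both right. With noise, all 9 rows are distinct and can be memorized, which does not mean the model generalizes better.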

## With a PCA

The number of components can be varied; I kept only one. The PCA is applied after adding the binary or noisy binary variables. The result is unequivocal: no model manages to learn without the added noise, since every error in column *errACP1* stays at 1/3 or above.

In [None]:
from sklearn.decomposition import PCA

def comparaison_ACP(model, X, Y):

    if isinstance(model, tuple):
        clf = model[0](**model[1])
        clfb = model[0](**model[1])
        model = model[0]
    else:        
        clf = model()
        clfb = model()
    
    axes = 1
    solver = "full"
    Xext, Yext = multiplie(X, Y)
    Xext = PCA(n_components=axes, svd_solver=solver).fit_transform(Xext)
    clf.fit(Xext, Yext.ravel())
    err = error(clf, Xext, Yext)
    
    Xextb, Yextb = multiplie_bruit(X, Y)
    Xextb = PCA(n_components=axes, svd_solver=solver).fit_transform(Xextb)
    clfb.fit(Xextb, Yextb.ravel())
    errb = error(clfb, Xextb, Yextb)
    return dict(modelACP=model.__name__, errACP1=err, errACP2=errb)

res = [comparaison_ACP(model, X, Y) for model in models]
dfb = pandas.DataFrame(res)
pandas.concat([ df.sort_values("model"), dfb.sort_values("modelACP")], axis=1)

| | err1 | err2 | model | errACP1 | errACP2 | modelACP |
|---|---|---|---|---|---|---|
| 11 | 0.333333 | 0.333333 | AdaBoostClassifier | 0.333333 | 0.333333 | AdaBoostClassifier |
| 4 | 0.048889 | 0.0 | DecisionTreeClassifier | 0.333333 | 0.0 | DecisionTreeClassifier |
| 5 | 0.048889 | 0.0 | ExtraTreeClassifier | 0.333333 | 0.0 | ExtraTreeClassifier |
| 6 | 0.048889 | 0.0 | ExtraTreesClassifier | 0.333333 | 0.0 | ExtraTreesClassifier |
| 8 | 0.333333 | 0.333333 | GaussianNB | 0.333333 | 0.333333 | GaussianNB |
| 2 | 0.102222 | 0.048889 | GradientBoostingClassifier | 0.333333 | 0.235556 | GradientBoostingClassifier |
| 9 | 0.104444 | 0.106667 | KNeighborsClassifier | 0.351111 | 0.326667 | KNeighborsClassifier |
| 0 | 0.333333 | 0.333333 | LinearSVC | 0.333333 | 0.333333 | LinearSVC |
| 1 | 0.333333 | 0.333333 | LogisticRegression | 0.333333 | 0.333333 | LogisticRegression |
| 7 | 0.333333 | 0.333333 | MLPClassifier | 0.333333 | 0.333333 | MLPClassifier |


## Training and test sets

This time, we look at the quality of the decision boundaries found by the models, by checking on a test set that training went well.

In [None]:
from sklearn.model_selection import train_test_split

def comparaison_train_test(models, X, Y, mbruit=multiplie_bruit, acp=None):

    axes = acp
    solver = "full"        
        
    ind = numpy.random.permutation(numpy.arange(X.shape[0]))
    X = X[ind,:]
    Y = Y[ind]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=2.0/3)
    
    res = []
    for model in models:
    
        if isinstance(model, tuple):
            clf = model[0](**model[1])
            clfb = model[0](**model[1])
            model = model[0]
        else:        
            clf = model()
            clfb = model()

        Xext_train, Yext_train = multiplie(X_train, Y_train)
        Xext_test, Yext_test = multiplie(X_test, Y_test)
        if acp:
            Xext_train_ = Xext_train
            Xext_test_ = Xext_test
            acp_model = PCA(n_components=axes, svd_solver=solver).fit(Xext_train)
            Xext_train = acp_model.transform(Xext_train)
            Xext_test = acp_model.transform(Xext_test)        
        clf.fit(Xext_train, Yext_train.ravel())

        err_train = error(clf, Xext_train, Yext_train)
        err_test = error(clf, Xext_test, Yext_test)

        Xextb_train, Yextb_train = mbruit(X_train, Y_train)
        Xextb_test, Yextb_test = mbruit(X_test, Y_test)
        if acp:
            acp_model = PCA(n_components=axes, svd_solver=solver).fit(Xextb_train)
            Xextb_train = acp_model.transform(Xextb_train)
            Xextb_test = acp_model.transform(Xextb_test)        
            Xext_train = acp_model.transform(Xext_train_)
            Xext_test = acp_model.transform(Xext_test_)        
        clfb.fit(Xextb_train, Yextb_train.ravel())

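        # Errors of the model trained on noisy columns, measured on the noisy
        # columns and then on the noise-free ("clean") binary columns.
        errb_train = error(clfb, Xextb_train, Yextb_train)
        errb_train_clean = error(clfb, Xext_train, Yext_train)
        errb_test = error(clfb, Xextb_test, Yextb_test)
        errb_test_clean = error(clfb, Xext_test, Yext_test)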
        
        res.append(dict(modelTT=model.__name__, err_train=err_train, err2_train=errb_train,
               err_test=err_test, err2_test=errb_test, err2_test_clean=errb_test_clean,
               errb_train_clean=errb_train_clean))
        
    dfb = pandas.DataFrame(res)
    dfb = dfb[["modelTT", "err_train", "err2_train", "errb_train_clean", "err_test", "err2_test", "err2_test_clean"]]
    dfb = dfb.sort_values("modelTT")        
    return dfb

dfb = comparaison_train_test(models, X, Y)
dfb

| | modelTT | err_train | err2_train | errb_train_clean | err_test | err2_test | err2_test_clean |
|---|---|---|---|---|---|---|---|
| 11 | AdaBoostClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 4 | DecisionTreeClassifier | 0.04 | 0.0 | 0.333333 | 0.186667 | 0.34 | 0.333333 |
| 5 | ExtraTreeClassifier | 0.04 | 0.0 | 0.25 | 0.153333 | 0.253333 | 0.32 |
| 6 | ExtraTreesClassifier | 0.04 | 0.0 | 0.146667 | 0.173333 | 0.193333 | 0.213333 |
| 8 | GaussianNB | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 2 | GradientBoostingClassifier | 0.096667 | 0.01 | 0.226667 | 0.206667 | 0.253333 | 0.3 |
| 9 | KNeighborsClassifier | 0.113333 | 0.11 | 0.11 | 0.146667 | 0.133333 | 0.14 |
| 0 | LinearSVC | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 1 | LogisticRegression | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 7 | MLPClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |


The columns *errb_train_clean* and *err2_test_clean* are the errors of models trained on noisy columns and evaluated on noise-free columns, which is the real test. Performance is clearly degraded on the training set. One reason is that the added noise is not centered. Let's fix that.

In [None]:
def multiplie_bruit_centree(X, Y, classes=None):
    if classes is None:
        classes = numpy.unique(Y)
    XS = []
    YS = []
    for i in classes:
        # X2 = numpy.random.randn((X.shape[0]* 3)).reshape(X.shape[0], 3) * 0.1
        X2 = numpy.random.random((X.shape[0], 3)) * 0.2 - 0.1
        X2[:,i] += 1
        Yb = Y == i
        XS.append(numpy.hstack([X, X2]))
        Yb = Yb.reshape((len(Yb), 1))
        YS.append(Yb)

    Xext = numpy.vstack(XS)
    Yext = numpy.vstack(YS)
    return Xext, Yext

dfb = comparaison_train_test(models, X, Y, mbruit=multiplie_bruit_centree, acp=None)
dfb

| | modelTT | err_train | err2_train | errb_train_clean | err_test | err2_test | err2_test_clean |
|---|---|---|---|---|---|---|---|
| 11 | AdaBoostClassifier | 0.246667 | 0.333333 | 0.333333 | 0.253333 | 0.333333 | 0.333333 |
| 4 | DecisionTreeClassifier | 0.013333 | 0.0 | 0.116667 | 0.213333 | 0.213333 | 0.24 |
| 5 | ExtraTreeClassifier | 0.013333 | 0.0 | 0.19 | 0.233333 | 0.293333 | 0.246667 |
| 6 | ExtraTreesClassifier | 0.013333 | 0.0 | 0.103333 | 0.226667 | 0.186667 | 0.253333 |
| 8 | GaussianNB | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 2 | GradientBoostingClassifier | 0.083333 | 0.013333 | 0.146667 | 0.186667 | 0.253333 | 0.273333 |
| 9 | KNeighborsClassifier | 0.103333 | 0.086667 | 0.09 | 0.226667 | 0.226667 | 0.233333 |
| 0 | LinearSVC | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 1 | LogisticRegression | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 7 | MLPClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
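
Many cells in these tables show an error of exactly 0.333333. The sketch below, on toy labels rather than iris, checks that this is the error of a classifier that always answers 0 on the extended problem: each original observation produces one positive row and $C - 1$ negative rows, hence an error of $1/C$.

```python
import numpy

# Toy multi-class labels (not iris); any target with 3 classes would do.
Y = numpy.array([0, 0, 1, 2, 2, 2, 1, 0])
classes = numpy.unique(Y)

# Labels of the extended problem: one block per candidate class.
Yext = numpy.concatenate([(Y == c).astype(int) for c in classes])

constant_pred = numpy.zeros_like(Yext)            # always answer "no"
error_rate = (constant_pred != Yext).mean()
print(error_rate)                                 # 1/3 with three classes
```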


This is better, but we conclude that in most cases the gain obtained with the added noise comes from the models memorizing the training data. On the test set, performance is not better. An error of 33% means that the classifier's answer is constant. Let's now replicate the examples.

In [None]:
def multiplie_bruit_centree_duplique(X, Y, classes=None):
    if classes is None:
        classes = numpy.unique(Y)
    XS = []
    YS = []
    for i in classes:
        
        for k in range(0,5):
            #X2 = numpy.random.randn((X.shape[0]* 3)).reshape(X.shape[0], 3) * 0.3
            X2 = numpy.random.random((X.shape[0], 3)) * 0.8 - 0.4
            X2[:,i] += 1
            Yb = Y == i
            XS.append(numpy.hstack([X, X2]))
            Yb = Yb.reshape((len(Yb), 1))
            YS.append(Yb)
                
    Xext = numpy.vstack(XS)
    Yext = numpy.vstack(YS)
    return Xext, Yext

dfb = comparaison_train_test(models, X, Y, mbruit=multiplie_bruit_centree_duplique, acp=None)
dfb

| | modelTT | err_train | err2_train | errb_train_clean | err_test | err2_test | err2_test_clean |
|---|---|---|---|---|---|---|---|
| 11 | AdaBoostClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 4 | DecisionTreeClassifier | 0.026667 | 0.0 | 0.113333 | 0.226667 | 0.193333 | 0.2 |
| 5 | ExtraTreeClassifier | 0.026667 | 0.0 | 0.146667 | 0.22 | 0.226667 | 0.166667 |
| 6 | ExtraTreesClassifier | 0.026667 | 0.0 | 0.116667 | 0.2 | 0.165333 | 0.153333 |
| 8 | GaussianNB | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 2 | GradientBoostingClassifier | 0.113333 | 0.106667 | 0.173333 | 0.153333 | 0.205333 | 0.193333 |
| 9 | KNeighborsClassifier | 0.113333 | 0.092667 | 0.123333 | 0.1 | 0.14 | 0.133333 |
| 0 | LinearSVC | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 1 | LogisticRegression | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 7 | MLPClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
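
The noisy columns of this table are computed on larger samples than before, because each candidate class is now drawn 5 times. The arithmetic below is a quick sanity check, assuming a split of 100 training rows and 50 test rows (an assumption, though it is consistent with the error values reported above).

```python
n_train, n_test = 100, 50          # assumed split of the 150 iris rows
n_classes, n_copies = 3, 5         # 3 candidate classes, 5 noisy draws each

rows_test = n_test * n_classes                        # 150 rows with multiplie
rows_test_dup = rows_test * n_copies                  # 750 rows with duplication
rows_train_dup = n_train * n_classes * n_copies       # 1500 rows with duplication

# Reported errors are whole numbers of mistakes divided by these sizes.
print(round(0.165333 * rows_test_dup, 2))    # ExtraTreesClassifier, err2_test: 124.0
print(round(0.092667 * rows_train_dup, 2))   # KNeighborsClassifier, err2_train: 139.0
```

The columns *errb_train_clean* and *err2_test_clean* are still measured on the non-duplicated binary rows (300 and 150 rows under the same assumption).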


This works a little better, but adding randomness does not yield significant gains, except for the [SVC](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) model.

In [None]:
def multiplie_bruit_centree_duplique_rebalance(X, Y, classes=None):
    if classes is None:
        classes = numpy.unique(Y)
    XS = []
    YS = []
    for i in classes:
        
        X2 = numpy.random.random((X.shape[0], 3)) * 0.8 - 0.4
        X2[:,i] += 1  # * ((i % 2) * 2 - 1)
        Yb = Y == i
        XS.append(numpy.hstack([X, X2]))
        Yb = Yb.reshape((len(Yb), 1))
        YS.append(Yb)
                                  
                
    Xext = numpy.vstack(XS)
    Yext = numpy.vstack(YS)
    return Xext, Yext

dfb = comparaison_train_test(models, X, Y, mbruit=multiplie_bruit_centree_duplique_rebalance)
dfb

| | modelTT | err_train | err2_train | errb_train_clean | err_test | err2_test | err2_test_clean |
|---|---|---|---|---|---|---|---|
| 11 | AdaBoostClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 4 | DecisionTreeClassifier | 0.033333 | 0.0 | 0.153333 | 0.233333 | 0.24 | 0.286667 |
| 5 | ExtraTreeClassifier | 0.033333 | 0.0 | 0.353333 | 0.22 | 0.313333 | 0.36 |
| 6 | ExtraTreesClassifier | 0.033333 | 0.0 | 0.13 | 0.206667 | 0.2 | 0.2 |
| 8 | GaussianNB | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 2 | GradientBoostingClassifier | 0.086667 | 0.016667 | 0.21 | 0.126667 | 0.293333 | 0.246667 |
| 9 | KNeighborsClassifier | 0.103333 | 0.116667 | 0.116667 | 0.24 | 0.173333 | 0.186667 |
| 0 | LinearSVC | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 1 | LogisticRegression | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |
| 7 | MLPClassifier | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 | 0.333333 |


## A short explanation

Throughout the notebook, the logistic regression performs poorly: its error stays at 1/3. It fails to learn simply because the chosen problem is not linearly separable. If it were, the following problem would be linearly separable too.

In [None]:
M = numpy.zeros((9, 6))
Y = numpy.zeros((9, 1))
for i in range(0, 9):
    M[i, i//3] = 1
    M[i, i%3+3] = 1
    Y[i] = 1 if i//3 == i%3 else 0
M,Y

(array([[ 1.,  0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  1.,  0.],
        [ 1.,  0.,  0.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  1.,  0.,  0.,  1.,  0.],
        [ 0.,  1.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  1.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.]]), array([[ 1.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 1.],
        [ 0.],
        [ 0.],
        [ 0.],
        [ 1.]]))

In [None]:
clf = LogisticRegression()
clf.fit(M, Y.ravel())

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [None]:
clf.predict(M)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
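
The constant prediction above can be explained by a short argument. A linear model scores the pair $(a, b)$ as $u_a + v_b + \beta$. Summing the scores of the three positive pairs $(0,0), (1,1), (2,2)$ gives $\sum_a u_a + \sum_b v_b + 3\beta$, and summing the scores of the three negative pairs $(0,1), (1,2), (2,0)$ gives exactly the same quantity, so the first three scores cannot all be positive while the last three are all negative. The sketch below checks this identity numerically for a few random linear models; the random weights and the seed are only illustrative.

```python
import numpy

# Same 9 x 6 matrix as above: row i encodes the pair (a, b) = (i // 3, i % 3).
M = numpy.zeros((9, 6))
for i in range(9):
    M[i, i // 3] = 1
    M[i, i % 3 + 3] = 1

positives = [0, 4, 8]   # pairs (0,0), (1,1), (2,2): label 1
negatives = [1, 5, 6]   # pairs (0,1), (1,2), (2,0): label 0

rng = numpy.random.RandomState(0)
checks = []
for _ in range(5):
    w, b = rng.randn(6), rng.randn()          # arbitrary linear model
    s_pos = (M[positives] @ w + b).sum()
    s_neg = (M[negatives] @ w + b).sum()
    # Both sums equal w.sum() + 3 * b whatever the weights are.
    checks.append(bool(numpy.isclose(s_pos, s_neg)))

print(all(checks))
```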

To be revisited.