# How to Scikit-Learn


# Dimension Reduction

https://scikit-learn.org/stable/modules/decomposition.html <br>

### [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

PCA finds the successive orthogonal components that explain the largest possible share of the variance of the centered data set.
Here is a very simple video on the topic: https://www.youtube.com/watch?v=FgakZw6K1QQ

You can set n_components to:
* an integer : the number of components to keep
* 'mle' : to let Minka's MLE algorithm choose the number of components for you https://vismod.media.mit.edu/tech-reports/TR-514.pdf
* a float between 0 and 1 : the fraction of the total variance that the kept components must explain
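A minimal sketch of the three ways to set n_components. It assumes the iris dataset from sklearn.datasets as example data; any numeric array X works. After fitting, the attribute n_components_ tells you how many components were actually kept.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # example dataset: 150 instances, 4 features

pca_int = PCA(n_components=2).fit(X)      # keep exactly 2 components
pca_var = PCA(n_components=0.95).fit(X)   # keep enough components to explain >= 95% of the variance
pca_mle = PCA(n_components='mle').fit(X)  # let Minka's MLE choose the number of components

for name, model in [("int", pca_int), ("0.95", pca_var), ("mle", pca_mle)]:
    # n_components_ is the number of components actually kept
    print(name, model.n_components_, model.explained_variance_ratio_.sum())
```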

Useful attributes
* components_ : array, shape (n_components, n_features) -- the principal axes: one component per row, with the contribution (loading) of each feature in the columns
* explained_variance_ / explained_variance_ratio_ : array, shape (n_components,) -- the variance explained by each component, in absolute value / as a fraction of the total variance
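A short sketch, again assuming iris as example data, that inspects these attributes and checks that the rows of components_ are orthonormal, as expected from orthogonal components.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=3).fit(X)

print(pca.components_.shape)          # (n_components, n_features) -> (3, 4)
print(pca.explained_variance_)        # variance explained by each component (decreasing)
print(pca.explained_variance_ratio_)  # the same, as a fraction of the total variance

# The components are orthonormal: each row has norm 1 and the rows are mutually orthogonal
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(3)))
```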

Some Methods
* fit(X) : fits the model with X
* fit_transform(X) : fits AND returns the transformed data
* transform(X) : returns the transformed data using the fitted model
* inverse_transform(X) : maps transformed data back to the original feature space (lossy if components were dropped)
* get_covariance() : computes the model covariance matrix $\mathrm{cov} \in \mathscr{M}_{n_{features}}$
$$\mathrm{cov} = C^T S^2 C + \sigma^2 I_{n_{features}}$$
where $C$ is components_, $S^2$ is the diagonal matrix of the explained variances (minus the noise variance) and $\sigma^2$ is the estimated noise variance (noise_variance_).
* get_precision() : computes the precision matrix (the inverse of the covariance)
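A sketch of these methods on iris (example data, 2 components kept). It also rebuilds get_covariance() by hand from components_, explained_variance_ and noise_variance_. In scikit-learn's implementation the diagonal term uses the explained variances minus the noise variance, and the code mirrors that.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)           # fit AND project: shape (150, 2)
X_back = pca.inverse_transform(X_pca)  # back to the original 4-D space (lossy)
print("reconstruction MSE:", np.mean((X - X_back) ** 2))

cov = pca.get_covariance()   # (n_features, n_features) covariance of the PCA model
prec = pca.get_precision()   # inverse of cov

# Rebuild cov by hand: C^T diag(explained_variance_ - noise_variance_) C + noise_variance_ * I
C = pca.components_
S2 = np.diag(np.maximum(pca.explained_variance_ - pca.noise_variance_, 0))
cov_manual = C.T @ S2 @ C + pca.noise_variance_ * np.eye(X.shape[1])

print(np.allclose(cov, cov_manual), np.allclose(cov @ prec, np.eye(X.shape[1])))
```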

If you only need a few components compared with the number of features, you can use
* svd_solver='randomized' : it computes an approximation of only the first n_components singular vectors, which is much faster than a full SVD on large datasets
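A sketch comparing the randomized and full solvers on synthetic data; the data generator, the sizes and the random_state values are arbitrary assumptions. When the data has a few dominant directions, the randomized approximation of the top components is very close to the exact one.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic data: 10 strong hidden directions plus a little noise, 2000 instances x 500 features
X = rng.normal(size=(2000, 10)) @ rng.normal(size=(10, 500)) + 0.1 * rng.normal(size=(2000, 500))

# The randomized solver only approximates the first n_components singular vectors
pca_rand = PCA(n_components=10, svd_solver='randomized', random_state=0).fit(X)
pca_full = PCA(n_components=10, svd_solver='full').fit(X)  # exact SVD, for comparison

print(pca_rand.explained_variance_)
print(pca_full.explained_variance_)
```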

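```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

## X is the dataset : rows are instances, columns are features ##
X = load_iris().data  # example dataset (150 x 4), replace with your own
n_components = 2      # number of components to keep

# Either fit first, then transform with the SAME fitted object...
pca = PCA(n_components).fit(X)
X_pca = pca.transform(X)

# ...or fit and transform in a single call
X_pca = PCA(n_components).fit_transform(X)

# This function plots an elbow curve of the cumulative variance explained by the components
# (n_components must not exceed min(n_samples, n_features))
def plot_elbow(X, n_components=10):
    pca = PCA(n_components).fit(X)
    plt.plot(np.arange(1, pca.n_components_ + 1), np.cumsum(pca.explained_variance_ratio_))
    plt.xlabel('number of components')
    plt.ylabel('cumulative explained variance')
    plt.title('Cumulative variance explained by the number of components')
    plt.show()

# A more general function for visualizing the data is available under Kernel PCA
```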

#### Incremental PCA

For datasets too large to fit in memory, you can process the data in chunks (mini-batches).
IncrementalPCA computes estimates of the components and the noise variance from one batch, then updates them with each following batch <br>
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.IncrementalPCA.html
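A minimal sketch of batch-wise fitting with partial_fit. The random matrix and the split into 10 batches are assumptions for illustration; with real out-of-core data each batch would be read from disk (for example from a numpy memmap). When the whole array is available, passing batch_size to the IncrementalPCA constructor and calling fit does the same splitting for you.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 20))  # stands in for data that would not fit in memory at once

ipca = IncrementalPCA(n_components=5)
for X_batch in np.array_split(X, 10):  # 10 batches of 100 instances
    ipca.partial_fit(X_batch)          # update the components and noise variance with this batch

X_reduced = ipca.transform(X)  # shape (1000, 5)
print(X_reduced.shape, ipca.n_samples_seen_)
```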

#### Kernel PCA

Documentation : https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html

- You can use a kernel to handle non-linear datasets (e.g. classes that are not linearly separable) : https://scikit-learn.org/stable/modules/metrics.html

    - 'linear' : $$ K(x,x') = x^T x' $$
    - 'poly' : $$ K(x,x') = ( {\color{green}{\gamma}} x^T x' + {\color{blue}{c_0}} )^{\color{red}{d}} $$
    - 'sigmoid' : $$ K(x,x') = \tanh( {\color{green}{\gamma}} x^T x' + {\color{blue}{c_0}} ) $$
    - 'rbf' (radial basis function) : $$ K(x,x') = \exp(- {\color{green}{\gamma}} \|x-x'\|^2) $$
    - 'cosine' : $$ K(x,x') = \frac{x^T x'}{\|x\| \|x'\|} $$
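The kernel formulas above can be checked against scikit-learn's pairwise kernel functions, which KernelPCA relies on. In this sketch the random data and the values of gamma, c_0 and d are arbitrary assumptions; each kernel matrix is computed both by hand and with sklearn.metrics.pairwise.

```python
import numpy as np
from sklearn.metrics.pairwise import (cosine_similarity, linear_kernel,
                                      polynomial_kernel, rbf_kernel, sigmoid_kernel)

rng = np.random.RandomState(0)
X = rng.normal(size=(5, 3))
gamma, c0, d = 0.5, 1.0, 3  # example hyperparameter values

dot = X @ X.T                                              # x^T x' for every pair
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # ||x - x'||^2 for every pair
norms = np.linalg.norm(X, axis=1)

manual = {
    "linear": dot,
    "poly": (gamma * dot + c0) ** d,
    "sigmoid": np.tanh(gamma * dot + c0),
    "rbf": np.exp(-gamma * sq_dist),
    "cosine": dot / np.outer(norms, norms),
}
library = {
    "linear": linear_kernel(X),
    "poly": polynomial_kernel(X, degree=d, gamma=gamma, coef0=c0),
    "sigmoid": sigmoid_kernel(X, gamma=gamma, coef0=c0),
    "rbf": rbf_kernel(X, gamma=gamma),
    "cosine": cosine_similarity(X),
}
for name in manual:
    print(name, np.allclose(manual[name], library[name]))

# gamma=None falls back to 1 / n_features (here 1/3)
print(np.allclose(rbf_kernel(X), np.exp(-sq_dist / X.shape[1])))
```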

You can tune the following hyperparameters :

$\color {green} \gamma $ <br>
`gamma  (default = 1/n_features) is used by poly / sigmoid / rbf`<br>
$\color {blue} {c_0} $ <br>
`coef0  (default = 1)            is used by poly / sigmoid` <br>
$\color {red} d $ <br>
`degree (default = 3)            is used by poly`<br>


More info on kernels : http://crsouza.com/2010/03/17/kernel-functions-for-machine-learning-applications/

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from sklearn.decomposition import KernelPCA

# This function plots the projection of the data on the 1, 2 or 3 main components
# using whichever kernel and parameters you give it, and returns the fitted KernelPCA

def plot_pca(X, y, kernel='linear', n_components=2, gamma=None, coef0=1, degree=3):
    # kernel and the kernel parameters are keyword-only in KernelPCA
    pca = KernelPCA(n_components=n_components, kernel=kernel,
                    gamma=gamma, degree=degree, coef0=coef0)
    X_pca = pca.fit_transform(X)
    print("original shape:   ", X.shape)
    print("transformed shape:", X_pca.shape)
    colors = np.asarray(y)  # works for numpy arrays and pandas Series
    if n_components == 1:
        sc = plt.scatter(X_pca[:, 0], np.zeros(len(X_pca)), alpha=0.2, c=colors)
        plt.xlabel('Component 1')
        plt.title("data projected on the main component \n using " + kernel + " kernel")
    elif n_components == 2:
        sc = plt.scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.2, c=colors)
        plt.xlabel('Component 1')
        plt.ylabel('Component 2')
        plt.title("data projected on the 2 main components \n using " + kernel + " kernel")
    elif n_components == 3:
        fig = plt.figure()
        ax = fig.add_subplot(111, projection='3d')
        sc = ax.scatter(X_pca[:, 0], X_pca[:, 1], X_pca[:, 2], alpha=0.2, c=colors)
        ax.set_xlabel('Component 1')
        ax.set_ylabel('Component 2')
        ax.set_zlabel('Component 3')
        ax.set_title("data projected on the 3 main components \n using " + kernel + " kernel")
    else:
        print("How am I supposed to show you that with your 2-D eyes?")
        return pca
    plt.colorbar(sc)
    plt.show()
    return pca
```

#### Sparse PCA

Sparse PCA yields sparse components (many loadings exactly equal to 0) by adding a Lasso ($\ell_1$) penalty on the components; its strength is set by the alpha parameter.
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html#sklearn.decomposition.SparsePCA
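A sketch comparing PCA and SparsePCA on synthetic data where each hidden factor only drives a few features; the data generator and alpha=1 are assumptions. The $\ell_1$ penalty pushes loadings to exactly 0, and a larger alpha generally gives sparser components.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.RandomState(0)
latent = rng.normal(size=(200, 2))
loadings = np.zeros((2, 10))
loadings[0, :3] = 1.0   # first hidden factor drives features 0-2
loadings[1, 3:6] = 1.0  # second hidden factor drives features 3-5
X = latent @ loadings + 0.1 * rng.normal(size=(200, 10))

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=1, random_state=0).fit(X)  # alpha = strength of the l1 penalty

# Count the loadings that are exactly zero in each model
print("zeros in PCA components:      ", np.sum(pca.components_ == 0))
print("zeros in SparsePCA components:", np.sum(spca.components_ == 0))
```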



#### Truncated SVD

If you have a large sparse dataset that you don't want to center (centering turns a sparse matrix into a dense one, which can cause an out-of-memory error), use this algorithm instead of PCA (e.g. on tf-idf or count matrices)
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
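A minimal latent semantic analysis sketch: the tf-idf matrix of a few made-up documents is reduced with TruncatedSVD without ever being converted to a dense array. The documents and n_components=2 are illustrative assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stock markets fell sharply today",
    "investors sold stocks as markets fell",
]
X_tfidf = TfidfVectorizer().fit_transform(docs)  # scipy sparse matrix, never densified
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsa = svd.fit_transform(X_tfidf)               # dense array of shape (5, 2)

print(type(X_tfidf), X_lsa.shape)
print(svd.explained_variance_ratio_)
```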



### Locally Linear Embedding (LLE)

Manifold learning technique : LLE describes each instance as a linear combination of its $k$ nearest neighbours, then looks for a low-dimensional representation $LLE : \boldsymbol{x}^{(i)} \mapsto \boldsymbol{z}^{(i)}$ that preserves these local linear relations ($m$ is the number of instances).
- Select the $k$ nearest neighbours of each instance, $d$ being a distance (usually Euclidean)
$$ K_{\boldsymbol{x}^{(i)}} = \underset{\{\boldsymbol{x}^{(j_1)}, \dots, \boldsymbol{x}^{(j_k)}\},\; j_l \neq i}{\operatorname{argmin}} \sum\limits_{l=1}^k d(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j_l)}) $$
- Optimize the weights of the locally linear relations (a linear model of each instance built from its $k$ neighbours)
$$ \hat{\boldsymbol W} = \underset{\boldsymbol W \in \mathscr M_m}{\operatorname{argmin}} \sum\limits_{i=1}^m \Big\| \boldsymbol{x}^{(i)} - \sum\limits_{j=1}^m w_{i,j} \boldsymbol{x}^{(j)} \Big\|^2 $$
$$ \text{ where } w_{i,j}=0 \text{ if } \boldsymbol{x}^{(j)} \notin K_{\boldsymbol{x}^{(i)}} \text{ and } \sum\limits_{j=1}^m w_{i,j}=1 $$
- Keep $\hat{\boldsymbol W}$ fixed and find the low-dimensional points $\boldsymbol{z}^{(i)}$ (rows of $\boldsymbol Z$) that best preserve these relations, under constraints that rule out the trivial solution $\boldsymbol Z = 0$
$$ \hat{\boldsymbol Z} = \underset{\boldsymbol Z \in \mathscr M_{m, n_{components}}}{\operatorname{argmin}} \sum\limits_{i=1}^m \Big\| \boldsymbol{z}^{(i)} - \sum\limits_{j=1}^m \hat{w}_{i,j} \boldsymbol{z}^{(j)} \Big\|^2 $$
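These steps are implemented by LocallyLinearEmbedding in sklearn.manifold. This sketch assumes the swiss roll toy dataset and n_neighbors=10 as example choices: n_neighbors plays the role of $k$ in the first step, and n_components is the dimension of the $z^{(i)}$.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3-D "swiss roll": a 2-D sheet rolled up in 3-D space
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=42)
X_unrolled = lle.fit_transform(X)  # shape (1000, 2)

print(X_unrolled.shape, lle.reconstruction_error_)
```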


### MultiDimensional Scaling (MDS)



### Isomap

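Isomap builds a graph connecting each instance to its nearest neighbours, then reduces dimensionality while preserving the geodesic distances between instances (the lengths of the shortest paths in that graph).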
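A minimal sketch with sklearn.manifold.Isomap; the swiss roll toy data and n_neighbors=10 are example assumptions. The fitted model keeps the geodesic distance matrix it computed in dist_matrix_.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# n_neighbors controls the neighbourhood graph; geodesic distances are shortest paths in it
isomap = Isomap(n_neighbors=10, n_components=2)
X_iso = isomap.fit_transform(X)  # shape (1000, 2)

print(X_iso.shape)
print(isomap.dist_matrix_.shape)  # (1000, 1000) geodesic distance matrix
```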

### t-Distributed Stochastic Neighbor Embedding (t-SNE)



### Linear Discriminant Analysis (LDA)



# Model Selection

Documentation : https://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html

## Grid Search

Documentation : https://scikit-learn.org/stable/modules/grid_search.html

A Grid Search is used to find the best hyperparameters for your model. A search consists of:

* a parameter space (which parameters of your model are you going to tune)
* a method for searching and sampling candidates (which values will the parameters take)
* an estimator (which regressor or classifier will make the predictions)
* a score function (how you measure which model is better)
* a cross-validation scheme (each candidate is scored on data it was not trained on, which gives a less biased estimate of its performance)

### Defining a grid of parameters

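Here is an example parameter grid for tuning a kernel PCA step. The pca__ prefix assumes the KernelPCA is a step named 'pca' inside a Pipeline: parameters of pipeline steps are addressed as <step name>__<parameter>.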

```python
# Grid for a Pipeline whose KernelPCA step is named 'pca' (hence the 'pca__' prefix)
n_components_range = [1, 2, 3, 4, 5, 6]
gamma_range = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1, 10]   # recent scikit-learn rejects negative gamma
coef0_range = [-100, -10, -5, -1, -0.1, 0, 0.1, 1, 5, 10, 100]

param_grid = [
    {'pca__kernel': ['linear', 'cosine'],
     'pca__n_components': n_components_range},
    {'pca__kernel': ['rbf'],
     'pca__gamma': gamma_range,
     'pca__n_components': n_components_range},
    {'pca__kernel': ['sigmoid'],
     'pca__gamma': gamma_range,
     'pca__coef0': coef0_range,
     'pca__n_components': n_components_range},
    {'pca__kernel': ['poly'],
     'pca__gamma': gamma_range,
     'pca__coef0': coef0_range,
     'pca__degree': [2, 3, 4, 5],   # recent scikit-learn rejects negative degrees
     'pca__n_components': n_components_range},
]
```

### Applying the Grid Search

[GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV)
takes, among others, the following parameters

* estimator (object) : the model to tune; it must implement a score method unless you pass scoring
* param_grid (dict or list of dicts) : the parameter space to explore
* scoring (str or callable) : the score function (default : the estimator's score method)
* n_jobs (int) : number of jobs to run in parallel; -1 uses all processors
* cv (int or cross-validation splitter) : number of folds (default 5); with an int, StratifiedKFold is used for classifiers and KFold otherwise
* verbose (int) : 0 (no output), 1 (some output), 2 (computation time of each fold), 3 (computation time + score)

Useful attributes :
* cv_results_ : dict with the results of every candidate
* best_params_ : the parameter setting that gave the best score
* best_estimator_ : estimator refitted with the parameters that yielded the best score

Methods :
* fit(X, y) : runs the fits for all the parameter combinations
* transform(X) : calls transform on X with the best estimator
* predict(X) : calls predict on X with the best estimator
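A sketch putting the pieces together: a Pipeline whose KernelPCA step is named 'pca' (so the pca__ prefix works), followed by a LogisticRegression classifier named 'clf', tuned with GridSearchCV on iris. The classifier, the dataset and the deliberately small grid are assumptions chosen so the example runs quickly.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Step names define the prefixes used in param_grid ('pca__...', 'clf__...')
pipe = Pipeline([
    ("pca", KernelPCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = [
    {"pca__kernel": ["linear"], "pca__n_components": [1, 2, 3]},
    {"pca__kernel": ["rbf"], "pca__gamma": [0.01, 0.1, 1], "pca__n_components": [1, 2, 3]},
]

grid = GridSearchCV(pipe, param_grid, cv=5, verbose=0)
grid.fit(X, y)  # 12 candidates x 5 folds = 60 fits, then a refit of the best one

print(grid.best_params_)           # best parameter combination found
print(round(grid.best_score_, 3))  # its mean cross-validated accuracy
y_pred = grid.predict(X[:5])       # predictions of the refitted best pipeline
```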

## Cross validation

Documentation : https://scikit-learn.org/stable/modules/cross_validation.html

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])  # your features (inputs)
y = np.array([0, 1, 2, 3, 4, 5])                                  # your targets (what you predict)
kf = KFold(n_splits=3)   # 3-fold cross-validation (no shuffling)
print(X.shape, y.shape)
scores = list()

for train_index, test_index in kf.split(X, y):
    print("TRAIN index:", train_index, "TEST index:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("Train set: \n", X_train, "\n", y_train, "\n Test set: \n", X_test, "\n", y_test)

    # DEFINE A MODEL HERE (LinearRegression is only a placeholder)
    model = LinearRegression()

    # FIT THE MODEL ON X_train + y_train
    model.fit(X_train, y_train)

    # EVALUATE THE MODEL ON X_test + y_test (here: the R^2 score of the regressor)
    score = model.score(X_test, y_test)

    # STORE THE RESULT (one score per fold)
    scores.append(score)

print('Estimated score %.3f (%.3f)' % (np.mean(scores), np.std(scores)))
```
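For the common case, scikit-learn's cross_val_score runs the same split / fit / evaluate / store loop in one call. This sketch redefines the same toy arrays and assumes LinearRegression as a placeholder model, so the scores are R^2 values rather than accuracies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([0, 1, 2, 3, 4, 5])

# Same 3-fold split as the manual loop; one model is fitted and scored per fold
scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=3))
print(scores)  # one score per fold (R^2 for a regressor by default)
print('Estimated score %.3f (%.3f)' % (scores.mean(), scores.std()))
```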
