***SkLearn Reference***
=======================

**Standard Import**
-------------------
```
import sklearn.model_selection as ms
import sklearn.impute as impute
import sklearn.preprocessing as pp
import sklearn.pipeline as pipe
import sklearn.compose as compose

import sklearn.metrics as metrics 
```
---
<br>

**Processing Data**
-------------------

*Handle Missing Data*
---------------------
[API - simpleimputer](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)

```
# strategy is one of 'mean', 'median', 'most_frequent', 'constant';
# fill_value is only used when strategy='constant'
imputer = impute.SimpleImputer(strategy='constant', fill_value=10, copy=True)
imputer.fit_transform(X)  # returns a NumPy array
```
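
As a minimal sketch of how the imputer behaves, the example below fills missing values with the column mean. The toy array `X` is hypothetical and not part of the original notes.

```python
import numpy as np
import sklearn.impute as impute

# Hypothetical toy data: one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

imputer = impute.SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)  # NumPy array with the NaNs replaced

# statistics_ holds the per-column fill values learned during fit.
col_means = imputer.statistics_
```

Calling `imputer.transform` on new data reuses the fill values learned from the training data.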

*Transformation*
----------------
[API - preprocessing](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing)<br>
[API - polyfeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)<br>
[API - minmax](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)<br>
[API - standard](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)<br>
[API - oneHot](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)<br>
[API - getdummies](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html)

```
pp.PolynomialFeatures(degree, include_bias)
pp.MinMaxScaler()
pp.StandardScaler()

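# sparse_output=False returns a NumPy array; the default returns a sparse matrix
# (the parameter was named `sparse` before scikit-learn 1.2)
one_hot = pp.OneHotEncoder(sparse_output=False)
one_hot.get_feature_names_out()  # feature names of the transformed data (get_feature_names() in older releases)

pd.get_dummies(X)  # one-hot encode with pandas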
```

*Splitting Data*
----------------
[API - train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

```
ms.train_test_split(X, y, train_size, random_state, shuffle, stratify)
```
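
A short sketch of a stratified split, assuming a hypothetical balanced toy dataset of ten rows. Passing `stratify=y` keeps the class proportions the same in both parts.

```python
import numpy as np
import sklearn.model_selection as ms

# Hypothetical toy data: 10 samples, 2 features, two balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

X_train, X_test, y_train, y_test = ms.train_test_split(
    X, y,
    train_size=0.8,    # 80% of the rows go to the training set
    random_state=42,   # makes the shuffle reproducible
    shuffle=True,
    stratify=y,        # preserve the class ratio in both splits
)
```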

*Data Pipeline*
---------------
[API - pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)<br>
[API - columntransformer](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html)

```
import sklearn.tree as tree

# A pipeline may end with an estimator.
data_pipe = pipe.Pipeline([
    ( 'scaler', pp.StandardScaler() ),
    ( 'impute', impute.SimpleImputer(strategy='median') ),
    ( 'tree_classifier', tree.DecisionTreeClassifier(random_state=42) ),
    ])

# A preprocessing pipeline combined with per-column transformers.
num_pipeline = pipe.Pipeline([
    ( 'impute', impute.SimpleImputer(strategy='mean') ),
    ( 'scaler', pp.StandardScaler() ),
    ])

# lst_num and lst_cat are lists of the names of the columns to transform.
full_pipeline = compose.ColumnTransformer([
    ( 'num', num_pipeline, lst_num ),
    ( 'cat', pp.OneHotEncoder(), lst_cat ),
    ], remainder='passthrough')

full_pipeline.fit_transform(X)
full_pipeline.named_transformers_['cat'].get_feature_names_out()
```
---
<br>

**Classifiers**
---------------

*Linear Classifiers*
--------------------
```
import sklearn.linear_model as lm
```
<u>Logistic Regression</u>

[API](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)

```
lm.LogisticRegression()  # natively a binary classifier; by default it switches to multinomial when y has more than two classes
lm.LogisticRegression(
        multi_class="multinomial",  # softmax regression; this parameter is deprecated since scikit-learn 1.5
        solver="lbfgs",  # the solver must support the multinomial loss (lbfgs does)
        C=10,  # inverse of regularization strength: a higher value means weaker regularization
        )  # multiclass classifier, a.k.a. softmax regression

estimator.fit(X, y)
estimator.predict(X)
estimator.predict_proba(X)  # returns probability estimates
estimator.decision_function(X)  # returns confidence scores
estimator.score(X, y)  # returns the mean accuracy
```
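
The estimator methods listed above can be seen together in a small run. This sketch assumes the bundled iris dataset as example data (not mentioned in the original notes) and leaves `multi_class` at its default so it runs across scikit-learn versions.

```python
import numpy as np
import sklearn.datasets as datasets
import sklearn.linear_model as lm

# Iris: 150 samples, 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

estimator = lm.LogisticRegression(C=10, max_iter=1000)
estimator.fit(X, y)

labels = estimator.predict(X[:5])             # predicted class labels
proba = estimator.predict_proba(X[:5])        # one probability per class, rows sum to 1
scores = estimator.decision_function(X[:5])   # raw confidence score per class
accuracy = estimator.score(X, y)              # mean accuracy on the given data
```

Note that `accuracy` here is measured on the training data, so it overstates how well the model generalizes.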

<u>SGD Classifier</u>

[API](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)

```
lm.SGDClassifier()  # linear classifier trained with SGD; multiclass is handled by combining binary classifiers one-vs-all
```
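
A brief sketch of `SGDClassifier` on a three-class problem, assuming the bundled iris dataset as example data. Features are standardized first because SGD is sensitive to feature scale; the fitted `coef_` has one row of weights per class.

```python
import sklearn.datasets as datasets
import sklearn.linear_model as lm
import sklearn.pipeline as pipe
import sklearn.preprocessing as pp

# Iris: 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

sgd = pipe.make_pipeline(
    pp.StandardScaler(),
    lm.SGDClassifier(random_state=42),
)
sgd.fit(X, y)

# One weight vector per class: shape (n_classes, n_features).
coef_shape = sgd[-1].coef_.shape
train_accuracy = sgd.score(X, y)
```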

*NN Classifiers*
----------------
```
import sklearn.neural_network as nn
```

<u>Multi-Layer Perceptron Classifier</u>

[API](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)

```
nn.MLPClassifier(
        hidden_layer_sizes=(10,),  # one hidden layer with 10 units
        activation='relu',  # one of 'identity', 'logistic', 'tanh', 'relu'
        solver='adam',  # one of 'lbfgs', 'sgd', 'adam'
        )
```
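
A minimal sketch of fitting the MLP, assuming the bundled iris dataset as example data. The `lbfgs` solver is chosen here only because it tends to converge quickly on small datasets; `max_iter` and `random_state` are illustrative settings.

```python
import sklearn.datasets as datasets
import sklearn.neural_network as nn
import sklearn.pipeline as pipe
import sklearn.preprocessing as pp

# Iris: 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

mlp = pipe.make_pipeline(
    pp.StandardScaler(),  # neural networks train better on scaled inputs
    nn.MLPClassifier(
        hidden_layer_sizes=(10,),
        activation="relu",
        solver="lbfgs",
        max_iter=1000,
        random_state=42,
    ),
)
mlp.fit(X, y)

train_accuracy = mlp.score(X, y)
n_layers = mlp[-1].n_layers_  # input + hidden + output layers
```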

*Tree Classifiers*
------------------
```
import sklearn.tree as tree
```

<u>Decision Tree Classifier</u>

[API - dtclassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)<br>
[API - plottree](https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html)

```
tree.DecisionTreeClassifier(
        max_depth=10,
        ccp_alpha=0.0,  # non-negative float; larger values prune more
        max_features=None,
        max_leaf_nodes=None,
        min_samples_split=2,
        min_samples_leaf=1,
        )

estimator.cost_complexity_pruning_path(X, y)  # returns the pruning path: effective alphas (ccp_alphas) and total leaf impurities

tree.plot_tree(estimator,
                filled=True,
                rounded=True,
                class_names=['negative_class', 'positive_class'],
                )  # draws the tree diagram (requires matplotlib)
```
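
The pruning path is typically used to pick a value for `ccp_alpha`. The sketch below, which assumes the bundled iris dataset as example data, refits one tree per alpha on the path and records how the tree shrinks as alpha grows.

```python
import sklearn.datasets as datasets
import sklearn.tree as tree

# Iris: 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

estimator = tree.DecisionTreeClassifier(random_state=42)
path = estimator.cost_complexity_pruning_path(X, y)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

# Refit one tree per alpha; clamp at 0 to guard against tiny negative
# values caused by floating-point error.
node_counts = []
for alpha in ccp_alphas:
    pruned = tree.DecisionTreeClassifier(random_state=42, ccp_alpha=max(alpha, 0.0))
    pruned.fit(X, y)
    node_counts.append(pruned.tree_.node_count)
```

In practice each pruned tree would be scored on held-out data or with cross-validation, and the alpha with the best score kept.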

*Ensemble Classifiers*
----------------------
```
import sklearn.ensemble as ensemble
```

<u>Random Forest Classifier</u>

[API - rfclassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

```
ensemble.RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,
        max_samples=None,  # number (or fraction) of samples drawn per tree when bootstrap=True
        oob_score=True,  # if True, uses out-of-bag samples to estimate the generalization accuracy
        n_jobs=-1,  # number of CPU cores to use for training and prediction (-1 uses all)
        )
```
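
A short sketch of reading the out-of-bag estimate, assuming the bundled iris dataset as example data. With `oob_score=True`, each sample is scored only by the trees that did not see it during training.

```python
import sklearn.datasets as datasets
import sklearn.ensemble as ensemble

# Iris: 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

forest = ensemble.RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,    # required for out-of-bag scoring
    oob_score=True,
    n_jobs=1,          # set to -1 to use all CPU cores
    random_state=42,
)
forest.fit(X, y)

oob_accuracy = forest.oob_score_   # accuracy estimated on out-of-bag samples
n_trees = len(forest.estimators_)
```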

<u>Voting Classifier</u>

[API - votingclassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html)

```
ensemble.VotingClassifier(
                estimators=[ ('name', estimator), ... ],
                voting='hard',  # 'hard' (majority vote) or 'soft' (average of predicted probabilities)
                flatten_transform=True,  # only affects transform() when voting='soft'
                )

```

| flatten_transform 	| voting 	| What `transform(X)` returns                                                             	|
|-------------------	|--------	|-----------------------------------------------------------------------------------------	|
| any (ignored)     	| hard   	| Class labels predicted by each estimator, shape `(n_samples, n_classifiers)`            	|
| True              	| soft   	| Class probabilities from each estimator, shape `(n_samples, n_classifiers * n_classes)` 	|
| False             	| soft   	| Class probabilities from each estimator, shape `(n_classifiers, n_samples, n_classes)`  	|
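
The output shapes of `transform` can be checked directly. This sketch assumes the bundled iris dataset (150 samples, 3 classes) and two arbitrarily chosen base estimators; neither choice comes from the original notes.

```python
import sklearn.datasets as datasets
import sklearn.ensemble as ensemble
import sklearn.linear_model as lm
import sklearn.tree as tree

# Iris: 150 samples, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

# Two illustrative base estimators; VotingClassifier clones them on fit.
estimators = [
    ("lr", lm.LogisticRegression(max_iter=1000)),
    ("dt", tree.DecisionTreeClassifier(random_state=42)),
]

hard = ensemble.VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
soft_flat = ensemble.VotingClassifier(
    estimators=estimators, voting="soft", flatten_transform=True).fit(X, y)
soft_raw = ensemble.VotingClassifier(
    estimators=estimators, voting="soft", flatten_transform=False).fit(X, y)

shapes = {
    "hard": hard.transform(X).shape,            # labels per estimator
    "soft_flat": soft_flat.transform(X).shape,  # probabilities, flattened
    "soft_raw": soft_raw.transform(X).shape,    # probabilities, one block per estimator
}
```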


<u>Bagging Classifier</u>

[API - baggingclassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)

```
ensemble.BaggingClassifier()
```
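
A minimal sketch of bagging decision trees, assuming the bundled iris dataset as example data. The parameter values are illustrative. The base estimator is passed positionally because its keyword name changed between scikit-learn releases (`base_estimator` became `estimator`).

```python
import sklearn.datasets as datasets
import sklearn.ensemble as ensemble
import sklearn.tree as tree

# Iris: 4 features, 3 classes (assumed example data).
X, y = datasets.load_iris(return_X_y=True)

bag = ensemble.BaggingClassifier(
    tree.DecisionTreeClassifier(),  # base estimator, passed positionally
    n_estimators=50,
    max_samples=0.8,    # each tree sees 80% of the rows
    bootstrap=True,     # sample rows with replacement
    oob_score=True,     # score each row with the trees that did not see it
    random_state=42,
)
bag.fit(X, y)

oob_accuracy = bag.oob_score_
n_trees = len(bag.estimators_)
```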