<center><img src=img/MScAI_brand.png width=70%></center>

# Scikit-Learn: Summary

In this notebook we will:

* Summarise very briefly some methods provided by Scikit-Learn
* Look at the Estimator and related APIs

I'm not recording a video for this notebook - it is just for reading and reference.


### Scikit-Learn Summary Table

Problem | Example | Technique | Create | Fit | Evaluate | Use
------------|---------|-----------|--------|-----|-------|-----
**Unsupervised**|
Clustering  |Customer segmentation|$k$-means|`km = KMeans(n_clusters=2)`|`km.fit(X)`|`km.score(X)`|`km.labels_`
Density estimation|Plotting a distribution smoothly|Kernel density estimation|`kde = KernelDensity()`|`kde.fit(X)`| (none)|`kde.score_samples(new_X)`
Representation learning|Visualising data|Multi-dimensional scaling|`mds = MDS()`|`mds.fit(X)`|(none)|`mds.embedding_`
**Supervised**|
Regression  |Predict car values | Linear regression|`lr = LinearRegression()`|`lr.fit(train_X, train_y)`|`lr.score(test_X, test_y)`|`lr.predict(new_X)`
Classification|Predict customer churn|Support vector machines|`svm = SVC()`|`svm.fit(train_X, train_y)`|`svm.score(test_X, test_y)`|`svm.predict(new_X)`
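
To make the Create / Fit / Evaluate / Use columns concrete, here is a minimal sketch of the $k$-means row. The data is made up for illustration, and `n_init` and `random_state` are extra arguments (not in the table) added only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up for illustration): two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)  # Create
km.fit(X)                                             # Fit
quality = km.score(X)   # Evaluate: negative of the k-means objective, so closer to 0 is better
labels = km.labels_     # Use: cluster index assigned to each training point
print(quality, labels)
```

The other rows follow the same four steps, with `y` added to `fit` and `score` for the supervised methods.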

And here's a decision tree from a Scikit-Learn collaborator. As a decision aid (to help you choose an algorithm) it's of dubious value, but as a summary it's nice!

<img src=img/sklearn-tree.png width=95%>

<font size=1>https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html</font>



The user guide is very good:

https://scikit-learn.org/stable/user_guide.html#user-guide

Finally, this page lists all the main packages and so gives an overview of what is available:

https://scikit-learn.org/stable/modules/classes.html

### The Estimator, Predictor, Transformer, and Model APIs

We have already seen the central APIs of Scikit-Learn when studying supervised learning. 



> "the main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data."


An object is an **estimator** if it has a `fit` method, where `fit` may accept `X` or `X` and `y`:

```python
estimator.fit(X)
# OR...
estimator.fit(X, y)
```

That is **duck typing**!
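
As a rough illustration of the duck-typing idea, the sketch below defines a toy class with a `fit` method and checks for that method directly. `MeanEstimator` and `is_estimator` are hypothetical names invented here; they are not part of Scikit-Learn.

```python
class MeanEstimator:
    """Hypothetical toy estimator: it 'learns' the mean of its data."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)  # what was learned from the data
        return self                   # returning self allows chaining


def is_estimator(obj):
    # Duck typing: we only ask whether the object has a callable `fit`,
    # not which class it inherits from
    return callable(getattr(obj, "fit", None))


est = MeanEstimator().fit([1.0, 2.0, 3.0])
print(is_estimator(est), is_estimator("just a string"), est.mean_)
```

Nothing here inherits from a Scikit-Learn base class: having the right method is what counts.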

A **predictor** object is an estimator which also has a `predict` method:

```python
estimator.predict(X)
```

Some predictors whose `predict` output is discrete (e.g. clustering or classification) will implement either `predict_proba` or `decision_function`, which return real values:

```python
estimator.predict_proba(X)
```

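Conceptually, the `predict` method is then a threshold over the result (or, with more than two classes, a choice of the highest-scoring class), e.g.:

```python
# Conceptual sketch for a binary classifier, not Scikit-Learn's actual source;
# `threshold` is an illustrative attribute
def predict(self, X):
    return self.predict_proba(X) > self.threshold
```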
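
We can check this relationship on a real classifier. The sketch below uses `LogisticRegression` on made-up binary data and assumes a threshold of 0.5 on the probability of the second class; for this model and data that reproduces `predict`, but other classifiers may decide differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary problem (made-up data): one feature, two classes
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)  # shape (n_samples, 2): one column per class
# Threshold the probability of the second class, then map back to class labels
manual = clf.classes_[(proba[:, 1] > 0.5).astype(int)]

print(np.array_equal(manual, clf.predict(X)))
```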

A **transformer** is an estimator with either `transform` or `fit_transform`. Often transformers are representation learning approaches, i.e. they output new features after training on the data. `transform` is the method which outputs the new features. `fit_transform` is just a shortcut for calling `fit` and then `transform` on the same data.
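
A small sketch of the `fit_transform` shortcut, using `StandardScaler` as an example transformer (our choice for illustration; the data is made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # made-up data

two_steps = StandardScaler().fit(X).transform(X)  # fit, then transform
one_step = StandardScaler().fit_transform(X)      # the shortcut

print(np.allclose(two_steps, one_step))
```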

**Don't confuse a Scikit-Learn transformer with a deep learning transformer, which is a different concept.**

A **model** is an object with a `score` method which evaluates how good it is, e.g. $R^2$ or classification accuracy. Higher is always better. 

```python
model.score(X)
```

or

```python
model.score(X, y)
```
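
As a quick check of what `score` returns, the sketch below compares it with the corresponding functions in `sklearn.metrics` for a regressor and a classifier. The data is made up, and for brevity we score on the training data rather than on a separate test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.svm import SVC

# Made-up data: one feature, a noisy linear target and a binary label
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_reg = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 5.0])
y_clf = np.array([0, 0, 0, 1, 1, 1])

lr = LinearRegression().fit(X, y_reg)
svm = SVC().fit(X, y_clf)

r2 = lr.score(X, y_reg)    # R^2 for regressors
acc = svm.score(X, y_clf)  # mean accuracy for classifiers

print(np.isclose(r2, r2_score(y_reg, lr.predict(X))))
print(np.isclose(acc, accuracy_score(y_clf, svm.predict(X))))
```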

### Semantics

Calling `predict` **before** `fit` is disallowed: Scikit-Learn's built-in estimators raise a `NotFittedError`.
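
A minimal sketch of what happens if we try, using `LinearRegression` as the example (the data is made up, and `raised` is just a flag for illustration):

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])  # made-up data
lr = LinearRegression()              # created but never fitted

try:
    lr.predict(X)
    raised = False
except NotFittedError:
    raised = True

print(raised)
```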

After fitting, the estimator object will usually have some new attributes named with a trailing underscore, e.g. `lr.coef_` and `lr.intercept_` for linear regression, or `support_vectors_` and some others for an SVM.
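
The sketch below shows these attributes appearing on a linear regression. The data is made up so that it lies exactly on the line $y = 2x + 1$; `before` and `after` are illustrative variable names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0              # points exactly on y = 2x + 1

lr = LinearRegression()
before = hasattr(lr, "coef_")        # attribute does not exist yet
lr.fit(X, y)
after = hasattr(lr, "coef_")         # created by fit

print(before, after, lr.coef_, lr.intercept_)
```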

Also, a call to `fit` overwrites the result of any previous call.

```python
lr.fit(X1, y1)
lr.fit(X2, y2)
```

has the same effect as just: `lr.fit(X2, y2)`.
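
We can verify this with a small sketch. The two datasets are made up and lie on different lines, and `fresh` is an illustrative name for a second model fitted only on the second dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two made-up datasets lying on different lines
X1 = np.array([[0.0], [1.0], [2.0]])
y1 = 2.0 * X1[:, 0] + 1.0            # y = 2x + 1
X2 = np.array([[0.0], [1.0], [2.0]])
y2 = -3.0 * X2[:, 0] + 4.0           # y = -3x + 4

lr = LinearRegression()
lr.fit(X1, y1)
lr.fit(X2, y2)                       # discards what was learned from (X1, y1)

fresh = LinearRegression().fit(X2, y2)

print(np.allclose(lr.coef_, fresh.coef_), np.isclose(lr.intercept_, fresh.intercept_))
```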

(A few estimators allow `warm_start=True` in the constructor, or `partial_fit(X, y)`, so that we can pick up training where we left off. But we won't cover these.)