# Estimators

- What is an estimator in scikit-learn
    - explain their methods (fit, predict, etc.)
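
As a rough illustration of the shared estimator interface, the sketch below drives two different models through the same three calls. The dataset, the variable names (`X_demo`, `y_demo`) and the choice of `DecisionTreeClassifier` as a second model are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset, used only for illustration
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)

# Two different models, driven through the same three calls
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    model.fit(X_demo, y_demo)            # learn parameters from the data
    y_hat = model.predict(X_demo[:3])    # predict labels for three samples
    acc = model.score(X_demo, y_demo)    # mean accuracy for classifiers
    print(type(model).__name__, y_hat, round(acc, 2))
```

Because every estimator exposes the same methods, swapping one model for another usually means changing a single line.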


## Linear Regression

- Description of linear regression
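
As a brief reminder, in standard textbook notation (the symbols are not from the original notes), linear regression predicts the target as a weighted sum of the $p$ features plus an intercept, and ordinary least squares chooses the weights that minimize the squared residuals over the $n$ training samples:

$$
\hat{y} = w_0 + \sum_{j=1}^{p} w_j x_j,
\qquad
\min_{w_0, \dots, w_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

In scikit-learn's `LinearRegression`, the weights $w_1, \dots, w_p$ are stored in `coef_` and $w_0$ in `intercept_` after fitting.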

Let's create a synthetic dataset for a linear regression problem, with 400 samples and 100 features. We mark 20 of these features as informative and add some Gaussian noise.

In [51]:
from sklearn.datasets import make_regression

X, y = make_regression(
    n_samples=400, n_features=100, n_informative=20, noise=10, random_state=0
)

Let's create a linear regression model. You can read the full documentation of this estimator [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression).

In [52]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()

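`fit` learns the model parameters from the training data and returns the fitted estimator, which is why the result can be assigned back to `reg`: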

In [53]:
reg = reg.fit(X, y)

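`score` evaluates the fitted model on the given data. For regressors it returns the coefficient of determination (R²):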

In [54]:
reg.score(X, y)

0.9991810538118069
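
To make the meaning of this number concrete, the sketch below recomputes R² by hand and compares it with `r2_score` and `score`. It regenerates the same dataset under new names (`X_reg`, `y_reg`, `reg_demo`), which are chosen here only to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X_reg, y_reg = make_regression(
    n_samples=400, n_features=100, n_informative=20, noise=10, random_state=0
)
reg_demo = LinearRegression().fit(X_reg, y_reg)
y_hat_reg = reg_demo.predict(X_reg)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_reg - y_hat_reg) ** 2)
ss_tot = np.sum((y_reg - y_reg.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_reg, y_hat_reg), reg_demo.score(X_reg, y_reg))
```

All three values should agree up to floating-point error. Note that this score is computed on the same data the model was fitted on.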

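Attributes whose names end with an underscore, such as `coef_` and `intercept_`, are estimated from the data and only exist after `fit` has been called: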

In [55]:
coefs = reg.coef_
intercept = reg.intercept_

print(f"Model coefficients (first 10):\n {coefs[:10]} \n")
print(f"Model intercept: \n {intercept}")

Model coefficients (first 10):
 [-1.13225733 -0.37761699 -0.55009782  0.11689833 -0.3783587   0.28209964
 -0.11756948 96.29011361 -1.00224858 81.99109074] 

Model intercept: 
 -0.49212396559277516
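
The trailing-underscore convention can be checked directly. The following minimal sketch uses a small throwaway dataset (`X_small`, `y_small` are names made up for this example) and looks for `coef_` before and after fitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X_small, y_small = make_regression(
    n_samples=50, n_features=3, n_informative=2, random_state=0
)

model = LinearRegression()
before = hasattr(model, "coef_")   # learned attribute, not set yet
model.fit(X_small, y_small)
after = hasattr(model, "coef_")    # now available

print(before, after)
print(model.coef_.shape)           # one coefficient per feature
print(model.fit_intercept)         # hyperparameter: no underscore, set at construction
```

Hyperparameters such as `fit_intercept` are set when the estimator is created and carry no trailing underscore.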


`predict` returns the model's predictions for the given samples, here the first 10 rows of `X`:

In [56]:
reg.predict(X[:10])

array([-269.24013835, -349.57619949,  106.0034781 ,  -31.19370111,
       -240.33134406,  -44.64621891,    3.24037414, -819.22611213,
        322.17453375,  158.47068373])
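
For a linear model, these predictions are just the learned coefficients applied to the inputs. The sketch below, which regenerates the dataset under the assumed names `X_reg`, `y_reg` and `reg_demo`, reproduces `predict` by hand.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X_reg, y_reg = make_regression(
    n_samples=400, n_features=100, n_informative=20, noise=10, random_state=0
)
reg_demo = LinearRegression().fit(X_reg, y_reg)

# Prediction = features times coefficients, plus the intercept
manual = X_reg[:10] @ reg_demo.coef_ + reg_demo.intercept_
same = np.allclose(manual, reg_demo.predict(X_reg[:10]))
print(same)
```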

(go over other methods)

### Exercises

1. What happens if you try to call `score` without running `fit` first?

## Logistic Regression
- Example of logistic regression as estimator

In [58]:
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=400, n_features=100, n_informative=20, random_state=0
)

Read the documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)

In [61]:
from sklearn.linear_model import LogisticRegression

# Create model
clf = LogisticRegression()

# Fit model
clf = clf.fit(X, y)

# Score predictions
clf.score(X, y)


0.86
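
Classifiers such as `LogisticRegression` also expose `predict_proba`, which returns one probability per class. The sketch below is an illustration under a few assumptions: the names `X_clf`, `y_clf` and `clf_demo` are new, and `max_iter=1000` is set here only to reduce the chance of convergence warnings (the cell above uses the default).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_clf, y_clf = make_classification(
    n_samples=400, n_features=100, n_informative=20, random_state=0
)
clf_demo = LogisticRegression(max_iter=1000).fit(X_clf, y_clf)

proba = clf_demo.predict_proba(X_clf[:5])   # one row per sample, one column per class
labels = clf_demo.predict(X_clf[:5])        # hard class labels

# Each row of probabilities sums to 1, and predict picks the most likely class
print(proba.sum(axis=1))
print(clf_demo.classes_[proba.argmax(axis=1)], labels)
```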

### Exercise

Inspect the coefficients and intercept of the fitted logistic regression.

## Support Vector Machine
- Example of SVM as estimator

In [64]:
from sklearn.svm import SVC

# Create model
svc = SVC()

# Fit model
svc = svc.fit(X, y)

# Score predictions
svc.score(X, y)

0.9875

## KMeans
- Example of KMeans as estimator


In [65]:
from sklearn.datasets import make_blobs

X, y = make_blobs(
    n_samples=400, n_features=100, random_state=0
)

In [66]:
from sklearn.cluster import KMeans

# Create model
kmeans = KMeans()

# Fit model
kmeans = kmeans.fit(X, y)

# Score predictions
kmeans.score(X, y)

-38160.49282823233

Explain output of clusters
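
The large negative number above is easier to read once you know that `KMeans.score` returns the negative of the K-means objective, that is, minus the sum of squared distances from each sample to its closest cluster center. The sketch below checks this by hand. The names `X_blobs` and `km`, and the settings `n_clusters=3`, `n_init=10` and `random_state=0`, are assumptions chosen for a reproducible example and differ from the default `KMeans()` used above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X_blobs, _ = make_blobs(n_samples=400, n_features=100, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)

# transform gives the distance from each sample to every cluster center;
# keep the distance to the closest center and sum the squares
closest = km.transform(X_blobs).min(axis=1)
objective = np.sum(closest ** 2)

print(km.score(X_blobs), -objective, -km.inertia_)
```

The three values should match up to floating-point error, so scores closer to zero indicate tighter clusters.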

### Exercise 

Read the documentation and write code that tells us which cluster the samples belong to.

# Performance metrics

- How to choose the performance metrics
- Example of accuracy, roc_auc, confusion matrix

In [76]:
# Create dataset
X, y = make_classification(
    n_samples=400, n_features=100, n_informative=20, 
    weights=[0.8, 0.2], random_state=0
)

# Create model
clf = LogisticRegression()

# Fit model
clf = clf.fit(X, y)

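`roc_auc_score` computes the area under the ROC curve. Unlike accuracy, it is not inflated by simply predicting the majority class, which matters here because `weights=[0.8, 0.2]` makes the classes imbalanced: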

In [77]:
from sklearn.metrics import roc_auc_score

# Predict labels
y_pred = clf.predict(X)

# Use another scoring method
print(clf.score(X, y))
print(roc_auc_score(y, y_pred))

0.935
0.8703124999999999
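
The cell above passes hard labels to `roc_auc_score`. In the binary case that reduces to balanced accuracy. A more common usage is to pass a continuous score such as the predicted probability of the positive class. The sketch below shows both variants; the names `X_imb`, `y_imb` and `clf_imb` and the `max_iter=1000` setting are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

X_imb, y_imb = make_classification(
    n_samples=400, n_features=100, n_informative=20,
    weights=[0.8, 0.2], random_state=0
)
clf_imb = LogisticRegression(max_iter=1000).fit(X_imb, y_imb)

hard_labels = clf_imb.predict(X_imb)
pos_proba = clf_imb.predict_proba(X_imb)[:, 1]   # probability of class 1

auc_from_labels = roc_auc_score(y_imb, hard_labels)   # single-threshold ROC
auc_from_scores = roc_auc_score(y_imb, pos_proba)     # ROC over all thresholds
bal_acc = balanced_accuracy_score(y_imb, hard_labels)

print(auc_from_labels, bal_acc, auc_from_scores)
```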


In [85]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y, y_pred, labels=clf.classes_)
conf_matrix_disp = ConfusionMatrixDisplay(
    confusion_matrix=conf_matrix, display_labels=clf.classes_
)
# plot() is a method of the display object itself
conf_matrix_disp.plot()

In [86]:
# TODO: add classification report
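
One possible way to fill in this step is `classification_report`, which summarizes precision, recall and F1 score per class. The sketch below regenerates the imbalanced dataset under assumed names (`X_imb`, `y_imb`, `clf_imb`) so that it runs on its own; `max_iter=1000` is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X_imb, y_imb = make_classification(
    n_samples=400, n_features=100, n_informative=20,
    weights=[0.8, 0.2], random_state=0
)
clf_imb = LogisticRegression(max_iter=1000).fit(X_imb, y_imb)

# Text table with precision, recall, F1 and support for each class
report = classification_report(y_imb, clf_imb.predict(X_imb))
print(report)
```

On imbalanced data, the per-class recall in this report often tells a different story from the overall accuracy.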

### Exercise

Can you do the same for a Linear Regression problem?

# Additional reading

- [Choosing the right estimator](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)