# What we're covering in the Scikit-Learn Introduction

This notebook outlines the content covered in the Scikit-Learn Introduction.

It's a quick reference to the Scikit-Learn functions and modules used in each section.

The content follows the Scikit-Learn workflow shown in the diagram below.

<img src="../images/sklearn-workflow-title.png"/>

## 0. Standard library imports

In most machine learning projects, you'll see these libraries (Matplotlib, NumPy and pandas) imported at the top.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## 1. Get the data ready

In [None]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
# Example use case (requires X & y)
X_train, X_test, y_train, y_test = train_test_split(X, y)
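
The call above needs `X` and `y` to exist already. As a minimal sketch, the cell below builds a synthetic dataset with `make_classification` (a stand-in chosen here for illustration, not the data used in the course) and splits it. The `test_size` and `random_state` values are assumptions picked to make the result easy to check.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 4 features, binary labels
X, y = make_classification(n_samples=100, n_features=4, random_state=42)

# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

If `test_size` is left out, `train_test_split` holds out 25% of the samples by default.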

## 2. Pick a model/estimator (to suit your problem)
To pick a model we use the [Scikit-Learn machine learning map](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).

<img src="../images/sklearn-ml-map.png" width=400/>

**Note:** Scikit-Learn refers to machine learning models and algorithms as estimators.

In [None]:
# Random Forest Classifier (for classification problems)
from sklearn.ensemble import RandomForestClassifier
# Instantiating a Random Forest Classifier (clf short for classifier)
clf = RandomForestClassifier()

In [2]:
# Random Forest Regressor (for regression problems)
from sklearn.ensemble import RandomForestRegressor
# Instantiating a Random Forest Regressor
model = RandomForestRegressor()

## 3. Fit the model to the data and make a prediction


In [None]:
# All models/estimators have a built-in fit() method
clf.fit(X_train, y_train)

# Once fit is called, you can make predictions using predict()
y_preds = clf.predict(X_test)

# You can also predict with probabilities
y_probs = clf.predict_proba(X_test)
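
To see what `predict()` and `predict_proba()` return, here is a small self-contained sketch on synthetic data. The dataset, `n_estimators` and `random_state` values are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# predict() returns one class label per sample
y_preds = clf.predict(X_test)

# predict_proba() returns one row per sample and one column per class
y_probs = clf.predict_proba(X_test)

print(y_preds.shape, y_probs.shape)
# Each row of probabilities sums to 1
print(np.allclose(y_probs.sum(axis=1), 1.0))
```

The columns of `y_probs` follow the order of `clf.classes_`, so for binary labels 0 and 1, `y_probs[:, 1]` holds the probability of class 1.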

## 4. Evaluate the model


In [None]:
# All models/estimators have a score() method
clf.score(X_test, y_test)

In [None]:
# Evaluating a model using cross-validation is possible with cross_val_score
from sklearn.model_selection import cross_val_score
cross_val_score(clf, X, y, scoring=None) # scoring=None means default score() metric is used

# Evaluate a model with a different scoring method
cross_val_score(clf, X, y, scoring="precision")
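
A runnable sketch of the two calls, again on synthetic stand-in data. The `cv=5` setting and the dataset are assumptions made for illustration. `cross_val_score` clones and fits the estimator on each fold, so the classifier does not need to be fitted first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# scoring=None falls back to the estimator's score() method (accuracy for classifiers)
acc_scores = cross_val_score(clf, X, y, cv=5, scoring=None)

# The same folds, scored with precision instead
precision_scores = cross_val_score(clf, X, y, cv=5, scoring="precision")

# One score per fold; the mean is a common single-number summary
print(acc_scores.mean(), precision_scores.mean())
```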

In [None]:
# Different classification metrics

# Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_preds)

# Receiver Operating Characteristic (ROC) curve/Area under the curve (AUC)
from sklearn.metrics import roc_curve, roc_auc_score
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_probs[:, 1])
roc_auc_score(y_test, y_probs[:, 1]) # AUC is computed from positive-class probabilities rather than hard labels

# Confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_preds)

# Classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_preds)) # returns a formatted string, so print it
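
The metrics above can be tried end to end with the sketch below. It assumes a synthetic binary dataset and a small random forest, neither of which comes from the original notebook. The AUC is computed from the positive-class probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
y_preds = clf.predict(X_test)
y_probs = clf.predict_proba(X_test)

accuracy = accuracy_score(y_test, y_preds)
auc = roc_auc_score(y_test, y_probs[:, 1])    # probability of the positive class
conf_mat = confusion_matrix(y_test, y_preds)  # rows: true labels, columns: predicted labels

print(f"Accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
print(conf_mat)
print(classification_report(y_test, y_preds))
```

The diagonal of the confusion matrix counts correct predictions, so accuracy equals its trace divided by the total number of test samples.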

In [None]:
# Different regression metrics (y_preds here should come from a regressor, e.g. model.predict(X_test))

# R^2 (pronounced r-squared) or coefficient of determination
from sklearn.metrics import r2_score
r2_score(y_test, y_preds)

# Mean absolute error (MAE)
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, y_preds)

# Mean square error (MSE)
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_preds)
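
For the regression metrics, the predictions need to come from a regressor rather than the classifier used earlier. A minimal sketch, assuming synthetic data from `make_regression` (chosen here for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
y_preds = model.predict(X_test)

r2 = r2_score(y_test, y_preds)              # 1.0 is a perfect fit
mae = mean_absolute_error(y_test, y_preds)  # average absolute error, in units of y
mse = mean_squared_error(y_test, y_preds)   # average squared error, penalizes large misses more

print(f"R^2: {r2:.3f}, MAE: {mae:.3f}, MSE: {mse:.3f}")
```

For regressors, `model.score(X_test, y_test)` returns the same R^2 value as `r2_score(y_test, y_preds)`.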

## 5. Improve through experimentation
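
One common way to experiment is to search over hyperparameters. The sketch below uses `GridSearchCV` as an example. The parameter grid, `cv=3` and the synthetic dataset are assumptions for illustration and not necessarily the approach taken in the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical grid: every combination is tried (2 x 2 = 4 candidates)
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [None, 5],
}

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,  # 3-fold cross-validation on the training set
)
grid.fit(X_train, y_train)

print(grid.best_params_)           # best combination found
print(grid.score(X_test, y_test))  # test score of the refitted best model
```

`RandomizedSearchCV` from the same module samples a fixed number of combinations instead, which can be cheaper when the grid is large.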


## 6. Save and reload your trained model
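
A minimal sketch of saving and reloading a fitted model with Python's built-in `pickle` module. The file name and the temporary directory are hypothetical choices made so the example cleans up after itself.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration
    model_path = os.path.join(tmp_dir, "random_forest_model.pkl")

    # Save the fitted model to disk
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)

    # Load it back
    with open(model_path, "rb") as f:
        loaded_clf = pickle.load(f)

# The reloaded model should make the same predictions as the original
same_predictions = np.array_equal(clf.predict(X), loaded_clf.predict(X))
print(same_predictions)
```

`joblib.dump()` and `joblib.load()` are a frequently used alternative for models that hold large NumPy arrays. Only unpickle files from sources you trust, and reload with the same Scikit-Learn version used to save.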


## 7. Putting it all together (not pictured)
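
One possible reading of this step is to chain preprocessing and modelling into a single `Pipeline`, so that fitting, predicting and scoring each take one call. The sketch below is an assumption about what that could look like: the injected missing values, the mean imputer and the step names are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# 1. Get the data ready (synthetic data with roughly 5% of values knocked out)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Pick a model, with a preprocessing step in front of it
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),  # fill missing values with column means
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# 3. Fit and predict
pipeline.fit(X_train, y_train)
y_preds = pipeline.predict(X_test)

# 4. Evaluate
test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the imputer is fitted inside the pipeline, its column means come from the training data only, which avoids leaking information from the test set.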