# Deepchecks tutorial
[Deepchecks](https://docs.deepchecks.com/stable/getting-started/welcome.html) is a Python library for validating ML data and models. In the example below, you'll see how to use Deepchecks to evaluate a trained model before deploying it to production (offline model evaluation).

## Deepchecks example
As in the MLflow tutorial, in this example we'll train an ElasticNet model to predict red wine quality.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

from deepchecks.tabular.checks import TrainTestPerformance, DatasetsSizeComparison
from deepchecks.tabular import Dataset, Suite



Let's first train a model.

In [2]:
# Train a red wine model

csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"

data = pd.read_csv(csv_url, sep=";")

# Split the data into training and test sets. (0.6, 0.4) split.
train, test = train_test_split(data, test_size=0.4, random_state=42)
# The target column is "quality", an integer score (0-10 scale per the dataset description)
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

# Just use hard-coded parameters
alpha = 0.5
l1_ratio = 0.5

lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)

Next, we use Deepchecks to evaluate how the trained model performs on both the training and testing datasets. Specifically, we'll evaluate the model's mean absolute error (MAE) and root mean square error (RMSE) on each dataset.
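For reference, given $n$ samples with true quality scores $y_i$ and model predictions $\hat{y}_i$, the two metrics are defined as:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

Both are non-negative, and lower is better: a perfect model scores 0 on each.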

According to the Deepchecks documentation, Deepchecks follows the convention that greater metric values represent better performance. It is therefore recommended to use only metrics that follow this convention, for example, negative MAE instead of MAE.
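To see why the sign matters, here is a plain-Python sketch (no Deepchecks involved) that computes MAE and RMSE by hand for two hypothetical models and then negates them. The data and the helper names `mae`, `rmse`, and `neg_mae` are illustrative assumptions, not Deepchecks functions:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat| (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def neg_mae(y_true, y_pred):
    """Negated MAE, so that a greater value means a better model."""
    return -mae(y_true, y_pred)

# Hypothetical quality scores and predictions from two hypothetical models
y_true = [5, 6, 7, 5]
predictions = {
    "model_a": [5.2, 5.9, 6.5, 5.1],  # close to the true scores
    "model_b": [6.0, 5.0, 6.0, 6.0],  # off by exactly 1 on every sample
}

# With "greater is better" metrics, the best model is simply the one with the max score
best = max(predictions, key=lambda name: neg_mae(y_true, predictions[name]))
for name, y_pred in predictions.items():
    print(name, "neg MAE:", round(neg_mae(y_true, y_pred), 3),
          "neg RMSE:", round(-rmse(y_true, y_pred), 3))
print("best:", best)
```

Because every negated score points the same way, a tool can rank models or compare train and test results with a single rule (larger wins), which is the motivation behind the convention.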

In [3]:
# Convert the training and testing dataset into a format that Deepchecks can handle
train_dataset = Dataset(df=train_x, label=train_y, cat_features=[])
test_dataset = Dataset(df=test_x, label=test_y, cat_features=[])

# Define a test; the "scorers" parameter specifies the metrics (negative MAE and negative RMSE) to use
check = TrainTestPerformance(scorers=["neg_mae", "neg_rmse"])
result = check.run(train_dataset, test_dataset, lr)
result.show()


Expected output:

<img src="../images/deepchecks-single-test.png" width=800>

Suppose we have a baseline model with an MAE and RMSE of 0.9 on the same testing dataset. From the evaluation result, we may draw two conclusions:
1) This model seems to make better predictions than the baseline model.
2) The MAE and RMSE on the testing dataset are not much higher than on the training dataset, which is a good signal: the model doesn't appear to be overfitting the training dataset.
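Both conclusions boil down to simple comparisons. The sketch below expresses them in plain Python; `beats_baseline`, `relative_degradation`, the 10% tolerance, and the train/test MAE values are hypothetical placeholders, not numbers read from the output above:

```python
def beats_baseline(test_error: float, baseline_error: float) -> bool:
    """Conclusion 1: a lower test error than the baseline suggests better predictions."""
    return test_error < baseline_error

def relative_degradation(train_error: float, test_error: float) -> float:
    """Conclusion 2: relative increase in error from training to testing data.

    Values near 0 mean the model does about as well on unseen data;
    a large positive value hints at overfitting.
    """
    return (test_error - train_error) / train_error

# Placeholder numbers for illustration only
train_mae, test_mae = 0.60, 0.63
baseline_mae = 0.9        # the baseline value assumed in the text
max_degradation = 0.10    # hypothetical tolerance of 10%

gap = relative_degradation(train_mae, test_mae)
print("beats baseline:", beats_baseline(test_mae, baseline_mae))
print("relative degradation:", round(gap, 3))
print("looks like overfitting:", gap > max_degradation)
```

Using a relative gap rather than an absolute one keeps the tolerance meaningful regardless of the scale of the target.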

Besides running a single test, Deepchecks provides a convenient way of grouping multiple tests into a collection called a Suite. In the example below, we define a Suite containing two tests: 1) the familiar MAE and RMSE tests on both the training and testing datasets, and 2) a comparison of the size of the testing dataset against the training dataset.
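Conceptually, a suite runs several checks against the same inputs and gathers their results in one place. The plain-Python sketch below models that idea with a hypothetical `ToySuite` class and two hypothetical checks on lists of labels; it is only an illustration and not how Deepchecks implements `Suite`:

```python
from typing import Any, Callable, Dict, List

# Hypothetical: a "check" is any function taking (train, test) and returning a result dict
Check = Callable[[List[float], List[float]], Dict[str, Any]]

class ToySuite:
    """Toy model of a suite: a named collection of checks run on the same inputs."""

    def __init__(self, name: str, *checks: Check) -> None:
        self.name = name
        self.checks = list(checks)

    def run(self, train: List[float], test: List[float]) -> Dict[str, Dict[str, Any]]:
        # Run every check on the same data and collect the results by check name
        return {check.__name__: check(train, test) for check in self.checks}

def row_counts(train, test):
    """Number of samples in each dataset."""
    return {"train": len(train), "test": len(test)}

def label_ranges(train, test):
    """Smallest and largest label in each dataset."""
    return {"train": (min(train), max(train)), "test": (min(test), max(test))}

# More checks could be passed here, just like in the Deepchecks Suite below
suite = ToySuite("Toy suite", row_counts, label_ranges)
results = suite.run([5, 6, 7, 5], [6, 5])
print(results)
```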

In [4]:
# Specify a suite
suite = Suite("Example test suite", 
    TrainTestPerformance(scorers=["neg_mae", "neg_rmse"]),
    DatasetsSizeComparison()
    # More tests can be added here
)
result = suite.run(model=lr, train_dataset=train_dataset, test_dataset=test_dataset)
result.show()


Expected output:

<img src="../images/deepchecks-suite.png" width=800>

If you click the "Other" section in the output above, you'll first see the same MAE and RMSE results that the single `TrainTestPerformance` test produced earlier. You'll also see the result of the dataset size comparison between the training and testing datasets:

<img src="../images/deepchecks-suite-dataset-size.png" width=500>
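The sizes in this comparison follow from the 0.6/0.4 split in the training cell. Assuming the CSV downloads with all 1,599 rows of the red wine dataset, the arithmetic below should match what the size comparison reports. `split_sizes` is a hypothetical helper that mirrors the rounding scikit-learn's `train_test_split` applies to a float `test_size` (test rows rounded up, remaining rows go to training):

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a float test_size, rounding the test set up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(1599, 0.4)
print("train rows:", n_train)                      # 959
print("test rows:", n_test)                        # 640
print("test/train ratio:", round(n_test / n_train, 3))
```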

The "Didn't pass" and "Passed" sections are empty because we haven't specified any [conditions](https://docs.deepchecks.com/stable/general/usage/customizations/auto_examples/plot_configure_check_conditions.html) that determine whether a test passes. You'll explore test conditions further in this week's assignments.
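To make the role of conditions concrete, here is a plain-Python toy model of how results could be sorted into the three sections, where a condition is simply a predicate on a test's result. The function `group_results`, the result names, the values, and the thresholds are all hypothetical; the actual Deepchecks condition API is described in the linked documentation:

```python
from typing import Callable, Dict, List

def group_results(results: Dict[str, float],
                  conditions: Dict[str, Callable[[float], bool]]) -> Dict[str, List[str]]:
    """Sort results into "Passed", "Didn't pass", and "Other".

    A result with no condition attached lands in "Other"; otherwise the
    condition decides between "Passed" and "Didn't pass".
    """
    groups: Dict[str, List[str]] = {"Passed": [], "Didn't pass": [], "Other": []}
    for name, value in results.items():
        condition = conditions.get(name)
        if condition is None:
            groups["Other"].append(name)
        elif condition(value):
            groups["Passed"].append(name)
        else:
            groups["Didn't pass"].append(name)
    return groups

# Hypothetical result values, for illustration only
results = {"test_neg_mae": -0.63, "test_train_size_ratio": 0.667}

# No conditions, as in this notebook: everything ends up in "Other"
no_conditions = group_results(results, {})
print(no_conditions)

# Hypothetical conditions: beat the 0.9 baseline MAE, and keep a test set
# of at least 70% of the training size
conditions = {
    "test_neg_mae": lambda v: v > -0.9,
    "test_train_size_ratio": lambda v: v >= 0.7,
}
with_conditions = group_results(results, conditions)
print(with_conditions)
```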