# Quickstart for Classification Models

This notebook provides a quick introduction to documenting a model with the ValidMind developer framework. We use a sample dataset provided by the library to train a simple classification model.

## Initialize ValidMind

In [1]:
%load_ext dotenv
%dotenv dev.env
%matplotlib inline

import validmind as vm
import xgboost as xgb

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  project = "clhmm220k0006558hv20npy8z"
)

Connected to ValidMind. Project: Customer Churn Model - Initial Validation (clhmm220k0006558hv20npy8z)


## Load the Demo Dataset

In [2]:
# You can also import taiwan_credit like this:
# from validmind.datasets.classification import taiwan_credit as demo_dataset
from validmind.datasets.classification import customer_churn as demo_dataset

df = demo_dataset.load_data()

In [3]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels
)

Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...


## Run the Data Validation Test Plan

In [4]:
tabular_suite = vm.run_test_suite("tabular_dataset", dataset=vm_dataset)


## Run the Model Validation Test Plan

We first need to preprocess the dataset and produce the training, validation, and test splits.
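
To make the idea of a three-way split concrete, here is a minimal sketch on a toy pandas DataFrame. The helper `three_way_split`, its fraction parameters, and the toy column names are hypothetical and exist only for illustration. This is not necessarily how `demo_dataset.preprocess` works internally; it only mirrors the order of its return values (train, validation, test).

```python
import pandas as pd


def three_way_split(df, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the rows, then cut them into train, validation, and test frames.

    Hypothetical helper for illustration only; not part of ValidMind.
    """
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled.iloc[:n_test]
    val = shuffled.iloc[n_test:n_test + n_val]
    train = shuffled.iloc[n_test + n_val:]
    return train, val, test


# Toy data: 100 rows with one feature and a binary target
toy_df = pd.DataFrame({
    "feature": range(100),
    "target": [i % 2 for i in range(100)],
})

toy_train, toy_val, toy_test = three_way_split(toy_df)
print(len(toy_train), len(toy_val), len(toy_test))  # 60 20 20
```

The validation split is the one monitored during training, while the test split stays untouched until the final evaluation.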

### Preprocess the Raw Dataset

In [None]:
train_df, validation_df, test_df = demo_dataset.preprocess(df)

In [None]:
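# Separate the features from the target for the training and validation splits
x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

# Stop training once the validation metric has not improved for 10 rounds
model = xgb.XGBClassifier(early_stopping_rounds=10)
# Track classification error, log loss, and AUC on the evaluation set
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],  # validation split monitored for early stopping
    verbose=False,
)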

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

### Run the Binary Classification Test Plan

In [None]:
model_suite = vm.run_test_suite("binary_classifier_model_validation", model=vm_model)