# Tabular ML End-to-End Demo (Object-Oriented Edition)

This notebook provides a compact walkthrough of the tabular machine learning
workflow using the :class:`TabulaML` utility class. Each stage writes
artefacts to disk so the notebook can focus on orchestration and inspection.


## 1. Environment setup

Import the workflow helper and create an instance that controls where
outputs such as previews, metrics, and trained models are stored.


In [None]:
from pathlib import Path

import pandas as pd

from tabula_ml import TabulaML

OUTPUT_DIR = Path("tmp/demo_run")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

pipeline = TabulaML(output_dir=OUTPUT_DIR)
print(f"Artefacts will be written to: {OUTPUT_DIR.resolve()}")
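This notebook does not show how ``TabulaML`` organises files inside ``OUTPUT_DIR``. Here is a minimal sketch that assumes one sub-folder per stage. ``ArtefactStore`` and ``path_for`` are hypothetical names and are not part of ``tabula_ml``.

```python
from pathlib import Path


class ArtefactStore:
    """Hypothetical helper (not the tabula_ml API): one sub-folder per workflow stage."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)
        # Create the root folder up front, mirroring OUTPUT_DIR.mkdir(...) above.
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def path_for(self, stage, filename):
        """Return output_dir/stage/filename, creating the stage folder if needed."""
        stage_dir = self.output_dir / stage
        stage_dir.mkdir(parents=True, exist_ok=True)
        return stage_dir / filename
```

With this sketch, ``ArtefactStore(OUTPUT_DIR).path_for("eda", "preview.csv")`` would resolve to ``tmp/demo_run/eda/preview.csv``. Grouping files by stage keeps previews, metrics, and models easy to find.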

## 2. Configure the workflow

Update the configuration dictionary to fine-tune preprocessing choices,
model export behaviour, and the target column. The helper returns the merged
configuration for quick inspection.


In [None]:
CONFIG = {
    "target_column": "Survived",
    "test_size": 0.25,
    "scaling": "standard",
    "categorical_encoding": "onehot",
    "outlier_method": None,
    "allow_export": False,
}

merged_config = pipeline.set_config(**CONFIG)
merged_config
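The helper returns the merged configuration, but its defaults and accepted values are not listed here. The sketch below shows one way such merging could work: user overrides are layered on top of defaults, and unknown keys or invalid values are rejected. ``DEFAULTS``, ``CHOICES``, and ``merge_config`` are all hypothetical.

```python
from typing import Any

# Hypothetical defaults; TabulaML's real defaults are not shown in this notebook.
DEFAULTS: dict[str, Any] = {
    "target_column": None,
    "test_size": 0.2,
    "scaling": "standard",
    "categorical_encoding": "onehot",
    "outlier_method": None,
    "allow_export": False,
}

# Hypothetical allowed values for the option-style keys.
CHOICES: dict[str, set] = {
    "scaling": {"standard", "minmax", None},
    "categorical_encoding": {"onehot", "ordinal"},
    "outlier_method": {None, "iqr", "zscore"},
}


def merge_config(defaults: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Return a new dict with overrides applied on top of defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Catches typos such as "scalng" instead of silently ignoring them.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    merged = {**defaults, **overrides}  # defaults dict itself is left untouched
    for key, allowed in CHOICES.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key}={merged[key]!r} is not one of {allowed}")
    if not 0 < merged["test_size"] < 1:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    return merged
```

For example, ``merge_config(DEFAULTS, target_column="Survived", test_size=0.25)`` keeps ``scaling="standard"`` from the defaults and overrides ``test_size``.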

## 3. Load data and resolve target

Provide a path to a CSV (or dataframe) and let the helper infer or confirm
the target column. The preview CSV is saved to ``OUTPUT_DIR`` for reference.


In [None]:
DATA_PATH = Path("data/sample_datasets/titanic_sample.csv")
raw_df = pipeline.load_data(DATA_PATH, name="Titanic sample")
X, y = pipeline.resolve_target()

print(f"Loaded dataset with shape {raw_df.shape} and target column '{pipeline.config['target_column']}'.")
print("Dropped redundant columns:", pipeline.state["dropped_columns"])
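The notebook does not specify how the helper infers the target or decides which columns are redundant. This sketch assumes two heuristics: it falls back to the last column when no target is configured, and it treats a column as redundant if it is constant or an exact copy of an earlier column. ``resolve_target`` here is a hypothetical stand-in and does not reproduce ``pipeline.resolve_target``. It uses only the standard library, and the rows are toy values.

```python
import csv
import io


def resolve_target(rows, target=None):
    """Split csv.DictReader rows into (X, y, dropped_columns).

    Hypothetical heuristics: use the last column when no target is given,
    and drop constant columns or exact duplicates of an earlier kept column.
    """
    columns = list(rows[0].keys())
    if target is None:
        target = columns[-1]
    elif target not in columns:
        raise KeyError(f"Target column {target!r} not found in {columns}")

    dropped, seen_values = [], set()
    for col in columns:
        if col == target:
            continue
        values = tuple(row[col] for row in rows)
        if len(set(values)) == 1 or values in seen_values:
            dropped.append(col)  # constant, or a copy of a column already kept
        else:
            seen_values.add(values)

    features = [col for col in columns if col != target and col not in dropped]
    X = [{col: row[col] for col in features} for row in rows]
    y = [row[target] for row in rows]
    return X, y, dropped


# Toy rows made up for illustration (not taken from the Titanic sample file).
SAMPLE = "PassengerId,Pclass,Pclass_copy,Ship,Survived\n1,3,3,Titanic,0\n2,1,1,Titanic,1\n3,3,3,Titanic,1\n"
toy_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
toy_X, toy_y, toy_dropped = resolve_target(toy_rows, target="Survived")
print(toy_dropped)  # ['Pclass_copy', 'Ship']
```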

## 4. Explore the dataset

The helper produces summary tables for numeric and categorical variables and
a missingness report. Artefact paths are returned for quick opening.


In [None]:
eda_outputs = pipeline.perform_eda()
eda_outputs
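The format of the missingness report that ``perform_eda`` writes is not shown. A typical report gives a missing count and fraction for each column. The sketch below is an illustrative stand-in using the standard library. It assumes rows from ``csv.DictReader``, which reads an empty field as ``""``.

```python
import csv
import io


def missingness_report(rows):
    """Return (column, missing_count, missing_fraction) tuples, most-missing first.

    Illustrative only; assumes an empty string or None marks a missing value.
    """
    n_rows = len(rows)
    report = []
    for col in rows[0]:
        missing = sum(1 for row in rows if row[col] in ("", None))
        report.append((col, missing, missing / n_rows))
    # list.sort is stable (also with reverse=True), so ties keep column order.
    report.sort(key=lambda item: item[1], reverse=True)
    return report


# Toy rows made up for illustration.
toy_rows = list(csv.DictReader(io.StringIO("Age,Cabin,Fare\n22,,7.25\n,C85,71.28\n,,7.92\n")))
for col, count, frac in missingness_report(toy_rows):
    print(f"{col}: {count} missing ({frac:.0%})")
```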

## 5. Optional: detect outliers

Choose the detection strategy with the ``outlier_method`` key in ``CONFIG``
(it is ``None`` in this run). When a method is set, the helper returns a
boolean mask with one entry per row, which you can use to inspect or filter rows.


In [None]:
outlier_mask = pipeline.detect_outliers()
print(f"Detected {outlier_mask.sum()} potential outliers out of {len(outlier_mask)} rows using method: {pipeline.config['outlier_method']}.")
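The accepted values for ``outlier_method`` are not listed in this notebook. A common choice is Tukey's interquartile-range (IQR) rule. The sketch below is a hypothetical stand-in, not TabulaML's implementation. It builds a boolean mask of the same kind for one numeric column, flagging values that fall more than ``k`` IQRs outside the quartiles.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


toy_fares = [10, 11, 12, 13, 14, 15, 100]  # made-up numbers
toy_mask = iqr_outlier_mask(toy_fares)
print(f"Detected {sum(toy_mask)} potential outliers out of {len(toy_mask)} values.")
```

The factor ``k=1.5`` is the conventional default for Tukey's fences. Raising it makes the rule more permissive.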

## 6. Split, train, and evaluate models

The training stage automatically selects estimators suited for the task
(classification for Titanic). Metrics are written to disk and surfaced here.
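The notebook does not explain how the task type is detected. Here is a minimal sketch of one plausible heuristic. It treats the target as a classification problem when its labels are non-numeric or form a small set of whole numbers. ``infer_task`` and its ``max_classes=20`` threshold are hypothetical.

```python
def infer_task(target_values, max_classes=20):
    """Guess "classification" or "regression" from target values (hypothetical heuristic)."""
    distinct = set(target_values)
    try:
        numeric = [float(v) for v in distinct]
    except (TypeError, ValueError):
        return "classification"  # non-numeric labels such as "yes"/"no"
    all_integers = all(x.is_integer() for x in numeric)
    if all_integers and len(distinct) <= max_classes:
        return "classification"  # e.g. 0/1 survival labels
    return "regression"
```

For Titanic-style labels such as ``["0", "1", "1", "0"]``, this returns ``"classification"``. A continuous target such as fares falls through to ``"regression"``.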


In [None]:
X_train, X_test, y_train, y_test = pipeline.split_data()

trained_models = pipeline.train_models()
print(f"Trained {[model.name for model in trained_models]}")

evaluated = pipeline.evaluate_models()
{model.name: model.metrics for model in evaluated}
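``split_data`` presumably honours ``test_size=0.25`` from ``CONFIG``, but the notebook does not say whether the split is stratified. The sketch below is a hypothetical stand-in that assumes stratification. It samples the same fraction from each class so the class balance is similar in the train and test sets. Its return order matches ``X_train, X_test, y_train, y_test``.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_size=0.25, seed=42):
    """Return X_train, X_test, y_train, y_test with per-class sampling (illustrative)."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    test_idx = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # same fraction from every class
        test_idx.extend(indices[:n_test])

    test_set = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in test_set]
    test_idx = sorted(test_idx)
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )
```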

## 7. Summarise the run

Collate the high-level details of the run. The summary is also saved as a JSON
file that points to the generated artefacts, which supports reproducibility.


In [None]:
summary = pipeline.summarise_run()
summary
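The notebook does not show the schema of the JSON summary. Here is a minimal sketch that assumes the summary stores the configuration, per-model metrics, and artefact paths. ``write_run_summary`` and the ``summary.json`` filename are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_run_summary(output_dir, config, metrics, artefacts):
    """Write a hypothetical summary.json and return its path."""
    output_dir = Path(output_dir)
    summary = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "config": config,
        "metrics": metrics,
        # Relative paths keep the pointers valid if the whole folder is moved.
        "artefacts": {
            name: Path(p).relative_to(output_dir).as_posix() for name, p in artefacts.items()
        },
    }
    path = output_dir / "summary.json"
    path.write_text(json.dumps(summary, indent=2))
    return path
```

Storing paths relative to ``OUTPUT_DIR`` means the run folder can be archived or shared without its artefact pointers breaking.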