A generic, reusable PyTorch Lightning pipeline for training classification models on tabular data. This package provides a fully config-driven framework that can be used for any classification task by simply providing a YAML configuration file.
- 🚀 Fully Config-Driven: All settings (features, hyperparameters, paths) controlled via YAML files
- 🔄 Generic & Reusable: Use the same codebase for any classification task (stress levels, sentiment, quality ratings, etc.)
- 🤖 Auto-Dimension Detection: Automatically calculates input dimensions and number of classes from feature lists and target column
- 📊 Categorical Target Support: Automatically handles both integer and categorical string targets (e.g., "good", "better", "best" or "yes", "no")
- 🎯 Production-Ready: Exports models to ONNX format with preprocessors and label encoders for easy deployment
- ⚡ PyTorch Lightning: Built on PyTorch Lightning for scalable, professional ML training
- 📈 Comprehensive Metrics: Tracks Accuracy, F1-Score, Precision, and Recall (macro-averaged)
Install from PyPI:
pip install cph-classification
Or install from source:
git clone https://github.com/imchandra11/cph-classification.git
cd cph-classification
pip install .
Create a CSV file with your features and target column. For example, data/myproject.csv:
feature1,feature2,target
value1,123.45,class_a
value2,234.56,class_b
...
Create a YAML configuration file, e.g., configs/myproject.yaml:
# My Classification Project Configuration
seed_everything: true
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        filename: "{epoch}-{val_loss:.2f}.best"
        monitor: "val_loss"
        mode: "min"
        save_top_k: 1
    - class_path: cph_classification.classification.callbacks.ONNXExportCallback
      init_args:
        output_dir: "models"
        model_name: "my_model"
        input_dim: null  # Auto-detected
  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: "lightning_logs"
      name: "MyProjectTraining"
  max_epochs: 30
  accelerator: auto
  devices: auto
  precision: 16-mixed
model:
  class_path: cph_classification.classification.modelmodule.ModelModuleCLS
  init_args:
    lr: 0.0001
    model:
      class_path: cph_classification.classification.modelfactory.ClassificationModel
      init_args:
        input_dim: 0  # Auto-set from datamodule
        num_classes: 0  # Auto-set from datamodule
        hidden_layers: [128, 64, 32]
        dropout_rates: [0.15, 0.1, 0.05]
        activation: "relu"
optimizer:
  class_path: torch.optim.Adam
  init_args:
    lr: 0.001
    weight_decay: 0.00001
data:
  class_path: cph_classification.classification.datamodule.DataModuleCLS
  init_args:
    csv_path: "data/myproject.csv"
    batch_size: 256
    num_workers: 0
    val_split: 0.2
    random_seed: 42
    categorical_cols:
      - feature1
    numeric_cols:
      - feature2
    target_col: "target"  # Can be integers or categorical strings
    save_preprocessor: true
    preprocessor_path: "models/preprocessor.joblib"
fit:
  ckpt_path: null  # Set to a checkpoint path to resume training
test:
  ckpt_path: best  # Use "best" or "last" checkpoint
Train your model with a single command:
# Train and test (fit+test workflow)
cph-classification --config configs/myproject.yaml
# Or use standard Lightning CLI subcommands
cph-classification fit --config configs/myproject.yaml
cph-classification test --config configs/myproject.yaml
That's it! The model will be trained and saved to the path specified in your config file.
Key Parameters:
- csv_path: Path to your CSV file
- batch_size: Batch size for training (default: 256)
- val_split: Validation split ratio (0.0 to 1.0, default: 0.2)
- categorical_cols: List of categorical feature column names
- numeric_cols: List of numeric feature column names
- target_col: Name of the target column to predict (can be integers or strings)
- preprocessor_path: Where to save/load the preprocessor
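To make the roles of val_split and random_seed concrete, here is a minimal sketch of a seeded random split. It is an illustration only: the helper name split_indices is hypothetical, and the package's own splitting logic (for example, whether it stratifies by class) is not described here and may differ.

```python
import random


def split_indices(n_rows, val_split=0.2, seed=42):
    """Return (train_idx, val_idx) for a reproducible random split.

    Hypothetical helper for illustration; not part of cph-classification.
    """
    if not 0.0 < val_split < 1.0:
        raise ValueError("val_split must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * val_split))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(n_rows=10, val_split=0.2, seed=42)
print(len(train_idx), len(val_idx))
```

Using the same seed yields the same partition on every run, which is what makes validation metrics comparable across experiments.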
Preprocessing:
- Categorical columns: Automatically one-hot encoded (with drop='first')
- Numeric columns: Automatically standardized using StandardScaler
- Target column:
  - If integers: Used as-is (converted to 0-indexed if needed)
  - If strings: Automatically encoded to 0-indexed integers using LabelEncoder
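The preprocessing rules above determine the auto-detected dimensions. As a rough sketch of the arithmetic (the helper names are hypothetical, and the package's internal implementation may differ): with drop='first', a categorical column with k distinct values contributes k - 1 columns, each numeric column contributes one, and the number of classes is the number of distinct target values.

```python
def infer_input_dim(categorical_levels, n_numeric):
    """Feature-matrix width after one-hot encoding (drop='first') and scaling.

    categorical_levels maps each categorical column name to its observed values.
    Hypothetical helper for illustration only.
    """
    one_hot_width = sum(
        max(len(set(levels)) - 1, 0) for levels in categorical_levels.values()
    )
    return one_hot_width + n_numeric


def infer_num_classes(targets):
    """Number of distinct target values."""
    return len(set(targets))


levels = {"feature1": ["value1", "value2", "value3"]}
input_dim = infer_input_dim(levels, n_numeric=1)  # 2 one-hot columns + 1 numeric
num_classes = infer_num_classes(["class_a", "class_b", "class_a"])
print(input_dim, num_classes)  # 3 2
```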
Key Parameters:
- hidden_layers: List of hidden layer sizes, e.g., [128, 64, 32]
- dropout_rates: List of dropout rates matching hidden layers, e.g., [0.15, 0.1, 0.05]
- activation: Activation function ("relu", "tanh", "gelu", "sigmoid", "leaky_relu", "elu")
- input_dim: Automatically set from datamodule (set to 0 in config)
- num_classes: Automatically set from datamodule (set to 0 in config)
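Before training, it can help to sanity-check an architecture from the config. The sketch below is not the package's model code: plan_layers is a hypothetical helper, and it assumes a plain fully connected network with biases and no normalization layers. It checks that hidden_layers and dropout_rates have matching lengths, then lists the linear layer shapes and the resulting parameter count.

```python
def plan_layers(input_dim, hidden_layers, dropout_rates, num_classes):
    """Return the (in, out) shape of each linear layer and the parameter count.

    Hypothetical helper; assumes dense layers with bias terms only.
    """
    if len(hidden_layers) != len(dropout_rates):
        raise ValueError("dropout_rates must have one entry per hidden layer")
    if any(not 0.0 <= rate < 1.0 for rate in dropout_rates):
        raise ValueError("each dropout rate must be in [0, 1)")
    sizes = [input_dim, *hidden_layers, num_classes]
    shapes = list(zip(sizes[:-1], sizes[1:]))
    # Each linear layer holds in*out weights plus out biases.
    n_params = sum(n_in * n_out + n_out for n_in, n_out in shapes)
    return shapes, n_params


shapes, n_params = plan_layers(
    input_dim=3,
    hidden_layers=[128, 64, 32],
    dropout_rates=[0.15, 0.1, 0.05],
    num_classes=2,
)
print(shapes)    # [(3, 128), (128, 64), (64, 32), (32, 2)]
print(n_params)  # 10914
```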
After training, you'll find:
- Models Directory (models/):
  - my_model.onnx: ONNX model for inference
  - preprocessor.joblib: Fitted preprocessor for data transformation
  - label_encoder.joblib: Label encoder (only if target was categorical strings)
- Checkpoints (lightning_logs/MyProjectTraining/version_X/checkpoints/):
  - epoch-X-val_loss=Y.best.ckpt: Best model checkpoint (based on validation loss)
  - epoch-X.last.ckpt: Last epoch checkpoint
- Training Logs (lightning_logs/):
  - TensorBoard logs for visualization
After training, use the exported ONNX model for predictions:
import os
import joblib
import numpy as np
import onnxruntime as ort
import pandas as pd
# Load the fitted preprocessor
preprocessor = joblib.load("models/preprocessor.joblib")
# Load the label encoder only if it was saved (i.e., the target was categorical strings)
encoder_path = "models/label_encoder.joblib"
label_encoder = joblib.load(encoder_path) if os.path.exists(encoder_path) else None
# Load ONNX model
session = ort.InferenceSession("models/my_model.onnx")
# Prepare input data
input_data = pd.DataFrame({
    'feature1': ['value1'],
    'feature2': [123.45],
})
# Transform data
feature_cols = ['feature1', 'feature2']
transformed = preprocessor.transform(input_data[feature_cols])
# Convert to a dense array if the preprocessor returned a sparse matrix
if hasattr(transformed, "toarray"):
    transformed = transformed.toarray()
# Predict
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: transformed.astype(np.float32)})
predicted_class_idx = int(np.argmax(output[0][0]))
# Decode back to the original label (if a label encoder exists)
if label_encoder is not None:
    predicted_class = label_encoder.inverse_transform([predicted_class_idx])[0]
    print(f"Predicted class: {predicted_class}")
else:
    print(f"Predicted class index: {predicted_class_idx}")
To view the training logs, launch TensorBoard:
tensorboard --logdir lightning_logs
Then open http://localhost:6006 in your browser.
Metrics Tracked:
- train_loss, val_loss, test_loss: CrossEntropyLoss
- train_acc, val_acc, test_acc: Accuracy (macro-averaged)
- train_f1, val_f1, test_f1: F1-Score (macro-averaged)
- train_precision, val_precision, test_precision: Precision (macro-averaged)
- train_recall, val_recall, test_recall: Recall (macro-averaged)
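For readers unfamiliar with macro averaging, the sketch below computes precision, recall, and F1 per class and then takes the unweighted mean over classes, so every class counts equally regardless of how many samples it has. This is a plain-Python illustration with a hypothetical function name, not the metric code the package uses; conventions for classes with no predictions or no samples (scored as 0.0 here) can differ between libraries.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall, and F1 from integer class labels."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    # Unweighted mean over classes is what "macro" refers to.
    return {
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


scores = macro_metrics(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 1, 1, 1, 2, 0],
    num_classes=3,
)
print({name: round(value, 4) for name, value in scores.items()})
```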
If your target column contains integers (e.g., 1, 2, 3, 4, 5):
data:
  init_args:
    target_col: "stress_level"  # Contains: 1, 2, 3, 4, 5
The pipeline will automatically convert to 0-indexed labels if needed (0, 1, 2, 3, 4).
If your target column contains categorical strings (e.g., "low", "medium", "high"):
data:
  init_args:
    target_col: "quality"  # Contains: "low", "medium", "high"
The pipeline will automatically encode to integers (0, 1, 2) and save the label encoder for inference.
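One detail worth knowing when reading predicted indices: scikit-learn's LabelEncoder assigns indices in sorted order of the class values, not in order of appearance or semantic rank. The sketch below mimics that behavior in plain Python (fit_label_mapping is a hypothetical helper). Treating integer targets the same way is an assumption here; the package's exact rule for "0-indexed if needed" is not spelled out above.

```python
def fit_label_mapping(targets):
    """Map each distinct target value to a 0-based index in sorted order."""
    classes = sorted(set(targets))
    to_index = {label: index for index, label in enumerate(classes)}
    return classes, to_index


# String targets: alphabetical order, so "high" gets 0, not "low".
classes, to_index = fit_label_mapping(["low", "high", "medium", "low"])
encoded = [to_index[t] for t in ["low", "high", "medium", "low"]]
print(classes)  # ['high', 'low', 'medium']
print(encoded)  # [1, 0, 2, 1]

# Integer targets 1..5 map to 0..4 under the same rule (an assumption).
_, int_index = fit_label_mapping([1, 2, 3, 4, 5])
print(int_index)  # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}

# Decoding a predicted index back to its label.
print(classes[2])  # medium
```

When decoding model output, use the saved label encoder rather than assuming that index 0 means "low".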
You can use multiple config files for different environments:
# Main config + local overrides
cph-classification --config configs/myproject.yaml --config configs/myproject.local.yaml
The local config will override values from the main config.
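Conceptually, later config files are layered on top of earlier ones: nested keys are merged and leaf values from the later file win. The sketch below approximates that with a recursive dictionary merge. It is an illustration under that assumption, not the CLI's actual merge code, whose rules (for example, around lists or a changed class_path) may differ; deep_merge is a hypothetical helper.

```python
def deep_merge(base, override):
    """Return a new dict where values in override replace those in base.

    Nested dicts are merged key by key; everything else is replaced whole.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


main_cfg = {
    "model": {"init_args": {"lr": 0.0001}},
    "data": {"init_args": {"csv_path": "data/myproject.csv", "batch_size": 256}},
}
local_cfg = {
    "model": {"init_args": {"lr": 0.0005}},
    "data": {"init_args": {"batch_size": 512}},
}

effective = deep_merge(main_cfg, local_cfg)
print(effective["model"]["init_args"]["lr"])         # 0.0005
print(effective["data"]["init_args"]["batch_size"])  # 512
print(effective["data"]["init_args"]["csv_path"])    # data/myproject.csv
```

Keys that the local file does not mention, such as csv_path, keep their values from the main config.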
To resume training from a checkpoint:
cph-classification fit \
  --config configs/myproject.yaml \
  --fit.ckpt_path "lightning_logs/MyProjectTraining/version_0/checkpoints/epoch-10.last.ckpt"
Override hyperparameters via command line or config files:
# myproject.local.yaml
model:
  init_args:
    lr: 0.0005
data:
  init_args:
    batch_size: 512
To customize the network architecture:
model:
  init_args:
    model:
      init_args:
        hidden_layers: [256, 128, 64, 32]  # Deeper network
        dropout_rates: [0.2, 0.15, 0.1, 0.05]
        activation: "gelu"
Requirements:
- Python >= 3.8
- PyTorch >= 2.0.0
- PyTorch Lightning >= 2.1.0
- scikit-learn >= 1.3.0
- Other dependencies are automatically installed with the package
MIT License - see LICENSE file for details.
chandra
- Email: chandra385123@gmail.com
- GitHub: @imchandra11
- GitHub: https://github.com/imchandra11/cph-classification
- PyPI: https://pypi.org/project/cph-classification/
Contributions are welcome! Please feel free to submit a Pull Request.
For issues or questions:
- Check the configuration file syntax
- Verify CSV file format and column names
- Check target column type (integers or categorical strings)
- Review TensorBoard logs for training insights
- Open an issue on GitHub
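Several of the checks above can be automated before a training run. The sketch below uses only the standard library to confirm that the configured columns exist, that numeric columns parse as numbers, and that no target is empty. The function name check_csv and the message formats are hypothetical and not part of the package.

```python
import csv
import io


def check_csv(csv_file, categorical_cols, numeric_cols, target_col):
    """Return a list of human-readable problems found in an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    required = [*categorical_cols, *numeric_cols, target_col]
    missing = [col for col in required if col not in header]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in numeric_cols:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {col!r} is not numeric: {row[col]!r}"
                )
        if not (row[target_col] or "").strip():
            problems.append(f"line {line_no}: empty target")
    return problems


sample = io.StringIO(
    "feature1,feature2,target\n"
    "value1,123.45,class_a\n"
    "value2,oops,class_b\n"
)
problems = check_csv(sample, ["feature1"], ["feature2"], "target")
print(problems)  # ["line 3: 'feature2' is not numeric: 'oops'"]
```

For a file on disk, pass an open handle instead, e.g. open("data/myproject.csv", newline="").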
If you use this package in your research, please cite:
@software{cph_classification,
title = {cph-classification: A Generic PyTorch Lightning Pipeline for Classification},
author = {chandra},
year = {2025},
url = {https://github.com/imchandra11/cph-classification}
}