Trava is a framework that helps you train models, compare them, and track parameters & metrics along the way. It works with tabular data only.
pip install trava
While working on a project, we often experiment with different models while looking at the same metrics. Some of those metrics can be represented as a single number, while others require graphs to make sense. It's also useful to save the metrics somewhere for future analysis, and the list goes on.
# in this case, sk and sk_proba are just wrappers around sklearn's metrics:
# sk_proba is for metrics computed on predicted probabilities,
# sk is for metrics computed on predicted labels.
# You can use any metric implementation you want.
# (sk, sk_proba, TravaSV, LoggerHandler, SplitResult etc. come from trava;
# see the guide notebook for the exact import paths.)
from sklearn.metrics import log_loss, roc_auc_score, recall_score, precision_score

scorers = [
    sk_proba(log_loss),
    sk_proba(roc_auc_score),
    sk(recall_score),
    sk(precision_score),
]
# let's log the metrics
logger_handler = LoggerHandler(scorers=scorers)
trava = TravaSV(results_handlers=[logger_handler])
# prepare your data
X_train, X_test, y_train, y_test = ...
split_result = SplitResult(X_train=X_train,
                           y_train=y_train,
                           X_test=X_test,
                           y_test=y_test)
from sklearn.naive_bayes import GaussianNB

trava.fit_predict(raw_split_data=split_result,
                  model_type=GaussianNB,  # we pass the model class and parameters separately
                  model_init_params={},   # to be able to track them properly
                  model_id='gnb')         # just a unique identifier for this model
The fit_predict call does roughly the same as:
gnb = GaussianNB()
gnb.fit(split_result.X_train, split_result.y_train)
gnb.predict(split_result.X_test)
But now you don't need to care how the metrics you declared are calculated. You just get them in your console! By the way, those metrics definitely need to be improved. :]
Model evaluation nb
* Results for gnb model *
Train metrics:
log_loss:
16.755867191506482
roc_auc_score:
0.7746522424770221
recall_score:
0.10468384074941452
precision_score:
0.9122448979591836
Test metrics:
log_loss:
16.94514025416013
roc_auc_score:
0.829444814485889
recall_score:
0.026041666666666668
precision_score:
0.7692307692307693
After training multiple models, you can get all metrics for all of them by calling:
trava.results
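For instance, a minimal sketch that reuses the trava instance and split_result from above (LogisticRegression is just an arbitrary second model picked for illustration):

from sklearn.linear_model import LogisticRegression

# train a second model on the same data, with the same scorers
trava.fit_predict(raw_split_data=split_result,
                  model_type=LogisticRegression,
                  model_init_params={'max_iter': 1000},
                  model_id='logreg')

# metrics for both 'gnb' and 'logreg' are now available
all_results = trava.results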
Get the full picture and more examples by going through the guide notebook!
Out of the box, the following results handlers are available (see the sketch after this list for how to combine them):
- LoggerHandler - logs metrics
- PlotHandler - plots metrics
- MetricsDictHandler - returns all metrics wrapped in a dict
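You can pass several handlers at once. A minimal sketch, assuming PlotHandler and MetricsDictHandler accept scorers the same way LoggerHandler does (check the guide notebook for the exact signatures):

# every handler computes the metrics it was given
trava = TravaSV(results_handlers=[
    LoggerHandler(scorers=scorers),
    PlotHandler(scorers=scorers),
    MetricsDictHandler(scorers=scorers),
])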
Experiment tracking is a must in data science, so you shouldn't neglect it. You can integrate any tracking framework with Trava! Trava comes with an MLflow tracker ready to go. It can automatically track:
- model's parameters
- any metric
- plots
- serialized models
# get a tracker instance
tracker = MLFlowTracker(scorers=scorers)
# initialize Trava with it
trava = TravaSV(tracker=tracker)
# fit your model as before
trava.fit_predict(raw_split_data=split_result,
                  model_type=GaussianNB,
                  model_id='gnb')
Done. All model parameters and metrics are now tracked! Tracking is also supported for:
- the cross-validation case, with nested tracking
- eval results of common boosting libraries (XGBoost, LightGBM, CatBoost)
Check out the detailed notebooks on how to track metrics & parameters and plots & serialized models.
- highly customizable training & evaluation processes (see the FitPredictor class in trava/fit_predictor.py and its subclasses)
- built-in train/test/validation split logic
- extensions for common boosting libraries (for early stopping with validation sets)
- tracks model parameters, metrics, plots, and serialized models; it's easy to integrate any tracking framework of your choice
- you can also evaluate metrics after the fit_predict call, in case you forgot to add some metric
- you can evaluate metrics even after your data, and even a trained model, have been unloaded (depends on the metric used, but true most of the time)
- only supervised learning problems are supported for now, yet there is potential to extend it to unsupervised problems
- unit-tested
- I use it every day for my own needs, so I care about its quality and reliability
Only sklearn-style models are supported for the time being (Trava uses the fit, predict, and predict_proba methods).
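In other words, any object exposing that interface should work. Here is a toy sketch of the expected shape (a hypothetical majority-class model, not part of Trava or sklearn):

import numpy as np

class MajorityClassModel:
    # a toy sklearn-style binary classifier that predicts the majority class

    def __init__(self):
        self._p1 = 0.5  # fraction of positive labels seen during fit

    def fit(self, X, y):
        # remember how frequent the positive class is in the training labels
        self._p1 = float(np.mean(y))
        return self

    def predict(self, X):
        # predict the majority class for every sample
        return np.full(len(X), int(self._p1 >= 0.5))

    def predict_proba(self, X):
        # one row per sample: [P(class 0), P(class 1)]
        return np.tile([1.0 - self._p1, self._p1], (len(X), 1))

Such a class could then be passed to fit_predict as model_type, just like GaussianNB above.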
- pandas
- numpy
- python 3.7
It's also convenient to use the library together with sklearn (e.g. for taking metrics from it), and a couple of the extensions are based on sklearn classes.