# Plugins

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rxavier/poniard/blob/master/examples/05._plugins.ipynb)

This notebook explains how Poniard's extensible plugin system works.

If Poniard is not installed, install it from PyPI first.

In [1]:
# %pip install poniard

Plugins can be thought of as the equivalent of callbacks in other machine learning frameworks (Keras, fastai, PyTorch Lightning, etc.). Callbacks are objects that do things at given points in time, for example, after each training epoch ends. They usually do not modify the existing behavior of the framework, but instead add new functionality.

In Poniard these objects are deliberately not called callbacks because there is a priori no restriction on what they can do. However, the main concept remains the same.

In essence, plugin hooks are executed at different points during a Poniard estimator's lifetime, for example `self._run_plugin_method("on_setup_end")`. These hooks check which plugins, if any, were set during initialization, and run the corresponding hook for each of them (in this example, `self.on_setup_end()`).
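The dispatch described above can be pictured with a minimal, self-contained sketch. It is not Poniard's actual implementation: `ToyEstimator`, `RecordingPlugin`, and the body of `_run_plugin_method` are assumptions made for illustration, and only the hook name `on_setup_end` comes from the text.

```python
class RecordingPlugin:
    """Hypothetical plugin that records which hooks were called."""

    def __init__(self):
        self.calls = []

    def on_setup_end(self):
        self.calls.append("on_setup_end")


class ToyEstimator:
    """Stand-in for a Poniard estimator; not the real class."""

    def __init__(self, plugins=None):
        self.plugins = plugins or []

    def _run_plugin_method(self, method_name, **kwargs):
        # Call the named hook on every registered plugin that defines it.
        for plugin in self.plugins:
            hook = getattr(plugin, method_name, None)
            if callable(hook):
                hook(**kwargs)

    def setup(self):
        # ... the real setup work would happen here ...
        self._run_plugin_method("on_setup_end")
        return self


plugin = RecordingPlugin()
ToyEstimator(plugins=[plugin]).setup()
print(plugin.calls)
```

With no plugins set, the loop simply does nothing, which is why an estimator without plugins behaves exactly as before.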

Poniard includes two plugins out of the box:
1. Weights and Biases (`WandBPlugin`) logs configurations, metrics, plots and saves model and dataset artifacts.
2. Pandas Profiling (`PandasProfilingPlugin`) analyzes the dataset and generates an HTML report.

Extending Poniard by creating plugins is simple, as all they have to do is subclass `BasePlugin` and implement the hooks they need, leaving the rest as is.
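As a rough illustration of that pattern, the sketch below uses a stand-in `BasePlugin` so that it runs without Poniard installed. The hook names `on_fit_start` and `on_fit_end`, and the `FitTimerPlugin` class itself, are assumptions for the example; check the real `BasePlugin` for the hooks it actually exposes.

```python
import time


class BasePlugin:
    """Stand-in for Poniard's `BasePlugin`: every hook is a no-op by default."""

    def on_setup_end(self):
        pass

    def on_fit_start(self):  # assumed hook name
        pass

    def on_fit_end(self):  # assumed hook name
        pass


class FitTimerPlugin(BasePlugin):
    """Hypothetical plugin that measures how long fitting takes."""

    def __init__(self):
        self._start = None
        self.elapsed = None

    def on_fit_start(self):
        self._start = time.perf_counter()

    def on_fit_end(self):
        self.elapsed = time.perf_counter() - self._start


timer = FitTimerPlugin()
timer.on_setup_end()  # inherited no-op, left as is
timer.on_fit_start()
timer.on_fit_end()
print(f"fit took {timer.elapsed:.6f}s")
```

Only the two hooks the plugin cares about are overridden; everything else falls through to the base class.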

In this instance we'll model the Adult Census dataset, a slightly more complicated classification task than in previous examples, and include both plugins.

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

from poniard import PoniardClassifier
from poniard.plugins import WandBPlugin, PandasProfilingPlugin

# Adult Census dataset
X, y = fetch_openml(data_id=1590, return_X_y=True, as_frame=True)
category_cols = X.select_dtypes(include="category").columns
X = X.astype({col: object for col in category_cols})
y = pd.Series(np.where(y == ">50K", 1, 0), name="target")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

pnd = PoniardClassifier(numeric_threshold=80, n_jobs=-1,
                        plugins=[WandBPlugin(project="adult-demo"),
                                 PandasProfilingPlugin()])
pnd.setup(X_train, y_train)


Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mrxavier[0m. Use [1m`wandb login --relogin`[0m to force relogin


Summarize dataset: 100%|██████████| 22/22 [00:00<00:00, 187.72it/s, Completed]                      
Generate report structure: 100%|██████████| 1/1 [00:01<00:00,  1.37s/it]
Render HTML: 100%|██████████| 1/1 [00:00<00:00,  4.49it/s]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 996.98it/s]


Target info
-----------
Type: binary
Shape: (39073,)
Unique values: 2

Main metric
-----------
roc_auc

Thresholds
----------
Minimum unique values to consider a feature numeric: 80
Minimum unique values to consider a categorical high cardinality: 20

Inferred feature types
----------------------




|   | numeric | categorical_high | categorical_low | datetime |
|---|---|---|---|---|
| 0 | fnlwgt | age | education-num | |
| 1 | capital-gain | native-country | workclass | |
| 2 | capital-loss | | education | |
| 3 | hours-per-week | | marital-status | |
| 4 | | | occupation | |
| 5 | | | relationship | |
| 6 | | | race | |
| 7 | | | sex | |






PoniardClassifier(estimators=None, metrics=['roc_auc', 'accuracy', 'precision', 'recall', 'f1'],
    preprocess=True, scaler=standard, numeric_imputer=simple,
    custom_preprocessor=None, numeric_threshold=80,
    cardinality_threshold=20, cv=StratifiedKFold(n_splits=5, random_state=0, shuffle=True), verbose=0,
    random_state=0, n_jobs=-1, plugins=[<poniard.plugins.wandb.WandBPlugin object at 0x147a76940>, <poniard.plugins.pandas_profiling.PandasProfilingPlugin object at 0x280b50eb0>],
    plot_options=PoniardPlotFactory())
            

As can be seen, Pandas Profiling already created a report and saved it to the default location. If `ipywidgets` is installed, the report will be included in the output.

Meanwhile, Weights and Biases either logged in or prompted for a login, and started logging information about the run (preprocessor HTML representation, dataset, inferred types). Also, because plugins can check whether other plugins are included in a Poniard estimator (by using `self._check_plugin_used()`), wandb also uploaded the profile report.
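One way to picture this cross-plugin check is the sketch below. It is only an assumption about how such a lookup could work: the `estimator` attribute, the signature of `_check_plugin_used`, and the `ReportPlugin`, `UploaderPlugin`, and `ToyEstimator` classes are all hypothetical stand-ins rather than Poniard's real code.

```python
class BasePlugin:
    """Stand-in base class; the attribute and signature below are assumptions."""

    def __init__(self):
        self.estimator = None  # set by the estimator when plugins are registered

    def _check_plugin_used(self, plugin_cls_name):
        # Return the first sibling plugin whose class name matches, else None.
        for plugin in self.estimator.plugins:
            if type(plugin).__name__ == plugin_cls_name:
                return plugin
        return None


class ReportPlugin(BasePlugin):
    """Hypothetical plugin that writes a report to a known path."""

    report_path = "report.html"


class UploaderPlugin(BasePlugin):
    """Hypothetical plugin that uploads the report if ReportPlugin is present."""

    def __init__(self):
        super().__init__()
        self.uploaded = []

    def on_setup_end(self):
        report = self._check_plugin_used("ReportPlugin")
        if report is not None:
            self.uploaded.append(report.report_path)


class ToyEstimator:
    """Stand-in estimator that gives each plugin a reference to itself."""

    def __init__(self, plugins):
        self.plugins = plugins
        for plugin in plugins:
            plugin.estimator = self


uploader = UploaderPlugin()
ToyEstimator([ReportPlugin(), uploader])
uploader.on_setup_end()
print(uploader.uploaded)
```

If the report plugin is absent, the lookup returns `None` and the uploader skips the step, so neither plugin depends on the other being installed.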

In [2]:
pnd.fit()
pnd.get_results()

  _warn_prf(average, modifier, msg_start, len(result))
Completed: 100%|██████████| 9/9 [12:06<00:00, 80.75s/it]      


| | test_roc_auc | test_accuracy | test_precision | test_recall | test_f1 | fit_time | score_time |
|---|---|---|---|---|---|---|---|
| HistGradientBoostingClassifier | 0.9256 | 0.871292 | 0.779046 | 0.645203 | 0.705791 | 2.544205 | 0.21196 |
| XGBClassifier | 0.924543 | 0.869194 | 0.770386 | 0.646058 | 0.702701 | 10.353929 | 0.138027 |
| LogisticRegression | 0.908138 | 0.85458 | 0.742586 | 0.600493 | 0.664002 | 1.524592 | 0.130031 |
| SVC | 0.905645 | 0.853147 | 0.74676 | 0.584555 | 0.65575 | 432.375797 | 8.761903 |
| RandomForestClassifier | 0.899086 | 0.851125 | 0.723473 | 0.611617 | 0.662846 | 154.188252 | 0.289342 |
| KNeighborsClassifier | 0.848205 | 0.827733 | 0.661283 | 0.575035 | 0.615041 | 0.433059 | 3.694473 |
| GaussianNB | 0.831312 | 0.598726 | 0.367244 | 0.93625 | 0.52755 | 0.310526 | 0.097686 |
| DecisionTreeClassifier | 0.744354 | 0.811583 | 0.604411 | 0.615467 | 0.609818 | 44.375937 | 0.102937 |
| DummyClassifier | 0.5 | 0.76073 | 0.0 | 0.0 | 0.0 | 0.324102 | 0.110943 |


Once training has finished, the WandB plugin logs one plot per metric. We'll also generate some additional plots.

In [3]:
pnd.plot.metrics(kind="strip", metrics=["accuracy", "roc_auc"])

Every plot produced by `PoniardPlotFactory` is logged to the wandb project and remains an interactive Plotly plot.

In [4]:
candidates = ["XGBClassifier", "HistGradientBoostingClassifier",
              "LogisticRegression"]
pnd.plot.roc_curve(estimator_names=candidates)

Completed: 100%|██████████| 1/1 [00:09<00:00,  9.53s/it]
Completed: 100%|██████████| 1/1 [00:02<00:00,  2.98s/it]             
Completed: 100%|██████████| 1/1 [00:01<00:00,  1.77s/it] 


In [5]:
pnd.plot.confusion_matrix(estimator_name="XGBClassifier")

Completed: 100%|██████████| 1/1 [00:09<00:00,  9.89s/it]


In [6]:
pnd.plot.permutation_importance(estimator_name="XGBClassifier")

In [7]:
pnd.plot.partial_dependence(estimator_name="XGBClassifier", feature="marital-status")

In [8]:
pnd.plot.partial_dependence(estimator_name="XGBClassifier", feature="age")

In [9]:
xgb = pnd.get_estimator("XGBClassifier")
xgb.fit(X_train, y_train)
xgb.score(X_test, y_test)

0.8764459002968574