# Plotting

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rxavier/poniard/blob/master/examples/04._plotting.ipynb)

This notebook demonstrates Poniard's plotting capabilities.

If Poniard is not installed yet, install it from PyPI by uncommenting the following line.

In [1]:
# %pip install poniard

Poniard uses Plotly as a backend to plot metrics.

All plots are available through the `plot` accessor.

In [2]:
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from poniard import PoniardClassifier

X, y = fetch_openml("irish", return_X_y=True, as_frame=True)
y = LabelEncoder().fit_transform(y)
clf = PoniardClassifier(
    estimators=[LogisticRegression(), KNeighborsClassifier(), RandomForestClassifier()],
    numeric_threshold=0.05,
).setup(X, y)
clf.fit()
clf.get_results()

Target info
-----------
Type: binary
Shape: (500,)
Unique values: 2

Main metric
-----------
roc_auc

Thresholds
----------
Minimum unique values to consider a feature numeric: 25
Minimum unique values to consider a categorical high cardinality: 20

Inferred feature types
----------------------


| | numeric | categorical_high | categorical_low | datetime |
|---|---|---|---|---|
| 0 | DVRT | | Sex | |
| 1 | Prestige_score | | Educational_level | |
| 2 | | | Type_school | |






Completed: 100%|██████████| 4/4 [00:01<00:00,  3.52it/s]             


| | test_roc_auc | test_accuracy | test_precision | test_recall | test_f1 | fit_time | score_time |
|---|---|---|---|---|---|---|---|
| KNeighborsClassifier | 0.99055 | 0.97 | 0.948261 | 0.986465 | 0.966878 | 0.005216 | 0.06991 |
| RandomForestClassifier | 0.989532 | 0.986 | 0.973816 | 0.995556 | 0.984419 | 0.064198 | 0.011983 |
| LogisticRegression | 0.987589 | 0.986 | 0.973623 | 0.995455 | 0.984369 | 0.009759 | 0.008593 |
| DummyClassifier | 0.5 | 0.556 | 0.0 | 0.0 | 0.0 | 0.004146 | 0.005129 |


## Plots based on the fit process

Metrics obtained directly from a call to `fit()` can be visualized with `plot.metrics()`.

This method can plot several metrics at once, either as fold averages with error bars or as a scatter plot that shows the score for each fold along with the mean score.

In [3]:
clf.plot.metrics(metrics=["roc_auc", "f1"])

In [4]:
clf.plot.metrics(metrics=["recall", "precision"], kind="bar")
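The fold averages and error bars mentioned above can be reproduced by hand with scikit-learn. The following is a minimal sketch, not Poniard's internal code: it assumes 5-fold cross validation and uses the standard deviation across folds as the error bar, which may differ from what `plot.metrics()` actually draws.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cross_validate returns one score per fold under keys named "test_<metric>".
cv_results = cross_validate(model, X, y, cv=5, scoring=["roc_auc", "f1"])

# Assumption: the bar height is the fold mean and the error bar is the fold std.
fold_summary = {
    key: (scores.mean(), scores.std())
    for key, scores in cv_results.items()
    if key.startswith("test_")
}
for key, (mean, std) in fold_summary.items():
    print(f"{key}: {mean:.3f} +/- {std:.3f}")
```

The per-fold arrays in `cv_results` are the values a scatter plot would show, while `fold_summary` holds what a bar chart with error bars would show.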

`plot.overfitness()` shows how much each estimator overfits, based on a comparison of its train and test scores.

In [5]:
clf.plot.overfitness(metric="f1")
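To make the idea of comparing train and test scores concrete, here is a hedged sketch using plain scikit-learn. The gap computed here (mean train score minus mean test score) is only one plausible measure and is an assumption; this notebook does not state the exact formula Poniard uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)

# return_train_score=True adds "train_score" next to "test_score".
cv_results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=5,
    scoring="f1",
    return_train_score=True,
)

train_f1 = cv_results["train_score"].mean()
test_f1 = cv_results["test_score"].mean()

# Assumed overfitting measure: how much better the model does on data it has seen.
gap = train_f1 - test_f1
print(f"train f1={train_f1:.3f}, test f1={test_f1:.3f}, gap={gap:.3f}")
```

A gap near zero suggests the estimator generalizes about as well as it fits the training folds, while a large positive gap suggests overfitting.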

## Plots based on cross-validated predictions

Some plots require predictions from the relevant estimators. The corresponding methods obtain them automatically with scikit-learn's `cross_val_predict()`.

Each of these methods adds keys to the cross-validation results dict created by `fit()`, and the plots that need those predictions read them from there.

In [6]:
clf.plot.confusion_matrix("KNeighborsClassifier")

Completed: 100%|██████████| 1/1 [00:00<00:00, 19.09it/s]   
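For reference, the same kind of matrix can be computed directly with scikit-learn. This is a sketch of the underlying idea under assumed settings (5-fold cross validation, default `KNeighborsClassifier`, no preprocessing), not a reproduction of Poniard's pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Every sample is predicted exactly once, by a model that did not see it in training.
y_pred = cross_val_predict(KNeighborsClassifier(), X, y, cv=5)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y, y_pred)
print(matrix)
```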


In [7]:
from sklearn.datasets import load_breast_cancer
from poniard import PoniardClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
clf = PoniardClassifier().setup(X, y)
clf.fit()
selected_clf = ["LogisticRegression", "XGBClassifier"]
clf.plot.roc_curve(estimator_names=selected_clf)

Target info
-----------
Type: binary
Shape: (569,)
Unique values: 2

Main metric
-----------
roc_auc

Thresholds
----------
Minimum unique values to consider a feature numeric: 56
Minimum unique values to consider a categorical high cardinality: 20

Inferred feature types
----------------------


| | numeric | categorical_high | categorical_low | datetime |
|---|---|---|---|---|
| 0 | mean radius | | | |
| 1 | mean texture | | | |
| 2 | mean perimeter | | | |
| 3 | mean area | | | |
| 4 | mean smoothness | | | |
| 5 | mean compactness | | | |
| 6 | mean concavity | | | |
| 7 | mean concave points | | | |
| 8 | mean symmetry | | | |
| 9 | mean fractal dimension | | | |






Completed: 100%|██████████| 9/9 [00:14<00:00,  1.57s/it]                     
Completed: 100%|██████████| 1/1 [00:00<00:00,  1.42it/s] 
Completed: 100%|██████████| 1/1 [00:00<00:00,  2.33it/s]
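A ROC curve built from cross-validated predictions needs class probabilities rather than hard labels. The sketch below shows how that might look with scikit-learn alone; the 5-fold setting and the scaled logistic regression pipeline are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# method="predict_proba" returns one column per class; keep the positive class.
y_score = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

# Points of the curve: false positive rate vs. true positive rate per threshold.
fpr, tpr, thresholds = roc_curve(y, y_score)
auc = roc_auc_score(y, y_score)
print(f"ROC AUC from out-of-fold probabilities: {auc:.3f}")
```

Plotting `fpr` against `tpr` gives the curve for one estimator; repeating the steps per estimator gives one line each.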


In [8]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from poniard import PoniardRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
reg = PoniardRegressor(
    estimators=[LinearRegression(), RandomForestRegressor(), KNeighborsRegressor()]
).setup(X, y)
reg.fit()
selected_reg = ["LinearRegression", "KNeighborsRegressor", "RandomForestRegressor"]
reg.plot.residuals(estimator_names=selected_reg)

Target info
-----------
Type: continuous
Shape: (442,)
Unique values: 214

Main metric
-----------
neg_mean_squared_error

Thresholds
----------
Minimum unique values to consider a feature numeric: 44
Minimum unique values to consider a categorical high cardinality: 20

Inferred feature types
----------------------


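| | numeric | categorical_high | categorical_low | datetime |
|---|---|---|---|---|
| 0 | age | | sex | |
| 1 | bmi | | | |
| 2 | bp | | | |
| 3 | s1 | | | |
| 4 | s2 | | | |
| 5 | s3 | | | |
| 6 | s4 | | | |
| 7 | s5 | | | |
| 8 | s6 | | | |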






Completed: 100%|██████████| 4/4 [00:00<00:00,  4.43it/s]            
Completed: 100%|██████████| 1/1 [00:00<00:00, 19.02it/s]
Completed: 100%|██████████| 1/1 [00:00<00:00, 11.70it/s]  
Completed: 100%|██████████| 1/1 [00:01<00:00,  1.04s/it]    


In [9]:
reg.plot.residuals_histogram(estimator_names=selected_reg)
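The quantities behind both residual plots can be sketched with scikit-learn and NumPy. Note the assumptions: residuals are defined here as true value minus out-of-fold prediction, and the histogram uses 20 bins; Poniard's sign convention and binning may differ.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

X, y = load_diabetes(return_X_y=True)

# Out-of-fold predictions: each target is predicted by a model that never saw it.
y_pred = cross_val_predict(LinearRegression(), X, y, cv=5)

# Assumed convention: residual = observed - predicted.
residuals = y - y_pred

# A residuals plot scatters y_pred against residuals; a histogram bins the residuals.
counts, bin_edges = np.histogram(residuals, bins=20)
print(f"mean residual: {residuals.mean():.2f}, std: {residuals.std():.2f}")
```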

## Visualizations including computations

Some visualizations run additional computations beyond predictions when their methods are called.

`plot.permutation_importance()` takes an estimator name and outputs each feature's permutation importance.

In [10]:
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from poniard import PoniardClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
clf = PoniardClassifier(estimators=[KNeighborsClassifier()], metrics="f1").setup(X, y)
clf.fit()
clf.plot.permutation_importance("KNeighborsClassifier")

Target info
-----------
Type: binary
Shape: (569,)
Unique values: 2

Main metric
-----------
f1

Thresholds
----------
Minimum unique values to consider a feature numeric: 56
Minimum unique values to consider a categorical high cardinality: 20

Inferred feature types
----------------------


| | numeric | categorical_high | categorical_low | datetime |
|---|---|---|---|---|
| 0 | mean radius | | | |
| 1 | mean texture | | | |
| 2 | mean perimeter | | | |
| 3 | mean area | | | |
| 4 | mean smoothness | | | |
| 5 | mean compactness | | | |
| 6 | mean concavity | | | |
| 7 | mean concave points | | | |
| 8 | mean symmetry | | | |
| 9 | mean fractal dimension | | | |






Completed: 100%|██████████| 2/2 [00:00<00:00,  7.87it/s]           
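Permutation importance measures how much a score drops when one feature's values are shuffled. The sketch below computes it with scikit-learn's `permutation_importance()`. The train/test split, `n_repeats=5` and the `f1` scorer are assumptions for illustration, not necessarily the settings Poniard uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)
model = KNeighborsClassifier().fit(X_train, y_train)

# Shuffle each feature n_repeats times and record the drop in the f1 score.
result = permutation_importance(
    model, X_test, y_test, scoring="f1", n_repeats=5, random_state=0
)

# Pair feature names with their mean importance, most important first.
ranked = sorted(
    zip(X.columns, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.4f}")
```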


`plot.partial_dependence()` shows the average partial dependence of the prediction on a given feature.

In [11]:
clf.plot.partial_dependence(
    estimator_name="KNeighborsClassifier", feature="worst texture"
)
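The numbers behind such a plot can be obtained with scikit-learn's `partial_dependence()`. This is a sketch under assumed settings (an unscaled `KNeighborsClassifier` fitted on the full dataset and a 20-point grid), so the values will not necessarily match Poniard's plot.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import partial_dependence
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = KNeighborsClassifier().fit(X, y)

# For each grid value, the feature is set to that value for every sample
# and the model's predictions are averaged.
result = partial_dependence(
    model, X, features=["worst texture"], kind="average", grid_resolution=20
)
average = result["average"][0]

# The grid is stored under "grid_values" in recent scikit-learn releases
# and under "values" in older ones.
grid = result["grid_values"] if "grid_values" in result else result["values"]
for value, dependence in zip(grid[0][:3], average[:3]):
    print(f"worst texture={value:.2f} -> average prediction={dependence:.3f}")
```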