Analyzing Machine Learning Models with Yellowbrick
Visualization thus has a critical role to play throughout the analytical process and is, frankly, a must-have for effective analysis, model selection, and evaluation. This article discusses Yellowbrick, a diagnostic platform that lets data scientists visualize the entire model selection process, steering us toward better, more explainable models and away from pitfalls and traps along the way.
Yellowbrick is an open source Python project that extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create interactive data explorations.
The extension centers on a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of a scikit-learn pipeline, providing visual diagnostics throughout the transformation of high-dimensional data.
Yellowbrick isn’t a replacement for other data visualization libraries but helps to achieve the following:
- Model Visualization
- Data visualization for machine learning
- Visual Diagnostics
- Visual Steering
Yellowbrick can be installed either through pip or through conda. For detailed instructions, refer to the documentation.
pip install yellowbrick
conda install -c districtdatalabs yellowbrick
The Yellowbrick API should feel familiar if you have used the scikit-learn interface.
The primary interface is a Visualizer, an object that learns from data to produce a visualization. To use a visualizer, import it, instantiate it, call its fit() method, and then call its poof() method to render the visualization. Note that Yellowbrick 1.0 renamed poof() to show(); poof() is the name used in earlier releases.