# Understanding hvPlot's Statistical Plot Types

hvPlot provides several statistical plotting functions that go beyond basic charts. Each plot type reveals different aspects of your data and has specific strengths and limitations. This guide explains when and why to use each type.

## Load Sample Data for Examples

```python
import hvplot.pandas  # noqa
import pandas as pd
from sklearn.preprocessing import StandardScaler

penguins = hvplot.sampledata.penguins("pandas").dropna()
stocks = hvplot.sampledata.stocks("pandas")

# Numeric columns used by the multivariate examples
num_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
penguins_subset = penguins[['species'] + num_cols]

# Standardized (z-scored) copy for the parallel coordinates and Andrews curves examples
scaler = StandardScaler()
scaled_features = scaler.fit_transform(penguins_subset[num_cols])
penguins_scaled = pd.DataFrame(scaled_features, columns=num_cols)
penguins_scaled['species'] = penguins_subset['species'].values
```

## Distribution Analysis

Understanding the distribution of your data is fundamental to statistical analysis. hvPlot provides several plot types that reveal different aspects of data distributions:

### Histograms

**What it shows:** Frequency distribution of values in a single variable

**Strengths:**
- Clear visualization of data distribution shape
- Easy to identify skewness, modality, and outliers
- Familiar and intuitive for most users
- Customizable bin sizes for different levels of detail

**Best for:** Understanding the overall shape and spread of a single variable, identifying distribution patterns

**Limitations:** Can be sensitive to bin size choices; doesn't show relationships between variables

```python
# Example: Histogram showing distribution shape
penguins.hvplot.hist(y='body_mass_g', by='species', alpha=0.6, bins=20)
```

Each species has a differently shaped distribution. Adelie and Chinstrap penguins overlap heavily at lower body masses. Gentoo penguins are clearly heavier and overlap only slightly with the other two species.

:::{seealso}
See the reference guide for [Histograms](../ref/api/manual/hvplot.hvPlot.hist.ipynb)
:::
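The bin-size sensitivity mentioned above can be seen outside hvPlot. The sketch below bins one synthetic sample in three ways with NumPy's `np.histogram`. The data are made up: two overlapping normal clusters that loosely imitate lighter and heavier birds. `"fd"` tells NumPy to apply the Freedman-Diaconis rule. In the hvPlot example above, `bins` is an integer count.

```python
import numpy as np

# Made-up sample: two overlapping clusters, loosely imitating lighter and heavier birds
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(3700, 400, 300), rng.normal(5100, 450, 150)])

# Bin the same data three ways; coarser bins average away more detail
binnings = {}
for bins in (5, 20, "fd"):  # "fd" = NumPy's Freedman-Diaconis rule
    counts, edges = np.histogram(values, bins=bins)
    binnings[str(bins)] = counts
    print(f"bins={bins}: {len(counts)} bins of width {edges[1] - edges[0]:.0f} g")
```

Compare the printed counts for each setting. The data do not change, but the apparent shape depends on the bin width.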

### Box Plots

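**What it shows:** Median, quartiles (Q1 and Q3), and whiskers, plus individual outliers. Whiskers commonly extend to the most extreme values within 1.5 × IQR of the box, not to the true minimum and maximum.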

**Strengths:**
- Compact summary of distribution characteristics
- Excellent for comparing distributions across groups
- Clearly identifies outliers and quartile ranges
- Robust to extreme values

**Best for:** Comparing distributions between groups, identifying outliers, understanding data spread and central tendency

**Limitations:** Hides detailed distribution shape; can miss bimodal or complex distributions

```python
# Example: Box plot comparing distributions across groups
penguins.hvplot.box(y='flipper_length_mm', by='species')
```

The box plots give a compact summary of each species. Gentoo penguins have notably longer flippers than the other two species, and Adelie penguins have the shortest. Each box spans the interquartile range, with a line at the median. Points drawn beyond the whiskers are outliers.

:::{seealso}
See the reference guide for [Box plots](../ref/api/manual/hvplot.hvPlot.box.ipynb)
:::
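The box and whiskers can also be computed by hand. This sketch assumes the common Tukey convention: whiskers reach the most extreme observations within 1.5 × IQR of the quartiles, and anything beyond is flagged as an outlier. Plotting libraries may use a slightly different rule. The helper `box_summary` and the flipper lengths are hypothetical.

```python
import numpy as np

def box_summary(values, whis=1.5):
    """Quartiles, Tukey whiskers, and outliers for one group of values."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(),    # most extreme values still within the fences
        "whisker_high": inside.max(),
        "outliers": v[(v < low_fence) | (v > high_fence)].tolist(),
    }

# Made-up flipper lengths (mm) with one unusually long value
summary = box_summary([181, 186, 190, 190, 193, 195, 197, 201, 230])
print(summary)  # median is 193, whiskers span 181-201, and 230 is flagged as an outlier
```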

### Violin Plots

**What it shows:** Combination of box plot information with kernel density estimation

**Strengths:**
- Shows both summary statistics and distribution shape
- Reveals multimodal distributions that box plots miss
- Good for comparing complex distributions across groups
- More informative than box plots for understanding distribution shape

**Best for:** Comparing detailed distribution shapes across groups, when you need both summary statistics and distribution density

**Limitations:** Can be more complex to interpret; kernel density estimation may smooth over important details

```python
# Example: Violin plot showing detailed distribution shapes
penguins.hvplot.violin(y='bill_length_mm', by='species')
```

The violin plots reveal the full distribution shape within each group. Look for bulges or narrow waists inside a violin. Depending on the smoothing, the Chinstrap violin may show a slight second bulge in bill length, a hint of bimodality that a box plot would hide. The white dot marks the median, and the thick black bar spans the interquartile range.

:::{seealso}
See the reference guide for [Violin plots](../ref/api/manual/hvplot.hvPlot.violin.ipynb)
:::
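A small made-up example shows how a box plot can miss bimodality. The sample below has two clusters with an empty gap between them. Its median falls inside that gap, where no observation exists. A density view exposes the two groups; here the density view is just coarse bin counts.

```python
import numpy as np

# Made-up bimodal sample: two clusters of five values with an empty gap between them
bimodal = np.array([40, 41, 42, 43, 44, 52, 53, 54, 55, 56], dtype=float)

q1, median, q3 = np.percentile(bimodal, [25, 50, 75])
print(f"Q1={q1}, median={median}, Q3={q3}")

# The box's median line falls inside the gap: no observation lies within 2 units of it
near_median = bimodal[np.abs(bimodal - median) <= 2]
print("observations near the median:", near_median.size)

# Coarse bin counts expose the two groups directly
counts, _ = np.histogram(bimodal, bins=[40, 44, 48, 52, 56])
print("counts per bin:", counts.tolist())
```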

### Heatmaps

**What it shows:** Matrix of values represented as colors, often used for correlation matrices or 2D binned data

**Strengths:**
- Excellent for visualizing correlation matrices
- Clear representation of patterns in 2D gridded data
- Good for showing relationships across many variable pairs simultaneously
- Effective for identifying clusters and patterns in matrix data

**Best for:** Visualizing correlation matrices, 2D binned data, confusion matrices, or any matrix-structured data

**Limitations:** Requires gridded or matrix-structured data; can lose individual data point information

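```python
# Example: Heatmap showing correlation matrix
correlation_matrix = penguins[num_cols].corr()
# Fix the color limits to [-1, 1] so the colormap's neutral middle color means zero correlation
correlation_matrix.hvplot.heatmap(cmap='coolwarm', clim=(-1, 1))
```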

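The heatmap shows a strong positive correlation (red) between flipper length and body mass, and a moderate positive correlation between bill length and both flipper length and body mass. Bill depth, in contrast, is negatively correlated (blue) with flipper length and body mass in the pooled data. Gentoo penguins are the largest species but have the shallowest bills, which produces this negative pattern. Correlations computed across pooled groups can therefore differ from the relationships within each species.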

:::{seealso}
See the reference guide for [Heatmaps](../ref/api/manual/hvplot.hvPlot.heatmap.ipynb)
:::
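A heatmap needs matrix-structured data. A correlation matrix already has that shape. Data in long ("tidy") form can instead be passed to `hvplot.heatmap` with explicit `x`, `y`, and `C` columns. The pandas sketch below converts a correlation matrix between the two forms. The measurements are synthetic: flipper length is generated to track body mass, and bill depth is unrelated noise.

```python
import numpy as np
import pandas as pd

# Made-up measurements: flipper length tracks body mass, bill depth is unrelated noise
rng = np.random.default_rng(42)
mass = rng.normal(4200, 800, 200)
df = pd.DataFrame({
    "flipper_length_mm": 137 + 0.015 * mass + rng.normal(0, 5, 200),
    "body_mass_g": mass,
    "bill_depth_mm": rng.normal(17, 2, 200),
})

corr = df.corr()  # square, symmetric matrix with 1.0 on the diagonal

# Long form: one row per (row variable, column variable) cell of the matrix
long_corr = (
    corr.rename_axis("var_y")
        .reset_index()
        .melt(id_vars="var_y", var_name="var_x", value_name="r")
)
print(long_corr.round(2))
# The long form could then be plotted with, e.g.:
# long_corr.hvplot.heatmap(x="var_x", y="var_y", C="r", cmap="coolwarm", clim=(-1, 1))
```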

### KDE (Kernel Density Estimation) Plots

**What it shows:** Smooth density estimation of data distribution using kernel functions

**Strengths:**
- Provides smooth, continuous representation of data density
- Good for overlaying multiple distributions for comparison
- Less sensitive to bin choices than histograms
- Effective for showing distribution shape and identifying modes

**Best for:** Comparing multiple distributions, showing smooth density estimates, identifying distribution modes

**Limitations:** Bandwidth selection can affect results; may smooth over important details; computationally more expensive than histograms

```python
# Example: KDE plot comparing smooth density distributions
penguins.hvplot.kde(y='body_mass_g', by='species', alpha=0.6)
```

The smooth KDE curves make it easy to compare distribution shapes across species. Gentoo penguins show a distinct peak at higher body mass values, while the Adelie and Chinstrap distributions overlap substantially.

:::{seealso}
See the reference guide for [KDE plots](../ref/api/manual/hvplot.hvPlot.kde.ipynb)
:::
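A hand-rolled Gaussian KDE in NumPy demonstrates the bandwidth sensitivity noted above. In this sketch the bandwidth is fixed by hand; the tools hvPlot relies on normally choose one automatically. The helper names and the two-cluster sample are hypothetical. With the narrow bandwidth the estimate keeps both clusters as separate modes. With the wide bandwidth they merge into a single peak.

```python
import numpy as np

def gaussian_kde_1d(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid` with a fixed bandwidth."""
    v = np.asarray(values, dtype=float)[:, None]           # shape (n, 1)
    z = (grid[None, :] - v) / bandwidth                    # shape (n, len(grid))
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)                            # average of per-point kernels

def count_modes(density):
    """Count interior local maxima of a density sampled on a grid."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

# Made-up sample with two clusters centred at 42 and 54
sample = np.array([40, 41, 42, 43, 44, 52, 53, 54, 55, 56], dtype=float)
grid = np.linspace(35, 61, 521)

modes = {bw: count_modes(gaussian_kde_1d(sample, grid, bw)) for bw in (1.5, 8.0)}
print(modes)
```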

## Multivariate Data Visualization

When working with datasets containing multiple variables, understanding relationships between all dimensions becomes challenging. hvPlot offers three complementary approaches:

### Scatter Matrix

**What it shows:** All pairwise relationships between numeric variables

**Strengths:**
- Makes the direction and strength of each pairwise relationship visible at a glance
- Interactive linking allows exploration across all variable pairs
- Familiar scatter plot format is easy to interpret

**Best for:** Identifying correlations, outliers, and clustering patterns between variable pairs

**Limitations:** Can become cluttered with many variables; doesn't show patterns across all dimensions simultaneously

```python
# Example: Scatter matrix showing pairwise relationships
hvplot.scatter_matrix(penguins_subset, c="species", alpha=0.6)
```

In the scatter matrix, Gentoo penguins form a distinct cluster in most variable pairs, most clearly in flipper length vs. body mass. The diagonal panels show a histogram of each individual variable.

:::{seealso}
See the reference guide for [Scatter Matrix](../ref/api/manual/hvplot.plotting.scatter_matrix.ipynb)
:::
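A scatter matrix of k numeric columns has k × k panels but only k(k - 1)/2 distinct variable pairs (6 for the four penguin measurements). This is why it clutters quickly as k grows. The pandas sketch below is a numeric companion rather than part of hvPlot. It ranks the pairs by absolute Pearson correlation, which suggests which panels to inspect first. The helper `pairwise_correlations` and the tiny frame are hypothetical.

```python
from itertools import combinations

import pandas as pd

def pairwise_correlations(df):
    """Pearson r for every unique pair of numeric columns, strongest (by |r|) first."""
    num = df.select_dtypes("number")
    rows = [(a, b, num[a].corr(num[b])) for a, b in combinations(num.columns, 2)]
    table = pd.DataFrame(rows, columns=["var_a", "var_b", "r"])
    return table.sort_values("r", key=lambda r: r.abs(), ascending=False).reset_index(drop=True)

# Tiny made-up frame; the non-numeric column is ignored
demo = pd.DataFrame({
    "species": ["A", "A", "B", "B", "C"],
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})
ranked = pairwise_correlations(demo)
print(ranked)
```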

### Parallel Coordinates

**What it shows:** Patterns and relationships across all variables simultaneously

**Strengths:**
- Reveals patterns across all dimensions at once
- Excellent for identifying distinct groups or classes
- Shows which variables contribute most to group differences

**Best for:** Comparing groups across multiple dimensions, identifying which variables distinguish different classes

**Limitations:** Can be difficult to read with many observations; requires some practice to interpret effectively

```python
# Example: Parallel coordinates showing patterns across all dimensions
hvplot.parallel_coordinates(penguins_scaled, "species", alpha=0.7)
```

The parallel coordinates plot shows Gentoo penguins standing out with the highest flipper length and body mass but the lowest bill depth. Adelie and Chinstrap penguins follow similar paths, except on bill length, where Chinstrap values are noticeably higher.

:::{seealso}
See the reference guide for [Parallel Coordinates](../ref/api/manual/hvplot.plotting.parallel_coordinates.ipynb)
:::
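The setup cell standardizes the data first because parallel coordinates draw every variable against a shared value axis. Unscaled, body mass (thousands of grams) would dwarf bill length (tens of millimetres). The pandas sketch below reproduces the z-scoring that `StandardScaler` performs. It then melts the table into one row per observation and variable, which is conceptually how each observation becomes one polyline. This illustrates the idea and is not hvPlot's internal code. The three rows are illustrative values.

```python
import pandas as pd

# Illustrative rows on very different scales (millimetres vs grams)
df = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [38.8, 48.8, 47.5],
    "body_mass_g": [3700.0, 3733.0, 5076.0],
})
num = df.select_dtypes("number")

# z-score each column with the population std (ddof=0), the same formula StandardScaler uses
scaled = (num - num.mean()) / num.std(ddof=0)
scaled["species"] = df["species"]

# Long form: one row per (observation, variable); each observation becomes one polyline
long_form = scaled.reset_index().melt(
    id_vars=["index", "species"], var_name="variable", value_name="value"
)
print(long_form)
```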

### Andrews Curves

**What it shows:** Aggregate differences between classes using Fourier series representation

**Strengths:**
- Smooth curves make group differences visually apparent
- Good for showing overall class separation
- Less cluttered than parallel coordinates with many observations

**Best for:** Visualizing overall differences between classes when you care more about separation than specific variable contributions

**Limitations:** Provides less quantitative insight into which specific features drive differences; mathematical transformation makes individual variable contributions less interpretable

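```python
# Example: Andrews curves showing class separation
# samples sets how many points are used to draw each curve, not how many observations are plotted
hvplot.andrews_curves(penguins_scaled, "species", samples=30)
```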

The Andrews curves transform the multi-dimensional data into smooth periodic functions. Notice how Gentoo penguins form a distinct curve pattern that's clearly separated from the other two species, confirming their distinctiveness across multiple dimensions.

:::{seealso}
See the reference guide for [Andrews Curves](../ref/api/manual/hvplot.plotting.andrews_curves.ipynb)
:::
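Andrews curves map each observation x = (x1, x2, x3, ...) to the function f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., plotted for t between -π and π. The sine and cosine terms are orthogonal. As a result, the squared difference between two curves, integrated over that interval, equals π times the squared Euclidean distance between the observations. Similar rows therefore produce similar curves. The NumPy sketch below implements this standard definition. The helper name and the observation values are hypothetical, and hvPlot's implementation may differ in details such as the range of t.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at angles t.

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2))
    for i, coef in enumerate(x[1:], start=1):
        k = (i + 1) // 2                      # harmonic: 1, 1, 2, 2, 3, 3, ...
        result += coef * (np.sin(k * t) if i % 2 == 1 else np.cos(k * t))
    return result

# One curve per observation, evaluated at 30 angles in [-pi, pi]
t = np.linspace(-np.pi, np.pi, 30)
observation = np.array([0.5, -1.2, 0.3, 0.8])  # made-up row of standardized features
curve = andrews_curve(observation, t)
print(curve.shape)
```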

## Bivariate Analysis

Understanding relationships between pairs of variables requires specialized visualization approaches. hvPlot provides several methods for bivariate exploration:

### Bivariate Plots

**What it shows:** Joint density of two continuous variables, estimated with a two-dimensional kernel density estimate and drawn as nested contours

**Strengths:**
- Shows where observations concentrate in the two-dimensional space
- Reveals clusters and multiple modes that overplotted scatter points can hide
- Stays readable with many data points, since individual points are not drawn
- Overlaying groups (for example with `by`) makes differences between their joint distributions easy to compare

**Best for:** Exploring relationships between two continuous variables, understanding joint distributions

**Limitations:** Limited to two variables at a time; hides individual points and outliers; the result depends on the kernel bandwidth

```python
# Example: Bivariate plot showing joint distribution
penguins.hvplot.bivariate('bill_length_mm', 'flipper_length_mm', by='species')
```

The bivariate plot draws contour lines of the estimated joint density of bill length and flipper length, with one set of contours per species. The species' contour regions are largely separate, which suggests that these two measurements together discriminate well between species.

:::{seealso}
See the reference guide for [Bivariate plots](../ref/api/manual/hvplot.hvPlot.bivariate.ipynb)
:::
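Conceptually, a bivariate plot estimates a smooth two-dimensional density and then draws contour lines of it. The NumPy sketch below hand-rolls a product-Gaussian KDE on a grid with fixed bandwidths. The helper `kde_2d`, the bandwidths, and the points are hypothetical. The tools hvPlot uses pick a bandwidth automatically, and their exact kernel and bandwidth rule are not assumed here.

```python
import numpy as np

def kde_2d(x, y, gx, gy, bw_x, bw_y):
    """Product-Gaussian 2-D kernel density estimate on the grid gx (columns) x gy (rows)."""
    X, Y = np.meshgrid(gx, gy)                                  # each (len(gy), len(gx))
    zx = (X[..., None] - np.asarray(x, dtype=float)) / bw_x     # (len(gy), len(gx), n)
    zy = (Y[..., None] - np.asarray(y, dtype=float)) / bw_y
    kernels = np.exp(-0.5 * (zx**2 + zy**2)) / (2 * np.pi * bw_x * bw_y)
    return kernels.mean(axis=-1)                                # average over data points

# Made-up points arranged symmetrically around the origin
x = [0.0, 0.5, -0.5, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.5, -0.5]
gx = gy = np.linspace(-5, 5, 201)
density = kde_2d(x, y, gx, gy, bw_x=1.0, bw_y=1.0)

# Contour lines of `density` at a few levels are what a bivariate plot draws
step = gx[1] - gx[0]
total = float(density.sum() * step * step)
print("total mass approx.", round(total, 4))
```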

## Time Series Analysis

### Lag Plots

**What it shows:** Relationship between current values and values at a previous time point

**Strengths:**
- Reveals autocorrelation patterns in time series
- Identifies volatility and stability in temporal data
- Helps detect seasonal or cyclical patterns

**Best for:** Understanding temporal dependencies, comparing volatility between different time series, detecting autocorrelation

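**Key insight:** Tight clustering around the diagonal indicates strong autocorrelation (values change little over the lag). Widely scattered points indicate high volatility or weak temporal correlation. For trending series such as stock prices, part of that clustering simply reflects the trend.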

```python
# Example: Lag plot comparing stock volatility
stock_subset = stocks[['Apple', 'Microsoft']].iloc[:200]  # first 200 rows, to limit overplotting
hvplot.lag_plot(stock_subset, lag=30, alpha=0.6)
```

The lag plot pairs each stock price with its value 30 time steps (rows) earlier. Points scattered widely from the diagonal indicate large changes over that interval (high volatility). Points close to the diagonal indicate smaller, steadier price movements.

:::{seealso}
See the reference guide for [Lag plots](../ref/api/manual/hvplot.plotting.lag_plot.ipynb)
:::
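A lag plot is a scatter plot of each value against the value `lag` steps later. The pandas sketch below builds those pairs with `Series.shift` and summarizes how tightly they line up with `Series.autocorr`. It mirrors the idea rather than hvPlot's implementation. The helper `lag_pairs` and both series are hypothetical. A steadily rising series gives an autocorrelation of 1, and a series that flips sign every step gives -1 at lag 1.

```python
import numpy as np
import pandas as pd

def lag_pairs(series, lag=1):
    """Return the (y(t), y(t + lag)) pairs that a lag plot draws as points."""
    s = pd.Series(series, dtype=float).reset_index(drop=True)
    pairs = pd.DataFrame({"y_t": s, "y_t_plus_lag": s.shift(-lag)})
    return pairs.dropna()  # the last `lag` values have no partner

trend = np.arange(10.0)                # steadily rising, like a trending price
alternating = np.tile([1.0, -1.0], 5)  # flips sign every step

pairs = lag_pairs(trend, lag=3)
print(pairs.head())
# Lag-k autocorrelation summarizes how tightly the lag plot hugs a straight line
print(pd.Series(trend).autocorr(lag=3), pd.Series(alternating).autocorr(lag=1))
```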

## Interactive Advantages

With the default Bokeh backend, hvPlot's statistical plots are interactive:

- **Linked axes:** Subplots that share a variable (for example, the panels of a scatter matrix) share axis ranges, so zooming or panning in one updates the others
- **Hover tooltips:** Detailed information about individual data points

These features are especially useful during exploratory analysis, when you need to inspect specific regions or individual points rather than a fixed static image.

## Choosing the Right Plot Type

| Goal | Recommended Plot | Why |
|------|------------------|-----|
| Find correlations between variable pairs | Scatter Matrix | Shows every pairwise relationship side by side |
| Compare groups across many variables | Parallel Coordinates | Reveals which variables distinguish groups |
| Show overall class separation | Andrews Curves | Emphasizes aggregate differences |
| Analyze temporal dependencies | Lag Plot | Designed specifically for time series patterns |
| Understand single variable distribution | Histogram or KDE | Histograms for frequency, KDE for smooth density |
| Compare distributions across groups | Box Plot or Violin Plot | Box plots for simple comparisons, violin plots for detailed shapes |
| Identify outliers | Box Plot | Explicitly marks points beyond the whiskers |
| Detect multimodal distributions | Violin Plot, KDE, or Histogram | Multiple approaches reveal different aspects of modes |
| Quick distribution summary | Box Plot | Compact five-number summary |
| Detailed distribution analysis | Violin Plot | Combines summary statistics with full distribution shape |
| Explore two-variable relationships | Bivariate Plot | Shows the joint density as nested contours |
| Visualize correlation patterns | Heatmap | Clear matrix representation of correlations |
| Compare multiple distributions smoothly | KDE Plot | Smooth density curves for easy comparison |
| Analyze matrix or gridded data | Heatmap | Designed specifically for matrix visualization |
| Detect outliers in multivariate data | Scatter Matrix + Parallel Coordinates | Combine pairwise and multi-dimensional views |

:::{admonition} Next Steps
:class: seealso
Explore more visualization options at [holoviews.org](https://holoviews.org)
:::