In [None]:
import numpy as np
import pandas as pd
from hvplot.plotting import scatter_matrix

`scatter_matrix` shows all the pairwise relationships between the columns of a `DataFrame`. Each off-diagonal plot shows one pair of columns plotted against each other, while each diagonal plot shows the distribution of a single column.

This function is closely modelled on [pandas.plotting.scatter_matrix](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.scatter_matrix.html).
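To make the grid layout concrete, here is a minimal sketch that enumerates the panels for a list of column names. It does not call hvPlot; the helper name `panel_layout` and the column names are hypothetical and used only for illustration.

```python
from itertools import product


def panel_layout(columns):
    """Map each (row, column) grid position to the kind of panel drawn there."""
    layout = {}
    for y, x in product(columns, repeat=2):
        # Same column on both axes: a distribution plot; otherwise a pairwise plot.
        layout[(y, x)] = "diagonal" if x == y else "off-diagonal"
    return layout


layout = panel_layout(["A", "B", "C", "D"])
n_diag = sum(kind == "diagonal" for kind in layout.values())
n_off = sum(kind == "off-diagonal" for kind in layout.values())
print(n_diag, n_off)  # 4 12
```

With four columns the grid holds 16 panels: 4 on the diagonal and 12 off the diagonal.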

## Parameters:

* **`data`** (`DataFrame`): The data to plot. Every column is compared to every other column.
* **`c`** (`str`, optional): Column to color by.
* **`chart`** (`str`, optional): Chart type for the off-diagonal plots (one of 'scatter', 'bivariate', 'hexbin').
* **`diagonal`** (`str`, optional): Chart type for the diagonal plots (one of 'hist', 'kde').
* **`alpha`** (`float`, optional): Transparency level for the off-diagonal plots.
* **`nonselection_alpha`** (`float`, optional): Transparency level for nonselected objects in the off-diagonal plots.
* **`tools`** (`list` of `str`, optional): Interaction tools to include. Defaults are 'box_select' and 'lasso_select'.
* **`cmap`/`colormap`** (`str` or colormap object, optional): Colormap to use for off-diagonal plots. Default is [Category10](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#category10).
* **`diagonal_kwds`/`hist_kwds`/`density_kwds`** (`dict`, optional): Keyword options for the diagonal plots.
* **`datashade`** (default=False): Whether to apply rasterization and shading (colormapping) using the Datashader library, returning an RGB object instead of individual points.
* **`rasterize`** (default=False): Whether to apply rasterization using the Datashader library, returning an aggregated Image (to be colormapped by the plotting backend) instead of individual points.
* **`dynspread`** (default=False): For plots generated with `datashade=True` or `rasterize=True`, automatically increase the point size when the data is sparse so that individual points become more visible. Supported `kwds` include `max_px`, `threshold`, `shape`, `how` and `mask`.
* **`spread`** (default=False): For plots generated with `datashade=True` or `rasterize=True`, increase the point size by a fixed number of cells/pixels so that points become more visible. Supported `kwds` include `px`, `shape`, `how` and `mask`.
* **`kwds`** (optional): Keyword options for the off-diagonal plots and for Datashader's spreading operations.

___

In [None]:
df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])

scatter_matrix(df, alpha=0.2)

In [None]:
df_sub = df[['A', 'B']].copy()

The `chart` parameter changes the type of the *off-diagonal* plots.

In [None]:
scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')

The `diagonal` parameter changes the type of the *diagonal* plots.

In [None]:
scatter_matrix(df_sub, diagonal='kde')

Setting `tools` to include a selection tool like `box_select` and an inspection tool like `hover` permits further analysis.

In [None]:
scatter_matrix(df_sub, tools=['box_select', 'hover'])

In [None]:
df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))

The `c` parameter colors the data by a given column, here `'CAT'`. Note also that the `diagonal_kwds` parameter (equivalent to `hist_kwds` in this case, or to `density_kwds` for *kde* plots) lets you customize the diagonal plots.

In [None]:
scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))

In [None]:
df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])

Scatter matrix plots can require a very large number of points to be rendered, which can slow down the browser or even crash it. In that case, consider setting the `rasterize` (or `datashade`) parameter to `True`. This uses [Datashader](https://datashader.org/) to render the off-diagonal plots on the backend and send more efficient image-based representations to the browser.

The following scatter matrix plot has 1,200,000 points (12 off-diagonal plots of 100,000 points each), which Datashader renders efficiently.
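The point count grows quickly with the number of columns, which is why rasterizing matters. The arithmetic can be sketched as follows; `rendered_points` is a hypothetical helper that counts only the off-diagonal plots and is not part of hvPlot.

```python
def rendered_points(n_columns, n_rows):
    """Number of points drawn across all off-diagonal plots."""
    # Each ordered pair of distinct columns gets its own plot.
    off_diagonal_plots = n_columns * (n_columns - 1)
    return off_diagonal_plots * n_rows


print(rendered_points(4, 100_000))  # 1200000
print(rendered_points(8, 100_000))  # 5600000
```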

In [None]:
scatter_matrix(df, rasterize=True)

When `rasterize` (or `datashade`) is enabled, you can make individual points more visible by setting `dynspread=True` or `spread=True`. See the [Working with large data using datashader](https://holoviews.org/user_guide/Large_Data.html) guide of [HoloViews](https://holoviews.org/index.html) to learn more about these operations and the parameters they accept (which can be passed as `kwds` to `scatter_matrix`).

In [None]:
scatter_matrix(df, rasterize=True, dynspread=True)
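The spreading options listed under `dynspread` in the parameters can be forwarded as extra keyword arguments. The sketch below is one way to bundle them; the helper `spread_scatter_matrix` is hypothetical, and the `max_px` and `threshold` values are illustrative assumptions rather than recommended settings.

```python
# Illustrative values only; tune them for your own data.
spread_kwds = dict(max_px=4, threshold=0.6)


def spread_scatter_matrix(df, **kwds):
    """Rasterized scatter matrix with dynamic spreading (hypothetical helper)."""
    # Imported lazily so the snippet loads even where hvplot/datashader is absent.
    from hvplot.plotting import scatter_matrix

    # Caller-supplied keywords override the defaults defined above.
    options = {**spread_kwds, **kwds}
    return scatter_matrix(df, rasterize=True, dynspread=True, **options)
```

Calling `spread_scatter_matrix(df)` uses the values above, while `spread_scatter_matrix(df, max_px=2)` overrides a single option.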