# hvPlot.scatter

```{eval-rst}
.. currentmodule:: hvplot

.. automethod:: hvPlot.scatter
```

## Backend-specific styling options

```{eval-rst}
.. backend-styling-options:: scatter
```

## Examples

Scatter plots are useful for exploring relationships, distributions, and potential correlations between numeric variables.

### Basic scatter plot

This example shows how to create a simple scatter plot.

In [None]:
import hvplot.pandas  # noqa
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})

df.hvplot.scatter(x="x", y="y")

Let's use a more realistic dataset.

In [None]:
import hvplot.pandas  # noqa

df = hvplot.sampledata.penguins("pandas")

df.hvplot.scatter(
    x='bill_length_mm', y='flipper_length_mm',
    title='Bill Length vs Flipper Length'
)

### Grouping by categories

To distinguish categories visually, you can use the `by` parameter. This automatically colors points based on the specified column(s). The generated plot is a [HoloViews NdOverlay](https://holoviews.org/reference/containers/bokeh/NdOverlay.html).

In [None]:
import hvplot.pandas  # noqa

df = hvplot.sampledata.penguins("pandas")

df.hvplot.scatter(
    x='bill_length_mm', y='flipper_length_mm',
    by=['sex', 'species'], title='Scatter plot grouped by sex and species with "by"',
)

:::{note}
If you only need to color the points by a categorical variable, use the [`color`](option-color) option instead of [`by`](option-by). `color` vectorizes the color styling (i.e., each marker has its own color), whereas `by` generates an overlay of scatter plots, one per category. As a consequence, `color` is much more efficient in this case.
:::

In [None]:
import hvplot.pandas  # noqa

df = hvplot.sampledata.penguins("pandas")

df.hvplot.scatter(
    x='bill_length_mm', y='flipper_length_mm',
    color='species', title='Scatter plot colored by species with "color"',
)

(scatter-marker-style)=
### Control marker style

The marker style can be controlled with the styling option `marker`. For Bokeh plots, the option accepts Bokeh-based markers (see the plot below) and a subset of Matplotlib-compatible markers like `'+'` (note these markers cannot be vectorized). Matplotlib plots accept [Matplotlib](https://matplotlib.org/stable/api/markers_api.html) markers.

In [None]:
import bokeh as bk
import holoviews as hv
import hvplot.pandas  # noqa
import itertools
import pandas as pd

bokeh_orig_markers = list(bk.core.enums.MarkerType)
hv_bk_mpl_compat_markers = list(hv.plotting.bokeh.styles.markers)
print('Bokeh original markers:')
print(*map(repr, bokeh_orig_markers), sep=', ', end='\n\n')
print('Matplotlib-compatible markers for Bokeh:')
print(*map(repr, hv_bk_mpl_compat_markers), sep=', ')

df = pd.DataFrame(list(itertools.product(range(6), range(6))), columns=['x', 'y'])
df['marker_col'] = bokeh_orig_markers + [''] * (len(df) - len(bokeh_orig_markers))

df.hvplot.scatter(
    x='x', y='y', marker='marker_col', s=150, title='Bokeh-specific markers'
) *\
df.assign(y=df.y+0.2).hvplot.labels(
    x='x', y='y', text='marker_col', text_color='black',
    text_baseline='bottom', text_font_size='9pt', padding=0.2
)

### Control color and size

Marker size and color can each be mapped to a numeric column: pass the column name to `s` for size and to `c` (or `color`) for color.

In [None]:
import hvplot.pandas  # noqa

df = hvplot.sampledata.earthquakes("pandas")

df.hvplot.scatter(
    x='lon', y='lat', c='mag', s='depth', cmap="inferno_r",
    clabel="Magnitude values", title='Earthquake depth (color by magnitude)',
)

### Scatter plot with scaling and logarithmic color mapping

This example shows how to fine-tune scatter plots by scaling point sizes and applying a logarithmic color scale. Note that we set the `scale` option to uniformly increase the marker size by a factor of 3.

In [None]:
import pandas as pd
import hvplot.pandas  # noqa
import numpy as np

df = pd.DataFrame({
    'x': np.random.rand(100) * 10,
    'y': np.random.rand(100) * 10,
    'size': np.random.rand(100) * 100 + 10,
    'intensity': np.random.lognormal(mean=2, sigma=1, size=100)
})

df.hvplot.scatter(
    x='x', y='y', s='size', scale=3,
    c='intensity', cmap='Blues', logz=True,
    title='Scatter plot with size scaling and log color'
)
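
To see why a logarithmic color scale helps with skewed data such as the log-normal `intensity` column, the sketch below compares how linear and logarithmic normalization place a few values on the `[0, 1]` range that a colormap covers. It uses only the standard library and is not hvPlot's internal implementation; `normalize_linear` and `normalize_log` are hypothetical helpers written for this illustration.

```python
import math


def normalize_linear(value, vmin, vmax):
    """Map value to [0, 1] proportionally to its distance from vmin."""
    return (value - vmin) / (vmax - vmin)


def normalize_log(value, vmin, vmax):
    """Map value to [0, 1] proportionally to its distance from vmin in log10 space."""
    return (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))


# Values spanning several orders of magnitude.
values = [1, 10, 100, 1000]
vmin, vmax = min(values), max(values)

linear_scaled = [round(normalize_linear(v, vmin, vmax), 3) for v in values]
log_scaled = [round(normalize_log(v, vmin, vmax), 3) for v in values]

print(linear_scaled)  # [0.0, 0.009, 0.099, 1.0]
print(log_scaled)     # [0.0, 0.333, 0.667, 1.0]
```

With linear normalization the three smallest values land in the bottom 10% of the colormap and are hard to tell apart, while the logarithmic version spreads them evenly.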

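### Xarray example

Scatter plots can also be created from xarray objects. This example selects a single grid point from the air temperature sample dataset and plots its `air` variable.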

In [None]:
import hvplot.xarray  # noqa

ds = hvplot.sampledata.air_temperature("xarray").sel(lon=285.0, lat=40.0)

ds.hvplot.scatter(y="air")