# Data Options

```{eval-rst}
.. plotting-options-table:: Data Options
```

## `by`

The `by` option groups your data by one or more categorical variables. When you pass a dimension name (or a list of dimension names) to `by`, the plot separates the data into groups so that different subsets can be compared in a single visualization. By default, a {class}`holoviews.NdOverlay` is returned, overlaying all groups in one plot. When you set `subplots=True`, a {class}`holoviews.NdLayout` is returned instead, arranging the groups as separate subplots.

In [None]:
import hvplot.pandas  # noqa
import hvsampledata

df = hvsampledata.penguins("pandas")

df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species')

In [None]:
import hvplot.pandas  # noqa
import hvsampledata

df = hvsampledata.penguins("pandas")

df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True, width=400)

## `dynamic`

The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to `True` (the default), hvPlot returns a `DynamicMap` that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. If you set `dynamic=False`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance.

In [None]:
import hvplot.pandas  # noqa
import hvsampledata

df = hvsampledata.penguins("pandas")

df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm', groupby=['island', 'sex'],
    height=300, width=400, dynamic=False,
)

In this example, setting `dynamic=False` produces an interactive plot in the browser. You can engage with the plot’s widgets without needing an active Python session, as all the data is embedded directly in the plot.

::: {warning}
Using `dynamic=False` with very large datasets may significantly impact performance.
:::

## `fields`

The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains dimension names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. You can also assign metadata such as custom display labels and units by passing HoloViews Dimension objects as the values in the `fields` dictionary.

:::{note}
If you need to modify the data values themselves (for example, converting units or applying arithmetic operations), consider using the [`transforms`](#transforms) option instead.
:::

In [None]:
import hvplot.pandas  # noqa
import hvsampledata
import holoviews as hv

df = hvsampledata.penguins("pandas")

plot1 = df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    fields={
        'bill_length_mm': 'Bill Length',
        'bill_depth_mm': 'Bill Depth'
    },
    title="Simple columns renaming",
    width=400,
)

plot2 = df.hvplot.scatter(
    x='bill_length_mm', y='bill_depth_mm',
    fields={
        'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),
        'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')
    },
    title="Using HoloViews dimension metadata",
    width=400,
)

plot1 + plot2

In this example, the `fields` dictionary changes the axis labels from the original dimension names to more reader-friendly ones.

## `groupby`

The `groupby` option specifies one or more dimensions by which to partition your data into separate groups. This grouping enables the creation of interactive widgets that let users filter or switch between different groups. When `dynamic=True` (the default), each group is rendered interactively as a `DynamicMap`, updating on the fly; with `dynamic=False`, all groups are pre-rendered and returned as a `HoloMap`.

In [None]:
import hvplot.pandas  # noqa
import hvsampledata

df = hvsampledata.penguins("pandas")

df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', groupby='species', dynamic=False)

In this example, the plot automatically generates a widget that lets users select among the different species, dynamically updating the plot for the selected group. See [`dynamic`](#dynamic) for more information.

::: {note}

While both `by` and `groupby` are used to segment your data based on categorical variables, they serve different purposes. The `by` option creates an overlay (or layout, if `subplots=True`) where all groups are displayed simultaneously, whereas the `groupby` option builds an interactive widget. With `groupby`, each group is rendered as a separate element (using a `DynamicMap` if `dynamic=True` or a `HoloMap` otherwise), allowing users to toggle between groups dynamically.
:::

## `group_label`

The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers.

In [None]:
import pandas as pd
import hvplot.pandas  # noqa

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['x', 'y'])

df.hvplot.line(group_label='Category')

In this example, setting `group_label='Category'` customizes the legend to display this label instead of the default, making the plot more informative.

## `kind`

The `kind` option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you instruct hvPlot to create a specific visualization. For tabular data, the default is ‘line’, which generates a line plot. However, when working with xarray data, hvPlot automatically infers the most appropriate plot type based on the structure of your dataset. For example, it may default to a ‘hist’ plot for two-dimensional data or ‘rgb’ for image-like data.

Changing the `kind` parameter allows you to experiment with different visual representations without altering your underlying data.

### Tabular data

In [None]:
import pandas as pd
import hvplot.pandas  # noqa

df = pd.DataFrame({
    'year': [2018, 2019, 2020, 2021],
    'sales': [150, 200, 250, 300]
})

line_plot = df.hvplot(x='year', y='sales', title="Default line plot", width=400)
bar_plot = df.hvplot(x='year', y='sales', kind='bar', title="Bar plot", width=400)

line_plot + bar_plot

In this example, the first plot uses the default (`kind='line'`), while the second explicitly sets `kind='bar'` to create a bar chart. You can also select the plot kind by calling the corresponding method on the `hvplot` accessor:

In [None]:
df.hvplot.bar(x='year', y='sales')

### Xarray data

In [None]:
import hvsampledata
import hvplot.xarray  # noqa

ds = hvsampledata.air_temperature("xarray")
hist_plot = ds.hvplot(title="Default hist plot", width=400)
image_plot = ds.hvplot.image(title="Image plot", width=400)

hist_plot + image_plot

## `label`

The `label` option allows you to specify a custom name for your dataset that appears in the plot legend or as the data series label.

:::{note}
While `label` defines the name of the data series (used in the legend), the `title` keyword sets the overall plot title. If a title is provided it is used as the plot’s heading, whereas `label` is used to annotate the plotted data.
:::

In [None]:
import pandas as pd
import hvplot.pandas  # noqa

df = pd.DataFrame({
    'name': ["Mark", "Luke", "Ken", "June"],
    'age': [15, 20, 25, 30]
})

line_plot = df.hvplot.line(x='name', y='age', label="line plot", title="Ages of students")
bar_plot = df.hvplot.bar(x='name', y='age', label="bar plot")

line_plot * bar_plot

## `persist`

The `persist` option is useful when working with Dask-backed datasets. Setting `persist=True` tells Dask to compute and keep the data in memory, which can speed up subsequent interactions and visualizations for large or computationally expensive datasets.

## `row`

The `row` and `col` options let you split your plot into separate subplots based on categorical variables. Use `row` to arrange subplots vertically and `col` to arrange them horizontally when used together, making it easier to compare subsets of your data side by side.

In [None]:
import hvplot.pandas  # noqa
import hvsampledata

df = hvsampledata.penguins("pandas")

df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', row='species', col='island')


In this example, the data is split into separate subplots: one row per `species` and one column per `island`, which allows for easy comparison between the different subsets.

## `col`

See [`row`](#row) above.

## `sort_date`

The `sort_date` option ensures that the x-axis is sorted chronologically when your data contains date values. This helps to correctly display time series data even if the original dataset isn’t in order. It is set to `True` by default.

In [None]:
import hvplot.pandas  # noqa
from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst

scrambled = sst.sample(frac=1)

plot1 = scrambled.hvplot(width=400)
plot2 = scrambled.hvplot(sort_date=False, width=400)

plot1 + plot2

In the first plot, even though the dates in the DataFrame are unsorted, the plot’s x-axis displays them in chronological order. Setting `sort_date=False` instead produces jumbled lines, because the points are connected in the order in which they appear in the DataFrame.
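The effect of sorting can be illustrated with plain pandas. The sketch below is a conceptual analogy under the assumption that sorting by date is what matters; it does not call hvPlot and is not a copy of its implementation. The `temperature` values are made up.

```python
import pandas as pd

# Hypothetical daily series in chronological order
ordered = pd.DataFrame(
    {'temperature': [14.0, 15.5, 13.2, 16.1, 15.0]},
    index=pd.date_range('2020-01-01', periods=5, name='time'),
)

# Deliberately reorder the rows so the dates are no longer sorted
shuffled = ordered.iloc[[3, 0, 4, 1, 2]]

# Sorting on the date index restores chronological order,
# so a line drawn through the rows no longer jumps back and forth
restored = shuffled.sort_index()

print(shuffled.index.is_monotonic_increasing)
print(restored.index.is_monotonic_increasing)
```

A line drawn through `shuffled` row by row would zigzag across the x-axis, which is the jumbled appearance seen with `sort_date=False`.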

## `subplots`

The `subplots` option is a Boolean flag that, when enabled (set to `True`), displays each group specified by the `by` keyword in its own subplot. This contrasts with the default behavior of overlaying all groups in a single plot, and it can provide clearer side-by-side comparisons of grouped data.

See [`by`](#by) for example usage.

## `symmetric`

The `symmetric` option controls whether the colormap range is centered around zero. If you do not set `symmetric` explicitly and no color limits are provided via `clim`, hvPlot automatically checks your data by computing the 5th and 95th percentiles. If the 5th percentile is below 0 and the 95th percentile is above 0, the option is enabled so that the colormap is balanced about 0.

::: {note}
For lazily loaded or very large xarray datasets, this check is skipped for performance reasons and defaults to `False`.
:::
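The percentile check described above can be sketched with NumPy. This is an illustration of the rule as stated here, not hvPlot's actual code: the function name `looks_symmetric` and its `max_size` parameter are hypothetical, with `max_size` standing in for the size threshold that `check_symmetric_max` controls.

```python
import numpy as np


def looks_symmetric(values, max_size=1_000_000):
    """Return True if the bulk of the data straddles zero.

    `looks_symmetric` and `max_size` are hypothetical names used
    for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # Skip the check for empty or very large arrays
    if values.size == 0 or values.size > max_size:
        return False
    low = np.nanquantile(values, 0.05)   # 5th percentile, ignoring NaNs
    high = np.nanquantile(values, 0.95)  # 95th percentile, ignoring NaNs
    return bool(low < 0 and high > 0)


celsius = np.linspace(-20, 30, 101)  # spans both sides of zero
kelvin = celsius + 273.15            # strictly positive

print(looks_symmetric(celsius))
print(looks_symmetric(kelvin))
print(looks_symmetric(celsius, max_size=10))  # too large for the threshold
```

Using the 5th and 95th percentiles rather than the minimum and maximum keeps a handful of outliers from switching the colormap to a symmetric range.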

In [None]:
import hvplot.xarray  # noqa
import hvsampledata

ds = hvsampledata.air_temperature("xarray")
# Select a single date and convert to Celsius to get both negative and positive values around 0
data = ds.sel(time='2014-02-25') - 273
plot1 = data.hvplot.image(title="Symmetric True by default", width=400)
plot2 = data.hvplot.image(symmetric=False, title="Symmetric=False", width=400)

plot1 + plot2

In this example, the left image uses symmetric colormap scaling (centered at zero), while the right image sets `symmetric=False`, so the color range is not forced to be centered on zero. Notice that when symmetric scaling is applied, the “coolwarm” colormap is used by default.

## `check_symmetric_max`

The `check_symmetric_max` option sets an upper limit on the number of data elements for which the automatic symmetry check is performed. When the dataset’s size exceeds this threshold, hvPlot skips the symmetry check and treats the data as non-symmetric. By default, this limit is **1,000,000** elements, which works well for most datasets. You can raise it to force the check on larger datasets or lower it to skip the check on smaller ones.

In [None]:
import hvplot.xarray  # noqa
import hvsampledata

ds = hvsampledata.air_temperature("xarray")

plot1 = (ds - 273).hvplot.image(width=400, title="Default check for symmetry")
plot2 = (ds - 273).hvplot.image(check_symmetric_max=10, width=400, title="Avoid symmetry check above 10")

plot1 + plot2

## `transforms`

The `transforms` option allows you to modify data values for specific dimensions before plotting. Unlike the [`fields`](#fields) option which only renames or adds metadata, `transforms` applies HoloViews expressions to the data. It accepts a dictionary where each key is a dimension (for example, a DataFrame column name) and each value is a HoloViews expression built with `holoviews.dim()` that defines how to transform that dimension.

For instance, if you have a 'probability' column with values between 0 and 1 and you want to display them as percentages, you can define a transformation as:

`percent = hv.dim('probability') * 100`

When passed via the `transforms` keyword, this expression multiplies all values in the 'probability' column by 100 before plotting.

In [None]:
import numpy as np
import pandas as pd
import holoviews as hv
import hvplot.pandas  # noqa

df = pd.DataFrame({'value': np.random.randn(50), 'probability': np.random.rand(50)})
percent = hv.dim('probability') * 100

df.hvplot.scatter(
    x='value', y='probability', transforms={'probability': percent}
)


## `use_dask`

The `use_dask` option tells hvPlot to treat your data as Dask-backed, enabling out-of-core and parallelized computation for datasets that might not fit in memory. When set to `True`, hvPlot checks whether the provided data is a Dask DataFrame (or similar Dask object) and uses the appropriate processing branch. If you also set `persist=True`, the data is persisted in memory for improved performance on subsequent operations.

## `use_index`

The `use_index` option determines whether the data’s index is used as the x-axis when no explicit x-axis column is specified. It is enabled by default, so hvPlot automatically assigns the DataFrame’s index as a coordinate for plotting. This is particularly useful when the index contains meaningful information, such as timestamps.

If you set `use_index=False`, hvPlot uses the first non-index column as the x-axis.

In [None]:
import pandas as pd
import hvplot.pandas  # noqa
from bokeh.sampledata import stocks

df = pd.DataFrame(stocks.AAPL)
df['date'] = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

df[:50].hvplot.line(y=['open', 'close'], group_label='Prices')

Notice the use of the index column ('date') as the x-axis.

## `value_label`

The `value_label` option sets a custom label for the data values, and is typically used to label the y-axis or to annotate legends. By default, it is set to 'value', but you can override it with a more descriptive name to better convey what the data represents.

In [None]:
import pandas as pd
import hvplot.pandas  # noqa

df = pd.DataFrame({
    'time': pd.date_range("2020-01-01", periods=4),
    'high': [22, 23, 24, 25],
    'low': [12, 16, 18, 20]
})

df.hvplot.line(x='time', value_label='Temperature (°C)', group_label="Temp")
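One way to picture the roles of `group_label` and `value_label` is as the two column names you would get when reshaping the wide table into long form. The sketch below uses plain pandas `melt` as a conceptual analogy; it is not a description of how hvPlot builds this particular plot internally.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.date_range('2020-01-01', periods=4),
    'high': [22, 23, 24, 25],
    'low': [12, 16, 18, 20],
})

# Wide-to-long reshape: the former column names ('high', 'low') go under
# `var_name`, and their values go under `value_name`
long_df = df.melt(
    id_vars='time',
    var_name='Temp',                 # plays the role of group_label
    value_name='Temperature (°C)',   # plays the role of value_label
)

print(long_df)
```

In the long table, `Temp` identifies which series a row belongs to and `Temperature (°C)` holds the measurement, matching the legend title and y-axis label in the plot above.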