# Sample Data

hvPlot provides convenient access to sample datasets through the `hvplot.sampledata` module, which serves as an interface to the [`hvsampledata`](https://github.com/holoviz/hvsampledata) package. This makes it easy to get started with hvPlot using real-world datasets without having to find, download, or clean data yourself.

The sample datasets cover a variety of data types and use cases, making them perfect for learning hvPlot, testing functionality, creating examples, or building prototypes.

## Installation

The sample datasets are provided by the `hvsampledata` package, which is included as a dependency when installing hvPlot with example dependencies. If you installed hvPlot with minimal dependencies, you may need to install it separately:

::::{tab-set}

:::{tab-item} pip

```bash
pip install hvsampledata
```

:::

:::{tab-item} conda

```bash
conda install -c conda-forge hvsampledata
```

:::
::::

Once available, the datasets can be accessed directly through hvPlot:

In [None]:
import hvplot.pandas  # noqa

penguins = hvplot.sampledata.penguins("pandas")
penguins.head(3)

:::{tip}
Accessing the datasets through `hvplot.sampledata` is the recommended approach: it avoids a separate `import hvsampledata` and makes it clear where the data comes from.
:::

If `hvsampledata` is not installed, you'll get a helpful error message with installation instructions.
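
To check up front whether the optional package is present, for example in a script that should fall back gracefully, the standard library can be queried without importing the package. This is a minimal sketch; `package_available` is a hypothetical helper and not part of hvPlot.

```python
from importlib.util import find_spec


def package_available(name: str) -> bool:
    """Return True if a top-level package called `name` can be found."""
    return find_spec(name) is not None


if package_available("hvsampledata"):
    print("hvsampledata found: hvplot.sampledata is ready to use")
else:
    print("hvsampledata missing: install it with pip or conda as shown above")
```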

## Available Datasets

The `sampledata` module provides access to various datasets covering different data types and domains:

### Tabular Datasets

| Dataset | Description | Rows | Use Cases |
|---------|-------------|------|----------|
| `penguins` | Penguin species measurements | 344 | Classification, EDA, scatter plots |
| `earthquakes` | Global earthquake data | 596 | Geographic visualization, time series |
| `apple_stocks` | Apple Inc. stock data | 1,509 | Financial time series, OHLC plots |
| `stocks` | Tech company stock comparison | 261 | Multi-line plots, correlation |
| `synthetic_clusters` | Clustered synthetic data | 1M (configurable) | Large data visualization, clustering |
| `us_states` | US states with economic data | 49 | Geographic plotting, choropleths |

### Gridded Datasets

| Dataset | Description | Dimensions | Use Cases |
|---------|-------------|------------|----------|
| `air_temperature` | Weather reanalysis data | 53×25×20 (lon × lat × time) | Weather/climate analysis, animations |
| `landsat_rgb` | Satellite imagery | RGB image | Remote sensing, image analysis |
| `penguins_rgba` | Penguin image with transparency | RGBA image | Image processing, overlays |

Let's explore some of these datasets!

## Working with Tabular Data

### Penguins Dataset

The penguins dataset is well suited to exploring relationships between variables, such as bill length versus bill depth across species:

In [None]:
penguins = hvplot.sampledata.penguins("pandas")

penguins.hvplot.scatter(
    x='bill_length_mm',
    y='bill_depth_mm',
    color='species',
    alpha=0.7,
    title='Penguin Bill Measurements by Species',
)

### Earthquakes Dataset

The earthquakes dataset includes geographic coordinates, making it perfect for demonstrating geographic visualizations:

In [None]:
earthquakes = hvplot.sampledata.earthquakes("pandas")
earthquakes.head(3)

In [None]:
earthquakes.hvplot.scatter(
    x='lon',
    y='lat',
    size='depth',
    color='mag',
    cmap='viridis_r',
    clabel='Earthquake Magnitude',
    alpha=0.6,
    title='Global Earthquakes: Location, Magnitude, and Depth',
    xlabel='Longitude',
    ylabel='Latitude',
)

### Stocks Dataset

The stocks dataset is great for time series analysis:

In [None]:
stocks = hvplot.sampledata.stocks("pandas")
stocks.head(3)

In [None]:
stocks.hvplot.line(
    x='date',
    title='Tech Stock Performance Comparison (Normalized)',
    ylabel='Normalized Price',
)
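
The plot title refers to normalized prices. How the sample data was normalized is not stated here; a common convention, assumed in this sketch, is to divide each series by its first value so that every line starts at 1. The tickers and prices below are made up purely for illustration.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, indexed by date
prices = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D", name="date"),
)

# Divide every column by its first row so all series start at 1.0
normalized = prices / prices.iloc[0]
print(normalized)
```

Normalizing this way puts series with very different price levels on a shared axis, which is what makes a multi-line comparison readable.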

### Large Synthetic Dataset

The synthetic clusters dataset is perfect for demonstrating hvPlot's ability to handle large datasets:

In [None]:
# The default is 1 million points; request 100,000 here to keep the demo fast
large_data = hvplot.sampledata.synthetic_clusters("pandas", total_points=100_000)
print(f"Dataset shape: {large_data.shape}")
large_data.head(3)

In [None]:
large_data.hvplot.points(
    x='x',
    y='y',
    by='cat',
    datashade=True,  # Use datashader for performance
    data_aspect=1,
    title='Large Synthetic Dataset (100k points)'
)
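
With `datashade=True`, the points are aggregated into a fixed-size grid before anything is drawn, so the browser receives an image rather than one glyph per row. The NumPy sketch below illustrates that idea only; it is not how Datashader is implemented, and the random points and the 400×300 grid are assumptions made for the demo.

```python
import numpy as np

# Random stand-in points (not the synthetic_clusters data)
rng = np.random.default_rng(seed=0)
n_points = 100_000
x = rng.normal(size=n_points)
y = rng.normal(size=n_points)

# Count how many points fall into each cell of a fixed grid
width, height = 400, 300
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(width, height))

print(f"{n_points:,} points reduced to a {counts.shape[0]}x{counts.shape[1]} grid")
print(f"Busiest cell holds {int(counts.max())} points")
```

The size of the aggregated grid depends on the chosen resolution, not on the number of rows, which is why this approach scales to much larger inputs.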

## Working with Gridded Data

### Air Temperature Dataset

The air temperature dataset demonstrates working with multi-dimensional scientific data:

In [None]:
import hvplot.xarray  # noqa

air_temp = hvplot.sampledata.air_temperature("xarray").sel(time="2014-02-25 12:00")
air_temp

In [None]:
air_temp.hvplot.image(
    x='lon',
    y='lat',
    cmap='coolwarm',
    title='Air Temperature (Kelvin)',
    aspect='square',
)
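
Selecting a single timestamp with `.sel(time=...)` is what turns the three-dimensional cube into a two-dimensional grid that `hvplot.image` can draw. The sketch below mimics that reduction with a plain NumPy array; the `(time, lat, lon)` ordering, the synthetic values, and the Kelvin-to-Celsius conversion are illustrative assumptions rather than properties read from the dataset.

```python
import numpy as np

# Stand-in cube: 20 time steps on a 25 x 53 lat/lon grid (assumed axis order)
n_time, n_lat, n_lon = 20, 25, 53
cube_kelvin = np.full((n_time, n_lat, n_lon), 273.15) + np.arange(n_time)[:, None, None]

# Picking one time step drops the time axis, leaving a 2D lat/lon grid
snapshot_kelvin = cube_kelvin[5]

# Kelvin to degrees Celsius is a constant offset
snapshot_celsius = snapshot_kelvin - 273.15

print(snapshot_kelvin.shape)
```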

### Landsat RGB Dataset

The landsat dataset shows how to work with RGB image data:

In [None]:
landsat = hvplot.sampledata.landsat_rgb("rioxarray")
landsat

In [None]:
# Display the RGB image
landsat.hvplot.rgb(
    x='x',
    y='y',
    title='Landsat RGB Image',
    aspect='square'
)

## Geographic Data

### US States Dataset

The US states dataset includes geometric boundaries and economic data, perfect for choropleth maps:

In [None]:
us_states = hvplot.sampledata.us_states("geopandas")
us_states.head(3)

In [None]:
us_states.hvplot.polygons(
    geo=True,
    color='bea_region',
    hover_cols=['state', 'pop_density', 'income_range'],
    hover_tooltips=[
        ("State", "@state"),
        ("Pop. density", "@pop_density /mi2"),
        ("Median income range", "@income_range")
    ],
    title='US states colored by BEA region',
)
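
Each entry in `hover_tooltips` pairs a display label with a template in which `@column` is replaced by that column's value for the hovered shape (this is Bokeh's tooltip field syntax). When many columns are involved, the list can be generated instead of typed out; `make_tooltips` below is a hypothetical helper sketched for illustration.

```python
def make_tooltips(columns: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Build (label, template) pairs from {column: (label, unit_suffix)}."""
    tooltips = []
    for column, (label, suffix) in columns.items():
        # "@column" is substituted with the column's value on hover
        template = f"@{column} {suffix}".strip()
        tooltips.append((label, template))
    return tooltips


tooltips = make_tooltips({
    "state": ("State", ""),
    "pop_density": ("Pop. density", "/mi2"),
    "income_range": ("Median income range", ""),
})
print(tooltips)
```

The resulting list matches the one written by hand in the cell above and could be passed as `hover_tooltips=tooltips`.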

## Using Different Engines

Many datasets can be loaded with more than one engine (pandas, Polars, Dask), so you can choose between eager, in-memory loading and lazy evaluation:

In [None]:
# Load with pandas (eager evaluation)
penguins_pandas = hvplot.sampledata.penguins("pandas")
print(f"Pandas type: {type(penguins_pandas)}")

# Load with polars
penguins_polars = hvplot.sampledata.penguins("polars")
print(f"Polars type: {type(penguins_polars)}")

# Load with dask for larger-than-memory processing (lazy evaluation)
penguins_dask = hvplot.sampledata.penguins("dask", lazy=True)
print(f"Dask type: {type(penguins_dask)}")
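
When code has to run in environments where not every backend is installed, the engine name can be chosen at runtime. The sketch below is one possible approach; `pick_engine` is a hypothetical helper, and the preference order is an arbitrary assumption.

```python
from importlib.util import find_spec
from typing import Callable, Sequence


def pick_engine(
    preferred: Sequence[str] = ("pandas", "polars", "dask"),
    is_installed: Callable[[str], bool] = lambda name: find_spec(name) is not None,
) -> str:
    """Return the first engine in `preferred` whose package is installed."""
    for engine in preferred:
        if is_installed(engine):
            return engine
    raise ImportError(f"None of these engines are installed: {', '.join(preferred)}")


# The returned name could then be passed on, e.g. hvplot.sampledata.penguins(engine)
try:
    print(f"Selected engine: {pick_engine()}")
except ImportError as err:
    print(err)
```

Passing `is_installed` as a parameter keeps the helper easy to test without installing or uninstalling any packages.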

## Next Steps

Now that you've seen how to use hvPlot's sample datasets, you can:

1. **Explore more plot types**: Try different visualization methods with these datasets in your own notebook
2. **Learn about specific data types**: Check out guides for [Tabular Data](../user_guide/Plotting.ipynb), [Gridded Data](../user_guide/Gridded_Data.ipynb), and [Geographic Data](../user_guide/Geographic_Data.ipynb)
3. **Work with your own data**: Apply the techniques you've learned to your own datasets

The sample datasets provide a great foundation for learning hvPlot without worrying about data preparation, letting you focus on visualization and analysis techniques!

:::{seealso}
[API Reference](../ref/api/manual/sampledata.md): Complete reference for all sample dataset functions
:::