==================================
Ray Data: Scalable Datasets for ML
==================================

.. toctree::
    :hidden:

    Overview <overview>
    quickstart
    user-guide
    examples
    api/api
    data-internals

Ray Data is a scalable data processing library for ML workloads. It provides flexible and performant APIs for scaling :ref:`offline batch inference <batch_inference_overview>` and :ref:`data preprocessing and ingest for ML training <ml_ingest_overview>`. Ray Data uses streaming execution to efficiently process large datasets.

Install Ray Data
----------------

To install Ray Data, run:

.. code-block:: console

    $ pip install -U 'ray[data]'

To learn more about installing Ray and its libraries, see :ref:`Installing Ray <installation>`.
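After installing, the following minimal sketch shows the general shape of a preprocessing pipeline followed by incremental consumption, as a training loop would do. It assumes a recent Ray 2.x release, in which ``ray.data.range`` produces a single ``id`` column and ``batch_format="numpy"`` passes each batch to your function as a dictionary of NumPy arrays. The function name ``add_squared`` and the batch size of 4 are illustrative choices, not part of the Ray API.

.. code-block:: python

    from typing import Dict

    import numpy as np
    import ray

    # Assumes a recent Ray 2.x release: ray.data.range(10) yields rows
    # {"id": 0}, ..., {"id": 9} in a single "id" column.
    ds = ray.data.range(10)


    # Illustrative preprocessing function (hypothetical name). With
    # batch_format="numpy", each batch is a dict of column name -> NumPy array.
    def add_squared(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        batch["squared"] = batch["id"] ** 2
        return batch


    # map_batches is lazy: this only records the transformation.
    ds = ds.map_batches(add_squared, batch_format="numpy")

    # Iterating triggers execution; batches arrive incrementally, as they
    # would when feeding a training loop.
    total = 0
    for batch in ds.iter_batches(batch_size=4, batch_format="numpy"):
        total += int(batch["squared"].sum())

    print(total)  # 285, the sum of squares of 0..9

In recent Ray releases, transformations such as ``map_batches`` are recorded lazily, and consuming the dataset (here with ``iter_batches``) is what starts execution, so blocks flow through the pipeline rather than the whole dataset being materialized up front.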

Learn more
----------

.. grid:: 1 2 2 2

    .. grid-item-card::

        Ray Data Overview
        ^^^

        Get an overview of Ray Data, the workloads that it supports, and how it compares to alternatives.

        +++
        .. button-ref:: data_overview
            :color: primary
            :outline:
            :expand:

            Ray Data Overview

    .. grid-item-card::

        Quickstart
        ^^^

        Understand the key concepts behind Ray Data. Learn what Datasets are and how they're used.

        +++
        .. button-ref:: data_quickstart
            :color: primary
            :outline:
            :expand:

            Quickstart

    .. grid-item-card::

        User Guides
        ^^^

        Learn how to use Ray Data, from basic usage to end-to-end guides.

        +++
        .. button-ref:: data_user_guide
            :color: primary
            :outline:
            :expand:

            Learn how to use Ray Data

    .. grid-item-card::

        Examples
        ^^^

        Find simple and scaled-out examples of using Ray Data.

        +++
        .. button-ref:: examples
            :color: primary
            :outline:
            :expand:

            Ray Data Examples

    .. grid-item-card::

        API
        ^^^

        Get more in-depth information about the Ray Data API.

        +++
        .. button-ref:: data-api
            :color: primary
            :outline:
            :expand:

            Read the API Reference

    .. grid-item-card::

        Ray Blogs
        ^^^

        Get the latest engineering updates from the Ray team and see how companies are using Ray Data.

        +++
        .. button-link:: https://www.anyscale.com/blog?tag=ray-datasets
            :color: primary
            :outline:
            :expand:

            Read the Ray blogs
