
[misc][data.llm] Generalize the builder pattern in ray.data.llm#58484

Merged
richardliaw merged 6 commits into ray-project:master from jeffreyjeffreywang:generic_build_processor
Dec 10, 2025

Conversation

@jeffreyjeffreywang
Contributor

@jeffreyjeffreywang jeffreyjeffreywang commented Nov 9, 2025

Description


As discussed in https://docs.google.com/document/d/1danbyJjd3Zl_Q-CSsS3PjxtG4K9dZkyn0A9t7i4Fyjg/edit?disco=AAABtNCDbfw, the current builder function build_llm_processor is overly specific to LLM inference workloads and not flexible enough to support additional processors, such as those for multimodal preprocessing. To address this, we’ve generalized it to build_processor to better accommodate a broader range of LLM-related workloads.

Related issues


N/A

Additional information


Original:

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Meta-Llama-3.1-8B-Instruct",
    concurrency=1,
    batch_size=64,
)

processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=20),
    ),
    postprocess=lambda row: dict(resp=row["generated_text"]),
    preprocess_map_kwargs={"num_cpus": 0.5},
    postprocess_map_kwargs={"num_cpus": 0.25},
)

ds = ray.data.range(300)
ds = processor(ds)
for row in ds.take_all():
    print(row)

Updated:

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Meta-Llama-3.1-8B-Instruct",
    concurrency=1,
    batch_size=64,
)

processor = build_processor( # This is the only difference. Arguments remain the same.
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=20),
    ),
    postprocess=lambda row: dict(resp=row["generated_text"]),
    preprocess_map_kwargs={"num_cpus": 0.5},
    postprocess_map_kwargs={"num_cpus": 0.25},
)

ds = ray.data.range(300)
ds = processor(ds)
for row in ds.take_all():
    print(row)

@jeffreyjeffreywang jeffreyjeffreywang requested review from a team as code owners November 9, 2025 22:30
@jeffreyjeffreywang
Contributor Author

cc: @nrghosh @kouroshHakha

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors build_llm_processor to the more generic build_processor. This is a good change that makes the API more flexible for various processing workloads beyond just LLM inference. The renaming has been applied consistently and thoroughly across the entire codebase, including source files, documentation, examples, and tests. The changes are well-executed and I found no issues in my review.

@ray-gardener ray-gardener bot added data Ray Data-related issues llm community-contribution Contributed by the community labels Nov 10, 2025
@jeffreyjeffreywang
Contributor Author

Will wait for #58298 to finalize before deciding if this PR is still necessary.

@nrghosh nrghosh added the go add ONLY when ready to merge, run all tests label Nov 20, 2025
@omatthew98 omatthew98 removed the data Ray Data-related issues label Dec 3, 2025
Contributor

@nrghosh nrghosh left a comment


conflicts resolved, good for merge

@kouroshHakha kouroshHakha enabled auto-merge (squash) December 7, 2025 00:20
auto-merge was automatically disabled December 9, 2025 03:11

Head branch was pushed to by a user without write access

@jeffreyjeffreywang
Contributor Author

Rebase to address failing doc tests.

Contributor

@nrghosh nrghosh left a comment


cc @kouroshHakha ready to merge 🚀

Contributor

@kouroshHakha kouroshHakha left a comment


needs some changes

"""
[DEPRECATED] Prefer build_processor. Build a LLM processor using the given config.
"""
deprecation_warning(
Contributor


wait this should be a decorator and we should also remove the PublicAPI annotation from build_llm_processor

Contributor

@jeffreywang-anyscale jeffreywang-anyscale Dec 10, 2025


I see similar usages of directly using the deprecation_warning helper: https://github.com/search?q=repo%3Aray-project%2Fray+%22deprecation_warning%28%22&type=code&p=2. I'm not able to find associated decorators though.

Contributor


Will remove the publicAPI annotation.

Contributor


Done.

Contributor


oh I guess I was talking about Deprecated class:

"""Decorator for documenting a deprecated class, method, or function.

Maybe we can use that instead?

Contributor


ah yeah, this is neater. adjusted in my latest revision.
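The resolution above (marking the old name with a decorator rather than calling the warning helper inline) can be sketched roughly as follows. This is a simplified stand-in that uses the stdlib `warnings` module instead of Ray's internal `Deprecated` annotation, and the function bodies are placeholders, not the actual Ray implementation:

```python
import functools
import warnings


def deprecated(replacement: str):
    """Decorator that emits a DeprecationWarning pointing at the replacement name."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return inner
    return wrap


def build_processor(config, **kwargs):
    # Placeholder for the generalized builder.
    return ("processor", config, kwargs)


@deprecated(replacement="build_processor")
def build_llm_processor(config, **kwargs):
    # Old name kept as a thin alias for backward compatibility.
    return build_processor(config, **kwargs)
```

With this shape, callers of the old name keep working but see a one-line warning, and the alias body stays a single delegating call.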

Contributor

@alexeykudinkin alexeykudinkin left a comment


Rubber-stamping

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
@jeffreywang-anyscale
Contributor

Rebasing onto latest master

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@richardliaw richardliaw merged commit a8857ae into ray-project:master Dec 10, 2025
6 checks passed
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026

Labels

community-contribution (Contributed by the community), go (add ONLY when ready to merge, run all tests), llm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

7 participants