[Core] Add Support for Furiosa AI NPU by nadongjun · Pull Request #63035 · ray-project/ray

nadongjun · 2026-04-30T09:01:49Z

Description

As a user of Furiosa AI's RNGD NPUs on Ray, I've found that management within Ray remains manual and error-prone, despite existing support in vLLM and Hugging Face.

This PR introduces first-class support for Furiosa AI NPUs (specifically the RNGD family: RNGD-S, RNGD, RNGD-Max, RNGD+) into Ray's accelerator management framework. This integration follows the established patterns used for other NPUs.

Currently, Furiosa RNGD is gaining traction in the LLM ecosystem with support in vLLM, Hugging Face Optimum (optimum-furiosa), and Kubernetes. However, Ray users running production inference workloads still face manual overhead:

Manual Resource Tagging: Users must pass --resources='{"FURIOSA": N}' to ray start as Ray lacks auto-detection.
Manual Device Isolation: Users have to manually manage FURIOSA_VISIBLE_DEVICES to prevent resource contention between actors.
Lack of SKU Awareness: There is no native way to target specific chip architectures (e.g., RNGD-Max vs. RNGD-S) using accelerator_type.

Usage Examples

import ray

# NPUs are auto-detected.
ray.init()

# Requesting a specific NPU with architecture pinning
@ray.remote(resources={"FURIOSA": 1}, accelerator_type="FURIOSA_RNGD")
class InferenceWorker:
    def __init__(self):
        # Ray automatically sets FURIOSA_VISIBLE_DEVICES.
        # furiosa-runtime / furiosa-llm will only see the assigned chip.
        from furiosa.runtime import session
        self.sess = session.create("model.dfg")

    def predict(self, x):
        return self.sess.run(x)

Related issues

Additional information

Product: Furiosa AI RNGD

SDK: furiosa-smi-py

Integrations: optimum-furiosa

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

gemini-code-assist

Code Review

This pull request adds support for Furiosa AI NPUs by implementing the FuriosaAcceleratorManager for device detection and resource management. The changes include logic for architecture identification, environment variable configuration for visible devices, and comprehensive unit tests with mocks for the Furiosa SMI SDK. Review feedback identifies a bug in architecture detection that could return an incorrect string, suggests implementing get_current_node_accelerator_labels for improved dashboard visibility, and recommends updating typing imports.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

nadongjun · 2026-05-07T01:59:24Z

@rueian @ryanaoleary @edoakes Gentle ping. Any thoughts on this?

elpis-furiosa

Thank you for working on Ray support for Furiosa RNGD. I stumbled upon this PR while researching ways to support new accelerators. The code looks fine at a glance, but I'll test it on actual RNGD hardware just to make sure.

Also, it would be great if an example similar to those for other accelerators were added to the "Using accelerators in Tasks and Actors" section of the documentation. I can take a look at adding one as a follow-up PR.

Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai> Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit cae315e.

nadongjun · 2026-05-09T00:31:07Z

@elpis-furiosa Thanks for the review. I've applied all the suggested changes.

Most of the accelerators Ray supports don't seem to have real hardware tests, so verifying RNGD on actual hardware would be a meaningful contribution for end users.

If you'd like, I'm happy to hand off not just the follow-up PR but ownership of this PR as well (feel free to open a fresh PR if that's cleaner). Long-term maintenance items like SDK changes and RNGD family naming would naturally fit better on the Furiosa AI side.

If you're interested, please leave a comment here or DM me on the Ray Slack (handle: Dongjun Na).

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

elpis-furiosa · 2026-05-11T04:03:33Z

Thanks for your offer, @nadongjun. Looking at the scope of this PR, you've already done most of the implementation work, so rather than taking over ownership, we'd prefer to focus on providing hardware testing support on actual RNGD devices. Any additional contributions from our side will follow as separate PRs.

Yicheng-Lu-llll · 2026-05-19T21:06:02Z

Thank you so much for the contribution, @nadongjun! I'll take a look by tmrw.

Yicheng-Lu-llll

Thank you for the PR! left some nits

Yicheng-Lu-llll · 2026-05-19T23:34:21Z

+    token = token.strip()
+    if token.startswith(_FURIOSA_DEVICE_PREFIX):
+        token = token[len(_FURIOSA_DEVICE_PREFIX) :]
+    # ``furiosa-llm`` allows ``npu:0:0-3`` to address a core range; we keep


Just wanted to confirm my understanding, this npu:0:0-3 format only shows up and gets used at ray node startup, right? So ray can safely treat it as a full device. And we don't really support core level scheduling.

Yes, that's the case. The :cores part only appears when users set FURIOSA_DEVICES themselves, so the same value can be passed straight through to furiosa-llm --devices.

Ray strips the suffix before scheduling and only operates at the device level, so core-level scheduling isn't supported.
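The normalization described here can be sketched as follows. This is a minimal illustration, not the PR's code: the diff shows only the name `_FURIOSA_DEVICE_PREFIX` and the strip/prefix steps, so the prefix value `"npu:"` and the helper names `parse_device_token` and `parse_visible_devices` are assumptions.

```python
from typing import List

# Assumed value; the diff above only shows the constant's name.
_FURIOSA_DEVICE_PREFIX = "npu:"


def parse_device_token(token: str) -> str:
    """Reduce one device token such as ' npu:0:0-3 ' to its device index '0'."""
    token = token.strip()
    if token.startswith(_FURIOSA_DEVICE_PREFIX):
        token = token[len(_FURIOSA_DEVICE_PREFIX):]
    # Drop an optional ':cores' suffix so scheduling stays at device level.
    return token.split(":", 1)[0]


def parse_visible_devices(value: str) -> List[str]:
    """Split a comma-separated FURIOSA_DEVICES value into device indices."""
    return [parse_device_token(t) for t in value.split(",") if t.strip()]
```

With this sketch, `parse_visible_devices("npu:0:0-3, npu:1")` yields `["0", "1"]`, which is the device-level view the scheduler would work with.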

But in this case, for a detailed example:

A user sets FURIOSA_DEVICES=npu:0:0-3 at node startup.

Ray would only see npu:0.

So, when we start a Ray worker on this node, Ray will set the env var to FURIOSA_DEVICES=npu:0 (via set_current_process_visible_accelerator_ids).

furiosa sdk reads npu:0 instead of npu:0:0-3.
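The round trip in the steps above can be reproduced with a small sketch. It assumes the parse and format steps behave as described in this thread; `to_device_ids` and `to_env_value` are hypothetical stand-ins, not Ray functions.

```python
from typing import Iterable, List


def to_device_ids(value: str) -> List[str]:
    """Parse a FURIOSA_DEVICES value into device indices, dropping core ranges."""
    ids = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("npu:"):
            token = token[len("npu:"):]
        ids.append(token.split(":", 1)[0])
    return ids


def to_env_value(ids: Iterable[str]) -> str:
    """Format device indices back into the value a worker process would see."""
    return ",".join("npu:{}".format(i) for i in ids)


user_value = "npu:0:0-3"  # set by the user at node startup
worker_value = to_env_value(to_device_ids(user_value))
print(user_value, "->", worker_value)  # prints: npu:0:0-3 -> npu:0
```

The core range is lost because only the device index survives parsing, so the worker would see the whole NPU rather than cores 0-3.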

For reference, NVIDIA solves the same subdevice partitioning problem with MIG by giving each instance its own id and listing them flat: CUDA_VISIBLE_DEVICES=MIG-GPU-abc-1,MIG-GPU-abc-2 and ray treats each MIG instance as a whole device.

I'm thinking we either handle it in a similar way, or just add a doc note asking users not to use values like npu:0:0-3.

Thanks for the explanation. It would be better if FURIOSA_DEVICES could be partitioned, as is the case for CUDA_VISIBLE_DEVICES. To elaborate, Furiosa RNGD has 8 PEs (Processing Elements) on one PCIe card, and they are denoted as npu:{chip_id}:{pe_id}. Adjacent PEs can be fused to work together.
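To make the PE notation concrete, here is a sketch that lists equal-sized groups of adjacent PEs on one chip as flat identifiers, similar in spirit to the flat MIG listing mentioned above. The 8-PE count and the `npu:{chip_id}:{pe_id}` form come from this thread; the helper name `pe_partitions` and the restriction to aligned, equal-sized groups are assumptions made for illustration.

```python
from typing import List

PES_PER_CHIP = 8  # RNGD has 8 PEs per card, per the comment above


def pe_partitions(chip_id: int, group_size: int,
                  pes_per_chip: int = PES_PER_CHIP) -> List[str]:
    """List adjacent-PE groups of one chip as flat device identifiers."""
    if group_size <= 0 or pes_per_chip % group_size != 0:
        raise ValueError("group_size must evenly divide the PE count")
    parts = []
    for start in range(0, pes_per_chip, group_size):
        end = start + group_size - 1
        # A single PE is written as 'npu:0:0', a fused range as 'npu:0:0-3'.
        cores = str(start) if start == end else "{}-{}".format(start, end)
        parts.append("npu:{}:{}".format(chip_id, cores))
    return parts
```

For example, `pe_partitions(0, 4)` returns `["npu:0:0-3", "npu:0:4-7"]`. Whether Ray should schedule at this granularity is the open question deferred to a follow-up PR.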

Can this issue be addressed in a different PR? I'll make a follow-up PR for this issue.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Yicheng-Lu-llll

Thanks for the update! The only concern I have now is: #63035 (comment). Otherwise LGTM.

And @elpis-furiosa would you mind helping with testing this e2e?

Yicheng-Lu-llll · 2026-05-20T21:44:18Z

  Trouble only occurs if those tasks and actors
  attempt to actually use accelerators that don't exist.

 Using accelerators in Tasks and Actors


Should we also consider adding furiosa to Sections 2 & 3?

7bbc9b7 adds the in-dev arch cases (RngdMax/RngdS/RngdPlus and rngd-max/rngd+) and the FURIOSA_RNGD constant. Kept the other SKUs out of accelerators.py since names may still change.
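As an illustration of what handling these arch spellings might look like, the sketch below normalizes the enum-style and hyphenated forms to one canonical string. It is not the code from 7bbc9b7: the alias table, the canonical spellings, the `rngd-s` entry, and the helper name `normalize_arch` are assumptions. Only the spellings RngdMax/RngdS/RngdPlus and rngd-max/rngd+ appear in this thread.

```python
from typing import Optional

# Hypothetical alias table: lowercased input -> assumed canonical spelling.
_ARCH_ALIASES = {
    "rngd": "rngd",
    "rngds": "rngd-s",
    "rngd-s": "rngd-s",
    "rngdmax": "rngd-max",
    "rngd-max": "rngd-max",
    "rngdplus": "rngd+",
    "rngd+": "rngd+",
}


def normalize_arch(raw: str) -> Optional[str]:
    """Map an SDK arch name to a canonical spelling, or None if unknown."""
    return _ARCH_ALIASES.get(raw.strip().lower())
```

Returning `None` for unknown names keeps the caller free to decide whether to skip the accelerator type or fall back to a default.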

338435e refactors test_get_current_process_visible_accelerator_ids to @pytest.mark.parametrize.

For the Tasks/Actors section example and the e2e testing on actual RNGD hardware, @elpis-furiosa offered to handle both earlier. Would it be OK to take care of them in a separate follow-up PR?

I ran the following example on our hardware, which is similar to the examples of other accelerators:

(ray) ➜ ray git:(338435efad) ✗ cat ray_test.py
import os
import ray

ray.init(resources={"FURIOSA": 2})

@ray.remote(resources={"FURIOSA": 1})
class RNGDActor:
    def ping(self):
        print("RNGD IDs: {}".format(ray.get_runtime_context().get_accelerator_ids()["FURIOSA"]))
        print("FURIOSA_DEVICES: {}".format(os.environ["FURIOSA_DEVICES"]))

@ray.remote(resources={"FURIOSA": 1})
def rngd_task():
    print("RNGD IDs: {}".format(ray.get_runtime_context().get_accelerator_ids()["FURIOSA"]))
    print("FURIOSA_DEVICES: {}".format(os.environ["FURIOSA_DEVICES"]))

rngd_actor = RNGDActor.remote()
ray.get(rngd_actor.ping.remote())
# The actor uses the first RNGD so the task uses the second one.
ray.get(rngd_task.remote())

(ray) ➜ ray git:(338435efad) ✗ python3 ray_test.py
2026-05-21 15:52:42,642 INFO worker.py:2018 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(RNGDActor pid=693909) RNGD IDs: ['0']
(RNGDActor pid=693909) FURIOSA_DEVICES: npu:0
(rngd_task pid=693883) RNGD IDs: ['1']
(rngd_task pid=693883) FURIOSA_DEVICES: npu:1

@nadongjun If you like, you can add this code as an example.

Thanks! I added the example with you as co-author and marked Fractional Accelerators as unsupported for now (29d6a1c). Let me know if I should change that.

nadongjun · 2026-05-21T01:15:04Z

@Yicheng-Lu-llll The npu:0:0-3 format I used in the docstring actually comes from the FuriosaRT-era multi-NPU env var introduction notes in furiosa-sdk 0.10.0 (docs), the PE fusion notation. I had assumed furiosa-llm would parse the same syntax and brought it over to the RNGD scenario without nailing down the source, which was a mistake. Sorry about that.

Looking at the RNGD docs again, the partitioning model is completely different:

"By implementing Single Root I/O Virtualization (SR-IOV), the system allows a single physical chip to be partitioned into 2, 4, or 8 independent NPU instances."

So instead of software-level PE fusion from the older version, RNGD splits a single chip into 2/4/8 VFs via hardware SR-IOV. Since the current furiosa-llm source isn't public, I can't directly verify the env var parsing behavior, so let me lay out the scenarios under some assumptions for clarity.

Assumption: furiosa_smi_py.list_devices() returns entries depending on whether SR-IOV is configured:

Without SR-IOV: one entry per physical chip

>>> list_devices()
[Device(npu0)]
# len == 1

With N VFs configured: one entry per VF

>>> list_devices()
[Device(npu0vf0), Device(npu0vf1)]    # 2 VFs
# len == 2
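Under this assumption, auto-detection reduces to counting the entries the SDK returns. The sketch below shows that idea with an injected `list_devices` callable so it runs without the Furiosa SDK installed. The function name `detect_furiosa_resources` and the fallback to an empty dict are assumptions, not the PR's implementation.

```python
from typing import Callable, Dict, Sequence


def detect_furiosa_resources(
    list_devices: Callable[[], Sequence[object]],
) -> Dict[str, int]:
    """Return the resource dict a node would register, based on device count."""
    try:
        count = len(list_devices())
    except Exception:
        # Treat a missing or failing SDK as "no NPUs on this node".
        count = 0
    return {"FURIOSA": count} if count > 0 else {}


# Stand-ins for furiosa_smi_py.list_devices() in the two cases above.
print(detect_furiosa_resources(lambda: ["npu0"]))                # {'FURIOSA': 1}
print(detect_furiosa_resources(lambda: ["npu0vf0", "npu0vf1"]))  # {'FURIOSA': 2}
```

In a real node, the callable would be the SDK's own enumeration function. Which entries it returns under SR-IOV is exactly what the questions below ask.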

Scenario 1: SR-IOV not configured (one RNGD chip used as a whole)

# Admin: no SR-IOV partitioning
$ ray start --head
# FuriosaAcceleratorManager detects list_devices() length = 1 → registers "FURIOSA: 1"

import ray
ray.init()

@ray.remote(resources={"FURIOSA": 1})
class LLMServer:
    def __init__(self, model_path):
        from furiosa_llm import LLM
        self.llm = LLM(model_path)
        # Ray sets FURIOSA_DEVICES="npu:0" beforehand,
        # so furiosa-llm uses only that NPU (assumed)

    def generate(self, prompt: str) -> str:
        return self.llm.generate([prompt])[0]

server_a = LLMServer.remote("/models/test")
print(ray.get(server_a.generate.remote("Test")))

# Second actor: no free NPU -> stays pending
server_b = LLMServer.remote("/models/test")

-> One actor per node. The RNGD chip is owned entirely by a single workload.

Scenario 2: SR-IOV with 2 VFs

# Admin: one RNGD chip split into 2 VFs
$ ray start --head
# list_devices() length = 2 -> registers "FURIOSA: 2"

import ray
ray.init()

@ray.remote(resources={"FURIOSA": 1})
class LLMServer:
    def __init__(self, model_path):
        from furiosa_llm import LLM
        self.llm = LLM(model_path)   # Ray sets a different VF as the env var for each actor

    def generate(self, prompt: str) -> str:
        return self.llm.generate([prompt])[0]

# Two actors are scheduled concurrently on different VFs
server_a = LLMServer.remote("/models/test")    # FURIOSA_DEVICES="npu:0" (VF 0)
server_b = LLMServer.remote("/models/test")       # FURIOSA_DEVICES="npu:1" (VF 1)

# Parallel execution on isolated VFs
results = ray.get([
    server_a.generate.remote("Test"),
    server_b.generate.remote("Test"),
])

-> Two workloads share one RNGD chip. Since the partitioning is already done at boot time by SR-IOV, the worker-side env var doesn't need to carry core info, so the round-trip concern doesn't surface.

@elpis-furiosa, would you mind helping confirm the following three things?

Enumeration unit of furiosa_smi_py.list_devices()

Without SR-IOV: is it one entry per physical chip?
With SR-IOV configured for N VFs: are N entries returned (one per VF)? Or just the PF, or both PF and VFs?

Default behavior of furiosa_llm.LLM(devices=None): does it automatically honor the FURIOSA_DEVICES env var, or does the value need to be passed explicitly (e.g., devices=os.environ["FURIOSA_DEVICES"])?
Legacy npu:N:cores notation: does RNGD's furiosa-llm still accept the older PE-fusion notation, or only SR-IOV VF indices (npu:N)?

If (1) matches the assumption, the current device-level model covers both scenarios as-is. If (2) matches, the user code examples stand without changes, otherwise we'd need one extra line. (3) mainly affects how I should tone down the docstring.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai> Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

elpis-furiosa · 2026-05-21T07:13:55Z

Enumeration unit of furiosa_smi_py.list_devices()

Without SR-IOV: is it one entry per physical chip?

With SR-IOV configured for N VFs: are N entries returned (one per VF)? Or just the PF, or both PF and VFs?

furiosa_smi_py.list_devices() enumerates one entry per physical chip. Since VF is an experimental feature, we'll primarily target PF in Ray for now.

Default behavior of furiosa_llm.LLM(devices=None): does it automatically honor the FURIOSA_DEVICES env var, or does the value need to be passed explicitly (e.g., devices=os.environ["FURIOSA_DEVICES"])?

furiosa_llm.LLM(devices=None) discovers and allocates all available devices via furiosa-smi — it does not automatically honor FURIOSA_DEVICES. To restrict allocation to specific devices, the list must be passed explicitly (e.g., devices=os.environ["FURIOSA_DEVICES"]).
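Since the env var is not honored automatically, actor code has to forward it. A small guard like the one below, a sketch with the hypothetical helper name `furiosa_devices_from_env`, makes the failure mode explicit when the variable is missing instead of silently letting the LLM claim every device.

```python
import os
from typing import Mapping, Optional


def furiosa_devices_from_env(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the FURIOSA_DEVICES value to pass explicitly as `devices=`."""
    env = os.environ if environ is None else environ
    value = env.get("FURIOSA_DEVICES", "").strip()
    if not value:
        # Without a value, LLM(devices=None) would allocate all devices.
        raise RuntimeError(
            "FURIOSA_DEVICES is not set; refusing to fall back to all devices"
        )
    return value
```

Inside an actor, the returned string would be passed as the `devices` argument, matching the `devices=os.environ["FURIOSA_DEVICES"]` form quoted above.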

Legacy npu:N:cores notation: does RNGD's furiosa-llm still accept the older PE-fusion notation, or only SR-IOV VF indices (npu:N)?

furiosa-llm serve accepts both notations. For clarification, here's the relevant excerpt from furiosa-llm serve --help.

  --devices DEVICES     The devices to run the model. It can be a single device or a comma-separated list of devices. Each device can be either "npu:X" or "npu:X:Y", where X is a device index and Y is a NPU core range
                        notation (e.g. "npu:0" for whole npu 0, "npu:0:0" for core 0 of NPU 0, and "npu:0:0-3" for fused core 0-3 of npu 0). If not given, all available unoccupied devices will be used.
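The notation in this help text is regular enough to parse with a single pattern. The following sketch is not furiosa-llm's parser; `DeviceSpec`, `parse_device_spec`, and `parse_devices` are hypothetical names. It accepts the three documented forms: `npu:X`, `npu:X:Y`, and `npu:X:Y-Z`.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class DeviceSpec(NamedTuple):
    device: int                       # NPU index X
    cores: Optional[Tuple[int, int]]  # inclusive (first, last); None = whole NPU


_SPEC_RE = re.compile(r"^npu:(\d+)(?::(\d+)(?:-(\d+))?)?$")


def parse_device_spec(spec: str) -> DeviceSpec:
    """Parse 'npu:0', 'npu:0:0', or 'npu:0:0-3' into a DeviceSpec."""
    match = _SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError("unrecognized device spec: {!r}".format(spec))
    device = int(match.group(1))
    if match.group(2) is None:
        return DeviceSpec(device, None)
    first = int(match.group(2))
    last = int(match.group(3)) if match.group(3) is not None else first
    if last < first:
        raise ValueError("descending core range in {!r}".format(spec))
    return DeviceSpec(device, (first, last))


def parse_devices(arg: str) -> List[DeviceSpec]:
    """Parse a comma-separated --devices value."""
    return [parse_device_spec(s) for s in arg.split(",") if s.strip()]
```

For instance, `parse_devices("npu:0,npu:1:4-7")` gives a whole-device entry for NPU 0 and a fused-core entry for cores 4 to 7 of NPU 1.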

And @elpis-furiosa would you mind helping with testing this e2e?

@Yicheng-Lu-llll Certainly. Are there any unit / E2E testing guidelines for adding new accelerators to Ray?

Yicheng-Lu-llll · 2026-05-21T21:39:10Z

@elpis-furiosa For the partition issue (npu:0:0-3), agreed, let's address it in a follow up PR. We don't have e2e testing guidelines since we don't have the hardware, so if you could help test it and share the results, and you're happy with the code, we should be good to go.

@nadongjun Could you update the doc description for points 2 and 3 you mentioned? Especially for point 2, it seems users need to set devices=os.environ["FURIOSA_DEVICES"] explicitly. Thanks! Also, you might need to rebase your PR; the CI failure is unrelated, but a rebase is needed to fix it.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

elpis-furiosa · 2026-05-22T09:43:14Z

I was able to verify that the following code works with RNGDs:

Code

import os
import ray
from ray.util.actor_pool import ActorPool
from furiosa_llm import LLM, SamplingParams

ray.init(resources={"FURIOSA": 2})


@ray.remote(resources={"FURIOSA": 1})
class FuriosaLLMActor:
    def __init__(self):
        print(
            "Initializing LLM with FURIOSA_DEVICES: {}".format(
                os.environ["FURIOSA_DEVICES"]
            )
        )
        self.llm = LLM(
            "furiosa-ai/Llama-3.1-8B-Instruct", devices=os.environ["FURIOSA_DEVICES"]
        )
        self.sampling_params = SamplingParams(temperature=0.5, max_tokens=1024)

    def chat(self, messages):
        outputs = self.llm.chat(messages, sampling_params=self.sampling_params)
        return [o.outputs[0].text for o in outputs]


actor_pool = ActorPool([FuriosaLLMActor.remote() for _ in range(2)])
print(
    list(
        actor_pool.map(
            lambda a, v: a.chat.remote(v),
            [
                [
                    {"role": "system", "content": "You are a helpful assistant"},
                    {"role": "user", "content": "Why is the sky blue?."},
                ],
                [
                    {"role": "system", "content": "You are a helpful assistant"},
                    {
                        "role": "user",
                        "content": "What is good for your health, water or coffee?",
                    },
                ],
            ],
        )
    )
)

Logs (RAY_DEDUP_LOGS=0)

/home/furiosa/elpis/ray/.venv/lib/python3.10/site-packages/furiosa/models/core/attention/attention.py:6: UserWarning: LTW Backend is provided temporarily for compatibility purposes only. This feature may be removed without notice in future versions. (Default behavior, set USE_WTL_BACKEND=1 to use WTL backend)
  from furiosa.models.core.attention.backends import LLMAttentionBackend
2026-05-22 18:23:29,532 INFO worker.py:2035 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(pid=849518) /home/furiosa/elpis/ray/.venv/lib/python3.10/site-packages/furiosa/models/core/attention/attention.py:6: UserWarning: LTW Backend is provided temporarily for compatibility purposes only. This feature may be removed without notice in future versions. (Default behavior, set USE_WTL_BACKEND=1 to use WTL backend)
(pid=849518)   from furiosa.models.core.attention.backends import LLMAttentionBackend
(pid=849529) /home/furiosa/elpis/ray/.venv/lib/python3.10/site-packages/furiosa/models/core/attention/attention.py:6: UserWarning: LTW Backend is provided temporarily for compatibility purposes only. This feature may be removed without notice in future versions. (Default behavior, set USE_WTL_BACKEND=1 to use WTL backend)
(pid=849529)   from furiosa.models.core.attention.backends import LLMAttentionBackend
(FuriosaLLMActor pid=849518) Initializing LLM with FURIOSA_DEVICES: npu:0
(FuriosaLLMActor pid=849529) Initializing LLM with FURIOSA_DEVICES: npu:1
(FuriosaLLMActor pid=849518) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 10571.14it/s]
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 16664.41it/s]
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:35.975243838+09:00  INFO furiosa_llm_common::artifact::types::next_gen: Loading artifact from path: /home/furiosa/.cache/huggingface/hub/models--furiosa-ai--Llama-3.1-8B-Instruct/snapshots/231d94fbc03cdd66aaeb2411697064a45f008ec7
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:36.154143104+09:00  INFO furiosa_llm_common::artifact::types::next_gen: Loading artifact from path: /home/furiosa/.cache/huggingface/hub/models--furiosa-ai--Llama-3.1-8B-Instruct/snapshots/231d94fbc03cdd66aaeb2411697064a45f008ec7
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:43.314344181+09:00  INFO furiosa_llm_common::artifact::types::commons: Loading artifact with schema version: 3.0
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:43.312929379+09:00  INFO furiosa_llm_common::artifact::types::commons: Loading artifact with schema version: 3.0
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:44.300213431+09:00  INFO furiosa::llm::engine: Loaded target artifact: SchemaVersion { major: 3, minor: 0 }
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:44.300281537+09:00  INFO furiosa::llm::engine: Parallelism Config: tp=8, pp=1, dp=1
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:44.313716413+09:00  INFO furiosa::llm::engine: Loaded target artifact: SchemaVersion { major: 3, minor: 0 }
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:44.313762504+09:00  INFO furiosa::llm::engine: Parallelism Config: tp=8, pp=1, dp=1
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:44.716205776+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Loading the target model ...
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:44.717235293+09:00  INFO device_runtime::context: Memory dump thread for Device([npu:0:0-3, npu:0:4-7]) started
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:44.708010467+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Loading the target model ...
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:44.708838951+09:00  INFO device_runtime::context: Memory dump thread for Device([npu:1:0-3, npu:1:4-7]) started
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:45.299625962+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: PP device#0 allocation plan: Binary=283.8 MiB, Model weights=15.0 GiB, Reserved IO memory=2.0 GiB
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:45.296792233+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: PP device#0 allocation plan: Binary=283.8 MiB, Model weights=15.0 GiB, Reserved IO memory=2.0 GiB
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:45.796373805+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: Resolve 47 pipeline for 1 DP groups (DP=1, PP=1) in 1.08s
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:45.796431641+09:00  INFO furiosa_generator::backing_file: Total size of parameters loaded: 15.0 GiB in 0.4967606 s
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:45.806585898+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: Resolve 47 pipeline for 1 DP groups (DP=1, PP=1) in 1.10s
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:45.806653519+09:00  INFO furiosa_generator::backing_file: Total size of parameters loaded: 15.0 GiB in 0.50978327 s
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:46.178928984+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Loading the target model took 1.462671207s
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:46.185103451+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: PP device#0 KV cache=30.2 GiB
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:46.215650206+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Loading the target model took 1.507582617s
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:46.222273219+09:00  INFO furiosa_generator::next_gen::pipeline::resolve: PP device#0 KV cache=30.2 GiB
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:46.27466863+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Computed bucket limits: max_executable_len=131072
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:46.275410191+09:00  INFO furiosa_generator::structured_output::manager: Initializing structured output manager for backend: Auto
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:46.322329339+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Computed bucket limits: max_executable_len=131072
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:46.323092381+09:00  INFO furiosa_generator::structured_output::manager: Initializing structured output manager for backend: Auto
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:47.247807151+09:00  INFO furiosa_generator::structured_output::manager: XGrammar backend is initialized.
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:47.246899128+09:00  INFO furiosa_generator::structured_output::manager: XGrammar backend is initialized.
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.37119724+09:00  INFO furiosa_generator::structured_output::manager: LLGuidance backend is initialized.
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.371233334+09:00  INFO furiosa_generator::next_gen::generator: DP entry DpId(0) → device [npu1pe0-3, npu1pe4-7]
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.38912785+09:00  INFO furiosa_generator::next_gen::scheduler::request_management::task_selector: Initializing TaskSelector([[npu:1:0-3, npu:1:4-7]]) with config: TaskSelectorConfig { enable_jit_compilation: false }, 46 AOT wired pipelines
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.393971612+09:00  INFO furiosa_generator::next_gen::scheduler::memory_manager: Initialize KVCacheManager with config: KVCacheConfig(kv_cache_memory: {[npu:1:0-3, npu:1:4-7]: Buffer { addr: 0x80000000, size: 0x78cfb8c00, device: Npu([npu:1:0-3, npu:1:4-7], Dram) }}, KVCachePlan(global_attention_config: LayerConfig(attention_type: Global, unit_block_size: 2048, block_size: 2048, num_chips: 1), global_kv_cache_tensors: 64, aux_attention_config: None, aux_kv_cache_tensors: 0)), is_prefix_cache_enabled: true
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.394175638+09:00  INFO furiosa_generator::next_gen::scheduler::memory_manager: Configured KV cache blocks, global_num_blocks: 247421, aux_num_blocks: None
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.399114574+09:00  INFO furiosa_generator::structured_output::manager: LLGuidance backend is initialized.
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.399137112+09:00  INFO furiosa_generator::next_gen::generator: DP entry DpId(0) → device [npu0pe0-3, npu0pe4-7]
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.407217965+09:00  INFO furiosa_generator::next_gen::generator: Eager scheduler has started with: SchedulerConfig { scheduler_kind: None, npu_queue_limit: 1, max_processing_samples: 65536, spare_blocks_ratio: 0.0, estimation_time_limit_ms: None, prefix_cache_config: PrefixCacheConfig { enabled: true, lookahead_requests: 2 }, experimental_scheduling_loop_type: Eager, experimental_aggressive_batching: false, max_concurrency: None, max_num_batched_tokens: None, data_parallel_routing_policy: RoundRobin }
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.407258009+09:00  INFO furiosa_generator::next_gen::generator: max_kv_len=247420 (from KV cache blocks across 1 DP device(s))
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.416559489+09:00  INFO furiosa_generator::next_gen::scheduler::request_management::task_selector: Initializing TaskSelector([[npu:0:0-3, npu:0:4-7]]) with config: TaskSelectorConfig { enable_jit_compilation: false }, 46 AOT wired pipelines
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.420056039+09:00  INFO furiosa_generator::next_gen::scheduler::memory_manager: Initialize KVCacheManager with config: KVCacheConfig(kv_cache_memory: {[npu:0:0-3, npu:0:4-7]: Buffer { addr: 0x80000000, size: 0x78cfb8c00, device: Npu([npu:0:0-3, npu:0:4-7], Dram) }}, KVCachePlan(global_attention_config: LayerConfig(attention_type: Global, unit_block_size: 2048, block_size: 2048, num_chips: 1), global_kv_cache_tensors: 64, aux_attention_config: None, aux_kv_cache_tensors: 0)), is_prefix_cache_enabled: true
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.420132965+09:00  INFO furiosa_generator::next_gen::scheduler::memory_manager: Configured KV cache blocks, global_num_blocks: 247421, aux_num_blocks: None
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.429961633+09:00  INFO furiosa_generator::next_gen::generator: Eager scheduler has started with: SchedulerConfig { scheduler_kind: None, npu_queue_limit: 1, max_processing_samples: 65536, spare_blocks_ratio: 0.0, estimation_time_limit_ms: None, prefix_cache_config: PrefixCacheConfig { enabled: true, lookahead_requests: 2 }, experimental_scheduling_loop_type: Eager, experimental_aggressive_batching: false, max_concurrency: None, max_num_batched_tokens: None, data_parallel_routing_policy: RoundRobin }
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.430004409+09:00  INFO furiosa_generator::next_gen::generator: max_kv_len=247420 (from KV cache blocks across 1 DP device(s))
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.591132838+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Num samples received: 1
(FuriosaLLMActor pid=849518) 2026-05-22T18:23:48.591527234+09:00  INFO device_runtime::alloc::cpu: Support for huge page size of 2 MiB has been detected.
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.571864594+09:00  INFO furiosa_generator::next_gen::hf_compat_next_gen: Num samples received: 1
(FuriosaLLMActor pid=849529) 2026-05-22T18:23:48.572620078+09:00  INFO device_runtime::alloc::cpu: Support for huge page size of 2 MiB has been detected.
[["The sky appears blue because of a phenomenon called scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen (N2) and oxygen (O2). These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.\n\nThis is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described the phenomenon in the late 19th century. The scattered blue light is then dispersed throughout the atmosphere, giving the sky its blue appearance.\n\nHere's a simplified explanation of the process:\n\n1. Sunlight enters the atmosphere as a mixture of all colors (white light).\n2. The shorter blue wavelengths are scattered more than the longer red wavelengths by the tiny molecules in the atmosphere.\n3. The scattered blue light is dispersed in all directions, reaching our eyes from all parts of the sky.\n4. Our eyes perceive the scattered blue light as the color of the sky.\n\nIt's worth noting that the exact shade of blue we see in the sky can vary depending on factors such as:\n\n* Time of day: During sunrise and sunset, the sky can take on hues of red, orange, and pink due to the scattering of light by atmospheric particles.\n* Atmospheric conditions: Pollution, dust, and water vapor in the atmosphere can affect the color of the sky.\n* Altitude: The sky can appear more intense blue at higher elevations due to the thinner atmosphere.\n\nOverall, the blue color of the sky is a result of the scattering of sunlight by the tiny molecules in the atmosphere, creating a breathtaking and ever-changing visual experience."], ["Both water and coffee have their own benefits and drawbacks when it comes to health. Here's a comparison:\n\n**Water:**\n\n1. **Hydration**: Water is essential for maintaining proper hydration and bodily functions.\n2. 
**Weight management**: Drinking water can help with weight loss and maintenance by suppressing appetite and increasing metabolism.\n3. **Flushes toxins**: Water helps to flush out toxins and waste products from the body.\n4. **Skin health**: Drinking enough water can improve skin health and reduce the appearance of wrinkles.\n5. **Exercise performance**: Proper hydration is essential for exercise performance and recovery.\n\n**Coffee:**\n\n1. **Cognitive function**: Caffeine in coffee can improve alertness, focus, and cognitive function.\n2. **Neuroprotection**: Moderate coffee consumption may have neuroprotective effects and reduce the risk of Parkinson's disease, Alzheimer's disease, and other neurodegenerative disorders.\n3. **Cardiovascular health**: Moderate coffee consumption may lower the risk of stroke, type 2 diabetes, and certain types of cancer.\n4. **Mood booster**: Caffeine can improve mood and reduce the risk of depression.\n5. **Antioxidants**: Coffee contains antioxidants, which can help protect cells from damage.\n\n**Key differences:**\n\n1. **Calorie content**: Water is calorie-free, while coffee contains calories, especially when added with sugar and cream.\n2. **Sleep**: Drinking coffee can interfere with sleep, while water is generally not a sleep disruptor.\n3. **Additives**: Coffee often contains added sugars, creamers, and syrups, which can greatly increase calorie intake.\n\n**The verdict:**\n\nWater is essential for maintaining proper hydration and overall health. Coffee, in moderation, can have cognitive and cardiovascular benefits, but it's essential to be mindful of calorie intake and potential sleep disruptions.\n\n**The ideal balance:**\n\n1. **Drink at least 8 cups of water per day**.\n2. **Consume coffee in moderation** (1-2 cups per day).\n3. **Avoid adding excessive sugar, cream, or syrup** to your coffee.\n4. 
**Monitor your sleep and adjust your coffee consumption accordingly**.\n\nRemember, a balanced lifestyle that includes a mix of water, coffee, and other healthy habits is key to maintaining overall health and well-being."]]

elpis-furiosa

Thank you for your work. Looks good to me!

@Yicheng-Lu-llll Is there anything else other than the result in this comment that you want me to verify?

Yicheng-Lu-llll

LGTM, Thanks for all the contribution!

cc @edoakes for merge.

nadongjun · 2026-05-27T06:25:50Z

@edoakes The premerge failure was unrelated to this PR (a CI infra issue already fixed on master). I've rebased to pick it up. Could you check and merge it when you get a chance?

nadongjun added 3 commits April 30, 2026 17:24

[Core] Add AcceleratorManager implementation for Furiosa AI NPU

e95eb62

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

[Core] Clarify Furiosa NPU docstring with RNGD family SKUs

0ea9cd1

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

[Core] Replace 'RNGD family' with SDK Arch enum terminology in docstring

180afc1

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

nadongjun requested a review from a team as a code owner April 30, 2026 09:01

gemini-code-assist Bot reviewed Apr 30, 2026

Comment thread python/ray/_private/accelerators/furiosa.py

Comment thread python/ray/_private/accelerators/furiosa.py

Comment thread python/ray/_private/accelerators/furiosa.py

fix

d167701

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

cursor Bot reviewed Apr 30, 2026

Comment thread python/ray/_private/accelerators/furiosa.py

fix

2e71095

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

ray-gardener Bot added the core (Issues that should be addressed in Ray Core) and community-contribution (Contributed by the community) labels Apr 30, 2026

nadongjun added 4 commits May 1, 2026 19:30

fix

039857b

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

fix

6867e54

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Merge branch 'master' into core/add-furiosa-accelerator-manager

3e47532

Merge branch 'master' into core/add-furiosa-accelerator-manager

20c2cbe

elpis-furiosa reviewed May 8, 2026

Comment thread python/ray/_private/accelerators/furiosa.py Outdated

Comment thread python/ray/_private/accelerators/furiosa.py Outdated

Comment thread python/ray/tests/accelerators/test_furiosa.py

nadongjun and others added 2 commits May 9, 2026 09:08

Update python/ray/_private/accelerators/furiosa.py

6b62699

Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Update python/ray/_private/accelerators/furiosa.py

57417d8

Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

cursor Bot reviewed May 9, 2026

Comment thread python/ray/tests/accelerators/mock_furiosa_smi_py.py

fix

cae315e

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

cursor Bot reviewed May 9, 2026

Comment thread python/ray/tests/accelerators/test_furiosa.py

nadongjun added 2 commits May 9, 2026 09:37

lint

64f919d

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Merge branch 'master' into core/add-furiosa-accelerator-manager

2c2b8e2

Yicheng-Lu-llll self-assigned this May 19, 2026

Yicheng-Lu-llll reviewed May 20, 2026

nadongjun added 2 commits May 20, 2026 12:34

init once and edit docs example

bf64b87

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

switch init guard to lru_cache

6c6245e

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

Yicheng-Lu-llll reviewed May 20, 2026

Comment thread python/ray/tests/accelerators/test_furiosa.py Outdated

Comment thread python/ray/tests/accelerators/test_furiosa.py Outdated

Yicheng-Lu-llll reviewed May 20, 2026

nadongjun and others added 3 commits May 21, 2026 10:57

add arch SKU test cases and FURIOSA_RNGD constant

7bbc9b7

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

parametrize visible accelerator ids test

338435e

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

add Furiosa to Tasks/Actors and Fractional Accelerators docs

29d6a1c

Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

nadongjun added 2 commits May 22, 2026 08:25

Merge branch 'master' into core/add-furiosa-accelerator-manager

b672337

clarify Furiosa device handling in docs and docstrings

1fd7d18

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

elpis-furiosa approved these changes May 26, 2026

Yicheng-Lu-llll added the go (add ONLY when ready to merge, run all tests) label May 26, 2026

Yicheng-Lu-llll approved these changes May 26, 2026

edoakes enabled auto-merge (squash) May 26, 2026 19:24

Merge branch 'master' into core/add-furiosa-accelerator-manager

43634f1

github-actions Bot disabled auto-merge May 27, 2026 06:17

edoakes merged commit 5a7eb2d into ray-project:master May 27, 2026
6 checks passed

nadongjun commented Apr 30, 2026

gemini-code-assist Bot left a comment

nadongjun commented May 7, 2026

elpis-furiosa left a comment

cursor Bot left a comment

nadongjun commented May 9, 2026

elpis-furiosa commented May 11, 2026

Yicheng-Lu-llll commented May 19, 2026

Yicheng-Lu-llll left a comment

Yicheng-Lu-llll left a comment

nadongjun commented May 21, 2026

Assumption: furiosa_smi_py.list_devices() returns entries depending on whether SR-IOV is configured:

Scenario 1: SR-IOV not configured (one RNGD chip used as a whole)

Scenario 2: SR-IOV with 2 VFs

elpis-furiosa commented May 21, 2026

Yicheng-Lu-llll commented May 21, 2026

elpis-furiosa commented May 22, 2026