[Core] Add Support for Furiosa AI NPU #63035
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request adds support for Furiosa AI NPUs by implementing the FuriosaAcceleratorManager for device detection and resource management. The changes include logic for architecture identification, environment variable configuration for visible devices, and comprehensive unit tests with mocks for the Furiosa SMI SDK. Review feedback identifies a bug in architecture detection that could return an incorrect string, suggests implementing get_current_node_accelerator_labels for improved dashboard visibility, and recommends updating typing imports.
@rueian @ryanaoleary @edoakes Gentle ping. Any thoughts on this?
elpis-furiosa
left a comment
Thank you for working on Ray support for Furiosa RNGD. I stumbled upon this PR while researching ways to support new accelerators. The code looks fine at a glance, but I'll test it on actual RNGD hardware just to make sure.
Also, it would be great to add an example similar to those for other accelerators in the Using accelerators in Tasks and Actors section of the documentation. I can take a look at adding one as a follow-up PR.
Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai> Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit cae315e.
@elpis-furiosa Thanks for the review. I've applied all the suggested changes. Most of the accelerators Ray supports don't seem to have real hardware tests, so verifying RNGD on actual hardware would be a meaningful contribution for end users. If you'd like, I'm happy to hand off not just the follow-up PR but ownership of this PR as well. If you're interested, please leave a comment here or DM me on the Ray Slack (handle: Dongjun Na).
Thanks for your offer, @nadongjun. Looking at the scope of this PR, you've already done most of the implementation work, so rather than taking over ownership, we'd prefer to focus on providing hardware testing support on actual RNGD devices. Any additional contributions from our side will follow as separate PRs.
Thank you so much for the contribution, @nadongjun! I'll take a look by tmrw.
token = token.strip()
if token.startswith(_FURIOSA_DEVICE_PREFIX):
    token = token[len(_FURIOSA_DEVICE_PREFIX) :]
# ``furiosa-llm`` allows ``npu:0:0-3`` to address a core range; we keep
Just wanted to confirm my understanding: this npu:0:0-3 format only shows up and gets used at Ray node startup, right? So Ray can safely treat it as a full device, and we don't really support core-level scheduling.
Yes, that's the case. The :cores part only appears when users set FURIOSA_DEVICES themselves, so the same value can be passed straight through to furiosa-llm --devices.
Ray strips the suffix before scheduling and only operates at the device level, so core-level scheduling isn't supported.
But in this case, for a detailed example:
- A user sets FURIOSA_DEVICES=npu:0:0-3 at node startup.
- Ray would only see npu:0.
- So, when we start a Ray worker on this node, Ray will set the env var to FURIOSA_DEVICES=npu:0 (via set_current_process_visible_accelerator_ids).
- furiosa sdk reads npu:0 instead of npu:0:0-3.
For reference, NVIDIA solves the same subdevice partitioning problem with MIG by giving each instance its own id and listing them flat: CUDA_VISIBLE_DEVICES=MIG-GPU-abc-1,MIG-GPU-abc-2 and ray treats each MIG instance as a whole device.
I'm thinking we either handle it in a similar way, or just add a doc note asking users not to use values like npu:0:0-3.
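The round-trip loss described above can be reproduced without Ray or the Furiosa SDK. The following is a minimal sketch, not the PR's implementation: `to_device_ids` and `to_env_value` are hypothetical helper names, and the `npu:` prefix value is assumed from the examples in this thread.

```python
from typing import List

_FURIOSA_DEVICE_PREFIX = "npu:"  # assumed value, inferred from the thread


def to_device_ids(env_value: str) -> List[str]:
    """Reduce each FURIOSA_DEVICES token to a bare device index."""
    ids = []
    for token in env_value.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith(_FURIOSA_DEVICE_PREFIX):
            token = token[len(_FURIOSA_DEVICE_PREFIX):]
        # Drop an optional ":<cores>" suffix such as ":0-3".
        ids.append(token.split(":", 1)[0])
    return ids


def to_env_value(device_ids: List[str]) -> str:
    """Re-serialize device indices the way a worker would receive them."""
    return ",".join(f"{_FURIOSA_DEVICE_PREFIX}{i}" for i in device_ids)


original = "npu:0:0-3"
round_tripped = to_env_value(to_device_ids(original))
print(original, "->", round_tripped)
```

The core range is dropped in the first step and cannot be recovered in the second, which is why the worker ends up with a value that differs from what the user set at node startup.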
Thanks for the explanation. It would be better if FURIOSA_DEVICES could be partitioned, as is the case for CUDA_VISIBLE_DEVICES. To elaborate, Furiosa RNGD has 8 PEs (Processing Elements) on one PCIe card, and they are denoted as npu:{chip_id}:{pe_id}. Adjacent PEs can be fused to work together.
Can this issue be addressed in a different PR? I'll make a follow-up PR for this issue.
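For readers unfamiliar with the npu:{chip_id}:{pe_id} notation, here is a rough sketch of how such an address could be parsed into a chip index and a PE range. `NpuAddress` and `parse_npu_address` are hypothetical names, not part of any Furiosa SDK, and the check only enforces the 8-PE bound mentioned above; which fused ranges the hardware actually accepts is not stated in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PES_PER_CHIP = 8  # per the comment above: 8 PEs on one RNGD card


@dataclass(frozen=True)
class NpuAddress:
    chip_id: int
    # Inclusive (first, last) PE indices; None means the whole chip.
    pe_range: Optional[Tuple[int, int]]


def parse_npu_address(text: str) -> NpuAddress:
    """Parse 'npu:<chip>' or 'npu:<chip>:<pe>' or 'npu:<chip>:<first>-<last>'."""
    parts = text.strip().split(":")
    if parts[0] != "npu" or len(parts) not in (2, 3):
        raise ValueError(f"not an npu address: {text!r}")
    chip_id = int(parts[1])
    if len(parts) == 2:
        return NpuAddress(chip_id, None)
    first, _, last = parts[2].partition("-")
    lo = int(first)
    hi = int(last) if last else lo
    if not 0 <= lo <= hi < PES_PER_CHIP:
        raise ValueError(f"PE range out of bounds: {text!r}")
    return NpuAddress(chip_id, (lo, hi))


print(parse_npu_address("npu:0:0-3"))
```

Keeping the PE range as structured data rather than stripping it would be one way to carry partition information through to the worker, if a follow-up PR goes in that direction.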
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Thanks for the update! The only concern I have now is: #63035 (comment). Otherwise LGTM.
And @elpis-furiosa would you mind helping with testing this e2e?
Trouble only occurs if those tasks and actors
attempt to actually use accelerators that don't exist.

Using accelerators in Tasks and Actors
Should we also consider adding furiosa to Sections 2 & 3?
- 7bbc9b7 adds the in-dev arch cases (RngdMax/RngdS/RngdPlus and rngd-max/rngd+) and the FURIOSA_RNGD constant. Kept the other SKUs out of accelerators.py since names may still change.
- 338435e refactors test_get_current_process_visible_accelerator_ids to @pytest.mark.parametrize.
For the Tasks/Actors section example and the e2e testing on actual RNGD hardware, @elpis-furiosa offered to handle both earlier. Would it be OK to take care of them in a separate follow-up PR?
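As an illustration of the arch cases mentioned above, the sketch below maps the spellings named in this thread onto the SKU names used in the PR description. The table, the `normalize_arch` name, and the `rngds`/`rngd-s` entries are assumptions for illustration; the PR's actual mapping and the value of the FURIOSA_RNGD constant are not shown here.

```python
from typing import Optional

# Keys are lowercased spellings from the thread (RngdMax/RngdS/RngdPlus,
# rngd-max/rngd+); values follow the SKU names in the PR description.
# This table is an assumption, not the PR's code.
_ARCH_ALIASES = {
    "rngd": "RNGD",
    "rngds": "RNGD-S",
    "rngd-s": "RNGD-S",
    "rngdmax": "RNGD-Max",
    "rngd-max": "RNGD-Max",
    "rngdplus": "RNGD+",
    "rngd+": "RNGD+",
}


def normalize_arch(raw: str) -> Optional[str]:
    """Return a canonical SKU name, or None for an unrecognized arch."""
    return _ARCH_ALIASES.get(raw.strip().lower())


print(normalize_arch("RngdMax"), normalize_arch("rngd+"))
```

Returning None for unknown input, rather than passing the raw string through, is one way to avoid reporting an incorrect accelerator type.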
> For the Tasks/Actors section example and the e2e testing on actual RNGD hardware, @elpis-furiosa offered to handle both earlier. Would it be OK to take care of them in a separate follow-up PR?
I ran the following example on our hardware, which is similar to the examples of other accelerators:
(ray) ➜ ray git:(338435efad) ✗ cat ray_test.py
import os
import ray

ray.init(resources={"FURIOSA": 2})

@ray.remote(resources={"FURIOSA": 1})
class RNGDActor:
    def ping(self):
        print("RNGD IDs: {}".format(ray.get_runtime_context().get_accelerator_ids()["FURIOSA"]))
        print("FURIOSA_DEVICES: {}".format(os.environ["FURIOSA_DEVICES"]))

@ray.remote(resources={"FURIOSA": 1})
def rngd_task():
    print("RNGD IDs: {}".format(ray.get_runtime_context().get_accelerator_ids()["FURIOSA"]))
    print("FURIOSA_DEVICES: {}".format(os.environ["FURIOSA_DEVICES"]))

rngd_actor = RNGDActor.remote()
ray.get(rngd_actor.ping.remote())
# The actor uses the first RNGD so the task uses the second one.
ray.get(rngd_task.remote())
(ray) ➜ ray git:(338435efad) ✗ python3 ray_test.py
2026-05-21 15:52:42,642 INFO worker.py:2018 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(RNGDActor pid=693909) RNGD IDs: ['0']
(RNGDActor pid=693909) FURIOSA_DEVICES: npu:0
(rngd_task pid=693883) RNGD IDs: ['1']
(rngd_task pid=693883) FURIOSA_DEVICES: npu:1
@nadongjun If you like, you can add this code as an example.
Thanks! Added the example with you as co-author. Marked Fractional Accelerators as unsupported for now;
let me know if I should change that.
@Yicheng-Lu-llll Looking at the RNGD docs again, the partitioning model is completely different:
So instead of software-level PE fusion from the older version, RNGD splits a single chip into 2/4/8 VFs via hardware SR-IOV. Since the current furiosa-llm source isn't public, I can't directly verify the env var parsing behavior, so let me lay out the scenarios under some assumptions for clarity.
Assumption: furiosa_smi_py.list_devices() returns entries depending on whether SR-IOV is configured:
- Without SR-IOV: one entry per physical chip
- With N VFs configured: one entry per VF
Scenario 1: SR-IOV not configured (one RNGD chip used as a whole)
# Admin: no SR-IOV partitioning
$ ray start --head
# FuriosaAcceleratorManager detects list_devices() length = 1 → registers "FURIOSA: 1"
import ray

ray.init()

@ray.remote(resources={"FURIOSA": 1})
class LLMServer:
    def __init__(self, model_path):
        from furiosa_llm import LLM
        self.llm = LLM(model_path)
        # Ray sets FURIOSA_DEVICES="npu:0" beforehand,
        # so furiosa-llm uses only that NPU (assumed)

    def generate(self, prompt: str) -> str:
        return self.llm.generate([prompt])[0]

server_a = LLMServer.remote("/models/test")
print(ray.get(server_a.generate.remote("Test")))
# Second actor: no free NPU -> stays pending
server_b = LLMServer.remote("/models/test")
-> One actor per node. The RNGD chip is owned entirely by a single workload.
Scenario 2: SR-IOV with 2 VFs
# Admin: one RNGD chip split into 2 VFs
$ ray start --head
# list_devices() length = 2 -> registers "FURIOSA: 2"
import ray

ray.init()

@ray.remote(resources={"FURIOSA": 1})
class LLMServer:
    def __init__(self, model_path):
        from furiosa_llm import LLM
        self.llm = LLM(model_path)  # Ray sets a different VF as the env var for each actor

    def generate(self, prompt: str) -> str:
        return self.llm.generate([prompt])[0]

# Two actors are scheduled concurrently on different VFs
server_a = LLMServer.remote("/models/test")  # FURIOSA_DEVICES="npu:0" (VF 0)
server_b = LLMServer.remote("/models/test")  # FURIOSA_DEVICES="npu:1" (VF 1)
# Parallel execution on isolated VFs
results = ray.get([
    server_a.generate.remote("Test"),
    server_b.generate.remote("Test"),
])
-> Two workloads share one RNGD chip. Since the partitioning is already done at boot time by SR-IOV, the worker-side env var doesn't need to carry core info, so the round-trip concern doesn't surface.
@elpis-furiosa, would you mind helping confirm the following three things?
If (1) matches the assumption, the current device-level model covers both scenarios as-is. If (2) matches, the user code examples stand without changes, otherwise we'd need one extra line. (3) mainly affects how I should tone down the docstring.
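The two scenarios differ only in how many entries device detection reports, which a toy model makes easy to see. The sketch below does not use Ray or furiosa_smi_py; `DevicePool` is a hypothetical stand-in for device-level scheduling, and the device counts stand for the assumed length of list_devices() in each scenario.

```python
from typing import Dict, List, Optional


class DevicePool:
    """Toy model of device-level scheduling: one whole device per request."""

    def __init__(self, num_devices: int) -> None:
        self._free: List[int] = list(range(num_devices))
        self.assigned: Dict[str, int] = {}

    def request(self, actor_name: str) -> Optional[str]:
        """Return the FURIOSA_DEVICES value for the actor, or None if it must wait."""
        if not self._free:
            return None  # no free device: the actor stays pending
        device = self._free.pop(0)
        self.assigned[actor_name] = device
        return f"npu:{device}"


# Scenario 1: no SR-IOV, detection is assumed to report 1 device.
whole_chip = DevicePool(num_devices=1)
print(whole_chip.request("server_a"), whole_chip.request("server_b"))

# Scenario 2: 2 VFs, detection is assumed to report 2 devices.
two_vfs = DevicePool(num_devices=2)
print(two_vfs.request("server_a"), two_vfs.request("server_b"))
```

In the first pool the second request gets nothing back, matching the pending actor in Scenario 1; in the second pool each actor receives a distinct device string.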
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Co-authored-by: Sukchul Cho <sukchul.cho@furiosa.ai> Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@Yicheng-Lu-llll Certainly. Are there any unit / E2E testing guidelines for adding new accelerators to Ray?
@elpis-furiosa For the partition issue (npu:0:0-3), agreed, let's address it in a follow-up PR. We don't have e2e testing guidelines since we don't have the hardware, so if you could help test it and share the results, and you're happy with the code, we should be good to go. @nadongjun Could you update the doc description for points 2 and 3 you mentioned? Especially for point 2, it seems users need to set
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
I was able to verify that the following code works with RNGDs:
Code
Logs (RAY_DEDUP_LOGS=0)
elpis-furiosa
left a comment
Thank you for your work. Looks good to me!
@Yicheng-Lu-llll Is there anything else other than the result in this comment that you want me to verify?
Yicheng-Lu-llll
left a comment
LGTM, Thanks for all the contribution!
cc @edoakes for merge.
@edoakes The premerge failure was unrelated to this PR (a CI infra issue already fixed on master). I've rebased to pick it up. Could you check and merge it when you get a chance?
Description
As a user of Furiosa AI's RNGD NPUs on Ray, I've found that management within Ray remains manual and error-prone, despite existing support in vLLM and Hugging Face.
This PR introduces first-class support for Furiosa AI NPUs (specifically the RNGD family: RNGD-S, RNGD, RNGD-Max, RNGD+) into Ray's accelerator management framework. This integration follows the established patterns used for other NPUs.
Currently, Furiosa RNGD is gaining traction in the LLM ecosystem with support in vLLM, Hugging Face Optimum (optimum-furiosa), and Kubernetes. However, Ray users running production inference workloads still face manual overhead:
- Manual Resource Tagging: Users must pass --resources='{"FURIOSA": N}' to ray start as Ray lacks auto-detection.
- Manual Device Isolation: Users have to manually manage FURIOSA_VISIBLE_DEVICES to prevent resource contention between actors.
- Lack of SKU Awareness: There is no native way to target specific chip architectures (e.g., RNGD-Max vs. RNGD-S) using accelerator_type.
Usage Examples
Related issues
Additional information
Product: Furiosa AI RNGD
SDK: furiosa-smi-py
Integrations: optimum-furiosa