[Docs] Fix llamafactory fine-tune template #62519
matthewdeng merged 7 commits into ray-project:master
Conversation
Signed-off-by: as-jding <jding@anyscale.com>
Code Review
This pull request adds support for running LLaMA-Factory fine-tuning as an Anyscale job, introducing a new tutorial notebook and configuration templates. It refactors existing notebooks to use external YAML configurations and pins the llamafactory version to 0.9.3. Review feedback identifies a path mismatch in the dataset registry within the new job notebook and suggests adding instructional comments to the job configuration template to improve adaptability.
# Copy and adapt train-configs/sft_lora_deepspeed.yaml to /mnt/user_storage/,
# updating paths (dataset_dir, deepspeed, ray_storage_path) from
# /mnt/cluster_storage/ to /mnt/user_storage/.
entrypoint: llamafactory-cli train /mnt/user_storage/sft_lora_deepspeed.yaml
The entrypoint is hardcoded to the SFT training configuration. Since this template is intended to be adaptable for other methods (DPO, KTO, CPT) as mentioned in the accompanying notebook, it would be beneficial to add a comment reminding users to update this path if they swap the training configuration.
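As a sketch of what the suggestion could look like in the job config, with the reminder comment added above the entrypoint (the `name` and `image_uri` values here are placeholders, not taken from the PR):

```yaml
# Hypothetical sketch of job-configs/job.yaml; name and image_uri are
# placeholder values, not from the PR.
name: llamafactory-finetune-job
image_uri: <your-image-with-llamafactory-0.9.3>
# Copy and adapt train-configs/sft_lora_deepspeed.yaml to /mnt/user_storage/,
# updating paths (dataset_dir, deepspeed, ray_storage_path) from
# /mnt/cluster_storage/ to /mnt/user_storage/.
# If you swap in a different training config (for example DPO, KTO, or CPT),
# update the config path in the entrypoint below to match.
entrypoint: llamafactory-cli train /mnt/user_storage/sft_lora_deepspeed.yaml
```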
"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
The instructions should mention that the dataset registry (dataset_info.json) also needs to be adapted, as it contains absolute paths that must be updated to match the job's storage environment.
Copy the required files to /mnt/user_storage/ via a running workspace. You also need to copy and adapt the dataset registry and training config, updating paths from /mnt/cluster_storage/ to /mnt/user_storage/.
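A minimal sketch of the copy-and-rewrite step this comment describes. Local temp directories stand in for the Anyscale storage mounts, and the file contents are invented for illustration; only the `/mnt/cluster_storage/` to `/mnt/user_storage/` substitution itself reflects the PR.

```shell
# Stand-ins for the Anyscale mounts (/mnt/cluster_storage, /mnt/user_storage).
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Invented example contents for the dataset registry and training config.
cat > "$SRC/dataset_info.json" <<'EOF'
{"my_dataset": {"file_name": "/mnt/cluster_storage/data/my_dataset.json"}}
EOF
cat > "$SRC/sft_lora_deepspeed.yaml" <<'EOF'
dataset_dir: /mnt/cluster_storage/data
deepspeed: /mnt/cluster_storage/ds_z3_config.json
ray_storage_path: /mnt/cluster_storage/
EOF

# Copy both files, then rewrite every /mnt/cluster_storage/ path to
# /mnt/user_storage/ so the job's execution cluster can resolve them.
for f in dataset_info.json sft_lora_deepspeed.yaml; do
  cp "$SRC/$f" "$DST/$f"
  sed -i 's#/mnt/cluster_storage/#/mnt/user_storage/#g' "$DST/$f"
done

grep -h user_storage "$DST"/*
```

Covering `dataset_info.json` in the same loop is the point of the comment: the registry holds absolute file paths, so it breaks in the same way as the training config if left unadapted.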
…m-fine-tune/notebooks/run_as_job.ipynb
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jason Ding <jding@anyscale.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit fd56733.
035f3e1 to 041bb2d
"id": "0e4fcd2d",
"metadata": {},
"source": [
"## Step 3: Create the job config"
In step 3, please mention again to change the `image_uri`.
"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
Can you explain why the paths are updated from `/mnt/cluster_storage/` to `/mnt/user_storage/`? Something like: "A workspace runs on its own cluster, and a job typically runs on a separate execution cluster. See Shared storage on Anyscale for more details."
"source": [
"## Output\n",
"\n",
"Training output (checkpoints, loss plots) is saved to the `output_dir` within the job's working directory, and Ray Train results are stored at the `ray_storage_path` (`/mnt/user_storage/`)."
Can you add more detail on where the job's working directory is located?

- Remove hf_transfer dependency from all notebooks, train configs, CI build scripts, and BYOD scripts
- Add new run_as_job.ipynb notebook with step-by-step guide for running LlamaFactory as an Anyscale job
- Add job-configs/job.yaml for standalone job submission
- Simplify file-copy steps in SFT and CPT notebooks
- Replace inline YAML/JSON config blocks in notebook markdown with cat code cells