[Docs] Fix llamafactory fine-tune template #62519

Merged
matthewdeng merged 7 commits into ray-project:master from as-jding:fix-template-llama-factory on Apr 16, 2026

Conversation

@as-jding
Contributor

- Remove the hf_transfer dependency from all notebooks, train configs, CI build scripts, and BYOD scripts
- Add a new run_as_job.ipynb notebook with a step-by-step guide for running LlamaFactory as an Anyscale job
- Add job-configs/job.yaml for standalone job submission
- Simplify the file-copy steps in the SFT and CPT notebooks
- Replace inline YAML/JSON config blocks in notebook markdown with `cat` code cells

Signed-off-by: as-jding <jding@anyscale.com>
Signed-off-by: as-jding <jding@anyscale.com>
as-jding added the "go" label (add ONLY when ready to merge, run all tests) on Apr 10, 2026
as-jding requested a review from a team as a code owner on April 10, 2026 at 22:49
Contributor

gemini-code-assist bot left a comment


Code Review

This pull request adds support for running LLaMA-Factory fine-tuning as an Anyscale job, introducing a new tutorial notebook and configuration templates. It refactors existing notebooks to use external YAML configurations and pins the llamafactory version to 0.9.3. Review feedback identifies a path mismatch in the dataset registry within the new job notebook and suggests adding instructional comments to the job configuration template to improve adaptability.

# Copy and adapt train-configs/sft_lora_deepspeed.yaml to /mnt/user_storage/,
# updating paths (dataset_dir, deepspeed, ray_storage_path) from
# /mnt/cluster_storage/ to /mnt/user_storage/.
entrypoint: llamafactory-cli train /mnt/user_storage/sft_lora_deepspeed.yaml
Contributor


medium

The entrypoint is hardcoded to the SFT training configuration. Since this template is intended to be adaptable for other methods (DPO, KTO, CPT) as mentioned in the accompanying notebook, it would be beneficial to add a comment reminding users to update this path if they swap the training configuration.
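
One way to keep the entrypoint in step with the chosen training config is to derive it from the config file name instead of hardcoding it. The following is a minimal Python sketch of that idea, not part of this PR: the helper name `build_entrypoint` and the `STORAGE_ROOT` constant are hypothetical, and only `llamafactory-cli train` and `sft_lora_deepspeed.yaml` come from the template under review.

```python
from pathlib import PurePosixPath

# Assumption: training configs are staged under /mnt/user_storage/,
# as the comment in the job template describes.
STORAGE_ROOT = PurePosixPath("/mnt/user_storage")


def build_entrypoint(config_name: str) -> str:
    """Return the job entrypoint for a config file staged in shared storage."""
    if not config_name.endswith((".yaml", ".yml")):
        raise ValueError(f"expected a YAML config, got {config_name!r}")
    return f"llamafactory-cli train {STORAGE_ROOT / config_name}"


# Reproduces the entrypoint from the template for the SFT config.
print(build_entrypoint("sft_lora_deepspeed.yaml"))
```

Swapping in a different method would then mean passing a different config file name, so the path in the entrypoint cannot drift from the config actually copied to storage.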

"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
Contributor


medium

The instructions should mention that the dataset registry (dataset_info.json) also needs to be adapted, as it contains absolute paths that must be updated to match the job's storage environment.

Copy the required files to /mnt/user_storage/ via a running workspace. You also need to copy and adapt the dataset registry and training config, updating paths from /mnt/cluster_storage/ to /mnt/user_storage/.

Contributor


agree
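
The path adaptation discussed in this thread can be sketched in Python as a recursive prefix rewrite over a loaded registry. This is an illustration under assumptions, not the notebook's actual code: the function name `rewrite_paths` and the sample registry entry are hypothetical, and the real `dataset_info.json` may be laid out differently.

```python
import json

OLD_ROOT = "/mnt/cluster_storage/"
NEW_ROOT = "/mnt/user_storage/"


def rewrite_paths(value):
    """Recursively replace the workspace storage prefix in every string value."""
    if isinstance(value, str):
        return value.replace(OLD_ROOT, NEW_ROOT)
    if isinstance(value, list):
        return [rewrite_paths(item) for item in value]
    if isinstance(value, dict):
        return {key: rewrite_paths(item) for key, item in value.items()}
    # Numbers, booleans, and None pass through unchanged.
    return value


# Hypothetical registry entry, standing in for a loaded dataset_info.json.
registry = {"my_dataset": {"file_name": "/mnt/cluster_storage/data/train.json"}}
adapted = rewrite_paths(registry)
print(json.dumps(adapted, indent=2))
```

The function returns a new structure and leaves its input untouched, so the workspace copy of the registry stays valid while the adapted copy is written to job storage.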

…m-fine-tune/notebooks/run_as_job.ipynb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jason Ding <jding@anyscale.com>

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fd56733.

Signed-off-by: as-jding <jding@anyscale.com>
Signed-off-by: as-jding <jding@anyscale.com>
Signed-off-by: as-jding <jding@anyscale.com>
as-jding force-pushed the fix-template-llama-factory branch from 035f3e1 to 041bb2d on April 10, 2026 at 23:52
ray-gardener bot added the "docs" (An issue or change related to documentation) and "train" (Ray Train Related Issue) labels on Apr 11, 2026
"id": "0e4fcd2d",
"metadata": {},
"source": [
"## Step 3: Create the job config"
Contributor


In Step 3, please mention again that the reader needs to change the image_uri.

"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
Contributor


Can you explain why the paths need to be updated from /mnt/cluster_storage/ to /mnt/user_storage/? Something like: "A workspace runs on its own cluster, and a job typically runs on a separate execution cluster. See Shared storage on Anyscale for more details."


agree

"source": [
"## Output\n",
"\n",
"Training output (checkpoints, loss plots) is saved to the `output_dir` within the job's working directory, and Ray Train results are stored at the `ray_storage_path` (`/mnt/user_storage/`)."
Contributor


Can you add more detail on where the job's working directory is located?

Signed-off-by: as-jding <jding@anyscale.com>
kunling-anyscale self-requested a review on April 15, 2026 at 00:33
matthewdeng merged commit 4cf90d3 into ray-project:master on Apr 16, 2026
6 checks passed

Labels

docs: An issue or change related to documentation
go: add ONLY when ready to merge, run all tests
train: Ray Train Related Issue


3 participants