[Docs] Fix llamafactory fine-tune template #62519
matthewdeng merged 7 commits into ray-project:master
Conversation
Signed-off-by: as-jding <jding@anyscale.com>
Code Review
This pull request adds support for running LLaMA-Factory fine-tuning as an Anyscale job, introducing a new tutorial notebook and configuration templates. It refactors existing notebooks to use external YAML configurations and pins the llamafactory version to 0.9.3. Review feedback identifies a path mismatch in the dataset registry within the new job notebook and suggests adding instructional comments to the job configuration template to improve adaptability.
# Copy and adapt train-configs/sft_lora_deepspeed.yaml to /mnt/user_storage/,
# updating paths (dataset_dir, deepspeed, ray_storage_path) from
# /mnt/cluster_storage/ to /mnt/user_storage/.
entrypoint: llamafactory-cli train /mnt/user_storage/sft_lora_deepspeed.yaml
The entrypoint is hardcoded to the SFT training configuration. Since this template is intended to be adaptable for other methods (DPO, KTO, CPT) as mentioned in the accompanying notebook, it would be beneficial to add a comment reminding users to update this path if they swap the training configuration.
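As a sketch of what the suggestion could look like in the job config, with the reminder comment added above the entrypoint (the `name` and `image_uri` values here are placeholders, not taken from the PR):

```yaml
# Hypothetical sketch of job-configs/job.yaml; name and image_uri are
# placeholder values, not from the PR.
name: llamafactory-finetune-job
image_uri: <your-image-with-llamafactory-0.9.3>
# Copy and adapt train-configs/sft_lora_deepspeed.yaml to /mnt/user_storage/,
# updating paths (dataset_dir, deepspeed, ray_storage_path) from
# /mnt/cluster_storage/ to /mnt/user_storage/.
# If you swap in a different training config (for example DPO, KTO, or CPT),
# update the config path in the entrypoint below to match.
entrypoint: llamafactory-cli train /mnt/user_storage/sft_lora_deepspeed.yaml
```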
"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
The instructions should mention that the dataset registry (dataset_info.json) also needs to be adapted, as it contains absolute paths that must be updated to match the job's storage environment.
Copy the required files to /mnt/user_storage/ via a running workspace. You also need to copy and adapt the dataset registry and training config, updating paths from /mnt/cluster_storage/ to /mnt/user_storage/.
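A minimal sketch of the copy-and-rewrite step this comment describes. Local temp directories stand in for the Anyscale storage mounts, and the file contents are invented for illustration; only the `/mnt/cluster_storage/` to `/mnt/user_storage/` substitution itself reflects the PR.

```shell
# Stand-ins for the Anyscale mounts (/mnt/cluster_storage, /mnt/user_storage).
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Invented example contents for the dataset registry and training config.
cat > "$SRC/dataset_info.json" <<'EOF'
{"my_dataset": {"file_name": "/mnt/cluster_storage/data/my_dataset.json"}}
EOF
cat > "$SRC/sft_lora_deepspeed.yaml" <<'EOF'
dataset_dir: /mnt/cluster_storage/data
deepspeed: /mnt/cluster_storage/ds_z3_config.json
ray_storage_path: /mnt/cluster_storage/
EOF

# Copy both files, then rewrite every /mnt/cluster_storage/ path to
# /mnt/user_storage/ so the job's execution cluster can resolve them.
for f in dataset_info.json sft_lora_deepspeed.yaml; do
  cp "$SRC/$f" "$DST/$f"
  sed -i 's#/mnt/cluster_storage/#/mnt/user_storage/#g' "$DST/$f"
done

grep -h user_storage "$DST"/*
```

Covering `dataset_info.json` in the same loop is the point of the comment: the registry holds absolute file paths, so it breaks in the same way as the training config if left unadapted.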
…m-fine-tune/notebooks/run_as_job.ipynb
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jason Ding <jding@anyscale.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit fd56733.
035f3e1 to 041bb2d
"id": "0e4fcd2d",
"metadata": {},
"source": [
"## Step 3: Create the job config"
In step 3, please mention again to change the `image_uri`.
"\n",
"## Step 2: Prepare shared storage\n",
"\n",
"Copy the required files to `/mnt/user_storage/` via a running workspace. You also need to copy and adapt the training config, updating paths from `/mnt/cluster_storage/` to `/mnt/user_storage/`."
Can you explain why the paths are updated from `/mnt/cluster_storage/` to `/mnt/user_storage/`? Something like: "A workspace runs on its own cluster, and a job typically runs on a separate execution cluster. See Shared storage on Anyscale for more details."
"source": [
"## Output\n",
"\n",
"Training output (checkpoints, loss plots) is saved to the `output_dir` within the job's working directory, and Ray Train results are stored at the `ray_storage_path` (`/mnt/user_storage/`)."
Can you add more detail on where the job's working directory is located?

- Remove hf_transfer dependency from all notebooks, train configs, CI build scripts, and BYOD scripts
- Add new run_as_job.ipynb notebook with step-by-step guide for running LlamaFactory as an Anyscale job
- Add job-configs/job.yaml for standalone job submission
- Simplify file-copy steps in SFT and CPT notebooks
- Replace inline YAML/JSON config blocks in notebook markdown with cat code cells