Add mm data example #3
Merged
Conversation
Pull request overview
This PR updates the Kimi-K2.5 SGLang training examples to use an OpenAI-style conversations column (including multimodal and tool-call examples), and aligns the provided example configs and sample dataset accordingly.
Changes:
- Documented the expected `conversations` JSONL dataset format (including multimodal + tool-call examples) in the 2-node and 3-node Kimi-K2.5 READMEs.
- Updated the Kimi-K2.5 SGLang example configs to use `prompt_key: conversations` (instead of `messages`).
- Added new `kimi_*` example rows to `examples/data/sample_conversations.jsonl` covering text, images, multi-image, multi-turn, and tool calls.
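One such `conversations` row can be sketched as follows. This is an illustrative guess at the schema based on the OpenAI chat format the PR references; the exact field names expected by the training configs may differ.

```python
import json

# Hypothetical row for sample_conversations.jsonl, assuming the OpenAI-style
# schema the READMEs describe: each JSONL line is one JSON object whose
# "conversations" list holds role/content messages, where user content may be
# a list of text and image parts.
row = {
    "conversations": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
            ],
        },
        {"role": "assistant", "content": "The image shows a cat."},
    ]
}

# JSONL: one compact JSON object per line, so the record must not contain
# raw newlines.
line = json.dumps(row, ensure_ascii=False)
print(line)
```

A dataset file would then contain one such line per training example.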
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/kimi-k25-3node-h100/README.md | Adds dataset format guidance and a multimodal OpenAI-style example for the 3-node setup. |
| examples/kimi-k25-2node-h200/README.md | Adds dataset format guidance and a multimodal OpenAI-style example for the 2-node setup. |
| examples/data/sample_conversations.jsonl | Appends new kimi_* sample conversations including multimodal and tool-call scenarios. |
| configs/sglang_kimi_k25_3node.yaml | Switches dataset prompt_key to conversations to match the documented dataset schema. |
| configs/sglang_kimi_k25_2node.yaml | Switches dataset prompt_key to conversations to match the documented dataset schema. |
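The tool-call rows added to `sample_conversations.jsonl` presumably follow the OpenAI tool-calling message shape; a hedged sketch of one such exchange (the tool name `get_weather` and its arguments are invented for illustration):

```python
import json

# Hypothetical assistant turn carrying a tool call, in OpenAI style.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool name
            "arguments": json.dumps({"city": "Beijing"}),
        },
    }],
}

# The tool result comes back as a separate "tool" message that references
# the call id, so request and response can be paired during training.
tool_turn = {
    "role": "tool",
    "tool_call_id": "call_0",
    "content": json.dumps({"temp_c": 21}),
}

print(tool_turn["tool_call_id"])
```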
cicirori pushed a commit to cicirori/TorchSpec that referenced this pull request on Feb 26, 2026: * refactor mooncake * fix * fix
zhubohao911 pushed a commit to zhubohao911/TorchSpec that referenced this pull request on Mar 23, 2026: * add mm data * split back
This pull request updates the dataset configuration and documentation to support OpenAI-style conversations with multimodal and tool-call capabilities. It also adds new sample data covering a range of scenarios, and clarifies the expected dataset format for Kimi-K2.5 training setups.
Dataset format and documentation updates:
- Documented the expected dataset format in `examples/kimi-k25-2node-h200/README.md` and `examples/kimi-k25-3node-h100/README.md`, including guidance on using `<think>...</think>` blocks, image content, and tool calls. [1] [2]
- Described the OpenAI-style `conversations` field and explained how multimodal content is handled for SGLang engine training. [1] [2]

Configuration changes for dataset compatibility:
- Changed `prompt_key` from `messages` to `conversations` in both `configs/sglang_kimi_k25_2node.yaml` and `configs/sglang_kimi_k25_3node.yaml` to match the new dataset format. [1] [2]

Sample data additions:
- Added new `kimi_*` rows to `examples/data/sample_conversations.jsonl` covering text-only, multimodal (single/multi-image), system messages, multi-turn, and tool calls, providing comprehensive examples for model training.
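The config change amounts to a one-line key swap. An illustrative fragment is shown below; the surrounding structure is an assumption, not copied from the actual files:

```yaml
# configs/sglang_kimi_k25_2node.yaml (and the 3-node variant) -- illustrative
data:
  # was: prompt_key: messages
  prompt_key: conversations
```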