feat: support alpaca, sharegpt & chatml output format #43

ChenZiHong-Gavin · 2025-08-28T08:12:37Z

Output Formats

we support generating datasets in alpaca, sharegpt and chatml format.

Alpaca Format

Supervised Fine-Tuning Dataset

Example
In supervised fine-tuning, the instruction column will be concatenated with the input column and used as the user prompt, then the user prompt would be instruction\ninput. The output column represents the model response.

[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)"
  }
]

Sharegpt Format

Supervised Fine-Tuning Dataset

Example
Compared to the alpaca format, the sharegpt format allows the datasets have more roles, such as human, gpt, observation and function. They are presented in a list of objects in the conversations column.

Note that the human and observation should appear in odd positions, while gpt and function should appear in even positions. The gpt and function will be learned by the model.

In our implementation, only human and gpt will be used.

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "user instruction (required)"
      },
      {
        "from": "gpt",
        "value": "model response (required)"
      }
    ]
    }
]

ChatML Format

Supervised Fine-Tuning Dataset

Example
Like the sharegpt format, the chatml format also allows the datasets have more roles, such as user, assistant, system and tool. They are presented in a list of objects in the messages column.

In our implementation, only user and assistant will be used.

[
  {
    "messages": [
      {
        "role": "user",
        "content": "user instruction (required)"
      },
      {
        "role": "assistant",
        "content": "model response (required)"
      }
    ]
    }
]

ChenZiHong-Gavin added 2 commits August 27, 2025 15:06

docs: update README

1e4c8a7

feat: support alpaca, sharegpt & chatml output format

3b4eb75

ChenZiHong-Gavin merged commit 7b89816 into main Aug 28, 2025
0 of 2 checks passed

ChenZiHong-Gavin deleted the output_format branch August 28, 2025 08:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support alpaca, sharegpt & chatml output format #43

feat: support alpaca, sharegpt & chatml output format #43

Uh oh!

ChenZiHong-Gavin commented Aug 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: support alpaca, sharegpt & chatml output format #43

feat: support alpaca, sharegpt & chatml output format #43

Uh oh!

Conversation

ChenZiHong-Gavin commented Aug 28, 2025

Output Formats

Alpaca Format

Supervised Fine-Tuning Dataset

Sharegpt Format

Supervised Fine-Tuning Dataset

ChatML Format

Supervised Fine-Tuning Dataset

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants