### Prepare allenai/reward-bench-2 for DPO training

Once converted, the dataset can be used for DPO training with the following config:

```yaml
dataset:
  _component_: torchtune.datasets.preference_dataset
  source: john02171574/reward-bench-2-converted
```
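
The config above presumes that each row of the converted dataset holds a `chosen` and a `rejected` conversation as role/content message lists, which is the shape the conversion cell in this notebook produces. As a minimal sketch, the hypothetical helper `is_valid_preference_row` below (not part of torchtune or `datasets`) checks that shape on a hand-written row, so a malformed conversion can be caught before uploading:

```python
def is_valid_preference_row(row):
    """Return True if `row` has matching `chosen`/`rejected` conversations.

    Hypothetical helper for illustration: both conversations must be
    [user, assistant] message lists that share the same user prompt.
    """
    for key in ("chosen", "rejected"):
        messages = row.get(key)
        if not isinstance(messages, list) or len(messages) != 2:
            return False
        if not all(isinstance(m, dict) for m in messages):
            return False
        if [m.get("role") for m in messages] != ["user", "assistant"]:
            return False
        if not all(isinstance(m.get("content"), str) for m in messages):
            return False
    # The two conversations should differ only in the assistant reply.
    return row["chosen"][0]["content"] == row["rejected"][0]["content"]


# Hand-written example row (not taken from the dataset).
example_row = {
    "chosen": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "rejected": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "5"},
    ],
}
print(is_valid_preference_row(example_row))
```

The same check could be run over every converted row before calling `push_to_hub`.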

In [None]:
# Load allenai/reward-bench-2 and convert it to chosen/rejected conversations for DPO
import datasets
from huggingface_hub import login
from tqdm import tqdm

login(token="xxx")  # Replace "xxx" with your Hugging Face access token

# Load the dataset
dataset = datasets.load_dataset("allenai/reward-bench-2", split="test")

# Show some stats
print(f"Number of rows: {len(dataset)}")
print(f"Number of columns: {len(dataset.column_names)}")
print(f"Number of unique values in the 'prompt' column: {len(dataset.unique('prompt'))}")

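# Convert each row to the DPO format.
# Only the first chosen and the first rejected completion of each row are kept.
def convert_to_dpo(sample):
    # Skip rows whose completion lists are missing or empty
    if not sample["chosen"] or not sample["rejected"]:
        return None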
    
    return {
        "chosen": [
            {
                "role": "user",
                "content": sample["prompt"],
            },
            {
                "role": "assistant",
                "content": sample["chosen"][0],
            },
        ],
        "rejected": [
            {
                "role": "user",
                "content": sample["prompt"],
            },
            {
                "role": "assistant",
                "content": sample["rejected"][0],
            },
        ],
    }

# Apply the conversion to the dataset
converted_dataset = []
for sample in tqdm(dataset):
    converted_dataset.append(convert_to_dpo(sample))

# Filter out None values and create a new dataset
converted_dataset = datasets.Dataset.from_list([sample for sample in converted_dataset if sample is not None])

# Show the result dataset
print(converted_dataset[0])

# Optionally save the converted dataset to a JSONL file
# converted_dataset.to_json("reward-bench-2.jsonl", orient="records", lines=True)

# Upload the converted dataset to the Hugging Face Hub as john02171574/reward-bench-2-converted
converted_dataset.push_to_hub("john02171574/reward-bench-2-converted")


Number of rows: 1865
Number of columns: 10
Number of unique values in the 'prompt' column: 1864


100%|██████████| 1865/1865 [00:00<00:00, 24031.29it/s]

{'chosen': [{'content': 'Hi do you know how to fix Codeium defaule setting that is responsible for this  [ERROR]: [deadline_exceeded] context deadline exceeded', 'role': 'user'}, {'content': "Certainly! The error `[ERROR]: [deadline_exceeded] context deadline exceeded` typically indicates that the code generation process in Codeium (or another similar AI coding assistant) has timed out because it took too long to generate a response.\n\nHere are some steps you can take to address this issue:\n\n### 1. **Increase Timeout Settings:**\n   - **Codeium**: Unfortunately, Codeium does not have an official UI setting to increase the timeout. However, you can try to simplify your code or use more efficient algorithms to reduce the time required.\n   - **General Tips**: If you're working on a complex task, consider breaking it down into smaller parts or optimizing your code.\n\n### 2. **Optimize Your Code:**\n   - Ensure your code is as optimized as possible. This might involve:\n     - Using mo





CommitInfo(commit_url='https://huggingface.co/datasets/john02171574/reward-bench-2-converted/commit/059a9434fda5abf521927093f47f0f0d59e60668', commit_message='Upload dataset', commit_description='', oid='059a9434fda5abf521927093f47f0f0d59e60668', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/john02171574/reward-bench-2-converted', endpoint='https://huggingface.co', repo_type='dataset', repo_id='john02171574/reward-bench-2-converted'), pr_revision=None, pr_num=None)
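
The conversion above keeps only the first entry of `chosen` and `rejected`. If a row carries several rejected completions, one alternative is to emit one preference pair per rejected completion. The sketch below assumes `chosen` and `rejected` are lists of strings, as the `[0]` indexing in `convert_to_dpo` suggests. `expand_to_pairs` and the sample row are hypothetical and not part of the original notebook:

```python
def expand_to_pairs(sample):
    """Build one DPO pair per rejected completion, using the first chosen one.

    Hypothetical variant of convert_to_dpo; returns an empty list when either
    completion list is missing or empty.
    """
    if not sample.get("chosen") or not sample.get("rejected"):
        return []

    def conversation(reply):
        # One user turn with the prompt, followed by one assistant reply.
        return [
            {"role": "user", "content": sample["prompt"]},
            {"role": "assistant", "content": reply},
        ]

    chosen = conversation(sample["chosen"][0])
    return [
        {"chosen": chosen, "rejected": conversation(reply)}
        for reply in sample["rejected"]
    ]


# Made-up row for illustration only.
sample = {
    "prompt": "Name a prime number.",
    "chosen": ["7"],
    "rejected": ["8", "9"],
}
pairs = expand_to_pairs(sample)
print(len(pairs))
```

Whether the extra pairs per prompt help DPO training depends on the setup; the sketch only shows the data reshaping.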