Training data format for Magicoder-OSS-Instruct-75K #23

VoiceBeer · 2023-12-27T07:56:51Z

Hi, thx for the work!

I was wondering how you formatted the OSS75k data for training. Is it in an Alpaca-style format like this:

You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction} # problem column of the OSS75k dataset

@@ Response
# solution column of the OSS75k dataset

Thx


UniverseFly · 2023-12-28T07:54:05Z

Hi, here is the exact format we used when finetuning the model on OSS75K:

You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
Write a solution to the following coding problem:
{problem}

@@ Response
{solution}

We haven't tested other templates ourselves, but we encourage anyone interested to explore them!
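The template above can be sketched in Python as follows. This is a minimal sketch, not the Magicoder training code. The names `SYSTEM`, `PROMPT_TEMPLATE`, and `build_example` are hypothetical, and the exact newlines between sections are an assumption based on how the template appears in this thread. The sketch returns the prompt and the completion separately because fine-tuning setups commonly compute the loss only on the response. Whether Magicoder does this is not stated here.

```python
# Hypothetical helper reproducing the OSS-Instruct template quoted above.
# The exact whitespace (one blank line between sections) is an assumption.

SYSTEM = (
    "You are an exceptionally intelligent coding assistant that consistently "
    "delivers accurate and reliable responses to user instructions."
)

PROMPT_TEMPLATE = (
    SYSTEM
    + "\n\n@@ Instruction\n"
    + "Write a solution to the following coding problem:\n"
    + "{problem}\n\n"
    + "@@ Response\n"
)


def build_example(problem: str, solution: str) -> tuple[str, str]:
    """Return (prompt, completion) for one OSS-Instruct record."""
    # Only the {problem} field in the template is substituted; braces inside
    # the problem text are passed as an argument and are not re-interpreted.
    prompt = PROMPT_TEMPLATE.format(problem=problem)
    return prompt, solution


if __name__ == "__main__":
    prompt, completion = build_example(
        "Implement a function that returns the sum of a list of integers.",
        "def total(xs):\n    return sum(xs)",
    )
    # The full training text is the prompt followed by the solution.
    print(prompt + completion)
```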

VoiceBeer · 2023-12-28T07:55:39Z

Thx! Appreciate it :>

shatealaboxiaowang · 2024-01-24T06:04:04Z

Thanks. Is there a Python script to convert the Magicoder-OSS-Instruct-75K dataset downloaded from Hugging Face into the above format (instruction-response pairs)?

shatealaboxiaowang · 2024-01-24T07:37:49Z

@UniverseFly Thanks, I found the Python script: it is preprocess_data.py.
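The interface of preprocess_data.py is not shown in this thread. The conversion could look roughly like the following hypothetical stand-in. It assumes the dataset has already been exported from Hugging Face to a local JSON Lines file, for example with the `datasets` library's `Dataset.to_json`. It also assumes each row has the `problem` and `solution` columns mentioned earlier in the thread. The output keys `instruction` and `response` and the names `to_pairs` and `convert_jsonl` are illustrative assumptions, not the script's actual format.

```python
import json
from typing import Iterable, Iterator

# Instruction prefix taken from the template quoted in this thread.
INSTRUCTION_PREFIX = "Write a solution to the following coding problem:\n"


def to_pairs(records: Iterable[dict]) -> Iterator[dict]:
    """Map rows with 'problem' and 'solution' fields to hypothetical
    {'instruction', 'response'} pairs; any other fields are dropped."""
    for rec in records:
        yield {
            "instruction": INSTRUCTION_PREFIX + rec["problem"],
            "response": rec["solution"],
        }


def convert_jsonl(src_path: str, dst_path: str) -> int:
    """Read JSON Lines from src_path, write pairs to dst_path, return the count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8"
    ) as dst:
        # Skip blank lines so a trailing newline does not break json.loads.
        rows = (json.loads(line) for line in src if line.strip())
        for pair in to_pairs(rows):
            dst.write(json.dumps(pair, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `convert_jsonl("oss_instruct_75k.jsonl", "pairs.jsonl")` (hypothetical file names) would write one pair per line. The system prompt and the `@@` headers are left out of the stored pairs, on the assumption that they are added when each prompt is rendered for training.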

UniverseFly self-assigned this on Dec 28, 2023

UniverseFly added the "question" label (Further information is requested) on Dec 28, 2023

VoiceBeer closed this as completed on Dec 28, 2023

