Hi, thanks for the work! I was wondering how you format the OSS75K data for training. Is it in an Alpaca-style format like this:
You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
@@ Instruction
{instruction} # problem column of the OSS75k dataset
@@ Response
# solution column of the OSS75k dataset
Thx
Hi, here is the exact format we used when finetuning the model on OSS75K:
You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
@@ Instruction
Write a solution to the following coding problem:
{problem}
@@ Response
{solution}
We haven't tested other templates ourselves, but we encourage anyone interested to explore them!
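For anyone who wants to reproduce this template by hand, here is a minimal Python sketch of how one row could be turned into a prompt/completion pair. It is not the project's own script. The `problem` and `solution` keys follow the column names mentioned in this thread, while `PROMPT_TEMPLATE`, `format_example`, and the output keys `prompt` and `completion` are hypothetical names chosen for illustration. The thread does not show where blank lines go, so the newline placement below is an assumption.

```python
# Template wording is quoted from the reply above.
# ASSUMPTION: the blank-line placement is a guess; the thread does not show it.
PROMPT_TEMPLATE = (
    "You are an exceptionally intelligent coding assistant that consistently "
    "delivers accurate and reliable responses to user instructions.\n\n"
    "@@ Instruction\n"
    "Write a solution to the following coding problem:\n"
    "{problem}\n\n"
    "@@ Response\n"
)


def format_example(problem: str, solution: str) -> dict:
    """Build one prompt/completion pair (hypothetical helper, not from the repo)."""
    return {
        # Everything up to and including the "@@ Response" header.
        "prompt": PROMPT_TEMPLATE.format(problem=problem),
        # The text the model is trained to produce after the header.
        "completion": solution,
    }


if __name__ == "__main__":
    # A made-up row with the two columns discussed in this thread.
    row = {
        "problem": "Reverse a string.",
        "solution": "def rev(s):\n    return s[::-1]",
    }
    pair = format_example(row["problem"], row["solution"])
    print(pair["prompt"] + pair["completion"])
```

Keeping the prompt and completion separate makes it easy to mask the prompt tokens in the loss if your training setup does that. Concatenating the two gives the full training text.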
Thanks. Is there a Python script that converts the Magicoder-OSS-Instruct-75K dataset downloaded from Hugging Face into the above format (instruction-response pairs)?
@UniverseFly Thanks, I found the Python script: it is preprocess_data.py.