Data format for fine-tuning #11

e0397123 · 2022-08-12T09:10:16Z

Hi, would it be possible to share some data samples for fine-tuning Contriever? The code uses fields such as "positive_ctxs", "hard_negative_ctxs", "negative_ctxs", and "question". Could you provide some examples?

sld · 2023-03-16T12:22:15Z

You should have train.jsonl and eval.jsonl files.
Example usage for fine-tuning:

python finetuning.py --eval_data eval.jsonl --train_data train.jsonl  ....

Here is an example row from the JSONL file (one JSON object per line):

{"question":"What is the most popular operating system?","positive_ctxs":[{"text": "Windows is the most popular operating system."}],"negative_ctxs":[{"text": "Windows is the most popular programming language."}],"hard_negative_ctxs":[{"text": "Windows is the most popular game console."}],"title":"Windows","text":"Windows is the most popular operating system."}

For fine-tuning, I use only the 'question', 'positive_ctxs', and 'negative_ctxs' fields.
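
In case it helps, here is a minimal sketch of how rows in this layout could be written and sanity-checked before training. The helper names (`make_row`, `write_jsonl`, `check_jsonl`) are hypothetical and not part of the contriever repo, and the check only covers the three fields mentioned above; it has not been verified against what finetuning.py actually requires.

```python
import json

# Fields the reply above says are used for fine-tuning (an assumption
# taken from this thread, not verified against finetuning.py).
REQUIRED_FIELDS = ("question", "positive_ctxs", "negative_ctxs")


def make_row(question, positives, negatives, hard_negatives=()):
    """Build one example dict in the layout shown in the sample row."""
    return {
        "question": question,
        "positive_ctxs": [{"text": t} for t in positives],
        "negative_ctxs": [{"text": t} for t in negatives],
        "hard_negative_ctxs": [{"text": t} for t in hard_negatives],
    }


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def check_jsonl(path):
    """Return the number of valid rows; raise ValueError on a bad row."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)
            for field in REQUIRED_FIELDS:
                if field not in row:
                    raise ValueError(f"line {line_no}: missing field {field!r}")
            for key in ("positive_ctxs", "negative_ctxs"):
                ctxs = row[key]
                if not all(isinstance(c, dict) and "text" in c for c in ctxs):
                    raise ValueError(f"line {line_no}: each {key} entry needs a 'text' key")
            count += 1
    return count


if __name__ == "__main__":
    rows = [
        make_row(
            "What is the most popular operating system?",
            positives=["Windows is the most popular operating system."],
            negatives=["Windows is the most popular programming language."],
            hard_negatives=["Windows is the most popular game console."],
        )
    ]
    write_jsonl("train.jsonl", rows)
    print(check_jsonl("train.jsonl"), "valid row(s)")
```

Running the check on both train.jsonl and eval.jsonl before starting a run should catch a missing field or a malformed line early.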
