This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Data format for fine-tuning #11

Open
e0397123 opened this issue Aug 12, 2022 · 1 comment

Comments

@e0397123

Hi, would it be possible to share data samples for fine-tuning Contriever? The code refers to fields such as "positive_ctxs", "hard_negative_ctxs", "negative_ctxs", and "question". Could you provide some examples?

@sld

sld commented Mar 16, 2023

You need two files, train.jsonl and eval.jsonl.
Example usage for fine-tuning:

python finetuning.py --eval_data eval.jsonl --train_data train.jsonl  ....

Here is an example row from the JSONL file (one JSON object per line):

{"question":"What is the most popular operating system?","positive_ctxs":[{"text": "Windows is the most popular operating system."}],"negative_ctxs":[{"text": "Windows is the most popular programming language."}],"hard_negative_ctxs":[{"text": "Windows is the most popular game console."}],"title":"Windows","text":"Windows is the most popular operating system."}

For fine-tuning I use only the 'question', 'positive_ctxs' and 'negative_ctxs' fields.
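A minimal sketch of how rows in this shape could be written to train.jsonl and sanity-checked before training. The helper names (`make_row`, `write_jsonl`, `check_jsonl`) and the `REQUIRED_FIELDS` tuple are hypothetical and not part of the Contriever code; the field layout simply mirrors the example row above, and the top-level "title" and "text" keys from that row are left out on the assumption that they are optional, which this thread does not confirm.

```python
import json
import os
import tempfile

# Fields the comment above says are used for fine-tuning (assumption:
# checking for these three is enough for a basic sanity check).
REQUIRED_FIELDS = ("question", "positive_ctxs", "negative_ctxs")


def make_row(question, positives, negatives, hard_negatives=()):
    """Build one row shaped like the example row (hypothetical helper)."""
    return {
        "question": question,
        "positive_ctxs": [{"text": t} for t in positives],
        "negative_ctxs": [{"text": t} for t in negatives],
        "hard_negative_ctxs": [{"text": t} for t in hard_negatives],
    }


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def check_jsonl(path):
    """Return the row count; raise ValueError if a required field is missing."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in row]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            count += 1
    return count


rows = [
    make_row(
        "What is the most popular operating system?",
        positives=["Windows is the most popular operating system."],
        negatives=["Windows is the most popular programming language."],
        hard_negatives=["Windows is the most popular game console."],
    )
]

# Written to a temporary directory here; point this at your data folder instead.
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.jsonl")
write_jsonl(train_path, rows)
n_rows = check_jsonl(train_path)
print(f"{n_rows} row(s) written to {train_path}")
```

The same helpers can produce eval.jsonl from a held-out list of rows; both files are then passed to the command shown earlier.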
