Commit

Update GPT4All dataset loading for new files (#2344)
Close #2331. The multiround conversations in the previous files were
badly formatted, so the GPT4All author uploaded separate files for
singleround and multiround data. For now we load only the singleround
examples; support for multiround can be added later.
olliestanley committed Apr 7, 2023
1 parent b575c39 commit 394d458
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion model/model_training/custom_datasets/prompt_dialogue.py
@@ -108,7 +108,7 @@ def __init__(self, mode: str, cache_dir: str = None) -> None:
         self.mode = mode
         dataset = load_dataset(
             "Nebulous/gpt4all_pruned",
-            data_files="data_pruned_3.jsonl",
+            data_files="data_singleround_pruned_3.jsonl",
             cache_dir=cache_dir,
         )
         self.rows = [(row["prompt"], row["response"]) for row in dataset["train"]]
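
The row-building step in this hunk can be illustrated offline. The sketch below is not the repository's code: it swaps `load_dataset` for plain standard-library JSONL parsing so it runs without network access, and it assumes each record carries the `prompt` and `response` fields used in the diff. The helper names, the sample file name, and the sample records are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records; the real file contents are not shown in this commit.
SAMPLE_RECORDS = [
    {"prompt": "What is 2 + 2?", "response": "4"},
    {"prompt": "Name a primary color.", "response": "Red"},
]


def write_sample_file(directory):
    """Write SAMPLE_RECORDS as JSONL (one JSON object per line) and return the path."""
    path = Path(directory) / "sample_singleround.jsonl"
    with open(path, "w", encoding="utf-8") as f:
        for record in SAMPLE_RECORDS:
            f.write(json.dumps(record) + "\n")
    return path


def load_single_round_rows(jsonl_path):
    """Return (prompt, response) tuples, mirroring the list comprehension in the diff."""
    rows = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                rows.append((record["prompt"], record["response"]))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rows = load_single_round_rows(write_sample_file(tmp))
        print(rows)
```

Each tuple holds exactly one prompt and one response, which is why only the singleround file fits this loader; multiround conversations would need a different row shape.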