---
points:
  # Models trained on The Pile
  - models:
      - together/bloom
    groups:
      - the_pile
    level: strong
    description: BLOOM is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://huggingface.co/spaces/bigscience/BigScienceCorpus.
  - models:
      - together/gpt-j-6b
    groups:
      - the_pile
    level: strong
    description: GPT-J is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/.
  - models:
      - together/gpt-neox-20b
    groups:
      - the_pile
    level: strong
    description: GPT-NeoX is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://arxiv.org/abs/2204.06745.
  - models:
      - together/opt-66b
      - together/opt-175b
    groups:
      - the_pile
    level: strong
    description: OPT is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://arxiv.org/abs/2205.01068.
  - models:
      - microsoft/TNLGv2_7B
      - microsoft/TNLGv2_530B
    groups:
      - the_pile
    level: strong
    description: MT-NLG is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://arxiv.org/abs/2201.11990.
  - models:
      - anthropic/stanford-online-all-v4-s3
      - anthropic/claude-v1.3
      - anthropic/claude-instant-v1
      - anthropic/claude-instant-1.2
    groups:
      - the_pile
    level: strong
    description: Anthropic's models are explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://arxiv.org/abs/2112.00861.
  - models:
      - together/yalm
    groups:
      - the_pile
    level: strong
    description: YaLM is explicitly trained on the Pile, i.e. data from the same distribution as the test set. See https://github.com/yandex/YaLM-100B.
  # Models explicitly trained on specific downstream scenarios
  - models:
      - together/t0pp
    groups:
      - hellaswag
      - openbookqa
      - boolq
      - summarization_xsum
      - summarization_cnndm
      - imdb
    level: strong
    description: T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.
  # Models with contamination analyses
  - models:
      - openai/davinci
      - openai/curie
      - openai/babbage
      - openai/ada
      - openai/text-curie-001
      - openai/text-babbage-001
      - openai/text-ada-001
      - openai/text-davinci-002
      - openai/text-davinci-003
      - openai/code-davinci-002
      - openai/code-davinci-001
      - openai/code-cushman-001
    groups:
      - natural_qa_closedbook
      - natural_qa_openbook_longans
      - hellaswag
      - openbookqa
      - boolq
      - quac
    level: weak
    description: Brown et al. analyze data contamination for GPT-3 and its known derivatives. For these scenarios, they find that 1%-6% of test instances are contaminated based on N-gram overlap, and that model performance does not change substantially as a result. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.
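
For reference, here is a minimal sketch of how a consumer could load and query this registry. It assumes PyYAML is available; the ContaminationPoint dataclass and the points_for helper are names invented for this sketch, not part of any existing API.

# Minimal sketch of loading and querying the contamination registry above.
# Assumes PyYAML; all names below are illustrative, not an existing API.
from dataclasses import dataclass
from typing import List

import yaml


@dataclass
class ContaminationPoint:
    """One entry under `points`: models contaminated on a set of scenario groups."""
    models: List[str]
    groups: List[str]
    level: str  # "strong" or "weak"
    description: str


def load_points(path: str) -> List[ContaminationPoint]:
    """Parse the YAML file into a list of ContaminationPoint records."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    return [ContaminationPoint(**entry) for entry in raw["points"]]


def points_for(points: List[ContaminationPoint], model: str, group: str) -> List[ContaminationPoint]:
    """Return every contamination point that covers the given (model, group) pair."""
    return [p for p in points if model in p.models and group in p.groups]


if __name__ == "__main__":
    points = load_points("contamination.yaml")
    for p in points_for(points, "together/gpt-j-6b", "the_pile"):
        print(p.level, "-", p.description)

Keeping the registry as plain data and the lookup logic in code means a new contamination finding is a one-entry YAML change.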
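
The weak-level entry is grounded in Brown et al.'s N-gram overlap analysis. The sketch below illustrates the general idea of that kind of check; Brown et al. use 13-grams with additional deduplication and filtering, so treat this as a simplified illustration rather than their exact procedure.

# Simplified illustration of flagging test instances by n-gram overlap with
# training data. Not Brown et al.'s exact procedure, which uses 13-grams
# plus additional filtering.
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """All word-level n-grams of `text`, lowercased."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated_fraction(test_instances: Iterable[str],
                          train_ngrams: Set[Tuple[str, ...]],
                          n: int = 13) -> float:
    """Fraction of test instances sharing at least one n-gram with the training data."""
    instances = list(test_instances)
    flagged = sum(1 for inst in instances if ngrams(inst, n) & train_ngrams)
    return flagged / len(instances) if instances else 0.0


if __name__ == "__main__":
    # Toy demo with short n-grams so the overlap is visible.
    train = ngrams("the quick brown fox jumps over the lazy dog", n=4)
    tests = ["the quick brown fox went home", "completely unrelated text here"]
    print(contaminated_fraction(tests, train, n=4))  # 0.5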