Skip to content

Latest commit

 

History

History
44 lines (37 loc) · 9.87 KB

README.md

File metadata and controls

44 lines (37 loc) · 9.87 KB

Registry Data

The JSONL need to be pulled via git-lfs / downloaded to view.

Here are some example JSONLs for reference and how they are used in evals. See our eval templates docs for more details.

test_match/samples.jsonl In the associated eval from test-basic.yaml, we see this data is used in a Match class, which means we will check if a completion starts with the value for "ideal" key.

{"input": [{"role": "system", "content": "Complete the phrase as concisely as possible."}, {"role": "user", "content": "Once upon a "}], "ideal": "time"}
{"input": [{"role": "system", "content": "Complete the phrase as concisely as possible."}, {"role": "user", "content": "The first US president was "}], "ideal": "George Washington"}
{"input": [{"role": "system", "content": "Complete the phrase as concisely as possible."}, {"role": "user", "content": "OpenAI was founded in 20"}], "ideal": "15"}

Another example of a Match eval is:

{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Spell this sentence backwards, character by character: We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests."}], "ideal": ".stseuqer etairporppani tcejer dna ,sesimerp tcerrocni egnellahc ,sekatsim sti timda ,snoitseuq puwollof rewsna ot TPGtahC rof elbissop ti sekam tamrof eugolaid ehT .yaw lanoitasrevnoc a ni stcaretni hcihw TPGtahC dellac ledom a deniart ev’eW"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Spell this sentence backwards, character by character: Latencies will vary over time so we recommend benchmarking prior to making deployment decisions"}], "ideal": "snoisiced tnemyolped gnikam ot roirp gnikramhcneb dnemmocer ew os emit revo yrav lliw seicnetaL"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Spell this sentence backwards, character by character: Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity."}], "ideal": ".ytinamuh fo lla stifeneb—snamuh naht retrams yllareneg era taht smetsys IA—ecnegilletni lareneg laicifitra taht erusne ot si noissim ruO"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Spell this sentence backwards, character by character: There are several things we think are important to do now to prepare for AGI."}], "ideal": ".IGA rof eraperp ot won od ot tnatropmi era kniht ew sgniht lareves era erehT"}

test_fuzzy_match/samples.jsonl In the associated eval from test-basic.yaml, we see this data is used in a FuzzyMatch class, which means we will check if a completion includes a normalized version of the "ideal" key or vice-versa.

{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "system", "content": "What's the capital of France?", "name": "example_user"}, {"role": "system", "content": "Paris", "name": "example_assistant"}, {"role": "system", "content": "What's 2+2?", "name": "example_user"}, {"role": "system", "content": "4", "name": "example_assistant"}, {"role": "user", "content": "Who is the girl who plays eleven in stranger things?"}], "ideal": ["Millie Bobby Brown"]}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "system", "content": "What's the capital of France?", "name": "example_user"}, {"role": "system", "content": "Paris", "name": "example_assistant"}, {"role": "system", "content": "What's 2+2?", "name": "example_user"}, {"role": "system", "content": "4", "name": "example_assistant"}, {"role": "user", "content": "What season did derek die in grey's?"}], "ideal": ["Season 11", "11"]}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "system", "content": "What's the capital of France?", "name": "example_user"}, {"role": "system", "content": "Paris", "name": "example_assistant"}, {"role": "system", "content": "What's 2+2?", "name": "example_user"}, {"role": "system", "content": "4", "name": "example_assistant"}, {"role": "user", "content": "Who played the girl elf in the hobbit?"}], "ideal": ["Evangeline Lilly"]}

logic/samples.jsonl In the associated eval from test-basic.yaml, we see this data is used with a ModelBasedClassify class with fact model-graded yaml, which will compare the factual content of the completion against a ground truth.

{"input":[{"role":"system","content":"Solve the following logical puzzle. Carefully think step by step, and show your reasoning. If there is not enough information to solve the puzzle, conclude with 'There is not enough information.' There are five students, Anna, Bob and Cynthia, Dan and Eliza. They all tell the truth. Anna is taller than Cynthia. Bob says he's taller than Anna if and only if Eliza is the shortest. Cynthia is taller than Dan. Eliza is shorter than Dan. Who's the tallest in the group? Let's think step by step:"}],"ideal":"Anna > Cynthia > Dan > Eliza. But, based on Bob's statement, there are still two possibilities: 1. Bob is taller than Eliza, making Eliza the shortest, making Bob taller than Anna, making Bob the tallest. 2. Bob is shorter than Eliza: this would still be valid, as Eliza wouldn't be the shortest and therefore Bob isn't taller than Anna. And Anna would be the tallest. So there's not enough information"}
{"input":[{"role":"system","content":"Laura thinks that Jessica thinks that Angie is only 23 years old. Angie thinks Josie knows where Laura's mother is. Jessica thinks Laura was once an engineer. Josie thinks Laura is friendly. Based on the text, what thoughts do we know that Laura, Jessica, Angie, and Josie have?"}],"ideal":"Laura thinks: Jessica thinks Angie is only 23 years old. Jessica thinks: Laura was once an engineer. Angie thinks: Josie knows where Laura's mother is. Josie thinks: Laura is friendly."}
{"input":[{"role":"system","content":"At a party, there are 100 people. Some always lie and some always tell the truth. They all know which one of them is a truth-teller and which one is a liar. After the party, you ask each person how many truth-tellers they shook hands with. Each person gives a different answer, ranging from 0 to 99. How many were truth-tellers and how many were liars?"}],"ideal":"There is 1 truth-teller and 99 liars at the party."}
{"input":[{"role":"system","content":"Two people want to cross a river. The only way to get across is with a boat that they find on one side; but that boat can only take one person at a time. The boat cannot return on its own, and there are no ropes to haul it back, yet both persons manage to cross using the boat. How did they do it?"}],"ideal":"The people are on different sides of the river, so the person on the same side as the boat originally can cross first to bring the boat to the side with the other person, then that person can cross."}
{"input":[{"role":"system","content":"There are two men. One of them is wearing a red shirt, and the other is wearing a blue shirt. The two men are named Andrew and Bob, but we do not know which is Andrew and which is Bob. The guy in the blue shirt says, 'I am Andrew.' The guy in the red shirt says, 'I am Bob.' If we know that at least one of them lied, then what color shirt is Andrew wearing?"}],"ideal":"Andrew is wearing the red shirt."}
{"input":[{"role":"system","content":"Which word does NOT belong with the others? A. index B. glossary C. chapter D. book"}],"ideal":"D. book"}
{"input":[{"role":"system","content":"The day before yesterday, Chris was 7 years old. Next year he'll turn 10. How is this possible?"}],"ideal":"Assuming today is January 1st of any given year: Two days ago, on December 30th, Chris was 7 years old. The next day, on December 31st, Chris celebrated his 8th birthday. On December 31st of this year, Chris will celebrate his 9th birthday. At the end of next year, on December 31st, Chris will turn 10 years old."}
{"input":[{"role":"system","content":"Inhabitants of an island lie consistently on Tuesdays, Thursdays, and Saturdays, and they tell the truth on the other four days of the week. You have forgotten what day of the week it is, so you ask a passerby. 'Saturday,' he answers. 'And what day will it be tomorrow?' you inquire. 'Wednesday,' he replies. Can you tell what day it is today?"}],"ideal":"Based on answer 1, today cannot be M, W, F, Su, or Sa (lying day). Based on answer 2, today cannot be M, W, F, Su, or Tu (lying day). So, today must be Thursday."}
{"input":[{"role":"system","content":"You are on an island populated by two tribes. Members of one tribe consistently lie. Members of the other tribe always tell the truth. Tribe members can recognize one another, but you can't tell them apart. You meet two people, C and D on the island. C says, 'Exactly one of us is from the liars tribe.' Which tribe is D from?"}],"ideal":"D is from the Liars tribe."}
{"input":[{"role":"system","content":"There are five people in a room. Each person will either always tell the truth or always tell a lie. Each person is asked the following question: How many liars are among you? The answers are: \"one\", \"two\", \"three\", \"four\", \"five\". How many liars are in the room?"}],"ideal":"There are four liars."}

Dataset attributions

This work includes data from the Illinois Intentional Tort Qualitative Dataset, which was compiled by the Qualitative Reasoning Group at Northwestern University. The dataset is freely available under the Creative Commons Attribution 4.0 license from https://www.qrg.northwestern.edu/Resources/caselawcorpus.html