Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning Instruct-Prompt Version For OpenAI EVAL #470

Merged
merged 5 commits into from
Apr 11, 2023

Conversation

csitfun
Copy link
Contributor

@csitfun csitfun commented Mar 27, 2023

Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨

PLEASE READ THIS:

In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject since GPT-4 is already capable of completing the task.

We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. We encourage partial PR's with ~5-10 example that we can then run the evals on and share the results with you so you know how your eval does with GPT-4 before writing all 100 examples.

Eval details 📑

Eval name

[LogiQA]

Eval description

[LogiQA is a multi-choice machine reading dataset released in 2020, and in 2023 we updated the dataset to version 2. The dataset is collected from Logical reasoning exams with excellent annotation quality. Our experiments showed that it is challenging even for Large Language Models like GPT-3. Fine-tuning methods perform even worse. Up to now, the SOTA score on the dataset is 55% accuracy. The dataset is used by many researchers in their probing tasks and is an important benchmark for logical reasoning. For this OpeanAI Eval, we transformed 1572 instances from the test set into an instruct-prompt scheme.]

What makes this a useful eval?

[Logical reasoning ability is a fascinating topic in the NLP community. We hope to see if GPT4 sheds more light on this topic.]

Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).

Your eval should be:

  • Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
  • Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
  • Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
  • Include at least 100 high quality examples (it is okay to only contribute 5-10 meaningful examples and have us test them with GPT-4 before adding all 100)

If there is anything else that makes your eval worth including, please document it below.

Unique eval value

Insert what makes your eval high quality that was not mentioned above. (Not required)

Eval structure 🏗️

Your eval should

  • Check that your data is in evals/registry/data/{name}
  • Check that your yaml is registered at evals/registry/evals/{name}.yaml
  • Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

  • I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.

  • I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.

Limited availability acknowledgement

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

  • I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.

Submit eval

  • I have filled out all required fields in the evals PR form
  • (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push

Failure to fill out all required fields will result in the PR being closed.

Eval JSON data

Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:

View evals in JSON

Eval

{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: Sigatoka disease drastically reduces the yield of banana trees and is epidemic throughout the areas of the world where bananas are grown. The fungus that causes the disease can be controlled with fungicides, but the fungicides can pose a health hazard to people living nearby. The fungicides are thus unsuitable for small banana groves in populated areas. Fortunately, most large banana plantations are in locations so isolated that fungicides can be used safely there. Therefore, most of the world' s banana crop is not seriously threatened by Sigatoka disease."}, {"role": "system", "content": "Question: Which one of the following is an assumption on which the argument depends?"}, {"role": "user", "content": "Option 1: Sigatoka disease is the only disease that threatens bananas on a worldwide scale."}, {"role": "user", "content": "Option 2: Most of the banana trees that have not been exposed to the Sigatoka fungus grow in small banana groves."}, {"role": "user", "content": "Option 3: Large plantations produce most or all of the world's bananas."}, {"role": "user", "content": "Option 4: Sigatoka disease is the only disease that threatens bananas on a worldwide scale."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: At present, there are many books such as Ten Keys to Success in the book market. Publishers marketed these books as books that would actually help readers achieve great success. In fact, almost everyone knows that great success is destined to belong to a minority, and people cannot all become one of the minority through books. In this regard, the exaggerated and even false claims made by publishers cannot be considered unethical. To say the least, even if one believes the publisher's false claims, it is not immoral to make such claims as long as reading such books does more good than harm to one's success."}, {"role": "system", "content": "Question: Which of the following conclusions best fits the above argument?"}, {"role": "user", "content": "Option 1: Deliberately making false propaganda is immoral only when it has no positive effect"}, {"role": "user", "content": "Option 2: Deliberate propaganda of this kind is only immoral if people are deceived and suffer from it"}, {"role": "user", "content": "Option 3: If the deliberate disinformation is made to profit at the expense of the deceived, then the deliberate disinformation is immoral"}, {"role": "user", "content": "Option 4: Deliberately making false propaganda is immoral only when it has no positive effect"}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: Attorney for Ziegler: My client continued to do consulting work between the time of his arrest for attempted murder and the start of this trial. But I contend that Ziegler was insane at the time that he fired the shot. This is the only reasonable conclusion to draw from the fact that the accusers have submitted no evidence that he was sane at the time he pulled the trigger, only that he was sane some time after he did so."}, {"role": "system", "content": "Question: Which one of the following most accurately describes a flaw in the reasoning of Ziegler's attorney?"}, {"role": "user", "content": "Option 1: It presumes that being a well-educated professional is relevant to being guilty or innocent."}, {"role": "user", "content": "Option 2: It fails to consider that Ziegler might have been insane when he worked as a consultant."}, {"role": "user", "content": "Option 3: It fails to consider the possibility that Ziegler's being sane after the shooting is an indication that he was sane at the time of the shooting."}, {"role": "user", "content": "Option 4: It presumes that being a well-educated professional is relevant to being guilty or innocent."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: It is proposed to allow the sale, without prescription, of a medication that physicians currently prescribe to treat the common ear inflammation called swimmer' s ear.  The principal objection is that most people lack the expertise for proper self-diagnosis and might not seek medical help for more serious conditions in the mistaken belief that they have swimmer' s ear. Yet in a recent study, of 1, 000 people who suspected that they had swimmer' s ear, 84 percent had made a correct diagnosis -- a slightly better accuracy rate than physicians have in diagnosing swimmer' s ear. Thus, clearly, most people can diagnose swimmer' s ear in themselves without ever having to consult a physician."}, {"role": "system", "content": "Question: Which one of the following, if true, most undermines the conclusion?"}, {"role": "user", "content": "Option 1: Cases in which swimmer's ear progresses to more serious infections are very rare."}, {"role": "user", "content": "Option 2: For many people who develop swimmer's ear, the condition disappears without medical or pharmaceutical intervention."}, {"role": "user", "content": "Option 3: Physicians who specialize in ear diseases are generally able to provide more accurate diagnoses than those provided by general practitioners."}, {"role": "user", "content": "Option 4: Cases in which swimmer's ear progresses to more serious infections are very rare."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: All any reporter knows about the accident is what the press agent has said. Ttherefore, if the press agent told every reporter everything about the accident, then no reporter knows any more about it than any other reporter. If no reporter knows any more about the accident than any other reporter, then no reporter can scoop all of the other reporters. However, the press agent did not tell every reporter everything about the accident. It follows that some reporter can scoop all of the other reporters."}, {"role": "system", "content": "Question: The argument's reasoning is flawed because the argument fails to recognize that which one of the following is consistent with the facts the argument presents?"}, {"role": "user", "content": "Option 1: The press agent may not know any more about the accident than the most knowledgeable reporter."}, {"role": "user", "content": "Option 2: No reporter knows any more about the accident than any other reporter."}, {"role": "user", "content": "Option 3: Even if some reporter knows more about the accident than all of the other reporters, that reporter need not scoop any other reporter."}, {"role": "user", "content": "Option 4: The press agent may not know any more about the accident than the most knowledgeable reporter."}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: Crowdsourcing refers to the practice of a company or organization to delegate tasks traditionally performed by employees to the general public."}, {"role": "system", "content": "Question: Which of the following is not crowdsourcing?"}, {"role": "user", "content": "Option 1: A toy company has been encouraging and sponsoring users to participate in its design work. From robotic control systems to building block kits, the company has had fairly good results."}, {"role": "user", "content": "Option 2: A detergent company often posts its own R & D projects on major websites, soliciting solutions, and promises to give certain rewards for solutions."}, {"role": "user", "content": "Option 3: In the past three years, a real estate company has handed over all the daily maintenance of computers, networks and peripherals to a computer company."}, {"role": "user", "content": "Option 4: A toy company has been encouraging and sponsoring users to participate in its design work. From robotic control systems to building block kits, the company has had fairly good results."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: Social risk refers to the risk of loss of social production and people's life due to the actions of individuals or groups."}, {"role": "system", "content": "Question: Which of the following is not a social risk?"}, {"role": "user", "content": "Option 1: Larceny."}, {"role": "user", "content": "Option 2: Robbery."}, {"role": "user", "content": "Option 3: Frost disaster."}, {"role": "user", "content": "Option 4: Larceny."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: A manager is hoping to reach a certain target for camera sales in his store, which sells between 10 and 20 cameras a week. Typically, most cameras sold in any week are the less expensive economy models, and his store has sold relatively fewer of the more expensive, high-end cameras. The manager realizes that if, on average, three more cameras sold each week were high-end instead of economy models, the store would reach its target in sales. The manager prepares a detailed information sheet for the sales associates, outlining the numerous advantages of the high-end cameras over the economy cameras, and provides each sales associate with a portfolio of contrasting photos of the same images, showing the clearly superior image quality of the high-end cameras."}, {"role": "system", "content": "Question: Which of the following, if true, would provide most support for the prediction that the detailed information sheet and photo portfolio given to sales associates will have its intended effect of allowing the store to reach its target in sales?"}, {"role": "user", "content": "Option 1: Camera stores that are part of the same national franchise in major metropolitan locations, like New York or Los Angeles, sell comparatively large numbers of the high end cameras."}, {"role": "user", "content": "Option 2: The sales associates are already well informed about the capabilities of all the cameras, and often know detailed technical information about their circuitry."}, {"role": "user", "content": "Option 3: The high end cameras can generate photographs of profession quality, such as those a portrait photographer might produce"}, {"role": "user", "content": "Option 4: Camera stores that are part of the same national franchise in major metropolitan locations, like New York or Los Angeles, sell comparatively large numbers of the high end cameras."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: In people's impression, bio-fuel is a renewable green energy. The latest research results overturn people's traditional impression. Researchers found that bio-fuel may be converted into acetaldehyde due to incomplete combustion, which will pollute the air. This pollution will lead to 1400 early deaths in country M every year. Therefore, some medical institution personnel in country M believe that the promotion of bio-fuels should be suspended and its use should be limited at this stage."}, {"role": "system", "content": "Question: Which of the following, if true, would most effectively question the views of medical institution personnel?"}, {"role": "user", "content": "Option 1: At present, the country's scientists have developed a new technology to fully burn biofuels."}, {"role": "user", "content": "Option 2: Pollution from other fuels currently being used in the country causes more than 3,000 premature deaths a year."}, {"role": "user", "content": "Option 3: Conventional fuels such as oil have been technologically improved to reduce pollution from combustion."}, {"role": "user", "content": "Option 4: At present, the country's scientists have developed a new technology to fully burn biofuels."}], "ideal": "1"}
{"input": [{"role": "system", "content": "Instructions: You will be presented with a passage, and a question about that passage. There are four options to be choosed from, you need to choose the only correct option answering that question. If the first option is right, you generate the answer '1', if the second option is right, you generate the answer '2', if the third option is right, you generate the answer '3', if the fourth option is right, you generate the answer '4'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails."}, {"role": "system", "content": "Passage: Road traffic accident refers to the event of personal injury or property loss caused by vehicle fault or accident on the road. Among them, road refers to roads, urban roads and places where social motor vehicles are allowed to pass although within the jurisdiction of the unit, including squares, public parking lots and other places used for public passage. Vehicle refers to motor vehicles and non motor vehicles. Non motor vehicles, It refers to the means of transport driven by human or animal power and running on the road, as well as the motor wheelchair, electric bicycle and other means of transport for the disabled whose design maximum speed, empty vehicle quality and overall dimensions meet the relevant national standards although driven by power devices."}, {"role": "system", "content": "Question: According to the above definition, which of the followings doesn't belong to road traffic accident:"}, {"role": "user", "content": "Option 1: Xiao Wang accidentally knocked down an old man when reversing in the closed management community"}, {"role": "user", "content": "Option 2: When Miss Zhou crossed the road with her pet dog, the stray pet dog unfortunately died under the ring"}, {"role": "user", "content": "Option 3: Xiao Zhao parked his car in the parking lot near the shopping mall. When he picked up the car, he found that the rear of the car was hit and the accident vehicle had escaped"}, {"role": "user", "content": "Option 4: Xiao Wang accidentally knocked down an old man when reversing in the closed management community"}], "ideal": "1"}

@csitfun
Copy link
Contributor Author

csitfun commented Mar 30, 2023

LogiQA 2.0 dataset section that is specifically collected after 2021, the tests are released in 2022 and later. GPT3.5 turbo did them horribly, only achieves 38% accuracy. Manual tests on GPT4 yield 54 correct answers out of 112 questions (48.21%).

Copy link
Contributor

@andrew-openai andrew-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this eval! We've approved the PR and will merge. We’ll also add GPT-4 API access to your account.

@andrew-openai andrew-openai merged commit e4a3ac0 into openai:main Apr 11, 2023
1 check passed
michailmelonas pushed a commit to michailmelonas/evals that referenced this pull request Apr 12, 2023
…gical Reasoning Instruct-Prompt Version For OpenAI EVAL (openai#470)

# Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines, __failure to follow
the guidelines below will result in the PR being closed automatically__.
Note that even if the criteria are met, that does not guarantee the PR
will be merged nor GPT-4 access granted. 🚨

__PLEASE READ THIS__:

In order for a PR to be merged, it must fail on GPT-4. We are aware that
right now, users do not have access, so you will not be able to tell if
the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep
in mind as we run the eval, if GPT-4 gets higher than 90% on the eval,
we will likely reject since GPT-4 is already capable of completing the
task.

We plan to roll out a way for users submitting evals to see the eval
performance on GPT-4 soon. Stay tuned! Until then, you will not be able
to see the eval performance on GPT-4. We encourage partial PR's with
~5-10 example that we can then run the evals on and share the results
with you so you know how your eval does with GPT-4 before writing all
100 examples.

## Eval details 📑
### Eval name
[LogiQA]

### Eval description

[LogiQA is a multi-choice machine reading dataset released in 2020, and
in 2023 we updated the dataset to version 2. The dataset is collected
from Logical reasoning exams with excellent annotation quality. Our
experiments showed that it is challenging even for Large Language Models
like GPT-3. Fine-tuning methods perform even worse. Up to now, the SOTA
score on the dataset is 55% accuracy. The dataset is used by many
researchers in their probing tasks and is an important benchmark for
logical reasoning. For this OpeanAI Eval, we transformed 1572 instances
from the test set into an instruct-prompt scheme.]

### What makes this a useful eval?

[Logical reasoning ability is a fascinating topic in the NLP community.
We hope to see if GPT4 sheds more light on this topic.]

## Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general,
we are seeking cases where the model does not do a good job despite
being capable of generating a good response (note that there are some
things large language models cannot do, so those would not make good
evals).

Your eval should be:

- [x] Thematically consistent: The eval should be thematically
consistent. We'd like to see a number of prompts all demonstrating some
particular failure mode. For example, we can create an eval on cases
where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4
or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means
either a correct answer for `Basic` evals or the `Fact` Model-graded
eval, or an exhaustive rubric for evaluating answers for the `Criteria`
Model-graded eval.
- [x] Include at least 100 high quality examples (it is okay to only
contribute 5-10 meaningful examples and have us test them with GPT-4
before adding all 100)

If there is anything else that makes your eval worth including, please
document it below.

### Unique eval value

> Insert what makes your eval high quality that was not mentioned above.
(Not required)

## Eval structure 🏗️

Your eval should
- [x] Check that your data is in `evals/registry/data/{name}`
- [x] Check that your yaml is registered at
`evals/registry/evals/{name}.yaml`
- [x] Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing
eval classes. You may still write custom eval classes for your own
cases, and we may consider merging them in the future.)

## Final checklist 👀

### Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic
and data under the same MIT license as this repository. You must have
adequate rights to upload any data used in an Eval. OpenAI reserves the
right to use this data in future service improvements to our product.
Contributions to OpenAI Evals will be subject to our usual Usage
Policies (https://platform.openai.com/docs/usage-policies).

- [x] I agree that my submission will be made available under an MIT
license and complies with OpenAI's usage policies.

### Email address validation

If your submission is accepted, we will be granting GPT-4 access to a
limited number of contributors. Access will be given to the email
address associated with the merged pull request.

- [x] I acknowledge that GPT-4 access will only be granted, if
applicable, to the email address used for my merged pull request.

### Limited availability acknowledgement

We know that you might be excited to contribute to OpenAI's mission,
help improve our models, and gain access to GPT-4. However, due to the
requirements mentioned above and high volume of submissions, we will not
be able to accept all submissions and thus not grant everyone who opens
a PR GPT-4 access. We know this is disappointing, but we hope to set the
right expectation before you open this PR.

- [x] I understand that opening a PR, even if it meets the requirements
above, does not guarantee the PR will be merged nor GPT-4 access
granted.

### Submit eval

- [x] I have filled out all required fields in the evals PR form
- [ ] (Ignore if not submitting code) I have run `pip install
pre-commit; pre-commit install` and have verified that `black`, `isort`,
and `autoflake` are running when I commit and push

Failure to fill out all required fields will result in the PR being
closed.

### Eval JSON data 

Since we are using Git LFS, we are asking eval submitters to add in as
many Eval Samples (at least 5) from their contribution here:

<details>
  <summary>View evals in JSON</summary>

  ### Eval
  ```jsonl
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Sigatoka
disease drastically reduces the yield of banana trees and is epidemic
throughout the areas of the world where bananas are grown. The fungus
that causes the disease can be controlled with fungicides, but the
fungicides can pose a health hazard to people living nearby. The
fungicides are thus unsuitable for small banana groves in populated
areas. Fortunately, most large banana plantations are in locations so
isolated that fungicides can be used safely there. Therefore, most of
the world' s banana crop is not seriously threatened by Sigatoka
disease."}, {"role": "system", "content": "Question: Which one of the
following is an assumption on which the argument depends?"}, {"role":
"user", "content": "Option 1: Sigatoka disease is the only disease that
threatens bananas on a worldwide scale."}, {"role": "user", "content":
"Option 2: Most of the banana trees that have not been exposed to the
Sigatoka fungus grow in small banana groves."}, {"role": "user",
"content": "Option 3: Large plantations produce most or all of the
world's bananas."}, {"role": "user", "content": "Option 4: Sigatoka
disease is the only disease that threatens bananas on a worldwide
scale."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: At
present, there are many books such as Ten Keys to Success in the book
market. Publishers marketed these books as books that would actually
help readers achieve great success. In fact, almost everyone knows that
great success is destined to belong to a minority, and people cannot all
become one of the minority through books. In this regard, the
exaggerated and even false claims made by publishers cannot be
considered unethical. To say the least, even if one believes the
publisher's false claims, it is not immoral to make such claims as long
as reading such books does more good than harm to one's success."},
{"role": "system", "content": "Question: Which of the following
conclusions best fits the above argument?"}, {"role": "user", "content":
"Option 1: Deliberately making false propaganda is immoral only when it
has no positive effect"}, {"role": "user", "content": "Option 2:
Deliberate propaganda of this kind is only immoral if people are
deceived and suffer from it"}, {"role": "user", "content": "Option 3: If
the deliberate disinformation is made to profit at the expense of the
deceived, then the deliberate disinformation is immoral"}, {"role":
"user", "content": "Option 4: Deliberately making false propaganda is
immoral only when it has no positive effect"}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Attorney
for Ziegler: My client continued to do consulting work between the time
of his arrest for attempted murder and the start of this trial. But I
contend that Ziegler was insane at the time that he fired the shot. This
is the only reasonable conclusion to draw from the fact that the
accusers have submitted no evidence that he was sane at the time he
pulled the trigger, only that he was sane some time after he did so."},
{"role": "system", "content": "Question: Which one of the following most
accurately describes a flaw in the reasoning of Ziegler's attorney?"},
{"role": "user", "content": "Option 1: It presumes that being a
well-educated professional is relevant to being guilty or innocent."},
{"role": "user", "content": "Option 2: It fails to consider that Ziegler
might have been insane when he worked as a consultant."}, {"role":
"user", "content": "Option 3: It fails to consider the possibility that
Ziegler's being sane after the shooting is an indication that he was
sane at the time of the shooting."}, {"role": "user", "content": "Option
4: It presumes that being a well-educated professional is relevant to
being guilty or innocent."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: It is
proposed to allow the sale, without prescription, of a medication that
physicians currently prescribe to treat the common ear inflammation
called swimmer' s ear. The principal objection is that most people lack
the expertise for proper self-diagnosis and might not seek medical help
for more serious conditions in the mistaken belief that they have
swimmer' s ear. Yet in a recent study, of 1, 000 people who suspected
that they had swimmer' s ear, 84 percent had made a correct diagnosis --
a slightly better accuracy rate than physicians have in diagnosing
swimmer' s ear. Thus, clearly, most people can diagnose swimmer' s ear
in themselves without ever having to consult a physician."}, {"role":
"system", "content": "Question: Which one of the following, if true,
most undermines the conclusion?"}, {"role": "user", "content": "Option
1: Cases in which swimmer's ear progresses to more serious infections
are very rare."}, {"role": "user", "content": "Option 2: For many people
who develop swimmer's ear, the condition disappears without medical or
pharmaceutical intervention."}, {"role": "user", "content": "Option 3:
Physicians who specialize in ear diseases are generally able to provide
more accurate diagnoses than those provided by general practitioners."},
{"role": "user", "content": "Option 4: Cases in which swimmer's ear
progresses to more serious infections are very rare."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: All any
reporter knows about the accident is what the press agent has said.
Ttherefore, if the press agent told every reporter everything about the
accident, then no reporter knows any more about it than any other
reporter. If no reporter knows any more about the accident than any
other reporter, then no reporter can scoop all of the other reporters.
However, the press agent did not tell every reporter everything about
the accident. It follows that some reporter can scoop all of the other
reporters."}, {"role": "system", "content": "Question: The argument's
reasoning is flawed because the argument fails to recognize that which
one of the following is consistent with the facts the argument
presents?"}, {"role": "user", "content": "Option 1: The press agent may
not know any more about the accident than the most knowledgeable
reporter."}, {"role": "user", "content": "Option 2: No reporter knows
any more about the accident than any other reporter."}, {"role": "user",
"content": "Option 3: Even if some reporter knows more about the
accident than all of the other reporters, that reporter need not scoop
any other reporter."}, {"role": "user", "content": "Option 4: The press
agent may not know any more about the accident than the most
knowledgeable reporter."}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage:
Crowdsourcing refers to the practice of a company or organization to
delegate tasks traditionally performed by employees to the general
public."}, {"role": "system", "content": "Question: Which of the
following is not crowdsourcing?"}, {"role": "user", "content": "Option
1: A toy company has been encouraging and sponsoring users to
participate in its design work. From robotic control systems to building
block kits, the company has had fairly good results."}, {"role": "user",
"content": "Option 2: A detergent company often posts its own R & D
projects on major websites, soliciting solutions, and promises to give
certain rewards for solutions."}, {"role": "user", "content": "Option 3:
In the past three years, a real estate company has handed over all the
daily maintenance of computers, networks and peripherals to a computer
company."}, {"role": "user", "content": "Option 4: A toy company has
been encouraging and sponsoring users to participate in its design work.
From robotic control systems to building block kits, the company has had
fairly good results."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Social
risk refers to the risk of loss of social production and people's life
due to the actions of individuals or groups."}, {"role": "system",
"content": "Question: Which of the following is not a social risk?"},
{"role": "user", "content": "Option 1: Larceny."}, {"role": "user",
"content": "Option 2: Robbery."}, {"role": "user", "content": "Option 3:
Frost disaster."}, {"role": "user", "content": "Option 4: Larceny."}],
"ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: A
manager is hoping to reach a certain target for camera sales in his
store, which sells between 10 and 20 cameras a week. Typically, most
cameras sold in any week are the less expensive economy models, and his
store has sold relatively fewer of the more expensive, high-end cameras.
The manager realizes that if, on average, three more cameras sold each
week were high-end instead of economy models, the store would reach its
target in sales. The manager prepares a detailed information sheet for
the sales associates, outlining the numerous advantages of the high-end
cameras over the economy cameras, and provides each sales associate with
a portfolio of contrasting photos of the same images, showing the
clearly superior image quality of the high-end cameras."}, {"role":
"system", "content": "Question: Which of the following, if true, would
provide most support for the prediction that the detailed information
sheet and photo portfolio given to sales associates will have its
intended effect of allowing the store to reach its target in sales?"},
{"role": "user", "content": "Option 1: Camera stores that are part of
the same national franchise in major metropolitan locations, like New
York or Los Angeles, sell comparatively large numbers of the high end
cameras."}, {"role": "user", "content": "Option 2: The sales associates
are already well informed about the capabilities of all the cameras, and
often know detailed technical information about their circuitry."},
{"role": "user", "content": "Option 3: The high end cameras can generate
photographs of profession quality, such as those a portrait photographer
might produce"}, {"role": "user", "content": "Option 4: Camera stores
that are part of the same national franchise in major metropolitan
locations, like New York or Los Angeles, sell comparatively large
numbers of the high end cameras."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: In
people's impression, bio-fuel is a renewable green energy. The latest
research results overturn people's traditional impression. Researchers
found that bio-fuel may be converted into acetaldehyde due to incomplete
combustion, which will pollute the air. This pollution will lead to 1400
early deaths in country M every year. Therefore, some medical
institution personnel in country M believe that the promotion of
bio-fuels should be suspended and its use should be limited at this
stage."}, {"role": "system", "content": "Question: Which of the
following, if true, would most effectively question the views of medical
institution personnel?"}, {"role": "user", "content": "Option 1: At
present, the country's scientists have developed a new technology to
fully burn biofuels."}, {"role": "user", "content": "Option 2: Pollution
from other fuels currently being used in the country causes more than
3,000 premature deaths a year."}, {"role": "user", "content": "Option 3:
Conventional fuels such as oil have been technologically improved to
reduce pollution from combustion."}, {"role": "user", "content": "Option
4: At present, the country's scientists have developed a new technology
to fully burn biofuels."}], "ideal": "1"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Road
traffic accident refers to the event of personal injury or property loss
caused by vehicle fault or accident on the road. Among them, road refers
to roads, urban roads and places where social motor vehicles are allowed
to pass although within the jurisdiction of the unit, including squares,
public parking lots and other places used for public passage. Vehicle
refers to motor vehicles and non motor vehicles. Non motor vehicles, It
refers to the means of transport driven by human or animal power and
running on the road, as well as the motor wheelchair, electric bicycle
and other means of transport for the disabled whose design maximum
speed, empty vehicle quality and overall dimensions meet the relevant
national standards although driven by power devices."}, {"role":
"system", "content": "Question: According to the above definition, which
of the followings doesn't belong to road traffic accident:"}, {"role":
"user", "content": "Option 1: Xiao Wang accidentally knocked down an old
man when reversing in the closed management community"}, {"role":
"user", "content": "Option 2: When Miss Zhou crossed the road with her
pet dog, the stray pet dog unfortunately died under the ring"}, {"role":
"user", "content": "Option 3: Xiao Zhao parked his car in the parking
lot near the shopping mall. When he picked up the car, he found that the
rear of the car was hit and the accident vehicle had escaped"}, {"role":
"user", "content": "Option 4: Xiao Wang accidentally knocked down an old
man when reversing in the closed management community"}], "ideal": "1"}

  ```
</details>
@csitfun
Copy link
Contributor Author

csitfun commented Apr 15, 2023 via email

@andrew-openai
Copy link
Contributor

Hi, unfortunately I couldn't find your email details in the commits, please email me at kondrich@openai.com from your OpenAI-affiliated email!

Linmj-Judy pushed a commit to TablewareBox/evals that referenced this pull request Feb 27, 2024
…gical Reasoning Instruct-Prompt Version For OpenAI EVAL (openai#470)

# Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines, __failure to follow
the guidelines below will result in the PR being closed automatically__.
Note that even if the criteria are met, that does not guarantee the PR
will be merged nor GPT-4 access granted. 🚨

__PLEASE READ THIS__:

In order for a PR to be merged, it must fail on GPT-4. We are aware that
right now, users do not have access, so you will not be able to tell if
the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep
in mind as we run the eval, if GPT-4 gets higher than 90% on the eval,
we will likely reject since GPT-4 is already capable of completing the
task.

We plan to roll out a way for users submitting evals to see the eval
performance on GPT-4 soon. Stay tuned! Until then, you will not be able
to see the eval performance on GPT-4. We encourage partial PR's with
~5-10 example that we can then run the evals on and share the results
with you so you know how your eval does with GPT-4 before writing all
100 examples.

## Eval details 📑
### Eval name
[LogiQA]

### Eval description

[LogiQA is a multi-choice machine reading dataset released in 2020, and
in 2023 we updated the dataset to version 2. The dataset is collected
from Logical reasoning exams with excellent annotation quality. Our
experiments showed that it is challenging even for Large Language Models
like GPT-3. Fine-tuning methods perform even worse. Up to now, the SOTA
score on the dataset is 55% accuracy. The dataset is used by many
researchers in their probing tasks and is an important benchmark for
logical reasoning. For this OpeanAI Eval, we transformed 1572 instances
from the test set into an instruct-prompt scheme.]

### What makes this a useful eval?

[Logical reasoning ability is a fascinating topic in the NLP community.
We hope to see if GPT4 sheds more light on this topic.]

## Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general,
we are seeking cases where the model does not do a good job despite
being capable of generating a good response (note that there are some
things large language models cannot do, so those would not make good
evals).

Your eval should be:

- [x] Thematically consistent: The eval should be thematically
consistent. We'd like to see a number of prompts all demonstrating some
particular failure mode. For example, we can create an eval on cases
where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4
or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means
either a correct answer for `Basic` evals or the `Fact` Model-graded
eval, or an exhaustive rubric for evaluating answers for the `Criteria`
Model-graded eval.
- [x] Include at least 100 high quality examples (it is okay to only
contribute 5-10 meaningful examples and have us test them with GPT-4
before adding all 100)

If there is anything else that makes your eval worth including, please
document it below.

### Unique eval value

> Insert what makes your eval high quality that was not mentioned above.
(Not required)

## Eval structure 🏗️

Your eval should
- [x] Check that your data is in `evals/registry/data/{name}`
- [x] Check that your yaml is registered at
`evals/registry/evals/{name}.yaml`
- [x] Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing
eval classes. You may still write custom eval classes for your own
cases, and we may consider merging them in the future.)

## Final checklist 👀

### Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic
and data under the same MIT license as this repository. You must have
adequate rights to upload any data used in an Eval. OpenAI reserves the
right to use this data in future service improvements to our product.
Contributions to OpenAI Evals will be subject to our usual Usage
Policies (https://platform.openai.com/docs/usage-policies).

- [x] I agree that my submission will be made available under an MIT
license and complies with OpenAI's usage policies.

### Email address validation

If your submission is accepted, we will be granting GPT-4 access to a
limited number of contributors. Access will be given to the email
address associated with the merged pull request.

- [x] I acknowledge that GPT-4 access will only be granted, if
applicable, to the email address used for my merged pull request.

### Limited availability acknowledgement

We know that you might be excited to contribute to OpenAI's mission,
help improve our models, and gain access to GPT-4. However, due to the
requirements mentioned above and high volume of submissions, we will not
be able to accept all submissions and thus not grant everyone who opens
a PR GPT-4 access. We know this is disappointing, but we hope to set the
right expectation before you open this PR.

- [x] I understand that opening a PR, even if it meets the requirements
above, does not guarantee the PR will be merged nor GPT-4 access
granted.

### Submit eval

- [x] I have filled out all required fields in the evals PR form
- [ ] (Ignore if not submitting code) I have run `pip install
pre-commit; pre-commit install` and have verified that `black`, `isort`,
and `autoflake` are running when I commit and push

Failure to fill out all required fields will result in the PR being
closed.

### Eval JSON data 

Since we are using Git LFS, we are asking eval submitters to add in as
many Eval Samples (at least 5) from their contribution here:

<details>
  <summary>View evals in JSON</summary>

  ### Eval
  ```jsonl
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Sigatoka
disease drastically reduces the yield of banana trees and is epidemic
throughout the areas of the world where bananas are grown. The fungus
that causes the disease can be controlled with fungicides, but the
fungicides can pose a health hazard to people living nearby. The
fungicides are thus unsuitable for small banana groves in populated
areas. Fortunately, most large banana plantations are in locations so
isolated that fungicides can be used safely there. Therefore, most of
the world' s banana crop is not seriously threatened by Sigatoka
disease."}, {"role": "system", "content": "Question: Which one of the
following is an assumption on which the argument depends?"}, {"role":
"user", "content": "Option 1: Sigatoka disease is the only disease that
threatens bananas on a worldwide scale."}, {"role": "user", "content":
"Option 2: Most of the banana trees that have not been exposed to the
Sigatoka fungus grow in small banana groves."}, {"role": "user",
"content": "Option 3: Large plantations produce most or all of the
world's bananas."}, {"role": "user", "content": "Option 4: Sigatoka
disease is the only disease that threatens bananas on a worldwide
scale."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: At
present, there are many books such as Ten Keys to Success in the book
market. Publishers marketed these books as books that would actually
help readers achieve great success. In fact, almost everyone knows that
great success is destined to belong to a minority, and people cannot all
become one of the minority through books. In this regard, the
exaggerated and even false claims made by publishers cannot be
considered unethical. To say the least, even if one believes the
publisher's false claims, it is not immoral to make such claims as long
as reading such books does more good than harm to one's success."},
{"role": "system", "content": "Question: Which of the following
conclusions best fits the above argument?"}, {"role": "user", "content":
"Option 1: Deliberately making false propaganda is immoral only when it
has no positive effect"}, {"role": "user", "content": "Option 2:
Deliberate propaganda of this kind is only immoral if people are
deceived and suffer from it"}, {"role": "user", "content": "Option 3: If
the deliberate disinformation is made to profit at the expense of the
deceived, then the deliberate disinformation is immoral"}, {"role":
"user", "content": "Option 4: Deliberately making false propaganda is
immoral only when it has no positive effect"}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Attorney
for Ziegler: My client continued to do consulting work between the time
of his arrest for attempted murder and the start of this trial. But I
contend that Ziegler was insane at the time that he fired the shot. This
is the only reasonable conclusion to draw from the fact that the
accusers have submitted no evidence that he was sane at the time he
pulled the trigger, only that he was sane some time after he did so."},
{"role": "system", "content": "Question: Which one of the following most
accurately describes a flaw in the reasoning of Ziegler's attorney?"},
{"role": "user", "content": "Option 1: It presumes that being a
well-educated professional is relevant to being guilty or innocent."},
{"role": "user", "content": "Option 2: It fails to consider that Ziegler
might have been insane when he worked as a consultant."}, {"role":
"user", "content": "Option 3: It fails to consider the possibility that
Ziegler's being sane after the shooting is an indication that he was
sane at the time of the shooting."}, {"role": "user", "content": "Option
4: It presumes that being a well-educated professional is relevant to
being guilty or innocent."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: It is
proposed to allow the sale, without prescription, of a medication that
physicians currently prescribe to treat the common ear inflammation
called swimmer' s ear. The principal objection is that most people lack
the expertise for proper self-diagnosis and might not seek medical help
for more serious conditions in the mistaken belief that they have
swimmer' s ear. Yet in a recent study, of 1, 000 people who suspected
that they had swimmer' s ear, 84 percent had made a correct diagnosis --
a slightly better accuracy rate than physicians have in diagnosing
swimmer' s ear. Thus, clearly, most people can diagnose swimmer' s ear
in themselves without ever having to consult a physician."}, {"role":
"system", "content": "Question: Which one of the following, if true,
most undermines the conclusion?"}, {"role": "user", "content": "Option
1: Cases in which swimmer's ear progresses to more serious infections
are very rare."}, {"role": "user", "content": "Option 2: For many people
who develop swimmer's ear, the condition disappears without medical or
pharmaceutical intervention."}, {"role": "user", "content": "Option 3:
Physicians who specialize in ear diseases are generally able to provide
more accurate diagnoses than those provided by general practitioners."},
{"role": "user", "content": "Option 4: Cases in which swimmer's ear
progresses to more serious infections are very rare."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: All any
reporter knows about the accident is what the press agent has said.
Ttherefore, if the press agent told every reporter everything about the
accident, then no reporter knows any more about it than any other
reporter. If no reporter knows any more about the accident than any
other reporter, then no reporter can scoop all of the other reporters.
However, the press agent did not tell every reporter everything about
the accident. It follows that some reporter can scoop all of the other
reporters."}, {"role": "system", "content": "Question: The argument's
reasoning is flawed because the argument fails to recognize that which
one of the following is consistent with the facts the argument
presents?"}, {"role": "user", "content": "Option 1: The press agent may
not know any more about the accident than the most knowledgeable
reporter."}, {"role": "user", "content": "Option 2: No reporter knows
any more about the accident than any other reporter."}, {"role": "user",
"content": "Option 3: Even if some reporter knows more about the
accident than all of the other reporters, that reporter need not scoop
any other reporter."}, {"role": "user", "content": "Option 4: The press
agent may not know any more about the accident than the most
knowledgeable reporter."}], "ideal": "2"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage:
Crowdsourcing refers to the practice of a company or organization to
delegate tasks traditionally performed by employees to the general
public."}, {"role": "system", "content": "Question: Which of the
following is not crowdsourcing?"}, {"role": "user", "content": "Option
1: A toy company has been encouraging and sponsoring users to
participate in its design work. From robotic control systems to building
block kits, the company has had fairly good results."}, {"role": "user",
"content": "Option 2: A detergent company often posts its own R & D
projects on major websites, soliciting solutions, and promises to give
certain rewards for solutions."}, {"role": "user", "content": "Option 3:
In the past three years, a real estate company has handed over all the
daily maintenance of computers, networks and peripherals to a computer
company."}, {"role": "user", "content": "Option 4: A toy company has
been encouraging and sponsoring users to participate in its design work.
From robotic control systems to building block kits, the company has had
fairly good results."}], "ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Social
risk refers to the risk of loss of social production and people's life
due to the actions of individuals or groups."}, {"role": "system",
"content": "Question: Which of the following is not a social risk?"},
{"role": "user", "content": "Option 1: Larceny."}, {"role": "user",
"content": "Option 2: Robbery."}, {"role": "user", "content": "Option 3:
Frost disaster."}, {"role": "user", "content": "Option 4: Larceny."}],
"ideal": "3"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: A
manager is hoping to reach a certain target for camera sales in his
store, which sells between 10 and 20 cameras a week. Typically, most
cameras sold in any week are the less expensive economy models, and his
store has sold relatively fewer of the more expensive, high-end cameras.
The manager realizes that if, on average, three more cameras sold each
week were high-end instead of economy models, the store would reach its
target in sales. The manager prepares a detailed information sheet for
the sales associates, outlining the numerous advantages of the high-end
cameras over the economy cameras, and provides each sales associate with
a portfolio of contrasting photos of the same images, showing the
clearly superior image quality of the high-end cameras."}, {"role":
"system", "content": "Question: Which of the following, if true, would
provide most support for the prediction that the detailed information
sheet and photo portfolio given to sales associates will have its
intended effect of allowing the store to reach its target in sales?"},
{"role": "user", "content": "Option 1: Camera stores that are part of
the same national franchise in major metropolitan locations, like New
York or Los Angeles, sell comparatively large numbers of the high end
cameras."}, {"role": "user", "content": "Option 2: The sales associates
are already well informed about the capabilities of all the cameras, and
often know detailed technical information about their circuitry."},
{"role": "user", "content": "Option 3: The high end cameras can generate
photographs of profession quality, such as those a portrait photographer
might produce"}, {"role": "user", "content": "Option 4: Camera stores
that are part of the same national franchise in major metropolitan
locations, like New York or Los Angeles, sell comparatively large
numbers of the high end cameras."}], "ideal": "4"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: In
people's impression, bio-fuel is a renewable green energy. The latest
research results overturn people's traditional impression. Researchers
found that bio-fuel may be converted into acetaldehyde due to incomplete
combustion, which will pollute the air. This pollution will lead to 1400
early deaths in country M every year. Therefore, some medical
institution personnel in country M believe that the promotion of
bio-fuels should be suspended and its use should be limited at this
stage."}, {"role": "system", "content": "Question: Which of the
following, if true, would most effectively question the views of medical
institution personnel?"}, {"role": "user", "content": "Option 1: At
present, the country's scientists have developed a new technology to
fully burn biofuels."}, {"role": "user", "content": "Option 2: Pollution
from other fuels currently being used in the country causes more than
3,000 premature deaths a year."}, {"role": "user", "content": "Option 3:
Conventional fuels such as oil have been technologically improved to
reduce pollution from combustion."}, {"role": "user", "content": "Option
4: At present, the country's scientists have developed a new technology
to fully burn biofuels."}], "ideal": "1"}
{"input": [{"role": "system", "content": "Instructions: You will be
presented with a passage, and a question about that passage. There are
four options to be choosed from, you need to choose the only correct
option answering that question. If the first option is right, you
generate the answer '1', if the second option is right, you generate the
answer '2', if the third option is right, you generate the answer '3',
if the fourth option is right, you generate the answer '4'. Read the
question and options thoroughly and select the correct answer from the
four answer labels. Read the passage thoroughly to ensure you know what
the passage entails."}, {"role": "system", "content": "Passage: Road
traffic accident refers to the event of personal injury or property loss
caused by vehicle fault or accident on the road. Among them, road refers
to roads, urban roads and places where social motor vehicles are allowed
to pass although within the jurisdiction of the unit, including squares,
public parking lots and other places used for public passage. Vehicle
refers to motor vehicles and non motor vehicles. Non motor vehicles, It
refers to the means of transport driven by human or animal power and
running on the road, as well as the motor wheelchair, electric bicycle
and other means of transport for the disabled whose design maximum
speed, empty vehicle quality and overall dimensions meet the relevant
national standards although driven by power devices."}, {"role":
"system", "content": "Question: According to the above definition, which
of the followings doesn't belong to road traffic accident:"}, {"role":
"user", "content": "Option 1: Xiao Wang accidentally knocked down an old
man when reversing in the closed management community"}, {"role":
"user", "content": "Option 2: When Miss Zhou crossed the road with her
pet dog, the stray pet dog unfortunately died under the ring"}, {"role":
"user", "content": "Option 3: Xiao Zhao parked his car in the parking
lot near the shopping mall. When he picked up the car, he found that the
rear of the car was hit and the accident vehicle had escaped"}, {"role":
"user", "content": "Option 4: Xiao Wang accidentally knocked down an old
man when reversing in the closed management community"}], "ideal": "1"}

  ```
</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants