Causal reasoning benchmarks and tasks for large language models.

linyingyang/CausalReasoningLLM

CausalReasoningLLM

Causal reasoning benchmarks and tasks for large language models. For a detailed review, please see Yang, L., Clivio, O., Shirvaikar, V., & Falck, F. (2023, December). A critical review of Causal Inference benchmarks for Large Language Models. In AAAI 2024 Workshop on "Are Large Language Models Simply Causal Parrots?".

| Benchmark name | Paper title | Link / URL to data |
|---|---|---|
| ART | The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code | https://github.com/allenai/abductive-commonsense-reasoning |
| BIGbench empirical_judgments | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/empirical_judgments/task.json |
| BIGbench cause_and_effect | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/cause_and_effect |
| BIGbench com2sense | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks |
| BIGbench crass_ai | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/crass_ai |
| BIGbench entailed_polarity | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/entailed_polarity |
| BIGbench entailed_polarity_hindi | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/entailed_polarity_hindi |
| BIGbench fantasy_reasoning | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/fantasy_reasoning |
| BIGbench figure_of_speech_detection | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/figure_of_speech_detection |
| BIGbench forecasting_subquestions | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/forecasting_subquestions |
| BIGbench goal_step_wikihow | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/goal_step_wikihow |
| BIGbench human_organs_senses | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/human_organs_senses |
| BIGbench indic_cause_and_effect | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/indic_cause_and_effect |
| BIGbench minute_mysteries_qa | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/minute_mysteries_qa |
| BIGbench winowhy | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/winowhy |
| BIGbench tellmewhy | Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models | https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/tellmewhy |
| Causal Chains | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | https://github.com/MoritzWillig/causalParrots/blob/master/media/causal_chains.pdf |
| Causal-TimeBank (CTB) | (Original) The Causal News Corpus: Annotating Causal Relations in Event Sentences from News<br>(LLM) Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | https://github.com/paramitamirza/Causal-TimeBank |
| causalbank | (Original) Guided generation of cause and effect<br>(LLM) Boosting Language Models Reasoning with Chain-of-Knowledge Prompting | https://nlp.jhu.edu/causalbank/ |
| CausalDiscovery (Causal Parrots) | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | https://github.com/MoritzWillig/causalParrots |
| CLadder | CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models | https://huggingface.co/datasets/causalnlp/CLadder |
| COPA | (Original) Choice of plausible alternatives: An evaluation of commonsense causal reasoning<br>(LLM) Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | https://people.ict.usc.edu/gordon/copa.html |
| Corr2Cause | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | https://github.com/causalNLP/corr2cause |
| Counterfactual reasoning | Counterfactual reasoning: Do Language Models need world knowledge for causal inference? | https://github.com/goldengua/Counterfactual_Inference_LM/tree/main/dataset |
| DREAM | DREAM: A challenge data set and models for dialogue-based reading comprehension | https://paperswithcode.com/dataset/dream |
| e-CARE | Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | https://github.com/Waste-Wood/e-CARE |
| EventStoryLine v0.9 (ESC) | (Original) The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction<br>(LLM) Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | https://github.com/cltl/EventStoryLine |
| FCR | Towards fine-grained causal reasoning and QA | https://github.com/YangLinyi/Fine-grained-Causal-Reasoning |
| FinCausal | Financial document causality detection shared task (FinCausal 2020) | https://wp.lancs.ac.uk/cfie/fincausal2020/ |
| Intuitive Physics | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | https://github.com/MoritzWillig/causalParrots/blob/master/media/intuitive_physics.pdf |
| Knowledge Base Fact Embeddings | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | |
| LogiQA | A challenge dataset for machine reading comprehension with logical reasoning | https://github.com/lgw863/LogiQA-dataset |
| MAVEN-ERE | (Original) MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction<br>(LLM) Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation | https://github.com/THU-KEG/MAVEN-ERE |
| Natural World Chain | Causal Parrots: Large Language Models May Talk Causality But Are Not Causal | https://github.com/MoritzWillig/causalParrots/blob/master/media/causal_world.pdf |
| Neuropathic-pain-diagnosis | (Original) Neuropathic pain diagnosis simulator for causal discovery algorithm evaluation<br>(LLM) Causal-discovery performance of ChatGPT in the context of neuropathic pain diagnosis | https://github.com/TURuibo/Neuropathic-Pain-Diagnosis-Simulator |
| NLP tasks for cause and effect | Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks | https://github.com/allenai/natural-instructions |
| RACE | RACE: Large-scale reading comprehension dataset from examinations | https://www.cs.cmu.edu/~glai1/data/race/ |
| TimeTravel | The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code | https://github.com/qkaren/Counterfactual-StoryRW |
| Tuebingen cause-effect pairs dataset | (Original) Distinguishing cause from effect using observational data: methods and benchmarks<br>(LLM) Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | https://github.com/amit-sharma/chatgpt-causality-pairs/tree/main |
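Several of the BIG-bench entries in the table (for example `empirical_judgments`, whose link points directly at a `task.json`) are distributed as JSON task files. The sketch below is a minimal, illustrative way to score such a multiple-choice task, assuming the usual BIG-bench JSON layout: an `examples` list whose items carry an `input` string and a `target_scores` mapping from answer option to score. The helper names, the toy task, and the `predict` callable (a stand-in for an LLM query) are hypothetical and not part of any listed benchmark; some tasks use `target` instead of `target_scores`, so check each `task.json` before relying on this layout.

```python
import json
from typing import Callable, Dict, List


def load_examples(task_json: str) -> List[dict]:
    """Parse a BIG-bench-style JSON task and keep its multiple-choice examples."""
    task = json.loads(task_json)
    # Assumption: multiple-choice items store their answer options in "target_scores".
    return [ex for ex in task["examples"] if "target_scores" in ex]


def multiple_choice_accuracy(
    examples: List[dict], predict: Callable[[str, List[str]], str]
) -> float:
    """Fraction of examples where the chosen option ties for the highest target score."""
    if not examples:
        return 0.0
    correct = 0
    for ex in examples:
        scores: Dict[str, float] = ex["target_scores"]
        options = list(scores)  # json.loads preserves the option order of the file
        best = max(scores.values())
        choice = predict(ex["input"], options)
        # An answer outside the listed options is scored as wrong.
        if scores.get(choice, float("-inf")) == best:
            correct += 1
    return correct / len(examples)


# Hypothetical toy task in the assumed format; this is NOT real benchmark data.
TOY_TASK = json.dumps({
    "name": "toy_cause_and_effect",
    "examples": [
        {"input": "The ground is wet because",
         "target_scores": {"the sun set": 0, "it rained": 1}},
        {"input": "The glass broke because",
         "target_scores": {"it was dropped": 1, "it was cold": 0}},
    ],
})


def first_option(prompt: str, options: List[str]) -> str:
    """Hypothetical position-biased baseline: always pick the first listed option."""
    return options[0]


if __name__ == "__main__":
    examples = load_examples(TOY_TASK)
    print(multiple_choice_accuracy(examples, first_option))  # 0.5
```

Replacing `first_option` with a function that prompts a model is the only change needed to evaluate an LLM under these assumptions; the position-biased baseline is included because answer-order effects are a known confound when scoring multiple-choice causal benchmarks.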
