Detecting Causal Language Use in Science Findings

This is a collaborative project with the School of Information at Syracuse University. The goal of the project is to develop an automated NLP model that can identify causal language use in science findings, and to study whether causal language use differs across countries and languages. We have developed an annotated corpus and trained a BioBERT model, which reaches a macro-F1 score of 0.88, to categorize conclusion sentences into four classes: direct causal, conditional causal, correlational, and no relationship. We then applied this model to the observational studies in PubMed and observed different levels of causal language use by authors from different countries and language backgrounds. This result challenges the notion of a shared consensus on causal language use in the global science community.

How to cite

Yu, B., Li, Y. and Wang, J. (2019). Detecting Causal Language Use in Science Findings. EMNLP 2019, pages 4656–4666, Hong Kong, China, November 3–7, 2019.

@inproceedings{yu2019EMNLPCausalLanguage,
  title={Detecting Causal Language Use in Science Findings},
  author={Yu, Bei and Li, Yingya and Wang, Jun},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={4656-4666},
  year={2019},
  url={https://www.aclweb.org/anthology/D19-1473.pdf}
}

About the data

Label = 0 : No relationship (1356 cases)
Label = 1 : Direct causal (494 cases)
Label = 2 : Conditional causal (213 cases)
Label = 3 : Correlational (998 cases)
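
The label counts above are imbalanced, which is why the Usage section points to a bert-sklearn fork with class-weight support. As a rough illustration, the sketch below computes each class's share of the corpus and a set of inverse-frequency weights. The weighting formula (total / (number of classes * class count)) is one common heuristic and is an assumption here; the fork may use a different scheme. The names `LABEL_COUNTS` and `balanced_class_weights` are hypothetical and do not come from this repo.

```python
# Class counts copied from the label list above.
LABEL_COUNTS = {
    0: ("No relationship", 1356),
    1: ("Direct causal", 494),
    2: ("Conditional causal", 213),
    3: ("Correlational", 998),
}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    This is a common heuristic, assumed here for illustration only;
    it is not confirmed to be the scheme used by the bert-sklearn fork.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


counts = {label: n for label, (_, n) in LABEL_COUNTS.items()}
total = sum(counts.values())  # 3061 annotated sentences in all
weights = balanced_class_weights(counts)

for label, (name, n) in LABEL_COUNTS.items():
    share = n / total
    print(f"{label}  {name:<20} n={n:<5} share={share:.3f}  weight={weights[label]:.3f}")
```

Under this heuristic the rarest class (conditional causal) receives the largest weight and the most frequent class (no relationship) the smallest.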

Usage

STEP 1: Install bert-sklearn from https://github.com/junwang4/bert-sklearn-with-class-weight (a fork with class-weight support, for handling imbalanced classes)

STEP 2: Clone this repo and run

git clone https://github.com/junwang4/causal-language-use-in-science
cd causal-language-use-in-science
python3 main.py

Performance

On Ubuntu 16.04 with a GTX 1080 Ti GPU (your numbers may differ slightly but should be similar)

5-fold cross-validation; 5 epochs; BioBERT

       Acc     F1   F1_0   F1_1   F1_2   F1_3      P    P_0  ...    P_3      R    R_0    R_1    R_2    R_3  size  weight
0    0.876  0.859  0.885  0.853  0.804  0.893  0.848  0.911  ...  0.868  0.876  0.860  0.818  0.907  0.920   614   0.201
1    0.905  0.896  0.905  0.892  0.864  0.921  0.887  0.930  ...  0.908  0.905  0.882  0.919  0.884  0.935   613   0.200
2    0.902  0.889  0.902  0.892  0.843  0.919  0.897  0.901  ...  0.907  0.882  0.904  0.879  0.814  0.930   613   0.200
3    0.935  0.914  0.945  0.907  0.854  0.952  0.915  0.945  ...  0.964  0.914  0.945  0.939  0.833  0.940   611   0.200
4    0.880  0.855  0.907  0.821  0.804  0.889  0.849  0.893  ...  0.915  0.866  0.923  0.796  0.881  0.864   610   0.199
avg  0.900  0.883  0.909  0.873  0.834  0.915  0.879  0.916  ...  0.912  0.889  0.903  0.870  0.864  0.918   612   0.200
time used: 916s
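
To help read the table, the sketch below checks three relationships that the printed numbers appear to satisfy: the F1 column is the unweighted (macro) mean of F1_0 to F1_3, the weight column is the fold's share of all 3061 sentences, and the fold sizes match a split in which each class is divided as evenly as possible across the 5 folds. The last point is an inference from the numbers, not something confirmed from main.py, and the helper names are hypothetical. Values are copied from the table, so comparisons allow for rounding to three decimals.

```python
# Per-fold values copied from the table above (rounded to 3 decimals there).
FOLDS = [
    {"f1": 0.859, "per_class_f1": [0.885, 0.853, 0.804, 0.893], "size": 614, "weight": 0.201},
    {"f1": 0.896, "per_class_f1": [0.905, 0.892, 0.864, 0.921], "size": 613, "weight": 0.200},
    {"f1": 0.889, "per_class_f1": [0.902, 0.892, 0.843, 0.919], "size": 613, "weight": 0.200},
    {"f1": 0.914, "per_class_f1": [0.945, 0.907, 0.854, 0.952], "size": 611, "weight": 0.200},
    {"f1": 0.855, "per_class_f1": [0.907, 0.821, 0.804, 0.889], "size": 610, "weight": 0.199},
]

# Class counts from "About the data" (labels 0, 1, 2, 3).
CLASS_COUNTS = [1356, 494, 213, 998]

ROUNDING_TOLERANCE = 0.001  # table entries are rounded to 3 decimals


def macro_f1(per_class_f1):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)


def stratified_fold_sizes(class_counts, k):
    """Fold sizes when each class is split as evenly as possible over k folds.

    Assumption: any remainder of a class goes to the lowest-numbered folds.
    """
    sizes = [0] * k
    for n in class_counts:
        base, extra = divmod(n, k)
        for i in range(k):
            sizes[i] += base + (1 if i < extra else 0)
    return sizes


total = sum(fold["size"] for fold in FOLDS)
expected_sizes = stratified_fold_sizes(CLASS_COUNTS, len(FOLDS))

for i, fold in enumerate(FOLDS):
    f1_ok = abs(macro_f1(fold["per_class_f1"]) - fold["f1"]) <= ROUNDING_TOLERANCE
    weight_ok = abs(fold["size"] / total - fold["weight"]) <= ROUNDING_TOLERANCE
    size_ok = fold["size"] == expected_sizes[i]
    print(f"fold {i}: macro-F1 matches={f1_ok}, weight matches={weight_ok}, size matches={size_ok}")
```

Because the folds are nearly equal in size, averaging the per-fold scores with or without the weight column gives almost the same result.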
