CREST: A Causal Relation Schema for Text 🚀

CREST is a machine-readable format/schema created to help researchers working on causal/counterfactual relation extraction and commonsense causal reasoning use the scattered data resources on these topics more easily. CREST-formatted data are stored as a pandas DataFrame.

How to convert dataset(s) to CREST:

Clone this repository and go to the /CREST directory.
Install the requirements: pip install -r requirements.txt
Download spaCy's model: python -m spacy download en_core_web_sm
Run /crest/convert.py:
- python convert.py -i: prints the full list of currently supported datasets
- python convert.py [DATASET_ID_0] ... [DATASET_ID_n] [OUTPUT_FILE_NAME]
  - DATASET_ID_*: id of a dataset.
  - OUTPUT_FILE_NAME: name of the output file, which should be in .xlsx format.
Examples:
- Converting datasets 1 and 2: python convert.py 1 2 output.xlsx
- Converting dataset 5: python convert.py 5 output.xlsx
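
For scripted pipelines, the same invocation can be assembled from Python. The sketch below only builds the command line in the form shown above. The helper name build_convert_command is hypothetical and not part of the repository, and the command is assumed to be run from the directory that contains convert.py.

```python
import sys


def build_convert_command(dataset_ids, output_file):
    """Assemble `python convert.py ID_0 ... ID_n OUTPUT_FILE` as an argv list.

    Hypothetical helper for illustration; not part of the CREST repository.
    """
    if not dataset_ids:
        raise ValueError("at least one dataset id is required")
    if not output_file.endswith(".xlsx"):
        raise ValueError("the output file should be in .xlsx format")
    # sys.executable is the interpreter running this script.
    return [sys.executable, "convert.py", *map(str, dataset_ids), output_file]


# Same arguments as: python convert.py 1 2 output.xlsx
command = build_convert_command([1, 2], "output.xlsx")
print(command[1:])
```

The resulting list can be passed to subprocess.run(command, check=True) once the working directory is set to the folder holding convert.py.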

The Excel file of all converted datasets: crest_v2.xlsx

PDTB is not available in this file due to copyright. However, you can still use CREST to convert this dataset if you have access to PDTB.

`CREST` format

Each relation in a CREST-formatted DataFrame has the following fields/values:

original_id: the id of a relation in the original dataset, if such an id exists.
span1: a list of strings of the first span/argument of the relation.
span2: a list of strings of the second span/argument of the relation.
signal: a list of strings of signals/markers of the relation in context, if any.
context: a text string of the context in which the relation appears.
idx: indices of span1, span2, and signal tokens/spans in context, stored in 3 lines, each line in the form of span_type start_1:end_1 ... start_n:end_n. For example, if span1 has two tokens/spans with start:end indices 2:5 and 10:13, respectively, span1's line in idx is span1 2:5 10:13. Indices are sorted by the start index of each token/span.
label: label of the relation. 0: non-causal, 1: causal.
direction: direction between span1 and span2. 0: span1 => span2, 1: span1 <= span2, -1: not-specified.
source: id of the source dataset (ids are listed in a table below).
split: 0: train, 1: dev, 2: test. This is the split to which the relation belongs in the original dataset. If there is no split specified for a relation in the original dataset, we assign the relation to the train split by default.
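
The integer codes for label, direction, and split can be mapped back to readable names with plain dictionaries. This is a minimal sketch: the names LABELS, DIRECTIONS, SPLITS, and describe_relation are hypothetical and not part of the repository, and the example row is made up for illustration.

```python
# Code tables copied from the field descriptions above.
LABELS = {0: "non-causal", 1: "causal"}
DIRECTIONS = {0: "span1 => span2", 1: "span1 <= span2", -1: "not-specified"}
SPLITS = {0: "train", 1: "dev", 2: "test"}


def describe_relation(row):
    """Return a readable view of the coded fields of one CREST relation.

    `row` can be any mapping with "label", "direction" and "split" keys,
    for example a dict or a pandas Series taken from a CREST DataFrame.
    """
    return {
        "label": LABELS[int(row["label"])],
        "direction": DIRECTIONS[int(row["direction"])],
        "split": SPLITS[int(row["split"])],
    }


# Made-up row: a causal relation, span1 => span2, from a test split.
example = {"label": 1, "direction": 0, "split": 2}
print(describe_relation(example))
```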

Note: The reason we save a list of strings instead of a single string for span1, span2, and signal is that these text spans may contain multiple non-consecutive sub-spans in context.
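
As an illustration of how the idx field relates to the span lists, the sketch below parses an idx value and slices the sub-spans out of context. It rests on assumptions not stated above: that indices are character offsets into context, that end is exclusive, and that the three lines are separated by newline characters. The function names and the example sentence are hypothetical.

```python
def parse_idx(idx):
    """Parse a CREST idx value into {span_type: [(start, end), ...]}."""
    spans = {}
    for line in idx.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        span_type, pairs = parts[0], parts[1:]
        # A line with no pairs (e.g. no signal) yields an empty list.
        spans[span_type] = [
            tuple(int(n) for n in pair.split(":")) for pair in pairs
        ]
    return spans


def extract(context, offsets):
    """Slice each (start, end) pair out of context, one string per sub-span."""
    return [context[start:end] for start, end in offsets]


# Made-up example; offsets are assumed to be character-based, end-exclusive.
context = "The storm caused severe flooding."
idx = "span1 4:9\nspan2 24:32\nsignal 10:16"

spans = parse_idx(idx)
span1 = extract(context, spans["span1"])
span2 = extract(context, spans["span2"])
signal = extract(context, spans["signal"])
print(span1, span2, signal)
```

Because each span type maps to a list of pairs, a span with several non-consecutive sub-spans, such as span1 2:5 10:13, is handled in the same way and yields one string per sub-span.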

Available Data Resources

List of data resources already converted to CREST format:

Id	Data resource	Samples	Causal	Non-causal	Document	Year
1	SemEval 2007 Task 4	1,529	114	1,415	Paper	2007
2	SemEval 2010 Task 8	10,717	1,331	9,386	Paper	2010
3	EventCausality	583	583	-	Paper	2011
4	Causal-TimeBank	318	318	-	Paper	2014
5	EventStoryLine v1.5	2,608	2,608	-	Paper	2016
6	CaTeRS	2,502	308	2,194	Paper	2016
7	BECauSE v2.1 ^⚠️	729	554	175	Paper	2017
8	Choice of Plausible Alternatives (COPA)	2,000	1,000	1,000	Paper	2011
9	The Penn Discourse Treebank (PDTB) 3.0 ^⚠️	7,991	7,991	-	Manual	2019
10	BioCause Corpus	844	844	-	Paper	2013
11	Temporal and Causal Reasoning (TCR)	172	172	-	Paper	2018
12	Benchmark Corpus for Adverse Drug Effects	5,671	5,671	-	Paper	2012
13	SemEval 2020 Task 5	5,501	5,501	-	Paper	2020

⚠️ The data is either not publicly available or only partially available. You can still use CREST for conversion if you have full access to this dataset.

Counterfactual Relations

`CREST` conversion

We provide helper methods to convert CREST-formatted data to popular formats and annotation schemes, mainly those used across relation extraction/classification tasks. We have already developed CREST converter methods for the following formats:

brat: we provide helper methods for two-way conversion between CREST data frames and brat (see example here). brat is a popular web-based annotation tool that has been used for a variety of relation extraction NLP tasks. We use brat for two main reasons: 1) better visualization of causal and non-causal relations and their arguments, and 2) modifying annotations if needed and adding new annotations to the provided context. The following is a sample of a CREST-formatted relation converted to brat (the example is taken from the CaTeRS dataset):
TACRED: TACRED is a large-scale relation extraction dataset. We convert samples from CREST to TACRED since TACRED-formatted data can easily be used as input to many transformer-based language models (e.g. for relation classification/extraction). You can find an example of converting CREST-formatted data to TACRED in this notebook.
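
To give a feel for what a brat export involves, the sketch below writes one relation as brat standoff annotation lines. It is not the repository's helper: the function name to_brat_ann, the type names Span1 and Span2, the relation label, and the choice to swap arguments when direction is 1 are all assumptions made for illustration, as is the treatment of indices as end-exclusive character offsets.

```python
def to_brat_ann(context, span1, span2, direction=0, label="Causal"):
    """Build brat standoff lines for one relation.

    span1/span2 are lists of (start, end) character offsets into context.
    Type names and the relation label are placeholders, not CREST's own.
    """
    def text_bound(tid, type_name, offsets):
        # brat separates the fragments of a discontinuous span with ';'.
        offs = ";".join(f"{start} {end}" for start, end in offsets)
        text = " ".join(context[start:end] for start, end in offsets)
        return f"{tid}\t{type_name} {offs}\t{text}"

    # Assumption: direction 1 (span1 <= span2) makes span2 the first argument.
    arg1, arg2 = ("T2", "T1") if direction == 1 else ("T1", "T2")
    return [
        text_bound("T1", "Span1", span1),
        text_bound("T2", "Span2", span2),
        f"R1\t{label} Arg1:{arg1} Arg2:{arg2}",
    ]


# Made-up example sentence and offsets.
context = "The storm caused severe flooding."
lines = to_brat_ann(context, span1=[(4, 9)], span2=[(24, 32)])
print("\n".join(lines))
```

In brat, these lines would go into a .ann file stored next to a .txt file that holds the context.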

How you can contribute:

Are there any related datasets you don’t see in the list? Let us know or feel free to submit a Pull Request (PR). We actively check PRs and appreciate them ☺️
Is there a well-known or widely used machine-readable format you think should be added? We can add the helper methods for conversion, and we also appreciate PRs.

How to cite CREST?

For now, please cite our arXiv paper:

@article{hosseini2021predicting,
  title={Predicting Directionality in Causal Relations in Text},
  author={Hosseini, Pedram and Broniatowski, David A and Diab, Mona},
  journal={arXiv preprint arXiv:2103.13606},
  year={2021}
}
