
gucorpling/DisCoDisCo


Introduction

DisCoDisCo (District of Columbia Discourse Cognoscente) is GU Corpling's submission to the DISRPT 2021 shared task, where it placed first among all submitted systems across all five subtasks. Consult the official repo for more information on the shared task.

See our paper here: https://aclanthology.org/2021.disrpt-1.6/

Citation:

@inproceedings{gessler-etal-2021-discodisco,
    title = "{D}is{C}o{D}is{C}o at the {DISRPT}2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection",
    author = "Gessler, Luke  and
      Behzad, Shabnam  and
      Liu, Yang Janet  and
      Peng, Siyao  and
      Zhu, Yilun  and
      Zeldes, Amir",
    booktitle = "Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.disrpt-1.6",
    pages = "51--62"
}

Usage

Setup

  1. Create a new environment:
       conda create --name disrpt python=3.8
       conda activate disrpt
  2. Install dependencies:
       pip install -r requirements.txt
  3. Ensure the 2021 shared task data is at data/2021/.
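
Before launching an experiment, it can help to confirm that the data is where the scripts will look for it. The sketch below assumes each corpus sits in its own subdirectory under data/2021/ (for example data/2021/zho.rst.sctb). That layout and the helper name check_corpus_dir are assumptions made for illustration; neither is specified by this README.

```shell
# Hypothetical helper (not part of this repository): succeed only if
# <data_root>/<corpus> exists as a directory.
# ASSUMPTION: one subdirectory per corpus under the data root.
check_corpus_dir() {
    data_root="$1"
    corpus="$2"
    if [ -d "${data_root}/${corpus}" ]; then
        echo "found ${data_root}/${corpus}"
        return 0
    else
        echo "missing ${data_root}/${corpus}" >&2
        return 1
    fi
}
```

One possible use is to guard a run, e.g. `check_corpus_dir data/2021 zho.rst.sctb && bash seg_scripts/single_corpus_train_and_test_ft.sh zho.rst.sctb`, so that a misplaced data directory fails fast rather than partway through training.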

Experiments

Gold segmentation:

bash seg_scripts/single_corpus_train_and_test_ft.sh zho.rst.sctb

Silver segmentation:

bash seg_scripts/silver_single_corpus_train_and_test_ft.sh zho.rst.sctb

Relation classification:

bash rel_scripts/run_single_flair_clone.sh zho.rst.sctb
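
To run all three experiments for one corpus, the commands above can be generated from a single corpus name. The following is a minimal sketch: experiment_commands is a hypothetical helper, not a script shipped with this repository, and it only prints the commands in the order listed above without asserting any dependency between them.

```shell
# Hypothetical helper: print the three experiment commands from this README
# for a single corpus. Nothing is executed; pipe the output to `bash` to run.
experiment_commands() {
    corpus="$1"
    echo "bash seg_scripts/single_corpus_train_and_test_ft.sh ${corpus}"
    echo "bash seg_scripts/silver_single_corpus_train_and_test_ft.sh ${corpus}"
    echo "bash rel_scripts/run_single_flair_clone.sh ${corpus}"
}
```

Calling `experiment_commands zho.rst.sctb` lets you inspect the commands first; `experiment_commands zho.rst.sctb | bash` would then run them one after another from the repository root.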

Troubleshooting

Batch size may be modified, if necessary, using the batch_size parameter in:
