
Switch-and-Explain

An XLM-RoBERTa-based classifier for predicting code-switch points in English-Spanish human-human dialogue.

Link to the full paper and citation here.

This code incorporates the locally interpretable layer (LIL) from the SelfExplain framework (https://arxiv.org/abs/2103.12279).

Make sure to unzip the data folder and place it in an enclosing folder named bangor_data; otherwise, update the filepaths for '$DATA_FOLDER' or --dataset_basedir in the shell scripts below.
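For example, a minimal setup could look like the following (data.zip is an assumed name for the downloaded archive; only the bangor_data folder name comes from this README):

# Extract the released data archive into an enclosing bangor_data/ folder
# ("data.zip" is an assumed filename for the archive you downloaded):
unzip data.zip -d bangor_data/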

Preprocessing

Data for preprocessing is available in the bangor_data/ folder.

To generate the phrase masks, use one of the scripts below.

For baseline masking:

sh scripts/run_preprocessing_bangor_idx.sh

For masking speaker descriptions + dialogues (this will take a while):

sh scripts/run_preprocessing_bangor_desc.sh
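If the data lives somewhere other than bangor_data/, the wrapper scripts can be edited as described above. A hypothetical sketch of the relevant lines inside such a script (only $DATA_FOLDER and --dataset_basedir appear in this README; the Python entry point name is illustrative):

# Hypothetical sketch of scripts/run_preprocessing_bangor_idx.sh internals:
DATA_FOLDER=bangor_data                                # edit if your data lives elsewhere
python preprocess.py --dataset_basedir "$DATA_FOLDER"  # "preprocess.py" is an assumed name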

Training baseline models on 10 seeds for context size 1 and extracting LIL interpretations:

sh scripts/baseline_train_ctx1.sh

Training speaker list models on 10 seeds for context size 1 and extracting LIL interpretations:

sh scripts/list_models_ctx1.sh
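Both training scripts sweep 10 random seeds. As a rough sketch, such a sweep could be written as the loop below (train.py and the --seed flag are illustrative names; only --dataset_basedir appears in this README):

# Hypothetical seed sweep, approximating what the scripts above encapsulate:
for seed in $(seq 0 9); do
  python train.py --dataset_basedir bangor_data --seed "$seed"  # flag names assumed
done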

Update - new files

Preprocessed data is available for download here.

Data for control experiments is available for download here.

Extract these under the bangor_data folder.

Model outputs for the unbalanced validation and test sets are available under model_outputs. Folders are organized by split (test or validation), model type (speaker-prompted or baseline), and context size (in number of previous utterances).
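Based on that description, the directory layout should look roughly like this (ctx1 and similar folder names are illustrative):

model_outputs/
├── test/
│   ├── baseline/
│   │   ├── ctx1/
│   │   └── ...
│   └── speaker-prompted/
│       └── ...
└── validation/
    └── ...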
