An XLM-RoBERTa-based classifier for predicting code-switch points in English–Spanish human–human dialogue.
Link to the full paper and citation here.
This code incorporates the LIL layer from the SelfExplain framework (https://arxiv.org/abs/2103.12279).
Make sure to unzip the data folder and place it inside an enclosing folder named `bangor_data/`; otherwise, update the filepaths for `$DATA_FOLDER` or `--dataset_basedir` in the shell scripts below.
Data for preprocessing is available in the `bangor_data/` folder.
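For example, setting up the data might look like this (a minimal sketch; `data.zip` is a placeholder for the actual archive name):

```sh
# Sketch only: "data.zip" stands in for the released archive name.
mkdir -p bangor_data
unzip data.zip -d bangor_data/
```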
To generate the phrase masks, run the preprocessing scripts:

```sh
sh scripts/run_preprocessing_bangor_idx.sh
sh scripts/run_preprocessing_bangor_desc.sh
```

To train models with a context size of one previous utterance, run:

```sh
sh scripts/baseline_train_ctx1.sh
sh scripts/list_models_ctx1.sh
```
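If your data lives elsewhere, one option is to set `$DATA_FOLDER` at invocation time; this assumes the scripts read it from the environment rather than hard-coding it, so if that fails, edit the variable at the top of each script instead:

```sh
# Assumption: the script resolves $DATA_FOLDER from the environment.
# If the path is hard-coded, edit it inside the script instead.
DATA_FOLDER=/path/to/bangor_data sh scripts/baseline_train_ctx1.sh
```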
Preprocessed data is available for download here.
Data for control experiments is available for download here.
Extract these under the `bangor_data/` folder.
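For example (the archive names below are placeholders for whatever the downloads are actually called):

```sh
# Placeholder archive names; substitute the real download filenames.
unzip preprocessed_data.zip -d bangor_data/
unzip control_experiments.zip -d bangor_data/
```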
Model outputs for the unbalanced validation and test sets are available under the `model_outputs/` folder. Folders are organized by split (test or validation), model type (speaker-prompted or baseline), and context size (number of previous utterances).
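As an illustration, the layout follows this pattern (the folder names shown are illustrative, not verbatim):

```
model_outputs/
├── test/
│   ├── baseline/
│   │   ├── ctx1/
│   │   └── ctx2/
│   └── speaker_prompted/
│       └── ctx1/
└── validation/
    └── ...
```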