CASA-Dialogue-Act-Classifier

PyTorch implementation of the paper "Dialogue Act Classification with Context-Aware Self-Attention" [1], with a generic dataset class and a PyTorch Lightning trainer. This implementation differs from the paper in the following ways:

This implementation uses contextualized embeddings (e.g. BERT, RoBERTa) that are frozen and therefore not trainable, whereas the paper uses a combination of GloVe and ELMo.
This implementation uses a simple softmax classifier, whereas the paper uses a CRF classifier.
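
The two differences above can be illustrated with a minimal sketch. It is not the repository's model: the class name `FrozenEncoderSoftmaxClassifier` is hypothetical, a small `nn.Embedding` stands in for a pretrained encoder such as BERT so that nothing has to be downloaded, mean pooling is an assumption, and the context-aware self-attention layers from the paper are omitted entirely. It only shows what "frozen encoder plus softmax head" means in PyTorch.

```python
import torch
from torch import nn


class FrozenEncoderSoftmaxClassifier(nn.Module):
    """Hypothetical stand-in: frozen encoder followed by a trainable linear head."""

    def __init__(self, encoder: nn.Module, hidden_size: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        # Freeze the encoder so its weights receive no gradient updates.
        for param in self.encoder.parameters():
            param.requires_grad = False
        # The only trainable part: a linear layer producing one logit per class.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            hidden = self.encoder(token_ids)  # (batch, seq_len, hidden_size)
        pooled = hidden.mean(dim=1)           # assumed pooling: mean over tokens
        return self.classifier(pooled)        # logits, shape (batch, num_classes)


# Toy encoder used purely for illustration (assumption, not BERT).
toy_encoder = nn.Embedding(num_embeddings=100, embedding_dim=16)
model = FrozenEncoderSoftmaxClassifier(toy_encoder, hidden_size=16, num_classes=5)

token_ids = torch.randint(0, 100, (4, 10))  # 4 utterances, 10 token ids each
logits = model(token_ids)
probs = torch.softmax(logits, dim=-1)       # softmax instead of a CRF layer

# Names of the parameters an optimizer would actually update.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Because softmax scores each utterance independently, this head does not model transitions between consecutive dialogue act labels the way a CRF layer would.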

Kaggle Kernel to train it on the Switchboard Dialogue Act corpus

To train this on the Switchboard Dialogue Act dataset:

Navigate to data/ using: cd data/
Unzip the dataset: unzip switchboard.zip
Navigate back to the main directory: cd ..
Install the dependencies in a separate Python environment.
[Optional] Change project_name and run_name in the logger. To disable the wandb logger instead, comment out the logger code (lines 15-20 in main.py), do not pass it to the Lightning trainer (line 32 in main.py), and comment out the logging code in Trainer.py (lines 70 and 95). By default, Lightning logs to TensorBoard.
[Optional] Change the parameters (batch_size, lr, epochs, etc.) in config.py.
Run the training script: python main.py
The model will be trained and the best checkpoint will be saved.

To train this on any dialogue act dataset

Copy your data into data/. Your dataset should have the following structure:
- dataset_name
  - dataset_name_train.csv
  - dataset_name_valid.csv
  - dataset_name_test.csv
[Optional] If you don't have separate validation and test data, copy whichever file you have and rename the copy to the missing split (valid or test), so that the validation and test data are the same.
Update the num_classes parameter in config.py (line 18) to match your dataset.
Then follow the Switchboard instructions above from step 5 (the optional logger configuration) onward.
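
The layout check and the optional copy step above can be sketched in a few lines of standard-library Python. The helper name `prepare_dataset_dir` is hypothetical and not part of this repository. It only assumes the file naming shown above (dataset_name_train.csv, dataset_name_valid.csv, dataset_name_test.csv).

```python
import shutil
from pathlib import Path


def prepare_dataset_dir(data_root, dataset_name):
    """Check data_root/<dataset_name>/ for the three expected CSV splits.

    If exactly one of the valid/test files is missing, the existing one is
    copied under the missing name. Returns a dict mapping split -> Path.
    """
    dataset_dir = Path(data_root) / dataset_name
    paths = {
        split: dataset_dir / f"{dataset_name}_{split}.csv"
        for split in ("train", "valid", "test")
    }

    if not paths["train"].is_file():
        raise FileNotFoundError(f"missing training split: {paths['train']}")

    has_valid = paths["valid"].is_file()
    has_test = paths["test"].is_file()
    if not has_valid and not has_test:
        raise FileNotFoundError("need at least one of the valid/test splits")

    # Reuse the split that exists for the one that is missing.
    if not has_valid:
        shutil.copyfile(paths["test"], paths["valid"])
    elif not has_test:
        shutil.copyfile(paths["valid"], paths["test"])

    return paths
```

For a dataset stored under data/my_dataset/, calling `prepare_dataset_dir("data", "my_dataset")` from the main directory would leave all three CSV files in place before training.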

Note: Feel free to open an issue if you find any problem. You are also welcome to open a PR if you want to add something. Here is a list of components one could add:

Hyperparameter Search
More dialogue act classification models which are not open-sourced.

References

[1]: Raheja, V., & Tetreault, J. (2019). Dialogue Act Classification with Context-Aware Self-Attention. ArXiv, abs/1904.02594.

[2]: Lin, Z., Feng, M., Santos, C.D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A Structured Self-attentive Sentence Embedding. ArXiv, abs/1703.03130.

[3]: Switchboard Dialogue Act corpus: http://compprag.christopherpotts.net/swda.html
