CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Code for the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums (COLING 2018).

Description

In this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach of both content- and context-driven modeling for sarcasm detection in online social media discussions (Reddit).

Requirements

Python (2.7 or 3)
TensorFlow (1.4.0)
FastText pre-trained embeddings
Download and save user_gcca_embeddings.npz at ./CASCADE/users/user_embeddings/

Optional

To train the user embeddings, download the dataset files comments.json [1] and train-balanced.csv and save them inside the folder ./CASCADE/data/.

Preprocessing

User Embeddings: Stylometric features

The file ./CASCADE/data/comments.json contains users and their corresponding comments. A user may have multiple comments, so we concatenate all comments belonging to a user, separated by the <END> tag:

1. cd users
2. python create_per_user_paragraph.py
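
The concatenation step can be pictured with the minimal sketch below. It is not the code in create_per_user_paragraph.py; the record keys "author" and "text" and the function name build_user_paragraphs are assumptions made for illustration, and the actual layout of comments.json may differ.

```python
from collections import defaultdict

END_TAG = " <END> "  # separator placed between a user's comments


def build_user_paragraphs(comments):
    """Group comment texts by author and join them with the <END> tag.

    `comments` is an iterable of dicts with the hypothetical keys
    "author" and "text"; adapt these to the real dataset layout.
    """
    per_user = defaultdict(list)
    for comment in comments:
        per_user[comment["author"]].append(comment["text"].strip())
    # One long paragraph per user, ready for paragraph-vector training.
    return {user: END_TAG.join(texts) for user, texts in per_user.items()}


sample = [
    {"author": "alice", "text": "first"},
    {"author": "bob", "text": "only one"},
    {"author": "alice", "text": "second"},
]
paragraphs = build_user_paragraphs(sample)
print(paragraphs["alice"])
```

Keeping one paragraph per user means each user maps to a single document when the paragraph-vector model is trained.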

The ParagraphVector algorithm is used to generate the stylometric features. First, train the model:

3. python train_stylometric.py

Generate user_stylometric.csv (user stylometric features) using the trained model:

4. python generate_stylometric.py

User Embeddings: Personality features

Pre-train a CNN-based model to detect personality features from text. The code uses two datasets for training. The second dataset [2] can be obtained by requesting it from the original authors.

5. python process_data.py [path/to/FastText_embedding]
6. python train_personality.py

To use the pre-trained model from our experiments, download the model weights (personality_model_weights.zip) and unzip them inside the folder ./CASCADE/users/

Generate user_personality.csv (user personality features) using this model:

7. python generate_user_personality.py

User Embeddings: Multi-view fusion

Merge the user_stylometric.csv and user_personality.csv into a single merged user_view_vectors.csv file:

8. python merge_user_views.py
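
As a rough illustration of the merge, the sketch below joins two per-user feature tables on the user ID. It is not the code in merge_user_views.py; the row layout (user ID followed by feature values) and the helper names read_view and merge_views are assumptions, and the real script may store the two views differently.

```python
import csv
import io


def read_view(handle):
    """Read rows of the assumed form user_id, f1, f2, ... into a dict."""
    return {row[0]: row[1:] for row in csv.reader(handle) if row}


def merge_views(stylometric, personality):
    """Keep users present in both views and concatenate their features."""
    merged = {}
    for user, style_vec in stylometric.items():
        if user in personality:
            merged[user] = style_vec + personality[user]
    return merged


# In-memory stand-ins for user_stylometric.csv and user_personality.csv.
stylometric_csv = io.StringIO("alice,0.1,0.2\nbob,0.3,0.4\n")
personality_csv = io.StringIO("alice,0.9\ncarol,0.5\n")
merged = merge_views(read_view(stylometric_csv), read_view(personality_csv))
print(merged)
```

Users missing from either view are dropped here, since the fusion step needs both views for every user it embeds.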

Multi-view fusion of the user views (stylometric and personality) is performed using GCCA, which reduces to CCA when there are only two views. Generate the fused user embeddings user_gcca_embeddings.npz using the following command:

9. python user_wgcca.py --input ./user_embeddings/user_view_vectors.csv --output ./user_embeddings/user_gcca_embeddings.npz --k 100 --no_of_views 2

This implementation of GCCA has been adapted from https://github.com/abenton/wgcca.
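
For intuition about the two-view case, here is a minimal NumPy sketch of plain CCA. It is not the weighted GCCA in user_wgcca.py: the function name cca, the regularization constant reg, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def cca(view_a, view_b, k, reg=1e-6):
    """Project two views onto their k most correlated directions."""
    a = view_a - view_a.mean(axis=0)
    b = view_b - view_b.mean(axis=0)
    n = a.shape[0]
    # Regularized covariances keep the whitening step numerically stable.
    cov_aa = a.T @ a / (n - 1) + reg * np.eye(a.shape[1])
    cov_bb = b.T @ b / (n - 1) + reg * np.eye(b.shape[1])
    cov_ab = a.T @ b / (n - 1)

    def inv_sqrt(matrix):
        vals, vecs = np.linalg.eigh(matrix)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    white_a, white_b = inv_sqrt(cov_aa), inv_sqrt(cov_bb)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    u, corrs, vt = np.linalg.svd(white_a @ cov_ab @ white_b)
    proj_a = a @ white_a @ u[:, :k]
    proj_b = b @ white_b @ vt.T[:, :k]
    return proj_a, proj_b, corrs[:k]


# Synthetic views that share a 2-dimensional latent signal.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 2))
view_a = shared @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
view_b = shared @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(200, 4))
proj_a, proj_b, corrs = cca(view_a, view_b, k=2)
print(corrs)
```

Because both synthetic views are driven by the same latent signal, the leading canonical correlation comes out close to 1.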

Discourse Embeddings

Similar to the user stylometric features, create the discourse features for each discussion forum (subreddit):

10. cd discourse
11. python create_per_discourse_paragraph.py

The ParagraphVector algorithm is used to generate the discourse features. First, train the model:

12. python train_discourse.py

Generate discourse.csv (discourse features) using the trained model:

13. python generate_discourse.py

Running CASCADE

CASCADE is a hybrid CNN that combines user embeddings and discourse features with textual modeling.

14. cd src
15. python process_data.py [path/to/FastText_embedding]
16. python train_cascade.py
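
The hybrid idea can be sketched in NumPy as follows: convolutional text features are max-pooled over time and then concatenated with the user and discourse vectors. This is a simplified illustration, not the TensorFlow model in train_cascade.py; the function names, filter shapes, and vector sizes (other than k = 100 for the user embedding) are assumptions.

```python
import numpy as np


def text_cnn_features(word_vectors, filters):
    """Convolve filters over the word sequence, apply ReLU, max-pool over time.

    word_vectors: (num_words, dim); filters: (num_filters, width, dim).
    """
    width = filters.shape[1]
    steps = word_vectors.shape[0] - width + 1
    # Stack every window of `width` consecutive words: (steps, width, dim).
    windows = np.stack([word_vectors[i:i + width] for i in range(steps)])
    activations = np.maximum(np.einsum("swd,fwd->sf", windows, filters), 0.0)
    return activations.max(axis=0)  # one value per filter


def hybrid_representation(word_vectors, filters, user_vec, discourse_vec):
    """Concatenate content features with the contextual embeddings."""
    content = text_cnn_features(word_vectors, filters)
    return np.concatenate([content, user_vec, discourse_vec])


rng = np.random.default_rng(0)
words = rng.normal(size=(7, 8))        # 7 words, 8-dim embeddings (toy sizes)
filters = rng.normal(size=(3, 3, 8))   # 3 filters spanning 3 words each
user_vec = rng.normal(size=100)        # fused user embedding (k = 100)
discourse_vec = rng.normal(size=100)   # hypothetical discourse vector size
features = hybrid_representation(words, filters, user_vec, discourse_vec)
print(features.shape)
```

In a full model, a classifier layer would sit on top of this concatenated vector to predict whether the comment is sarcastic.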

The CNN codebase has been adapted from https://github.com/dennybritz/cnn-text-classification-tf

Citation

If you use this code in your work, please cite the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums using the following BibTeX entry:

@article{hazarika2018cascade,
  title={CASCADE: Contextual Sarcasm Detection in Online Discussion Forums},
  author={Hazarika, Devamanyu and Poria, Soujanya and Gorantla, Sruthi and Cambria, Erik and Zimmermann, Roger and Mihalcea, Rada},
  journal={arXiv preprint arXiv:1805.06413},
  year={2018}
}

References

[1]. Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. "A large self-annotated corpus for sarcasm." arXiv preprint arXiv:1704.05579 (2017).

[2]. Celli, Fabio, et al. "Workshop on computational personality recognition (shared task)." Proceedings of the Workshop on Computational Personality Recognition. 2013.
