Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations

This repository contains the dataset and code for our paper Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations

We release the MaSaC dataset which is a multimodal Code-Mixed corpus(Hindi-English) for detection of sarcasm as well as humour. The dataset is compiled from popular hindi TV series Sarabhai vs Sarabhai. The dataset consists of code-mixed utterences which are accompanied by the speaker level information for those utterances. The utterances are annotated with respective sarcasm and humour labels. For every utterrence, we also include audio features with respect to the given utterrence, which are included to provide additional understanding for the utterance. The audio features are extracted on utterance level from the timestamp information anotated while collecting the dataset.

Data Format

KEY	VALUE
Speaker	Speaker for the utterance
text	Utterance text to classify
Audio_features	Extracted mfcc features from audio file corresponding to the current utterance
Sarcasm	Binary label for sarcasm tag
Humour	Binary label for humour tag

Raw Audios

The raw audio files for each utterance is provided in the Google drive folder. For each utterence, the name structure for the audio files can be found in the "Audio_Filename.txt" file. In addition, the episode wise dialogue information is also provided in the file "Episodewise_dialoguelabels.txt" to identify the set of dialogues to which a particular utterance belongs. The audio features are also included seperately in a pickle file which are pre loaded by the model.

Video Feature Extraction

Refer to this Github repo.

Running the Code

Download the pre-trained Fasttext multilingual word embeddings anywhere in the directory.

Extract the embedding matrix.p file to get the pickled version.

Check for the configuration from the config.py file as per convinience.

The audio features are already extracted and used as a pickle file.

For running the configuration directly:

python Sarhum.py

Citation

@ARTICLE {9442359,
author = {M. Bedi and S. Kumar and M. Akhtar and T. Chakraborty},
journal = {IEEE Transactions on Affective Computing},
title = {Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations},
issn = {1949-3045},
keywords = {task analysis;visualization;semantics;context modeling;acoustics;switches;planning},
doi = {10.1109/TAFFC.2021.3083522},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {may}
}

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
Audio FeatureExtraction		Audio FeatureExtraction
Data		Data
Models		Models
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations

Data Format

Raw Audios

Video Feature Extraction

Running the Code

Citation

About

Releases

Packages

Contributors 2

Languages

LCS2-IIITD/MSH-COMICS

Folders and files

Latest commit

History

Repository files navigation

Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations

Data Format

Raw Audios

Video Feature Extraction

Running the Code

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages