We provide the first definition of Jujeop comments (주접 댓글) in South Korea and a human-annotated 8.6K Jujeop corpus. Jujeop is a type of pun and a unique way for fans to express, in Korean, their love for the K-pop stars they follow. One of its distinctive characteristics is the use of exaggerated expressions to compliment K-pop stars, which contain or lead to humor. Based on this characteristic, Jujeop can be separated into four distinct types, each with its own lexical collocations:
(1) Fragmenting words to create a twist, (2) Homophones and Homographs, (3) Repetition, and (4) Nonsense.
The dataset for each result condition can be downloaded by running the file in the dataset directory. All Jujeop files are .txt files that include title, text, label, and type. The non-Jujeop data is provided as not_jujeop.txt, which also includes title, text, label, and type. Additionally, we provide a video channel list file, channel.txt, that includes the YouTube video queries. We also uploaded the YouTube crawler, crawler.py, which we implemented to collect video titles, comments, user names, and numbers of likes.
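As a rough illustration of working with the four fields described above, the following sketch reads records into dictionaries and counts labels. It assumes one record per line with tab-separated fields in the order title, text, label, type; the delimiter, the field order, the possible header row, and the function name `read_jujeop_rows` are all assumptions for illustration and are not confirmed by the repository, so adjust them to match the actual files.

```python
import csv
import io
from collections import Counter

# Assumed field order; the real files may differ.
FIELDS = ("title", "text", "label", "type")


def read_jujeop_rows(handle, delimiter="\t"):
    """Yield one dict per record from an open text handle.

    The tab delimiter is an assumption, not something the README states.
    """
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and rows that do not have exactly four fields.
        if len(row) != len(FIELDS):
            continue
        # Skip a header row, in case the file has one.
        if tuple(cell.strip().lower() for cell in row) == FIELDS:
            continue
        yield dict(zip(FIELDS, row))


# Small in-memory sample standing in for a real file such as not_jujeop.txt.
sample = io.StringIO(
    "title\ttext\tlabel\ttype\n"
    "video A\tcomment one\t1\t2\n"
    "\n"
    "video B\tcomment two\t0\t0\n"
)
rows = list(read_jujeop_rows(sample))
label_counts = Counter(row["label"] for row in rows)
```

To read a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")` instead of the in-memory sample; UTF-8 is likewise an assumption here.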
We employed deep neural network models to classify Jujeop in order to verify the quality of the annotated corpus. In the models/binary folder, we uploaded classification models that label comments as Jujeop or non-Jujeop. Additionally, we conducted multi-class classification over the Jujeop types; these models are uploaded in the models/multiclass folder. Feedback for improving model performance is always welcome! 😊
- Python >= 3.6
- TensorFlow >= 1.7
- Keras >= 2.1.5
- PyTorch >= 1.7.0
- transformers >= 3.5.0
- sentencepiece==0.1.85
- MXNet >= 1.4.0
- onnxruntime >= 0.3.0
- git+https://git@github.com/SKTBrain/KoBERT.git@master
- gluonnlp
- tqdm
@inproceedings{oh2021jujeop,
title={Jujeop: Korean Puns for K-pop Stars on Social Media},
author={Oh, Soyoung and Kim, Jisu and Lee, Seungpeel and Park, Eunil},
booktitle={Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media},
pages={170--177},
year={2021}
}