merry555/Jujeop

We provide the first definition of Jujeop comments (주접 댓글) in South Korea, together with a human-annotated 8.6K Jujeop corpus. Jujeop is a type of pun and a distinctive way for fans to express, in Korean, their love for the K-pop stars they follow. A defining characteristic of Jujeop is its use of exaggerated expressions to compliment K-pop stars, which contain or lead to humor. Based on this characteristic, Jujeop can be separated into four distinct types, each with its own lexical collocations:

(1) Fragmenting words to create a twist, (2) Homophones and Homographs, (3) Repetition, and (4) Nonsense.

Jujeop Data Description

The dataset for each result condition can be downloaded by running the file in the dataset directory. All Jujeop files are .txt files that include title, text, label, and type. Non-Jujeop data is provided as not_jujeop.txt, which includes the same fields: title, text, label, and type. Additionally, we provide a video channel list, channel.txt, which contains the YouTube video queries. We also uploaded the YouTube crawler, crawler.py, which we implemented to collect video titles, comments, user names, and numbers of likes.
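As a minimal sketch of how such a file might be read, the snippet below parses lines into records with the four fields named above. The tab delimiter, the column order, and the helper names `read_records` and `load_file` are assumptions for illustration; the actual layout of the .txt files should be checked before use.

```python
import csv
from typing import Dict, Iterable, List

# Column order is assumed from the description above, not confirmed by the files.
FIELDS = ["title", "text", "label", "type"]


def read_records(lines: Iterable[str], delimiter: str = "\t") -> List[Dict[str, str]]:
    """Parse delimited lines into dicts keyed by FIELDS.

    Rows whose column count does not match FIELDS are skipped.
    The tab delimiter is an assumption; pass another one if needed.
    """
    records = []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # blank or malformed line under the assumed layout
        records.append(dict(zip(FIELDS, row)))
    return records


def load_file(path: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Read one data file (e.g. not_jujeop.txt) as UTF-8 and parse it."""
    with open(path, encoding="utf-8", newline="") as handle:
        return read_records(handle, delimiter)
```

If the files carry a header row, it would appear as the first record and should be dropped by the caller.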

Fragmenting words to create a twist

Comments of this type intentionally fragment a specific word and extract or concentrate on a single character from it to disguise the word’s full meaning (e.g., ‘pretty’ to ‘t’), in order to create a twist in the meaning of the sentence.

Homophones and Homographs

Users can employ the lexical features of homophones and homographs to make a Jujeop comment. After writing a first sentence that uses the original meanings of the words, they employ other meanings of those words in the second sentence to compliment the K-pop stars while allowing other users to enjoy the fun.

Repetition

Comments of this type repeat the same phrase. The repetition emphasizes the complimentary meaning directed at the K-pop stars.

Nonsense

Comments of this type place the K-pop stars within fictional scenarios. The majority of such comments flatter the stars with exaggerated, almost nonsensical, over-the-top expressions.

Experiment

We employed deep neural network models to classify Jujeop in order to verify the quality of the annotated corpus.

The models/binary folder contains the classification models that label comments as Jujeop or non-Jujeop. We also conducted multi-class classification over the Jujeop types; those models are in the models/multiclass folder. Feedback on improving model performance is always welcome! 😊
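The two tasks differ only in how the targets are built. The sketch below illustrates this with the four type names from this README. The integer ids, the use of `None` for non-Jujeop comments, and the helper name `make_targets` are hypothetical and may not match the encoding used in the data files or the uploaded models.

```python
from typing import List, Optional, Tuple

# The four Jujeop types named in this README; the integer ids are hypothetical.
JUJEOP_TYPES = [
    "Fragmenting words to create a twist",
    "Homophones and Homographs",
    "Repetition",
    "Nonsense",
]
TYPE_TO_ID = {name: index for index, name in enumerate(JUJEOP_TYPES)}


def make_targets(type_names: List[Optional[str]]) -> Tuple[List[int], List[int]]:
    """Build targets for both tasks from per-comment type names.

    None marks a non-Jujeop comment (an assumption for this sketch).
    Binary targets cover every comment: 1 = Jujeop, 0 = non-Jujeop.
    Multi-class targets cover only the Jujeop comments.
    """
    binary = [0 if name is None else 1 for name in type_names]
    multiclass = [TYPE_TO_ID[name] for name in type_names if name is not None]
    return binary, multiclass
```

For example, `make_targets(["Repetition", None, "Nonsense"])` yields binary targets for all three comments and multi-class targets for the two Jujeop comments only.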

Requirements

  • Python >= 3.6
  • TensorFlow >= 1.7
  • Keras >= 2.1.5

If you want to implement KoBERT

Reference

@inproceedings{oh2021jujeop,
  title={Jujeop: Korean Puns for K-pop Stars on Social Media},
  author={Oh, Soyoung and Kim, Jisu and Lee, Seungpeel and Park, Eunil},
  booktitle={Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media},
  pages={170--177},
  year={2021}
}

         
