
A Cross-Modal Classification Dataset on Social Network (NLPCC2020)


nghuyong/CMCD


A Cross-Modal Classification Dataset on Social Network

Intro


The Cross-Modal Classification Dataset (CMCD) is a large-scale dataset constructed from Weibo. It consists of 85,860 tweets across 18 general categories, all of which have been manually labelled and adversarially filtered. Tweets in the dataset span three modalities (text, image, and video): 64.4% of tweets contain images and 16.2% contain videos. We hope this dataset will promote research on cross-modal classification on social networks.
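To give a rough sense of scale, the percentages above can be converted into approximate tweet counts. The snippet below is only a back-of-the-envelope sketch: the helper name `approx_count` is hypothetical, and because the published percentages are rounded, the results are estimates rather than official dataset statistics.

```python
# Figures quoted in the Intro paragraph.
TOTAL_TWEETS = 85_860
PCT_WITH_IMAGES = 64.4
PCT_WITH_VIDEOS = 16.2


def approx_count(total: int, percent: float) -> int:
    """Estimate how many items a rounded percentage of `total` corresponds to.

    Hypothetical helper for illustration; not part of the CMCD release.
    """
    return round(total * percent / 100)


if __name__ == "__main__":
    # Estimates only: the source percentages are rounded to one decimal place.
    print("tweets with images (approx.):", approx_count(TOTAL_TWEETS, PCT_WITH_IMAGES))
    print("tweets with videos (approx.):", approx_count(TOTAL_TWEETS, PCT_WITH_VIDEOS))
```

Under these assumptions, roughly 55,000 tweets carry images and roughly 14,000 carry videos.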

Download

To respect the privacy of personal information in the original source, we cannot provide a direct download link. To obtain the corpus, please fill in the application form and send it to Yong Hu (huyong@bit.edu.cn).

Citation

@InProceedings{10.1007/978-3-030-60450-9_55,
  author="Hu, Yong
  and Huang, Heyan
  and Chen, Anfan
  and Mao, Xian-Ling",
  editor="Zhu, Xiaodan
  and Zhang, Min
  and Hong, Yu
  and He, Ruifang",
  title="A Cross-Modal Classification Dataset on Social Network",
  booktitle="Natural Language Processing and Chinese Computing",
  year="2020",
  publisher="Springer International Publishing",
  address="Cham",
  pages="697--709",
  isbn="978-3-030-60450-9"
}
