A Cross-Modal Classification Dataset on Social Network

Intro

The Cross-Modal Classification Dataset (CMCD) is a large-scale dataset constructed from Weibo. It consists of 85,860 tweets across 18 general categories, all of which have been manually labelled and adversarially filtered. Tweets in the dataset span three modalities (text, image, and video): 64.4% of tweets contain images and 16.2% contain videos. We hope this dataset will promote research on cross-modal classification on social networks.
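To give a rough sense of scale, the percentages above can be turned into approximate tweet counts. The sketch below is only arithmetic on the figures quoted in this section. The constant and function names are illustrative and not part of the dataset. Because the published percentages are rounded to one decimal place, the true counts may differ from these estimates by a few dozen tweets.

```python
# Figures quoted in the Intro paragraph of this README.
TOTAL_TWEETS = 85_860
IMAGE_SHARE = 0.644  # 64.4% of tweets contain images
VIDEO_SHARE = 0.162  # 16.2% of tweets contain videos


def approx_count(total: int, share: float) -> int:
    """Return the nearest whole number of tweets for a (rounded) share."""
    return round(total * share)


# Estimates only: the shares above are rounded, so these are not exact counts.
image_tweets = approx_count(TOTAL_TWEETS, IMAGE_SHARE)
video_tweets = approx_count(TOTAL_TWEETS, VIDEO_SHARE)

print(f"tweets with images: ~{image_tweets}")
print(f"tweets with videos: ~{video_tweets}")
```

Under these assumptions the estimate is roughly 55,294 tweets with images and 13,909 tweets with videos.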

Download

To protect the privacy of personal information in the original source, we cannot provide a direct download link. To obtain the corpus, please fill in the application form and send it to Yong Hu (huyong@bit.edu.cn).

Citation

@InProceedings{10.1007/978-3-030-60450-9_55,
  author="Hu, Yong
  and Huang, Heyan
  and Chen, Anfan
  and Mao, Xian-Ling",
  editor="Zhu, Xiaodan
  and Zhang, Min
  and Hong, Yu
  and He, Ruifang",
  title="A Cross-Modal Classification Dataset on Social Network",
  booktitle="Natural Language Processing and Chinese Computing",
  year="2020",
  publisher="Springer International Publishing",
  address="Cham",
  pages="697--709",
  isbn="978-3-030-60450-9"
}
