Dataset citeulike-t for CTRSR

This dataset, citeulike-t, was used in the paper 'Collaborative Topic Regression with Social Regularization' [Wang, Chen and Li]. It was collected from CiteULike and Google Scholar. CiteULike allows users to create their own collections of articles. Each article has an abstract, a title, and tags. Other information, such as authors, groups, posting time, and keywords, is not used in the paper. Details of the CiteULike data can be found at http://www.citeulike.org/faq/data.adp.

We collected this dataset independently of the dataset citeulike-a. We manually selected 273 seed tags and collected all articles with at least one of these tags. We also crawled the citations between the articles from Google Scholar. Note that the total number of tags associated with the collected articles is far larger than the number of seed tags (273).

The text information (item content) of citeulike-t is preprocessed following the same procedure as for citeulike-a. After removing stop words, we choose the top 20000 discriminative words according to their tf-idf values as our vocabulary.
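
The exact tf-idf ranking rule is not spelled out here, so the following is only a minimal sketch of one plausible reading: score each word by its highest tf-idf value over all articles and keep the top-scoring words. The function name `select_vocabulary`, the tiny `STOP_WORDS` set, and the whitespace tokenizer are assumptions made for illustration, not the procedure actually used to build the dataset.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the list actually used for the dataset is not given.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}


def select_vocabulary(docs, vocab_size):
    """Return the vocab_size words with the highest maximum tf-idf score.

    Assumption: a word's score is its best tf-idf value over all documents,
    with tf = count / document length and idf = log(N / document frequency).
    """
    # Lowercase, split on whitespace, and drop stop words.
    tokenized = [
        [word for word in doc.lower().split() if word not in STOP_WORDS]
        for doc in docs
    ]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Best tf-idf score seen for each word.
    best_score = {}
    for tokens in tokenized:
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            score = (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            if score > best_score.get(word, float("-inf")):
                best_score[word] = score

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best_score, key=lambda word: (-best_score[word], word))
    return ranked[:vocab_size]


docs = [
    "the topic model of topic regression",
    "the social network of users",
    "the topic of tags",
]
vocabulary = select_vocabulary(docs, 3)
```

For the dataset itself, `vocab_size` would be 20000 and `docs` would be the concatenated titles and abstracts.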

Some statistics are listed as follows:

Entity	Total Number
users	7947
items	25975
tags	52946
citations	32565
user-item pairs	134860
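
As a quick sanity check on these figures, the short sketch below derives the density of the user-item matrix and the average collection sizes from the counts in the table. The variable names are illustrative only.

```python
# Counts taken from the statistics table above.
n_users = 7947
n_items = 25975
n_pairs = 134860

# Fraction of the user-item matrix that is observed.
density = n_pairs / (n_users * n_items)

# Average number of articles per user and of users per article.
avg_items_per_user = n_pairs / n_users
avg_users_per_item = n_pairs / n_items

print(f"density: {density:.4%}")
print(f"items per user: {avg_items_per_user:.2f}")
print(f"users per item: {avg_users_per_item:.2f}")
```

The matrix is very sparse: fewer than 0.07% of all possible user-item pairs are observed, with roughly 17 articles per user on average.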

DATA FILES

citations.dat	citations between articles
tag-item.dat	articles corresponding to tags; each line lists the articles relating to the same tag (note that this differs from the other dataset, citeulike-a, and that this is the version prior to preprocessing, so it has more tags than were used in the paper)
mult.dat	bag of words for each article
rawtext.dat	raw data
tags.dat	tags, sorted by tag ID
users.dat	rating matrix (user-item matrix)
vocabulary.dat	the words corresponding to the file mult.dat
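
The line formats of these files are not documented here. The sketch below assumes the layout commonly used by collaborative topic regression datasets: each line of users.dat is `n id_1 ... id_n` (one user per line), and each line of mult.dat is `n word_id:count ...` (one article per line). Both layouts, and the helper names `parse_id_list`, `parse_bow`, and `load`, are assumptions; check them against the actual files before relying on this.

```python
def parse_id_list(line):
    """Parse an assumed 'n id_1 ... id_n' line into a list of integer IDs."""
    fields = line.split()
    if not fields:
        return []
    count = int(fields[0])
    ids = [int(field) for field in fields[1:]]
    if count != len(ids):
        raise ValueError(f"expected {count} IDs, found {len(ids)}")
    return ids


def parse_bow(line):
    """Parse an assumed 'n word_id:count ...' line into {word_id: count}."""
    fields = line.split()
    if not fields:
        return {}
    count = int(fields[0])
    bow = {}
    for field in fields[1:]:
        word_id, freq = field.split(":")
        bow[int(word_id)] = int(freq)
    if count != len(bow):
        raise ValueError(f"expected {count} entries, found {len(bow)}")
    return bow


def load(path, parse_line):
    """Apply parse_line to every line of the file at path."""
    with open(path, encoding="utf-8") as handle:
        return [parse_line(line) for line in handle]
```

Under these assumptions, `load("users.dat", parse_id_list)` would return one list of article IDs per user, and `load("mult.dat", parse_bow)` one word-count dictionary per article. Since tag-item.dat is noted as differing from citeulike-a, inspect it before reusing either parser on it.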

Reference:

Collaborative Topic Regression with Social Regularization for Tag Recommendation

@inproceedings{DBLP:conf/ijcai/WangCL13,
  author    = {Hao Wang and
               Binyi Chen and
               Wu-Jun Li},
  title     = {Collaborative Topic Regression with Social Regularization
               for Tag Recommendation},
  booktitle = {IJCAI},
  year      = {2013}
}
