gennsev/tclass

tclass

Text classifier tool.

Data

Train and test data come from https://www.cs.umb.edu/~smimarog/textmining/datasets/

Text Datasets

EN_5BBC_labels is built from the public BBC news articles dataset: D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006. Source: http://mlg.ucd.ie/datasets/bbc.html. A cleaned-up version is available at https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv
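
As a rough illustration of how a labelled CSV such as the cleaned-up BBC file might be read, here is a minimal Python sketch. It assumes the file has a header row with columns named `category` and `text`; those column names, the helper names `load_labeled_csv` and `label_counts`, and the sample rows are assumptions made for this example and are not taken from this repository.

```python
import csv
import io
from collections import Counter


def load_labeled_csv(stream, label_col="category", text_col="text"):
    """Read (label, text) pairs from a CSV stream that has a header row.

    The default column names are an assumption about the file layout.
    """
    reader = csv.DictReader(stream)
    return [(row[label_col], row[text_col]) for row in reader]


def label_counts(pairs):
    """Count how many documents carry each label."""
    return Counter(label for label, _ in pairs)


# Tiny inline sample standing in for the downloaded file (made-up rows).
sample = io.StringIO(
    "category,text\n"
    "tech,new phone released this week\n"
    "sport,home team wins the final\n"
    "tech,chip maker reports earnings\n"
)
pairs = load_labeled_csv(sample)
counts = label_counts(pairs)
```

To try this on a downloaded copy, the in-memory sample could be replaced with `open("bbc-text.csv", newline="", encoding="utf-8")`, where the local file name is again only an example.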

EN_5H_labels contains around 200k news headlines from 2012 to 2018, obtained from HuffPost (https://www.huffpost.com/).

EN_4_labels(CTMH) is a collection of more than 1 million news articles, gathered from more than 2000 news sources by ComeToMyHead over more than a year of activity. Data from http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.
