tclass

Text classifier tool.
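This README does not say which model tclass uses. For reference, here is a minimal multinomial Naive Bayes baseline with add-one smoothing, the kind of simple classifier that embedding-based models are often compared against. It is an illustrative sketch, not the repository's implementation. The class and function names (`NaiveBayesClassifier`, `tokenize`) are hypothetical, and the whitespace tokenizer is deliberately naive.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenization (hypothetical helper for this sketch).
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)        # documents per label, for the prior
        self.word_counts = defaultdict(Counter)    # token counts per label
        self.vocab: set[str] = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text: str) -> str:
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, doc_count in self.class_counts.items():
            # Log prior: log P(label).
            score = math.log(doc_count / self.total_docs)
            counts = self.word_counts[label]
            total = sum(counts.values())
            for token in tokenize(text):
                # Smoothed log likelihood log P(token | label); +1 keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayesClassifier().fit(
        ["stocks rally as markets rise", "team wins the final match",
         "shares fall on weak earnings", "striker scores twice in win"],
        ["business", "sport", "business", "sport"],
    )
    print(clf.predict("markets and shares rise"))  # business
```

Working in log space avoids floating-point underflow when many small per-token probabilities are multiplied together.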

Data

Training and test data come from: https://www.cs.umb.edu/~smimarog/textmining/datasets/
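The files on that page appear to store one document per line, with the class label and the document text separated by a tab. That layout is an assumption, so check a file before relying on it. Under that assumption, a minimal loader that produces parallel lists of texts and labels could look like this. The function name `read_labeled_lines` and the file path in the comment are hypothetical.

```python
import io
from typing import TextIO


def read_labeled_lines(stream: TextIO) -> tuple[list[str], list[str]]:
    """Parse 'label<TAB>text' lines into parallel lists (texts, labels).

    Assumes the label comes before the first tab; lines without a tab
    (blank or malformed) are skipped.
    """
    texts, labels = [], []
    for line in stream:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        label, text = line.split("\t", 1)  # split only on the first tab
        labels.append(label)
        texts.append(text)
    return texts, labels


# In-memory sample; for a real file use
# open("path/to/train.txt", encoding="utf-8")  (hypothetical path).
sample = io.StringIO("earn\tnet profit rose sharply\nacq\tcompany agrees to buy rival\n")
texts, labels = read_labeled_lines(sample)
print(labels)  # ['earn', 'acq']
```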

Text Datasets

EN_5BBC_labels is built from a public dataset of BBC news articles: D. Greene and P. Cunningham, "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006. Dataset page: http://mlg.ucd.ie/datasets/bbc.html. A cleaned-up version is available at https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv
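A sketch for loading the cleaned-up CSV and checking the label distribution follows. It assumes the file has a header row with `category` and `text` columns. That is an assumption, so confirm it against the downloaded file. The function name `load_bbc_csv` is hypothetical, and the in-memory rows are made-up placeholders, not real articles.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def load_bbc_csv(stream: TextIO) -> tuple[list[str], list[str]]:
    # Assumes a header row containing "category" and "text" columns.
    texts, labels = [], []
    for row in csv.DictReader(stream):
        labels.append(row["category"])
        texts.append(row["text"])
    return texts, labels


# Placeholder rows; for the real file use
# open("bbc-text.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "category,text\n"
    "tech,example tech article\n"
    "business,example business article\n"
    "sport,example sport article\n"
    "tech,another tech article\n"
)
texts, labels = load_bbc_csv(sample)
print(dict(Counter(labels)))  # {'tech': 2, 'business': 1, 'sport': 1}
```

Checking class counts before training is a quick way to spot imbalance that could skew accuracy figures.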

EN_5H_labels contains around 200k news headlines from 2012 to 2018, obtained from HuffPost (https://www.huffpost.com/).

EN_4_labels (CTMH) is a collection of more than 1 million news articles, gathered from more than 2000 news sources by ComeToMyHead over more than one year of activity. Data from http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

References

Developed from exercises in the Udemy NLP course by Lazy Programmer.

Course URLs:

https://deeplearningcourses.com/c/natural-language-processing-with-deep-learning-in-python

https://udemy.com/natural-language-processing-with-deep-learning-in-python
