In this project, Naive Bayes and Logistic Regression models are used to develop a text classification system for Turkish news articles.

sdakansu/Naive-Bayes-Text-Classification


Naive Bayes and Logistic Regression Text Classification

Input

The data is split into a train dataset and a test dataset. Each row has the following format: id,text,label

The dataset is obtained from the SuDer Turkish News Collections.

Preprocessing

  • Lowercase conversion
  • Category label --> integer encoding
  • Tokenization
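
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the whitespace tokenizer, and the helper names `turkish_lower`, `tokenize`, and `preprocess` are all assumptions made for the example. The Turkish-aware lowercasing is included because Python's default `str.lower()` does not map "I" to "ı".

```python
import csv
import io

# Hypothetical sample rows in the id,text,label format; not taken from the real dataset.
SAMPLE_CSV = """id,text,label
1,Galatasaray Maçı Kazandı,spor
2,Borsa Güne Yükselişle Başladı,ekonomi
3,Fenerbahçe Yeni Transferini Açıkladı,spor
"""

def turkish_lower(text):
    # Turkish expects "I" -> "ı" and "İ" -> "i"; handle both before calling lower().
    return text.replace("I", "ı").replace("İ", "i").lower()

def tokenize(text):
    # Simple whitespace tokenizer (an assumption; the project may use another one).
    return text.split()

def preprocess(csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Map each category name to an integer, in sorted order for reproducibility.
    label_to_id = {name: i for i, name in enumerate(sorted({r["label"] for r in rows}))}
    tokens = [tokenize(turkish_lower(r["text"])) for r in rows]
    labels = [label_to_id[r["label"]] for r in rows]
    return tokens, labels, label_to_id

tokens, labels, label_to_id = preprocess(SAMPLE_CSV)
```

Sorting the category names before numbering them keeps the label-to-integer mapping stable between the train and test files, provided both contain the same categories.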

TfidfVectorizer

Term Frequency - Inverse Document Frequency (TF-IDF) weights each term by how often it occurs in a document and how rare it is across all documents. TfidfVectorizer converts each document into a numerical vector with one dimension per vocabulary term, so the collection becomes a vector space in which documents can be compared and classified. For more information, click here. The package is also accessible here.
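
To make the weighting concrete, here is a small pure-Python sketch of TF-IDF. It assumes the smoothed IDF formula and L2 normalization that scikit-learn documents as the TfidfVectorizer defaults. The toy documents and the helper name `tfidf_vector` are hypothetical, and the project itself presumably calls the library rather than computing this by hand.

```python
import math
from collections import Counter

# Hypothetical documents, already lowercased and tokenized.
docs = [
    ["takım", "maçı", "kazandı"],
    ["borsa", "güne", "yükselişle", "başladı"],
    ["takım", "yeni", "transferini", "açıkladı"],
]

vocab = sorted({term for doc in docs for term in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
df = Counter(term for doc in docs for term in set(doc))

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

def tfidf_vector(doc):
    counts = Counter(doc)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard against an all-zero row
    return [v / norm for v in raw]

# One row per document, one column per vocabulary term.
matrix = [tfidf_vector(doc) for doc in docs]
```

A term such as "takım", which appears in two of the three documents, receives a lower IDF than a term that appears in only one, so rarer terms contribute more to a document's vector.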

GridSearchCV

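GridSearchCV exhaustively evaluates every combination of the given hyperparameter values using cross-validation and keeps the best-scoring one. It is used to tune both the Naive Bayes and the Logistic Regression models. For more information, you can click here.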
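
A hedged sketch of how such a search might be set up with scikit-learn is shown below. The toy corpus, the parameter grids, the choice of `MultinomialNB` as the Naive Bayes variant, the two-fold cross-validation, and the helper name `search` are all assumptions for illustration; the project's actual grids and settings are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (0 = ekonomi, 1 = spor); not the real dataset.
texts = [
    "takım maçı kazandı", "takım gol attı",
    "maç berabere bitti", "takım şampiyon oldu",
    "borsa güne yükselişle başladı", "dolar kuru yükseldi",
    "borsa düştü", "faiz kararı açıklandı",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def search(model, param_grid):
    # Vectorize and classify in one pipeline so TF-IDF is refit on each training fold.
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", model)])
    grid = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
    grid.fit(texts, labels)
    return grid

# Parameter names are prefixed with the pipeline step name ("clf__").
nb_search = search(MultinomialNB(), {"clf__alpha": [0.1, 0.5, 1.0]})
lr_search = search(LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]})

print(nb_search.best_params_, nb_search.best_score_)
print(lr_search.best_params_, lr_search.best_score_)
```

Putting the vectorizer inside the pipeline means the vocabulary and IDF weights are learned only from the training portion of each fold, which avoids leaking information from the validation portion into the score.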

Results

Results are measured on the test data. Naive Bayes reaches an accuracy of 0.702 and Logistic Regression reaches an accuracy of 0.824.
