
enesdoruk/Reuters-Text-Classification


Description

This project is the first homework assignment for CS 549 and uses the Reuters-21578 dataset. Its main goal is to explore different preprocessing techniques on the dataset and to build unigram, bigram, and trigram models.
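The README does not spell out which preprocessing steps or n-gram implementation main.py uses. As a rough illustration only, the sketch below lowercases text, keeps alphabetic tokens, drops a small hypothetical stopword list (STOPWORDS), and extracts contiguous n-grams with a hypothetical ngrams() helper, where n=1, 2, and 3 give unigrams, bigrams, and trigrams:

```python
import re
from typing import List, Tuple

# Hypothetical stopword list for illustration; the project's actual
# preprocessing choices are not documented in this README.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "said"}


def preprocess(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams (1 = unigram, 2 = bigram, 3 = trigram)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The company said oil prices rose in March.")
print(tokens)  # ['company', 'oil', 'prices', 'rose', 'march']
print(ngrams(tokens, 2))
```

Keeping preprocessing and n-gram extraction as separate functions makes it easy to compare preprocessing variants while holding the n-gram order fixed.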

The project involves the following key steps:

  • Loading the Reuters dataset
  • Applying a variety of preprocessing methods
  • Building unigram, bigram, and trigram models
  • Evaluating model performance on a test set
  • Assessing results with metrics such as recall, precision, and F1 score
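How the n-gram models are turned into a classifier is not described here. One common option, shown as a hypothetical sketch (the NgramClassifier class and the toy training data are not taken from this repository), is multinomial Naive Bayes over n-gram counts with add-one (Laplace) smoothing, where n=1, 2, or 3 selects unigram, bigram, or trigram features:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """Contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramClassifier:
    """Hypothetical multinomial Naive Bayes over n-gram counts.

    Uses add-one (Laplace) smoothing; n=1, 2, 3 gives unigram,
    bigram, or trigram features.
    """

    def __init__(self, n: int = 1) -> None:
        self.n = n
        self.class_counts: Counter = Counter()
        self.ngram_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NgramClassifier":
        for tokens, label in zip(docs, labels):
            grams = ngrams(tokens, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus the smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for g in ngrams(tokens, self.n):
                score += math.log((counts[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [["oil", "prices", "rose"], ["crude", "oil", "output"],
         ["wheat", "harvest", "grain"], ["grain", "exports", "wheat"]]
labels = ["crude", "crude", "grain", "grain"]
clf = NgramClassifier(n=1).fit(train, labels)
print(clf.predict(["oil", "output"]))  # crude
```

Scores are summed in log space so that multiplying many small probabilities does not underflow on long documents.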

This homework assignment serves as an introduction to text data preprocessing and n-gram modeling techniques, with a focus on practical implementation and evaluation using real-world data.

Installation

This project was developed with Python 3.8. Install the dependencies with:

pip install -r requirements.txt

Run

Before running the code, set the dataset path in main.py. The default is path = 'reuters21578'.
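If the dataset directory follows the standard Reuters-21578 distribution (SGML files named reut2-000.sgm through reut2-021.sgm), loading it from path could look roughly like the sketch below. The load_reuters and parse_sgml helpers are hypothetical and use simple regular expressions rather than a full SGML parser; the LEWISSPLIT attribute on each document records the TRAIN, TEST, or NOT-USED split defined in the collection.

```python
import glob
import os
import re
from typing import List, Optional, Tuple

# Hypothetical loader; assumes the standard Reuters-21578 files
# (reut2-000.sgm ... reut2-021.sgm) sit directly inside `path`.
DOC_RE = re.compile(r"<REUTERS(.*?)>(.*?)</REUTERS>", re.S)
SPLIT_RE = re.compile(r'LEWISSPLIT="([^"]+)"')
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.S)
D_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.S)

Doc = Tuple[Optional[str], List[str], str]


def parse_sgml(text: str) -> List[Doc]:
    """Return (split, topics, body) for every <REUTERS> element in `text`."""
    docs = []
    for attrs, content in DOC_RE.findall(text):
        split = SPLIT_RE.search(attrs)
        topics = TOPICS_RE.search(content)  # avoids <D> tags from <PLACES> etc.
        body = BODY_RE.search(content)
        docs.append((
            split.group(1) if split else None,
            D_RE.findall(topics.group(1)) if topics else [],
            body.group(1).strip() if body else "",
        ))
    return docs


def load_reuters(path: str = "reuters21578") -> List[Doc]:
    docs: List[Doc] = []
    for fname in sorted(glob.glob(os.path.join(path, "reut2-*.sgm"))):
        # latin-1 decodes any byte, so stray non-UTF-8 characters never fail.
        with open(fname, encoding="latin-1") as f:
            docs.extend(parse_sgml(f.read()))
    return docs


train_docs = [d for d in load_reuters() if d[0] == "TRAIN"]
```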

python main.py

main.py prints the evaluation metrics to the terminal.
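The exact metric code in main.py is not shown here. Assuming each document receives a single predicted label, per-class precision, recall, and F1 can be computed from true-positive, false-positive, and false-negative counts as in this hypothetical sketch (prf and macro_f1 are illustrative names, and the labels are toy data):

```python
from collections import Counter
from typing import Dict, List, Tuple

Scores = Dict[str, Tuple[float, float, float]]


def prf(y_true: List[str], y_pred: List[str]) -> Scores:
    """Per-class (precision, recall, F1) for single-label predictions."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    scores: Scores = {}
    for c in set(y_true) | set(y_pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_f1(scores: Scores) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


y_true = ["earn", "earn", "acq", "acq", "crude"]
y_pred = ["earn", "acq", "acq", "acq", "earn"]
scores = prf(y_true, y_pred)
print(scores["earn"])  # (0.5, 0.5, 0.5)
```

Macro-averaging weights every category equally, which matters on Reuters because a few topics such as earn and acq account for a large share of the documents.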

About

Text classification on Reuters data with hand-crafted features
