Text Classifier (Decision Tree)

Overview

This project implements a text classifier in Java using a decision-tree representation. The classifier is trained on labeled text data loaded from CSV files, builds a tree using word-frequency features and numeric thresholds, and can classify new text inputs by traversing the learned tree.

The classifier also supports saving and reloading trained models using a preorder tree format.

This project was completed as part of CSE 123 (Introduction to Computer Programming III) at the University of Washington.

Features

- Decision-tree classifier with recursive traversal
- Word-frequency feature extraction
- Threshold-based branching decisions
- CSV-based training and testing data loading
- Save and reload trained classifier trees
- Accuracy evaluation on test datasets

Key Concepts Used

- Binary trees
- Recursion
- Object-oriented design
- File I/O
- Feature extraction from text
- Data-driven algorithm design

Project Structure

Classifier.java // Core classifier logic and tree operations
TextBlock.java // Represents labeled text and word-frequency data
DataLoader.java // Loads training and testing data
CsvReader.java // Parses CSV files into usable records
Client.java // Runs training, testing, and classification

How It Works

Text Representation

Each document is represented as a TextBlock, which stores:
- A label (classification category)
- A mapping of words to occurrence counts
- The total number of words in the document

A word's probability is its relative frequency within the document: count(word) / totalWords
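
A minimal sketch of this representation follows. It is not the project's actual TextBlock: the class name TextBlockSketch, the method names, and the tokenization rule (lowercase, split on non-letters) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TextBlock; names and tokenization are assumptions.
class TextBlockSketch {
    private final String label;
    private final Map<String, Integer> counts = new HashMap<>();
    private int totalWords = 0;

    TextBlockSketch(String label, String text) {
        this.label = label;
        // Assumed tokenization: lowercase, then split on runs of non-letters.
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
                totalWords++;
            }
        }
    }

    String getLabel() {
        return label;
    }

    // count(word) / totalWords; 0.0 for an unseen word or an empty document.
    double probability(String word) {
        if (totalWords == 0) {
            return 0.0;
        }
        return counts.getOrDefault(word.toLowerCase(), 0) / (double) totalWords;
    }

    public static void main(String[] args) {
        TextBlockSketch block = new TextBlockSketch("spam", "Free money, free prizes");
        System.out.println(block.probability("free"));  // 2 of 4 words -> 0.5
        System.out.println(block.probability("hello")); // unseen word -> 0.0
    }
}
```

Dividing by the document length keeps long and short documents comparable, which is presumably why thresholds are applied to probabilities rather than raw counts.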

Training the Classifier

The classifier is trained on a list of labeled TextBlock objects.
Internal decision nodes store:
- A feature word
- A numeric threshold

During training, a misclassified example causes the tree to grow: a new decision node is introduced that separates examples based on whether the probability of its feature word meets its threshold.
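
The growth step might look roughly like the sketch below. Several details are assumptions rather than facts about this project: each leaf remembers the example that created it, the new feature is the word whose probability differs most between the two examples, the threshold is the midpoint of the two probabilities, and values below the threshold go left. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; every name here is hypothetical, not the project's API.
class TrainSketch {
    // A labeled example reduced to its word probabilities.
    static class Example {
        final String label;
        final Map<String, Double> probs;

        Example(String label, Map<String, Double> probs) {
            this.label = label;
            this.probs = new HashMap<>(probs);
        }

        double prob(String word) {
            return probs.getOrDefault(word, 0.0);
        }
    }

    // Branch nodes use feature/threshold/left/right; leaves use label/source.
    static class Node {
        String feature;
        double threshold;
        Node left, right;
        String label;
        Example source; // assumption: a leaf remembers the example that created it
    }

    Node root;

    void train(Example ex) {
        root = insert(root, ex);
    }

    // Assumes at least one example has been trained.
    String classify(Example ex) {
        Node node = root;
        while (node.left != null) {
            node = ex.prob(node.feature) < node.threshold ? node.left : node.right;
        }
        return node.label;
    }

    private Node insert(Node node, Example ex) {
        if (node == null) {
            return leaf(ex); // first example becomes the root leaf
        }
        if (node.left != null) { // branch: follow the same path classification would
            if (ex.prob(node.feature) < node.threshold) {
                node.left = insert(node.left, ex);
            } else {
                node.right = insert(node.right, ex);
            }
            return node;
        }
        if (node.label.equals(ex.label)) {
            return node; // already classified correctly, tree is unchanged
        }
        // Misclassified: replace the leaf with a branch separating old and new examples.
        Node branch = new Node();
        branch.feature = mostDifferentWord(ex, node.source);
        double pNew = ex.prob(branch.feature);
        double pOld = node.source.prob(branch.feature);
        branch.threshold = (pNew + pOld) / 2; // assumed midpoint rule
        Node fresh = leaf(ex);
        boolean newGoesLeft = pNew < branch.threshold;
        branch.left = newGoesLeft ? fresh : node;
        branch.right = newGoesLeft ? node : fresh;
        return branch;
    }

    private static Node leaf(Example ex) {
        Node n = new Node();
        n.label = ex.label;
        n.source = ex;
        return n;
    }

    // Word with the largest probability gap; ties resolve alphabetically via TreeSet.
    private static String mostDifferentWord(Example a, Example b) {
        Set<String> words = new TreeSet<>(a.probs.keySet());
        words.addAll(b.probs.keySet());
        String best = "";
        double bestDiff = -1.0;
        for (String w : words) {
            double diff = Math.abs(a.prob(w) - b.prob(w));
            if (diff > bestDiff) {
                best = w;
                bestDiff = diff;
            }
        }
        return best;
    }
}
```

Because only the leaf that produced the wrong label is replaced, earlier decisions higher in the tree are left untouched.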

Classification

To classify new text, the classifier:

- Starts at the root of the tree
- Compares the text's probability for the node's feature word against the node's threshold
- Recursively traverses left or right depending on the result
- Returns the label stored at the leaf node it reaches
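
The traversal above can be sketched as a short recursive method. The Node class and the direction convention (below the threshold goes left, otherwise right) are assumptions for illustration; the project's own convention may differ.

```java
import java.util.Map;

// Illustrative sketch; Node and classify are hypothetical names.
class ClassifySketch {
    // A leaf holds only a label; a branch holds a feature, a threshold, and two children.
    static class Node {
        final String feature;
        final double threshold;
        final String label;
        final Node left;
        final Node right;

        Node(String label) {
            this.feature = null;
            this.threshold = 0.0;
            this.label = label;
            this.left = null;
            this.right = null;
        }

        Node(String feature, double threshold, Node left, Node right) {
            this.feature = feature;
            this.threshold = threshold;
            this.label = null;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // probabilities maps each word of the input text to count(word) / totalWords.
    static String classify(Node node, Map<String, Double> probabilities) {
        if (node.isLeaf()) {
            return node.label; // base case: the leaf's label is the answer
        }
        double p = probabilities.getOrDefault(node.feature, 0.0);
        // Assumed convention: below the threshold goes left, otherwise right.
        return p < node.threshold
                ? classify(node.left, probabilities)
                : classify(node.right, probabilities);
    }
}
```

Each call handles exactly one node, so the recursion depth equals the length of the path from the root to the chosen leaf.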

Saving and Loading Models

The classifier can be saved to a file in a preorder traversal format: each node is written before its left and right subtrees.
Each node is written as either:
- A feature/threshold pair (branch node), or
- A label (leaf node)

The saved file can later be reloaded to reconstruct an identical tree.
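
One way such a format could work is sketched below. The exact file layout is an assumption: here a branch is written as a "Feature: " line followed by a "Threshold: " line, and a leaf as a single line holding its label. The class and method names are hypothetical as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Scanner;

// Illustrative sketch; the line format and all names are assumptions.
class PersistSketch {
    static class Node {
        final String feature;
        final double threshold;
        final String label;
        final Node left;
        final Node right;

        Node(String label) {
            this.feature = null;
            this.threshold = 0.0;
            this.label = label;
            this.left = null;
            this.right = null;
        }

        Node(String feature, double threshold, Node left, Node right) {
            this.feature = feature;
            this.threshold = threshold;
            this.label = null;
            this.left = left;
            this.right = right;
        }
    }

    // Preorder: write the node itself, then its left subtree, then its right subtree.
    static void save(Node node, PrintStream out) {
        if (node.left == null) {
            out.println(node.label);
            return;
        }
        out.println("Feature: " + node.feature);
        out.println("Threshold: " + node.threshold);
        save(node.left, out);
        save(node.right, out);
    }

    // Reads nodes in the same order they were written.
    // Caveat: a label that itself starts with "Feature: " would be misread as a branch.
    static Node load(Scanner in) {
        String line = in.nextLine();
        if (!line.startsWith("Feature: ")) {
            return new Node(line);
        }
        String feature = line.substring("Feature: ".length());
        String thresholdLine = in.nextLine();
        double threshold = Double.parseDouble(thresholdLine.substring("Threshold: ".length()));
        Node left = load(in);
        Node right = load(in);
        return new Node(feature, threshold, left, right);
    }

    // Convenience helpers for trying the round trip in memory.
    static String toText(Node root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        save(root, out);
        out.flush();
        return bytes.toString();
    }

    static Node fromText(String text) {
        return load(new Scanner(text));
    }
}
```

Preorder works well here because the reader always knows what comes next: after a branch's two lines, the left subtree follows in full, then the right, so no brackets or indices are needed to rebuild the shape.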

Example Workflow

- Load training data from a CSV file
- Train the classifier on labeled text examples
- Evaluate accuracy on a separate test dataset
- Save the trained classifier to disk
- Reload the classifier and classify new inputs
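
The evaluation step in this workflow boils down to counting correct predictions. A hedged sketch, assuming the classifier can be viewed as a function from text to label and that test data arrives as (text, expected label) pairs; neither assumption reflects the project's real API, and the names are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Illustrative sketch; AccuracySketch and its method are hypothetical names.
class AccuracySketch {
    // Each pair is {text, expectedLabel}. Returns the fraction classified correctly.
    static double accuracy(Function<String, String> classifier, List<String[]> labeled) {
        if (labeled.isEmpty()) {
            return 0.0; // avoid dividing by zero on an empty test set
        }
        int correct = 0;
        for (String[] pair : labeled) {
            if (classifier.apply(pair[0]).equals(pair[1])) {
                correct++;
            }
        }
        return correct / (double) labeled.size();
    }

    public static void main(String[] args) {
        // Toy stand-in for a trained classifier.
        Function<String, String> toy = text -> text.contains("free") ? "spam" : "ham";
        List<String[]> testSet = List.of(
                new String[] {"free money now", "spam"},
                new String[] {"lunch at noon", "ham"},
                new String[] {"win a prize", "spam"});
        System.out.println(accuracy(toy, testSet)); // 2 of 3 correct
    }
}
```

Keeping the test set separate from the training set matters because a tree grown this way can fit its training examples closely, so accuracy on them says little about new inputs.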

What I Learned

- How decision trees encode conditional logic
- How recursion simplifies tree traversal and reconstruction
- How feature selection and thresholds affect classification behavior
- How to design programs that separate data loading, modeling, and execution

Future Improvements

- Support multi-feature splits
- Improve feature selection heuristics
- Visualize the decision tree structure
- Add probabilistic confidence scores to classifications

Notes

This project was completed as coursework. All implementation and design decisions are my own.
