A multiclass sentiment analysis model built entirely with NumPy, no PyTorch, no TensorFlow. This project implements tokenization, word embeddings, a forward pass, backpropagation, and evaluation from the ground up.
Built as a learning exercise to understand what happens under the hood of modern ML frameworks.
Text → Tokenization → Word Embeddings → Mean Pooling → Linear Layer → Softmax → Prediction
- Raw text is cleaned and tokenized into vocabulary indices
- Each word is mapped to a 10-dimensional embedding vector, initialized randomly
- Sentence representation is computed by averaging all word embeddings
- A linear classifier maps the sentence vector to class probabilities via softmax
- Weights, biases, and embeddings are updated each epoch through gradient descent and backpropagation
- Training stops when parameter updates fall below a set tolerance
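The steps above can be sketched in plain NumPy. This is a simplified illustration rather than the project's actual code: the sizes, variable names (E, W, b), and learning rate are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 50-word vocabulary, 10-dim embeddings, 3 classes
vocab_size, embed_dim, num_classes = 50, 10, 3
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # embedding table
W = rng.normal(scale=0.1, size=(embed_dim, num_classes))  # linear weights
b = np.zeros(num_classes)                                 # bias

def forward(token_ids):
    """Embed -> mean pool -> linear -> softmax."""
    sentence = E[token_ids].mean(axis=0)      # (embed_dim,)
    logits = sentence @ W + b
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return sentence, exp / exp.sum()

def train_step(token_ids, label, lr=0.5):
    """One gradient-descent update for softmax cross-entropy loss."""
    global W, b
    sentence, probs = forward(token_ids)
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0                 # dL/dlogits for cross-entropy
    grad_sentence = W @ grad_logits           # backprop through the linear layer
    W -= lr * np.outer(sentence, grad_logits)
    b -= lr * grad_logits
    # Mean pooling spreads the gradient evenly over the tokens;
    # np.subtract.at accumulates correctly for repeated token ids.
    np.subtract.at(E, token_ids, lr * grad_sentence / len(token_ids))
    return -np.log(probs[label])              # cross-entropy loss
```

Calling train_step repeatedly on the same example should drive its loss toward zero; the project's Training() loop additionally applies the tolerance-based stopping criterion described above.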
numpy
pandas
Install with:
pip install numpy pandas
Place your dataset as sentiment_analysis.csv in the project root.
The file should have two columns: text and sentiment.
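A minimal loading-and-tokenization sketch follows. The regex tokenizer and vocabulary scheme here are assumptions for illustration, not necessarily what sentiment_classification.py does, and a tiny inline DataFrame stands in for the CSV:

```python
import re
import pandas as pd

# In the project you would load the real file instead:
# df = pd.read_csv("sentiment_analysis.csv")
df = pd.DataFrame({
    "text": ["I loved this movie", "Terrible, boring plot"],
    "sentiment": ["positive", "negative"],
})

def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z']+", text.lower())

# Assign each distinct word a vocabulary index
vocab = {}
for text in df["text"]:
    for word in tokenize(text):
        vocab.setdefault(word, len(vocab))

# Convert every sentence to a list of vocabulary indices
token_ids = [[vocab[w] for w in tokenize(t)] for t in df["text"]]
```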
# Train
check = Model(mode="train", learning_step=1, tolerance=1e-3)
check.Training()
check.Evaluate(check.weights, check.bias, check.embedding)
# Evaluate on test set
check_test = Model(mode="test")
check_test.Evaluate(check.weights, check.bias, check.embedding)
Run with:
python sentiment_classification.py
| Split | Accuracy |
|---|---|
| Training | 94.4% |
| Test | 22.0% |
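Accuracy here is the fraction of exact class matches between predictions and labels. A sketch (the arrays are made up for illustration):

```python
import numpy as np

preds = np.array([0, 2, 1, 1, 0])   # predicted class ids (argmax of softmax)
labels = np.array([0, 1, 1, 2, 0])  # ground-truth class ids
accuracy = (preds == labels).mean()
print(f"{accuracy:.1%}")            # 60.0%
```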
The gap between training and test accuracy is expected given the simplicity of the architecture: mean pooling discards word order, and a 10-dimensional embedding has limited representational capacity. This is discussed in detail in the accompanying blog post.
Full walkthrough: Building a Sentiment Analysis Model from Scratch
- No sequence modeling — word order is lost through mean pooling
- Small embedding dimension (10) limits representational capacity
- Linear classifier only — no hidden layers
- No out-of-vocabulary handling