NBSVM baseline for topic and sentiment classification.


An sklearn-compatible classifier for benchmarking NLP classification problems. The model used is the NBSVM described in section 2.3 of the paper Baselines and Bigrams: Simple, Good Sentiment and Topic Classification. The authors provide their own (matlab) implementation.


Clone the repo, cd into the project root directory, and install the package into a Python environment with pip install . For example:

python3 -m venv venv
source venv/bin/activate
git clone
cd nbsvm
pip install -r requirements.txt
pip install .


The NBSVM classifier is intended to be used on features transformed by either CountVectorizer or TfidfVectorizer.

Example usage looks like this:

from nbsvm import NBSVM

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

news = fetch_20newsgroups()

vectorizer = CountVectorizer(binary=True)

X = vectorizer.fit_transform(news.data)
y = news.target

model = NBSVM()
model.fit(X, y)


There are a handful of unit tests for the public interface of the NBSVM class. To run these locally, install the dependencies in requirements.txt into a clean environment and simply call pytest in the root directory of the project. The first time the tests run, they will fetch a subset of the 20newsgroups dataset, which may take a few moments. Tests should run in seconds after the initial download. By default, the data will download to ~/scikit_learn_data (in your home directory), which can be changed by modifying the source.
