Course project for COMP551 at McGill University: classifying IMDb movie reviews as positive or negative.
From-scratch Python 3 implementation of a Bernoulli Naïve Bayes classifier and of the text-processing features it uses, mainly lexicons and n-grams.
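As a rough illustration of the idea, here is a minimal sketch of a Bernoulli Naïve Bayes classifier over binary n-gram features. It is not the project's actual code: the function names (`ngram_features`, `train_bernoulli_nb`, `predict`), the whitespace tokenizer, and the Laplace smoothing constant `alpha` are all assumptions made for this example, and lexicon features are left out.

```python
import math
from collections import defaultdict


def ngram_features(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in the text.

    Tokenization here is a simple lowercase whitespace split (an assumption).
    """
    tokens = text.lower().split()
    features = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))
    return features


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed P(feature present | class).

    docs: list of feature sets, one per document
    labels: list of class labels, one per document
    alpha: Laplace smoothing constant (hypothetical default)
    """
    vocab = set().union(*docs)
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in zip(docs, labels):
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1

    n_docs = len(docs)
    log_priors = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    # Each feature is a Bernoulli variable with two outcomes, hence 2 * alpha.
    probs = {
        c: {
            f: (feature_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
            for f in vocab
        }
        for c in class_counts
    }
    return vocab, log_priors, probs


def predict(model, features):
    """Return the class with the highest log posterior.

    Unlike the multinomial variant, absent features also contribute,
    through log(1 - p). Features outside the training vocabulary are ignored.
    """
    vocab, log_priors, probs = model
    best_label, best_score = None, -math.inf
    for c, log_prior in log_priors.items():
        score = log_prior
        for f in vocab:
            p = probs[c][f]
            score += math.log(p) if f in features else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy usage with made-up reviews.
train_texts = [
    "great fun movie",
    "great moving story",
    "boring bad movie",
    "bad dull story",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_bernoulli_nb([ngram_features(t) for t in train_texts], train_labels)
prediction = predict(model, ngram_features("great story"))
```

Working in log space avoids numeric underflow when the vocabulary is large, which matters once bigrams are added to the feature set.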
The course hosted a Kaggle competition. This model, as-is and without external machine learning libraries, got the following score on the test set.
Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA