Classifies whether your message is in English, Singlish, or Indian English
Python
data
multibayes @ 1715744
.gitignore
.gitmodules
README
classify.py

README

A classifier that determines whether a message is written in English, Singlish, or Indian English.

The most recent results:

Training set size: 31792, Test set size: 7949
5933 correct/1913 incorrect of 7846 examples (accuracy: 75.62%)

Misclassified Singlish as English: 753 times
Misclassified Indian English as Singlish: 441 times
Misclassified Indian English as English: 294 times
Misclassified Singlish as Indian English: 193 times
Misclassified English as Singlish: 188 times
Misclassified English as Indian English: 44 times

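The asymmetry makes sense: Singlish was mistaken for English 753 times, but English was mistaken for Singlish only 188 times. It's easy for a Singlish writer to write proper English, but you'll almost never see an American or Brit write a Singlish sentence. The words, they only swing one way.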
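
The arithmetic behind these results can be rechecked with a short script. This is only a sketch: the counts are copied from the results above, while the variable names and the (actual, predicted) dictionary layout are assumptions made for illustration and are not taken from classify.py.

```python
from collections import Counter

# Counts copied from the results above, keyed by (actual, predicted).
# The dictionary layout is an illustrative assumption, not the format
# that classify.py itself uses.
errors = {
    ("Singlish", "English"): 753,
    ("Indian English", "Singlish"): 441,
    ("Indian English", "English"): 294,
    ("Singlish", "Indian English"): 193,
    ("English", "Singlish"): 188,
    ("English", "Indian English"): 44,
}
correct = 5933

total_errors = sum(errors.values())
evaluated = correct + total_errors
accuracy = correct / evaluated

# How often each class was the true label of a misclassified message,
# and how often each class was the wrong guess.
errors_by_actual = Counter()
errors_by_predicted = Counter()
for (actual, predicted), count in errors.items():
    errors_by_actual[actual] += count
    errors_by_predicted[predicted] += count

print(f"{correct} correct/{total_errors} incorrect of {evaluated} examples "
      f"(accuracy: {accuracy:.2%})")
for label, count in errors_by_actual.most_common():
    print(f"{label} messages misclassified: {count}")
for label, count in errors_by_predicted.most_common():
    print(f"Messages wrongly labeled {label}: {count}")
```

English is the most common wrong guess and the least common source of errors, which matches the one-way drift described above.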