Programming Language Classifier
Latest commit 461a5da Aug 3, 2015

Mockingbird


Linguist's Classifier in Go.

Linguist can be used as a Go package by

import ""

and it also has a CLI (command line interface) in cli/

$ cd cli/
$ ./
$ ./mockingbird --help

Command Line Interface Usage

Preparing LIBSVM format dataset

Collect Rosetta Code

  1. Clone the RosettaCodeData repository
git clone
  2. Build this cli executable
cd cli/
  3. Run collectRosetta on the cloned RosettaCodeData to collect files into ../samples
./mockingbird collectRosetta path/to/clones/RosettaCodeData ../samples

Build Bag-of-Words and Convert Samples to Libsvm

Build from scratch

./mockingbird convertLibsvm ../samples ../

This saves libsvm.samples and bow.gob to ../. The bow.gob file holds the parameters used to construct the bag-of-words, so it can be reused in later conversions:

./mockingbird convertLibsvm ../samples ../ --bowPath ../bow.gob
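
To make the conversion step more concrete, here is a minimal sketch of how a bag-of-words vocabulary can turn source text into LIBSVM lines ("label index:count ..."). It is an illustration only, not Mockingbird's actual implementation: the names Vocabulary, BuildVocabulary and ToLibsvmLine are hypothetical, and whitespace tokenization is an assumption. Reusing a previously built vocabulary plays roughly the role that --bowPath plays above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Vocabulary maps each token to a 1-based feature index.
// Hypothetical type for illustration; not taken from Mockingbird.
type Vocabulary map[string]int

// BuildVocabulary assigns indices to tokens in order of first appearance.
// Tokens are split on whitespace, which is an assumption of this sketch.
func BuildVocabulary(docs []string) Vocabulary {
	vocab := Vocabulary{}
	for _, doc := range docs {
		for _, tok := range strings.Fields(doc) {
			if _, seen := vocab[tok]; !seen {
				vocab[tok] = len(vocab) + 1
			}
		}
	}
	return vocab
}

// ToLibsvmLine encodes one document as "label index:count ..." with
// ascending indices. Tokens missing from the vocabulary are skipped.
func ToLibsvmLine(label int, doc string, vocab Vocabulary) string {
	counts := map[int]int{}
	for _, tok := range strings.Fields(doc) {
		if idx, ok := vocab[tok]; ok {
			counts[idx]++
		}
	}
	indices := make([]int, 0, len(counts))
	for idx := range counts {
		indices = append(indices, idx)
	}
	sort.Ints(indices)

	var b strings.Builder
	fmt.Fprintf(&b, "%d", label)
	for _, idx := range indices {
		fmt.Fprintf(&b, " %d:%d", idx, counts[idx])
	}
	return b.String()
}

func main() {
	// Two toy "source files"; the slice position doubles as the class label.
	docs := []string{"func main ( )", "def main ( ) :"}
	vocab := BuildVocabulary(docs)
	for label, doc := range docs {
		fmt.Println(ToLibsvmLine(label, doc, vocab))
	}
}
```

Keeping the vocabulary fixed between runs matters because the feature indices must mean the same thing at training and prediction time.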

Train and Predict


For example, train a logistic regression classifier:

./mockingbird train --sample=./test_fixture/test_samples.libsvm --solver 1

This will save a model file in $PWD/model/lr.model, which can be used in later prediction.

Full usage:

usage: mockingbird train [<flags>]

Train Classifier

  --help            Show help (also see --help-long and --help-man).
                    Path for samples (in libsvm format)
  --output="model"  Path for saving trained model
  --solver=0        0 = NaiveBayes, 1 = LogisticRegression


For example, make predictions with the previously trained logistic regression classifier:

./mockingbird predict --model=./model/lr.model --data=./test_fixture/test_samples.libsvm --solver=1

Full usage:

usage: mockingbird predict --data=DATA [<flags>]

Predict via trained Classifier

  --help       Show help (also see --help-long and --help-man).
               Path for loading saved model
  --data=DATA  Path for testing data (in libsvm format)
  --solver=0   0 = NaiveBayes, 1 = LogisticRegression
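
Both train and predict read samples in libsvm format. As a rough sketch of what each line in such a file contains, the following Go snippet parses a single "label index:value ..." line into a sparse sample. The Sample type and ParseLibsvmLine function are hypothetical names introduced for illustration, and integer class labels are assumed. This is not Mockingbird's own reader.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one labeled sparse feature vector.
// Hypothetical type for illustration; not taken from Mockingbird.
type Sample struct {
	Label    int
	Features map[int]float64 // feature index -> value
}

// ParseLibsvmLine parses a line of the form "label index:value ...".
// Integer class labels are an assumption of this sketch.
func ParseLibsvmLine(line string) (Sample, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return Sample{}, fmt.Errorf("empty line")
	}
	label, err := strconv.Atoi(fields[0])
	if err != nil {
		return Sample{}, fmt.Errorf("bad label %q: %v", fields[0], err)
	}
	s := Sample{Label: label, Features: map[int]float64{}}
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return Sample{}, fmt.Errorf("bad feature %q", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return Sample{}, fmt.Errorf("bad index in %q: %v", f, err)
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return Sample{}, fmt.Errorf("bad value in %q: %v", f, err)
		}
		s.Features[idx] = val
	}
	return s, nil
}

func main() {
	s, err := ParseLibsvmLine("1 2:1 5:3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Label, len(s.Features), s.Features[5])
}
```

Only non-zero features appear on a line, which keeps large bag-of-words vectors compact on disk.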