Linear and quadratic discriminant analysis gem



Harlequin is a gem that allows easy access to the linear and quadratic discriminant analysis functions of R. To use harlequin, initialize a DiscriminantAnalysis object with an array of variable names for analysis, and a classification variable name as a second argument, like so:

analysis = Harlequin::DiscriminantAnalysis.new([:weight, :height], :gender)

Training rows should be formatted as hashes with pairs of the form variable_name => value. For example, we can add some rows to the analysis above with

                           { :weight => 200, :height => 72, :gender => 'male' },
                           { :weight => 205, :height => 71, :gender => 'male' },
                           { :weight => 140, :height => 63, :gender => 'female'},
                           { :weight => 130, :height => 61, :gender => 'female'}

(Note that the training data must contain at least two rows for each classification value, and no variable may be constant within a class.)
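The two constraints in the note above can be checked in plain Ruby before any rows are handed to the analysis. The following is a minimal sketch only: `training_data_problems` is a hypothetical helper, not part of Harlequin, and it assumes rows are hashes in the format shown above.

```ruby
# Hypothetical helper (not part of Harlequin): returns a list of
# human-readable problems found in the training rows, or [] if none.
def training_data_problems(rows, variables, class_variable)
  problems = []
  rows.group_by { |row| row[class_variable] }.each do |klass, members|
    # Each class needs at least two rows.
    if members.size < 2
      problems << "class #{klass.inspect} has only #{members.size} row(s)"
      next
    end
    # No variable may take a single value across a whole class.
    variables.each do |variable|
      values = members.map { |row| row[variable] }
      if values.uniq.size == 1
        problems << "#{variable.inspect} is constant within class #{klass.inspect}"
      end
    end
  end
  problems
end

rows = [
  { :weight => 200, :height => 72, :gender => 'male' },
  { :weight => 205, :height => 71, :gender => 'male' },
  { :weight => 140, :height => 63, :gender => 'female' },
  { :weight => 130, :height => 61, :gender => 'female' }
]

training_data_problems(rows, [:weight, :height], :gender) # => []
```

An empty array means both constraints hold for the rows given.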

Initialize linear or quadratic analysis with #init_lda_analysis or #init_qda_analysis, respectively. Then we can predict the class of new rows, also given as hashes:

analysis.predict(:weight => 180, :height => 68) # => {:class=>"male", :confidence=>0.9999999999666846}

Multiple predictions can be computed at once in the same way as adding multiple training rows.

In order to assess the effectiveness of adding a variable, the DiscriminantAnalysis class includes access to the two-sample t-test for difference in means between classes. This currently works for binary classification only.

analysis.t_test(:weight) # => {:t_statistic=>12.0748, :degrees_of_freedom=>1.471, :p_value=>0.01898}
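The t statistic and degrees of freedom in the example output above are consistent with Welch's unequal-variance two-sample t-test applied to the four training rows. That is an inference from the numbers, not something this README states about the underlying R call. A minimal pure-Ruby sketch of that calculation follows; `mean`, `sample_variance` and `welch_t` are hypothetical names, not Harlequin methods, the order of the two classes (male minus female) is assumed, and the p-value is omitted because it needs the t distribution.

```ruby
# Arithmetic mean as a Float.
def mean(values)
  values.sum.to_f / values.size
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 } / (values.size - 1)
end

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
# for two independent samples a and b.
def welch_t(a, b)
  var_a = sample_variance(a) / a.size # squared standard error of mean(a)
  var_b = sample_variance(b) / b.size # squared standard error of mean(b)
  t = (mean(a) - mean(b)) / Math.sqrt(var_a + var_b)
  df = (var_a + var_b)**2 /
       (var_a**2 / (a.size - 1) + var_b**2 / (b.size - 1))
  { :t_statistic => t, :degrees_of_freedom => df }
end

# Weights from the training rows: male first, then female.
result = welch_t([200, 205], [140, 130])
result[:t_statistic].round(4)        # => 12.0748
result[:degrees_of_freedom].round(3) # => 1.471
```

A large t statistic relative to its degrees of freedom suggests the class means differ on that variable, which is the signal used when deciding whether to keep it.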


A Ruby script using Harlequin requires an R instance, so make sure you have a working copy of R installed on your system. The OS X binaries for R are available from CRAN. See the documentation for RinRuby for more details.

You will also need the additional R packages MASS and alr3. These can be installed from the R command line by first choosing a mirror with chooseCRANmirror() and then installing with install.packages(c("MASS", "alr3")).