Merge pull request #3219 from Saurabh7/ldacb
lda cookbook
Showing 3 changed files with 86 additions and 1 deletion.
Submodule data updated 2 files:

- +12 −0 testsuite/meta/classifier/LDA.dat
- +10 −0 testsuite/meta/clustering/hierarchical.dat
=============================
Linear Discriminant Analysis
=============================

LDA learns a linear classifier by finding a projection matrix that maximally discriminates the provided classes. The learned linear classification rule is optimal under the assumption that both classes are Gaussian distributed with equal covariance. To find the linear separation :math:`{\bf w}`, training maximizes the between-class variance and minimizes the within-class variance. The projection matrix is computed by maximizing the following objective:

.. math::

    J({\bf w})=\frac{{\bf w^T} S_B {\bf w}}{{\bf w^T} S_W {\bf w}}

where :math:`S_B` is the between-class scatter matrix and :math:`S_W` is the within-class scatter matrix.
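For two classes, maximizing :math:`J({\bf w})` has the well-known closed-form solution :math:`{\bf w} \propto S_W^{-1}(\mu_1 - \mu_0)`. The following is an illustrative NumPy sketch of that solution, not the Shogun API:

```python
# Illustrative sketch (plain NumPy, not the Shogun API): the two-class
# LDA direction is w ∝ S_W^{-1} (mu_1 - mu_0).
import numpy as np

def lda_direction(X0, X1):
    """Return the unit-norm LDA direction for two classes.

    X0, X1: (n_samples, n_features) arrays, one per class.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices
    S_W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_W, mu1 - mu0)
    return w / np.linalg.norm(w)

# Two synthetic Gaussian classes with equal covariance
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X1 = rng.normal([2.0, 2.0], 0.5, size=(50, 2))
w = lda_direction(X0, X1)
# Projections of the two classes onto w are well separated
print((X0 @ w).mean(), (X1 @ w).mean())
```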
The above derivation of LDA requires the within-class scatter matrix to be invertible. This condition, however, is violated when there are fewer data points than dimensions. In this case, SVD is used to compute the projection matrix using an orthonormal basis :math:`{\bf Q}`:

.. math::

    {\bf W} := {\bf Q} {\bf{W^\prime}}

See Chapter 16 in :cite:`barber2012bayesian` for a detailed introduction.
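The small-sample trick can be sketched as follows. This is a hedged NumPy illustration (not Shogun's internals): restrict the problem to the span of the data via an orthonormal basis :math:`{\bf Q}` from an SVD, solve LDA in the reduced space, and map back with :math:`{\bf W} = {\bf Q}{\bf W^\prime}`:

```python
# Sketch of the small-sample (N < D) case in plain NumPy, not Shogun's
# internals: solve LDA in the span of the data, then map back via Q.
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 50                          # fewer samples than dimensions
X0 = rng.normal(0.0, 1.0, size=(n, d))
X1 = rng.normal(1.0, 1.0, size=(n, d))

# Orthonormal basis Q of the span of the centered data; restricted to
# this span, the within-class scatter is a small, tractable matrix.
X = np.vstack([X0, X1])
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
Q = Vt.T                               # (d, r) with orthonormal columns

# Two-class LDA in the reduced space
Z0, Z1 = X0 @ Q, X1 @ Q
mu0, mu1 = Z0.mean(axis=0), Z1.mean(axis=0)
S_W = (Z0 - mu0).T @ (Z0 - mu0) + (Z1 - mu1).T @ (Z1 - mu1)
# A tiny ridge keeps the reduced scatter matrix invertible
w_prime = np.linalg.solve(S_W + 1e-8 * np.eye(S_W.shape[0]), mu1 - mu0)

w = Q @ w_prime                        # map back: W = Q W'
```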

-------
Example
-------

We create :sgclass:`CDenseFeatures` (here 64-bit floats, aka RealFeatures) and :sgclass:`CBinaryLabels` from files with training and test data.

.. sgexample:: lda.sg:create_features

We create an instance of the :sgclass:`CLDA` classifier and set features and labels. By default, Shogun automatically chooses the decomposition method based on whether :math:`N \leq D` or :math:`N > D`.

.. sgexample:: lda.sg:create_instance

Then we train and apply it to test data, which here gives :sgclass:`CBinaryLabels`.

.. sgexample:: lda.sg:train_and_apply

We can extract the weights :math:`{\bf w}`.

.. sgexample:: lda.sg:extract_weights

We can evaluate test performance via e.g. :sgclass:`CAccuracyMeasure`.

.. sgexample:: lda.sg:evaluate_accuracy

----------
References
----------

:wiki:`Linear_discriminant_analysis`

.. bibliography:: ../../references.bib
    :filter: docname in docnames
CSVFile f_feats_train("../../data/classifier_binary_2d_linear_features_train.dat")
CSVFile f_feats_test("../../data/classifier_binary_2d_linear_features_test.dat")
CSVFile f_labels_train("../../data/classifier_binary_2d_linear_labels_train.dat")
CSVFile f_labels_test("../../data/classifier_binary_2d_linear_labels_test.dat")

#![create_features]
RealFeatures features_train(f_feats_train)
RealFeatures features_test(f_feats_test)
BinaryLabels labels_train(f_labels_train)
BinaryLabels labels_test(f_labels_test)
#![create_features]

#![create_instance]
LDA lda()
lda.set_features(features_train)
lda.set_labels(labels_train)
#![create_instance]

#![train_and_apply]
lda.train()
BinaryLabels labels_predict = lda.apply_binary(features_test)
#![train_and_apply]

#![extract_weights]
RealVector w = lda.get_w()
#![extract_weights]

#![evaluate_accuracy]
AccuracyMeasure eval()
real accuracy = eval.evaluate(labels_predict, labels_test)
#![evaluate_accuracy]

# additional integration testing variables
RealVector output = labels_predict.get_labels()