add least squares regression cookbook #3064

Merged
merged 1 commit into from Mar 14, 2016

Contributor

@sanuj sanuj commented Mar 12, 2016

I'm not sure whether I should include the shogun-data subproject change in this commit.
@karlnapf Please review.

@@ -1 +1 @@
-Subproject commit c70a2dc726f7dfb6d60813e87fbd5a3fe3372069
+Subproject commit 0d5c0f60c839d5ddbbff61735a713cf17c209e9a
Member

that does the job for data

@karlnapf
Member

The above build automatically generates this

@karlnapf
Member

I will review this soon, gotta run now

Example
-------

Imagine we have files with training and test data. We create `CDenseFeatures` (here 64-bit floats, aka RealFeatures) and :sgclass:`CRegressionLabels` as
Contributor Author

Using :sgclass:`CDenseFeatures` gives a link error. It's not there in knn.sg either.

Contributor Author

http://www.shogun-toolbox.org/CDenseFeatures redirects to http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDenseFeatures.html which does not exist. The correct link for CDenseFeatures is http://www.shogun-toolbox.org/doc/en/latest/singletonshogun_1_1CDenseFeatures.html

Member

yes, that is a bug in doxygen. For now we cannot link against CDenseFeatures. But that is easy to fix via grep later

@sanuj sanuj force-pushed the cookbook branch 3 times, most recently from 7879a8a to 4ca7374 Compare March 12, 2016 19:03
Least Squares Regression
========================

A Linear regression model can be defined as :math:`y_i = \bf{w}.\bf{x_i}` where :math:`y_i` is the predicted value, :math:`\bf{x_i}` is a feature vector and :math:`\bf{w}` is the weight vector. We aim to find the linear function that best explains the data, i.e. minimizes the error or loss function :math:`E(\bf{w})` by finding appropriate :math:`\bf{w}`.
Member

either use \cdot or even better w^\top x_i (also below)

@karlnapf
Member

Good page!
Sorry about all the change requests ... I am still making up my mind about how to do this best. But I think we are almost there.

BTW, also please add a test file in data/testsuite/meta/regression/least_squares_regression.dat in the data repository

One can differentiate :math:`E(\bf{w})` with respect to :math:`\bf{w}` and equate to zero to determine the :math:`\bf{w}` that minimizes :math:`E(\bf{w})`. This leads to a solution of the form:

.. math::

    {\bf w} = \left(\sum_{i=1}^N{\bf x}_i{\bf x}_i^T\right)^{-1}\left(\sum_{i=1}^N y_i{\bf x}_i\right)
Member

btw here you used ^T, which is not consistent with the above. Also in LaTeX, this is ^\top
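
For readers of this thread, the closed-form solution in the hunk above can be checked numerically. The following is only a minimal sketch in NumPy, not Shogun code: the function name `least_squares_weights` is hypothetical, and it assumes samples are stored as rows of `X` (so that the sum of outer products equals `X.T @ X`).

```python
import numpy as np

def least_squares_weights(X, y):
    """Solve w = (sum_i x_i x_i^T)^{-1} (sum_i y_i x_i).

    Assumption: X has shape (N, D) with one sample x_i per row,
    and y has shape (N,).
    """
    # sum_i x_i x_i^T equals X^T X when samples are rows of X
    gram = X.T @ X
    # sum_i y_i x_i equals X^T y
    moment = X.T @ y
    # Solve the linear system instead of forming the inverse explicitly
    return np.linalg.solve(gram, moment)

# Synthetic, noise-free data so the true weights are recovered
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true
w = least_squares_weights(X, y)
```

Solving the system with `np.linalg.solve` is a design choice of this sketch; it is generally preferable to inverting the matrix, but it still requires the summed outer-product matrix to be non-singular.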

@karlnapf
Member

OK one more thing. This class is just a wrapper for linear ridge regression with 0 regulariser.
Could you just write a page for the ridge regression, and then mention that there is a wrapper class called LeastSquaresRegression which just sets the regulariser to 0?

@sanuj sanuj force-pushed the cookbook branch 3 times, most recently from cba74f3 to c6b85d6 Compare March 13, 2016 19:22
@sanuj
Contributor Author

sanuj commented Mar 13, 2016

@karlnapf Updated the commit.


where :math:`t_i` is a true label and :math:`N` is the number of testing samples.

:sgclass:`CLeastSquaresRegression` is a wrapper class for :sgclass:`CLinearRidgeRegression` with regularization coefficient :math:`\tau = 0`.
Member

sorry you got me wrong here. I meant, rename the example to be one for CLinearRidgeRegression (including the math, which is almost the same), and then mention that LeastSquares is a special case -- rather than the other way around. Get what I mean?

@karlnapf
Member

Sweet, this is starting to look good.
I wrote some more comments, mostly minor -- but I really want to get the first few pages right so that people have an easier time writing the next ones.
Most importantly, make this an example for ridge regression. No need to have one for least squares. You can just mention this in the text. Ah, I just realised that the data file then also has to be renamed and re-calculated -- I should not have merged it

@sanuj
Contributor Author

sanuj commented Mar 14, 2016

@karlnapf Sorry, I didn't pay attention to the mse-accuracy thing. Everything has been corrected now.

A linear ridge regression model can be defined as :math:`y_i = \bf{w}^\top\bf{x_i}` where :math:`y_i` is the predicted value, :math:`\bf{x_i}` is a feature vector and :math:`\bf{w}` is the weight vector. We aim to find the linear function that best explains the data, i.e. minimizes the error or loss function :math:`E(\bf{w})` by finding appropriate :math:`\bf{w}`.

.. math::

    E({\bf{w}}) = \sum_{i=1}^N(y_i-{\bf w}^\top {\bf x}_i)^2 + \tau||{\bf w}||^2
Member

This should go, see my earlier comment
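
As an illustration of the point made earlier in the thread (least squares being ridge regression with the regulariser set to 0), here is a hedged NumPy sketch, not Shogun code. It takes the loss from the hunk above, assumes samples are rows of `X`, and uses the standard closed-form minimiser of that loss; the name `ridge_weights` is hypothetical.

```python
import numpy as np

def ridge_weights(X, y, tau):
    """Closed-form minimiser of sum_i (y_i - w^T x_i)^2 + tau * ||w||^2.

    Assumption: X has shape (N, D) with one sample per row.
    """
    d = X.shape[1]
    # (tau * I + sum_i x_i x_i^T) w = sum_i y_i x_i
    return np.linalg.solve(tau * np.eye(d) + X.T @ X, X.T @ y)

# Synthetic data with a little noise
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = X @ np.array([2.0, 0.0, -1.0, 3.0]) + 0.1 * rng.normal(size=40)

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # plain least squares
w_tau0 = ridge_weights(X, y, tau=0.0)         # ridge with no regularisation
w_tau10 = ridge_weights(X, y, tau=10.0)       # ridge with some shrinkage
```

With `tau=0.0` the ridge solution coincides with the plain least-squares fit, while a positive `tau` shrinks the weight vector towards zero.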


.. sgexample:: linear_ridge_regression.sg:create_features

We create an instance of :sgclass:`CLinearRidgeRegression` classifier, passing it training data and labels.
@karlnapf
Member

All comments are minor. The data file does not have to be touched anymore. I'll merge data and wait for the final update on this one.

NICE! :)

@sanuj
Contributor Author

sanuj commented Mar 14, 2016

@karlnapf Updated, but one of the Travis builds got terminated.

@vigsterkr
Member

@sanuj there's something else going on with this, because I've re-run the gcc task and it's still timing out.

Linear Ridge Regression
=======================

A linear ridge regression model can be defined as :math:`y_i = \bf{w}^\top\bf{x_i}` where :math:`y_i` is the predicted value, :math:`\bf{x_i}` is a feature vector and :math:`\bf{w}` is the weight vector. We aim to find the linear function that best explains the data, i.e. minimizes the error or loss function :math:`E(\bf{w})` by finding appropriate :math:`\bf{w}`. One can show the solution can be written as:
Member

E(w) is not defined and should not be used. Just say "minimizes the squared loss plus a :math:`L_2` regularisation term"

(also remove: by finding appropriate w)
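
For context, the "one can show" step in the hunk above can be sketched as follows. This assumes the objective is the squared loss plus an :math:`L_2` term with coefficient :math:`\tau`, as written in the earlier revision of the page, and uses :math:`E` only as shorthand within this derivation.

```latex
\begin{align*}
E(\mathbf{w}) &= \sum_{i=1}^N (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \tau \|\mathbf{w}\|^2 \\
\nabla_{\mathbf{w}} E(\mathbf{w}) &= -2 \sum_{i=1}^N (y_i - \mathbf{w}^\top \mathbf{x}_i)\,\mathbf{x}_i + 2\tau \mathbf{w} = 0 \\
\Rightarrow\quad \left(\tau \mathbf{I} + \sum_{i=1}^N \mathbf{x}_i \mathbf{x}_i^\top\right) \mathbf{w} &= \sum_{i=1}^N y_i \mathbf{x}_i \\
\mathbf{w} &= \left(\tau \mathbf{I} + \sum_{i=1}^N \mathbf{x}_i \mathbf{x}_i^\top\right)^{-1} \sum_{i=1}^N y_i \mathbf{x}_i
\end{align*}
```

Setting :math:`\tau = 0` gives back the least squares solution quoted earlier in the thread.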

@karlnapf
Member

no idea what is wrong with travis?

@karlnapf
Member

the knn example times out ...

@karlnapf
Member

I just tried this locally and it did work. No idea :(
Just compiling locally with the same cmake options

@karlnapf
Member

Very weird. travis passed after a few restarts....I'll merge for now

@vigsterkr any ideas?

karlnapf added a commit that referenced this pull request Mar 14, 2016
add least squares regression cookbook
@karlnapf karlnapf merged commit 32270be into shogun-toolbox:develop Mar 14, 2016
@karlnapf
Member

Something cheesy is going on there. The merged build failed.
I then managed to reproduce this freeze locally, running cmake with only -DENABLE_TESTING=ON
@sanuj can you investigate this a bit? I will also look into it tomorrow

@karlnapf
Member

This has high priority as the develop build is broken

@vigsterkr
Member

i told you not to merge this....

@sanuj
Contributor Author

sanuj commented Mar 15, 2016

All tests pass with -DENABLE_TESTING=ON:

.
.
.
        Start 275: generated_cpp-kernel_ridge_regression
275/275 Test #275: generated_cpp-kernel_ridge_regression ..............................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 275

Total Test time (real) =  15.84 sec

With -DENABLE_TESTING=ON and -DPythonModular=ON a total of 209 tests fail (python modular and generated python) on my local machine. This has happened before and resolved itself, but it's happening again (I don't know why).
In both the cases, sometimes the build gets stuck on Unit-LaRank.
@karlnapf do you think updating the data dir with new commits can fix it?

@karlnapf
Member

@sanuj @vigsterkr the timeout has nothing to do with this cookbook page here -- it happened before and it is the knn that times out. @sanuj It also has nothing to do with modular tests failing.
