Issue #2747 - Using linalg everywhere #4058
Conversation
Thanks for the patch! Please take a look at the Travis CI status, the tests are apparently failing. @karlnapf can you take a look?
@lisitsyn I have been through the travis build. The failure is caused by the style checker. I'll push another commit shortly to fix it. Sorry :(
point[0] = 0;
cov(0,0) = 1.0;

CGaussian *gauss = new CGaussian(mean, cov, FULL);
you could do auto gaussian = some<CGaussian>(mean, cov, FULL)
SGMatrix<float64_t> cov(2,2);

mean[0] = mean[1] = 0;
// Random variables are independent
no need for this comment
cov(1,1) = 1.0;

auto gauss = new CGaussian(mean, cov, FULL);
// Find the log of the distribution function at the mean
no need for comment
// Find the log of the distribution function at the mean
// position.
float64_t log_pdf = gauss->compute_log_PDF(mean);
EXPECT_NEAR(log_pdf, -1.83787706641, 1e-8);
float64 has about 16 significant digits, not just 8
auto mean = gauss->get_mean();
auto cov = gauss->get_cov();
EXPECT_NEAR(mean[0], 0.0, 1e-1);
I would rather compute the mean by hand and check whether it is the same here.
CGaussian *gauss = new CGaussian();
auto train_features = new CDenseFeatures<float64_t>(data);
gauss->train(train_features);
The comments above also hold here.
Great work! Thanks for the patch.
I made a few comments; let me know if you need help addressing them.
@@ -0,0 +1,149 @@
/*
 * Copyright (c) The Shogun Machine Learning Toolbox
 * Written (w) 2014 Parijat Mazumdar
you can put your name here :)
@karlnapf I am getting precision errors when testing with eps set to 1e-16. I checked the DotFeatures unit tests and they are only checked for 1e-8.
@karlnapf I'm also concerned about the usefulness of these tests. The Gaussian mean and covariance matrices are generated from DotFeatures, and so this feels like a more elaborate test for DotFeatures than for Gaussian.
TEST(Gaussian, train_univariate)
{
	float64_t eps = 1e-8;
	sg_rand->set_seed(1);
why do you need a fixed seed when you test for the estimation? It should be ok every time, no?
auto mean = gauss->get_mean();
auto cov = gauss->get_cov();

for (int32_t i = 0; i < sample_size; i++)
for (auto i : range(sample_size))
... minor
SGMatrix<float64_t> data(1, 500);

int64_t sample_size = 500;
float64_t mn = 0.0, cv = 0.0;
I don't like these variable names. What about something meaningful, like mean and covariance?
mu and Sigma?
or "mean_est" or so
SGMatrix<float64_t> data(2, sample_size);

for(int32_t i = 0; i < 2; i++){
@vigsterkr don't we have ways to fill arrays with random gaussian numbers?
linalg::zero(sample_cov);

for (int32_t i = 0; i < sample_size; i++)
	train_features->add_to_dense_vec(1.0, i, sample_mean.vector, 2);
You are right, this should not be in here.
I think there are linalg methods for this.
You are right, the tests need some work.
Actually, you KNOW the true mean and covariance, so why don't you test against those with EXPECT_NEAR and a low epsilon?
If you really want to check the exact values, I would rather compute the mean and cov for small examples by hand (pen and paper!) and then compare against that.
Your examples/tests are too big.
One thing you could test for is buffer overflows/underflows for really large amounts of data (and make sure they work).
But this is already improving; let's iterate a few more times, and this will be good to merge.
This PR is a work in progress that will transition Shogun to using linalg for linear algebra operations. Minimal unit tests will be written for the required classes.
#2747