
Issue #2747 - Using linalg everywhere #4058

Closed
wants to merge 4 commits

Conversation

@ckousik (Contributor) commented on Dec 29, 2017:

This PR is a work in progress that will transition Shogun to using linalg for linear algebra operations. Minimal unit tests will be written for the required classes.
Addresses #2747.

@lisitsyn (Member) commented on Jan 1, 2018:

Thanks for the patch!

Please take a look at the Travis CI status; the tests are apparently failing.

@karlnapf can you take a look?

@ckousik (Contributor, author) commented on Jan 1, 2018:

@lisitsyn I have looked through the Travis build. The failure is caused by the style checker. I'll push another commit shortly to fix it. Sorry :(

point[0] = 0;
cov(0,0) = 1.0;

CGaussian *gauss = new CGaussian(mean, cov, FULL);
Review comment (Member):

you could do auto gaussian = some<CGaussian>(mean, cov, FULL)

SGMatrix<float64_t> cov(2,2);

mean[0] = mean[1] = 0;
// Random variables are independent
Review comment (Member):

no need for this comment

cov(1,1) = 1.0;

auto gauss = new CGaussian(mean, cov, FULL);
// Find the log of the distribution function at the mean
Review comment (Member):

no need for comment

// Find the log of the distribution function at the mean
// position.
float64_t log_pdf = gauss->compute_log_PDF(mean);
EXPECT_NEAR(log_pdf, -1.83787706641, 1e-8);
Review comment (Member):
float64 is 16 digits, not just 8


auto mean = gauss->get_mean();
auto cov = gauss->get_cov();
EXPECT_NEAR(mean[0], 0.0, 1e-1);
Review comment (Member):

I would rather compute the mean by hand and check whether it is the same here.

CGaussian *gauss = new CGaussian();
auto train_features = new CDenseFeatures<float64_t>(data);
gauss->train(train_features);

Review comment (Member):
The comments above also hold here.

@karlnapf left a review:

Great work! Thanks for the patch.
I made a few comments; let me know if you need help addressing them.

@@ -0,0 +1,149 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
* Written (w) 2014 Parijat Mazumdar
Review comment (Member):

you can put your name here :)

@ckousik (Contributor, author) commented on Jan 6, 2018:

@karlnapf I am getting precision errors when testing with eps set to 1e-16. I checked the DotFeatures unit tests and they only check to a tolerance of 1e-8.

@ckousik (Contributor, author) commented on Jan 6, 2018:

@karlnapf I'm also concerned about the usefulness of these tests. The Gaussian mean and covariance matrices are generated from DotFeatures, and so this feels like a more elaborate test for DotFeatures than for Gaussian.

TEST(Gaussian, train_univariate)
{
float64_t eps = 1e-8;
sg_rand->set_seed(1);
Review comment (Member):
Why do you need a fixed seed when you test the estimation? It should be OK every time, no?

auto mean = gauss->get_mean();
auto cov = gauss->get_cov();

for (int32_t i = 0; i < sample_size; i++)
Review comment (Member):
for (auto i : range(sample_size)) ... minor

SGMatrix<float64_t> data(1, 500);

int64_t sample_size = 500;
float64_t mn = 0.0, cv = 0.0;
Review comment (Member):
I don't like these variable names. What about something meaningful, like mean and covariance?

Review comment (Member):
mu and Sigma?

Review comment (Member):
or "mean_est" or so


SGMatrix<float64_t> data(2, sample_size);

for(int32_t i = 0; i < 2; i++){
Review comment (Member):
@vigsterkr don't we have ways to fill arrays with random gaussian numbers?

linalg::zero(sample_cov);

for (int32_t i = 0; i < sample_size; i++)
train_features->add_to_dense_vec(1.0, i, sample_mean.vector, 2);
Review comment (Member):
You are right, this should not be in here. I think there are linalg methods for this.

@karlnapf left a review:
You are right, the tests need some work.

Actually, you KNOW the true mean and covariance, so why don't you test against those if you are using EXPECT_NEAR with a low epsilon?

If you really want to check the exact values, I would rather compute the mean and cov for small examples by hand (pen and paper!) and then compare against that. Your examples/tests are too big.

One thing you could test for is buffer overflows/underflows with really large amounts of data (and make sure those cases work).

@karlnapf (Member) commented:

But this is already improving; let's iterate a few more times and this will be good to merge.
THANKS! :)

@ckousik closed this on Mar 16, 2019.