
Refactored the CLibLinear test codes and reduced them from 1300 lines to 470. #4208

Merged: 9 commits into shogun-toolbox:develop, Apr 4, 2018

Conversation

FaroukY (Contributor) commented Mar 17, 2018:

Regarding issue #4207.

FaroukY (Contributor, Author) commented Mar 17, 2018:

This pull request seems to have duplicated the commits from the previous pull request. The only file that should change here is the last one (LibLinear_unittest.cc); all the other commits are from other pull requests. I'll fix that in the next revision if one is needed.

vigsterkr (Member):

@FaroukY yeah, I think you forked your branch feature_fix_lib_linear_doc not from develop but from the other fix's branch. Just check out develop, pull the latest changes, cherry-pick 505f6ab onto it, and force-push to your remote feature_fix_lib_linear_doc; this way you don't even need to create a new PR ;)
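The branch-repair workflow suggested above can be sketched end-to-end in a throwaway repository. Everything below is illustrative: the commit contents are placeholders, and the real fix commit (505f6ab) is simulated by a local commit.

```shell
set -e
# work in a scratch repo so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com" && git config user.name "dev"

# simulate the up-to-date develop branch
echo base > base.txt && git add base.txt && git commit -qm "base commit"
git branch -m develop

# the fix commit we want to keep (stands in for 505f6ab)
git checkout -qb feature_fix_lib_linear_doc
echo fix > LibLinear_unittest.cc && git add LibLinear_unittest.cc
git commit -qm "refactor liblinear tests"
fix_sha=$(git rev-parse HEAD)

# rebuild the feature branch from develop and cherry-pick only the fix;
# on the real remote this would be followed by `git push --force`
git checkout -q develop
git branch -qD feature_fix_lib_linear_doc
git checkout -qb feature_fix_lib_linear_doc
git cherry-pick "$fix_sha" >/dev/null
git log --oneline
```

The final log shows only the base commit plus the cherry-picked fix, which is exactly the state that lets the existing PR be force-updated instead of opening a new one.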

SGMatrix<float64_t> train_matrix = train_feats->get_feature_matrix();
SGMatrix<float64_t>::transpose_matrix(train_matrix.matrix,
train_matrix.num_rows, train_matrix.num_cols);
CLibLinear* ll = new CLibLinear();
Member:

You can use some in many of these cases, so you don't need to call SG_UNREF :)
For example:
auto ll = some<CLibLinear>();

Contributor Author:

Done

SG_UNREF(train_feats);
CBinaryLabels* pred = NULL;
float64_t liblin_accuracy;
CContingencyTableEvaluation* eval = new CContingencyTableEvaluation();
Member:

same here as above:
auto eval = some<CContingencyTableEvaluation>();

Contributor Author:

Done

ground_truth = new CBinaryLabels(labels);
ll->set_liblinear_solver_type(liblinear_solver_type);
ll->train();
pred = ll->apply_binary(test_feats);
Member:

this could be simply

auto pred = ll->apply_binary(test_feats);

Member:

and then you can remove the declaration of CBinaryLabels* pred = NULL;

Contributor Author:

Done

{
labels[i] = (i < num_samples/2) ? 1.0 : -1.0;
}
CLibLinear* ll = new CLibLinear();
Member:

same as above... this code shares quite a lot of lines with train_with_solver... we could combine them :)

Contributor Author:

I did think about this, but when I did it, it had a lot of if statements and it wasn't very readable (for a test case). Are we sure we wanna do this?

*/

index_t num_samples = 50;
CMath::init_random(5);
Member:

use sg_rand->set_seed plz

Contributor Author:

Done

Contributor Author:

Done


#include <shogun/classifier/svm/LibLinear.h>
#include <shogun/features/DataGenerator.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/evaluation/ContingencyTableEvaluation.h>
#include <shogun/mathematics/Math.h>
#include <gtest/gtest.h>
#include <map>
#include <string>
#include <vector>

using namespace shogun;

#ifdef HAVE_LAPACK
Member:

I think this can go, as it was required for the Gaussian blob generator, but that's not LAPACK-only anymore...

Contributor Author:

I used the string and vector includes in the tests, so I will only delete the <map> include; is that okay?

Member:

lapack can still go imo, no?

karlnapf (Member):

This needs a rebase as I fixed some missing SG_REF recently.
Should be pretty minimal.

FaroukY (Contributor, Author) commented Mar 25, 2018:

@karlnapf Going through the diff, the only difference I see in this file is the SG_REF(pred). Is that safe to assume, or am I missing another SG_REF?

FaroukY added a commit to FaroukY/shogun that referenced this pull request Mar 25, 2018
FaroukY (Contributor, Author) commented Mar 25, 2018:

Okay, so I verified it, rebased, and addressed the problems.

vigsterkr (Member):

@FaroukY OK, in order to merge this you need to remove the unrelated commits from this PR.

FaroukY (Contributor, Author) commented Mar 29, 2018:

@vigsterkr Removed all the unrelated commits. This should be okay to merge now.

SG_UNREF(test_feats);
SG_UNREF(ground_truth);
}
//Helper that tests can call to drastically reduce code
Member:

I would remove the comment and let the code speak for itself :) minor

karlnapf (Member) commented Apr 2, 2018:

The liblinear test segfaults under clang.
Could you pls run it with valgrind and make sure it neither leaks nor does uninitialised memory reads?
https://travis-ci.org/shogun-toolbox/shogun/jobs/359675411#L4561

Pls check those things yourself next time, it is all there....

karlnapf (Member) commented Apr 2, 2018:

The Windows build passed, but only because it doesn't have LAPACK installed and you still have the test guarded, so it is not executed.

FaroukY (Contributor, Author) commented Apr 3, 2018:

@karlnapf @vigsterkr Got rid of the memory leak. It seems like Travis timed out on two jobs, but the remaining ones passed. Any ideas?

karlnapf (Member) commented Apr 3, 2018:

clang passed, so the tests executed there were fine. Good.
The timeout is not your fault.

I think this is OK to be merged soon.
I made a few more comments inline.

}
void generate_data_l1_simple()
{
generate_data_simple("L1_SIMPLE");
Member:

let's not do this string argument to determine the behaviour of the function. Just split it a bit further into helpers.

Contributor Author:

Removed it and simplified the functions with extra helpers

{
generate_data_simple("L2_SIMPLE");
}
void generate_data(std::string type) //Type either "L1" or "L2"
Member:

as said, this big function should be split into multiple helpers, and pls no string arguments to determine behaviour

Contributor Author:

Done

SG_UNREF(pred);
}
SGMatrix<float64_t> data =
CDataGenerator::generate_gaussians(num_samples, 2, 2);
Member:

could you pls check whether this class is guarded with lapack, and if not, remove the lapack guard around the test?

Contributor Author:

It doesn't use lapack, so I removed the guard.

TEST(LibLinear,train_L2R_L2LOSS_SVC)
{
LIBLINEAR_SOLVER_TYPE liblinear_solver_type = L2R_L2LOSS_SVC;
/*
Member:

the formatting of this comment is weird. Also, the comment is superfluous: if you name your variables nicely (they are), the code is self-explaining and we don't need this comment, as it doesn't add any information

We have to transpose the data if its l2. If it is l1, then leave it as it is (Since this is the data of l1 originally)
*/
index_t num_samples = 10;
std::vector<int32_t> x{0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9};
Member:

you can initialise SGVector in the same way btw

Contributor Author:

Done, I replaced vector with SGVector and removed both includes for string and vector

if (type=="L1_SIMPLE")
train_data(x[i],y[i])=z[i];
else
train_data(y[i], x[i])=z[i]; //transpose
Member:

this is quite cryptic, why not do a transpose_matrix call?

Contributor Author:

Done, I replaced this with transpose_matrix after the training data is filled in.

SGMatrix<float64_t>::transpose_matrix(train_matrix.matrix,
train_matrix.num_rows, train_matrix.num_cols);

SG_UNREF(train_feats);
Member:

why removing the old one?

Member:

also CDenseFeatures<ST>::get_transposed()

Member:

I think you don't need your own method for this, just use the above call

Contributor Author:

I removed the transpose_feats_matrix() method and replaced it with calls to get_transposed().

ll->train();
pred = ll->apply_binary(test_feats);
SG_REF(pred);
/*
Member:

minor: could you make those 3 lines one line?

Contributor Author:

Not sure which 3 lines. Do you mean the if statement?

Member:

the comment ... but it is minor

void initialize_train_and_test_and_ground_simple()
{
SGMatrix<float64_t> train_data(2, 10);
SGMatrix<float64_t> test_data(2, 10); //Always 2x10, doesn't matter l1 or l2
Member:

pls remove this comment, it doesnt help

Contributor Author:

Done

}
void generate_data_l1_simple()
{
initialize_train_and_test_and_ground_simple();
Member:

what about removing the _simple suffix?

Contributor Author:

There is already initialize_train_and_test_and_ground() for the normal data, so I used the _simple suffix to indicate that this is the simple data initializer (which is entirely different from the non-simple one).

Contributor Author:

Actually I agree, I got rid of both initialize_train_and_test_and_ground and initialize_train_and_test_and_ground_simple. Thanks!

@@ -6,7 +6,6 @@
*
* Written (W) 2014 pl8787
Member:

while we are at it, could you make the license BSD? Just copy it from another file and keep the old authors (and add yourself)

Contributor Author:

Done, changed it to BSD license.

ll->set_bias_enabled(biasEnable);
ll->set_features(train_feats);
if (C_value)
ll->set_C(0.1,0.1); //Only in the case of L2R_L1LOSS_SVC_DUAL
Member:

the comment doesn't help.
There are also formatting issues.

Contributor Author:

Fixed

}
else
{
for(int i=0;i<t_w.vlen;i++)
Member:

for (auto i : range(t_w.vlen)) ... but this is minor

Contributor Author:

Changed it.

liblin_accuracy = eval->evaluate(pred, ground_truth);
SG_REF(ground_truth);
}
void generate_data_l1()
Member:

why this wrapper, you could just call the thing inside :)

Contributor Author:

Done

}
void generate_data_l2()
{
initialize_train_and_test_and_ground();
Member:

same here, the wrapper doesn't help

Contributor Author:

Good point, did some renaming to get rid of initialize_train_and_test_and_ground() and initialize_train_and_test_and_ground_simple()

}
void generate_data_l2_simple()
{
initialize_train_and_test_and_ground_simple();
Member:

formatting is all over the place here

Contributor Author:

Fixed

}

TEST(LibLinear,simple_set_train_L2R_LR)
/*
* --------------------------------
Member:

remove this

Contributor Author:

Done

karlnapf (Member) left a review comment:

Cool! One more round and this should be ready!
Thanks!

…t_transposed, and removed entirely two functions since they were unnecessairy wrappers
FaroukY (Contributor, Author) commented Apr 3, 2018:

@karlnapf Done.

* Written (W) 2014 pl8787
* Authors:
* Written (W) 2014 pl8787
* Refactored (R) 2018 Elfarouk Yasser
Member:

just add yourself as an author in the standard way... it makes later processing of headers easier

Contributor Author:

Fixed

//Transpose the train_feats matrix
auto old_train_feats = train_feats;
train_feats = train_feats->get_transposed();
SG_UNREF(old_train_feats);
Member:

why the unref here?

Contributor Author:

@karlnapf Doesn't get_transposed() create a new transposed matrix? So I'm just deallocating the old matrix (that I transposed)? Or does get_transposed() act in place?

Member:

I guess I would just delete the old one from the same scope that you created it in (tearDown or whatever), but this is also minor.

Make sure to run the test with valgrind to see whether the memory is fine btw

Contributor Author:

Ohh I see, get_transposed() handles the memory deallocation inside of it. So no need for SG_UNREF :)

Member:

I don't think it does...

Contributor Author:

Yeah, I got a bit confused. Since I always store the transposed matrix in train_feats, I need to deallocate the old memory before I leave the function or else we leak. TearDown will then release the transposed matrix (which is now stored in train_feats).

{
initialize_train_and_test_and_ground_simple();
generate_data_l2_simple();
//transpose train_feats matrix
Member:

the comment is useless ... super minor :)

Contributor Author:

Removed it :)

karlnapf (Member) commented Apr 3, 2018:

OK, cool! This is much better. Let's wait for the CI and then merge it!

FaroukY (Contributor, Author) commented Apr 3, 2018:

@karlnapf I re-added the SG_UNREFs before the get_transposed() calls since, after rereading, get_transposed() doesn't handle the memory deallocation. The SG_UNREF for the transposed matrix will be called in TearDown, so it should be okay now.

FaroukY (Contributor, Author) commented Apr 3, 2018:

The last CI run passed; let's wait for CI on the latest commit and then merge it :)

karlnapf (Member) commented Apr 4, 2018:

Did you run valgrind to check the memory usage of this test? No leaks, no uninitialized reads?

FaroukY (Contributor, Author) commented Apr 4, 2018:

@karlnapf
Valgrind says no leaks:

https://ibb.co/iLR5ux

karlnapf (Member) commented Apr 4, 2018:

Great!
Thanks so much for this clean-up. Very useful

@karlnapf karlnapf merged commit 63594a4 into shogun-toolbox:develop Apr 4, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
… to 470. (shogun-toolbox#4208)

* Reduced the test lines using a few design patterns
* Fixed the required changes discussed in pr shogun-toolbox#4208
* Remove LAPACK requirement since it wasn't needed
* added BSD license, removed transpose function and replaced it with get_transposed, and removed entirely two functions since they were unnecessary wrappers
* Fixed BSD authors
* Removed unnecessairy comments