Refactored the CLibLinear test codes and reduced them from 1300 lines to 470. #4208
Conversation
This pull request seems to have duplicated the commits from the previous pull request. The only file that should be changed here is the last one (LibLinear_unittest.cc); all the other commits are from other pull requests. I'll fix that in the next revision if one is needed.
@FaroukY yeah, i think you forked your branch
SGMatrix<float64_t> train_matrix = train_feats->get_feature_matrix();
SGMatrix<float64_t>::transpose_matrix(train_matrix.matrix,
train_matrix.num_rows, train_matrix.num_cols);
CLibLinear* ll = new CLibLinear();
you can use some in many of these cases, so you don't need to call SG_UNREF :)
for example:
auto ll = some<CLibLinear>();
Done
SG_UNREF(train_feats);
CBinaryLabels* pred = NULL;
float64_t liblin_accuracy;
CContingencyTableEvaluation* eval = new CContingencyTableEvaluation();
same here as above:
auto eval = some<CContingencyTableEvaluation>();
Done
ground_truth = new CBinaryLabels(labels);
ll->set_liblinear_solver_type(liblinear_solver_type);
ll->train();
pred = ll->apply_binary(test_feats);
this could be simply
auto pred = ll->apply_binary(test_feats);
and then you can remove the declaration of CBinaryLabels* pred = NULL;
Done
{
labels[i] = (i < num_samples/2) ? 1.0 : -1.0;
}
CLibLinear* ll = new CLibLinear();
same as above... this code shares quite a lot of lines with train_with_solver... we could combine them :)
I did think about this, but when I did it, it had a lot of if statements and it wasn't very readable (for a test case). Are we sure we wanna do this?
*/

index_t num_samples = 50;
CMath::init_random(5);
use sg_rand->set_seed, plz
Done
#include <shogun/classifier/svm/LibLinear.h>
#include <shogun/features/DataGenerator.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/evaluation/ContingencyTableEvaluation.h>
#include <shogun/mathematics/Math.h>
#include <gtest/gtest.h>
#include <map>
#include <string>
#include <vector>

using namespace shogun;

#ifdef HAVE_LAPACK
i think this can go, as it was required for the gaussian blob generator, but that's not LAPACK-only anymore...
I used the string and vector in the tests, so I will only delete `<map>`, is that okay?
lapack can still go imo, no?
This needs a rebase as I fixed some missing SG_REF recently.
@karlnapf Going through the diff, the only difference I see in this file is the SG_REF(pred). Is that safe to assume, or am I missing another SG_REF?
Force-pushed from 505f6ab to 9220e42
Okay, so I verified it, rebased, and addressed the problems.
@FaroukY ok, in order to be able to merge this you need to remove the unrelated commits in this PR
Force-pushed from c4fc6c4 to c7a4a20
@vigsterkr Removed all the unrelated commits. This should be okay to merge now.
SG_UNREF(test_feats);
SG_UNREF(ground_truth);
}
//Helper that tests can call to drastically reduce code
I would remove the comment and let the code speak for itself :) minor
The liblinear test segfaults under clang. Pls check those things yourself next time, it is all there...
The Windows build passed, but only because it doesn't have LAPACK installed and you still have the test guarded, so it is not executed.
@karlnapf @vigsterkr Got rid of the memory leak. It seems like Travis timed out on two jobs, but the remaining ones passed. Any ideas?
clang passed, so the tests executed there were fine. good. I think this is OK to be merged soon.
}
void generate_data_l1_simple()
{
generate_data_simple("L1_SIMPLE");
let's not do this string argument to determine the behaviour of the function. Just split it a bit further into helpers.
Removed it and simplified the functions with extra helpers
{
generate_data_simple("L2_SIMPLE");
}
void generate_data(std::string type) //Type either "L1" or "L2"
as said, this big function should be split into multiple helpers, and pls no string arguments to determine behaviour
Done
SG_UNREF(pred);
}
SGMatrix<float64_t> data =
CDataGenerator::generate_gaussians(num_samples, 2, 2);
could you pls check whether this class is guarded with lapack, and if not, remove the lapack guard around the test?
It doesn't use lapack, so I removed the guard.
TEST(LibLinear,train_L2R_L2LOSS_SVC)
{
LIBLINEAR_SOLVER_TYPE liblinear_solver_type = L2R_L2LOSS_SVC;
/*
the formatting of this comment is weird. Also, the comment is superfluous: if you name your variables nicely (they are), then the code is self-explanatory and we don't need this comment, as it doesn't add any information
We have to transpose the data if its l2. If it is l1, then leave it as it is (Since this is the data of l1 originally)
*/
index_t num_samples = 10;
std::vector<int32_t> x{0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9};
you can initialise SGVector in the same way btw
Done, I replaced vector with SGVector and removed both includes for string and vector
if (type=="L1_SIMPLE")
train_data(x[i],y[i])=z[i];
else
train_data(y[i], x[i])=z[i]; //transpose
this is quite cryptic, why not do a transpose_matrix call?
Done, I replaced this with transpose_matrix after the training data is filled in.
SGMatrix<float64_t>::transpose_matrix(train_matrix.matrix,
train_matrix.num_rows, train_matrix.num_cols);

SG_UNREF(train_feats);
why remove the old one?
also CDenseFeatures<ST>::get_transposed()
I think you don't need your own method for this, just use the above call
I removed the transpose_feats_matrix() method and replaced it with the calls with get_transposed().
ll->train();
pred = ll->apply_binary(test_feats);
SG_REF(pred);
/*
minor: could you make those 3 lines one line?
Not sure which 3 lines. Do you mean the if statement?
the comment ... but it is minor
void initialize_train_and_test_and_ground_simple()
{
SGMatrix<float64_t> train_data(2, 10);
SGMatrix<float64_t> test_data(2, 10); //Always 2x10, doesn't matter l1 or l2
pls remove this comment, it doesn't help
Done
}
void generate_data_l1_simple()
{
initialize_train_and_test_and_ground_simple();
what about removing the _simple suffix?
there is already initialize_train_and_test_and_ground() for the normal data, so I used the _simple suffix to indicate this is the simple data initializer (which is entirely different from the non simple one)
Actually I agree, I got rid of both initialize_train_and_test_and_ground and initialize_train_and_test_and_ground_simple. Thanks!
@@ -6,7 +6,6 @@
*
* Written (W) 2014 pl8787
while we are at it, could you make the license BSD? just copy it from another file and maintain the old authors (add yourself)
Done, changed it to BSD license.
ll->set_bias_enabled(biasEnable);
ll->set_features(train_feats);
if (C_value)
ll->set_C(0.1,0.1); //Only in the case of L2R_L1LOSS_SVC_DUAL
the comment doesn't help. Also, there are formatting issues
Fixed
}
else
{
for(int i=0;i<t_w.vlen;i++)
for (auto i : range(t_w.vlen))
... but this is minor
Changed it.
liblin_accuracy = eval->evaluate(pred, ground_truth);
SG_REF(ground_truth);
}
void generate_data_l1()
why this wrapper, you could just call the thing inside :)
Done
}
void generate_data_l2()
{
initialize_train_and_test_and_ground();
same here, the wrapper doesn't help
Good point, did some renaming to get rid of initialize_train_and_test_and_ground() and initialize_train_and_test_and_ground_simple()
}
void generate_data_l2_simple()
{
initialize_train_and_test_and_ground_simple();
formatting is all over the place here
Fixed
}

TEST(LibLinear,simple_set_train_L2R_LR)
/*
* --------------------------------
remove this
Done
Cool! One more round and this should be ready!
Thanks!
…t_transposed, and removed entirely two functions since they were unnecessary wrappers
@karlnapf Done.
* Written (W) 2014 pl8787
* Authors:
* Written (W) 2014 pl8787
* Refactored (R) 2018 Elfarouk Yasser
just add yourself as author in the standard way...makes later processing of headers easier
Fixed
//Transpose the train_feats matrix
auto old_train_feats = train_feats;
train_feats = train_feats->get_transposed();
SG_UNREF(old_train_feats);
why the unref here?
@karlnapf Doesn't get_transposed() create a new transposed matrix? So I'm just deallocating the old matrix (that i transposed)? Or does get_transposed() act in place?
I guess I would just delete the old one from the same scope that you created it in (tearDown or whatever), but this is also minor.
Make sure to run the test with valgrind to see whether the memory is fine btw
Ohh I see, get_transposed() handles the memory deallocation inside of it. So no need for SG_UNREF :)
I don't think it does...
Yeah, I got a bit confused. Since I always store the transposed matrix in train_feats, I need to deallocate the old memory before I leave the function, or else we leak. The TearDown will release the transposed matrix (which is now stored in train_feats).
{
initialize_train_and_test_and_ground_simple();
generate_data_l2_simple();
//transpose train_feats matrix
the comment is useless ... super minor :)
Removed it :)
…o need for SG_REF
Ok cool! This is much better. Let's wait for the CI and then merge it!
@karlnapf I re-added the SG_UNREFs before the get_transposed() since, after rereading it, it doesn't handle the memory deallocation. The SG_UNREF for the transposed matrix will be called in TearDown, so it should be okay now.
The last CI passed; let's wait for the CI after the last commit and then merge it :)
Did run
@karlnapf
Great!
… to 470. (shogun-toolbox#4208) * Reduced the test lines using a few design patterns * Fixed the required changes discussed in pr shogun-toolbox#4208 * Remove LAPACK requirement since it wasn't needed * added BSD license, removed transpose function and replaced it with get_transposed, and removed entirely two functions since they were unnecessary wrappers * Fixed BSD authors * Removed unnecessary comments
Regarding issue #4207.