Refactor linear machine api #4319

Closed
wants to merge 25 commits into from

@vinx13 vinx13 commented May 31, 2018

[WIP]
Adapt the machine API to fit + predict.
This needs some heavy refactoring, so let's start with linear machines and iterate a few times.
The idea is to keep both the old and the new APIs working and then remove the old APIs gradually.
train_machine(CFeatures*, CLabels*) is added to machines that need labels. The old method, train_machine(CFeatures*), redirects to the new API and passes m_labels as the labels argument. This way, the old API (set_labels + train(CFeatures*)) still works.

Roadmap

  • Add void fit(CFeatures*) and void fit(CFeatures*, CLabels*) to CMachine
  • Redirect bool train_machine(CFeatures*) to void train_machine(CFeatures*, CLabels*) in LinearMachine
  • Call the new API in unit tests
  • Move the API redirection into LinearMachine as a base class method once the new API works for all LinearMachine subclasses
  • Make bool train_machine(CFeatures*) and bool train_machine_templated(CFeatures*) return void (the latter should be easier, as it was added last year and is not used in many places yet)
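
The redirection described in the PR text can be sketched roughly as follows. This is a minimal illustration, not Shogun's actual code: `Features`, `Labels`, `Machine`, and `MeanLabelMachine` are simplified, hypothetical stand-ins (no `C` prefix, no ref counting), and reporting failure from `fit` via `std::invalid_argument` is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Simplified stand-ins for CFeatures / CLabels; not the real Shogun types.
struct Features { std::vector<double> values; };
struct Labels { std::vector<double> values; };

class Machine
{
public:
    virtual ~Machine() = default;

    // Old API: labels are stored on the machine before training.
    void set_labels(Labels* labels) { m_labels = labels; }
    bool train(Features* features) { return train_machine(features); }

    // New API: labels are passed explicitly, failure is reported by throwing.
    void fit(Features* features, Labels* labels)
    {
        if (!features || !labels)
            throw std::invalid_argument("fit needs features and labels");
        train_machine(features, labels);
    }

protected:
    // Old hook: redirects to the new hook, passing the stored labels.
    virtual bool train_machine(Features* features)
    {
        if (!features || !m_labels)
            return false;
        train_machine(features, m_labels);
        return true;
    }

    // New hook that subclasses implement.
    virtual void train_machine(Features* features, Labels* labels) = 0;

    Labels* m_labels = nullptr;
};

// Toy subclass (hypothetical): "learns" the mean of the labels.
class MeanLabelMachine : public Machine
{
public:
    double mean = 0.0;

protected:
    void train_machine(Features*, Labels* labels) override
    {
        double sum = 0.0;
        for (double v : labels->values)
            sum += v;
        mean = labels->values.empty() ? 0.0 : sum / labels->values.size();
    }
};
```

With this layout, `set_labels` followed by `train` and a direct call to `fit` both end up in the same two-argument `train_machine`, which is the property the roadmap relies on while the old API is phased out.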

Known issues for moving to const methods:

  • ref counting
  • labels factory (e.g. regression_labels) doesn't accept const args

@@ -40,59 +32,51 @@ void CAveragedPerceptron::init()
SG_ADD(&learn_rate, "learn_rate", "Learning rate.", MS_AVAILABLE);
}

-bool CAveragedPerceptron::train_machine(CFeatures* data)
+void CAveragedPerceptron::train_machine(CFeatures* features, CLabels* labels)
Member

Don't we want these to be const parameters?

Member Author

Of course. Unfortunately, we can't use const for the time being because there are many non-const methods that should logically be const, e.g. get_feature_matrix.

Member

I see, sure
(though get_feature_matrix can be made const without problem)

{
-output[i] = features->dense_dot(i, w.vector, w.vlen) + bias;
+output[i] = dot_features->dense_dot(i, w.vector, w.vlen) + bias;
Member

I think this could be done via DotIterator.

bias=tmp_bias/(num_vec*iter);

SG_FREE(output);
SG_FREE(tmp_w);

set_w(w);
Member

I would prefer to remove this and update the state vector itself inside the main loop.
@shubham808 can elaborate, as he is doing similar work for the perceptron.

Member

This comment is from a long time ago, but basically we want to start updating the model state inside the loops for iterative machines.

@shubham808 comments?

Member Author

@karlnapf I think set_w has already been called inside the iteration function?

Member

You are right, sorry.
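
A rough sketch of what "updating the model state inside the loop" can look like. It uses a plain (not averaged) perceptron for brevity; `TinyPerceptron` and its members are hypothetical and do not mirror Shogun's `CAveragedPerceptron` or its iteration interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy perceptron (hypothetical); labels are +1 / -1.
class TinyPerceptron
{
public:
    explicit TinyPerceptron(std::size_t dim, double rate = 1.0)
        : w(dim, 0.0), learn_rate(rate)
    {
    }

    // One pass over the data. w and bias are members and are updated in
    // place, so the model is in a usable state after every iteration and
    // no final "copy temporaries into the model" step is needed.
    // Returns true if no example was misclassified during the pass.
    bool iteration(
        const std::vector<std::vector<double>>& x, const std::vector<int>& y)
    {
        bool converged = true;
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            if (predict(x[i]) != y[i])
            {
                for (std::size_t j = 0; j < w.size(); ++j)
                    w[j] += learn_rate * y[i] * x[i][j];
                bias += learn_rate * y[i];
                converged = false;
            }
        }
        return converged;
    }

    int predict(const std::vector<double>& x) const
    {
        double out = bias;
        for (std::size_t j = 0; j < w.size(); ++j)
            out += w[j] * x[j];
        return out >= 0 ? 1 : -1;
    }

    std::vector<double> w;
    double bias = 0.0;
    double learn_rate;
};
```

Because the state lives in the object, training can be interrupted between passes and the partially trained model can still be used for prediction.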

{
ASSERT(m_labels)
if (!features->has_property(FP_DOT))
Member

Why not use as? It would be a bit cleaner. Training is expensive, so the additional cost shouldn't matter, should it?

Member Author

We can provide a more informative error message here, but as is just as good.
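
The trade-off discussed here can be illustrated with a checked downcast in the spirit of `as`. This is only a sketch: `Object`, `DotFeatures`, `StringFeatures`, and `first_dot` are hypothetical stand-ins, and throwing `std::runtime_error` is an assumption (the diff further down shows Shogun reporting the failure through `SG_SERROR`).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeinfo>

struct Object
{
    virtual ~Object() = default;

    // Checked downcast: returns this object as T*, or throws if the
    // dynamic type does not match.
    template <class T>
    T* as()
    {
        if (auto* casted = dynamic_cast<T*>(this))
            return casted;
        throw std::runtime_error(
            std::string("Object cannot be converted to type ") +
            typeid(T).name());
    }
};

struct Features : Object {};

struct DotFeatures : Features
{
    double dense_dot() const { return 1.0; }
};

struct StringFeatures : Features {};

// A single call replaces the "check a property flag, then cast" pair.
double first_dot(Features* features)
{
    return features->as<DotFeatures>()->dense_dot();
}
```

The cost is one `dynamic_cast` per call, which is negligible next to training; the price is a generic error message instead of one tailored to the call site.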

@@ -66,6 +66,23 @@ bool CMachine::train(CFeatures* data)
return result;
}

void CMachine::fit(CFeatures* features)
{
REQUIRE(train(features), "Failed to fit machine %s\n", get_name());
Member

Maybe we should throw an exception rather than return a boolean?

Member

require failed -> sg_io->msg(MSG_ERROR, "blabla") -> https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/io/SGIO.cpp#L125

Member

Sure, but I mean, shouldn't the train method itself just throw an exception?
We don't get any context information here. If train threw an exception, we could catch it here and then say "training failed for reason X". No?

Member Author

I'm going to just throw exceptions in fit. For now train still returns a boolean, so I am keeping this check so that fit consistently throws an exception on failure.
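
The alternative raised in this thread, where train itself throws and fit adds context, could look roughly like this. It is a sketch under assumptions: the throwing `train`, the `(data, n)` signature, and the helper `fit_error` are hypothetical and not part of the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

class Machine
{
public:
    virtual ~Machine() = default;
    virtual const char* get_name() const { return "Machine"; }

    // fit() never returns a status: it either succeeds or throws.
    void fit(const double* data, int n)
    {
        try
        {
            train(data, n);
        }
        catch (const std::exception& e)
        {
            // Add the machine name and keep the original reason.
            throw std::runtime_error(
                std::string("Failed to fit machine ") + get_name() + ": " +
                e.what());
        }
    }

protected:
    // Hypothetical throwing variant of train().
    virtual void train(const double* data, int n)
    {
        if (data == nullptr || n <= 0)
            throw std::invalid_argument("no training data given");
    }
};

// Helper for the example: returns the error message, or "" on success.
std::string fit_error(Machine& machine, const double* data, int n)
{
    try
    {
        machine.fit(data, n);
    }
    catch (const std::runtime_error& e)
    {
        return e.what();
    }
    return "";
}
```

Compared with wrapping a boolean result, the caller learns why training failed, not just that it did.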

@vigsterkr vigsterkr force-pushed the feature/transformers branch 2 times, most recently from b7d2ff9 to f211b65 Compare June 8, 2018 14:15
@vigsterkr vigsterkr force-pushed the feature/transformers branch 3 times, most recently from a241ab1 to dbfd69e Compare July 12, 2018 16:02
@karlnapf
Member

Hi!
Shall we pick this up again?

@vinx13
Member Author

vinx13 commented Dec 12, 2018

@karlnapf I found that there are too many things involved here. When you change the Machine base class, almost everything needs to be updated. I would suggest starting with some smaller refactoring first, such as Distance or Kernel. We can discuss further on IRC.

@vinx13 vinx13 changed the base branch from feature/transformers to develop December 12, 2018 15:45
@vinx13 vinx13 force-pushed the feature/machine_api branch 3 times, most recently from a2078d9 to fb4930c Compare December 15, 2018 06:30

ASSERT(m_labels)
init_linear_term();
void CLibLinear::train_machine(CFeatures* features, CLabels* labels)
Member

const possible?

Member Author

Many of the algorithm's internal methods are non-const. We can either use non-const arguments here, or use const_cast to drop const in the internal methods.

Member Author

@karlnapf One problem with ref counting is IterativeMachine: we need to increase the ref count of features and labels in init_model.

Member

ah crap yes, the ref counting is non-const....
ok so we need to postpone the const making for now until we have another way to do the reference counting
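
The conflict between const parameters and reference counting can be shown in a few lines. The sketch below is illustrative only: `RefCounted` stands in for a ref-counted base class and `IterativeMachine::init_model` is reduced to the part that takes a reference; neither matches Shogun's real declarations.

```cpp
#include <cassert>

// Stand-in for a ref-counted base class; not Shogun's CSGObject.
class RefCounted
{
public:
    // Non-const: taking or dropping a reference mutates the object.
    int ref() { return ++m_refcount; }
    int unref() { return --m_refcount; }
    int ref_count() const { return m_refcount; }

private:
    int m_refcount = 0;
};

class IterativeMachine
{
public:
    ~IterativeMachine()
    {
        if (m_features)
            m_features->unref();
    }

    // With a const parameter, the machine cannot take a reference directly.
    void init_model(const RefCounted* features)
    {
        // features->ref();  // would not compile: ref() is not const

        // Workaround mentioned in the thread: cast const away. This is only
        // well defined if the pointed-to object was not created const.
        m_features = const_cast<RefCounted*>(features);
        m_features->ref();
    }

private:
    RefCounted* m_features = nullptr;
};
```

One conceivable alternative, not proposed in this thread, would be to make the counter `mutable` so that `ref()` and `unref()` can themselves be const.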

@karlnapf
Member

Nice! Let me know when we should have a look at this...

@karlnapf
Member

I like the idea of porting old methods to the new, nicer API using const casts, although we might run into some trouble doing that... @lisitsyn @iglesias @vigsterkr ?


virtual void CLPBoost::train_machine(CFeatures* features, CLabels* labels);
{
ASSERT(labels)
Member

I think handling those asserts happens (should happen?) in the base class?

Member Author

Agree, it should be in the base class. I will check if it is possible for now.

@@ -567,6 +567,20 @@ class CSGObject
demangled_type<T>().c_str());
return nullptr;
}

template <class T> const T* as() const
Member

you are using this below I assume when changing the signatures of some methods?

Member Author

This is useful for const pointers (const CFeatures*). Although we decided to move to const methods later, I leave it here for possible future usage.

@karlnapf
Member

karlnapf commented Jan 3, 2019

Cool some progress :) Let us know how you are getting on here

@vinx13 vinx13 force-pushed the feature/machine_api branch 2 times, most recently from eefbcf8 to 7caeb67 Compare January 7, 2019 16:43
@@ -24,4 +18,3 @@ CLeastSquaresRegression::CLeastSquaresRegression(CDenseFeatures<float64_t>* data
: CLinearRidgeRegression(0, data, lab)
Member Author

@karlnapf Got some linking issue

Undefined symbols for architecture x86_64:
2019-01-07T16:59:36.8868140Z   "bool shogun::CLinearRidgeRegression::train_machine_templated<double>(shogun::CDenseFeatures<double> const*)", referenced from:
2019-01-07T16:59:36.8911800Z       shogun::CDenseRealDispatch<shogun::CLinearRidgeRegression, shogun::CLinearMachine>::train_dense(shogun::CFeatures*) in LeastSquaresRegression.cpp.o
2019-01-07T16:59:36.9066190Z   "bool shogun::CLinearRidgeRegression::train_machine_templated<long double>(shogun::CDenseFeatures<long double> const*)", referenced from:
2019-01-07T16:59:36.9110490Z       shogun::CDenseRealDispatch<shogun::CLinearRidgeRegression, shogun::CLinearMachine>::train_dense(shogun::CFeatures*) in LeastSquaresRegression.cpp.o
2019-01-07T16:59:36.9263540Z   "bool shogun::CLinearRidgeRegression::train_machine_templated<float>(shogun::CDenseFeatures<float> const*)", referenced from:
2019-01-07T16:59:36.9306660Z       shogun::CDenseRealDispatch<shogun::CLinearRidgeRegression, shogun::CLinearMachine>::train_dense(shogun::CFeatures*) in LeastSquaresRegression.cpp.o

CLinearRidgeRegression::train_machine_templated is a template method; its definition is in LinearRidgeRegression.cpp and is not visible in this cpp file.

Member

@Saurabh7 this would be something for you to check/figure out? I think we have seen and solved something similar before. Unless you have ideas @vinx13 ?

Member Author

@karlnapf I only saw this error on CI. I cannot reproduce it locally with the docker image.

Member

Yes I remember this issue somehow with the same reproducibility issues. Maybe @Saurabh7 remembers

Member

I assume this works now?

Member Author

Yes, we need to explicitly instantiate the templates in the cpp files to make them linkable.
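
For reference, explicit instantiation of a member function template looks roughly like this. The sketch collapses header and source file into one listing and uses hypothetical stand-ins (`DenseFeatures`, `RidgeRegression`, a trivial function body); only the three instantiated types are taken from the linker log above.

```cpp
#include <cassert>
#include <vector>

// --- would live in the header ---
template <typename T>
struct DenseFeatures
{
    std::vector<T> values;
};

class RidgeRegression
{
public:
    // Declared only: other translation units see no definition and
    // therefore cannot instantiate the template themselves.
    template <typename T>
    bool train_machine_templated(const DenseFeatures<T>* features);
};

// --- would live in the .cpp file ---
template <typename T>
bool RidgeRegression::train_machine_templated(const DenseFeatures<T>* features)
{
    // Placeholder body for the sketch.
    return features != nullptr && !features->values.empty();
}

// Explicit instantiation definitions: these emit the symbols that other
// translation units link against.
template bool RidgeRegression::train_machine_templated<float>(
    const DenseFeatures<float>*);
template bool RidgeRegression::train_machine_templated<double>(
    const DenseFeatures<double>*);
template bool RidgeRegression::train_machine_templated<long double>(
    const DenseFeatures<long double>*);
```

Any type not listed in the explicit instantiations would again produce an undefined-symbol error when called from another translation unit.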

Contributor

Looks like this is solved.

Very interesting PR, by the way! So we are planning to change all APIs to .fit and .predict?

Member

Sorry I actually meant to ping @shubham808 ;)
But yes @Saurabh7 we are intending to do that

@vinx13 vinx13 force-pushed the feature/machine_api branch 2 times, most recently from 779b11a to 29d3bb6 Compare January 18, 2019 13:08

SG_SERROR(
"Object of type %s cannot be converted to type %s.\n",
demangled_type<std::remove_pointer_t<decltype(this)>>().c_str(),
Member

btw can we re-use code from the non-const version in here? Like one calls the other?
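
One common way to let one overload call the other is to keep the logic in the const version and have the non-const version forward to it. The sketch below assumes simplified stand-ins (`Object`, `Derived`, `Other`) and throws `std::runtime_error` where the diff above reports the error through `SG_SERROR`.

```cpp
#include <cassert>
#include <stdexcept>

struct Object
{
    virtual ~Object() = default;

    // The const overload holds the actual logic.
    template <class T>
    const T* as() const
    {
        if (auto* casted = dynamic_cast<const T*>(this))
            return casted;
        throw std::runtime_error(
            "Object cannot be converted to the requested type");
    }

    // The non-const overload forwards to the const one and removes const
    // from the result. This is safe because *this is known to be non-const.
    template <class T>
    T* as()
    {
        const Object* self = this;
        return const_cast<T*>(self->as<T>());
    }
};

struct Derived : Object
{
    int value = 7;
};

struct Other : Object {};
```

The cast and the error handling then exist in one place only, and the two overloads cannot drift apart.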

@stale

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
@gf712
Member

gf712 commented Feb 26, 2020

keeping this alive as I think this is still the direction we want to go in

@stale stale bot removed the stale label Feb 26, 2020
@stale

stale bot commented Aug 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 24, 2020
@karlnapf
Member

bump

@stale stale bot removed the stale label Aug 28, 2020
@stale

stale bot commented Feb 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 25, 2021
@stale

stale bot commented Mar 4, 2021

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this Mar 4, 2021

6 participants