Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shogun base classes #2880

Open
karlnapf opened this issue Aug 3, 2015 · 27 comments
Open

Shogun base classes #2880

karlnapf opened this issue Aug 3, 2015 · 27 comments

Comments

@karlnapf
Copy link
Member

karlnapf commented Aug 3, 2015

Shall we add another one in the layer

  1. A minimal base class CSGObject that just allows for reference counting and all the basic Shogun stuff.
  2. A second one CSerializableObject that adds all the parameter stuff and inherits from the first?

I guess this could minimize Shoguns overhead quite a bit (many classes dont use parameters)

@yorkerlin @lisitsyn @sonney2k @vigsterkr @lambday @iglesias @tklein23 @besser82 what are your thoughts?

@karlnapf
Copy link
Member Author

karlnapf commented Aug 3, 2015

Could also add another layer that provides all the sg_* fields

@lisitsyn
Copy link
Member

lisitsyn commented Aug 3, 2015

We would have to use multiple inheritance, it could be PITA you know :)

@karlnapf
Copy link
Member Author

karlnapf commented Aug 5, 2015

Why multiple inheritance? That's not my idea.

If the base classes are a hierarchy, one could just inherit from a certain stage in there?

@sonney2k
Copy link
Member

sonney2k commented Aug 5, 2015

I did add SGReferencedData (or so) at some point that did exactly that.
I guess it did get removed for some reason by someone who knows why that
was maybe not a good idea?

On Wed, 2015-08-05 at 01:43 -0700, Heiko Strathmann wrote:

Why multiple inheritance? That's not my idea.

If the base classes are a hierarchy, one could just inherit from a
certain stage in there?


Reply to this email directly or view it on GitHub.

@karlnapf
Copy link
Member Author

karlnapf commented Aug 5, 2015

Thats very similar, but for the SGMatrix, SGVector classes.
Exactly that type of thing is what I want to do for the base class CSGObject.

@karlnapf
Copy link
Member Author

karlnapf commented Aug 5, 2015

What about this:

CReferencedObject -> CSerialisableObject -> CSGObject

Where the first one contains only the reference counting, the second adds the parameters/serialisation, and the third adds all the Shogun stuff

@sonney2k
Copy link
Member

sonney2k commented Aug 5, 2015

checkout

9305aa9

and look at src/shogun/base/SGRefObject.h

On Wed, 2015-08-05 at 04:12 -0700, Heiko Strathmann wrote:

What about this:

CReferencedObject -> CSerialisableObject -> CSGObject

Where the first one contains only the reference counting, the second adds the parameters/serialisation, and the third adds all the Shogun stuff


Reply to this email directly or view it on GitHub:
#2880 (comment)

@lisitsyn
Copy link
Member

lisitsyn commented Aug 5, 2015

@karlnapf okay this works. Though we don't really know whether referenced is always serializable or serializable is always referenced :)

@karlnapf
Copy link
Member Author

karlnapf commented Aug 6, 2015

I see now. This is good enough for @yorkerlin s patch. Thanks I did not know

@iglesias
Copy link
Collaborator

The idea makes sense to me, and I like it because it would separate in each of the classes Heiko has mentioned above reference counting, serialisation, and rest of Shogun stuff (although I am not sure what this rest would be :-)

I am not entirely sure however how much overhead it would reduce. What overhead are we talking about? SWIG or memory footprint per SGObject?

Another (very) wild idea somewhat related would be to use smart pointers as they are now part of the default C++ compiler. Then we could get rid of our own reference counting.

@karlnapf
Copy link
Member Author

yeah smart pointers would be best.

@lisitsyn what s your take on all this?

  • making the base class overhead much smaller
  • serialisation
  • reference counting?

@karlnapf
Copy link
Member Author

@sonney2k why was the SGRefObject removed?

@sonney2k
Copy link
Member

Do not remember. Git logs say that Thorsten removed it. "Obsolete ".

On August 10, 2015 11:30:21 PM GMT+02:00, Heiko Strathmann notifications@github.com wrote:

@sonney2k why was the SGRefObject removed?


Reply to this email directly or view it on GitHub:
#2880 (comment)

Sent from Kaiten Mail. Please excuse my brevity.

@iglesias
Copy link
Collaborator

The class was removed during the last summit when we were reducing SWIG overhead, in terms of memory used to compile or, equivalently, the size of the (huge) file SWIG creates. Before it was removed, it looked like this SGRefObject -> CSGObject, and all the classes were inheriting from CSGObject, not from SGRefObject.

Although it should be possible to use SGRefOject for what you have mentioned above, it was not used like that. It was just pulling out the reference counting logic from CSGObject. Thoralf removed it to reduce the number of classes (by one, in this case) that are exposed via SWIG.

@iglesias
Copy link
Collaborator

#2581

@lisitsyn
Copy link
Member

@karlnapf

making the base class overhead much smaller

I like the principle, although it is not really clear for me what to remove.

serialisation

I think it should stay as long as we are rare ML library with this feature.

reference counting?

I believe it should be done via shared pointers.

@karlnapf
Copy link
Member Author

I dont get the point of removing the class. We could just hide it from SWIG?
If one class contains what two classes used to contain, nothing is really gained. We could revert that?

@lisitsyn

  • remove migration at least
  • cant we use a library for serialisation? why re-invent the wheel there (and we do it badly, its slow as f***)
  • +1 for shared pointers

@iglesias
Copy link
Collaborator

There is more dicussion about it here #1853 #1764.

Reverting is easy peasy with git revert #2883. But I suggest not to merge this until we have checked the above pull requests.

@iglesias
Copy link
Collaborator

Hmm, it seems that SGRefObject was not introduced due to what @sonney2k pointed out above: #1770. Still, I believe it should be possible to use it for what we were talking about.

@iglesias
Copy link
Collaborator

This is the pr where it was merged #1771.

@yorkerlin
Copy link
Member

@lisitsyn
@lambday
Do you guys plan to rewrite SGObject?
Do we use the existing parameter selection (model selection) framework? The existing model selection framework is not flexible.
Currently, the existing framework searches for all model parameters since class Parameter only supports add but does not support remove

The existing framework only supports:

  • const/fixed parameters
  • model parameters (need to be "optimized" in model selection)

I want the framework supports variational parameters (need to be optimized but are not model parameters)

In future, I may have to extend the framework in the following ways:

  • Enable/Disable a specific model parameter in model selection ( my first priority)
  • support variational parameters
  • support Bayesian model selection (eg, @karlnapf Bayesian optimization for hyper-parameters using GP)

@lambday
Copy link
Member

lambday commented Aug 23, 2015

Hi @yorkerlin . If we in fact happen to rewrite SGObject, probably quite a few things would change. @lisitsyn proposed having a class Property for each parameter which fits nicely and intuitively for model selection. In this scenario, I think we'd be able to implement the enable/disable feature you're suggesting quite easily.

I am unaware of variational parameters ATM. If they are expected to be optimized differently, maybe we can have a separate class for them and provide its functionality accordingly.

@yorkerlin
Copy link
Member

@lambday
Cool!
variational parameters are auxiliary variables.

For batch inference,

  • step1: given model parameters \theta^{t}
  • step2: we usually optimize variational parameters until converge given \theta^{t} are fixed
  • step3: then update model parameters \theta^{t+1} and go back to step 1 until some conditions meet

For stochastic inference,We can update model parameters and variational parameters at the same time.

  • step1: draw sample data point(s)
  • step2: update model parameters and variational parameters and go back to step 1 until some conditions meet

For stochastic inference, variaitonal parameters can be viewed as model parameters.
Enable/Disable some parameters will be the key!

@yorkerlin
Copy link
Member

CMap does not support serialization. However, CMap is a subclass of SGObject.
At least I want CStringMap (that is CMap<std::string, SGVector<float64_t> >) is serialable.

ref
#2903

@lambday
Copy link
Member

lambday commented Aug 25, 2015

@yorkerlin from the requirements, going by policy based design for the parameters sound reasonable to me, where the policies are different optimization policies. So laying down the requirements

  • We can use Parameter or Property class for normal/fixed parameters. Present TParameter and Parameter can be an inspiration for that. Getting rid of the ADD things would be nice. We don't have to specify MS_AVAILABLE or MS_NOT_AVAILABLE every time.
  • We need another ModelParameter class for parameters that participate in model selection. This can be a policy based class with default policy being the present model selection scenario. For variational parameters we'll use a separate policy implementing the optimization technique for them as required. ModelParameter would have enable/disable feature. In general I like policies more since they go well with combinatorial scenario without having to enforce a is-a-relationship .
  • We'll use pimpl [1][2] pattern the classes which helps the software be backward compatible and also reduce significant compilation time. Header inclusion should be minimal and we have to take that extra care. All the data would go inside the Impl classes and eventually will be saved inside the parameter_map in SGObjectImpl. Getting rid of unnecessary setters/getters would also be nice [3][4][5]

[1] http://www.gotw.ca/publications/mill04.htm
[2] http://www.gotw.ca/publications/mill05.htm
[3] http://stackoverflow.com/questions/8447972/how-to-combine-auto-gettersetter-with-pimpl-design-pattern-in-a-public-api-inte#
[4] http://www.idinews.com/quasiClass.pdf
[5] http://www.javaworld.com/article/2073723/core-java/why-getter-and-setter-methods-are-evil.html

@yorkerlin
Copy link
Member

@lambday
@lisitsyn
some old issues related to this
#1251
#779

@lambday
Copy link
Member

lambday commented Aug 31, 2015

Thanks @yorkerlin for tagging the related issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants