Refactor tree algorithms to be serializable #4242

Closed
karlnapf opened this issue Apr 13, 2018 · 4 comments

Comments

@karlnapf
Member

karlnapf commented Apr 13, 2018

Problem

The current implementation of tree algorithms is based on a class CTreeMachineNode, which is templated, with a templated field T data to carry tree node data for the various algorithms. The template argument is then set to custom data structures, like CARTreeNodeData for the CARTree algorithm.

While this is a reasonable design, Shogun template classes unfortunately can only be instantiated with basic types, such as float64_t, int32_t, .... If this (unwritten) rule is violated, then something fundamental stops working:

Creating an empty object via class_list.cpp (this is described in #3481) becomes impossible. The class list only supports basic template types (those defined by EPrimitiveType), not custom data structures. Even worse, creating empty objects is needed for both cloning and serializing objects. This limitation arises from the current design of Shogun at a very low level.
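
To make the limitation concrete, here is a minimal sketch of a factory keyed by class name and primitive type. It is a simplified model, not Shogun's actual class_list.cpp: `PrimitiveType`, `Object`, `TreeNode`, `CartNodeData`, and `create_empty` are all hypothetical stand-ins. It only illustrates that a template instantiated with a custom struct has no key under which such a factory could recreate it.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these are Shogun's real declarations.
enum class PrimitiveType { Float64, Int32, NotPrimitive };

struct Object {
    virtual ~Object() = default;
};

template <typename T>
struct TreeNode : Object {
    T data{};
};

// Custom payload: no PrimitiveType value stands for it.
struct CartNodeData {
    int attribute_id = -1;
};

// Simplified "class list": maps (class name, primitive type) to an empty instance.
std::unique_ptr<Object> create_empty(const std::string& name, PrimitiveType type) {
    if (name == "TreeNode") {
        switch (type) {
        case PrimitiveType::Float64:
            return std::unique_ptr<Object>(new TreeNode<double>());
        case PrimitiveType::Int32:
            return std::unique_ptr<Object>(new TreeNode<int>());
        case PrimitiveType::NotPrimitive:
            break; // TreeNode<CartNodeData> cannot be named through this key
        }
    }
    return nullptr;
}

void class_list_demo() {
    TreeNode<CartNodeData> node; // fine as plain C++ ...
    node.data.attribute_id = 3;
    assert(create_empty("TreeNode", PrimitiveType::Float64) != nullptr);
    // ... but the factory has no entry that could recreate it.
    assert(create_empty("TreeNode", PrimitiveType::NotPrimitive) == nullptr);
}
```

In this model, anything that needs an empty instance first (cloning, deserializing) fails for the custom-typed node, even though the node itself compiles.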

Workaround

A simple way to fix the problem, which requires only a bit of refactoring (unfortunately, all tree algorithms will need to be touched):

Instead of templating the tree nodes, we can use OOP with subclasses and virtual methods to represent nodes that behave the same but carry different data. Instead of a template member field T data, we can introduce a field CTreeNodeData* data. This class should inherit from CSGObject and register its parameters as usual. All the tree node data structs would then be converted into subclasses (with public member fields). This makes the algorithm refactoring straightforward.
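
A rough sketch of what this could look like, under stated assumptions: `Object` is a hypothetical stand-in for CSGObject, `create_empty` stands in for whatever mechanism produces an empty instance, the field names on `CartNodeData` are illustrative only, and `std::unique_ptr` is used here in place of the raw `CTreeNodeData*` mentioned above. Parameter registration is omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for CSGObject; the real class has a richer interface.
class Object {
public:
    virtual ~Object() = default;
    virtual std::string get_name() const = 0;
    // Stand-in for "create an empty instance", which cloning and
    // serialization rely on.
    virtual std::unique_ptr<Object> create_empty() const = 0;
};

// Common base for all node payloads (CTreeNodeData in the proposal).
class TreeNodeData : public Object {};

// Former plain struct, now a subclass with public fields.
// Field names are illustrative only.
class CartNodeData : public TreeNodeData {
public:
    int attribute_id = -1;
    double node_label = 0.0;

    std::string get_name() const override { return "CartNodeData"; }
    std::unique_ptr<Object> create_empty() const override {
        return std::unique_ptr<Object>(new CartNodeData());
    }
};

// Non-templated node: the payload is reached through a base-class pointer.
class TreeNode : public Object {
public:
    std::unique_ptr<TreeNodeData> data;

    std::string get_name() const override { return "TreeNode"; }
    std::unique_ptr<Object> create_empty() const override {
        return std::unique_ptr<Object>(new TreeNode());
    }
};

// An algorithm that knows its payload type downcasts to reach the fields.
CartNodeData& cart_data(TreeNode& node) {
    auto* d = dynamic_cast<CartNodeData*>(node.data.get());
    assert(d != nullptr && "node does not carry CART data");
    return *d;
}
```

With this shape, `TreeNode` is a single concrete class, and each payload can produce an empty instance of its own dynamic type through the virtual call, so no template argument has to be known up front. The cost is one downcast per access in each algorithm, which a helper like `cart_data` keeps in one place.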

This is a big patch, but conceptually quite simple. It is actually a cool exercise for GSoC students to learn a lot about some of the internals of Shogun.

@karlnapf
Member Author

Actually, this is quite high priority, so we will try to get this fixed ASAP.

@stale

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
@gf712
Member

gf712 commented Feb 26, 2020

@karlnapf @vigsterkr is this still an issue with the new serialisation framework?

@stale stale bot removed the stale label Feb 26, 2020
@vigsterkr
Member

it's done
