
Refactor GridSearch and parameter tree #4598

Closed
wants to merge 15 commits

Conversation

@gf712 (Member) commented Mar 31, 2019

Proposed API (in python):

OPTION 1: change nested object parameter values with a string

import numpy as np   
import shogun as sg   
kernel = sg.kernel("GaussianKernel")   
svm = sg.machine("LibSVM", kernel=kernel)   
ps = sg.GridParameters(svm) 
ps.attach("kernel::log_width", np.array([1., 2.])) 
ps.attach("kernel::combined_kernel_weight", np.array([1., 2.])) 
ps.attach("C1", np.array([1., 2.])) 
ps.attach("C2", np.array([1., 2.])) 
for x in range(16):
    print(ps.next_combination()) 
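For reference, the 16 iterations in option 1 correspond to the cartesian product of the four attached grids (2 × 2 × 2 × 2). A minimal standalone sketch of that counting, using only the standard library (no shogun; parameter names mirror the example above):

```python
# Sketch: option 1's combinations are the cartesian product of the
# four attached parameter grids, so 2 * 2 * 2 * 2 = 16 in total.
from itertools import product

grid = {
    "kernel::log_width": [1.0, 2.0],
    "kernel::combined_kernel_weight": [1.0, 2.0],
    "C1": [1.0, 2.0],
    "C2": [1.0, 2.0],
}

# one dict per combination, in next_combination() style order
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
assert len(combinations) == 16
```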

OPTION 2: change nested object parameters by manipulating the nested node directly, before adding it to the tree

import numpy as np   
import shogun as sg   
kernel = sg.kernel("GaussianKernel")   
svm = sg.machine("LibSVM")
ps = sg.GridParameters(svm) 
kernel1 = sg.kernel("PolyKernel") 
node1 = sg.GridParameters(kernel1) 
node1.attach("degree", np.array([1.,2,3])) 
ps.attach("kernel", node1) 
ps.attach("C1", np.array([1., 2.])) 
ps.attach("C2", np.array([1., 2.]))
for x in range(12):
    print(ps.next_combination())

OPTIONS 1 and 2 combined: both options can be mixed, but option 1 becomes unavailable when several nodes represent a single object, i.e. kernel=[GaussianKernel, PolyKernel]

import numpy as np 
import shogun as sg 

kernel1 = sg.kernel("GaussianKernel")  
kernel2 = sg.kernel("PolyKernel") 
svm = sg.machine("LibSVM", kernel=kernel1)

node = sg.GridParameters(svm)
node.attach("kernel::log_width", np.array([1.]))
node.attach("C1", np.array([1., 2., 3.])) 
node.attach("C2", np.array([1., 2., 3.])) 


node2=sg.GridParameters(kernel2)
node2.attach("degree", np.array([1.,2.]))

node.attach("kernel", node2)

for x in range(20):
    print(node.next_combination()) 
  • implement option 1
  • implement option 2
  • implement GridSearch engine
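One way to count the combinations in the combined example, assuming the intended semantics are "alternative sub-nodes add, independent parameters multiply" (an assumption about the design above, not shogun code; all names are illustrative):

```python
# Sketch: counting combinations in a parameter tree where one
# parameter slot ("kernel") can hold several alternative sub-nodes.
# Assumed semantics: alternatives add, independent parameters multiply.
def count_combinations(node):
    total = 1
    for values in node.get("params", {}).values():
        total *= len(values)
    for alternatives in node.get("children", {}).values():
        total *= sum(count_combinations(alt) for alt in alternatives)
    return total

tree = {
    "params": {"C1": [1.0, 2.0, 3.0], "C2": [1.0, 2.0, 3.0]},
    "children": {
        "kernel": [
            {"params": {"log_width": [1.0]}},    # GaussianKernel node
            {"params": {"degree": [1.0, 2.0]}},  # PolyKernel node
        ]
    },
}
print(count_combinations(tree))  # 3 * 3 * (1 + 2) = 27
```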

@karlnapf (Member) commented Apr 1, 2019

Would it be possible to merge GridParameters and GridSearch?

@gf712 (Member, Author) commented Apr 1, 2019

Working on that bit! But yes, that would work; I called it ModelSelectionTree for now

@gf712 gf712 force-pushed the parameter_search_refactor branch from 3495ea8 to a1bbf1d Compare April 1, 2019 14:13

private:
std::stringstream* m_stream;
};
gf712 (Member Author):

Reminder to self, once #4594 is done can put this back into place and use SGVector to_string method

Member:

merged :)

* @param ss
* @param visitor
*/
void to_string_helper(
gf712 (Member Author):

and when #4594 is done also refactor this

@karlnapf (Member) commented Apr 1, 2019

> Working on that bit! But yes that would work, I called it ModelSelectionTree for now

Actually I think GridSearch is a pretty good name :) No need to make "tree" part of it imo

@karlnapf (Member) commented Apr 1, 2019

I think you forgot to update the gpl submodule ...

@gf712 (Member, Author) commented Apr 1, 2019

@karlnapf this is how I see it working for now:

import numpy as np 
import shogun as sg 
kernel = sg.kernel("GaussianKernel") 
svm = sg.machine("LibSVM", kernel=kernel) 
ps = sg.ModelSelectionTree(svm) 
ps.attach("GaussianKernel::log_width", np.array([1., 2., 3.])) 
ps.attach("C1", np.array([1., 2., 3.])) 
ps.next_combination()
ps.next_combination()
...

And next_combination is called within GridSearch, so we would have:

import numpy as np 
import shogun as sg 
kernel = sg.kernel("GaussianKernel") 
svm = sg.machine("LibSVM", kernel=kernel) 
gs = sg.GridSearch(svm) 
gs.attach("GaussianKernel::log_width", np.array([1., 2., 3.])) 
gs.attach("C1", np.array([1., 2., 3.]))
gs.train(...)

Working on the second part now, where multiple kernels can be added.
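The flow sketched above (attach grids, then let train drive next_combination internally) could look roughly like this. This is a hypothetical standalone sketch of the idea, not the actual shogun API; all class and callback names are illustrative:

```python
# Hypothetical sketch of what a GridSearch.train loop could do:
# iterate every combination, build/configure a machine for each,
# evaluate it, and keep the best-scoring parameter set.
from itertools import product

class GridSearchSketch:
    def __init__(self, machine_factory, evaluate):
        self._grid = {}
        self._machine_factory = machine_factory  # builds a configured machine
        self._evaluate = evaluate                # returns a score, higher = better

    def attach(self, name, values):
        self._grid[name] = list(values)

    def train(self):
        best_score, best_params = float("-inf"), None
        names = list(self._grid)
        for values in product(*self._grid.values()):
            params = dict(zip(names, values))
            machine = self._machine_factory(params)
            score = self._evaluate(machine)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score

# toy usage: the "machine" is just the params dict, and the
# evaluation prefers C1 close to 2.0
gs = GridSearchSketch(lambda p: p, lambda m: -abs(m["C1"] - 2.0))
gs.attach("C1", [1.0, 2.0, 3.0])
best, score = gs.train()
assert best == {"C1": 2.0}
```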

@@ -10,6 +10,8 @@
%newobject CParameterCombination::leaf_sets_multiplication();
%newobject CModelSelectionParameters::get_combinations();
%newobject CModelSelectionParameters::get_single_combination();
%newobject ParameterNode::attach();
Member:

I get that we need attach, but why would we need to call next_combination from swig?

gf712 (Member Author):

that's just for debugging right now

@@ -300,5 +300,17 @@ PUT_ADD(CTokenizer)
%template(kernel) kernel<float64_t, float64_t>;
%template(features) features<float64_t>;

%template(attach) ParameterNode::attach<int32_t>;
Member:

best way to test this is to add a meta example

struct is_sg_matrix<T> : public std::true_type \
{ \
};
#define SG_ADD_TYPE(T, type_) \
Member:

maybe do this in a separate PR?

m_current_node = m_nodes.begin();
}

void ModelSelectionTree::reset()
Member:

what is that needed for?

gf712 (Member Author):

resets the internal state of a node, so that it starts again from the beginning when called from the parent node.
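That reset behaviour is the classic odometer pattern for nested iteration: when a child exhausts its values it resets, and the parent advances. A small standalone sketch of the pattern (not the actual ModelSelectionTree code; names are illustrative):

```python
# Sketch of the odometer pattern behind reset(): when a node's
# values are exhausted, it resets to its first state so the parent
# can advance its own value and restart the child.
class NodeSketch:
    def __init__(self, values):
        self._values = values
        self._i = 0

    def current(self):
        return self._values[self._i]

    def advance(self):
        """Move to the next value; on wrap-around, reset and return False."""
        self._i += 1
        if self._i == len(self._values):
            self.reset()
            return False
        return True

    def reset(self):
        self._i = 0

def combinations(nodes):
    while True:
        yield tuple(n.current() for n in nodes)
        # advance like an odometer: rightmost node first
        for node in reversed(nodes):
            if node.advance():
                break
        else:
            return  # every node wrapped: all combinations emitted

out = list(combinations([NodeSketch([1, 2]), NodeSketch(["a", "b", "c"])]))
assert len(out) == 6
```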


tree->attach(
param, *any_cast<decltype(val.begin())>(m_param_iter[param]));
SG_PRINT("CURRENT: %s, PARAM: %s\n", m_current_param.c_str(), param.c_str())
Member:

you can make those prints DEBUG and then leave them in actually, might be useful for future debugging?

gf712 (Member Author):

yup, I'll do that at the end; I just don't want to set it to debug level yet because then it becomes too verbose

* @param name
* @param node
*/
ParameterNode* attach(
Member:

I would hide most things from swig in here, e.g. this one

gf712 (Member Author):

This will be needed to add another GridSearch instance

ParameterNode* attach(const std::string& param, T value)
{
if (!set_param_helper(param, make_any(value)))
SG_SERROR("Could not attach %s", param.c_str());
Member:

if this is user facing, maybe put some more info in there? So if users chain multiple attachments, they know from the error message which one failed and can see what has been built so far?

namespace shogun
{

class ParameterNode : public CSGObject
Member:

I wonder ... does this need to be a subclass of SGObject? Would it ever need to be serialized? Would it ever need to be cloned/equal? Probably not, right?

namespace shogun
{

class ParameterNode : public CSGObject
Member:

It would be good to have some kind of high-level explanation of the parts involved here and how they play together, in a doxygen comment, for future development. Not now, but definitely later.

@karlnapf (Member) left a comment:

Say we wanted to do distributed grid searches. Would this design be more or less compatible? What about multi-core?

@karlnapf (Member) left a comment:

I think an example would be quite nice :)

Really cool that you pulled this off so quickly!

@gf712 (Member, Author) commented Apr 1, 2019

I will have to go through the reference to check how smart pointers and STL containers handle race conditions, but I think it should be fine

@karlnapf (Member) commented Apr 1, 2019

Also in terms of instantiating the combinations.... I would imagine that a central generator creates combinations which are then sent to workers to solve, and in particular multiple combinations need to be instantiated before it is clear which ones came back yet.... but ok, let's think about that later :)

@@ -613,6 +617,9 @@ class CSGObject
*/
#ifndef SWIG // SWIG should skip this part
std::map<std::string, std::shared_ptr<const AnyParameter>> get_params() const;

std::map<std::string, std::shared_ptr<const AnyParameter>> get_params(ParameterProperties) const;
Member:

what about having a default value that selects all.. i have this locally:
std::map<std::string, std::shared_ptr<const AnyParameter>> get_params(const ParameterProperties& p = ParameterProperties::ALL) const;

gf712 (Member Author):

yup, once that is merged I'll rebase. This will probably still take a while until it's done..

@gf712 (Member, Author) commented Apr 1, 2019

> Also in terms of instantiating the combinations.... I would imagine that a central generator creates combinations which are then sent to workers to solve, and in particular multiple combinations need to be instantiated before it is clear which ones came back yet.... but ok, let's think about that later :)

I guess we could keep a buffer that feeds the workers? So would always have a pool of at least N jobs left, if possible, where N >= number_of_workers.
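The buffered-generator idea could be sketched with a bounded queue that a central producer keeps filled while workers drain it. A hypothetical standard-library-only sketch, not shogun code (all names are illustrative):

```python
# Sketch of the buffered-generator idea: a producer keeps a bounded
# queue of pending parameter combinations that worker threads pull
# from, so there is always a pool of jobs ready.
import queue
import threading
from itertools import product

def run_grid(grids, evaluate, n_workers=4, buffer_size=8):
    jobs = queue.Queue(maxsize=buffer_size)  # the bounded job pool
    results = []
    lock = threading.Lock()

    def producer():
        names = list(grids)
        for values in product(*grids.values()):
            jobs.put(dict(zip(names, values)))
        for _ in range(n_workers):
            jobs.put(None)  # one poison pill per worker

    def worker():
        while (params := jobs.get()) is not None:
            score = evaluate(params)
            with lock:
                results.append((params, score))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(results, key=lambda r: r[1])  # best (params, score) pair

# toy usage: the evaluation prefers C1 close to 2.0
best = run_grid({"C1": [1.0, 2.0, 3.0]}, lambda p: -abs(p["C1"] - 2.0))
assert best[0]["C1"] == 2.0
```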

@gf712 gf712 force-pushed the parameter_search_refactor branch 7 times, most recently from 2d49dfb to 4d2c9df Compare April 2, 2019 12:15
eval_machine->get<CMachine*>("machine")->to_string().c_str())
}

// note that this may implicitly lock and unlock the machine
Member:

I think we should totally remove this concept of locking machines for now tbh. We could add that back later. It is also not something that would be sitting inside GridSearch, but rather in the cross-validation codes, since that is where the issue of precomputing stuff when re-training multiple times using the same parameters happens

SG_UNREF(result);
}

if (verbose)
Member:

Would remove this verbose option. We can soon just make it observable ...

@@ -312,9 +295,7 @@ namespace shogun
*/
CPipelineBuilder* pipeline()
{
auto result = new CPipelineBuilder();
Member:

this is also probably material for another PR, not this one :)

gf712 (Member Author):

yup, I just needed to fix this because it was causing "false positive" memory leaks

@karlnapf (Member) left a comment:

I made a whole bunch of comments. Most things are minor, but there is the thing with cloning the objects that makes me worry a bit.

@gf712 gf712 force-pushed the parameter_search_refactor branch from b354917 to a3d2d80 Compare April 17, 2019 12:10
c2_param[0] = 0.1
c2_param[1] = 1
c2_param[2] = 10
RealVector c1_param([0.1, 1.0, 10.0])
Member:

🚀

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
@gf712 (Member, Author) commented Feb 26, 2020

keeping it alive for another 180 days, might pick up during GSoC

@stale stale bot removed the stale label Feb 26, 2020
@gf712 (Member, Author) commented May 20, 2020

for reference, here is a much simpler cartesian product implementation where we can just use visitor patterns (std::string becomes shogun::Any):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

std::vector<std::vector<std::string>> cartesian_product(
    const std::vector<std::vector<std::string>>& inputs) {
    std::vector<std::vector<std::string>> result;
    if (std::all_of(inputs.begin(), inputs.end(), [](const auto& el) {return el.empty();})) {
        return result;
    }

    for (auto& el: inputs.front()) {
        result.push_back({el});
    }

    auto size = std::accumulate(inputs.begin(), inputs.end(), 1,
        [](const auto& lhs, const auto& rhs){
            return lhs * rhs.size();
    });

    std::vector<std::vector<std::string>> temp;
    temp.reserve(size);

    for (std::size_t i = 1; i < inputs.size(); ++i) {
        temp.clear();
        for (const auto& e: result) {
            for (const auto& f: inputs[i]) {
                temp.emplace_back(e).push_back(f);
            }
        }
        result = std::move(temp);
    }
    return result;
}

int main()
{
    std::vector<std::string> a {"a1","a2","a3"};
    std::vector<std::string> b {"b1","b2","b3"};
    std::vector<std::string> c {"c1","c2"};

    auto result = cartesian_product({a, b, c});

    std::cout << "Result: \n";
    for (const auto& vec: result) {
        for (const auto& el: vec)
            std::cout << el << ',';
        std::cout << '\n';
    }
}

The generator-style approach is much nicer, but it is currently flawed and would need more fixing.
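For comparison, the generator-style approach mentioned above can be written lazily in a few lines of Python; combinations are produced one at a time instead of being materialised up front. A standalone sketch, not tied to shogun (strings stand in for shogun::Any here too):

```python
# Sketch of the lazy, generator-style cartesian product: yields one
# combination at a time rather than building the full result list.
def cartesian_product(inputs):
    if not inputs:
        return
    head, *tail = inputs
    for el in head:
        if tail:
            for rest in cartesian_product(tail):
                yield [el] + rest
        else:
            yield [el]

a = ["a1", "a2", "a3"]
b = ["b1", "b2", "b3"]
c = ["c1", "c2"]
result = list(cartesian_product([a, b, c]))
assert len(result) == 18  # 3 * 3 * 2
```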

stale bot commented Nov 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 16, 2020
stale bot commented Nov 23, 2020

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this Nov 23, 2020
3 participants