Refactor GridSearch and parameter tree #4598
Would it be possible to merge …
Working on that bit! But yes, that would work. I called it ModelSelectionTree for now.
3495ea8 to a1bbf1d
src/shogun/base/SGObject.h
```cpp
private:
    std::stringstream* m_stream;
};
```
Reminder to self: once #4594 is done, I can put this back into place and use the SGVector to_string method.
merged :)
```cpp
 * @param ss
 * @param visitor
 */
void to_string_helper(
```
And when #4594 is done, refactor this too.
Actually, I think GridSearch is a pretty good name :) No need to make "tree" part of it, imo.
I think you forgot to update the gpl submodule ...
@karlnapf

```python
import numpy as np
import shogun as sg

kernel = sg.kernel("GaussianKernel")
svm = sg.machine("LibSVM", kernel=kernel)
ps = sg.ModelSelectionTree(svm)
ps.attach("GaussianKernel::log_width", np.array([1., 2., 3.]))
ps.attach("C1", np.array([1., 2., 3.]))
ps.next_combination()
ps.next_combination()
```

... and

```python
import numpy as np
import shogun as sg

kernel = sg.kernel("GaussianKernel")
svm = sg.machine("LibSVM", kernel=kernel)
gs = sg.GridSearch(svm)
gs.attach("GaussianKernel::log_width", np.array([1., 2., 3.]))
gs.attach("C1", np.array([1., 2., 3.]))
gs.train(...)
```

Working on the second part now, where multiple kernels could be added.
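A minimal pure-Python sketch of the `attach`/`next_combination` behaviour in the example above, assuming a flat grid that walks the Cartesian product of the attached value lists with the last attached parameter varying fastest. `GridSketch` is a hypothetical stand-in, not Shogun's `ModelSelectionTree` or `GridSearch`, and it only returns parameter dicts rather than setting them on a machine.

```python
import itertools
from typing import Any, Dict, Iterator, List, Optional, Tuple


class GridSketch:
    """Hypothetical flat grid: attach candidate values, then step through combinations."""

    def __init__(self) -> None:
        self._values: Dict[str, List[Any]] = {}
        self._iter: Optional[Iterator[Tuple[Any, ...]]] = None

    def attach(self, name: str, values) -> "GridSketch":
        # Store a copy of the candidates; attaching restarts any running iteration.
        self._values[name] = list(values)
        self._iter = None
        return self  # returning self allows chained attach calls

    def next_combination(self) -> Optional[Dict[str, Any]]:
        # Build the Cartesian product lazily; the last attached parameter varies fastest.
        if self._iter is None:
            self._iter = itertools.product(*self._values.values())
        combo = next(self._iter, None)
        if combo is None:
            return None  # grid exhausted
        return dict(zip(self._values, combo))


# Usage mirroring the example above (plain lists instead of numpy arrays):
grid = GridSketch()
grid.attach("GaussianKernel::log_width", [1.0, 2.0, 3.0]).attach("C1", [1.0, 2.0, 3.0])
print(grid.next_combination())  # {'GaussianKernel::log_width': 1.0, 'C1': 1.0}
print(grid.next_combination())  # {'GaussianKernel::log_width': 1.0, 'C1': 2.0}
```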
```
@@ -10,6 +10,8 @@
%newobject CParameterCombination::leaf_sets_multiplication();
%newobject CModelSelectionParameters::get_combinations();
%newobject CModelSelectionParameters::get_single_combination();
%newobject ParameterNode::attach();
```
I get that we need attach, but why would we need to call next_combination from swig?
that's just for debugging right now
```
@@ -300,5 +300,17 @@ PUT_ADD(CTokenizer)
%template(kernel) kernel<float64_t, float64_t>;
%template(features) features<float64_t>;

%template(attach) ParameterNode::attach<int32_t>;
```
The best way to test this is to add a meta example.
```cpp
struct is_sg_matrix<T> : public std::true_type \
{ \
};
#define SG_ADD_TYPE(T, type_) \
```
Maybe do this in a separate PR?
```cpp
    m_current_node = m_nodes.begin();
}

void ModelSelectionTree::reset()
```
what is that needed for?
It resets the internal state of a node, so that it starts again from the beginning when called from the parent node.
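To illustrate why a `reset()` is needed, here is a sketch of nested nodes stepping like an odometer, assuming each node owns one parameter's candidate values plus a list of child nodes: when a child runs out of values, the parent resets it and advances the next position. `NodeSketch`, `advance`, and `all_combinations` are hypothetical names; this is not the PR's C++ implementation.

```python
from typing import Any, Dict, List, Optional


class NodeSketch:
    """Hypothetical tree node: one parameter's candidate values plus child nodes."""

    def __init__(self, name: str, values: List[Any],
                 children: Optional[List["NodeSketch"]] = None) -> None:
        if not values:
            raise ValueError("a node needs at least one candidate value")
        self.name = name
        self.values = list(values)
        self.children = list(children or [])
        self.reset()

    def reset(self) -> None:
        # Rewind this node and all of its descendants to their first value.
        self.index = 0
        for child in self.children:
            child.reset()

    def current(self) -> Dict[str, Any]:
        # Merge this node's current value with the current values of its subtree.
        combo = {self.name: self.values[self.index]}
        for child in self.children:
            combo.update(child.current())
        return combo

    def advance(self) -> bool:
        # Odometer step: try the last child first; an exhausted child is reset and
        # the carry moves to the previous child, and finally to this node itself.
        for child in reversed(self.children):
            if child.advance():
                return True
            child.reset()
        self.index += 1
        return self.index < len(self.values)


def all_combinations(root: NodeSketch) -> List[Dict[str, Any]]:
    root.reset()
    combos = [root.current()]
    while root.advance():
        combos.append(root.current())
    return combos


# Usage: a machine parameter with a nested kernel parameter gives 2 * 3 combinations.
tree = NodeSketch("C1", [1.0, 2.0], [NodeSketch("log_width", [0.5, 1.0, 2.0])])
print(len(all_combinations(tree)))  # 6
```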
```cpp
tree->attach(
    param, *any_cast<decltype(val.begin())>(m_param_iter[param]));
SG_PRINT("CURRENT: %s, PARAM: %s\n", m_current_param.c_str(), param.c_str())
```
You can make those prints DEBUG and leave them in, actually; they might be useful for future debugging.
Yup, I'll do that at the end. I just don't want to set the log level to debug right now, because then the output becomes too verbose.
```cpp
 * @param name
 * @param node
 */
ParameterNode* attach(
```
I would hide most things in here from SWIG, e.g. this one.
This will be needed to add another GridSearch instance
```cpp
ParameterNode* attach(const std::string& param, T value)
{
    if (!set_param_helper(param, make_any(value)))
        SG_SERROR("Could not attach %s", param.c_str());
```
If this is user-facing, maybe put more information in there? That way, if users chain multiple attachments, the error message tells them which one failed and what has been built so far.
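A sketch of the kind of error message suggested here, assuming the node knows which parameters its object accepts. `ChainSketch` and `AttachError` are hypothetical names; the point is only that the message names the failing parameter, its position in the chain, and what has already been attached.

```python
from typing import Any, Dict, Iterable, List


class AttachError(ValueError):
    """Hypothetical error raised when one attachment in a chain fails."""


class ChainSketch:
    """Hypothetical builder that validates parameter names as they are attached."""

    def __init__(self, object_name: str, known_params: Iterable[str]) -> None:
        self.object_name = object_name
        self.known_params = set(known_params)
        self.attached: Dict[str, List[Any]] = {}

    def attach(self, name: str, values: Iterable[Any]) -> "ChainSketch":
        if name not in self.known_params:
            built = ", ".join(self.attached) or "<nothing>"
            raise AttachError(
                f"Could not attach '{name}' to {self.object_name} "
                f"(attachment #{len(self.attached) + 1}); "
                f"already attached: {built}; "
                f"available parameters: {', '.join(sorted(self.known_params))}"
            )
        self.attached[name] = list(values)
        return self


# Usage: the second call in the chain fails, and the message says which one and why.
try:
    ChainSketch("LibSVM", ["C1", "C2"]).attach("C1", [1.0, 2.0]).attach("C3", [1.0])
except AttachError as err:
    print(err)
```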
```cpp
namespace shogun
{

class ParameterNode : public CSGObject
```
I wonder: does this need to be a subclass of SGObject? Would it ever need to be serialized, cloned, or compared for equality? Probably not, right?
It would be good to have a high-level explanation of the parts involved here and how they play together, in a Doxygen comment, for future development. Not now, but definitely later.
Say we wanted to do distributed grid searches. Would this design be more or less compatible? What about multi-core?
I think an example would be quite nice :)
Really cool that you pulled this off so quickly!
I will have to go through the reference to check how smart pointers and STL containers handle race conditions, but I think it should be fine.
Also, in terms of instantiating the combinations: I would imagine that a central generator creates combinations that are then sent to workers to solve, so multiple combinations need to be instantiated before it is clear which ones have come back. But OK, let's think about that later :)
```
@@ -613,6 +617,9 @@ class CSGObject
 */
#ifndef SWIG // SWIG should skip this part
std::map<std::string, std::shared_ptr<const AnyParameter>> get_params() const;

std::map<std::string, std::shared_ptr<const AnyParameter>> get_params(ParameterProperties) const;
```
What about having a default value that selects all? I have this locally:

```cpp
std::map<std::string, std::shared_ptr<const AnyParameter>> get_params(const ParameterProperties& p = ParameterProperties::ALL) const;
```
Yup, once that is merged I'll rebase. This will probably still take a while until it's done.
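For comparison, a Python analogue of the suggested default argument, assuming `ParameterProperties` behaves like a bit flag. The member names other than `ALL`, and the registry layout, are assumptions rather than Shogun's actual definitions. The default `ALL` returns every parameter, while a narrower flag keeps only parameters carrying at least one of the requested bits.

```python
from enum import Flag
from typing import Any, Dict, Tuple


class ParamProps(Flag):
    # Hypothetical bits; Shogun's real ParameterProperties members may differ.
    NONE = 0
    HYPER = 1
    GRADIENT = 2
    MODEL = 4
    ALL = HYPER | GRADIENT | MODEL


Registry = Dict[str, Tuple[Any, ParamProps]]


def get_params(registry: Registry, props: ParamProps = ParamProps.ALL) -> Dict[str, Any]:
    # Default: ALL returns every parameter, including ones without any property bit.
    if props == ParamProps.ALL:
        return {name: value for name, (value, _) in registry.items()}
    # Otherwise keep parameters that carry at least one of the requested bits.
    return {name: value for name, (value, p) in registry.items() if p & props}


registry: Registry = {
    "C1": (1.0, ParamProps.HYPER),
    "log_width": (0.0, ParamProps.HYPER | ParamProps.GRADIENT),
    "w": ([0.1, 0.2], ParamProps.MODEL),
    "seed": (42, ParamProps.NONE),
}
print(sorted(get_params(registry)))                    # ['C1', 'log_width', 'seed', 'w']
print(sorted(get_params(registry, ParamProps.HYPER)))  # ['C1', 'log_width']
```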
I guess we could keep a buffer that feeds the workers, so that there would always be a pool of at least N pending jobs, if possible, where N >= number_of_workers.
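A sketch of the bounded buffer idea, assuming combinations come from a lazy generator and that local threads stand in for remote workers. `bounded_grid_search`, `evaluate`, and `buffer_size` are hypothetical names. At most `buffer_size` jobs are in flight, and a new combination is pulled from the generator only when a result comes back.

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, Iterable, List, Tuple

Combo = Dict[str, Any]


def bounded_grid_search(combinations: Iterable[Combo],
                        evaluate: Callable[[Combo], float],
                        n_workers: int = 4,
                        buffer_size: int = 8) -> List[Tuple[Combo, float]]:
    """Evaluate combinations with at most buffer_size jobs in flight (hypothetical helper)."""
    if buffer_size < n_workers:
        raise ValueError("buffer_size should be >= n_workers so that no worker idles")
    pending = iter(combinations)
    results: List[Tuple[Combo, float]] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Fill the buffer up front, pulling only buffer_size items from the generator.
        in_flight: Dict[Future, Combo] = {
            pool.submit(evaluate, c): c for c in itertools.islice(pending, buffer_size)
        }
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                combo = in_flight.pop(fut)
                results.append((combo, fut.result()))
                # Refill: one finished job makes room for one new combination.
                for nxt in itertools.islice(pending, 1):
                    in_flight[pool.submit(evaluate, nxt)] = nxt
    return results


# Usage: the generator is consumed incrementally, never materialised as a full list.
grid = ({"C": c, "w": w} for c in [1, 2, 3] for w in [10, 20])
results = bounded_grid_search(grid, lambda p: p["C"] * p["w"], n_workers=2, buffer_size=3)
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)  # {'C': 3, 'w': 20} 60
```

Results arrive in completion order, so any "best" selection has to be done after collection, as in the usage line.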
2d49dfb to 4d2c9df
```cpp
        eval_machine->get<CMachine*>("machine")->to_string().c_str())
}

// note that this may implicitly lock and unlock the machine
```
I think we should remove this concept of locking machines entirely for now, tbh; we could add it back later. It also does not belong inside GridSearch but rather in the cross-validation code, since that is where the issue of precomputing things when re-training multiple times with the same parameters arises.
```cpp
    SG_UNREF(result);
}

if (verbose)
```
I would remove this verbose option. Soon we can just make it observable.
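A minimal sketch of what "making it observable" could look like in place of a `verbose` flag, assuming subscribers are plain callbacks that receive one event per evaluated combination. `ObservableSearch`, `subscribe`, and the event fields are hypothetical and are not Shogun's observer API.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Event = Dict[str, Any]


class ObservableSearch:
    """Hypothetical search loop that emits events instead of printing when verbose."""

    def __init__(self) -> None:
        self._observers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._observers.append(callback)

    def _emit(self, event: Event) -> None:
        for callback in self._observers:
            callback(event)

    def run(self, combinations: Iterable[Dict[str, Any]],
            evaluate: Callable[[Dict[str, Any]], float]) -> Optional[Tuple[Dict[str, Any], float]]:
        best = None
        for step, combo in enumerate(combinations):
            score = evaluate(combo)
            # Whoever cares (a printer, a logger, a plot) subscribes; no verbose flag needed.
            self._emit({"step": step, "params": combo, "score": score})
            if best is None or score > best[1]:
                best = (combo, score)
        return best


# Usage: a printing subscriber replaces the old verbose output.
search = ObservableSearch()
search.subscribe(lambda e: print(f"step {e['step']}: {e['params']} -> {e['score']}"))
best = search.run([{"C": 1}, {"C": 5}, {"C": 2}], lambda p: -abs(p["C"] - 4))
```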
src/shogun/util/factory.h
```cpp
@@ -312,9 +295,7 @@ namespace shogun
 */
CPipelineBuilder* pipeline()
{
    auto result = new CPipelineBuilder();
```
this is also probably material for another PR, not this one :)
yup, I just needed to fix this because it was causing "false positive" memory leaks
I made a whole bunch of comments. Most things are minor, but there is the thing with cloning the objects that makes me worry a bit.
Added a 'new' keyword to the meta language as an alternative to `wrap` for non-CSGObject classes.
MachineEvaluation is still leaking
b354917 to a3d2d80
```
c2_param[0] = 0.1
c2_param[1] = 1
c2_param[2] = 10
RealVector c1_param([0.1, 1.0, 10.0])
```
🚀
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Keeping it alive for another 180 days; might pick this up during GSoC.
For reference, here is a much simpler Cartesian product implementation, where we can just use visitor patterns:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Requires C++17 (emplace_back returns a reference to the new element).
std::vector<std::vector<std::string>> cartesian_product(
    const std::vector<std::vector<std::string>>& inputs) {
    std::vector<std::vector<std::string>> result;
    // The product is empty if there are no inputs or if any input is empty.
    if (inputs.empty() ||
        std::any_of(inputs.begin(), inputs.end(), [](const auto& el) { return el.empty(); })) {
        return result;
    }
    // Seed with one single-element combination per value of the first input.
    for (const auto& el : inputs.front()) {
        result.push_back({el});
    }
    // Final number of combinations: an upper bound used to reserve each pass's buffer.
    auto size = std::accumulate(inputs.begin(), inputs.end(), std::size_t{1},
        [](std::size_t lhs, const auto& rhs) { return lhs * rhs.size(); });
    std::vector<std::vector<std::string>> temp;
    for (std::size_t i = 1; i < inputs.size(); ++i) {
        temp.clear();  // also valid on a moved-from vector
        temp.reserve(size);
        // Extend every partial combination with each value of the i-th input.
        for (const auto& e : result) {
            for (const auto& f : inputs[i]) {
                temp.emplace_back(e).push_back(f);
            }
        }
        result = std::move(temp);
    }
    return result;
}

int main()
{
    std::vector<std::string> a {"a1", "a2", "a3"};
    std::vector<std::string> b {"b1", "b2", "b3"};
    std::vector<std::string> c {"c1", "c2"};
    auto result = cartesian_product({a, b, c});
    std::cout << "Result: \n";
    for (const auto& vec : result) {
        for (const auto& el : vec)
            std::cout << el << ',';
        std::cout << '\n';
    }
}
```

The generator-style approach is much nicer, but it is currently flawed and would need more fixing.
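For the generator-style direction, a Python sketch of a lazy Cartesian product that yields one combination at a time through an odometer over indices, instead of materialising the whole result like the C++ version. `lazy_product` is a hypothetical name and an illustrative stand-in, not the generator from this PR; in plain Python, `itertools.product` already provides this.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def lazy_product(inputs: Sequence[Sequence[T]]) -> Iterator[Tuple[T, ...]]:
    # Empty if there are no inputs or any input is empty (matches the C++ version).
    if not inputs or any(len(seq) == 0 for seq in inputs):
        return
    idx = [0] * len(inputs)
    while True:
        yield tuple(seq[i] for seq, i in zip(inputs, idx))
        # Odometer increment: bump the last position, carrying leftwards on overflow.
        pos = len(inputs) - 1
        while pos >= 0:
            idx[pos] += 1
            if idx[pos] < len(inputs[pos]):
                break
            idx[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every position wrapped around: all combinations produced


a = ["a1", "a2", "a3"]
b = ["b1", "b2", "b3"]
c = ["c1", "c2"]
for combo in lazy_product([a, b, c]):
    print(",".join(combo))
```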
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is now being closed due to a lack of activity. Feel free to reopen it.
Proposed API (in Python):
OPTION 1: change nested object parameter values with a string
OPTION 2: change nested object parameters by manipulating the nested node directly, before adding it to the tree
OPTION 1 and 2: combine both options, but option 1 becomes unavailable when there are several nodes representing a single object, e.g. kernel=[GaussianKernel, PolyKernel]
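A sketch of how OPTION 1 and OPTION 2 could coexist, assuming string paths of the form `param::child_param` resolve through parameter names and that a node can hold a list of child nodes per parameter. `Node`, `attach`, and `attach_node` are hypothetical names, not the proposed Shogun API. The string form raises once a parameter holds more than one node, which is exactly the `kernel=[GaussianKernel, PolyKernel]` case.

```python
from typing import Any, Dict, List


class Node:
    """Hypothetical parameter-tree node: one object plus candidate values for its parameters."""

    def __init__(self, type_name: str, **values: List[Any]) -> None:
        self.type_name = type_name
        self.values: Dict[str, List[Any]] = {k: list(v) for k, v in values.items()}
        self.children: Dict[str, List["Node"]] = {}

    def attach_node(self, param: str, node: "Node") -> "Node":
        # OPTION 2: configure the nested node first, then hang it under a parameter.
        self.children.setdefault(param, []).append(node)
        return self

    def attach(self, path: str, values: List[Any]) -> "Node":
        # OPTION 1: reach a nested parameter through a "param::child_param" string.
        head, sep, rest = path.partition("::")
        if not sep:
            self.values[head] = list(values)
            return self
        nodes = self.children.get(head, [])
        if len(nodes) != 1:
            # Missing, or ambiguous as in kernel=[GaussianKernel, PolyKernel]:
            # the string cannot say which node is meant, so use OPTION 2 instead.
            raise ValueError(f"'{head}' holds {len(nodes)} nodes; attach to the node directly")
        nodes[0].attach(rest, values)
        return self


svm = Node("LibSVM", C1=[1.0, 2.0, 3.0])
svm.attach_node("kernel", Node("GaussianKernel"))
svm.attach("kernel::log_width", [1.0, 2.0, 3.0])  # OPTION 1 works: one kernel node
svm.attach_node("kernel", Node("PolyKernel", degree=[2, 3]))  # OPTION 2
```

After the second kernel node is added, `svm.attach("kernel::log_width", ...)` raises, so further changes have to go through the node objects directly.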