New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String enum mapping #4537
String enum mapping #4537
Conversation
a8ecb82
to
0ff60d7
Compare
Minor: It would be cool if the visitor pattern that prints the instances would also look up the string for the enum so that users would see that rather than the int |
Same with get actually, so that the enums would be strings only in the interfaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start! I made a few comments on the user perspective before doing a review. Think it only needs minor tweaking
src/interfaces/swig/SGBase.i
Outdated
|
||
auto enum_to_string_map = $self->get_enum_to_string_map(); | ||
|
||
if (enum_to_string_map.find($self->get_name()) == enum_to_string_map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I’d remove this and just print and error if the requested parameter is not an enum type
@@ -423,6 +423,27 @@ namespace shogun | |||
SG_SERROR("There is no parameter called '%s' in %s", name.c_str(), $self->get_name()); | |||
} | |||
|
|||
std::vector<std::string> param_options(const std::string& name) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems good like this
src/shogun/base/SGObject.h
Outdated
@@ -49,6 +50,8 @@ struct TParameter; | |||
template <class T> class DynArray; | |||
template <class T> class SGStringList; | |||
|
|||
using stringToEnumMapType = std::unordered_map<std::string, std::unordered_map<std::string, std::unordered_map<std::string, machine_int_t>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
src/shogun/base/SGObject.h
Outdated
template <class T> | ||
void put(const std::string& name, const std::string& value) noexcept(false) | ||
{ | ||
if (m_enum_to_string_map.find(get_name()) == m_enum_to_string_map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would remove and just keep the below error. Same as above, only confuses users, it’s more an internal thing
src/shogun/base/SGObject.h
Outdated
auto string_to_enum = string_to_enum_map[name]; | ||
|
||
if (string_to_enum.find(value) == string_to_enum.end()) | ||
SG_ERROR("There is no enum mapping for '%s' in %s::%s", value, get_name(), name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking this further: I wouldnt even talk about enum mappings, but just “options”. Users don’t need to know that there is an underlying enum
(machine_int_t*)&m_liblinear_regression_type, | ||
"liblinear_regression_type", "Type of LibLinear regression."); | ||
m_enum_to_string_map[get_name()]["liblinear_regression_type"] = { | ||
{"L2R_L2LOSS_SVR", 0}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the enum value here rather than the int to make it readable, and they are available here
SG_ADD( | ||
(machine_int_t*)&m_liblinear_regression_type, | ||
"liblinear_regression_type", "Type of LibLinear regression."); | ||
m_enum_to_string_map[get_name()]["liblinear_regression_type"] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since this happens at runtime now, we could rather have one options map for each class, stored locally, and then a method “add_option(string, int)” that is called this method here. Or even better
“add_options(string param, map<string, int>)” which then can be called by directly writing out the map as {“foo”: SOLVER::FOO, “bar”: SOLVER::BAR} in a single call
I didn’t realise yesterday that the map doesn’t need to be gobal nor static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good point, I didn't quite like the fact that there was this static variable around! So add_options(string param, map<string, int>)
just sits in SGObject, and then just need a mapping from param to string_enum_map, which makes it much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe even: Do you think using a macro, we could avoid having to write the enum value name AND a string ?
Like
ADD_OPTION(param_name, FOO); which adds the mapping “FOO”->FOO where FOO is one option of the enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we’d have to write things one by one but on the other hand it avoids the redundancy of explicitly naming the option string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to pass the literal variable name in a macro yes. It's a shame there isn't a universal name demangling across compilers, would be quite useful in this case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, c++ lacks quite some things ... 😱
@@ -147,7 +147,7 @@ class CLibLinearRegression : public CLinearMachine | |||
protected: | |||
|
|||
/** train machine */ | |||
virtual bool train_machine(CFeatures* data = NULL); | |||
bool train_machine(CFeatures* data = NULL) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing using override and not using it causes warnings iirc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, but at some point would probably start adding override no? Doesn't it come at a cost when you just add more and more virtual class members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes at some point. I would just leave this for another or or the detox gsoc project
4fd2301
to
27017a1
Compare
src/interfaces/swig/SGBase.i
Outdated
|
||
if (param_to_enum_map.find(name) == param_to_enum_map.end()) | ||
{ | ||
SG_SERROR("There are no options for the parameter '%s'", name.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super minor: maybe " ..for parameter %s::%s" and print classname
src/shogun/base/SGObject.h
Outdated
@@ -100,6 +103,16 @@ template <class T> class SGStringList; | |||
|
|||
#define SG_ADD(...) VARARG(SG_ADD, __VA_ARGS__) | |||
|
|||
#define VALUE_TO_STRING_MACRO(s) #s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long live the preprocessor :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is so funny that this old school stuff still is necessary/useful in places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes unfortunately this is the simplest way..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like it might be possible to demangle it https://github.com/google/glog/blob/5c576f78c49b28d89b23fbb1fc80f54c879ec02e/src/demangle.cc#L1042, but would take a while to strip out the code we would need
src/shogun/base/SGObject.h
Outdated
m_string_to_enum_map[param_name][VALUE_TO_STRING_MACRO(enum_value)] = \ | ||
enum_value; \ | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls undefine your helper macro after having used it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can't :( the macro is needed by SG_ADD_OPTION
, which is used later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, well ok
src/shogun/base/SGObject.h
Outdated
if (string_to_enum.find(value) == string_to_enum.end()) | ||
{ | ||
SG_ERROR( | ||
"There is no enum mapping for '%s' in %s::%s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you rather say something like "illegal option %s for parameter %s::%s" ... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! Was just trying to follow the pattern from the other error messages
(machine_int_t*)&m_liblinear_regression_type, | ||
"liblinear_regression_type", "Type of LibLinear regression."); | ||
SG_ADD_OPTION("liblinear_regression_type", L2R_L2LOSS_SVR); | ||
SG_ADD_OPTION("liblinear_regression_type", L2R_L1LOSS_SVR_DUAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace going wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry about this, but what about actually merging all those lines into one with a multi-argument macro?
SG_ADD_OPTION("liblinear_regression_type", L2R_L2LOSS_SVR, L2R_L1LOSS_SVR_DUAL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that would work though. Because you would then have a recursive macro, which then needs to be manually coded, i.e. SG_ADD_OPTION_1
, SG_ADD_OPTION_2
, SG_ADD_OPTION_3
right?
6955e6d
to
26bfb3c
Compare
I think given that we only ever have a few options, why not do the recursive manual macro thing somewhere hidden? Say for up to 5 for a start? It’s not pretty but it’s only at one place whereas the repeated parametername would be in dozens of places |
So for the C++ API we would still get an enum/int with get? |
hmmm ok, I'll write something for that and think if there is something nicer we could do here which would be more sustainable |
@@ -958,7 +958,19 @@ std::string CSGObject::to_string() const | |||
{ | |||
ss << it->first.name() << "="; | |||
auto value = it->second.get_value(); | |||
if (value.visitable()) | |||
if (m_string_to_enum_map.find(it->first.name()) != |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlnapf is this fine for the object representation? It works fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In c++ it would have to be int as the enum definitions go hide in the plugins. So quite unsafe actually. No harm in keeping the get for that for now but also offer the one that returns a string.... need to see how it plays out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the output look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LibLinearRegression(C=1,bias=0,data_locked=false,features=null,labels=null,liblinear_regression_type=L2R_L2LOSS_SVR,m_tube_epsilon=0.01,max_iter=10000,max_train_time=0,solver_type=0,store_model_features=false,use_bias=false,w=Vector<double>(0): [])
@karlnapf I don't really like the fact that we need to continuously add more macros, but this works for now. Until c++ has support for enum to string conversions, or we add such a library, or write our own implementation, this will have to do... |
|
||
#define VARARG_IMPL2(base, count, ...) base##count(__VA_ARGS__) | ||
#define VARARG_IMPL(base, count, ...) VARARG_IMPL2(base, count, __VA_ARGS__) | ||
#define VARARG(base, ...) VARARG_IMPL(base, VA_NARGS(__VA_ARGS__), __VA_ARGS__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlnapf I added these macros to this header because I need them for SG_ADD_OPTIONS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing
89fecab
to
8df33da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things need to be polished and then it’s good from my side. Maybe request a review from @lisitsyn since I discussed this quite a bit with him
SG_ADD(&m_linear_term, "linear_term", "Linear Term"); | ||
SG_ADD( | ||
(machine_int_t*)&liblinear_solver_type, "liblinear_solver_type", | ||
"Type of LibLinear solver."); | ||
SG_ADD_OPTIONS( | ||
"liblinear_solver_type", L2R_LR, L2R_L2LOSS_SVC_DUAL, L2R_L2LOSS_SVC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn’t look too bad now! :)
src/shogun/base/macros.h
Outdated
{ \ | ||
static_assert( \ | ||
std::is_enum<decltype(enum_value)>::value, "Expected an enum!"); \ | ||
m_string_to_enum_map[param_name][VALUE_TO_STRING_MACRO(enum_value)] = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also do a runtime assertion that the parameter was actually registered, so that only options for already registered parameters can be added.
Another point would be to assert that the registered parameter is in fact a machine_int_t so the we cannot register options for other types of parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with asserting if a parameter is of type machine_int_t
is that we can't tell the difference from type int32_t
, so the check will not work all the time..
I'm also trying to get the tag dispatching to work such that it is possible to get a string for the enums instead of a |
I am not sure how to get |
5270249
to
38abdf7
Compare
Re asserting the correct type I think one way might be to actually combined the registration with adding the option. Then we can force the right enum type and inside hat method cast it to int, register ad parameter and then attach the options. See my point? |
Re the get o think first checking the map and then the parameters might be ok no? |
yup, I was thinking about something along those lines, but then will have to change SG_ADD a bit, and agree on a good signature.
But need to have a get that has accepts two types: the cast and then the return type, because in this case the value is casted to an int, but then we need to lookup in the map and return a string. this is easy with partial specialisation of get, but SWIG doesn't support that. what I had (and worked fine) was template <typename T, typename U = void, typename std::enable_if_t<std::is_same<T, machine_int_t>::value && std::is_same<U, std::string>::value>* = nullptr>
std::string get(const std::string& name) const noexcept(false)
{
Tag<T> tag(name);
if (m_string_to_enum_map.find(name) != m_string_to_enum_map.end())
{
return string_enum_reverse_lookup(name, get(tag));
}
else
{
SG_ERROR("There are no options in %s::%s", get_name(), name.c_str());
}
return nullptr;
} and then for example in the to_string I had something like this:
but this can't be templated in shogun.i along with the other |
fixed indents
- added macro to add enums - cleanup error messages - fixed SG_ERROR
unlike other getters, this one requests an int in the casting and them looks up the string representation and returns std::string type
msvc fix
7224abe
to
751a0f9
Compare
751a0f9
to
c7e485e
Compare
@karlnapf it seems to be working fine now on all platforms. the only issue is that linear regression is sometimes causing issues on clang, but that is an unrelated issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, good user experience improvement!
* added enum mapping * enum mapping example in LibLinearRegression * print overload for enums * SG_ADD_OPTIONS macro * added enum getter * added checks for int to enum in SWIG put * renamed enum to option * merged SG_ADD_OPTIONS and SG_ADD * added meta example of enum put * get_option meta example * SG_OPTIONS for readability
* added enum mapping * enum mapping example in LibLinearRegression * print overload for enums * SG_ADD_OPTIONS macro * added enum getter * added checks for int to enum in SWIG put * renamed enum to option * merged SG_ADD_OPTIONS and SG_ADD * added meta example of enum put * get_option meta example * SG_OPTIONS for readability
So here is an attempt to start mapping strings from the interface to enums in the library.
So this works now:
Need to think about a neater way to register enums and their mappings though..