Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String enum mapping #4537

Merged
merged 15 commits into from Mar 3, 2019
Merged

Conversation

gf712
Copy link
Member

@gf712 gf712 commented Feb 21, 2019

So here is an attempt to start mapping strings from the interface to enums in the library.

So this works now:

In [1]: import shogun as sg                                                                                                                                                                          

In [2]: lr = sg.machine("LibLinearRegression")                                                                                                                                                       

In [3]: lr                                                                                                                                                                                           
Out[3]: LibLinearRegression(C=1,bias=0,data_locked=false,features=null,labels=null,liblinear_regression_type=1,m_tube_epsilon=0.01,max_iter=10000,max_train_time=0,solver_type=0,store_model_features=false,use_bias=false,w=Vector<double>(0): [])

In [4]: lr.param_options("liblinear_regression_type")                                                                                                                                                
Out[4]: ('L2R_L2LOSS_SVR_DUAL', 'L2R_L1LOSS_SVR_DUAL', 'L2R_L2LOSS_SVR')

In [5]: lr.put("liblinear_regression_type", 'L2R_L2LOSS_SVR_DUAL')                                                                                                                                   

In [6]: lr                                                                                                                                                                                           
Out[6]: LibLinearRegression(C=1,bias=0,data_locked=false,features=null,labels=null,liblinear_regression_type=2,m_tube_epsilon=0.01,max_iter=10000,max_train_time=0,solver_type=0,store_model_features=false,use_bias=false,w=Vector<double>(0): [])

In [7]: lr.get("liblinear_regression_type")                                                                                                                                                         
Out[7]: 1

In [8]: lr.put("liblinear_regression_type", 'L2R_L2LOSS_SVR_DUAL')                                                                                                                                  

In [9]: lr.get("liblinear_regression_type")                                                                                                                                                         
Out[9]: 2

Need to think about a neater way to register enums and their mappings though..

@karlnapf
Copy link
Member

Minor: It would be cool if the visitor pattern that prints the instances would also look up the string for the enum so that users would see that rather than the int

@karlnapf
Copy link
Member

Same with get actually, so that the enums would be strings only in the interfaces

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start! I made a few comments on the user perspective before doing a review. Think it only needs minor tweaking


auto enum_to_string_map = $self->get_enum_to_string_map();

if (enum_to_string_map.find($self->get_name()) == enum_to_string_map.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I’d remove this and just print and error if the requested parameter is not an enum type

@@ -423,6 +423,27 @@ namespace shogun
SG_SERROR("There is no parameter called '%s' in %s", name.c_str(), $self->get_name());
}

std::vector<std::string> param_options(const std::string& name) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems good like this

@@ -49,6 +50,8 @@ struct TParameter;
template <class T> class DynArray;
template <class T> class SGStringList;

using stringToEnumMapType = std::unordered_map<std::string, std::unordered_map<std::string, std::unordered_map<std::string, machine_int_t>>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

template <class T>
void put(const std::string& name, const std::string& value) noexcept(false)
{
if (m_enum_to_string_map.find(get_name()) == m_enum_to_string_map.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would remove and just keep the below error. Same as above, only confuses users, it’s more an internal thing

auto string_to_enum = string_to_enum_map[name];

if (string_to_enum.find(value) == string_to_enum.end())
SG_ERROR("There is no enum mapping for '%s' in %s::%s", value, get_name(), name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this further: I wouldnt even talk about enum mappings, but just “options”. Users don’t need to know that there is an underlying enum

(machine_int_t*)&m_liblinear_regression_type,
"liblinear_regression_type", "Type of LibLinear regression.");
m_enum_to_string_map[get_name()]["liblinear_regression_type"] = {
{"L2R_L2LOSS_SVR", 0},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the enum value here rather than the int to make it readable, and they are available here

SG_ADD(
(machine_int_t*)&m_liblinear_regression_type,
"liblinear_regression_type", "Type of LibLinear regression.");
m_enum_to_string_map[get_name()]["liblinear_regression_type"] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since this happens at runtime now, we could rather have one options map for each class, stored locally, and then a method “add_option(string, int)” that is called this method here. Or even better
“add_options(string param, map<string, int>)” which then can be called by directly writing out the map as {“foo”: SOLVER::FOO, “bar”: SOLVER::BAR} in a single call
I didn’t realise yesterday that the map doesn’t need to be gobal nor static

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good point, I didn't quite like the fact that there was this static variable around! So add_options(string param, map<string, int>) just sits in SGObject, and then just need a mapping from param to string_enum_map, which makes it much simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even: Do you think using a macro, we could avoid having to write the enum value name AND a string ?
Like
ADD_OPTION(param_name, FOO); which adds the mapping “FOO”->FOO where FOO is one option of the enum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we’d have to write things one by one but on the other hand it avoids the redundancy of explicitly naming the option string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to pass the literal variable name in a macro yes. It's a shame there isn't a universal name demangling across compilers, would be quite useful in this case...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, c++ lacks quite some things ... 😱

@@ -147,7 +147,7 @@ class CLibLinearRegression : public CLinearMachine
protected:

/** train machine */
virtual bool train_machine(CFeatures* data = NULL);
bool train_machine(CFeatures* data = NULL) override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing using override and not using it causes warnings iirc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, but at some point would probably start adding override no? Doesn't it come at a cost when you just add more and more virtual class members?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes at some point. I would just leave this for another or or the detox gsoc project

@gf712 gf712 force-pushed the feature/string_enum_mapping branch from 4fd2301 to 27017a1 Compare February 21, 2019 15:58

if (param_to_enum_map.find(name) == param_to_enum_map.end())
{
SG_SERROR("There are no options for the parameter '%s'", name.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor: maybe " ..for parameter %s::%s" and print classname

@@ -100,6 +103,16 @@ template <class T> class SGStringList;

#define SG_ADD(...) VARARG(SG_ADD, __VA_ARGS__)

#define VALUE_TO_STRING_MACRO(s) #s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long live the preprocessor :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is so funny that this old school stuff still is necessary/useful in places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes unfortunately this is the simplest way..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like it might be possible to demangle it https://github.com/google/glog/blob/5c576f78c49b28d89b23fbb1fc80f54c879ec02e/src/demangle.cc#L1042, but would take a while to strip out the code we would need

m_string_to_enum_map[param_name][VALUE_TO_STRING_MACRO(enum_value)] = \
enum_value; \
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls undefine your helper macro after having used it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I can't :( the macro is needed by SG_ADD_OPTION, which is used later on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, well ok

if (string_to_enum.find(value) == string_to_enum.end())
{
SG_ERROR(
"There is no enum mapping for '%s' in %s::%s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you rather say something like "illegal option %s for parameter %s::%s" ... ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Was just trying to follow the pattern from the other error messages

(machine_int_t*)&m_liblinear_regression_type,
"liblinear_regression_type", "Type of LibLinear regression.");
SG_ADD_OPTION("liblinear_regression_type", L2R_L2LOSS_SVR);
SG_ADD_OPTION("liblinear_regression_type", L2R_L1LOSS_SVR_DUAL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace going wrong here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about this, but what about actually merging all those lines into one with a multi-argument macro?
SG_ADD_OPTION("liblinear_regression_type", L2R_L2LOSS_SVR, L2R_L1LOSS_SVR_DUAL)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that would work though. Because you would then have a recursive macro, which then needs to be manually coded, i.e. SG_ADD_OPTION_1, SG_ADD_OPTION_2, SG_ADD_OPTION_3 right?

@gf712 gf712 force-pushed the feature/string_enum_mapping branch 4 times, most recently from 6955e6d to 26bfb3c Compare February 21, 2019 16:53
@karlnapf
Copy link
Member

I think given that we only ever have a few options, why not do the recursive manual macro thing somewhere hidden? Say for up to 5 for a start? It’s not pretty but it’s only at one place whereas the repeated parametername would be in dozens of places

@gf712
Copy link
Member Author

gf712 commented Feb 21, 2019

Same with get actually, so that the enums would be strings only in the interfaces

So for the C++ API we would still get an enum/int with get?

@gf712
Copy link
Member Author

gf712 commented Feb 21, 2019

I think given that we only ever have a few options, why not do the recursive manual macro thing somewhere hidden? Say for up to 5 for a start? It’s not pretty but it’s only at one place whereas the repeated parametername would be in dozens of places

hmmm ok, I'll write something for that and think if there is something nicer we could do here which would be more sustainable

@@ -958,7 +958,19 @@ std::string CSGObject::to_string() const
{
ss << it->first.name() << "=";
auto value = it->second.get_value();
if (value.visitable())
if (m_string_to_enum_map.find(it->first.name()) !=
Copy link
Member Author

@gf712 gf712 Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf is this fine for the object representation? It works fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In c++ it would have to be int as the enum definitions go hide in the plugins. So quite unsafe actually. No harm in keeping the get for that for now but also offer the one that returns a string.... need to see how it plays out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the output look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LibLinearRegression(C=1,bias=0,data_locked=false,features=null,labels=null,liblinear_regression_type=L2R_L2LOSS_SVR,m_tube_epsilon=0.01,max_iter=10000,max_train_time=0,solver_type=0,store_model_features=false,use_bias=false,w=Vector<double>(0): [])

@gf712
Copy link
Member Author

gf712 commented Feb 22, 2019

@karlnapf I don't really like the fact that we need to continuously add more macros, but this works for now. Until c++ has support for enum to string conversions, or we add such a library, or write our own implementation, this will have to do...


#define VARARG_IMPL2(base, count, ...) base##count(__VA_ARGS__)
#define VARARG_IMPL(base, count, ...) VARARG_IMPL2(base, count, __VA_ARGS__)
#define VARARG(base, ...) VARARG_IMPL(base, VA_NARGS(__VA_ARGS__), __VA_ARGS__)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf I added these macros to this header because I need them for SG_ADD_OPTIONS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing

@gf712 gf712 force-pushed the feature/string_enum_mapping branch from 89fecab to 8df33da Compare February 22, 2019 14:04
Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things need to be polished and then it’s good from my side. Maybe request a review from @lisitsyn since I discussed this quite a bit with him

SG_ADD(&m_linear_term, "linear_term", "Linear Term");
SG_ADD(
(machine_int_t*)&liblinear_solver_type, "liblinear_solver_type",
"Type of LibLinear solver.");
SG_ADD_OPTIONS(
"liblinear_solver_type", L2R_LR, L2R_L2LOSS_SVC_DUAL, L2R_L2LOSS_SVC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t look too bad now! :)

{ \
static_assert( \
std::is_enum<decltype(enum_value)>::value, "Expected an enum!"); \
m_string_to_enum_map[param_name][VALUE_TO_STRING_MACRO(enum_value)] = \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also do a runtime assertion that the parameter was actually registered, so that only options for already registered parameters can be added.
Another point would be to assert that the registered parameter is in fact a machine_int_t so the we cannot register options for other types of parameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with asserting if a parameter is of type machine_int_t is that we can't tell the difference from type int32_t, so the check will not work all the time..

@gf712 gf712 requested a review from lisitsyn February 23, 2019 11:35
@gf712
Copy link
Member Author

gf712 commented Feb 23, 2019

I'm also trying to get the tag dispatching to work such that it is possible to get a string for the enums instead of a int in the SWIG interfaces..

@gf712
Copy link
Member Author

gf712 commented Feb 23, 2019

I am not sure how to get get to work with SWIG, because it uses partial specialisation. Basically we would have to use template deduction and SWIG doesn't support it yet. I tried a bit with tag dispatching but the issue is the same, we can only tell if an int should be returned as an int or std::string after checking the enum_string map, at runtime...

@gf712 gf712 force-pushed the feature/string_enum_mapping branch from 5270249 to 38abdf7 Compare February 23, 2019 15:11
@karlnapf
Copy link
Member

Re asserting the correct type I think one way might be to actually combined the registration with adding the option. Then we can force the right enum type and inside hat method cast it to int, register ad parameter and then attach the options. See my point?

@karlnapf
Copy link
Member

Re the get o think first checking the map and then the parameters might be ok no?

@gf712
Copy link
Member Author

gf712 commented Feb 24, 2019

Re asserting the correct type I think one way might be to actually combined the registration with adding the option. Then we can force the right enum type and inside hat method cast it to int, register ad parameter and then attach the options. See my point?

yup, I was thinking about something along those lines, but then will have to change SG_ADD a bit, and agree on a good signature.

Re the get o think first checking the map and then the parameters might be ok no?

But need to have a get that has accepts two types: the cast and then the return type, because in this case the value is casted to an int, but then we need to lookup in the map and return a string. this is easy with partial specialisation of get, but SWIG doesn't support that. what I had (and worked fine) was

template <typename T, typename U = void, typename std::enable_if_t<std::is_same<T, machine_int_t>::value && std::is_same<U, std::string>::value>* = nullptr>
    std::string get(const std::string& name) const noexcept(false)
    {
        Tag<T> tag(name);

        if (m_string_to_enum_map.find(name) != m_string_to_enum_map.end())
        {
            return string_enum_reverse_lookup(name, get(tag));
        }
        else
        {
            SG_ERROR("There are no options in %s::%s", get_name(), name.c_str());
        }
        return nullptr;
    }

and then for example in the to_string I had something like this:

auto value = get<machine_int_t, std::string>(param);

but this can't be templated in shogun.i along with the other get (I tried %template() get<machine_int_t, std::string>) which I am guessing is related to this

@gf712 gf712 force-pushed the feature/string_enum_mapping branch 8 times, most recently from 7224abe to 751a0f9 Compare March 2, 2019 13:20
@gf712 gf712 force-pushed the feature/string_enum_mapping branch from 751a0f9 to c7e485e Compare March 2, 2019 15:19
@gf712
Copy link
Member Author

gf712 commented Mar 2, 2019

@karlnapf it seems to be working fine now on all platforms. the only issue is that linear regression is sometimes causing issues on clang, but that is an unrelated issue.

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, good user experience improvement!

@karlnapf karlnapf merged commit 66e9864 into shogun-toolbox:develop Mar 3, 2019
vigsterkr pushed a commit to vigsterkr/shogun that referenced this pull request Mar 9, 2019
* added enum mapping
* enum mapping example in LibLinearRegression
* print overload for enums
* SG_ADD_OPTIONS macro
* added enum getter
* added checks for int to enum in SWIG put
* renamed enum to option
* merged SG_ADD_OPTIONS and SG_ADD
* added meta example of enum put
* get_option meta example
* SG_OPTIONS for readability
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
* added enum mapping
* enum mapping example in LibLinearRegression
* print overload for enums
* SG_ADD_OPTIONS macro
* added enum getter
* added checks for int to enum in SWIG put
* renamed enum to option
* merged SG_ADD_OPTIONS and SG_ADD
* added meta example of enum put
* get_option meta example
* SG_OPTIONS for readability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants