Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/factory #4198

Merged
merged 31 commits into from Mar 20, 2018
Merged

Feature/factory #4198

merged 31 commits into from Mar 20, 2018

Conversation

karlnapf
Copy link
Member

@karlnapf karlnapf commented Mar 5, 2018

No description provided.

@karlnapf karlnapf force-pushed the feature/factory branch 3 times, most recently from e8ca7d1 to 822504d Compare March 7, 2018 13:08
@@ -11,7 +11,7 @@
"Comment": "//$comment\n",
"Init": {
"Construct": "auto $name = some<C$typeName>($arguments)$kwargs",
"Copy": "auto $name = $expr$kwargs",
"Copy": "auto $name = wrap($expr)$kwargs",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lisitsyn re-introduced wrap since we will get memory errors otherwise (things are not ref'ed and then deleted by objects)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit strange. Copy ctor doesn't ref?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is if you do
'labels = svm->precict'
then Shogun returns a raw pointer. Now if you pass this to another object that increases the ref-counter, and then decreases it, the object is destroyed. So calling a method on it subsequently crashes. This is why I re-added it, just like when you introduced it in the first place.
Or you mean something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I've got it now. I guess we change it once we get rid of raw pointers.

@@ -2,6 +2,7 @@ STRING(REGEX REPLACE "(.*)/narray.*$" "\\1" NARRAY_PATH ${NARRAY_LIB})

LIST(APPEND EXCLUDED_RUBY_META_EXAMPLES
base_api-put_get
base_api-factory
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigsterkr same problem as for put: the check for SGMatrix/object bails

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(so I blacklisted it for now)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that should be fixed in develop, so rebase over develop, otherwise i dont understand what's the problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in develop? I don't remember us fixing that SWIG bug? Did I miss it?
#4177

File features_file = csv_file("../../data/classifier_binary_2d_nonlinear_features_train.dat")
Features features_from_file = features(features_file, enum EPrimitiveType.PT_FLOAT64)

File labels_file = csv_file("../../data/classifier_binary_2d_nonlinear_labels_train.dat")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels from file will be CDenseLabels which are then converted in the classes (needs some refactor in algorithms, see below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have just added the factory.sg could you change the relative path (../../) to an env var? i'm happy to add the support for that for every language 💃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as if we install every these meta examples, the location of data should be somewhere very else :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep absoltely agree. But I think maybe let's do that for all meta examples in another PR?
I can do that next

if (!object)
{
SG_SERROR("No such class %s", name);
SG_SERROR(
"Class %s with primitive type %s does not exist.\n", name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we need std::iostream for our logs, then we could just overload the << operator and dont have to introduce all these to_string methods ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it looks shitty but operator << will not solve your problem to convert some objects to string.. that only works with primitive types

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand having these macros is nice in a long run as we could switch finally our logging backend. maybe spdlog? that allows us to redirect the logs into any sink not only stdout/err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know but I can overload the << thing, no? Then I dont need the function calls when I write error messages, but can collect them somewhere else?
Totally up for doing a new logging backend. GSoC project part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new logging backend = just change the macro definitions.
yeah you can offload the operator << but that means that you force to use std::iostream-like interface to be used, which is not the most liked IO back-end (for good reasons) in c++ community

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand you can do the same thing, i.e. implement to_string() for that object where you wanna have operator <<, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can introduce an std::cout-like object which supports <<. Internally it could use any logger we want.

@@ -48,7 +48,7 @@ bool CLibSVM::train_machine(CFeatures* data)
struct svm_node* x_space;

ASSERT(m_labels && m_labels->get_num_labels())
ASSERT(m_labels->get_label_type() == LT_BINARY)
auto binary_labels = m_labels->as_binary();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lisitsyn @vigsterkr this is how labels could be converted in algorithms, if they need an interface to CBinaryLabels for example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

template<class ST> CSparseFeatures<ST>::CSparseFeatures(CDenseFeatures<ST>* dense)
template <class ST>
CSparseFeatures<ST>::CSparseFeatures(CDenseFeatures<ST>* dense)
: CDotFeatures(0), feature_cache(NULL)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was causing a segfault in python legacy examples every now and then


using namespace shogun;

CBinaryLabels::CBinaryLabels() : CDenseLabels()
{
}

CBinaryLabels::CBinaryLabels(const CBinaryLabels& orig) : CDenseLabels(orig)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave labels copy constructors to the sake of converting them, see as_binary below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a copy? i mean it's over the same datastruct no? do we really want the as_binary be a copy? dont we wanna use c++11's feature of std::move and rvalue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about this. The reason why I made it a new instance was the fact that in the as_binary below, all other cases return a new instance, so to be consistent.
And also then the as_binary method can be const. On the other hand, as_binary could return const pointer ...
If you think it is better to just cast, I can do that. Not sure I get what you mean with the move and rvalue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not so sure if i get what this has to do with constness...
i'm talking about the fact that instead of copying the label, what you want is the same memory/object & everything just as a different class, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so then use rvalue/move instead of this pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you want to deal with converting say multiclass labels (that only contain two classes) to binary? For this, the actual vector needs to be changed, so a new instance is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For converting the same datastruct, I see the point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to copy BinaryLabels most of the time. What should be rather copy-on-write is the data beneath.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually prefer if this thing gave you a read-only labels object (we wanna have immutable labels anyways)
Note that memory is not copied here.
Also, after reading on it, I think the move constructor stuff doesnt really help here

@@ -77,15 +83,15 @@ void CDenseLabels::set_labels(SGVector<float64_t> v)
m_labels = v;
}

SGVector<float64_t> CDenseLabels::get_labels()
SGVector<float64_t> CDenseLabels::get_labels() const
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made lots of getters const


virtual const char* get_name() const override
{
return "DenseLabels";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lisitsyn made the class non-abstract, for the sake of loading from files (one doenst know what type of labels it is at that point, only when the algorithm tries to convert to say binary labels we know)

{
return m_current_values;
}

Some<CBinaryLabels> CLabels::as_binary() const
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned conversion method to convert label types into each other

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above regarding move/rvalue

@@ -927,7 +927,7 @@ void SGVector<T>::scale(T alpha)

template<class T> void SGVector<T>::load(CFile* loader)
{
REQUIRE(loader, "Require a valid 'c FILE pointer'\n");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great error message for users

@@ -66,7 +67,7 @@ struct CQuadraticTimeMMD::Self
SGVector<float64_t> gamma_fit_null();

CQuadraticTimeMMD& owner;
unique_ptr<CMultiKernelQuadraticTimeMMD> multi_kernel;
shared_ptr<CMultiKernelQuadraticTimeMMD> multi_kernel;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta examples now use some so cannot have a unique ptr inside shogun and expose the raw pointer via SWIG

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this way you loose:

  1. std::move the thing around - maybe not so important
  2. enforce sole ownership of the resource.... @lambday was this intentional to have it as unique? i would presume yes....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with him, he was fine with this change. Alternative would be to always return a new instance in the ::multiclass() method (which currently returns the raw pointer)

return result;
}

CLabels* labels(CFile* file)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigsterkr @lisitsyn this is the stuff behind the factory API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach. It would become the main interface.

@vigsterkr
Copy link
Member

vigsterkr commented Mar 7, 2018 via email

@vigsterkr
Copy link
Member

vigsterkr commented Mar 7, 2018 via email

@@ -48,7 +48,7 @@ bool CLibSVM::train_machine(CFeatures* data)
struct svm_node* x_space;

ASSERT(m_labels && m_labels->get_num_labels())
ASSERT(m_labels->get_label_type() == LT_BINARY)
auto binary_labels = m_labels->as_binary();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.


using namespace shogun;

CBinaryLabels::CBinaryLabels() : CDenseLabels()
{
}

CBinaryLabels::CBinaryLabels(const CBinaryLabels& orig) : CDenseLabels(orig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to copy BinaryLabels most of the time. What should be rather copy-on-write is the data beneath.

* @param pt primitive type
* @return string for primitive type
*/
std::string ptype(EPrimitiveType pt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like primitive_name a bit better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update

return result;
}

CLabels* labels(CFile* file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach. It would become the main interface.

@karlnapf
Copy link
Member Author

We could also have a constructor in CBinaryLabels that takes a CLabels pointer and then does the same thing as this introduced as_binary method. Not sure that conflicts with a copy constructor in some way. I actually like the explicitness

@karlnapf
Copy link
Member Author

@lisitsyn see updated version

* @return The requested type
*/
template <class T>
const T* as() const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't copy-paste this... some parts of it at least can be merged

@@ -153,6 +153,11 @@ namespace shogun
return value;
}

inline const char* wrap(const char* ptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for inline-ness plz use the macro: SG_FORCED_INLINE
which will make it sure that it's really inlined

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just replace it with the given macro above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case you insist on using inline... otherwise just drop the keyword inline

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't insist, I am just following the rest of the file, also it doesnt have any shogun includes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but let me remove the keyword

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont get it why is it better to remove instead of using a macro that ensures the inline-ess... hence make the whole code faster... ¯_(ツ)_/¯


namespace shogun
{
Some<CBinaryLabels> binary_from_binary(CBinaryLabels* orig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

que?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are helper methods, could also move the contents directly into the switch below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpp only, nevermind

labels[i])
}
}
auto result = new CBinaryLabels(labels);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik this copies the whole vvector itself.... do we really need to copy the data where it doesn't change at all....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it copies indeed, which I actually didnt realise. Why does it copy? Doesnt seem to make sense to me. And agree, it should not

return Some<CBinaryLabels>::from_raw(orig);
}

Some<CBinaryLabels> binary_from_dense(CDenseLabels* orig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

que?
do we need to spell this one out? does this need to be expose to anybody apart from ourself internally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah this is in .cpp. nevermind

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this is only internally, didnt put it in the header.
You think it should be just inside the switch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the 2nd comment... it's foobar, just ignore the first comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

browser didnt reload ;)

(*std::min_element(unique.begin(), unique.end())) == 0 &&
(*std::max_element(unique.begin(), unique.end())) == (index_t)unique.size()-1,
"Multiclass labels must be contiguous integers in [0, ..., num_classes -1].\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fixme/todo this... as this should not be a hard requirement, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean one would like to have something even like a: ['foo', 'bar', 'zelda'] multiclass labeling possible sooner than later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

@@ -62,6 +83,6 @@ int main(int argc, const char *argv[])
SG_UNREF(a_ref);

exit_shogun();
return equal != true;
return (loaded_equals_ref && ref_equals_loaded) != true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could drop the != true part and just have a negation, right?

ASSERT_TRUE(cast != nullptr);
auto loaded_mat = cast->get_feature_matrix();
EXPECT_TRUE(loaded_mat.equals(mat));
delete obj;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the file as well plz.

make some methods const, add copy constructors, add CBinaryLabels::from(...)
Populate for kernel, machine, features, labels
Includes LibSVM using new CLabels::as_binary()
-don't copy memory when converting from dense to binary labels
-delete temporary files in unit tests
-clean ups
-remove unused const version of CSGObject::as
-add == operator for Some
-clean-ups
@karlnapf
Copy link
Member Author

Alright, let's see what the buildbot says to this...

@karlnapf karlnapf merged commit 2362d20 into shogun-toolbox:develop Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants