Add features and labels view #4352
src/shogun/features/Features.cpp
Outdated
@@ -265,6 +265,24 @@ void CFeatures::unset_property(EFeatureProperty p)
    properties &= (properties | p) ^ p;
}

CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
    auto feats_view = this->duplicate();
i would prefer a copy ctor here
this is a virtual function that calls copy ctor of subclasses
ah sorry I see now, totally justified!
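For readers following the thread, here is a minimal sketch of why a virtual duplicate() is used instead of calling a copy ctor directly: code that only holds a base pointer cannot name the subclass copy ctor. The classes Labels and BinaryLabels and the helper copy_via_base are hypothetical stand-ins, not Shogun's actual CLabels hierarchy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for CLabels / CBinaryLabels (not Shogun's real classes).
class Labels
{
public:
    virtual ~Labels() = default;
    // "Virtual copy constructor": each subclass returns a copy of itself.
    virtual Labels* duplicate() const { return new Labels(*this); }
};

class BinaryLabels : public Labels
{
public:
    // Calls the copy ctor of the subclass, so the dynamic type is preserved.
    Labels* duplicate() const override { return new BinaryLabels(*this); }
};

// Hypothetical helper: base-class code copies an object without knowing
// its dynamic type. Writing `new Labels(original)` here would slice it.
inline std::unique_ptr<Labels> copy_via_base(const Labels& original)
{
    return std::unique_ptr<Labels>(original.duplicate());
}
```

Calling copy_via_base on a BinaryLabels object yields a new object whose dynamic type is still BinaryLabels, which is what a base-class view() needs.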
src/shogun/features/Features.cpp
Outdated
    return feats_view;
}

CFeatures* CFeatures::view(const std::vector<index_t>& subset)
this doesn't belong in CFeatures imo; this is conversion code between std::vector and SGVector and therefore should sit in SGVector:
features->view(SGVector<index_t>(my_std_vector))
or even done implicitly (if possible)
@@ -156,6 +156,11 @@ CBinaryLabels::CBinaryLabels(const CDenseLabels& dense) : CDenseLabels(dense)
    ensure_valid();
}

CLabels* CBinaryLabels::duplicate() const
{
    return new CBinaryLabels(*this);
why have this method? why not just use the copy ctor?
src/shogun/labels/Labels.cpp
Outdated
{
    auto labels_view = this->duplicate();

    auto sg_subset = SGVector<index_t>(subset.size());
same comment as above
// apply subset
if (m_subset_frac!=1.0)
    apply_subset(feats,interf);
const auto result = get_subset(feats, interf);
why all these changes? shouldn't we focus on the view first?
this is taken from michele's PR; we need a small refactor here to deploy views
ok,
I technically would prefer first adding the view and then having a second PR to deploy it.
But I think others see that differently, so ok, leave it in :)
{
    REQUIRE(m_labels,"training labels not set!\n")
    SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();
pls avoid such whitespace changes in PRs.
They should be in a sep PR. It just pollutes the diff (i.e. hard to review, takes longer)
@@ -273,8 +271,14 @@ CRegressionLabels* CStochasticGBMachine::compute_pseudo_residuals(CRegressionLab
    return new CRegressionLabels(residuals);
}

void CStochasticGBMachine::apply_subset(CDenseFeatures<float64_t>* f, CLabels* interf)
std::tuple<Some<CDenseFeatures<float64_t>>, Some<CRegressionLabels>,
can you explain the reasoning here?
since view creates a new instance, using smart pointers avoids manual ref/unref
doesn't this create the same problem with Some?
actually no, this function is only used internally, so we know the static type of return value based on types of args
@@ -519,7 +516,7 @@ TEST(CARTree, form_t1_test)
    lab[3]=1;
    lab[4]=0;

    CDenseFeatures<float64_t>* feats=new CDenseFeatures<float64_t>(data);
    auto feats = some<CDenseFeatures<float64_t>>(data);
again, I would prefer those changes to happen in a separate minimal PR
src/shogun/features/Features.cpp
Outdated
@@ -272,17 +272,6 @@ CFeatures* CFeatures::view(const SGVector<index_t>& subset)
    return feats_view;
}

CFeatures* CFeatures::view(const std::vector<index_t>& subset)
can you explain ?
We need to use the base class interface in things like xvalidation ..
ah sorry, never mind, I didn't see the argument was the std
instead: yes ! :)
src/shogun/features/Features.cpp
Outdated
CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
    auto feats_view = this->duplicate();
    feats_view->add_subset(subset.clone());
i wonder, do we want to clone the subset?
I think actually we shouldn't.
And also I think that add_subset should accept a const vector
but maybe you have a different opinion?
agree, we don't need clone here.
add_subset should accept a const vector, and we need to clone the vector in the ctor of CSubset
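The ownership split agreed on above can be sketched roughly as follows: add_subset takes a const vector, the subset object makes its own copy of the indices, and the view shares the underlying data instead of copying it. Everything here is an illustrative assumption: Features, Subset, at() and raw() are hypothetical stand-ins, std::vector replaces SGVector, and unlike Shogun's subset stack this toy add_subset simply replaces any previous subset.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using index_t = int; // assumption: stand-in for Shogun's index_t

// Hypothetical stand-in for CSubset: the ctor copies the indices,
// so callers never need to clone before passing them in.
class Subset
{
public:
    explicit Subset(const std::vector<index_t>& indices) : m_indices(indices) {}
    const std::vector<index_t>& indices() const { return m_indices; }

private:
    std::vector<index_t> m_indices;
};

// Hypothetical stand-in for a features class whose data is shared between views.
class Features
{
public:
    explicit Features(std::shared_ptr<const std::vector<double>> data)
        : m_data(std::move(data)) {}

    // Accepts a const vector; the copy happens inside Subset.
    void add_subset(const std::vector<index_t>& indices)
    {
        m_subset = std::make_shared<Subset>(indices);
    }

    // Copies only the handle to the data, then restricts it to the subset.
    Features* view(const std::vector<index_t>& indices) const
    {
        auto* result = new Features(*this);
        result->add_subset(indices);
        return result;
    }

    std::size_t size() const
    {
        return m_subset ? m_subset->indices().size() : m_data->size();
    }
    double at(std::size_t i) const
    {
        return m_subset ? (*m_data)[m_subset->indices()[i]] : (*m_data)[i];
    }
    const double* raw() const { return m_data->data(); }

private:
    std::shared_ptr<const std::vector<double>> m_data;
    std::shared_ptr<const Subset> m_subset;
};
```

With this layout, mutating the caller's index vector after view() does not affect the view, and the original object keeps its full size.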
ASSERT_EQ(labels_subset->get_num_labels(), subset.vlen);
for (auto i : range(subset.vlen))
{
    EXPECT_EQ(labels_subset->get_int_label(i), labels_true[subset[i]]);
maybe we can actually systematically test all access methods of labels and features somehow?
I would also assert that the view's data pointer is the same as the original one (i.e. no copy happened)
we don't have a way to test different features/labels types and different access methods right now.
and we can't check the view's data pointer because that's a private member; calling get_labels will create a copy if a subset is present
77a9f85 to 8711ea6
d1a175a to ffb3fcc
a241ab1 to dbfd69e
47a55bb to 83884f1
83884f1 to f444cec
src/shogun/lib/View.h
Outdated
static_assert(
    std::is_base_of<CFeatures, T>::value ||
        std::is_base_of<CLabels, T>::value,
    "Only CFeatures and CLabels are viewable.");
I am not a fan of error messages that contain things that might change in the future, i.e. a new class is made viewable.
I would just state: Class is not viewable. The compiler error will provide the T and also the static assert which will tell the caller what is viewable and what is not. Or?
src/shogun/lib/View.h
Outdated
{

template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
could we make viewable const?
src/shogun/lib/View.h
Outdated
{

template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
is this used anywhere?
How are we going to do this without multiple inheritance?
Maybe a mixin approach would be better, see IterativeMachine
@karlnapf i'm not clear, could you explain the multiple inheritance problem?
currently we expect T to be either features or labels, but we can also use viewable mixin in features and labels
@karlnapf you mean about the duplicate method, or what exactly do you mean by multiple inheritance?
I just realised that I had a mistake in my thinking. This is just a global templated function so all good, nevermind!
I like this idea of having the method with the static assert!
k... note that needs extra love in case of SWIG interfaces... although it's a good question whether we actually wanna expose view mechanism for swig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think no.
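To make the conclusion of this thread concrete, here is a rough sketch of a global templated view function guarded by a static_assert, taking a const pointer and using the generic error message suggested earlier. Features, DenseFeatures, Labels and subset_size() are hypothetical stand-ins (std::vector replaces SGVector); this is not the code from View.h.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

using index_t = int; // assumption: stand-in for Shogun's index_t

// Hypothetical stand-ins for CFeatures / CDenseFeatures / CLabels.
class Features
{
public:
    virtual ~Features() = default;
    virtual Features* duplicate() const { return new Features(*this); }
    void add_subset(const std::vector<index_t>& subset) { m_subset = subset; }
    std::size_t subset_size() const { return m_subset.size(); }

private:
    std::vector<index_t> m_subset;
};

class DenseFeatures : public Features
{
public:
    Features* duplicate() const override { return new DenseFeatures(*this); }
};

class Labels
{
public:
    virtual ~Labels() = default;
    virtual Labels* duplicate() const { return new Labels(*this); }
    void add_subset(const std::vector<index_t>& subset) { m_subset = subset; }
    std::size_t subset_size() const { return m_subset.size(); }

private:
    std::vector<index_t> m_subset;
};

// Free function template: no mixin or multiple inheritance is needed,
// and the static return type follows the static argument type.
template <class T>
T* view(const T* viewable, const std::vector<index_t>& subset)
{
    static_assert(
        std::is_base_of<Features, T>::value ||
            std::is_base_of<Labels, T>::value,
        "Class is not viewable.");
    // Assumes the dynamic type of *viewable is T or derived from T,
    // so the downcast of the duplicate is valid.
    T* result = static_cast<T*>(viewable->duplicate());
    result->add_subset(subset);
    return result;
}
```

Passing a type that derives from neither base (for example view of an int pointer) fails at compile time with the generic message, while the compiler output still names the offending T.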
REQUIRE(m_labels,"training labels not set!\n")
SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();
SGVector<float64_t> labels =
    (dynamic_cast<CDenseLabels*>(labs))->get_labels();
since there's already a change here we could just port this to use as<CDenseLabels>
@@ -1205,27 +1204,27 @@ void CCARTree::prune_by_cross_validation(CDenseFeatures<float64_t>* data, int32_
}

SGVector<int32_t> subset(train_indices.data(),train_indices.size(),false);
data->add_subset(subset);
m_labels->add_subset(subset);
auto dense_labels = m_labels->as<CDenseLabels>();
as agreed no need for this, drop it...
src/shogun/lib/View.h
Outdated
}

template <class T>
T* view(Some<T> viewable, const SGVector<index_t>& subset)
@karlnapf @vigsterkr maybe we can just return Some<T> here?
Yes please! If you pass Some you should get Some
@vinx13 yep cool! this way the C++ code became much, much cleaner!
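A small sketch of the "pass Some, get Some" rule: the smart-pointer overload wraps the raw-pointer one, so callers never touch ref/unref. As an assumption for illustration, std::shared_ptr stands in for Shogun's Some<T>, std::vector for SGVector, and Labels is a hypothetical stand-in class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using index_t = int; // assumption: stand-in for Shogun's index_t

// Hypothetical stand-in for CLabels.
class Labels
{
public:
    virtual ~Labels() = default;
    virtual Labels* duplicate() const { return new Labels(*this); }
    void add_subset(const std::vector<index_t>& subset) { m_subset = subset; }
    std::size_t subset_size() const { return m_subset.size(); }

private:
    std::vector<index_t> m_subset;
};

// Raw pointer in, raw pointer out: the caller owns the result.
template <class T>
T* view(const T* viewable, const std::vector<index_t>& subset)
{
    T* result = static_cast<T*>(viewable->duplicate());
    result->add_subset(subset);
    return result;
}

// Smart pointer in, smart pointer out (std::shared_ptr stands in for Some<T>).
template <class T>
std::shared_ptr<T> view(
    const std::shared_ptr<T>& viewable, const std::vector<index_t>& subset)
{
    // Take ownership of the freshly created view immediately.
    return std::shared_ptr<T>(view(viewable.get(), subset));
}
```

The two overloads do not conflict: template argument deduction only succeeds for the one whose first parameter matches the argument's pointer kind.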
src/shogun/lib/View.h
Outdated
 * @return new viewable instance
 */
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
I still vote for const T*
src/shogun/lib/View.h
Outdated
 * @return new viewable instance
 */
template <class T>
Some<T> view(Some<T> viewable, const SGVector<index_t>& subset)
++
rebase with latest develop and use override where necessary. this way the CIs wont fail :)
src/shogun/labels/BinaryLabels.h
Outdated
@@ -96,6 +96,8 @@ class CBinaryLabels : public CDenseLabels
     */
    virtual ELabelType get_label_type() const;

    virtual CLabels* duplicate() const;
use override
 *
 * @return labels object
 */
virtual CLabels* duplicate() const
use override
c6ca257 to 9c75248
9c75248 to ec57067
Continue #3970, but the features and labels view now returns a raw pointer instead of Some, because Some would cause a covariant return type problem and is not available in SWIG.
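The covariant type problem mentioned in the description can be illustrated with a short sketch. C++ allows an overriding virtual function to narrow a raw-pointer return type, but not a smart-pointer one, because two instantiations of a wrapper template are unrelated types. Labels and BinaryLabels are hypothetical stand-ins, and the commented-out part uses std::shared_ptr only as an analogue of Some.

```cpp
#include <cassert>

// Hypothetical stand-ins for CLabels / CBinaryLabels.
class Labels
{
public:
    virtual ~Labels() = default;
    virtual Labels* duplicate() const { return new Labels(*this); }
};

class BinaryLabels : public Labels
{
public:
    // Covariant return type: legal because BinaryLabels* converts to Labels*.
    BinaryLabels* duplicate() const override { return new BinaryLabels(*this); }
};

// The smart-pointer analogue is ill-formed and is therefore left commented out:
//
//   struct A { virtual std::shared_ptr<A> duplicate() const; };
//   struct B : A { std::shared_ptr<B> duplicate() const override; };
//
// std::shared_ptr<B> is not derived from std::shared_ptr<A>, so the compiler
// rejects the override as having a conflicting return type.
```

With raw pointers, a caller holding a BinaryLabels gets a BinaryLabels* back without a cast, while calls through the base interface still return Labels*.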