New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add features and labels view #4352

Merged
merged 14 commits into from Jul 26, 2018

Conversation

Projects
None yet
3 participants
@vinx13
Contributor

vinx13 commented Jul 3, 2018

Continue #3970 , but features and labels view return raw pointer now instead of Some because that will cause covariant type problem, and Some is not available in SWIG

@@ -265,6 +265,24 @@ void CFeatures::unset_property(EFeatureProperty p)
properties &= (properties | p) ^ p;
}
CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
auto feats_view = this->duplicate();

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

i would prefer a copy ctor here

This comment has been minimized.

@vinx13

vinx13 Jul 4, 2018

Contributor

this is a virtual function that calls copy ctor of subclasses

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

ah sorry I see now, totally justified!

return feats_view;
}
CFeatures* CFeatures::view(const std::vector<index_t>& subset)

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

this doesnt belong into CFeatures imo, this is conversion code between std::vector and SGVector and therefore should sit in SGVector
features->view(SGVector<float64_t>(my_std_vector))

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

or even done implicitly (if possible)

@@ -156,6 +156,11 @@ CBinaryLabels::CBinaryLabels(const CDenseLabels& dense) : CDenseLabels(dense)
ensure_valid();
}
CLabels* CBinaryLabels::duplicate() const
{
return new CBinaryLabels(*this);

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

why have this method why not just use the copy ctor?

{
auto labels_view = this->duplicate();
auto sg_subset = SGVector<index_t>(subset.size());

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

same comment as above

// apply subset
if (m_subset_frac!=1.0)
apply_subset(feats,interf);
const auto result = get_subset(feats, interf);

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

why all these changes? shouldnt we focus on the view first?

This comment has been minimized.

@vinx13

vinx13 Jul 4, 2018

Contributor

this is taken from michele's pr, we need small refactor on this to deploy views

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

ok,
I technically would prefer first adding the view and then have a second Pr to deploy it.
But I think others see that differently, so ok leave it in :)

{
REQUIRE(m_labels,"training labels not set!\n")
SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

pls avoid such whitespace changes in PRs.
They should be in a sep PR. It just pollutes the diff (i.e. hard to review, takes longer)

@@ -273,8 +271,14 @@ CRegressionLabels* CStochasticGBMachine::compute_pseudo_residuals(CRegressionLab
return new CRegressionLabels(residuals);
}
void CStochasticGBMachine::apply_subset(CDenseFeatures<float64_t>* f, CLabels* interf)
std::tuple<Some<CDenseFeatures<float64_t>>, Some<CRegressionLabels>,

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

can you explain the reasonaing here?

This comment has been minimized.

@vinx13

vinx13 Jul 4, 2018

Contributor

since view create a new instance, using smart pointers prevent ref/unref

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

doesnt this create the same problem with Some?

This comment has been minimized.

@vinx13

vinx13 Jul 4, 2018

Contributor

actually no, this function is only used internally, so we know the static type of return value based on types of args

@@ -519,7 +516,7 @@ TEST(CARTree, form_t1_test)
lab[3]=1;
lab[4]=0;
CDenseFeatures<float64_t>* feats=new CDenseFeatures<float64_t>(data);
auto feats = some<CDenseFeatures<float64_t>>(data);

This comment has been minimized.

@karlnapf

karlnapf Jul 4, 2018

Member

again, I would prefer those changes to happen in a separate minimal PR

@@ -272,17 +272,6 @@ CFeatures* CFeatures::view(const SGVector<index_t>& subset)
return feats_view;
}
CFeatures* CFeatures::view(const std::vector<index_t>& subset)

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

can you explain ?
We need to use the base class interface in things like xvalidation ..

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

ah sorry nevermind I didnt see the arugment was the std

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

instead: yes ! :)

CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
auto feats_view = this->duplicate();
feats_view->add_subset(subset.clone());

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

i wonder, do we want to clone the subset?
I think actually we shouldnt
And also I think that add_subset should accept a const vector

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

but maybe you have a different opinion?

This comment has been minimized.

@vinx13

vinx13 Jul 5, 2018

Contributor

agree, we don't need clone here.
add_subset should accept a const vector, and we need to clone the vector in ctor of CSubset

ASSERT_EQ(labels_subset->get_num_labels(), subset.vlen);
for (auto i : range(subset.vlen))
{
EXPECT_EQ(labels_subset->get_int_label(i), labels_true[subset[i]]);

This comment has been minimized.

@karlnapf

karlnapf Jul 5, 2018

Member

maybe we can actually systematically test all access methods of labels and features somehow?
I would also assert that the view's data pointer is the same as the original one (i.e. no copy happened)

This comment has been minimized.

@vinx13

vinx13 Jul 5, 2018

Contributor

we don't have a way to test different features/labels types, and different access methods now.
and we can't check view's data pointer because that's a private member, calling get_labels will create a copy if subset is present

@vigsterkr vigsterkr force-pushed the shogun-toolbox:feature/transformers branch 2 times, most recently from 77a9f85 to 8711ea6 Jul 10, 2018

@vinx13 vinx13 force-pushed the vinx13:feature/features_view branch 3 times, most recently from d1a175a to ffb3fcc Jul 10, 2018

@vigsterkr vigsterkr force-pushed the shogun-toolbox:feature/transformers branch from a241ab1 to dbfd69e Jul 12, 2018

@vinx13 vinx13 changed the base branch from feature/transformers to develop Jul 12, 2018

@vinx13 vinx13 force-pushed the vinx13:feature/features_view branch 4 times, most recently from 47a55bb to 83884f1 Jul 12, 2018

@vinx13 vinx13 force-pushed the vinx13:feature/features_view branch from 83884f1 to f444cec Jul 24, 2018

static_assert(
std::is_base_of<CFeatures, T>::value ||
std::is_base_of<CLabels, T>::value,
"Only CFeatures and CLabels are viewable.");

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

I am not a fan of error messages that contain things that might change in the future, i.e. a new class is made viewable.
I would just state: Class is not viewable. The compiler error will provide the T and also the static assert which will tell the caller what is viewable and what is not. Or?

{
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

could we make viewable const?

{
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

is this used anywhere?
How are we going to do this without multiple inheritance?
Maybe a mixin approach would be better, see IterativeMachine

This comment has been minimized.

@vinx13

vinx13 Jul 25, 2018

Contributor

@karlnapf i'm not clear, could you explain the multiple inheritance problem?
currently we expect T to be either features or labels, but we can also use viewable mixin in features and labels

This comment has been minimized.

@vigsterkr

vigsterkr Jul 25, 2018

Member

@karlnapf you mean about the duplciate method or what exactly do you mean by multiple inheritance?

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

I just realised that I had a mistake in my thinking. This is just a global templated function so all good, nevermind!
I like this idea of having the method with the static assert!

This comment has been minimized.

@vigsterkr

vigsterkr Jul 25, 2018

Member

k... note that needs extra love in case of SWIG interfaces... although it's a good question whether we actually wanna expose view mechanism for swig?

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

I think no.

REQUIRE(m_labels,"training labels not set!\n")
SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();
SGVector<float64_t> labels =
(dynamic_cast<CDenseLabels*>(labs))->get_labels();

This comment has been minimized.

@vigsterkr

vigsterkr Jul 25, 2018

Member

since there's already a change here we could just port using as<CDenseLabels>

@@ -1205,27 +1204,27 @@ void CCARTree::prune_by_cross_validation(CDenseFeatures<float64_t>* data, int32_
}
SGVector<int32_t> subset(train_indices.data(),train_indices.size(),false);
data->add_subset(subset);
m_labels->add_subset(subset);
auto dense_labels = m_labels->as<CDenseLabels>();

This comment has been minimized.

@vigsterkr

vigsterkr Jul 25, 2018

Member

as agreed no need for this, drop it...

{
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@vigsterkr

vigsterkr Jul 25, 2018

Member

@karlnapf you mean about the duplciate method or what exactly do you mean by multiple inheritance?

}
template <class T>
T* view(Some<T> viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@vinx13

vinx13 Jul 25, 2018

Contributor

@karlnapf @vigsterkr maybe we can just return Some<T> here?

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

Yes please! If you pass Some you should get Some

@vigsterkr

This comment has been minimized.

Member

vigsterkr commented Jul 25, 2018

@vinx13 yep cool! this way the c++ code became much much cleaner!

* @return new viewable instance
*/
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@karlnapf

karlnapf Jul 25, 2018

Member

I still vote for const T*

* @return new viewable instance
*/
template <class T>
Some<T> view(Some<T> viewable, const SGVector<index_t>& subset)

This comment has been minimized.

@karlnapf
@vigsterkr

rebase with latest develop and use override where necessary. this way the CIs wont fail :)

@@ -96,6 +96,8 @@ class CBinaryLabels : public CDenseLabels
*/
virtual ELabelType get_label_type() const;
virtual CLabels* duplicate() const;

This comment has been minimized.

@vigsterkr

vigsterkr Jul 26, 2018

Member

use override

*
* @return labels object
*/
virtual CLabels* duplicate() const

This comment has been minimized.

@vigsterkr

vigsterkr Jul 26, 2018

Member

use override

@vinx13 vinx13 force-pushed the vinx13:feature/features_view branch from c6ca257 to 9c75248 Jul 26, 2018

@vinx13 vinx13 force-pushed the vinx13:feature/features_view branch from 9c75248 to ec57067 Jul 26, 2018

@vigsterkr vigsterkr merged commit 2ae291e into shogun-toolbox:develop Jul 26, 2018

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment