Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add features and labels view #4352

Merged
merged 14 commits into from Jul 26, 2018

Conversation

vinx13
Copy link
Member

@vinx13 vinx13 commented Jul 3, 2018

Continue #3970 , but features and labels view return raw pointer now instead of Some because that will cause covariant type problem, and Some is not available in SWIG

@@ -265,6 +265,24 @@ void CFeatures::unset_property(EFeatureProperty p)
properties &= (properties | p) ^ p;
}

CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
auto feats_view = this->duplicate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer a copy ctor here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a virtual function that calls copy ctor of subclasses

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry I see now, totally justified!

return feats_view;
}

CFeatures* CFeatures::view(const std::vector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesnt belong into CFeatures imo, this is conversion code between std::vector and SGVector and therefore should sit in SGVector
features->view(SGVector<float64_t>(my_std_vector))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even done implicitly (if possible)

@@ -156,6 +156,11 @@ CBinaryLabels::CBinaryLabels(const CDenseLabels& dense) : CDenseLabels(dense)
ensure_valid();
}

CLabels* CBinaryLabels::duplicate() const
{
return new CBinaryLabels(*this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have this method why not just use the copy ctor?

{
auto labels_view = this->duplicate();

auto sg_subset = SGVector<index_t>(subset.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

// apply subset
if (m_subset_frac!=1.0)
apply_subset(feats,interf);
const auto result = get_subset(feats, interf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why all these changes? shouldnt we focus on the view first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is taken from michele's pr, we need small refactor on this to deploy views

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,
I technically would prefer first adding the view and then have a second Pr to deploy it.
But I think others see that differently, so ok leave it in :)

{
REQUIRE(m_labels,"training labels not set!\n")
SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls avoid such whitespace changes in PRs.
They should be in a sep PR. It just pollutes the diff (i.e. hard to review, takes longer)

@@ -273,8 +271,14 @@ CRegressionLabels* CStochasticGBMachine::compute_pseudo_residuals(CRegressionLab
return new CRegressionLabels(residuals);
}

void CStochasticGBMachine::apply_subset(CDenseFeatures<float64_t>* f, CLabels* interf)
std::tuple<Some<CDenseFeatures<float64_t>>, Some<CRegressionLabels>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain the reasonaing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since view create a new instance, using smart pointers prevent ref/unref

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt this create the same problem with Some?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually no, this function is only used internally, so we know the static type of return value based on types of args

@@ -519,7 +516,7 @@ TEST(CARTree, form_t1_test)
lab[3]=1;
lab[4]=0;

CDenseFeatures<float64_t>* feats=new CDenseFeatures<float64_t>(data);
auto feats = some<CDenseFeatures<float64_t>>(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I would prefer those changes to happen in a separate minimal PR

@@ -272,17 +272,6 @@ CFeatures* CFeatures::view(const SGVector<index_t>& subset)
return feats_view;
}

CFeatures* CFeatures::view(const std::vector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain ?
We need to use the base class interface in things like xvalidation ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry nevermind I didnt see the arugment was the std

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead: yes ! :)

CFeatures* CFeatures::view(const SGVector<index_t>& subset)
{
auto feats_view = this->duplicate();
feats_view->add_subset(subset.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder, do we want to clone the subset?
I think actually we shouldnt
And also I think that add_subset should accept a const vector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but maybe you have a different opinion?

Copy link
Member Author

@vinx13 vinx13 Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, we don't need clone here.
add_subset should accept a const vector, and we need to clone the vector in ctor of CSubset

ASSERT_EQ(labels_subset->get_num_labels(), subset.vlen);
for (auto i : range(subset.vlen))
{
EXPECT_EQ(labels_subset->get_int_label(i), labels_true[subset[i]]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can actually systematically test all access methods of labels and features somehow?
I would also assert that the view's data pointer is the same as the original one (i.e. no copy happened)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have a way to test different features/labels types, and different access methods now.
and we can't check view's data pointer because that's a private member, calling get_labels will create a copy if subset is present

@vigsterkr vigsterkr force-pushed the feature/transformers branch 2 times, most recently from 77a9f85 to 8711ea6 Compare July 10, 2018 15:12
@vinx13 vinx13 force-pushed the feature/features_view branch 3 times, most recently from d1a175a to ffb3fcc Compare July 12, 2018 06:20
@vinx13 vinx13 changed the base branch from feature/transformers to develop July 12, 2018 16:38
@vinx13 vinx13 force-pushed the feature/features_view branch 4 times, most recently from 47a55bb to 83884f1 Compare July 15, 2018 06:40
static_assert(
std::is_base_of<CFeatures, T>::value ||
std::is_base_of<CLabels, T>::value,
"Only CFeatures and CLabels are viewable.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of error messages that contain things that might change in the future, i.e. a new class is made viewable.
I would just state: Class is not viewable. The compiler error will provide the T and also the static assert which will tell the caller what is viewable and what is not. Or?

{

template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we make viewable const?

{

template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used anywhere?
How are we going to do this without multiple inheritance?
Maybe a mixin approach would be better, see IterativeMachine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf i'm not clear, could you explain the multiple inheritance problem?
currently we expect T to be either features or labels, but we can also use viewable mixin in features and labels

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf you mean about the duplciate method or what exactly do you mean by multiple inheritance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that I had a mistake in my thinking. This is just a global templated function so all good, nevermind!
I like this idea of having the method with the static assert!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k... note that needs extra love in case of SWIG interfaces... although it's a good question whether we actually wanna expose view mechanism for swig?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no.

REQUIRE(m_labels,"training labels not set!\n")
SGVector<float64_t> labels=(dynamic_cast<CDenseLabels*>(m_labels))->get_labels();
SGVector<float64_t> labels =
(dynamic_cast<CDenseLabels*>(labs))->get_labels();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there's already a change here we could just port using as<CDenseLabels>

@@ -1205,27 +1204,27 @@ void CCARTree::prune_by_cross_validation(CDenseFeatures<float64_t>* data, int32_
}

SGVector<int32_t> subset(train_indices.data(),train_indices.size(),false);
data->add_subset(subset);
m_labels->add_subset(subset);
auto dense_labels = m_labels->as<CDenseLabels>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as agreed no need for this, drop it...

{

template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf you mean about the duplciate method or what exactly do you mean by multiple inheritance?

}

template <class T>
T* view(Some<T> viewable, const SGVector<index_t>& subset)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf @vigsterkr maybe we can just return Some<T> here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! If you pass Some you should get Some

@vigsterkr
Copy link
Member

@vinx13 yep cool! this way the c++ code became much much cleaner!

* @return new viewable instance
*/
template <class T>
T* view(T* viewable, const SGVector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still vote for const T*

* @return new viewable instance
*/
template <class T>
Some<T> view(Some<T> viewable, const SGVector<index_t>& subset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Member

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase with latest develop and use override where necessary. this way the CIs wont fail :)

@@ -96,6 +96,8 @@ class CBinaryLabels : public CDenseLabels
*/
virtual ELabelType get_label_type() const;

virtual CLabels* duplicate() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use override

*
* @return labels object
*/
virtual CLabels* duplicate() const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use override

@vigsterkr vigsterkr merged commit 2ae291e into shogun-toolbox:develop Jul 26, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants