Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-6494: training_frame requirements for StackedEnsemble in blending mode #3910

Merged
merged 7 commits into from
Oct 16, 2019

Conversation

sebhrusen
Copy link
Contributor

@sebhrusen sebhrusen commented Sep 23, 2019

https://0xdata.atlassian.net/browse/PUBDEV-6494
https://0xdata.atlassian.net/browse/PUBDEV-6612 (minor change, related to blending mode as well)

In blending mode:

  • training_frame is not required for the training of the SE model.
  • base models are allowed to have been trained using different training_frames.

@sebhrusen
Copy link
Contributor Author

@ledell @michalkurka @deil87 anyone to review this change? PR is currently in sleeping mode. Thx

Copy link
Contributor

@michalkurka michalkurka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me from code perspective

Copy link
Contributor

@ledell ledell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


for (Key<Model> k : _parms._base_models) {
aModel = DKV.getGet(k);
if (null == aModel) {
Log.warn("Failed to find base model; skipping: " + k);
Log.warn("Failed to find base model; skipping: "+k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o tiny question, is it a new convention that we should remove spaces in Java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but it's not an uncommon practice to remove spaces around + operator just for string concatenations (in any language that supports it) as it improves readability:

  • "+ and +" appears as placeholder delimiters without interrupting the read (we ignore the + quickly when reading):
    - "hi, my name is "+name+" and I live in "+city.
    - "hi, my name is " + name + " and I live in " + city.
  • you really see the spaces in the sentence and are less likely to forget them:
    - "hi, my name is "+name+"and I live in"+city.
    - "hi, my name is " + name + "and I live in" + city.

for all other use-cases, I use spaces, but I think that for string concat, this was a bad convention in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o i think it is subjective which way is more readable :) I don't mind any of the formatting but it is better to stick to one across the project/team. I just remember someone was suggesting me during reviews to add those spaces in Java and remove ones in python :) Never mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to compare Python and Java in terms of style.
Python is very opinionated as it aims at optimal readability (because we read 100x times more code than we write), there's even a PEP for it: https://www.python.org/dev/peps/pep-0008/
Java commonly-used code style is just a convention following old SUN internal code style, but it varies so much that people implemented tools to apply different standards by company for example...
And I noticed that you often tend to use the word "subjective" when you don't have a strong opinion on sth. Are you so sure it is? Do you think that Python is only subjectively more readable than Perl for example? Can't there be any objective truth in styling? due to our physiological/psychological/neurological limitations? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o I do have strong opinion on many things. And my strong opinion was that this is subjective :) I believe that every developer/human being is unique. You have provided examples of code as a proof( to support objectiveness) but I strongly believe that this example does not prove objectiveness of your point of view.
Styling is about order and order(any reasonable order actually) makes it easier to interpret(read) anything. Because we get used to something and it becomes easier for us. Btw i have not found convention in this PEP regarding string concatenation. Maybe because it is arguable?

Code styles exist but they are different even though they all seek for optimal readability. It is another proof that not every code style rule could be considered as universally correct one.

Can't there be any objective truth in styling? due to our physiological/psychological/neurological limitations?

exactly because of our limitations, which varies from person to person like how good your eyesight is, it is hard to find common truth in such questions.

Copy link
Contributor Author

@sebhrusen sebhrusen Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a bit of a flawed reasoning:

You have provided examples of code as a proof

No, examples can never serve as a proof, they just serve as examples. Only counter-examples can prove the falseness of a statement. That's the eternal problem of induction.

I believe that every developer/human being is unique
our limitations, which varies from person to person like how good your eyesight is, it is hard to find common truth in such questions

Unique doesn't mean that we're all different. Otherwise, you can throw away all researches in biology, psychology, social sciences, and what's even the point in building models if we don't expect them to apply because everyone is unique.
Yes, people are unique at some point, otherwise how boring, but they're also very similar, otherwise how impossible to communicate, live in cities with millions of them...
And when I say that adding extra spaces make it more difficult to distinguish the spaces inside the strings from the ones outside, I think you're just arguing for the sake of arguing.
Objective doesn't mean absolute eternal truth (that would be a straw man): for example red and blue objects can be said to be objectively different, although there's nothing like green and red colours, they're just an effect of perception due to our "physiological/psychological/neurological" system. However, yes, you can find people that won't make the difference, but even this fact, we can explain it objectively, because the people who don't see the difference are likely to have defective cones in the retina.
But if you reject the premise red and blue objects can be said to be objectively different, you just reject the work objectivity altogether, saying it's meaningless, in which case, subjectivity becomes also meaningless because well... everything is subjective.

and...

Btw i have not found convention in this PEP regarding string concatenation. Maybe because it is arguable?

C'mon, no one uses string concatenation in Python! the Python way is string formatting or join.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, yes, you can find people that won't make the difference, but even this fact, we can explain it objectively, because the people who don't see the difference are likely to have defective cones in the retina.

It can also be just too dark.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o

No, examples can never serve as a proof, they just serve as examples. Only counter-examples can prove the falseness of a statement. That's the eternal problem of induction.

It means it was counter-example, right? Because why on earth we need just examples if they just examples. Anyway my point was that whatever your example was it is subjective. People read / interpret / parse differently, they have different environments (dark mode, white mode, small screen vs big screen, good eyesight vs poor eyesight) etc.

And when I say that adding extra spaces make it more difficult to distinguish the spaces inside the strings from the ones outside, I think you're just arguing for the sake of arguing.

I've said "Never mind" - it means that I did not want to argue from the beginning.

Yes, people are unique at some point, otherwise how boring, but they're also very similar,

And who decides where this "some point" is? In conversation like this it is just about difference in our personal thresholds.

However, yes, you can find people that won't make the difference, but even this fact, we can explain it objectively, because the people who don't see the difference are likely to have defective cones in the retina.

I does not mean that people who have good cones in their retinas( can distinguish the difference) can not ignore the difference as it is something that really make no difference for them. Why don't we argue then about what font we should all use and what size of font? We can try to find objectiveness there as well. Again, I will repeat myself, I think that the most important thing is conventions within particular communities. Every language has its own coding style even though every language seek for optimal readability. We get used to some, stick to it and it makes it convenient for the team to read the code.

C'mon, no one uses string concatenation in Python! the Python way is string formatting or join.

Java also have string formatting. Why do we use string concatenation then?

PS. I started to follow your idea about naming of the long test methods in Java with underscores even though it is against Java camel case style convention. I subjectively agreed to this. But if the rest of the team would say me that let's stick to a convention I would follow the majority as it is more important. It is just in this particular case I don't feel like that without spaces it is more convenient to read. And I asked the initial question whether we have new convention or not, that is it.

if (!aModel.isSupervised()) {
throw new H2OIllegalArgumentException("Base model is not supervised: " + aModel._key.toString());
throw new H2OIllegalArgumentException("Base model is not supervised: "+aModel._key.toString());
}

if (beenHere) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o I know this is not you code, but I would rename beenHere as it is too generic.

 if(!isFirstBaseModel) {}

or

 if(!isReferenceBaseModel) {}

would be clearer and we wont need some comments to explain beenHere. Ignore this if you don't feel like touching legacy code :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, will rename

if (aModel._parms._fold_column != null &&
! aModel._parms._fold_column.equals(basemodel_fold_column))
throw new H2OIllegalArgumentException("Base models are inconsistent: they use different fold_columns.");
if (aModel._parms._fold_assignment != basemodel_fold_assignment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o what if we have following situation:

aModel._parms._fold_assignment == Random && basemodel_fold_assignment == AUTO

shouldn't we exclude it as well? or it is impossible to get into this state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L376:
if (basemodel_fold_assignment == AUTO) basemodel_fold_assignment = Random;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seb-h2o I see now. It was not on the surface let's say :)

Copy link
Contributor

@deil87 deil87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants