Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-6376: StackedEnsemble prediction fails when applied to a test dataset without response column #3382

Merged
merged 4 commits into from
Mar 28, 2019

Conversation

sebhrusen
Copy link
Contributor

https://0xdata.atlassian.net/browse/PUBDEV-6376

ensured that GLM respect computeMetrics parameter when it doesn't have direct access to original frame (like in SE scoring)

… direct access to original frame (like in SE scoring)
@sebhrusen
Copy link
Contributor Author

@michalkurka please have a look: tiny fix but very important! thx

@@ -1452,7 +1452,7 @@ protected Frame predictScoreImpl(Frame fr, Frame adaptFrm, String destination_ke
GLMScore gs = makeScoringTask(adaptFrm,true,j);// doAll(names.length,Vec.T_NUM,adaptFrm);
assert gs._dinfo._valid:"_valid flag should be set on data info when doing scoring";
gs.doAll(names.length,Vec.T_NUM,gs._dinfo._adaptedFrame);
if (gs._computeMetrics)
if (computeMetrics && gs._computeMetrics)
Copy link
Contributor

@michalkurka michalkurka Mar 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a better fix would be to propagate the flag to makeScoringTask - because GLMScore will actually run _mb.perRow to prepare the metadata for the metrics but this won't be used anywhere

to me that means we should let GLMScore know we don't need the metrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalkurka: yes, wanted to minimize changes but this would be more robust.
Will propagate then.

@michalkurka
Copy link
Contributor

I think the GLM change makes sense. However, why does GLM thinks it does have a response when SE doesn't?

@sebhrusen
Copy link
Contributor Author

for documentation purpose, answering @michalkurka question as we had a separate discussion about GLM behaviour:
GLM was not considering the adapted response column as "Bad", if it were in this state then it would have properly skipped metrics computation.
However, this is still a separate issue that is fixed by https://0xdata.atlassian.net/browse/PUBDEV-6379.
cf. also #3383

@michalkurka
Copy link
Contributor

michalkurka commented Mar 26, 2019

underlying cause fixed in #3383

we shouldn't need this urgently anymore

fix makes sense, I would just clean-up the naming computeMetrics vs computeMetrix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants