-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PUBDEV-7736 use btree for coxph computation instead the direct approach #4895
Conversation
8c1517a
to
e91a22b
Compare
|
||
return byStrata.stream() | ||
.map( | ||
indexes -> statsForAStrata(startVec, stopVec, eventVec, estimateVec, indexes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this using the Vec API at all? If you know you have only a very limited number of indices - can it be collected to non-distributed structures and avoid all overhead with using the Vec API to access a single value?
@@ -166,64 +174,343 @@ static Stats concordance(final Vec startVec, final Vec stopVec, final Vec eventV | |||
private static Stats concordanceStats(Vec.Reader startVec, Vec.Reader stopVec, Vec.Reader eventVec, List<Vec.Reader> strataVecs, Vec.Reader estimateVec, long length) { | |||
assert 0 <= length && length <= Integer.MAX_VALUE; | |||
|
|||
Stream<Integer> allIndexes = IntStream.range(0, (int) length).boxed(); | |||
Collection<List<Integer>> byStrata = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be rewritten to MRTask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you wish
7d5fc81
to
df18ee1
Compare
df18ee1
to
489d632
Compare
} | ||
|
||
private static Frame prepareFrameForConcordanceComputation(Vec eventVec, List<Vec> strataVecs, Vec estimateVec, Vec durations) { | ||
final Frame fr = new Frame(Key.make()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to have a key?
for (int i = 0; i < strataVecs.size(); i++) { | ||
fr.add("strata_" + i, strataVecs.get(i)); | ||
} | ||
DKV.put(fr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably no need to install into DKV
.outputFrame(Key.make("durations_" + fr1._key), new String[]{"durations"}, null)); | ||
Scope.track(frame); | ||
|
||
fr1.delete(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why false?
I think there is a lot of unnecessary DKV#puts, you really only need to put a Frame in DKV when it is a user-facing result. Most (if not all) your frames have a temporary nature - for regular MRTasks they don't need to have a key, don't need to be in DKV. Working with Scope and explicit deletes is suspicious and unexpected. |
@michalkurka you were right, that DKV and Scope ops were not necessary. Pushed a cleanup commit |
} | ||
|
||
private static Vec durations(Vec startVec, Vec stopVec) { | ||
Vec vec = null == startVec ? stopVec.makeZero() : startVec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the temporary frame disposed of anywhere.
On another note - if startVec is not given then durations = startVec. You can skip the MRTask altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
||
return stats; | ||
withoutNas.replace(1, withoutNas.vec("event").toNumericVec()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removeNAs is also converting event column to numeric?
I would expect such transformations to be done in top-level (public) methods instead of utility functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was removed
@@ -20,8 +20,6 @@ def coxph_smoke(): | |||
|
|||
metrics = coxph.model_performance(heart) | |||
assert 0.581 > metrics.concordance() and 0.580 < metrics.concordance() | |||
assert 3696 == metrics.concordant() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased, the test was redesigned in an other PR to use lifelines instead of hardcoded results.
@@ -24,7 +24,7 @@ test.CoxPH.concordance <- function() { | |||
rConcordance <- unname(summary(rModel)$concordance)[1] | |||
hexConcordance <- h2o.performance(hexModel, data=tstdataHex)@metrics$concordance | |||
|
|||
expect_equal(rConcordance, hexConcordance) | |||
expect_equal(rConcordance, hexConcordance, tolerance = 1e-3, scale = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the tolerance quite high? This is not complex math
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was used in all the coxph tests so far I have seen.
838ba28
to
138e6f9
Compare
3b49fcc
to
68c3a50
Compare
dbbe094
to
bf55498
Compare
5a5267a
to
c4552a4
Compare
a018ee3
to
5a01a46
Compare
5a01a46
to
5685e66
Compare
…ch (h2oai#4895) * PUBDEV-7736 use btree for coxph computation instead the direct approach (cherry picked from commit 8612f10)
No description provided.