Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-7736 use btree for coxph computation instead the direct approach #4895

Merged
merged 26 commits into from
Nov 9, 2020

Conversation

satai
Copy link
Contributor

@satai satai commented Sep 4, 2020

No description provided.


return byStrata.stream()
.map(
indexes -> statsForAStrata(startVec, stopVec, eventVec, estimateVec, indexes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this using the Vec API at all? If you know you have only a very limited number of indices - can it be collected to non-distributed structures and avoid all overhead with using the Vec API to access a single value?

@@ -166,64 +174,343 @@ static Stats concordance(final Vec startVec, final Vec stopVec, final Vec eventV
private static Stats concordanceStats(Vec.Reader startVec, Vec.Reader stopVec, Vec.Reader eventVec, List<Vec.Reader> strataVecs, Vec.Reader estimateVec, long length) {
assert 0 <= length && length <= Integer.MAX_VALUE;

Stream<Integer> allIndexes = IntStream.range(0, (int) length).boxed();
Collection<List<Integer>> byStrata =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be rewritten to MRTask

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you wish

@satai satai force-pushed the ondra/concordance_performance branch 2 times, most recently from 7d5fc81 to df18ee1 Compare September 22, 2020 09:32
@satai satai requested a review from Pscheidl September 22, 2020 09:36
@satai satai force-pushed the ondra/concordance_performance branch from df18ee1 to 489d632 Compare September 22, 2020 09:41
@satai satai marked this pull request as ready for review September 22, 2020 11:52
}

private static Frame prepareFrameForConcordanceComputation(Vec eventVec, List<Vec> strataVecs, Vec estimateVec, Vec durations) {
final Frame fr = new Frame(Key.make());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to have a key?

for (int i = 0; i < strataVecs.size(); i++) {
fr.add("strata_" + i, strataVecs.get(i));
}
DKV.put(fr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably no need to install into DKV

.outputFrame(Key.make("durations_" + fr1._key), new String[]{"durations"}, null));
Scope.track(frame);

fr1.delete(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why false?

@michalkurka
Copy link
Contributor

I think there is a lot of unnecessary DKV#puts, you really only need to put a Frame in DKV when it is a user-facing result. Most (if not all) your frames have a temporary nature - for regular MRTasks they don't need to have a key, don't need to be in DKV.

Working with Scope and explicit deletes is suspicious and unexpected.

@satai
Copy link
Contributor Author

satai commented Sep 23, 2020

@michalkurka you were right, that DKV and Scope ops were not necessary. Pushed a cleanup commit

}

private static Vec durations(Vec startVec, Vec stopVec) {
Vec vec = null == startVec ? stopVec.makeZero() : startVec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the temporary frame disposed of anywhere.

On another note - if startVec is not given then durations = startVec. You can skip the MRTask altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


return stats;
withoutNas.replace(1, withoutNas.vec("event").toNumericVec());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeNAs is also converting event column to numeric?

I would expect such transformations to be done in top-level (public) methods instead of utility functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was removed

@@ -20,8 +20,6 @@ def coxph_smoke():

metrics = coxph.model_performance(heart)
assert 0.581 > metrics.concordance() and 0.580 < metrics.concordance()
assert 3696 == metrics.concordant()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased, the test was redesigned in an other PR to use lifelines instead of hardcoded results.

@@ -24,7 +24,7 @@ test.CoxPH.concordance <- function() {
rConcordance <- unname(summary(rModel)$concordance)[1]
hexConcordance <- h2o.performance(hexModel, data=tstdataHex)@metrics$concordance

expect_equal(rConcordance, hexConcordance)
expect_equal(rConcordance, hexConcordance, tolerance = 1e-3, scale = 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the tolerance quite high? This is not complex math

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used in all the coxph tests so far I have seen.

@satai satai force-pushed the ondra/concordance_performance branch 4 times, most recently from 838ba28 to 138e6f9 Compare October 15, 2020 12:09
@satai satai force-pushed the ondra/concordance_performance branch from 3b49fcc to 68c3a50 Compare October 20, 2020 10:44
@satai satai force-pushed the ondra/concordance_performance branch 2 times, most recently from dbbe094 to bf55498 Compare October 29, 2020 11:44
@satai satai force-pushed the ondra/concordance_performance branch from 5a5267a to c4552a4 Compare October 29, 2020 12:11
@satai satai force-pushed the ondra/concordance_performance branch 2 times, most recently from a018ee3 to 5a01a46 Compare October 30, 2020 18:52
@satai satai force-pushed the ondra/concordance_performance branch from 5a01a46 to 5685e66 Compare November 2, 2020 12:53
@satai satai merged commit 8612f10 into master Nov 9, 2020
@satai satai deleted the ondra/concordance_performance branch December 17, 2020 11:38
flaviusburca pushed a commit to mware-solutions/h2o-3 that referenced this pull request Apr 21, 2021
…ch (h2oai#4895)

* PUBDEV-7736 use btree for coxph computation instead the direct approach

(cherry picked from commit 8612f10)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants