Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hazard specification of parametric baseline hazard #134

Merged
merged 7 commits into from
Jul 26, 2022

Conversation

adrian-lison
Copy link
Collaborator

This PR closes #56 by implementing direct computation of hazards for the parametric delay distributions. Only in the case of no reporting effects the probabilities are instead computed (same case distinction as before).

The model fits correctly, but speed is improved only marginally (on the order of 0.1% compared to the likelihood). This may be bigger for larger delays, and anyway I think this is the cooler way to specify the baseline 馃槅

FYI: Initially, I considered adding special cases/shortcuts for the exponential and log-logistic distribution. However, I found that no relevant improvement of speed would be achieved at the cost of making the code less elegant. In particular, the exponential distribution only has a constant hazard if we don't normalize it, so it's not worth implementing a case distinction for this.

// compute discretised hazard
haz[1] = cdf[1];
haz[2:(n-1)] = 1 - ccdf[2:(n-1)]./ccdf[1:(n-2)];
haz[n] = 1;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like in the prob_to_hazard function, we are forced to set haz[n]=1 as otherwise numerical issues can arise (non-finite gradients).

Come to think about it, we are currently not able to actually implement max_strat=0, because we have to set the last hazard to 1 (making it max_strat=1 effectively). So the question is whether we should remove it or just leave it there. It doesn't do much harm at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good point. It also introduces different behaviour depending on if you use hazards or probability directly. As we don't surface the stan functionality I would suggest we leave in but with a comment noting this issue?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm perhaps this is related to what I am seeing in #140 with infinite gradients. Not entirely clear if it can be as this is Inf on the logit scale.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be the same issue.

Regarding commenting the limitation of max_strat==0 do you want to add something to #140 now that this PR is merged? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do - sorry got excited!

Comment on lines -4 to +5
j += is_nan(abs(pmfs[k, i])) ? 1 : 0;
j += is_inf(abs(pmfs[k, i])) ? 1 : 0;
j += is_nan(abs(ref_lh[k, i])) ? 1 : 0;
j += is_inf(abs(ref_lh[k, i])) ? 1 : 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced pfms with ref_lh. Could be hazards if we want to keep them in transformed parameters. Btw, why do we use abs here (and cmdstan is asking to change it to fabs)?

Copy link
Collaborator

@seabbs seabbs Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2.30.0 it warns about fabs. In 2.29.0 it warns about abs. I think they flip-flopped on which to implement. I saw some issues to this effect but now can't find.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't know they were switching around. My question was more about why we need the absolute here at all, why can't we just plug in the potentially negative values into is_inf? I am probably lacking some knowledge on is_inf here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uummmmm I don't know what past Sam was thinking here. Maybe lack support for negatives? I have no idea. Will test.

@adrian-lison adrian-lison added the enhancement New feature or request label Jul 22, 2022
@adrian-lison adrian-lison linked an issue Jul 22, 2022 that may be closed by this pull request
@adrian-lison adrian-lison requested a review from seabbs July 22, 2022 20:51
@adrian-lison adrian-lison self-assigned this Jul 22, 2022
@adrian-lison
Copy link
Collaborator Author

Still need to sort out the tests apparently...

Copy link
Collaborator

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this so much regardless of speed up! Everything looks good aside from failing tests.

Shame about the direct specification but agree we should focus on keeping the code concise/elegant.

Test failures are a result of changes in what the stan model outputs it looks like. This isn't documented anywhere so my bad.

Essentially need to run tests locally and use testhat::snapshot_review() to check snapshots have changed in the way that is expected (i.e pmfs is gone).

There may be a second test checking the equivalence of the saved example and current output. That needs saved examples to be updated.

Will add making this info public to the improving contributor workflow issue

inst/stan/epinowcast.stan Show resolved Hide resolved
// compute discretised hazard
haz[1] = cdf[1];
haz[2:(n-1)] = 1 - ccdf[2:(n-1)]./ccdf[1:(n-2)];
haz[n] = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good point. It also introduces different behaviour depending on if you use hazards or probability directly. As we don't surface the stan functionality I would suggest we leave in but with a comment noting this issue?

@codecov
Copy link

codecov bot commented Jul 26, 2022

Codecov Report

Merging #134 (5819695) into develop (630bc77) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop     #134   +/-   ##
========================================
  Coverage    85.12%   85.12%           
========================================
  Files           12       12           
  Lines         1190     1190           
========================================
  Hits          1013     1013           
  Misses         177      177           

Help us with your feedback. Take ten seconds to tell us how you rate us.

@seabbs
Copy link
Collaborator

seabbs commented Jul 26, 2022

I've merged in modular changes from #137 and updated the tests and examples. Seeing a little bit of numerical instability potentially?

@seabbs
Copy link
Collaborator

seabbs commented Jul 26, 2022

Tested this out and no instability. Going to merge but lets keep discussing points above.

@seabbs seabbs merged commit 8f43853 into develop Jul 26, 2022
@seabbs seabbs deleted the feature_ref_hazard branch July 26, 2022 20:37
@adrian-lison
Copy link
Collaborator Author

Thanks for going the last mile @seabbs, added some comments, discussion continues on #140 I guess :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request high-priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hazard specification of parametric baseline hazard
2 participants