Exposing Stan dependency versions to package users #400
Comments
This is a good idea and would also help us in development, I think. The only thing we want to be careful of is not making this overly restrictive: Stan is pretty stable, so it's likely our model won't actually depend on a specific version, and we don't want people to install new versions for no reason. I'm not sure we want to supply a function to do this. We could encode the Stan dependency into the DESCRIPTION but that might be overly prescriptive. We could also use the new #386 to add the version of cmdstan to That would be good as it would be automated but it would be bad as it would always be the least version of |
Prying a bit: how are you defining "reproducibility"? In the scientific sense (i.e. prioritizing ability to specify these precise conditions => get these precise results) or in the engineering sense (i.e. on a budget, reliably churning out parts within quality tolerance standards + minimal disruptions to production line with maintenance/upgrades)? In my opinion, these are antagonistic, so the answer can't be "both". So: what we might actually do to solve the problem depends on which perspective you're arguing to prioritize. |
This is a good question and not something I've thought much about before. Perhaps
Thinking out loud: seeing the engineering and scientific perspectives as orthogonal makes sense to me, but I'd be interested to hear what you see as actively antagonistic. I'm not sure I'm totally following? In practice, I see my goals here as closer to your engineering perspective. The priorities are to (1) deliver "correct" results (2) on time each week (3) within a reasonable budget of time and computational resources. Aspirationally, the results are both deterministic for a particular model x dataset x random seed combination and also in some sense "true" about the world. I think containerization helps with the former, but not at all with the latter. Does that help at all? |
Re "antagonistic" - getting runs to produce the precise same results, alongside a fully interpretable audit trail, is going to cost (money, time, sanity, etc) that could be better spent on "useful" features (the phrase I would use instead of "correct", though also a bit vague) e.g. visualizations of results or open-source contributions to improve package diagnostics/messages/etc (hint-hint). Making results precisely match is useful for algorithmic diagnosis - that can be good for figuring out what's wrong with your machinery, but also...that's not the point of the machine, right? Aside: if results are sensitive to random number seed (or more broadly, generation), then they aren't results. Which is to say: the specific quantities whatever pipeline you build spits out aren't the result - "do X over Y" is the result. |
@zsusswein calling your attention to this discussion as well: https://community.epinowcast.org/t/epinowcast-command-line-interface/207/3, as that seems pertinent to what you're doing. If you could provide example use cases / user stories / etc, the potential outcome of that work would likely be more useful to you et al. (not that it solves the how-do-i-record-environment-state-so-i-can-later-diagnose-problems issue, but it might place it in a different, more manageable box). |
This is a nice conversation but perhaps best suited for the community forum or a |
I think the simplest workable solution from my perspective is to expose the versions used in CI for I'm not sure what the right approach is from a package/CRAN perspective. It feels idiomatic to set an option or environment flag, but I'm not sure that's necessarily best practice?
I like to think it would, but I don't know and have no evidence. |
Just as a side note related to this, I think we currently need |
To always be aware of our minimum version, we could also create a tweak of #386 to always run with a specified minimum version (e.g. starting with 2.33 for now). When this one fails but the #386 github action with the most recent stan version does not, we know it's because our code includes some new features and we need to update the minimum version. |
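The minimum-version idea above comes down to comparing version strings. Here is a minimal base R sketch of that comparison. The constant `min_cmdstan_version` and the helper `meets_min_cmdstan()` are hypothetical names, not part of {epinowcast}, and 2.33.0 is used only because it is the starting point suggested in this thread.

```r
# Hypothetical minimum CmdStan version; 2.33.0 is the starting point
# suggested in this thread, not a tested requirement.
min_cmdstan_version <- "2.33.0"

# Hypothetical helper: TRUE if `installed` is at least `minimum`.
# numeric_version() compares component by component, so "2.9.0" is
# treated as older than "2.33.0" (a plain string comparison would not be).
meets_min_cmdstan <- function(installed, minimum = min_cmdstan_version) {
  numeric_version(installed) >= numeric_version(minimum)
}

meets_min_cmdstan("2.34.1")
meets_min_cmdstan("2.9.0")
```

In a CI job the `installed` argument could be taken from `cmdstanr::cmdstan_version()`, which reports the version of the active CmdStan installation.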
@adrian-lison do you think we should aim to support a range of versions or should we just try and say which version for certain works? |
@zsusswein unfortunately no one has had capacity to get to this. If this is still useful for your team, would you have time to take a crack at it with some support? |
Sure! Happy to give it a go. A few questions:
|
It does seem like SystemRequirements would make sense, though I think we want to make this a soft requirement (i.e. we know this version works but other versions may also work). Based on that, vending it ourselves based on the version used with a specific release makes sense.
I think this might be a bit heavy in the first instance. We could give an .onLoad message telling users a version or write it as a header in the Stan files?
For now I am thinking of it less as supported and more built with. In general we will always be "supporting" the latest cmdstan release for any bug fixes etc.
yes it is and agree. |
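A "built with" startup message of the kind floated above could be sketched roughly as follows. The names `built_with_cmdstan` and `cmdstan_startup_message()` are hypothetical, and the version string is a placeholder rather than a confirmed value for any release. The sketch uses `.onAttach()` rather than `.onLoad()`, since R's convention is to emit startup messages there via `packageStartupMessage()`.

```r
# Hypothetical "built with" version, recorded at release time.
# "2.33.0" is a placeholder, not a confirmed value.
built_with_cmdstan <- "2.33.0"

# Hypothetical helper that builds the message text.
cmdstan_startup_message <- function(version = built_with_cmdstan) {
  sprintf(
    "This release was built with CmdStan %s; other versions may also work.",
    version
  )
}

# Startup messages conventionally go in .onAttach() and use
# packageStartupMessage(), so users can silence them with
# suppressPackageStartupMessages().
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(cmdstan_startup_message())
}
```

Wording the message as "built with" keeps the requirement soft, in line with the preference expressed in this comment.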
Is your feature request related to a problem? Please describe.
I use {epinowcast} in a containerized pipeline where we try to maximize reproducibility. When building new versions of the image, we build {epinowcast} from source from this repository pinned to a specific commit. But it can get a bit messy with layer caching and moving versions to get the latest features.
We try to manually keep track of the associated versions of the Stan dependencies and install the correct versions into the container, but it would be a helpful feature to expose those versions programmatically.
Describe the solution you'd like
I'm not sure what the most idiomatic solution is, but ideally we'd be able to call a simple function or variable to get the needed information. Something like epinowcast::get_cmdstan_version().
Describe alternatives you've considered
Even a simple message to the user that printed on package load would be a nice quality-of-life improvement.
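For illustration only, the function requested above might look something like the sketch below. It is not an existing {epinowcast} function. `built_with_cmdstan` is a hypothetical constant with a placeholder value, and the returned list structure is an assumption. The only external call is `cmdstanr::cmdstan_version()`, guarded so the sketch still runs when {cmdstanr} or CmdStan is not installed.

```r
# Hypothetical record of the CmdStan version a release was built with.
# "2.33.0" is a placeholder, not a confirmed value.
built_with_cmdstan <- "2.33.0"

# Hypothetical sketch of the requested function (not part of epinowcast).
# Returns the recorded "built with" version alongside whatever CmdStan
# version is currently active, or NA if none can be found.
get_cmdstan_version <- function() {
  installed <- NA_character_
  if (requireNamespace("cmdstanr", quietly = TRUE)) {
    installed <- tryCatch(
      cmdstanr::cmdstan_version(),
      error = function(e) NA_character_
    )
  }
  list(built_with = built_with_cmdstan, installed = installed)
}

versions <- get_cmdstan_version()
versions$built_with
```

A container build script could then read `built_with` to decide which CmdStan release to install.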