Add Distributions.jl Dependency #44

Closed
ParadaCarleton opened this issue Jun 24, 2023 · 5 comments

Comments

@ParadaCarleton
Contributor

Distributions.jl is a much more widely used interface for working with distributions. Making ASH histograms a subtype of Distributions.jl's Distribution would substantially increase how much of the ecosystem can work with ASH.

Prior to Julia 1.9, I think avoiding the Distributions.jl dependency was a good idea, but now with native code caching, the extra compile time is negligible. In addition, adding Distributions.jl would let parts of the code be simplified substantially by reusing existing Distributions.jl methods.

Distributions.jl would also allow for a wider variety of kernels (any Distribution), as well as better density estimates in the tails. (A generalized Pareto distribution can be fitted to the most extreme observations and used to predict outside the range of the observations, rather than taking the ASH estimate directly.)
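
For readers unfamiliar with the tail idea in the parenthetical above, here is a minimal, language-neutral sketch in Python. It is not ASH.jl or Distributions.jl code: the function names (`fit_gpd_moments`, `gpd_pdf`, `make_tail_density`) are hypothetical, and the method-of-moments fit is an assumption chosen for brevity (it requires the excesses to have finite variance; a maximum-likelihood fit would be more usual in practice).

```python
import math
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments fit of a generalized Pareto distribution (GPD).

    Returns (xi, sigma), the shape and scale. Uses the GPD identities
    mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 * (1 - 2*xi)).
    """
    m = statistics.mean(excesses)
    s2 = statistics.variance(excesses)  # sample variance
    if s2 == 0:
        raise ValueError("excesses must not all be identical")
    ratio = m * m / s2  # equals 1 - 2*xi under the GPD model
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def gpd_pdf(y, xi, sigma):
    """GPD density at excess y >= 0 (zero outside the support)."""
    if y < 0:
        return 0.0
    if xi == 0.0:
        return math.exp(-y / sigma) / sigma
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the upper endpoint when xi < 0
        return 0.0
    return base ** (-1.0 / xi - 1.0) / sigma


def make_tail_density(data, threshold):
    """Return a density estimate valid for x >= threshold.

    The GPD is fitted to the excesses over the threshold and scaled by
    the empirical fraction of observations that exceed it.
    """
    excesses = [x - threshold for x in data if x > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two observations above the threshold")
    xi, sigma = fit_gpd_moments(excesses)
    p_exceed = len(excesses) / len(data)

    def density(x):
        if x < threshold:
            raise ValueError("below threshold: use the histogram estimate here")
        return p_exceed * gpd_pdf(x - threshold, xi, sigma)

    return density
```

Below the threshold the histogram-based estimate would still be used; the fitted tail only takes over for the most extreme region, which is where a binned estimate has little or no data.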

@joshday
Owner

joshday commented Jun 24, 2023

Distributions (and indirectly, its 50-something dependencies!) is much too heavy to add here.

You can already use any kernel function you want. I'm not sure what you're describing with generalized Pareto.

@joshday joshday closed this as completed Jun 24, 2023
@ParadaCarleton
Contributor Author

> Distributions (and indirectly, its 50-something dependencies!) is much too heavy to add here.

Right, Distributions.jl plus its indirect dependencies is quite heavy, but the main point is that almost all of those indirect dependencies are already dependencies of ASH.jl. The remaining 10 or so are lightweight and widely used across the statistics ecosystem, enough so that anyone who loads any other packages besides ASH is almost guaranteed to pick up most of them anyway.
[image not preserved]

@ParadaCarleton
Contributor Author

ParadaCarleton commented Jun 24, 2023

The bulk of the marginal weight probably comes from Plots.jl and UnicodePlots.jl, which add substantially more overhead than Distributions.jl and aren't part of most users' statistical analysis toolbox (lots of people use Makie, Gadfly, VegaLite, etc. instead). See the compile times:

[image: compile times, not preserved]

@joshday
Owner

joshday commented Jun 24, 2023

  1. I don't see a benefit.
  2. Plots.jl isn't a dependency of either package.

@ParadaCarleton
Contributor Author

> Plots.jl isn't a dependency of either package.

Sorry, the name made me think UnicodePlots.jl depended on Plots.jl.

> I don't see a benefit.

The benefit is being able to overload logpdf, cdf, etc. and make ASH <: Distribution, so that it automatically gets all the functions defined for a Distribution (e.g. all summary statistics, including more unusual ones like divergences).
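
To illustrate the kind of interface being described, here is a minimal Python sketch, not the ASH.jl or Distributions.jl API. It assumes the density estimate is available as heights on equal-width bins; the class name `BinnedDensity` and its methods are hypothetical. The point is only that once a density is stored this way, quantities such as the log-density, the CDF, and the mean follow mechanically.

```python
import math


class BinnedDensity:
    """Piecewise-constant density on equal-width bins (hypothetical helper)."""

    def __init__(self, lo, width, heights):
        total = sum(heights) * width
        if total <= 0:
            raise ValueError("heights must have positive total mass")
        self.lo = lo
        self.width = width
        # Normalise so the density integrates to 1.
        self.heights = [h / total for h in heights]

    @property
    def hi(self):
        return self.lo + self.width * len(self.heights)

    def _bin(self, x):
        # Index of the bin containing x, clamped to guard against rounding.
        return min(int((x - self.lo) / self.width), len(self.heights) - 1)

    def pdf(self, x):
        if x < self.lo or x >= self.hi:
            return 0.0
        return self.heights[self._bin(x)]

    def logpdf(self, x):
        p = self.pdf(x)
        return math.log(p) if p > 0 else -math.inf

    def cdf(self, x):
        if x <= self.lo:
            return 0.0
        if x >= self.hi:
            return 1.0
        k = self._bin(x)
        full = sum(self.heights[:k]) * self.width  # mass of complete bins
        partial = self.heights[k] * (x - (self.lo + k * self.width))
        return full + partial

    def mean(self):
        # Each bin contributes its mass times its midpoint.
        return sum(
            h * self.width * (self.lo + (k + 0.5) * self.width)
            for k, h in enumerate(self.heights)
        )


d = BinnedDensity(lo=0.0, width=1.0, heights=[1.0, 3.0])
print(d.pdf(0.5), d.cdf(1.5), d.mean())
```

In Julia the analogous step would be defining these methods for a type that subtypes Distribution, after which generic code written against that interface could accept it.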
