[ENH] histogram distribution #323

Closed
fkiraly opened this issue May 14, 2024 · 5 comments · Fixed by #382
Labels: feature request (New feature or request), module:probability&simulation (probability distributions and simulators)

Comments

@fkiraly
Collaborator

fkiraly commented May 14, 2024

For distribution estimation, an important distribution type is the histogram distribution.

This will be parameterized by bins and bin densities.

@fkiraly added the module:probability&simulation and feature request labels May 14, 2024
@ShreeshaM07
Contributor

I just need some clarification. Would bins be an array of upper bounds for the intervals, or, if it is a single value, the number of bins the range is divided into equally? And would bin_density be an array of the same size, with one value per bin in the [0,1] interval, since we want it to be a probability distribution? Or do we want to store the frequencies in bin_density and then divide by the total frequency to bring the values into [0,1]?
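To make the two options concrete, here is a minimal sketch in NumPy. It is only an illustration, not a proposed API: the helper names `resolve_bins` and `normalize_counts`, the `lower`/`upper` arguments, and the choice to store all n+1 bin edges rather than only upper bounds are assumptions. It also separates the probability mass per bin, which lies in [0,1] and sums to 1, from the density per bin, which is the mass divided by the bin width and can exceed 1 for narrow bins.

```python
import numpy as np


def resolve_bins(bins, lower=0.0, upper=1.0):
    """Return an array of bin edges (hypothetical helper).

    If `bins` is a single number, split [lower, upper] into that many
    equal-width bins; otherwise treat `bins` as the edges themselves.
    """
    if np.isscalar(bins):
        return np.linspace(lower, upper, int(bins) + 1)
    return np.asarray(bins, dtype=float)


def normalize_counts(counts, edges):
    """Turn raw bin frequencies into per-bin masses and densities."""
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(edges)
    bin_mass = counts / counts.sum()   # in [0, 1], sums to 1
    bin_density = bin_mass / widths    # integrates to 1 over the bins
    return bin_mass, bin_density


# four equal-width bins on [0, 2], with raw frequencies per bin
edges = resolve_bins(4, lower=0.0, upper=2.0)
bin_mass, bin_density = normalize_counts([1, 3, 4, 2], edges)
```

With these inputs the masses sum to 1, and the densities multiplied by the bin widths also sum to 1, which is the property a histogram pdf needs.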

Do I understand the requirement correctly?

@fkiraly
Collaborator Author

fkiraly commented May 16, 2024

> Do I understand the requirement correctly?

The exact specification has not been set out, so it would be great if you could make a few suggestions!

Specifying either the number of bins or the bin boundaries makes sense. In fact, both are so common that popular interfaces like pandas.cut allow for both, via a polymorphic interface: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
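For reference, a short illustration of that polymorphism in pandas.cut, where the same `bins` argument accepts either an integer or a sequence of edges. The sample values are made up for the example.

```python
import pandas as pd

values = pd.Series([0.5, 1.5, 2.5, 3.5])

# an integer: that many equal-width bins spanning the range of the data
by_count = pd.cut(values, bins=2)

# a sequence: explicit bin boundaries
by_edges = pd.cut(values, bins=[0, 1, 2, 3, 4])

print(by_count.cat.categories)
print(by_edges.cat.categories)
```

A histogram distribution could follow the same pattern, though whether it should is part of what needs to be decided here.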

Concretely, it would be nice if you could spell out what the parameters would/should be.

Some complexity comes in through taking into account that we are looking at array distributions, so the bin boundaries might differ between marginal distributions.
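As a rough sketch of what that means, the snippet below evaluates the pdf of two marginals whose bin boundaries and bin counts differ. It is not a proposal for the actual design: the function `histogram_pdf`, the half-open bin convention, and the list-of-pairs layout for the marginals are all assumptions made for illustration.

```python
import numpy as np


def histogram_pdf(x, edges, bin_mass):
    """Evaluate a histogram pdf at a scalar x (hypothetical helper).

    Bins are treated as half-open intervals [edges[i], edges[i + 1]);
    the pdf is 0 outside the range covered by the edges.
    """
    edges = np.asarray(edges, dtype=float)
    bin_mass = np.asarray(bin_mass, dtype=float)
    density = bin_mass / np.diff(edges)
    idx = np.searchsorted(edges, x, side="right") - 1
    if idx < 0 or idx >= len(density):
        return 0.0
    return float(density[idx])


# two marginals with different bin boundaries and different numbers of bins,
# so they cannot be stacked into one rectangular array without padding
marginals = [
    ([0.0, 1.0, 2.0], [0.25, 0.75]),
    ([0.0, 0.5, 1.0, 4.0], [0.2, 0.3, 0.5]),
]

pdf_values = [histogram_pdf(1.5, edges, mass) for edges, mass in marginals]
```

The ragged layout is what makes vectorization awkward: a plain loop works, but an array-based implementation would need either a common bin grid or some padding scheme.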

@ShreeshaM07
Contributor

ShreeshaM07 commented Jun 11, 2024

@fkiraly There were some merge conflicts in the __init__.py in distributions and in distributions.rst that were a little troublesome, so I decided to open a new PR, #382, to avoid the conflicts.

All discussion regarding the Histogram distribution took place in PR #335.

@fkiraly
Collaborator Author

fkiraly commented Jun 22, 2024

Reopening to keep a reminder of "summarizing outcomes" (vectorization) from #382 in a new issue.

@fkiraly fkiraly reopened this Jun 22, 2024
@ShreeshaM07
Contributor

> Reopening to keep a reminder of "summarizing outcomes" (vectorization) from #382 in a new issue.

I have opened #405 summarizing the outcomes of vectorization.

@fkiraly fkiraly closed this as completed Jun 23, 2024