Fixed range #37

Closed
theogf opened this issue Dec 7, 2021 · 2 comments


theogf commented Dec 7, 2021

It is made clear from the docs that once a range has been fixed, there is no possibility to change it later.
Is it because of the design of the algorithm or because the functionality is simply lacking?
Could an approximation be made to go from one range to the next (given some assumptions)?

Owner

joshday commented Dec 7, 2021

> Is it because of the design of the algorithm or because the functionality is simply lacking?

A little of both. If you change the range (which defines bin locations), the counts associated with the bins no longer make sense (unless you take special care in aligning the new edges with the old ones). This is what I've done with OnlineStats.ExpandingHist. I've also moved some of the ash functionality to OnlineStats so that you can do:

o = Ash(ExpandingHist(400))

Here are the docs that explain the details:

  ExpandingHist(nbins)

  An adaptive histogram where the bin edges keep doubling in size in order to
  contain every observation. nbins must be an even number. Bins are
  left-closed and the rightmost bin is closed, e.g.

    •  [a, b), [b, c), [c, d]

  Example
  ≡≡≡≡≡≡≡≡≡

  o = fit!(ExpandingHist(200), randn(10^6))

  using Plots
  plot(o)

  Details
  ≡≡≡≡≡≡≡≡≡

  How ExpandingHist works is best understood through example. Suppose we start
  with a histogram of edges/counts as follows:

  |1|2|5|3|2|

    •  Now we observe a data point that is not contained in the bin
       edges:

  |1|2|5|3|2|       *

    •  In order to contain the point, the range of the edges doubles in
       the direction of the new data point and adjacent bins merge their
       counts:

  |1|2|5|3|2|       *
   \ / \ / \ /      
                 
  | 3 | 8 | 2 | 0 | 1 |

    •  Note that multiple iterations of bin-doubling may occur until the
       new point is contained by the bin edges.
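
The doubling step described in the docstring can be sketched in plain Julia. This is only an illustration under simplifying assumptions, not the OnlineStats implementation: `expand_right` and `add_point` are hypothetical helper names, the bins are equal-width over `[lo, hi]`, and only expansion to the right is handled. An odd number of bins is padded with an empty bin so that the docstring's 5-bin picture can be reproduced.

```julia
# Hypothetical sketch, NOT the OnlineStats.ExpandingHist source.
# counts[i] holds the count of the i-th equal-width bin over [lo, hi].

# Double the range to the right: adjacent bins merge pairwise and the
# freed slots on the right become empty bins.
function expand_right(counts::Vector{Int}, lo::Float64, hi::Float64)
    n = length(counts)
    padded = isodd(n) ? vcat(counts, 0) : counts      # pad so bins pair up
    merged = [padded[i] + padded[i + 1] for i in 1:2:length(padded)]
    newcounts = vcat(merged, zeros(Int, n - length(merged)))
    return newcounts, lo, lo + 2 * (hi - lo)
end

# Insert x, doubling as many times as needed until x is covered.
function add_point(counts::Vector{Int}, lo::Float64, hi::Float64, x::Float64)
    x < lo && error("this sketch only expands to the right")
    counts = copy(counts)                             # leave the input untouched
    while x > hi
        counts, lo, hi = expand_right(counts, lo, hi)
    end
    n = length(counts)
    w = (hi - lo) / n
    i = min(n, floor(Int, (x - lo) / w) + 1)          # rightmost bin is closed
    counts[i] += 1
    return counts, lo, hi
end

counts, lo, hi = add_point([1, 2, 5, 3, 2], 0.0, 5.0, 9.0)
```

With the counts from the example above, unit-width bins on `[0, 5]`, and a new point at `9.0`, one doubling is enough and `counts` becomes `[3, 8, 2, 0, 1]` over `[0, 10]`, matching the diagram. Because old edges always coincide with new ones, no count ever has to be split between bins, which is what makes changing the range safe here.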

joshday closed this as completed Dec 7, 2021
Author

theogf commented Dec 7, 2021

Thanks, I did not follow these updates! That looks great!
