Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantile(::WeightedAdaptiveHist, p) probably shouldn't use frequency weights #45

Open
tpapp opened this issue Jun 20, 2022 · 3 comments

Comments

@tpapp
Copy link
Contributor

tpapp commented Jun 20, 2022

julia> using WeightedOnlineStats, StatsBase

julia> x = randn(100);

julia> w = ProbabilityWeights(abs.(randn(length(x))));

julia> o = WeightedAdaptiveHist(4)
WeightedAdaptiveHist: ∑wᵢ=0 | value=(Float64[], Float64[])

julia> fit!(o, x, w)
WeightedAdaptiveHist: ∑wᵢ=80.3545413740818 | value=([-1.64447, -0.845089, 0.504903, 1.44472], [6.63211, 29.7925, 33.3148, 10.6151])

julia> median(o)
ERROR: ArgumentError: The values of the vector of `FrequencyWeights` must be numericallyequal to integers. Use `ProbabilityWeights` or `AnalyticWeights` instead.
Stacktrace:
 [1] quantile(v::Vector{Float64}, w::FrequencyWeights{Float64, Float64, Vector{Float64}}, p::Vector{Float64})
   @ StatsBase ~/.julia/packages/StatsBase/pJqvO/src/weights.jl:701
 [2] quantile
   @ ~/.julia/packages/StatsBase/pJqvO/src/weights.jl:759 [inlined]
 [3] quantile(o::WeightedAdaptiveHist{Float64, WeightedAdaptiveBins{Float64}}, p::Float64)
   @ WeightedOnlineStats ~/.julia/packages/WeightedOnlineStats/zwy6h/src/histogram.jl:256
 [4] median(o::WeightedAdaptiveHist{Float64, WeightedAdaptiveBins{Float64}})
   @ WeightedOnlineStats ~/.julia/packages/WeightedOnlineStats/zwy6h/src/histogram.jl:13
 [5] top-level scope
   @ REPL[125]:1

Maybe probability weights instead?

@gdkrmr
Copy link
Owner

gdkrmr commented Jun 20, 2022

This code was not written by me, a short test shows that quantile behaves differently for FrequencyWeights as for the other weight types and I would have to think about the correct weight type. I think @meggart knows this part of the code base better than I do.

julia> using StatsBase

julia> quantile([1, 2, 3], ProbabilityWeights([1, 2, 3]), 0:0.1:1)
11-element Vector{Float64}:
 1.0
 1.25
 1.5
 1.75
 2.0
 2.1666666666666665
 2.3333333333333335
 2.5
 2.6666666666666665
 2.8333333333333335
 3.0

julia> quantile([1, 2, 3], FrequencyWeights([1, 2, 3]), 0:0.1:1)
11-element Vector{Float64}:
 1.0
 1.5
 2.0
 2.0
 2.0
 2.5
 3.0
 3.0
 3.0
 3.0
 3.0

julia> quantile([1, 2, 3], AnalyticWeights([1, 2, 3]), 0:0.1:1)
11-element Vector{Float64}:
 1.0
 1.25
 1.5
 1.75
 2.0
 2.1666666666666665
 2.3333333333333335
 2.5
 2.6666666666666665
 2.8333333333333335
 3.0

@meggart
Copy link
Collaborator

meggart commented Jun 21, 2022

I am also no expert on the different weight types. In the current code, when doing the quantile computation we basically put in the vector of midpoints of each bin x and the accumulated weights for each bin y and then compute quantile(x,fweights(y)).

I don't know if we should change this to ProbabilityWeights or to AnalyticWeights, maybe this depends on the semantic meaning of the weights the user provides to the fit! function. My suggestion for a quick fix would be to add a keyword argument to the statistics functions here to let the user decide, so something like

function Statistics.quantile(o::WeightedHist, p = [0, .25, .5, .75, 1]; weighttype=ProbabilityWeights)

@tpapp
Copy link
Contributor Author

tpapp commented Jun 21, 2022

By elimination, I think it is probability weights, since

  1. frequency weights have to be integers, and the weights in this package aren't
  2. analytical weights have to be between 0 and 1, and weights in this package can be larger than 1

That said, the implementation of quantile does not seem to care, as long as they are not frequency weights. So weights (generic) is should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants