Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow counts argument in fit! #284

Closed
adknudson opened this issue Mar 25, 2024 · 5 comments · Fixed by joshday/OnlineStatsBase.jl#40
Closed

Allow counts argument in fit! #284

adknudson opened this issue Mar 25, 2024 · 5 comments · Fixed by joshday/OnlineStatsBase.jl#40

Comments

@adknudson
Copy link

It would be really convenient to be allowed to call fit!(o, y, k) as a shorthand way to fit observation y k times. Many stats would support this with little change, like the Hist stat for example. You would change lines 94-101

function _fit!(o::Hist, y)
    i = binindex(o.edges, y, o.left, o.closed)
    if 1  i < length(o.edges)
        o.counts[i] += 1
    else
        o.out[1 + (i > 0)] += 1
    end
end

into

function _fit!(o::Hist, y, k=1)
    i = binindex(o.edges, y, o.left, o.closed)
    if 1  i < length(o.edges)
        o.counts[i] += k
    else
        o.out[1 + (i > 0)] += k
    end
end

I'm not sure how much extra work this would be, but I would be happy to draft a pull request trying to implement this if there is interest.

@adknudson
Copy link
Author

I see that CountMap has this implemented with the fit!(o, y=>n) notation. Is it reasonable to expand this to all stats?

@adknudson adknudson changed the title Allow counts arguments in fit! Allow counts argument in fit! Mar 25, 2024
@joshday
Copy link
Owner

joshday commented Mar 26, 2024

It's a reasonable request for fit!(::OnlineStat{T}, x::Pair{T, Int}) methods. I'd be okay with a fallback method and specific types can implement their own methods as needed.

@adknudson
Copy link
Author

I'm looking through OnlineStatsBase to get an understanding of what I would need to change to get this working. It looks like for OnlineStat{T}, T is the type of observation that the stat can work on. So for CountMap, it is defined as

 CountMap{T, A <: AbstractDict{T, Int}} <: OnlineStat{Union{T, Pair{<:T,<:Integer}}}

which means it can accept a T, or a Pair{T, Int}. Is it really necessary for Pair{T,Int} to be part of the type? I can go two ways with a PR:

  1. Add Pair{T,Int} to all stats and implement _fit!(o, xy::Pair) for each stat
  2. Define fit!(o::OnlineStat{T}, xy::Pair{<:T, <Integer}) and the fallback method _fit!(o::OnlineStat{T}, xy::Pair{<:T, <Integer}) which can be specialized for certain stats (e.g. Hist)

I'm inclined towards option 2 as then there is no ambiguity of if a stat can fit a pair-type observation. Generally the new code would be

fit!(o::OnlineStat{T}, xy::Pair{<:T, <:Integer}) where {T} = (_fit!(o, xy); return o)

function _fit!(o::OnlineStat{T}, xy::Pair{<:T, <:Integer}) where {T}
    x, y = xy
    for _ in 1:y
        _fit!(o, x)
    end
    return o
end

I think CountMap could either keep or drop the Pair{<:T,<:Integer} in the union type. I think it would be cleaner to drop it, but I'm not sure if that counts as a breaking change.

@joshday
Copy link
Owner

joshday commented Mar 26, 2024

Is it really necessary for Pair{T,Int} to be part of the type?

So now I'm remembering why I haven't done this...

fit!(o::OnlineStat{T}, x::S) will iterate through x and fit! each element, so adding pair methods is a breaking change, e.g. fit!(Mean(), 1 => 5) would work differently. Previously this would result in value=3 and nobs=2. Adding this new method would result in value=1 and nobs=5.

I think we need a new function to indicate counts, fitn! or similar.

In retrospect, building fit! around iteration was a mistake. OnlineStats 2.0 should have a push!/append!-like approach to make things more explicit.

@adknudson
Copy link
Author

fit!(Mean(), 1 => 5) feels more like a bug than anything. Instead of introducing fitn!, why not dispatch on fit!(o, x, k)? That shouldn't be breaking and is easy to fall back on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants