Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can fit return something useful? #52

Closed
tbreloff opened this issue Feb 5, 2016 · 7 comments
Closed

Can fit return something useful? #52

tbreloff opened this issue Feb 5, 2016 · 7 comments

Comments

@tbreloff
Copy link
Collaborator

tbreloff commented Feb 5, 2016

If I want to create a trace vector of fitted values, I have to do something like:

Float64[(fit!(o, yi); value(o)) for yi in y]

But if fit! returned the OnlineStat object then I can do something like:

Float64[fit!(o,yi).value for yi in y]
# or
map(yi -> value(fit!(o,yi)), y)

I don't think it should return the value itself, because many times you'll want to do something else with the object.

There is precedence for this sort of thing... for example push! will return the object you are pushing to.

@joshday
Copy link
Owner

joshday commented Feb 5, 2016

I like the concept, but unless I did something wrong here there's a big performance hit for:

function fit!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end

I tried adding the following:

function fit2!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit2!(o, row(y, i))
    end
end
fit2!(o::OnlineStat, y::Real) = (fit!(o, y); o)
julia> y = randn(10_000_000);

julia> o1 = Variance(); o2 = Variance()'
▌ Variance
  ▶     value: 0.0
  ▶      nobs: 0


julia> @time fit!(o1, y)
  0.059439 seconds (4 allocations: 160 bytes)

julia> @time OnlineStats.fit2!(o2, y)
  0.131593 seconds (10.00 M allocations: 152.588 MB, 7.01% gc time)

Before the rewrite, we had update_get!...how about fit_get!?

@tbreloff
Copy link
Collaborator Author

tbreloff commented Feb 5, 2016

Something seems weird here. I don't think you can use the same fit! method for Mean and Means, for example. Is a vector multiple observations (Mean), or a single observation (Means)? I like that you drastically reduced the definitions of fit! scattered throughout, but you lost this important distinction.

Also, it's really bad to take row for each item in a column vector. I think these definitions could use a rewrite, and might need to consider adding subtypes UnivariateOnlineStat and MultivariateOnlineStat to get this to work.

Finally we need to revisit the idea of row-based vs column-based storage. We should consider a type RowMatrix and flip-flop the row/column calls, so that column-based access is the default.

Thoughts?

@joshday
Copy link
Owner

joshday commented Feb 5, 2016

The fit! methods for single observations are always more specific than the fit! methods for multiple observations, so using the same method for both Mean and Means does work. But you're right that calling row on a vector is slow. Here's the change:

function fit!(o::OnlineStat, y::AMat)
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end
function fit!(o::OnlineStat, y::AVec)
    for yi in y
        fit!(o, yi)
    end
end

@tbreloff
Copy link
Collaborator Author

tbreloff commented Feb 5, 2016

The fit! methods for single observations are always more specific than the fit! methods for multiple observations

I think I understand now. There's a fit!(o::OnlineStat, y::AVec) defined for each multivariate stat, and that takes precedence over the more abstract one. Got it.

Here's the change

This should be better.

Back to the original point... I'd be really surprised if returning the OnlineStat from the fit! method makes any noticeable difference in performance. It should be a no-op if you're not using it in the resulting call.

@joshday
Copy link
Owner

joshday commented Feb 5, 2016

Strange, I just tried it again and saw no time/allocation difference. In that case, I like the idea of returning the object. I'll run through everything and add it.

This was referenced Feb 5, 2016
@joshday
Copy link
Owner

joshday commented Feb 6, 2016

fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.

@joshday joshday closed this as completed Feb 6, 2016
@tbreloff
Copy link
Collaborator Author

tbreloff commented Feb 6, 2016

Cool thanks!

On Feb 6, 2016, at 10:02 AM, Josh Day notifications@github.com wrote:

fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants