Merge StreamStats, add to JuliaStats #19
I'm going to try to move one or two of these over... I'll start with HyperLogLog and let you know how it goes.

Just pushed HyperLogLog on the tom branch... I think it's working, but I could use some additional documentation/tests from @johnmyleswhite (as there wasn't any documentation to begin with, I had to start out by learning what the hell HyperLogLog was :)
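For anyone else coming to this cold, the core of HyperLogLog fits in a few lines. Below is a minimal, illustrative Python sketch of the estimator (not the StreamStats/OnlineStats implementation; all names are made up): hash each item, use the low bits of the hash to pick a register, and keep the maximum "rank" (position of the first 1-bit in the remaining bits) seen per register.

```python
import hashlib

class HyperLogLog:
    """Minimal HyperLogLog sketch: estimates the number of distinct
    items seen using m = 2^b small registers."""
    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m
        # bias-correction constant (valid for m >= 128)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash of the item
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        j = h & (self.m - 1)   # low b bits pick a register
        w = h >> self.b        # remaining (64 - b) bits
        # rank = 1-based position of the first 1-bit in w
        rank = (64 - self.b) - w.bit_length() + 1
        self.registers[j] = max(self.registers[j], rank)

    def estimate(self):
        # harmonic mean of 2^registers, scaled by alpha * m^2
        z = 1.0 / sum(2.0 ** -r for r in self.registers)
        return self.alpha * self.m * self.m * z
```

With m = 1024 registers the relative standard error is about 1.04/√m ≈ 3%, which is why the structure can count billions of distinct items in a few kilobytes. (This sketch omits the small- and large-range corrections from the original paper.)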
@johnmyleswhite: do you have any resources you can recommend to read up more on the bootstrapping and adagrad methods you implemented? I want to understand a little more thoroughly before moving them over.

Online bootstrapping: http://statweb.stanford.edu/~owen/reports/multi.pdf
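The key idea in that paper is that multinomial resampling weights can be approximated by independent Poisson(1) draws, which makes the bootstrap streamable: each replicate just weights every new observation by its own Poisson draw, with no stored data. A hedged Python sketch of that idea for the mean (illustrative only, not the code being ported):

```python
import math
import random

def online_poisson_bootstrap(stream, n_reps=100, seed=42):
    """Online (Poisson) bootstrap of the mean: each replicate weights
    every new observation by an independent Poisson(1) draw, so no
    resampling of stored data is ever needed."""
    rng = random.Random(seed)
    sums = [0.0] * n_reps
    counts = [0] * n_reps

    def poisson1():
        # Knuth's algorithm for a Poisson(1) draw
        L, k, p = math.exp(-1.0), 0, 1.0
        while p > L:
            k += 1
            p *= rng.random()
        return k - 1

    for x in stream:
        for r in range(n_reps):
            w = poisson1()
            sums[r] += w * x
            counts[r] += w
    # replicate means; their spread estimates the sampling variability
    return [s / c for s, c in zip(sums, counts) if c > 0]
```

The spread of the returned replicate means gives bootstrap standard errors and confidence intervals, updated entirely online.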
Thanks John. I'm trying to think through whether it's feasible to use AdaGrad as a Weighting, instead of reimplementing each individual algorithm. Certainly it would be a nice bonus if this could be applied to other stats, but I'm not sure how easy it is to generalize.

I'd be cautious about using AdaGrad as a weighting, since it could only be generalized if you made the idea of a gradient part of the OnlineStats type system.
To quote Dumb and Dumber: "...more like 1 in a million." "So you're telling me there's a chance!" Anyways... I don't doubt it would require some careful thought to get right. Maybe we could find a time to chat about it?
Yeah, I could find some time tomorrow afternoon to chat. If we could make it general it would be useful. We'd want to think about how a general solution relates to other SGD tools that people have tried building.

Perfect... tomorrow afternoon works. In the meantime, do you know offhand...

How about chatting at 11 AM Pacific time?

I'd look at https://github.com/lindahua/SGDOptim.jl for some ideas.

I'd love to be a part of that conversation. I'll be available at 11am Pacific.

Sounds good. I'm in.
@johnmyleswhite: I'm confusing myself regarding the Logistic Loss function you have in StreamStats, and hoping you can help. I expect the standard loss functions to look something like f(w) = (y - ŷ)², with ŷ = Σᵢ wᵢxᵢ, for least squares.

And the gradient would be df/dwi (i.e. the derivative of the loss function with respect to the ith weight). For least squares, you have the derivative as (-2 * xi * err), which makes sense. In your ApproxLogit update function, you have essentially the same derivative (but you dropped the 2 and swapped signs in your update to beta, effectively making it the same as the least squares case). Is that correct? Is there some simple math that I'm being dense about? I think the derivative is something more complicated than just (-xi * err) for the logistic case, involving exp at a minimum. I didn't work out the full equation, so I guess it's possible something cancels out.
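For reference, the exp terms do in fact cancel: for labels y ∈ {0, 1} and p = σ(w·x), the gradient of the log loss is exactly -xi * err with err = y - p. A quick finite-difference check in Python (illustrative only; function names here are made up, not StreamStats code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, x, y):
    # y in {0, 1}; p = sigma(w . x)
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def analytic_grad(w, x, y):
    # claimed gradient: -x_i * err, with err = y - sigma(w . x)
    err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [-xi * err for xi in x]

def numeric_grad(w, x, y, h=1e-6):
    # central finite differences, one coordinate at a time
    g = []
    for i in range(len(w)):
        wp = list(w); wp[i] += h
        wm = list(w); wm[i] -= h
        g.append((log_loss(wp, x, y) - log_loss(wm, x, y)) / (2 * h))
    return g
```

Running both on any (w, x, y) shows the analytic and numeric gradients agree to numerical precision, so the logistic update really does have the same -xi * err form as least squares (without the factor of 2).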
Also, I pushed up a first draft for Adagrad to the tom branch... comments welcome.
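For discussion, here is a self-contained Python sketch of the per-coordinate AdaGrad update for logistic regression. This is not the draft on the tom branch; all names and the hyperparameters are illustrative assumptions.

```python
import math

def adagrad_logistic(data, n_features, eta=0.5, eps=1e-8, epochs=50):
    """AdaGrad for logistic regression with y in {0, 1}: each coordinate
    gets its own learning rate eta / sqrt(sum of its squared past
    gradients), so rarely-active features take larger steps."""
    w = [0.0] * n_features
    g2 = [0.0] * n_features   # running sum of squared gradients, per coordinate
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p
            for i in range(n_features):
                g = -x[i] * err                        # gradient of the log loss
                g2[i] += g * g
                w[i] -= eta * g / (math.sqrt(g2[i]) + eps)
    return w
```

On a toy separable problem (x = [intercept, feature], y = 1 when the feature is positive) the slope coefficient comes out positive and the fitted probabilities separate the classes.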
The code I wrote maximizes the logistic log likelihood rather than minimizing a specific cost function. It should be closely related to minimizing log loss.

Thanks for these comments. In your experience, is it better/easier to deal with maximizing likelihood vs. minimizing cost? The formulas I referenced came from section 1.1 of http://www.jmlr.org/papers/volume11/xiao10a/xiao10a.pdf, which is focused on minimizing, and I'd like to attempt implementing the Adagrad/RDA hybrid referenced in section 3 of http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (I believe also minimizing). Also John: now I understand why you have "beta -= ..." in the OLS version and "beta += ..." in the logistic... I was a little confused there. Thanks!
I just went through the math. They're equivalent. The gradient ends up being the same in either case. It just depends on what we want to enforce on users: {-1, 1} or {0, 1}.
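For the record, here is the cancellation (my notation, with σ(z) = 1/(1 + e^{-z}) and p = σ(wᵀx)):

```latex
% Labels y \in \{0, 1\}, p = \sigma(w^\top x):
\nabla_w \bigl[-y\log p - (1-y)\log(1-p)\bigr] = -(y - p)\,x

% Labels \tilde y \in \{-1, +1\}:
\nabla_w \log\bigl(1 + e^{-\tilde y\, w^\top x}\bigr)
  = -\tilde y\,\sigma(-\tilde y\, w^\top x)\,x

% Case \tilde y = +1 (i.e. y = 1):
%   -\sigma(-w^\top x)\,x = -(1 - p)\,x = -(y - p)\,x
% Case \tilde y = -1 (i.e. y = 0):
%   \sigma(w^\top x)\,x = p\,x = -(0 - p)\,x
```

So the two label conventions give term-by-term identical gradients, which is why the update can be written as -xi * err in both forms.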
Sweet... you're one step ahead of me. Thanks.
Adagrad stuff looks cool 👍. It works! I think you switched link and invlink, though.

I'm wondering if some penalties will need their own update step (e.g. soft thresholding).
Agreed on the link/invlink... I fixed it. Regarding the thresholding... is that specific to L1, or is it common for other regularizers? When does the thresholding happen? In the update to beta, or the update to an intermediate calc?
Soft thresholding shows up in a few different implementations of the LASSO that I've seen. I'm not sure where else it pops up (probably also elastic net). It would be in the update to beta, but I think I spoke too soon. After looking at the papers a bit more, it doesn't look like it fits into the general Adagrad scheme, and that's why the second paper you referenced does the Adagrad/RDA hybrid. I need to think about it more.
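For anyone following along, the soft-thresholding operator itself is tiny. A Python sketch (illustrative; not code from either package):

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0).
    It is the proximal operator of the L1 penalty: the closed-form
    minimizer of 0.5 * (b - z)^2 + lam * |b| over b, which is why it
    appears inside coordinate-descent and RDA updates for the LASSO."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Coefficients whose magnitude never exceeds lam are pinned exactly to zero, which is what produces sparse solutions and why a plain AdaGrad gradient step (which only shrinks, never zeroes) doesn't fit it directly.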
Ok, that lines up with what I'm thinking too. I'll probably reorg a little when I build out RDA. With regards to Adagrad... the 2 tests I have for OLS with and without L2 reg are both passing, but the logistic isn't. The logistic with NoReg is close to correct... I'm not sure how close I should expect it to be. The logistic with L2 reg is just wrong. Another set of eyes would be awesome.
Logistic with L2 penalty may be correct, just very sensitive to the choice of λ. I think it's because of the rather large coefficients used to generate the data. A beta of 10 would be huge in logistic regression with standardized data. A single standard deviation change in the associated variable would swing the probability essentially from 0 to 1.
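The point about a beta of 10 is easy to see numerically (illustrative Python, not test code from the repo):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# With a coefficient of 10 on a standardized predictor, a one-standard-
# deviation move in that predictor takes the linear term from 0 to 10,
# swinging the fitted probability from 0.5 to essentially 1.
p_at_mean = sigmoid(0.0)   # predictor at its mean
p_one_sd = sigmoid(10.0)   # predictor one standard deviation higher
```

Since σ(10) ≈ 0.99995, the likelihood surface is nearly flat in beta out there, so small differences in λ (or in the optimizer) produce large differences in the fitted coefficients even when the fitted probabilities barely change.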
Thanks, that was helpful... I also realized after re-running the tests that I didn't have my...

Note that the first beta should be [1,2,3,...,10] and the second beta should be [1.5,1.5,3,...,10].
@johnmyleswhite: Adagrad should be fully ported now. I have tests for all 4 versions implemented by StreamStats, with tests that compare the results and code speed. Example usage:
@joshday: are you going to take a look at the Bootstrapping code?

👍 Yes, I should get a chance to look at Bootstrapping later this week.
@johnmyleswhite Online bootstrap stuff is ported over. OnlineStats' type:

```julia
type BernoulliBootstrap{S <: OnlineStat} <: Bootstrap
    replicates::Vector{S}          # replicates of base stat
    cached_state::Vector{Float64}  # cache of replicate states
    f::Function                    # function to generate state. Ex: mean, var, std
    n::Int                         # number of observations
    cache_is_dirty::Bool
end
```

With this setup, there's a little more flexibility in that you can use the same object with different functions:

```julia
v = Variance()
o_mean = BernoulliBootstrap(v, mean, 1000)
o_std = BernoulliBootstrap(v, std, 1000)
```
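The update semantics behind a Bernoulli ("double-or-nothing") bootstrap can be sketched compactly. This is an illustrative Python analogue for the mean, not a translation of the Julia type above: each replicate either counts the new observation twice (with probability 1/2) or skips it.

```python
import random

class BernoulliBootstrapMean:
    """Bernoulli ("double-or-nothing") online bootstrap of the mean:
    on each update, every replicate independently takes the new
    observation with weight 2 (prob 1/2) or weight 0, approximating
    multinomial resampling weights without storing any data."""
    def __init__(self, n_reps=300, seed=1):
        self.rng = random.Random(seed)
        self.sums = [0.0] * n_reps
        self.counts = [0] * n_reps

    def update(self, x):
        for r in range(len(self.sums)):
            if self.rng.random() < 0.5:
                self.sums[r] += 2 * x
                self.counts[r] += 2

    def state(self):
        # one mean per replicate; their spread estimates sampling error
        return [s / c for s, c in zip(self.sums, self.counts) if c > 0]
```

The `f::Function` field in the Julia type plays the role of `state()` here: the same stream of replicate updates can be summarized by `mean`, `var`, `std`, etc., which is where the extra flexibility comes from.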
See StreamStats.jl issue #24.
To prepare for adding OnlineStats into JuliaStats, we should add the following functionality from StreamStats: