
Merge StreamStats, add to JuliaStats #19

Closed · 6 tasks done

joshday opened this issue Jun 29, 2015 · 29 comments

@joshday (Owner) commented Jun 29, 2015

See StreamStats.jl issue #24.

To prepare for adding OnlineStats into JuliaStats, we should add the following functionality from StreamStats:

  • HyperLogLog
  • online bootstrap
  • Adagrad OLS
  • Adagrad Logistic regression
  • Adagrad L2 Logistic Reg
  • Adagrad Ridge Reg
@tbreloff (Collaborator) commented:

I'm going to try to move one or two of these over... I'll start with HyperLogLog and let you know how it goes.

@tbreloff (Collaborator) commented:

Just pushed HyperLogLog on the tom branch... I think it's working, but I could use some additional documentation/tests from @johnmyleswhite (as there wasn't any documentation to begin with, I had to start out by learning what the hell HyperLogLog was :)
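
For anyone else coming to this cold, the core idea is small. A minimal sketch of HyperLogLog (illustrative only; the names and constants here are mine, not the code on the tom branch):

const B = 10                  # register-index bits
const M = 2^B                 # number of registers

function hll_update!(registers::Vector{UInt8}, value)
    h = hash(value)                         # 64-bit hash of the observation
    idx = Int(h & UInt64(M - 1)) + 1        # low B bits pick a register
    w = h >> B                              # remaining bits
    rank = UInt8(trailing_zeros(w) + 1)     # position of the first 1-bit
    registers[idx] = max(registers[idx], rank)
    return registers
end

function hll_estimate(registers::Vector{UInt8})
    α = 0.7213 / (1 + 1.079 / M)            # bias correction for large M
    s = 0.0
    for r in registers
        s += 2.0^(-Int(r))                  # harmonic-mean accumulator
    end
    return α * M^2 / s
end

The estimate is a bias-corrected harmonic mean of 2^register over all registers; the full algorithm also applies small- and large-range corrections, which I've left out of this sketch.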

@tbreloff (Collaborator) commented:

@johnmyleswhite: do you have any resources you can recommend for reading up on the bootstrapping and adagrad methods you implemented? I want to understand them a little more thoroughly before moving them over.

@johnmyleswhite commented:

Online bootstrapping: http://statweb.stanford.edu/~owen/reports/multi.pdf

AdaGrad: http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf

@tbreloff (Collaborator) commented:

Thanks John. I'm trying to think through whether it's feasible to use AdaGrad as a Weighting, instead of reimplementing each individual algorithm. Certainly it would be a nice bonus if this could be applied to other stats, but I'm not sure how easy it is to generalize.
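
To make sure we're talking about the same update: the AdaGrad step is per-coordinate, roughly this sketch (η and ε are tuning constants I'm assuming, not names from the code):

# One AdaGrad step: G accumulates squared gradients, so each coordinate of β
# gets its own effective learning rate η / sqrt(G[i]).
function adagrad_step!(β, G, g; η = 1.0, ε = 1e-8)
    for i in eachindex(β)
        G[i] += g[i]^2
        β[i] -= η * g[i] / (sqrt(G[i]) + ε)
    end
    return β
end

The per-coordinate scaling is the part that smells like a Weighting; the gradient g is the part that's specific to each model.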

@johnmyleswhite commented:

I'd be cautious using AdaGrad as a weighting since it could only be generalized if you made the idea of a gradient part of the OnlineStats type system.
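
To make that concrete: a generic AdaGrad weighting would need to ask any stat for its gradient, which implies something like this purely hypothetical sketch (not actual OnlineStats types):

# Hypothetical: stats opting in to gradient-based updates would subtype this
# and implement a grad method returning the gradient of their loss at the
# current coefficients. A single generic AdaGrad update! could then be
# written once against the interface.
abstract GradientStat <: OnlineStat

# grad(o::SomeStat, x, y) -> Vector{Float64}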

@tbreloff (Collaborator) commented:

To quote Dumb and Dumber: "...more like 1 in a million." "So you're saying there's a chance!" (Yes, in this case I'm "dumber".)

Anyways... I don't doubt it would require some careful thought to get right. If it can be generalized (and we decide it's worthwhile to adjust the type system), then do you see value in attempting it? If you have 15 minutes sometime to discuss a little further over Skype or Hangouts, let me know. @joshday should join as well.

@johnmyleswhite commented:

Yeah, I could find some time tomorrow afternoon to chat.

If we could make it general it would be useful. We'd want to think about how a general solution relates to other SGD tools that people have tried building.

@tbreloff (Collaborator) commented:

Perfect... tomorrow afternoon works. In the meantime, do you know offhand the other SGD tools/packages out there I should familiarize myself with?

@johnmyleswhite commented:

How about chatting at 11 AM Pacific time?

@johnmyleswhite commented:

I'd look at https://github.com/lindahua/SGDOptim.jl for some ideas.

@joshday (Owner) commented Jul 1, 2015

I'd love to be a part of that conversation. I'll be available at 11am Pacific.

@tbreloff (Collaborator) commented Jul 1, 2015

Sounds good. I'm in.

@tbreloff (Collaborator) commented Jul 2, 2015

@johnmyleswhite: I'm confusing myself regarding the Logistic Loss function you have in StreamStats, and I'm hoping you can help. I expect the standard loss functions to look like this:

  • Least-squares: f(w,(x, y)) = (y−w'x)^2
  • Hinge loss: f(w,(x, y)) = max{0,1−y(w'x)}
  • Logistic regression: f(w,(x, y)) = log(1+exp(−yw'x))

And the gradient would be df/dwi (i.e., the derivative of the loss function with respect to the ith weight). For least squares, you have the derivative as (-2 * xi * err), which makes sense.

In your ApproxLogit update function, you have essentially the same derivative (but you dropped the 2 and swapped signs with your update to beta... effectively making it the same as the least squares case). Is that correct? Is there some simple math that I'm being dense about?

I think the derivative is something more complicated than just (-xi * err) for the logistic case, involving exp at a minimum. I didn't work out the full equation, so I guess it's possible something cancels out.
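
For reference, working the logistic derivative out directly (standard calculus, using the y ∈ {-1, 1} form above):

$$\frac{\partial}{\partial w_i} \log\left(1 + e^{-y\,w^\top x}\right) = \frac{-y\,x_i\,e^{-y\,w^\top x}}{1 + e^{-y\,w^\top x}} = \frac{-y\,x_i}{1 + e^{y\,w^\top x}}$$

So the exp doesn't fully cancel; it turns into the logistic function itself.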

@tbreloff (Collaborator) commented Jul 2, 2015

Also I pushed up a first draft for Adagrad to the tom branch... comments welcome.

@johnmyleswhite commented:

The code I wrote maximizes the logistic log-likelihood rather than minimizing a specific cost function. It should be closely related to minimizing log loss.

@joshday (Owner) commented Jul 2, 2015

One thing to beware of is how the response is coded. Tom, I believe the loss function you wrote above has y coded as (-1, 1), whereas StreamStats has it as (0, 1). Here's the log-likelihood version John is talking about, apart from what I think is a typo (a missing minus sign in the gradient):

[screenshot: log-likelihood and gradient for logistic regression]
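
For readers without the image, here is my reconstruction of the standard form (y coded as {0, 1}, with the typo fixed):

$$\ell(\beta) = \sum_i \left[ y_i\,x_i^\top\beta - \log\left(1 + e^{x_i^\top\beta}\right) \right], \qquad \nabla\ell(\beta) = \sum_i x_i (y_i - p_i), \qquad p_i = \frac{1}{1 + e^{-x_i^\top\beta}}$$

Gradient ascent on this log-likelihood is what the StreamStats logistic code is doing.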

@tbreloff (Collaborator) commented Jul 3, 2015

Thanks for these comments. In your experience, is it better/easier to deal with maximizing likelihood vs. minimizing cost? The formulas I referenced came from section 1.1 of http://www.jmlr.org/papers/volume11/xiao10a/xiao10a.pdf, which is focused on minimizing, and I'd like to attempt the Adagrad/RDA hybrid referenced in section 3 of http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (also minimizing, I believe).

Also John: now I understand why you have "beta -= ..." in the OLS version and "beta += ..." in the logistic... I was a little confused there. Thanks!

@joshday (Owner) commented Jul 3, 2015

I just went through the math: they're equivalent. The gradient ends up being the same in either case (sketch after the list). It just depends what we want to enforce on users: {-1, 1} or {0, 1}.

  • {0, 1} will be more familiar to statisticians
  • {-1, 1} will be more familiar to machine learning folks
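
Sketch of the equivalence, writing σ(z) = 1/(1 + e^{-z}): with y ∈ {-1, 1}, the per-observation gradient of the loss is

$$\nabla_\beta \log\left(1 + e^{-y\,x^\top\beta}\right) = -y\,x\,\sigma(-y\,x^\top\beta)$$

Plugging in y = 1 gives -x(1 - σ(x'β)), and y = -1 gives -x(0 - σ(x'β)); in both cases that's -x(ỹ - σ(x'β)) with ỹ = (y + 1)/2 ∈ {0, 1}, which is exactly the (negated) {0, 1} log-likelihood gradient.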

@tbreloff (Collaborator) commented Jul 3, 2015

Sweet... you're one step ahead of me. Thanks.

@joshday (Owner) commented Jul 3, 2015

Adagrad stuff looks cool 👍. It works!

I think you switched link and invlink. The link function relates the expected value to the linear predictor:

  • link function: g(E(y)) = Xb
  • inverse link: h(Xb) = E(y)

I'm wondering if some penalties will need their own update! method. Updating with the L1 penalty should include the extra soft-thresholding step; right now, instead of setting coefficients to exactly 0, it pushes them negative (see the sketch below).
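
For reference, the soft-thresholding step I mean is just this (a sketch; the names are mine, not code from the branch):

# Soft-thresholding: shrink toward zero and stop at zero, rather than
# letting the penalty push a coefficient past it.
soft_threshold(z, γ) = sign(z) * max(abs(z) - γ, 0.0)

# Applied elementwise after the gradient step, e.g.
#   for i in 1:length(β); β[i] = soft_threshold(β[i], η * λ); end
# where η is the step size and λ the L1 penalty.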

@tbreloff (Collaborator) commented Jul 3, 2015

Agreed on the link/invlink... I fixed it.

Regarding the thresholding... is that specific to L1 or is it common for other regularizers? When does the thresholding happen? In the update to beta, or the update to an intermediate calc?

@joshday (Owner) commented Jul 3, 2015

Soft thresholding shows up in a few different implementations of the LASSO that I've seen. I'm not sure where else it pops up (probably also elastic net).

It would be in the update to beta, but I think I spoke too soon. After looking at the papers a bit more, it doesn't look like it fits into the general Adagrad scheme, and that's why the second paper you referenced does the Adagrad/RDA hybrid. I need to think about it more.

@tbreloff (Collaborator) commented Jul 3, 2015

Ok that lines up with what I'm thinking too. I'll probably reorg a little when I build out RDA.

With regards to Adagrad... the 2 tests I have for OLS with and without L2 reg are both passing, but the logistic isn't. The logistic with NoReg is close to correct... I'm not sure how close I should expect it to be. The logistic with L2 reg is just wrong. Another set of eyes would be awesome.

@joshday (Owner) commented Jul 3, 2015

Logistic with L2 penalty may be correct, just very sensitive to the choice of λ. I think it's because of the rather large coefficients used to generate the data. A beta of 10 would be huge in logistic regression with standardized data: a single standard deviation change in the associated variable would swing the probability essentially from 0 to 1.

julia> 1 / (1 + exp(-5))
0.9933071490757153

julia> 1 / (1 + exp(5))
0.0066928509242848554
julia>  o = Adagrad(x, y; link=LogisticLink(), loss=LogisticLoss())

julia>  o2 = Adagrad(x, y; link=LogisticLink(), loss=LogisticLoss(), reg = L2Reg(.0001))

julia> coef(o) - coef(o2)
10-element Array{Float64,1}:
 0.0509798
 0.103378
 0.153715
 0.205846
 0.257928
 0.310924
 0.362245
 0.416003
 0.466223
 0.517923

@tbreloff (Collaborator) commented Jul 3, 2015

Thanks, that was helpful... I also realized after re-running the tests that I didn't have my y mapped to {0,1}. Now both logistic tests are similar, but the whole beta vector is scaled down:

OnlineStats.Adagrad{β=[0.7217354581720843,1.4348008374414074,2.146884897634276,2.8766786770306774,3.5859435927149845,4.3038226266476745,5.059988427489082,5.757408227532339,6.491600366175806,7.219898540409098] nobs=1000000}: β=[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]
OnlineStats.Adagrad{β=[1.0584635582815747,1.0584635582815747,2.14490825297699,2.8505958359494468,3.5604315976752616,4.262512514113438,4.987637119365928,5.708458455240219,6.420254076403213,7.142296659800167] nobs=1000000}: β=[1.5,1.5,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]

Note that the first beta should be [1,2,3,...,10] and the second beta should be [1.5,1.5,3,...,10]

@tbreloff (Collaborator) commented:

@johnmyleswhite: Adagrad should be fully ported now. I have tests for all 4 versions implemented by StreamStats, comparing both results and speed. Example usage:

β = coef(Adagrad(x, y; link=LogisticLink(), loss=LogisticLoss(), reg=L2Reg(λ)))

@joshday: are you going to take a look at the Bootstrapping code?

@joshday (Owner) commented Jul 14, 2015

👍 . Yes, I should get a chance to look at Bootstrapping later this week.

@joshday (Owner) commented Jul 17, 2015

@johnmyleswhite Online bootstrap stuff is ported over. Since OnlineStats' state function operates differently than StreamStats', I added a field for a function that specifies how to generate the cached_state:

type BernoulliBootstrap{S <: OnlineStat} <: Bootstrap
    replicates::Vector{S}            # replicates of base stat
    cached_state::Vector{Float64}    # cache of replicate states
    f::Function                      # function to generate state. Ex: mean, var, std
    n::Int                           # number of observations
    cache_is_dirty::Bool
end

With this setup, there's a little more flexibility in that you can use the same object with different functions:

v = Variance()
o_mean = BernoulliBootstrap(v, mean, 1000)
o_std = BernoulliBootstrap(v, std, 1000)
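
For context, the update itself follows the online-bootstrap idea from the paper John linked: each replicate sees each observation a random number of times. A sketch of the Bernoulli ("double or nothing") version, illustrative rather than the exact ported code:

function update!(b::BernoulliBootstrap, x)
    for replicate in b.replicates
        if rand() < 0.5
            update!(replicate, x)   # each replicate sees x twice with
            update!(replicate, x)   # probability 1/2, else not at all,
        end                         # so the expected count is 1
    end
    b.n += 1
    b.cache_is_dirty = true
    return b
end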
