ENH: Lightning seems to be slow when `loss=log` #14

MechCoder · 2014-09-18T09:33:32Z

@mblondel I'm not sure if this is meant to be, but I ran a few quick benchmarks.

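from time import time

import numpy as np
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from lightning.classification import CDClassifier

# Load News20 dataset from scikit-learn.
bunch = fetch_20newsgroups_vectorized(subset="all")
X = bunch.data
y = bunch.target

# Binarize the labels (class 1 vs. rest) to remove the effect of parallelization
y[y != 1] = -1
time_logistic = []
time_lightning = []
Cs = np.logspace(-4, 4, 10)

for C in Cs:
    print(C)
    # solver='liblinear' is required for the L1 penalty in recent scikit-learn releases.
    clf = LogisticRegression(penalty='l1', tol=0.0001, fit_intercept=False, C=C,
                             solver='liblinear')
    t = time()  # time only the call to fit
    clf.fit(X, y)
    time_logistic.append(time() - t)
    print(time_logistic)
    cl = CDClassifier(loss='log', tol=0.0001, max_iter=100, max_steps=0, C=C, penalty='l1')
    t = time()
    cl.fit(X, y)
    time_lightning.append(time() - t)
    print(time_lightning)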

I get times like these (in seconds) for a grid of 10 Cs from np.logspace(-4, 4, 10):

time_lightning

[0.20100116729736328, 0.6052899360656738,  0.7211019992828369,
 2.470484972000122, 4.043258190155029, 7.791965007781982,
 10.92172908782959, 13.969007968902588, 12.534989833831787,
  5.275091886520386]

time_logistic

[0.08612680435180664, 0.22542500495910645, 0.5105628967285156,
 0.5970029830932617, 0.642221212387085, 0.8863811492919922,
 1.241279125213623, 1.1004469394683838, 0.9302711486816406,
 0.8940119743347168]


mblondel · 2014-09-18T09:52:40Z

What is your point?

liblinear doesn't implement the same algorithm as lightning...

MechCoder · 2014-09-18T10:03:10Z

Liblinear implements the CD + glmnet from the same paper, right? I just wanted to clarify whether the new GLMNET is inherently much faster than CDN, or whether you think there are places in the lightning code that we could speed up.

mblondel · 2014-09-18T10:14:46Z

Indeed liblinear uses CD + glmnet. Don't they compare with CD in the paper?

One bottleneck is the computation of logs and exponentials. Using some kind of approximation could result in a big speed up.
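As a rough illustration of what such an approximation could look like, here is a sketch that replaces log(1 + exp(-z)) with linear interpolation in a precomputed table. It is not lightning's code: `approx_log_loss`, `Z_MAX` and `N_POINTS` are hypothetical names and values chosen for this example, and whether a table actually beats the libm calls inside a compiled loop is something that would have to be measured.

```python
import numpy as np

# Hypothetical table parameters, chosen for illustration only.
Z_MAX = 20.0      # outside [-Z_MAX, Z_MAX] the asymptotes are used
N_POINTS = 4001   # grid spacing of 0.01

_GRID = np.linspace(-Z_MAX, Z_MAX, N_POINTS)
_TABLE = np.logaddexp(0.0, -_GRID)  # exact log(1 + exp(-z)) on the grid


def approx_log_loss(z):
    """Approximate log(1 + exp(-z)) by linear interpolation in a table."""
    z = np.asarray(z, dtype=float)
    inside = np.interp(z, _GRID, _TABLE)
    # For z < -Z_MAX the loss equals -z up to an error of about exp(-Z_MAX).
    # For z > Z_MAX, np.interp clamps to the last table entry (about 2e-9).
    return np.where(z < -Z_MAX, -z, inside)


# Compare against the exact value on a range wider than the table.
z = np.linspace(-30.0, 30.0, 10001)
max_err = np.max(np.abs(approx_log_loss(z) - np.logaddexp(0.0, -z)))
print(max_err)
```

With this grid the interpolation error is bounded by h^2/8 times the maximum of the second derivative (at most 1/4), i.e. a few 1e-6, so the trade-off between table size and accuracy is easy to tune.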

lightning uses dataset and loss function abstractions. I am not sure what the overhead of the virtual method calls is.

mblondel · 2014-09-18T10:18:32Z

Something else to be careful about is the stopping criterion. If liblinear doesn't use the same criterion as lightning, the meaning of tol will differ. The only objective way to compare is to plot the objective value over time.
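To make that comparison concrete, a minimal sketch of the objective evaluation follows. It assumes both solvers minimise the L1-regularized logistic objective in the form ||w||_1 + C * sum_i log(1 + exp(-y_i w·x_i)) with labels in {-1, +1} and no intercept; that form should be checked against each library's documentation before trusting the numbers. `l1_logistic_objective` and the demo arrays are hypothetical names introduced here.

```python
import numpy as np


def l1_logistic_objective(w, X, y, C):
    """||w||_1 + C * sum_i log(1 + exp(-y_i * <x_i, w>)), with y_i in {-1, +1}."""
    margins = y * X.dot(w)  # works for dense arrays and scipy.sparse matrices
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow.
    return np.abs(w).sum() + C * np.logaddexp(0.0, -margins).sum()


# Tiny synthetic check, not the News20 data.
X_demo = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y_demo = np.array([1.0, -1.0, 1.0])
obj_zero = l1_logistic_objective(np.zeros(2), X_demo, y_demo, C=1.0)
obj_other = l1_logistic_objective(np.array([0.5, -0.5]), X_demo, y_demo, C=1.0)
print(obj_zero, obj_other)
```

In the benchmark above one could refit each solver with an increasing iteration budget, record the elapsed time, and evaluate this function on `clf.coef_.ravel()` and `cl.coef_.ravel()` to get objective-versus-time curves that do not depend on how each library interprets tol.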

MechCoder changed the title ~~ENH: Lightning seems to be slow when loss=logistic~~ ENH: Lightning seems to be slow when loss=log Sep 18, 2014

mblondel closed this as completed Sep 18, 2014
