New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fix randomized_svd transpose heuristic. #4478

Merged
merged 1 commit into from Oct 15, 2015

Conversation

Projects
None yet
5 participants
@amueller
Member

amueller commented Apr 1, 2015

Fixes #4455.
This fixes two things: the heuristic (< vs >) and a logic error (the "auto" also evaluates to True)
As this is a performance optimization, I don't know how to regression-test it.
I did something like this:

from sklearn.utils.extmath import randomized_svd
import numpy as np
X = np.dot(np.random.normal(size=(10000, 25)), np.random.normal(size=(25, 200)))
%timeit randomized_svd(X, n_components=10, transpose='auto')

@amueller amueller changed the title from Fix randomized_svd transpose heuristic. to [MRG] Fix randomized_svd transpose heuristic. Apr 1, 2015

@amueller

This comment has been minimized.

Member

amueller commented Apr 1, 2015

ping @ogrisel via cdeee0d ;)

@amueller

This comment has been minimized.

Member

amueller commented Apr 1, 2015

Whatsnew for a bugfix?

@landscape-bot

This comment has been minimized.

landscape-bot commented Apr 1, 2015

Code Health
Code quality remained the same when pulling f7fdfd8 on amueller:fix_randomized_svd_transpose into e2dfd23 on scikit-learn:master.

@amueller

This comment has been minimized.

Member

amueller commented Apr 1, 2015

Interesting failure... looks like "transpose" is broken? just me being dense.

@amueller

This comment has been minimized.

Member

amueller commented Apr 1, 2015

It looks like there are numeric issues with the heuristic being fixed. @ogrisel any input?

@landscape-bot

This comment has been minimized.

landscape-bot commented Apr 1, 2015

Code Health
Code quality remained the same when pulling 1db8f14 on amueller:fix_randomized_svd_transpose into e2dfd23 on scikit-learn:master.

1 similar comment
@landscape-bot

This comment has been minimized.

landscape-bot commented Apr 1, 2015

Code Health
Code quality remained the same when pulling 1db8f14 on amueller:fix_randomized_svd_transpose into e2dfd23 on scikit-learn:master.

@ogrisel

This comment has been minimized.

Member

ogrisel commented Apr 6, 2015

I started to run some benchmarks and with a recent version of numpy (1.9.1) I cannot seem to see a CPU benefit time benefit to transpose the data:

$ python ~/tmp/bench_randomized_svd.py
svd(X_wide)
done in 41.169s
randomized_svd(X_wide, transpose=False)
done in 7.377s
randomized_svd(X_wide, transpose=True)
done in 5.012s
randomized_svd(X_wide_f, transpose=False)
done in 6.961s
randomized_svd(X_wide_f, transpose=True)
done in 8.045s
svd(X_tall)
done in 39.873s
randomized_svd(X_tall, transpose=False)
done in 7.404s
randomized_svd(X_tall, transpose=True)
done in 6.598s
randomized_svd(X_tall_f, transpose=False)
done in 6.926s
randomized_svd(X_tall_f, transpose=True)

Here is my script:

from sklearn.utils.extmath import randomized_svd
import numpy as np
from time import time
from numpy.testing import assert_array_almost_equal
from joblib import Memory


m = Memory(cachedir='.')
k = 20
rng = np.random.RandomState(42)


@m.cache()
def bench_svd(X):
    t0 = time()
    U, S, V = np.linalg.svd(X, full_matrices=False)
    return U, S, V, time() - t0


@m.cache()
def make_data(big, small):
    X_wide = rng.randn(small, big)
    X_tall = rng.randn(big, small)
    return X_wide, X_tall

X_wide, X_tall = make_data(1000000, 100)
X_wide_f = np.asfortranarray(X_wide)
X_tall_f = np.asfortranarray(X_tall)


print("svd(X_wide)")
U, S, V, duration = bench_svd(X_wide)
print("done in %0.3fs" % duration)


print("randomized_svd(X_wide, transpose=False)")
t0 = time()
U1, S1, V1 = randomized_svd(X_wide, k, transpose=False)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_wide, transpose=True)")
t0 = time()
U2, S2, V2 = randomized_svd(X_wide, k, transpose=True)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_wide_f, transpose=False)")
t0 = time()
U3, S3, V3 = randomized_svd(X_wide_f, k, transpose=False)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_wide_f, transpose=True)")
t0 = time()
U4, S4, V4 = randomized_svd(X_wide_f, k, transpose=True)
print("done in %0.3fs" % (time() - t0))


print("svd(X_tall)")
U, S, V, duration = bench_svd(X_tall)
print("done in %0.3fs" % duration)


print("randomized_svd(X_tall, transpose=False)")
t0 = time()
U1, S1, V1 = randomized_svd(X_tall, k, transpose=False)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_tall, transpose=True)")
t0 = time()
U2, S2, V2 = randomized_svd(X_tall, k, transpose=True)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_tall_f, transpose=False)")
t0 = time()
U3, S3, V3 = randomized_svd(X_tall_f, k, transpose=False)
print("done in %0.3fs" % (time() - t0))

print("randomized_svd(X_tall_f, transpose=True)")
t0 = time()
U4, S4, V4 = randomized_svd(X_tall_f, k, transpose=True)
print("done in %0.3fs" % (time() - t0))

There might be a memory usage benefit (I have not yet tried to profile it with memory_profiler) and the accuracy is likely not to be the same. We need to check those first.

@amueller

This comment has been minimized.

Member

amueller commented Apr 6, 2015

Well it is somewhat faster. I guess we would need to check the speed vs accuracy tradeoff...

@lesteve

This comment has been minimized.

Member

lesteve commented Jun 19, 2015

When I run @ogrisel snippet I do get a sizeable difference between the tall and wide matrix (the wide one is more than 50% slower than the tall one). Here is the output when I run it on my laptop (bog-standard Anaconda distribution):

svd(X_wide)
done in 30.090s
randomized_svd(X_wide, transpose=False)
done in 6.993s
randomized_svd(X_wide, transpose=True)
done in 4.256s
randomized_svd(X_wide_f, transpose=False)
done in 6.918s
randomized_svd(X_wide_f, transpose=True)
done in 4.120s
svd(X_tall)
done in 17.736s
randomized_svd(X_tall, transpose=False)
done in 4.144s
randomized_svd(X_tall, transpose=True)
done in 6.922s
randomized_svd(X_tall_f, transpose=False)
done in 4.232s
randomized_svd(X_tall_f, transpose=True)
done in 6.960s

Minor note the relative gain in speed tend to decrease when using n_iter.

About accuracy no major differences jump at me. For example if if I look at the differences of the eigenvalues:

In [54]: S_ref = S[:20]

In [55]: np.abs(S1/S_ref - 1)
Out[55]: 
array([ 0.00453109,  0.00450896,  0.00457142,  0.00461244,  0.00501684,
        0.00528286,  0.00522629,  0.00542917,  0.00543311,  0.00513347,
        0.00527189,  0.00557382,  0.00577985,  0.00616987,  0.00594913,
        0.00602258,  0.00591408,  0.00614868,  0.00610883,  0.00633806])

In [56]: np.abs(S2/S_ref - 1)
Out[56]: 
array([ 0.00420375,  0.00433335,  0.00518771,  0.00483352,  0.00493721,
        0.00520857,  0.00514801,  0.00542728,  0.0054518 ,  0.00521127,
        0.005237  ,  0.00564997,  0.00572783,  0.00605334,  0.00597059,
        0.00592777,  0.00611051,  0.0061444 ,  0.00618784,  0.00639205])

I quickly look at the memory usage too introducing profile.timestamp in the snippet above.

test_svd_memory_profiler

Happy to investigate more if needed.

@amueller

This comment has been minimized.

Member

amueller commented Jun 19, 2015

what is S1 vs S2 above?
I was a bit concerned as so many tests failed.

@lesteve

This comment has been minimized.

Member

lesteve commented Jun 22, 2015

what is S1 vs S2 above?

Sorry I should have said that: S1 and S2 are the singular values from @ogrisel's snippet:

U1, S1, V1 = randomized_svd(X_tall, k, transpose=False)

U2, S2, V2 = randomized_svd(X_tall, k, transpose=True)

This was just to show that whether you transpose or not, the precision doesn't seem to be affected that much.

@amueller

This comment has been minimized.

Member

amueller commented Jun 22, 2015

it's just a bit odd that so many tests failed then...

@lesteve

This comment has been minimized.

Member

lesteve commented Jul 12, 2015

I looked at the norm of the difference for U, S, V between transpose=False and transpose=True.

AFAICT the difference between using transpose=True or False are not that big.

Script:

import numpy as np

from scipy import linalg

import pandas as pd

from sklearn.utils.extmath import randomized_svd, svd_flip

import matplotlib.pyplot as plt

rng = np.random.RandomState(42)


def ref_svd(X):
    U, S, V = np.linalg.svd(X, full_matrices=False)
    U, V = svd_flip(U, V)
    return U, S, V


def make_data(big, small, n):
    return [rng.randn(10000, 100) for __ in range(n_experiments)]


def diff_norm_dict(X_tall, n_components, random_state=0):
    U, S, V = randomized_svd(X_tall, n_components, transpose=False,
                             random_state=random_state)
    U, V = svd_flip(U, V)
    Ut, St, Vt = randomized_svd(X_tall, n_components, transpose=True,
                                random_state=random_state)

    rU, rS, rV = ref_svd(X_tall)
    rU, rS, rV = rU[:, :n_components], rS[:n_components], rV[:n_components, :]

    return {'u_tall': linalg.norm(U - rU), 's_tall': linalg.norm(S - rS),
            'v_tall': linalg.norm(V - rV),
            'u_wide': linalg.norm(Ut - rU), 's_wide': linalg.norm(St - rS),
            'v_wide': linalg.norm(Vt - rV)}


def plot(data, name):
    plt.figure()
    diff = data[name + '_tall'] - data[name + '_wide']
    diff.hist(bins=50)
    print(name + ' mean:', diff.mean())
    print(name + ' std :', diff.std())
    plt.title(name + ' precision difference')
    plt.draw()


n_experiments = 1000
n_components = 20
n_rows = 10000
n_columns = 100

X_talls = make_data(n_rows, n_columns, n_experiments)
data = pd.DataFrame.from_dict([diff_norm_dict(X_tall, n_components)
                               for X_tall in X_talls])

for name in 'usv':
    plot(data, name)

The mean norm of the difference is quite small compared to the spread. Here is the output of the script:

u mean: 0.00236613498747
u std : 0.15611472696
s mean: 0.00658685997304
s std : 0.77389247538
v mean: 0.00257053325879
v std : 0.156317235007

Plots of the norm of the difference:
u_diff
s_diff
v_diff

Here are the histograms, in case that helps:
u_histo
s_histo
v_histo

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Sep 15, 2015

I have ran @ogrisel 's script on master:

svd(X_wide)
done in 40.697s
randomized_svd(X_wide, transpose=False)
done in 9.524s
randomized_svd(X_wide, transpose=True)
done in 6.595s
randomized_svd(X_wide_f, transpose=False)
done in 11.486s
randomized_svd(X_wide_f, transpose=True)
done in 5.929s

svd(X_tall)
done in 31.867s
randomized_svd(X_tall, transpose=False)
done in 4.938s
randomized_svd(X_tall, transpose=True)
done in 8.430s
randomized_svd(X_tall_f, transpose=False)
done in 4.988s
randomized_svd(X_tall_f, transpose=True)
done in 8.365s

and on top of #5141 --which is significantly slower as it is called with n_iter=2 by default:

svd(X_wide)
done in 39.367s
randomized_svd(X_wide, transpose=False)
done in 11.670s
randomized_svd(X_wide, transpose=True)
done in 8.332s
randomized_svd(X_wide_f, transpose=False)
done in 12.304s
randomized_svd(X_wide_f, transpose=True)
done in 8.107s

svd(X_tall)
done in 24.542s
randomized_svd(X_tall, transpose=False)
done in 9.202s
randomized_svd(X_tall, transpose=True)
done in 11.594s
randomized_svd(X_tall_f, transpose=False)
done in 8.506s
randomized_svd(X_tall_f, transpose=True)
done in 11.647s

Stretegy: Wide => transpose=True, Tall => transpose=False

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Sep 15, 2015

Which is the same as implemented in fbpca.

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Sep 15, 2015

Running @lesteve 's script:

u mean: -0.000233994363658
u std : 0.203724569688
s mean: 0.0107055413927
s std : 0.710789173723
v mean: -0.00056141368302
v std : 0.203734340488

figure_1
figure_2
figure_3

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Sep 15, 2015

I wonder: could randomization be a reason of those differences? I mean, although @lesteve 's script properly fixes the seed for pseudo-random sampling, but we are effectively generating different random matrices, since they differ in shape.

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Sep 23, 2015

To recap:

  • there is a bug in the auto mode and it is an easy fix
  • if we do so, we definitely gain in runtime
  • it appears there is not systematic error in using M or M.T, which is good, but there is error
  • many tests fail for that reason
  • I conjecture that this is (at least partially) due to the fact that we are effectively using different random matrices in randomized_range_finder and we cannot expect the result of M and M.T to be the same anyway
  • not sure if there any other source of error

What's next? @lesteve @ogrisel @amueller. I believe we should at least fix the auto mode bug for the next release.

@lesteve

This comment has been minimized.

Member

lesteve commented Sep 29, 2015

I agree with @giorgiop points.

Unless @amueller insists on finishing up this PR himself, maybe the next step would be to create another PR implementing the same easy fix and getting the Travis tests to pass by relaxing the tolerance of the comparisons.

@giorgiop giorgiop referenced this pull request Oct 13, 2015

Merged

[MRG+3] Collapsing PCA and RandomizedPCA #5299

8 of 8 tasks complete
@amueller

This comment has been minimized.

Member

amueller commented Oct 13, 2015

Ok, I'll change the tolerances. Sorry for the long delay. I agree this would be good to fix for 0.17.

@amueller amueller added this to the 0.17 milestone Oct 13, 2015

@amueller

This comment has been minimized.

Member

amueller commented Oct 13, 2015

This is independent of #5299, right?

@amueller

This comment has been minimized.

Member

amueller commented Oct 13, 2015

should be good now.

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Oct 14, 2015

This is independent of #5299, right?

I think the only change requested is in test_pca. We can still use RandomizedPCA in the test, but I would rather use PCA(svd_solver='randomized') to avoid warnings.

Is there any conflict during rebase? That's not in master yet... well, it's going to be an easy conflict resolution, independently by the merging order.

@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Oct 14, 2015

For the rest LGTM!

@amueller

This comment has been minimized.

Member

amueller commented Oct 14, 2015

Thanks. @ogrisel ?

@amueller

This comment has been minimized.

Member

amueller commented Oct 14, 2015

@@ -88,7 +88,7 @@ def test_whitening():
X_whitened2 = pca.transform(X_)
assert_array_almost_equal(X_whitened, X_whitened2)
assert_almost_equal(X_whitened.std(axis=0), np.ones(n_components))
assert_almost_equal(X_whitened.std(axis=0), np.ones(n_components), 6)

This comment has been minimized.

@lesteve

lesteve Oct 15, 2015

Member

maybe decimal=6. It wasn't immediately clear to me what this 6 was about.

@@ -123,8 +123,7 @@ def test_explained_variance():
np.var(X_pca, axis=0))
X_rpca = rpca.transform(X)
assert_array_almost_equal(rpca.explained_variance_,
np.var(X_rpca, axis=0))
assert_array_almost_equal(rpca.explained_variance_, np.var(X_rpca, axis=0), 1)

This comment has been minimized.

@lesteve

lesteve Oct 15, 2015

Member

Same comment as above about using decimal=1

@lesteve

This comment has been minimized.

Member

lesteve commented Oct 15, 2015

Other than the minor comments, LGTM.

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 15, 2015

Ok merging, I will put the decimal directly in master.

ogrisel added a commit that referenced this pull request Oct 15, 2015

Merge pull request #4478 from amueller/fix_randomized_svd_transpose
[MRG] Fix randomized_svd transpose heuristic.

@ogrisel ogrisel merged commit 6fe94b5 into scikit-learn:master Oct 15, 2015

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@giorgiop

This comment has been minimized.

Contributor

giorgiop commented Oct 15, 2015

🍷

@amueller amueller deleted the amueller:fix_randomized_svd_transpose branch May 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment