use faster csr indexing #21

Merged

@stegben
Contributor

commented Dec 24, 2016

It's a simple change, but it gives a huge performance improvement (roughly 7x to 13x in the profiles below). I used the following code to profile:

import cProfile

import numpy as np
np.random.seed(1234)
import scipy.sparse as sps

from kaggler.online_model import FTRL


DATA_NUM = int(1e6)  # must be an int: NumPy rejects float sizes such as 1e6


def main():
    print('create y...')
    # randint's upper bound is exclusive, so use 2 to draw binary labels in {0, 1}
    y = np.random.randint(0, 2, DATA_NUM)
    print('create x...')
    row = np.random.randint(0, 300000, DATA_NUM)
    col = np.random.randint(0, 10, DATA_NUM)
    data = np.ones(DATA_NUM)
    x = sps.csr_matrix((data, (row, col)), dtype=np.int8)

    print('train...')
    profiler = cProfile.Profile(subcalls=True, builtins=True, timeunit=0.001,)
    clf = FTRL(interaction=True)
    profiler.enable()
    clf.fit(x, y)
    profiler.disable()
    profiler.print_stats()


if __name__ == '__main__':
    main()
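
The tables below are printed in cProfile's default order ("Ordered by: standard name"), which scatters the expensive calls through the listing. As a side note, here is a minimal sketch of sorting by cumulative time instead, using the standard pstats module. The busy_work function is a hypothetical stand-in for clf.fit(x, y) and is not part of the original script.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Hypothetical workload standing in for clf.fit(x, y).
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy_work(10000)
profiler.disable()

# Collect the report in a buffer, sorted so the costliest call chains come first.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats('cumulative').print_stats(10)  # show only the top 10 entries
report = buf.getvalue()
print(report)
```

With this ordering, entries such as csr.py __getitem__ would appear near the top of the "before" listing rather than in the middle.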

And the profile result before:

         32400004 function calls (31800004 primitive calls) in 28.852 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   600000    0.207    0.000    0.681    0.000 <frozen importlib._bootstrap>:996(_handle_fromlist)
   900000    0.265    0.000    0.372    0.000 base.py:1081(isspmatrix)
  1200000    0.422    0.000    0.904    0.000 base.py:181(nnz)
   300000    0.202    0.000    0.202    0.000 base.py:70(__init__)
   300000    0.496    0.000    0.526    0.000 base.py:77(set_shape)
  1500001    0.253    0.000    0.253    0.000 base.py:99(get_shape)
   300000    1.071    0.000    2.149    0.000 compressed.py:1021(prune)
   300000    3.486    0.000    8.797    0.000 compressed.py:127(check_format)
   300000    1.626    0.000   15.347    0.000 compressed.py:24(__init__)
  1200000    0.482    0.000    0.482    0.000 compressed.py:99(getnnz)
   900000    0.166    0.000    0.166    0.000 csr.py:231(_swap)
   300000    0.697    0.000   25.280    0.000 csr.py:236(__getitem__)
   300000    0.577    0.000   20.043    0.000 csr.py:368(_get_row_slice)
   300000    1.270    0.000   19.240    0.000 csr.py:411(_get_submatrix)
   600000    0.509    0.000    1.006    0.000 csr.py:416(process_slice)
   600000    0.236    0.000    0.236    0.000 csr.py:439(check_bounds)
   300000    0.145    0.000    0.347    0.000 data.py:22(__init__)
      2/1    0.000    0.000    0.000    0.000 ftrl.pyx:125(fit)
599999/300000    1.800    0.000    0.268    0.000 ftrl.pyx:156(update_one)
599999/300000    1.772    0.000    0.457    0.000 ftrl.pyx:176(predict_one)
   600000    1.058    0.000    1.058    0.000 getlimits.py:245(__init__)
   600000    0.248    0.000    0.248    0.000 getlimits.py:270(max)
  2100000    0.929    0.000    1.622    0.000 numeric.py:414(asarray)
   600000    1.730    0.000    3.873    0.000 sputils.py:119(get_index_dtype)
   900000    1.181    0.000    1.883    0.000 sputils.py:188(isintlike)
   300000    0.936    0.000    0.936    0.000 sputils.py:200(isshape)
   900000    0.499    0.000    0.703    0.000 sputils.py:215(issequence)
   300000    1.193    0.000    3.067    0.000 sputils.py:265(_unpack_index)
   300000    0.118    0.000    0.149    0.000 sputils.py:293(_check_ellipsis)
   300000    0.642    0.000    1.205    0.000 sputils.py:331(_check_boolean)
   300000    0.274    0.000    0.741    0.000 sputils.py:91(to_native)
   600000    0.474    0.000    0.474    0.000 {built-in method builtins.hasattr}
  6000000    0.698    0.000    0.698    0.000 {built-in method builtins.isinstance}
  3000000    0.252    0.000    0.252    0.000 {built-in method builtins.len}
   300000    0.138    0.000    0.138    0.000 {built-in method builtins.max}
  3000000    1.155    0.000    1.155    0.000 {built-in method numpy.core.multiarray.array}
   300000    1.340    0.000    1.340    0.000 {built-in method scipy.sparse._sparsetools.get_csr_submatrix}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      2/1    0.000    0.000    0.000    0.000 {method 'fit' of 'kaggler.online_model.ftrl.FTRL' objects}
   300000    0.121    0.000    0.121    0.000 {method 'indices' of 'slice' objects}
   300000    0.185    0.000    0.185    0.000 {method 'newbyteorder' of 'numpy.dtype' objects}

And after:

         1200004 function calls (600004 primitive calls) in 2.284 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 base.py:99(get_shape)
      2/1    0.000    0.000    0.000    0.000 ftrl.pyx:125(fit)
599999/300000    1.081    0.000    0.392    0.000 ftrl.pyx:156(update_one)
599999/300000    1.203    0.000    0.473    0.000 ftrl.pyx:176(predict_one)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      2/1    0.000    0.000    0.000    0.000 {method 'fit' of 'kaggler.online_model.ftrl.FTRL' objects}

The improvement also holds with interaction=True.
Before:

         32400004 function calls (31800004 primitive calls) in 32.136 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   600000    0.219    0.000    0.676    0.000 <frozen importlib._bootstrap>:996(_handle_fromlist)
   900000    0.264    0.000    0.366    0.000 base.py:1081(isspmatrix)
  1200000    0.421    0.000    0.918    0.000 base.py:181(nnz)
   300000    0.204    0.000    0.204    0.000 base.py:70(__init__)
   300000    0.504    0.000    0.533    0.000 base.py:77(set_shape)
  1500001    0.266    0.000    0.266    0.000 base.py:99(get_shape)
   300000    1.076    0.000    2.179    0.000 compressed.py:1021(prune)
   300000    3.610    0.000    8.970    0.000 compressed.py:127(check_format)
   300000    1.630    0.000   15.610    0.000 compressed.py:24(__init__)
  1200000    0.498    0.000    0.498    0.000 compressed.py:99(getnnz)
   900000    0.190    0.000    0.190    0.000 csr.py:231(_swap)
   300000    0.723    0.000   25.669    0.000 csr.py:236(__getitem__)
   300000    0.588    0.000   20.415    0.000 csr.py:368(_get_row_slice)
   300000    1.306    0.000   19.593    0.000 csr.py:411(_get_submatrix)
   600000    0.518    0.000    1.008    0.000 csr.py:416(process_slice)
   600000    0.243    0.000    0.243    0.000 csr.py:439(check_bounds)
   300000    0.145    0.000    0.349    0.000 data.py:22(__init__)
      2/1    0.000    0.000    0.000    0.000 ftrl.pyx:125(fit)
599999/300000    3.054    0.000    1.472    0.000 ftrl.pyx:156(update_one)
599999/300000    3.413    0.000    1.981    0.000 ftrl.pyx:176(predict_one)
   600000    1.069    0.000    1.069    0.000 getlimits.py:245(__init__)
   600000    0.268    0.000    0.268    0.000 getlimits.py:270(max)
  2100000    0.943    0.000    1.649    0.000 numeric.py:414(asarray)
   600000    1.702    0.000    3.899    0.000 sputils.py:119(get_index_dtype)
   900000    1.202    0.000    1.898    0.000 sputils.py:188(isintlike)
   300000    0.954    0.000    0.954    0.000 sputils.py:200(isshape)
   900000    0.493    0.000    0.696    0.000 sputils.py:215(issequence)
   300000    1.177    0.000    3.034    0.000 sputils.py:265(_unpack_index)
   300000    0.128    0.000    0.159    0.000 sputils.py:293(_check_ellipsis)
   300000    0.624    0.000    1.175    0.000 sputils.py:331(_check_boolean)
   300000    0.265    0.000    0.743    0.000 sputils.py:91(to_native)
   600000    0.456    0.000    0.456    0.000 {built-in method builtins.hasattr}
  6000000    0.702    0.000    0.702    0.000 {built-in method builtins.isinstance}
  3000000    0.249    0.000    0.249    0.000 {built-in method builtins.len}
   300000    0.141    0.000    0.141    0.000 {built-in method builtins.max}
  3000000    1.190    0.000    1.190    0.000 {built-in method numpy.core.multiarray.array}
   300000    1.381    0.000    1.381    0.000 {built-in method scipy.sparse._sparsetools.get_csr_submatrix}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      2/1    0.000    0.000    0.000    0.000 {method 'fit' of 'kaggler.online_model.ftrl.FTRL' objects}
   300000    0.129    0.000    0.129    0.000 {method 'indices' of 'slice' objects}
   300000    0.193    0.000    0.193    0.000 {method 'newbyteorder' of 'numpy.dtype' objects}

After:

         1200004 function calls (600004 primitive calls) in 4.753 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 base.py:99(get_shape)
      2/1    0.000    0.000    0.000    0.000 ftrl.pyx:125(fit)
599999/300000    2.293    0.000    1.544    0.000 ftrl.pyx:156(update_one)
599999/300000    2.460    0.000    1.613    0.000 ftrl.pyx:176(predict_one)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      2/1    0.000    0.000    0.000    0.000 {method 'fit' of 'kaggler.online_model.ftrl.FTRL' objects}

Note that I set the "# cython: linetrace=True" directive while profiling. The version in this PR may run even faster, since it does not carry that profiling overhead.

@stegben

Contributor Author

commented Dec 24, 2016

Another candidate would be lil_matrix, since iterating over its rows is fast too. However, converting the input to that format would take extra time.
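
For comparison, a minimal sketch of the lil_matrix alternative, assuming the input arrives as CSR and is converted once with tolil(). The toy matrix is made up for illustration.

```python
import numpy as np
import scipy.sparse as sps

x = sps.csr_matrix(np.array([[1, 0, 2],
                             [0, 0, 3],
                             [4, 5, 0]]))

# One-off conversion: this is the extra transforming time mentioned above.
x_lil = x.tolil()

# lil_matrix keeps one Python list of column indices and one list of values
# per row, so row iteration is a plain zip over two object arrays.
visited = []
for cols, vals in zip(x_lil.rows, x_lil.data):
    visited.append((list(cols), list(vals)))

print(visited)
```

Whether this beats reading the CSR buffers directly depends on how the conversion cost compares with the training time.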

@jeongyoonlee

Owner

commented Dec 24, 2016

Wow, incredible! Thanks!

@jeongyoonlee jeongyoonlee merged commit 5ee1628 into jeongyoonlee:master Dec 24, 2016
