make FTRL more c-style and faster #20

Contributor

@stegben stegben commented Dec 23, 2016

This is roughly 10% to 30% faster when interaction=False.

You can profile the performance with the following script. Before compiling, add # cython: linetrace=True at the top of ftrl.pyx.

import cProfile

import numpy as np
import scipy.sparse as sps

from kaggler.online_model import FTRL


np.random.seed(1234)

# Must be an int: NumPy does not accept a float such as 5e7 as an array size.
DATA_NUM = int(5e7)


class customCSR(object):
    """Pre-slice a CSR matrix into a list of single-row matrices."""

    def __init__(self, csr_matrix):
        self.data = []
        self.shape = csr_matrix.shape
        for row in range(self.shape[0]):
            self.data.append(csr_matrix[row])

    def __getitem__(self, idx):
        return self.data[idx]


def main():
    print('create y...')
    # The upper bound is exclusive, so randint(0, 2, ...) yields binary labels;
    # randint(0, 1, ...) would return only zeros.
    y = np.random.randint(0, 2, DATA_NUM)
    print('create x...')
    row = np.random.randint(0, 100000, DATA_NUM)
    col = np.random.randint(0, 10, DATA_NUM)
    data = np.ones(DATA_NUM)
    # Duplicate (row, col) pairs are summed when the CSR matrix is built.
    x = sps.csr_matrix((data, (row, col)), dtype=np.int8)
    x = customCSR(x)

    print('train...')
    profiler = cProfile.Profile(subcalls=True, builtins=True, timeunit=0.001)
    clf = FTRL(interaction=False)
    # Profile only the training call.
    profiler.enable()
    clf.fit(x, y)
    profiler.disable()
    profiler.print_stats()
    print(clf.predict(x))


if __name__ == '__main__':
    main()
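
The full listing printed by profiler.print_stats() can be long. As a minimal sketch (not part of this PR), the standard-library pstats module can sort the result by cumulative time and keep only the top entries. The names busy_work and profile_top are hypothetical, and busy_work is only a toy stand-in for clf.fit(x, y) so the snippet runs without kaggler installed.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Toy workload; a hypothetical stand-in for clf.fit(x, y)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_top(func, *args, limit=10):
    """Run func under cProfile and return (result, report text).

    The report lists at most `limit` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Write the report into a string instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(limit)
    return result, stream.getvalue()


result, report = profile_top(busy_work, 100000)
print(report)
```

In the script above, the same two pstats lines could replace profiler.print_stats() once training has finished.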

Contributor Author

stegben commented Dec 23, 2016

As you may already know, the main overhead comes from the scipy sparse matrix, not from what I fixed here. I would like to hear your thoughts on it.
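
To make the sparse-matrix overhead concrete, here is a rough sketch that compares two ways of reading the non-zero column indices of one CSR row: slicing with X[row], which builds a new matrix object on every call, and reading the indptr and indices arrays directly. The helper names are hypothetical, the matrix is small random data, and this is not necessarily how FTRL accesses rows internally; timings will vary by machine.

```python
import timeit

import numpy as np
import scipy.sparse as sps


def row_indices_by_slicing(X, row):
    """Column indices of the non-zeros in one row, via scipy row slicing."""
    return X[row].indices


def row_indices_by_indptr(X, row):
    """The same indices, read directly from the CSR index arrays."""
    start, end = X.indptr[row], X.indptr[row + 1]
    return X.indices[start:end]


rng = np.random.RandomState(1234)
X = sps.random(1000, 10, density=0.3, format='csr', random_state=rng)

n_rows = X.shape[0]
slicing_time = timeit.timeit(
    lambda: [row_indices_by_slicing(X, r) for r in range(n_rows)], number=3)
indptr_time = timeit.timeit(
    lambda: [row_indices_by_indptr(X, r) for r in range(n_rows)], number=3)

print('row slicing  : %.4f s' % slicing_time)
print('indptr access: %.4f s' % indptr_time)
```

Both helpers return the same indices for a given row; only the cost of getting them differs.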

Contributor Author

stegben commented Dec 23, 2016

Ah, sorry, I didn't follow PEP8 and the docstring format in some places. Please do not merge for now.

stegben force-pushed the cph-patch-ftrl-cython-optimization branch from 56e9f29 to 691d09b on December 23, 2016 13:27
Contributor Author

stegben commented Dec 23, 2016

done

Contributor Author

stegben commented Dec 23, 2016

Just curious, are you participating in the Outbrain competition?

Owner

jeongyoonlee commented Dec 23, 2016

@stegben Thanks again! I will complete the code review later because it's a holiday here. I'm not participating in the Outbrain competition. I'm considering starting the annual Santa competition instead. :)

Also, besides online_model, you may find other modules such as preprocessing and data_io useful. Please take a look.

Contributor Author

stegben commented Dec 23, 2016

I'll take a look at those modules. Thank you, and Merry Christmas!

Contributor Author

stegben commented Dec 24, 2016

Wait, please don't merge this yet. I'm working on #21, which I think should be considered first. If that is OK and gets merged, I'll rebase this branch onto it.

Contributor Author

stegben commented Dec 26, 2016

#22

stegben closed this on Dec 26, 2016