New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CF decomposition using RandomizedSVD #1355

Merged
merged 18 commits into from Jun 6, 2018

Conversation

Projects
None yet
4 participants
@haritha1313
Contributor

haritha1313 commented Apr 5, 2018

No description provided.

@haritha1313 haritha1313 changed the title from [WIP] CF decomposition using RandomizedSVD to CF decomposition using RandomizedSVD Apr 10, 2018

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Apr 22, 2018

Member

I think it might be worth the effort to refactor the cf class so that it uses a policy based design for the factorization, that way we can implement new methods without modifying the main class; I guess that makes even more sense if we add more methods in the future. Let me know what you think.

The PCA class or the optimizer codebase is a good example.

Member

zoq commented Apr 22, 2018

I think it might be worth the effort to refactor the cf class so that it uses a policy based design for the factorization, that way we can implement new methods without modifying the main class; I guess that makes even more sense if we add more methods in the future. Let me know what you think.

The PCA class or the optimizer codebase is a good example.

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 Apr 26, 2018

Contributor

@zoq You are right, that would make much more sense. I'm working on this and will complete as soon as I can. I've got exams coming up and hence the delay.

Contributor

haritha1313 commented Apr 26, 2018

@zoq You are right, that would make much more sense. I'm working on this and will complete as soon as I can. I've got exams coming up and hence the delay.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Apr 26, 2018

Member

No worries, we don't have to rush anything; good luck on your exams .

Member

zoq commented Apr 26, 2018

No worries, we don't have to rush anything; good luck on your exams .

@rcurtin

I think it's really nice to add this support so I am looking forward to seeing it merged, but I'll wait to review further since some refactoring will be done. Thanks!

haritha1313 added some commits May 18, 2018

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 May 18, 2018

Contributor

To do:
Debug CLI error when loading/saving model.
Any possible restructuring in cf_test.cpp.

Contributor

haritha1313 commented May 18, 2018

To do:
Debug CLI error when loading/saving model.
Any possible restructuring in cf_test.cpp.

haritha1313 added some commits May 18, 2018

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 19, 2018

Member

Looks like you solved the build issues; just some probably random test failures we should take a closer look into, in many cases this just means we have to adjust some tolerance (threshold) or increase the number of iterations.

Member

zoq commented May 19, 2018

Looks like you solved the build issues; just some probably random test failures we should take a closer look into, in many cases this just means we have to adjust some tolerance (threshold) or increase the number of iterations.

@Wenhao-H

This comment has been minimized.

Show comment
Hide comment
@Wenhao-H

Wenhao-H May 19, 2018

Member

Hmm it seems that our work have some overlaps. I am working on CF data normalization at #1397. Basically what it does is to normalize rating before training, and denormalize predicted rating before returning the results. I think the conflicts are easy to resolve so it's not a big for the moment. But as we are working on the same module for the summer, it is better if we discuss on how to collaborate so as to avoid repetitive work. We can discuss over email or a new issue later:))

For randomized_svd, it seems that the rank of decomposition does not equal the rank specified in constructor. Would you mind having a look at that? A test can be added to check whether rank of decomposition is correct.

Member

Wenhao-H commented May 19, 2018

Hmm it seems that our work have some overlaps. I am working on CF data normalization at #1397. Basically what it does is to normalize rating before training, and denormalize predicted rating before returning the results. I think the conflicts are easy to resolve so it's not a big for the moment. But as we are working on the same module for the summer, it is better if we discuss on how to collaborate so as to avoid repetitive work. We can discuss over email or a new issue later:))

For randomized_svd, it seems that the rank of decomposition does not equal the rank specified in constructor. Would you mind having a look at that? A test can be added to check whether rank of decomposition is correct.

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 May 19, 2018

Contributor

@Wenhao-H I think this PR is some basic refactoring which could cause some conflicts between our work. But once this part is done, my work will be on another method of performing collaborative filtering and yours on existing CF class i suppose. So I guess in future conflicts would be less. But in case there is anything else, we can always discuss. :)

Contributor

haritha1313 commented May 19, 2018

@Wenhao-H I think this PR is some basic refactoring which could cause some conflicts between our work. But once this part is done, my work will be on another method of performing collaborative filtering and yours on existing CF class i suppose. So I guess in future conflicts would be less. But in case there is anything else, we can always discuss. :)

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 20, 2018

Member

@Wenhao-H if this doesn't make it easier for you, I guess we could delay the changes if @haritha1313 agrees, but if you think it makes things easier, let's merge this one as soon as it's done.

Member

zoq commented May 20, 2018

@Wenhao-H if this doesn't make it easier for you, I guess we could delay the changes if @haritha1313 agrees, but if you think it makes things easier, let's merge this one as soon as it's done.

@Wenhao-H

This comment has been minimized.

Show comment
Hide comment
@Wenhao-H

Wenhao-H May 20, 2018

Member

@zoq Everything's fine! This is a nice improvement and it would be better if we can merge this soon so I can resolve the conflicts earlier.

Member

Wenhao-H commented May 20, 2018

@zoq Everything's fine! This is a nice improvement and it would be better if we can merge this soon so I can resolve the conflicts earlier.

Show outdated Hide outdated src/mlpack/tests/cf_test.cpp
Show outdated Hide outdated src/mlpack/tests/cf_test.cpp
@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 20, 2018

Member

@Wenhao-H Okay, thanks for the fast response.

Member

zoq commented May 20, 2018

@Wenhao-H Okay, thanks for the fast response.

Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
* @param numRecs Number of Recommendations
* @param recommendations Matrix to save recommendations
* @param users Users for which recommendations are to be generated
* @param numRecs Number of Recommendations.

This comment has been minimized.

@zoq

zoq May 20, 2018

Member

Nice catch!

@zoq

zoq May 20, 2018

Member

Nice catch!

Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp

haritha1313 added some commits May 23, 2018

CF::CF(const size_t numUsersForSimilarity,
const size_t rank) :
// Default CFType constructor.
CFType::CFType(const size_t numUsersForSimilarity,

This comment has been minimized.

@zoq

zoq May 23, 2018

Member

I think if @Wenhao-H is going to templatize this class anyway, we can still go with CFType, @Wenhao-H let me know what you think.

@zoq

zoq May 23, 2018

Member

I think if @Wenhao-H is going to templatize this class anyway, we can still go with CFType, @Wenhao-H let me know what you think.

This comment has been minimized.

@Wenhao-H

Wenhao-H May 28, 2018

Member

@zoq Sorry that I missed your comment here. Yes you are right that I need to templatize the class. It's easier for me if we go with CFType.

@Wenhao-H

Wenhao-H May 28, 2018

Member

@zoq Sorry that I missed your comment here. Yes you are right that I need to templatize the class. It's easier for me if we go with CFType.

Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
Show outdated Hide outdated src/mlpack/methods/cf/decomposition_policies/CMakeLists.txt
Show outdated Hide outdated src/mlpack/methods/cf/decomposition_policies/SVD_complete_method.hpp
Show outdated Hide outdated src/mlpack/methods/cf/decomposition_policies/SVD_complete_method.hpp
Show outdated Hide outdated src/mlpack/methods/cf/decomposition_policies/SVD_complete_method.hpp
Show outdated Hide outdated src/mlpack/methods/cf/decomposition_policies/randomized_svd_method.hpp
Show outdated Hide outdated src/mlpack/methods/randomized_svd/randomized_svd.hpp
Show outdated Hide outdated src/mlpack/methods/randomized_svd/randomized_svd.hpp

haritha1313 added some commits May 26, 2018

*/
BOOST_AUTO_TEST_CASE(CFGetRecommendationsAllUsersRandSVDTest)
{
GetRecommendationsAllUsers<RandomizedSVDPolicy>();

This comment has been minimized.

@zoq

zoq May 27, 2018

Member

I guess, if we increase the iteratedPower and maxIterations and don't use the default values it might give us better results. In this case we would have to pass an instantization of the policy. Something like:

RandomizedSVDPolicy policy(10, 10);

might work.

@zoq

zoq May 27, 2018

Member

I guess, if we increase the iteratedPower and maxIterations and don't use the default values it might give us better results. In this case we would have to pass an instantization of the policy. Something like:

RandomizedSVDPolicy policy(10, 10);

might work.

This comment has been minimized.

@haritha1313

haritha1313 May 27, 2018

Contributor

I was getting different error log when testing locally and thus wasn't able to debug. The jenkins log made more sense, and it was a silly enough mistake. :) Guess its kind of solved now. I'll push the changes and if this change is still necessary, do let me know.

@haritha1313

haritha1313 May 27, 2018

Contributor

I was getting different error log when testing locally and thus wasn't able to debug. The jenkins log made more sense, and it was a silly enough mistake. :) Guess its kind of solved now. I'll push the changes and if this change is still necessary, do let me know.

haritha1313 added some commits May 27, 2018

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 28, 2018

Member

Does CFTest/TrainRegSVDTest pass on your system?

Member

zoq commented May 28, 2018

Does CFTest/TrainRegSVDTest pass on your system?

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 May 28, 2018

Contributor

CFTest hasn't been running properly on my system. This particular error that jenkins has caught makes it get stuck indefinitely when running locally.

Contributor

haritha1313 commented May 28, 2018

CFTest hasn't been running properly on my system. This particular error that jenkins has caught makes it get stuck indefinitely when running locally.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 28, 2018

Member

Can you identify the test? On my system only the TrainRegSVDTest failed, but the rest passes.

Member

zoq commented May 28, 2018

Can you identify the test? On my system only the TrainRegSVDTest failed, but the rest passes.

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 May 28, 2018

Contributor

Yes, that one fails in jenkins too. It gives error related to sparse matrix assignment, but none has been used in the train function for RegSVD.

Contributor

haritha1313 commented May 28, 2018

Yes, that one fails in jenkins too. It gives error related to sparse matrix assignment, but none has been used in the train function for RegSVD.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq May 31, 2018

Member

I think this is ready, before we merge this perhaps we could decrease the test time somehow. A simple solution is to see if we can decrease the number of training iterations. I'll take a closer look into the test cases, maybe you have an idea?

Member

zoq commented May 31, 2018

I think this is ready, before we merge this perhaps we could decrease the test time somehow. A simple solution is to see if we can decrease the number of training iterations. I'll take a closer look into the test cases, maybe you have an idea?

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 Jun 1, 2018

Contributor

It would be easy enough to change the maxIterations parameter, and this does reduce the test time significantly.

Contributor

haritha1313 commented Jun 1, 2018

It would be easy enough to change the maxIterations parameter, and this does reduce the test time significantly.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jun 2, 2018

Member

Yeah, the timings are looking good to me, thanks for looking into the issue.

Member

zoq commented Jun 2, 2018

Yeah, the timings are looking good to me, thanks for looking into the issue.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jun 2, 2018

Member

Let me run the cf tests using different random seeds to make sure they are stable; for more information #922.

Member

zoq commented Jun 2, 2018

Let me run the cf tests using different random seeds to make sure they are stable; for more information #922.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jun 2, 2018

Member

Okay, not a single failure in 300 trials, do you like to make any other changes?

Member

zoq commented Jun 2, 2018

Okay, not a single failure in 300 trials, do you like to make any other changes?

@haritha1313

This comment has been minimized.

Show comment
Hide comment
@haritha1313

haritha1313 Jun 2, 2018

Contributor

Thats great! I have gone through it again, and tried running using CLI too. I can't think of anything to change in particular, unless you have any suggestion, of course.

Contributor

haritha1313 commented Jun 2, 2018

Thats great! I have gone through it again, and tried running using CLI too. I can't think of anything to change in particular, unless you have any suggestion, of course.

@zoq

zoq approved these changes Jun 2, 2018

No more comments from my side, this code is going to be changed over the next weeks so expect more changes; I'll let this sit for 3 days before I'll merge this in, to give anyone else a chance to take a look at the code.

@zoq zoq merged commit e6c64a9 into mlpack:master Jun 6, 2018

5 checks passed

Memory Checks
Details
Static Code Analysis Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jun 6, 2018

Member

Thanks again for the great contribution, this should make things a lot easier in the future.

Member

zoq commented Jun 6, 2018

Thanks again for the great contribution, this should make things a lot easier in the future.

@haritha1313 haritha1313 deleted the haritha1313:randSVD_cf branch Jun 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment