New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CF Data Normalization #1397

Merged
merged 64 commits into from Jun 16, 2018

Conversation

Projects
None yet
4 participants
@Wenhao-H
Member

Wenhao-H commented May 17, 2018

This PR adds data normalization for the CF module.

Things to do:

  • Add OverallMeanNormalization, UserMeanNormalization, ItemMeanNormalization, ZNormalization, CombinedNormalization...
  • Add tests. Debug. Update comments.

Wenhao-H added some commits May 19, 2018

@Wenhao-H Wenhao-H changed the title from [WIP] CF Data Normalization to CF Data Normalization May 20, 2018

@Wenhao-H

This comment has been minimized.

Show comment
Hide comment
@Wenhao-H

Wenhao-H May 20, 2018

Member

I think this can be reviewed now. I will resolve the conflicts after #1297 is merged and add tests for normalization methods.

Member

Wenhao-H commented May 20, 2018

I think this can be reviewed now. I will resolve the conflicts after #1297 is merged and add tests for normalization methods.

haritha1313 added some commits May 23, 2018

Wenhao-H added some commits Jun 9, 2018

@Wenhao-H

This comment has been minimized.

Show comment
Hide comment
@Wenhao-H

Wenhao-H Jun 9, 2018

Member

@lozhnikov @zoq All data normalization methods and tests are added. I think this PR is ready to be merged. Can you please review this?

Member

Wenhao-H commented Jun 9, 2018

@lozhnikov @zoq All data normalization methods and tests are added. I think this PR is ready to be merged. Can you please review this?

@zoq

Some first/quick comments from my side.

Show outdated Hide outdated src/mlpack/methods/cf/normalization/combined_normalization.hpp
/**
* This normalization class performs a sequence of normalization methods on
* raw ratings.

This comment has been minimized.

@zoq

zoq Jun 9, 2018

Member

Can you add a simple example in the comments here something like:

* An example of how to use the interface is shown below:
*
* @code
* arma::mat data; // Data matrix.
*
* const double epsilon = 0.01; // Relative error limit of data in subspace.
* const double delta = 0.1 // Lower error bound for Monte Carlo estimate.
*
* arma::mat u, v, sigma; // Matrices for the factors. data = u * sigma * v.t()
*
* // Get the factorization in the constructor.
* QUIC_SVD(data, u, v, sigma, epsilon, delta);
* @endcode

@zoq

zoq Jun 9, 2018

Member

Can you add a simple example in the comments here something like:

* An example of how to use the interface is shown below:
*
* @code
* arma::mat data; // Data matrix.
*
* const double epsilon = 0.01; // Relative error limit of data in subspace.
* const double delta = 0.1 // Lower error bound for Monte Carlo estimate.
*
* arma::mat u, v, sigma; // Matrices for the factors. data = u * sigma * v.t()
*
* // Get the factorization in the constructor.
* QUIC_SVD(data, u, v, sigma, epsilon, delta);
* @endcode

Show outdated Hide outdated src/mlpack/methods/cf/normalization/item_mean_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/item_mean_normalization.hpp

@Wenhao-H Wenhao-H referenced this pull request Jun 10, 2018

Merged

CF Interpolation Weights #1410

4 of 4 tasks complete
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
for (; it != it_end; it++)
{
*it = *it - itemMean(it.row());
// The algorithm omits rating of zero. If normalized rating equals zero,

This comment has been minimized.

@lozhnikov

lozhnikov Jun 11, 2018

Contributor

Could you add the same comment to GetRecommendations() and CleanData()?

@lozhnikov

lozhnikov Jun 11, 2018

Contributor

Could you add the same comment to GetRecommendations() and CleanData()?

This comment has been minimized.

@Wenhao-H

Wenhao-H Jun 12, 2018

Member

Added:)

@Wenhao-H

Wenhao-H Jun 12, 2018

Member

Added:)

Show outdated Hide outdated src/mlpack/methods/cf/normalization/z_score_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/z_score_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/user_mean_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/user_mean_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/no_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/item_mean_normalization.hpp
{
for (size_t i = 0; i < predictions.n_elem; i++)
{
const size_t item = combinations(1, i);

This comment has been minimized.

@zoq

zoq Jun 11, 2018

Member

your choice, but I think we could also just write:

predictions += combinations.submat(...).
@zoq

zoq Jun 11, 2018

Member

your choice, but I think we could also just write:

predictions += combinations.submat(...).

This comment has been minimized.

@Wenhao-H

Wenhao-H Jun 12, 2018

Member

hmm I think we should do something like:

arma::Col<size_t> items = combinations.row(1);
predictions += itemMean.elem(items);

But X.elem() takes arma::uvec as input and I didn't find a easy conversion from arma::Col<size_t> to arma::uvec. So unless there is an easy conversion I missed, I think we can stay with the current solution.

@Wenhao-H

Wenhao-H Jun 12, 2018

Member

hmm I think we should do something like:

arma::Col<size_t> items = combinations.row(1);
predictions += itemMean.elem(items);

But X.elem() takes arma::uvec as input and I didn't find a easy conversion from arma::Col<size_t> to arma::uvec. So unless there is an easy conversion I missed, I think we can stay with the current solution.

This comment has been minimized.

@zoq

zoq Jun 12, 2018

Member

There is http://arma.sourceforge.net/docs.html#conv_to, but I think since this would cause an extra copy, let's stick with the current implementation, thanks for looking into it.

@zoq

zoq Jun 12, 2018

Member

There is http://arma.sourceforge.net/docs.html#conv_to, but I think since this would cause an extra copy, let's stick with the current implementation, thanks for looking into it.

Show outdated Hide outdated src/mlpack/methods/cf/normalization/item_mean_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/combined_normalization.hpp
Show outdated Hide outdated src/mlpack/methods/cf/normalization/combined_normalization.hpp
@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jun 13, 2018

Member

Let me know if you need help to solve the travis issue.

Member

zoq commented Jun 13, 2018

Let me know if you need help to solve the travis issue.

Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf.hpp
Show outdated Hide outdated src/mlpack/methods/cf/cf_impl.hpp
{
std::string tagName = "normalization_";
tagName += std::to_string(I);
ar & boost::serialization::make_nvp(

This comment has been minimized.

@lozhnikov

lozhnikov Jun 14, 2018

Contributor

Correct me if I'm mistaken, but why don't you write just ar & BOOST_SERIALIZATION_NVP(normalizations);? I thought we can serialize tuples.

@lozhnikov

lozhnikov Jun 14, 2018

Contributor

Correct me if I'm mistaken, but why don't you write just ar & BOOST_SERIALIZATION_NVP(normalizations);? I thought we can serialize tuples.

This comment has been minimized.

@Wenhao-H

Wenhao-H Jun 14, 2018

Member

Boost Serialization doesn't support tuple serialization (refer to here), so I wrote an implementation to serialize it.

@Wenhao-H

Wenhao-H Jun 14, 2018

Member

Boost Serialization doesn't support tuple serialization (refer to here), so I wrote an implementation to serialize it.

This comment has been minimized.

@lozhnikov

lozhnikov Jun 14, 2018

Contributor

Yeah, you're right. Curious, Boost 1.66 still doesn't support tuples, I didn't know.

@lozhnikov

lozhnikov Jun 14, 2018

Contributor

Yeah, you're right. Curious, Boost 1.66 still doesn't support tuples, I didn't know.

This comment has been minimized.

@Wenhao-H

Wenhao-H Jun 14, 2018

Member

Yea boost 1.66 still doesn't support tuple serialization. I tried and it fails.

@Wenhao-H

Wenhao-H Jun 14, 2018

Member

Yea boost 1.66 still doesn't support tuple serialization. I tried and it fails.

@Wenhao-H

This comment has been minimized.

Show comment
Hide comment
@Wenhao-H

Wenhao-H Jun 15, 2018

Member

@zoq @lozhnikov Thanks for reviewing. The travis issue has been solved with the help from Mikhail. Do you have any further comment?

Member

Wenhao-H commented Jun 15, 2018

@zoq @lozhnikov Thanks for reviewing. The travis issue has been solved with the help from Mikhail. Do you have any further comment?

@zoq

zoq approved these changes Jun 15, 2018

Thanks for the great contribution, no more comments from my side.

@lozhnikov

Looks good to me. I've just understood that it is worth to add these normalization techniques to the CLI. But I think we can do another PR for that. So, I'll merge the PR in a couple of hours (I'd like to build the PR against the recent master branch and make sure that there are no issues).
@Wenhao-H Thanks for the contribution!
and
@zoq Thanks for the review!

@lozhnikov lozhnikov merged commit 2554f60 into mlpack:master Jun 16, 2018

5 checks passed

Memory Checks
Details
Static Code Analysis Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment