Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User similarity #1329

Merged
merged 71 commits into from
Mar 11, 2021
Merged

User similarity #1329

merged 71 commits into from
Mar 11, 2021

Conversation

mayhem
Copy link
Member

@mayhem mayhem commented Mar 9, 2021

This PR is the user similarity work that lucifer and I have been working on, based on Param's work.

paramsingh and others added 30 commits July 25, 2020 13:40
This reverts commit 7adb05b.

Reverting because it isn't relevant to this PR
We default to None but do not handle it anywhere. None will cause an
error while calculating the from_date offset which is expecting an int.
admin/sql/create_foreign_keys.sql Outdated Show resolved Hide resolved
@@ -35,6 +36,7 @@
'cf.recommendations.recording.recommendations': listenbrainz_spark.recommendations.recording.recommend.main,
'import.mapping': listenbrainz_spark.request_consumer.jobs.import_dump.import_mapping_to_hdfs,
'import.artist_relation': listenbrainz_spark.request_consumer.jobs.import_dump.import_artist_relation_to_hdfs,
'similarity.similar_users': listenbrainz_spark.recommendations.recording.user_similarity.main
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is user_similarity in recommendations.recording ? It should be outside in a separate user_similarity package imo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because we're still using recording data to calculate this.... But I'll let lucifer chime in...

Copy link
Member

@amCap1712 amCap1712 Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I intend to move user_similarity to a different package. I initially put it there to allow easy reuse of code as the implementation was not final or tested. Now that we have a working implementation, I'll do the cleanup (add comments, type annotations, refactor code) and add tests.



def save_dataframe_metadata_to_hdfs(metadata):
def save_dataframe_metadata_to_hdfs(metadata, df_metadata_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to add type annotations to this function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

listenbrainz/db/stats.py Outdated Show resolved Hide resolved
}


def threshold_similar_users(matrix, threshold):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types and docstrings on these methods would really help readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

current_app.logger.error(str(err), exc_info=True)
raise

tuple_mapped_rdd = playcounts_df.rdd.map(lambda x: MatrixEntry(x["recording_id"], x["user_id"], x["count"]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style thing again, but comments in this and the next few lines would greatly enhance readability in my opinion. It's not obvious at the outset what a co-ordinate matrix or a indexed-row-matrix or vectors_mapped_rdd is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some description in the docstring.

@amCap1712 amCap1712 marked this pull request as ready for review March 11, 2021 14:09
@mayhem mayhem requested a review from paramsingh March 11, 2021 14:09
Copy link
Collaborator

@paramsingh paramsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks for making those changes!

@mayhem mayhem merged commit 3cf1d79 into master Mar 11, 2021
@mayhem mayhem deleted the user-similarity branch March 11, 2021 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants