-
-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User similarity #1329
User similarity #1329
Conversation
This reverts commit 7adb05b. Reverting because it isn't relevant to this PR
We default to None but do not handle it anywhere. None will cause an error while calculating the from_date offset which is expecting an int.
…server into user-similarity
…server into user-similarity
…server into user-similarity
…stent with the similar-users endpoint
…server into user-similarity
listenbrainz_spark/query_map.py
Outdated
@@ -35,6 +36,7 @@ | |||
'cf.recommendations.recording.recommendations': listenbrainz_spark.recommendations.recording.recommend.main, | |||
'import.mapping': listenbrainz_spark.request_consumer.jobs.import_dump.import_mapping_to_hdfs, | |||
'import.artist_relation': listenbrainz_spark.request_consumer.jobs.import_dump.import_artist_relation_to_hdfs, | |||
'similarity.similar_users': listenbrainz_spark.recommendations.recording.user_similarity.main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is user_similarity
in recommendations.recording
? It should be outside in a separate user_similarity
package imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because we're still using recording data to calculate this.... But I'll let lucifer chime in...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intend to move user_similarity to a different package. I initially put it there to allow easy reuse of code as the implementation was not final or tested. Now that we have a working implementation, I'll do the cleanup (add comments, type annotations, refactor code) and add tests.
|
||
|
||
def save_dataframe_metadata_to_hdfs(metadata): | ||
def save_dataframe_metadata_to_hdfs(metadata, df_metadata_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to add type annotations to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
|
||
def threshold_similar_users(matrix, threshold): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
types and docstrings on these methods would really help readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
listenbrainz_spark/recommendations/recording/user_similarity.py
Outdated
Show resolved
Hide resolved
current_app.logger.error(str(err), exc_info=True) | ||
raise | ||
|
||
tuple_mapped_rdd = playcounts_df.rdd.map(lambda x: MatrixEntry(x["recording_id"], x["user_id"], x["count"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style thing again, but comments in this and the next few lines would greatly enhance readability in my opinion. It's not obvious at the outset what a co-ordinate matrix or a indexed-row-matrix or vectors_mapped_rdd is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some description in the docstring.
Co-authored-by: Param Singh <iliekcomputers@gmail.com>
6caffa2
to
a0ad6f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, thanks for making those changes!
This PR is the user similarity work that lucifer and I have been working on, based on Param's work.