
Attributes for candidate recording dumps #710

Merged
merged 3 commits into metabrainz:all-changes-mapping on Jan 30, 2020

Conversation

vansika
Member

@vansika vansika commented Jan 27, 2020

Description

We are moving towards our first release, yay!
As a first step, we need to fetch all the attributes that will make up the schema.
Schema

  • user_name
  • release_name
  • artist_name
  • track_name
  • mb_recording_mbid
  • mb_artist_credit_mbids
  • mb_artist_credit_id
  • mb_release_mbid

This PR gets all the required attributes while generating recommendations.
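The attribute list above can be sketched as a plain-Python model (an illustrative sketch only: the field names come from the schema in this PR description, while the types, the example values, and the `CandidateRecording` name are assumptions, not project code):

```python
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical model of one candidate-recording row; field names follow the
# schema listed in the PR description, types are assumed.
@dataclass
class CandidateRecording:
    user_name: str
    release_name: str
    artist_name: str
    track_name: str
    mb_recording_mbid: str
    mb_artist_credit_mbids: List[str]  # an artist credit may span several artists
    mb_artist_credit_id: int
    mb_release_mbid: str

# Placeholder values for illustration only (not real MBIDs).
row = CandidateRecording(
    user_name="example_user",
    release_name="Example Release",
    artist_name="Example Artist",
    track_name="Example Track",
    mb_recording_mbid="00000000-0000-0000-0000-000000000001",
    mb_artist_credit_mbids=["00000000-0000-0000-0000-000000000002"],
    mb_artist_credit_id=1,
    mb_release_mbid="00000000-0000-0000-0000-000000000003",
)
print(sorted(asdict(row).keys()))
```

In the actual Spark pipeline these attributes would live in a DataFrame schema rather than a dataclass; the sketch just pins down the eight fields and their rough shapes.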

@vansika vansika requested a review from mayhem January 27, 2020 07:38
Member

@mayhem mayhem left a comment

LGTM.

@vansika vansika merged commit 23b57d1 into metabrainz:all-changes-mapping Jan 30, 2020
vansika added a commit that referenced this pull request Jan 30, 2020
* used mbids->msids mapping in create_dataframes
used pyspark API over sql queries

* fixed bad indent

* Use combined mapping (recording_artist_msid_mbid) and not recording and artist mapping independently

* Use artist_credit_artist_credit relation and recording_artist_mbid_msid mapping to generate recommendations

* vertical align pyspark functions to increase readability

* Update utils.py

* Unit test listenbrainz_spark/recommendations (#74)

* add tests for create_dataframes
modify utils to avoid prepending hdfs_cluster_uri to every path

* unit tests for create_dataframes

* unit tests for train models

* Add tests for request consumer and fix test.sh path problems (#72)

* Add first test for request consumer

* Use python -m pytest instead of py.test

py.test doesn't add the current dir to PYTHONPATH, which is why adding a test
anywhere other than listenbrainz_spark/tests raises an import error.
This fixes that problem.
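The sys.path difference behind this fix can be demonstrated directly (a hypothetical standalone check, not part of the test suite; the `_mcheck` module name is made up):

```python
import os
import subprocess
import sys

# `python -m <module>` prepends the current working directory to sys.path,
# which is why `python -m pytest` can import project modules that a bare
# `py.test` invocation cannot. Write a throwaway module that reports whether
# the cwd is on sys.path, then run it with -m.
with open("_mcheck.py", "w") as f:
    f.write("import os, sys; print(os.getcwd() in sys.path)\n")
try:
    result = subprocess.run(
        [sys.executable, "-m", "_mcheck"],
        capture_output=True, text=True, check=True,
    )
finally:
    os.remove("_mcheck.py")
print(result.stdout.strip())
```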

* fix definition of self

* Import create_app

* Add test for get_result

* declared constant var as a class member
changed class names for consistency
changed test_get_dates... to assert difference
used assertListEqual for lists
removed wildcard imports
typos and newlines

* unit tests for candidate_sets.py and recommend.py

* defined date as an instance variable

Co-authored-by: Param Singh <iliekcomputers@gmail.com>

* upload mappings, artist relation and listens to HDFS (#77)

* download listens and mapping from FTP
upload listens and mapping to HDFS

* calculate time taken to download files from ftp

* correct listenbrainz listens dump path on ftp

* add missing import and remove extra func arg

* upload and download script for artist relation

* add *force* utility to delete existing data

* rectify name of imports, delete unused import files

* update tests and recommendation engine with msid_mbid_mapping

* use NotImplementedException to catch null callback

* Add archival warning

* Unit tests for HDFS/FTP module (#698)

* download listens and mapping from FTP
upload listens and mapping to HDFS

* calculate time taken to download files from ftp

* correct listenbrainz listens dump path on ftp

* add missing import and remove extra func arg

* upload and download script for artist relation

* add *force* utility to delete existing data

* rectify name of imports, delete unused import files

* update tests and recommendation engine with msid_mbid_mapping

* use NotImplementedException to catch null callback

* tests for ftp downloader and hdfs uploader for mapping, listens, artist relations

* update func name in utils and add test for it

* Fix startup script of spark-request-consumer

* add pxz.wait to avoid race condition
modified utils.create_dataframe to wrap Spark row object in list

* improve function names

* define constants on top of file and import them in tests

Co-authored-by: Param Singh <iliekcomputers@gmail.com>

* Attributes for candidate recording dumps (#710)

* get attributes for candidate recordings dump

* update unit tests with changes in recommend.py

* typo in function name

* import utils as module and not attribute

Co-authored-by: Param Singh <iliekcomputers@gmail.com>