
Attributes for candidate recording dumps #710

Merged
merged 3 commits into metabrainz:all-changes-mapping on Jan 30, 2020

Conversation

vansika
Member

@vansika vansika commented Jan 27, 2020

Description

We are moving towards our first release, yay!
As a first step, we need to fetch all the attributes that will make up the schema.
Schema

  • user_name
  • release_name
  • artist_name
  • track_name
  • mb_recording_mbid
  • mb_artist_credit_mbids
  • mb_artist_credit_id
  • mb_release_mbid

This PR gets all the required attributes while generating recommendations.
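The attribute list above can be sketched as a plain-Python model (an illustrative sketch only: the field names come from the schema in this PR description, while the types, the example values, and the `CandidateRecording` name are assumptions, not project code):

```python
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical model of one candidate-recording row; field names follow the
# schema listed in the PR description, types are assumed.
@dataclass
class CandidateRecording:
    user_name: str
    release_name: str
    artist_name: str
    track_name: str
    mb_recording_mbid: str
    mb_artist_credit_mbids: List[str]  # an artist credit may span several artists
    mb_artist_credit_id: int
    mb_release_mbid: str

# Placeholder values for illustration only (not real MBIDs).
row = CandidateRecording(
    user_name="example_user",
    release_name="Example Release",
    artist_name="Example Artist",
    track_name="Example Track",
    mb_recording_mbid="00000000-0000-0000-0000-000000000001",
    mb_artist_credit_mbids=["00000000-0000-0000-0000-000000000002"],
    mb_artist_credit_id=1,
    mb_release_mbid="00000000-0000-0000-0000-000000000003",
)
print(sorted(asdict(row).keys()))
```

In the actual Spark pipeline these attributes would live in a DataFrame schema rather than a dataclass; the sketch just pins down the eight fields and their rough shapes.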

@vansika vansika requested a review from mayhem January 27, 2020 07:38
Member

@mayhem mayhem left a comment

LGTM.

@vansika vansika merged commit 23b57d1 into metabrainz:all-changes-mapping Jan 30, 2020
vansika added a commit that referenced this pull request Jan 30, 2020
* used mbids->msids mapping in create_dataframes
used pyspark API over sql queries

* fixed bad indent

* Use combined mapping (recording_artist_msid_mbid) and not recording and artist mapping independently

* Use artist_credit_artist_credit relation and recording_artist_mbid_msid mapping to generate recommendations

* vertical align pyspark functions to increase readability

* Update utils.py

* Unit test listenbrainz_spark/recommendations (#74)

* add tests for create_dataframes
modify utils to avoid prepending hdfs_cluster_uri to every path

* unit tests for create_dataframes

* unit tests for train models

* Add tests for request consumer and fix test.sh path problems (#72)

* Add first test for request consumer

* Use python -m pytest instead of py.test

py.test doesn't add the current dir to PYTHONPATH, which is why adding a test
anywhere other than listenbrainz_spark/tests raises an import error.
This fixes that problem.
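The sys.path difference behind this fix can be demonstrated directly (a hypothetical standalone check, not part of the test suite; the `_mcheck` module name is made up):

```python
import os
import subprocess
import sys

# `python -m <module>` prepends the current working directory to sys.path,
# which is why `python -m pytest` can import project modules that a bare
# `py.test` invocation cannot. Write a throwaway module that reports whether
# the cwd is on sys.path, then run it with -m.
with open("_mcheck.py", "w") as f:
    f.write("import os, sys; print(os.getcwd() in sys.path)\n")
try:
    result = subprocess.run(
        [sys.executable, "-m", "_mcheck"],
        capture_output=True, text=True, check=True,
    )
finally:
    os.remove("_mcheck.py")
print(result.stdout.strip())
```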

* fix definition of self

* Import create_app

* Add test for get_result

* declared constant var as a class member
changed class names for consistency
changed test_get_dates... to assert difference
used assertListEqual for lists
removed wildcard imports
typos and newlines

* unit tests for candidate_sets.py and recommend.py

* defined date as an instance variable

Co-authored-by: Param Singh <iliekcomputers@gmail.com>

* upload mappings, artist relation and listens to HDFS (#77)

* download listens and mapping from FTP
upload listens and mapping to HDFS

* calculate time taken to download files from ftp

* correct listenbrainz listens dump path on ftp

* add missing import and remove extra func arg

* upload and download script for artist relation

* add *force* utility to delete existing data

* rectify name of imports, delete unused import files

* update tests and recommendation engine with msid_mbid_mapping

* use NotImplementedException to catch null callback

* Add archival warning

* Unit tests for HDFS/FTP module (#698)

* download listens and mapping from FTP
upload listens and mapping to HDFS

* calculate time taken to download files from ftp

* correct listenbrainz listens dump path on ftp

* add missing import and remove extra func arg

* upload and download script for artist relation

* add *force* utility to delete existing data

* rectify name of imports, delete unused import files

* update tests and recommendation engine with msid_mbid_mapping

* use NotImplementedException to catch null callback

* tests for ftp downloader and hdfs uploader for mapping, listens, artist relations

* update func name in utils and add test for it

* Fix startup script of spark-request-consumer

* add pxz.wait to avoid race condition
modified utils.create_dataframe to wrap Spark row object in list

* improve function names

* define constants on top of file and import them in tests

Co-authored-by: Param Singh <iliekcomputers@gmail.com>

* Attributes for candidate recording dumps (#710)

* get attributes for candidate recordings dump

* update unit tests with changes in recommend.py

* typo in function name

* import utils as module and not attribute

Co-authored-by: Param Singh <iliekcomputers@gmail.com>