queries for entities (artist, user) #22

Merged
merged 11 commits into from Jun 10, 2019

Conversation

@vansika
Contributor

commented Feb 26, 2019

Made changes in artist.py

vansika and others added 8 commits Jan 26, 2019
Upgrade spark version from 2.3.1 to 2.3.2
Improve the process to download apache distribution packages

def get_releases():
result = run_query("""

@paramsingh

paramsingh Feb 28, 2019

Member

We need to order this by the most listened to release for each artist.



def get_artist_names():
result = run_query("""

@paramsingh

paramsingh Feb 28, 2019

Member

This doesn't seem to calculate any stats.


def get_listen_count():
result = run_query("""
SELECT artist_msid

@paramsingh

paramsingh Feb 28, 2019

Member

We should get more details here, such as the artist name.

GROUP BY artist_msid, user_name
ORDER BY cnt DESC
""")
result.show()

@paramsingh

paramsingh Mar 17, 2019

Member

Let's remove these show() calls.

Member

left a comment

There is no code here to push results into RabbitMQ. Was I working on that? I am sorry, I forget. :sheepish:


def get_most_popular():
def get_listener():

@paramsingh

paramsingh Mar 17, 2019

Member

Docstrings explaining what each function does would be helpful

, recording_mbid
, recording_msid
, count(recording_msid) as cnt
FROM listen

@paramsingh

paramsingh Mar 17, 2019

Member

In the other PR, we pass around the name of the temporary table. I know I am the one who started using the name listen, but we should have better names and pass them around too (for consistency with the other parts of the codebase).
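
A minimal sketch of what passing the table name around could look like, assuming the SQL is built as a string before it is handed to run_query. The function name build_recording_count_query and the example table name are hypothetical and not part of this PR; only the column names come from the snippet above, and the first selected column of the original query is not visible in the diff, so it is left out here.

```python
def build_recording_count_query(table_name):
    """Return SQL that counts listens per recording in the given temporary table.

    Hypothetical helper: the table name is a parameter instead of being
    hard-coded as `listen` inside the query.
    """
    # Only accept plain identifiers so the name can be interpolated safely.
    if not table_name.isidentifier():
        raise ValueError("invalid table name: {!r}".format(table_name))
    return """
        SELECT recording_mbid
             , recording_msid
             , count(recording_msid) as cnt
          FROM {table}
      GROUP BY recording_mbid, recording_msid
      ORDER BY cnt DESC
    """.format(table=table_name)
```

The caller would register the temporary table under whatever name it chooses and pass that same name to the helper, so the query and the table registration cannot drift apart.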


def main(app_name):
t0 = time.time()
listenbrainz_spark.init_spark_session(app_name)
df = None
for y in range(2018, 2019):
for m in range(12, 13):
t = datetime.utcnow().replace(day=1)

@paramsingh

paramsingh Mar 17, 2019

Member

This code is duplicated a lot. We should consider creating a function that takes a time range and returns a dataframe with the listens in that range.
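
One possible shape for such a function, as a sketch only. It assumes listens are stored per month and that a hypothetical load_month(year, month) callable returns the dataframe for one month; neither months_in_range nor get_listens_between exists in this PR. The result covers whole months, so filtering down to exact dates would still be a separate step.

```python
from datetime import date
from functools import reduce


def months_in_range(start, end):
    """Return (year, month) pairs for every month from start to end, inclusive."""
    first, last = (start.year, start.month), (end.year, end.month)
    if first > last:
        raise ValueError("start must not be after end")
    months = []
    year, month = first
    while (year, month) <= last:
        months.append((year, month))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def get_listens_between(start, end, load_month):
    """Combine the per-month dataframes covering start..end into one.

    load_month is a hypothetical callable (year, month) -> DataFrame; how a
    month of listens is actually read is not shown in this PR.
    """
    frames = [load_month(y, m) for y, m in months_in_range(start, end)]
    # DataFrame.union appends the rows of frames that share the same schema.
    return reduce(lambda left, right: left.union(right), frames)
```

With a helper like this, the nested year/month loops in main would collapse into a single call per stats job.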

@vansika

vansika Mar 19, 2019

Author Contributor

The loop was for testing purposes. As we have decided, stats will be calculated for the previous month/week, so this code will no longer be needed.
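
For illustration, the bounds of "the previous month" could be computed along these lines. This is a sketch under assumptions: previous_month_range is a hypothetical name, and the half-open interval [start, end) is a choice made here, not something decided in this thread.

```python
from datetime import datetime, timedelta


def previous_month_range(now):
    """Return (start, end) of the calendar month before `now`.

    The interval is half-open: start is inclusive, end is exclusive.
    """
    # Midnight on the first day of the current month is the exclusive end.
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Stepping back one day always lands in the previous month.
    start = (end - timedelta(days=1)).replace(day=1)
    return start, end
```

Calling it with datetime.utcnow() would give the range for the month that just ended, which matches the replace(day=1) idiom already used in main.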

@paramsingh

Member

commented Jun 10, 2019

Merging this into the popular-artist branch and gonna continue from there.

@paramsingh paramsingh merged commit 7571bb4 into metabrainz:popular-artist Jun 10, 2019