#### guiding questions 

##### the basic q's
- in true spotify wrapped fashion (everything from jan 1st 2023 onwards)...
    - top 5 artists?
    - top 5 songs?
    - total minutes listened?
    - top genre?

##### more in depth q's
- how have my top 3 songs changed over the last 6 months?
    - using 6 month intervals because of the limited time period of data
- is there an increasing or decreasing trend in amount of music time listened to? what are some probable causes of this?

In [36]:
# data cleaning
import pandas as pd

# get total streaming history (earliest to latest dates)
strm_hist_0 = pd.read_json('data/StreamingHistory0.json')
strm_hist_1 = pd.read_json('data/StreamingHistory1.json')

strm_hist_total = pd.concat([strm_hist_0, strm_hist_1])

strm_hist_total

Unnamed: 0,endTime,artistName,trackName,msPlayed
0,2022-12-17 07:06,keshi,skeletons,148449
1,2022-12-23 23:56,Kassy,Aching,74929
2,2022-12-25 00:52,DPR LIVE,Jasmine,6816
3,2022-12-25 00:56,Kassy,Aching,242048
4,2022-12-26 01:08,Kassy,Aching,176095
...,...,...,...,...
5557,2023-12-24 04:34,Clinton Kane,AVO TOAST,161461
5558,2023-12-24 04:37,Jessica Baio,at least,167502
5559,2023-12-24 04:41,Ed Sheeran,Merry Christmas,208546
5560,2023-12-24 04:44,Zach Hood,lonely isn't the word for this,167060


In [37]:
# data cleaning continued

# get only the 2023 dates
# sort by date
current_year = strm_hist_total.sort_values(by='endTime')

# convert to datetime since it has an extended time format (y/m/d and hr/min)
current_year['endTime'] = pd.to_datetime(current_year['endTime'], format='%Y-%m-%d %H:%M') 

# keep only the years, in a String format, so that we can filter through later
current_year['endTime'] = current_year['endTime'].dt.strftime('%Y') 

# drop all rows with year 2022 (returns the indexes of the rows w/ year 2022)
index_of_drop_rows = current_year[
    # specify 2022 as a string not an integer b/c the years are now String objects
    current_year['endTime'] == '2022' 
].index

current_year.drop(index_of_drop_rows, inplace=True)

current_year.head()

Unnamed: 0,endTime,artistName,trackName,msPlayed
74,2023,WINTER,ONCE AGAIN,41817
73,2023,AILEE,I′m Sorry,256173
75,2023,TWICE,Basics,176973
76,2023,Wonstein,Your Existence,204229
77,2023,iKON,LOVE SCENARIO,209648


In [39]:
# the basic questions

# top 5 artists?
all_artists = current_year['artistName'].value_counts() # currently a Series

framed_artists = all_artists.to_frame() # copy the original df

framed_artists.rename(
    columns={'artistName': 'Artist', 
             'count': 'Number of Tracks Played'}, inplace=True) # rename columns 

framed_artists.head(5)


Unnamed: 0_level_0,Number of Tracks Played
artistName,Unnamed: 1_level_1
Tate McRae,943
Patrick Brasca,752
Taylor Swift,635
Olivia Rodrigo,472
Virginia To Vegas,374


In [55]:
# top 5 songs?

sort_by_played = current_year.sort_values(by='msPlayed', ascending=False)

sort_by_played.head(50)

sort_by_played.iloc[5:] # drop the first 5 rows b/c they're podcasts


Unnamed: 0,endTime,artistName,trackName,msPlayed
6660,2023,Hozier,Eat Your Young,472904
9909,2023,"Hello, {Real} World!",Episode 28: A Recap of our University Experience,413024
360,2023,Jay Chou,稻香,393730
7408,2023,JUNNY,MOVIE,361556
4221,2023,Gracie Abrams,Right now,350836
...,...,...,...,...
1013,2023,Tate McRae,you're so cool,0
5361,2023,TRI.BE,WONDERLAND - English Ver.,0
1014,2023,Swedish House Mafia,Moth To A Flame (with The Weeknd),0
4549,2023,Tate McRae,greedy,0


In [42]:
# total minutes listened?
import math

total_milisecs = current_year['msPlayed'].sum() # in milliseconds
total_secs = math.floor(total_milisecs/1000) # get seconds

print(f'{math.floor(total_secs/60)} minutes {round(total_secs % 60)} seconds') 

40399 minutes 51 seconds
