# Spotify Streaming Data Analysis

### Check out my YouTube tutorial, which explains the code 👉 [Video HERE](https://youtu.be/OLxqUuiwO_g?si=iPFvwvNzFHykxzhq)

## 1️⃣ Import Data

Your Spotify data arrives as multiple `.json` files, so the first step is to combine them into a single DataFrame.

In [17]:
#Change this to where your data is
parent_folder = "/Users/name/Downloads/Spotify Extended Streaming History/"

file_name = "Streaming_History_Audio_*.json"

In [18]:
import pandas as pd
import glob

# Read every export file, then stack them into one DataFrame
frames = [pd.read_json(file) for file in glob.glob(parent_folder + file_name)]
df = pd.concat(frames)
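
If you want the import step to fail loudly when the path is wrong, it can be wrapped in a small helper. This is only a sketch: `load_streaming_history` is a hypothetical name, and the demo files are tiny stand-ins written to a temp folder, not a real export. It also passes `ignore_index=True`, which renumbers the rows from 0 to n-1. The outputs in this notebook keep the per-file row labels instead.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def load_streaming_history(folder, pattern="Streaming_History_Audio_*.json"):
    """Read every matching JSON file and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")
    frames = [pd.read_json(path) for path in paths]
    # ignore_index=True gives one continuous 0..n-1 index instead of
    # restarting the row labels at 0 for every file.
    return pd.concat(frames, ignore_index=True)


# Tiny stand-in files so the sketch runs without a real export.
demo_dir = tempfile.mkdtemp()
demo_files = [
    [{"ts": "2013-11-24T03:10:53Z", "ms_played": 163319}],
    [{"ts": "2016-07-18T21:25:42Z", "ms_played": 203196},
     {"ts": "2016-07-18T21:25:46Z", "ms_played": 0}],
]
for i, rows in enumerate(demo_files):
    file_path = os.path.join(demo_dir, f"Streaming_History_Audio_{i}.json")
    with open(file_path, "w") as fh:
        json.dump(rows, fh)

demo = load_streaming_history(demo_dir)
print(demo.shape)
```

Sorting the paths just makes the load order predictable. The rows get sorted by timestamp later anyway.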


In [23]:
print(df.shape)
df

(248041, 19)


Unnamed: 0,ts,platform,ms_played,conn_country,user_agent_decrypted,master_metadata_track_name,master_metadata_album_artist_name,master_metadata_album_album_name,spotify_track_uri,episode_name,episode_show_name,spotify_episode_uri,reason_start,reason_end,shuffle,skipped,offline,offline_timestamp,incognito_mode
0,2013-11-24T03:10:53Z,OS X 10.9.0 [x86 4],163319,AU,unknown,Timber,Pitbull,Timber,spotify:track:1zHlj4dQ8ZAtrayhuDDmkY,,,,unknown,popup,False,1.0,False,0,False
1,2013-11-24T03:11:07Z,OS X 10.9.0 [x86 4],14584,AU,unknown,Bros,Wolf Alice,Bros,spotify:track:57thHrPMP26JFKVM5M2znL,,,,popup,popup,False,1.0,False,0,False
2,2013-11-24T03:14:14Z,OS X 10.9.0 [x86 4],187026,AU,unknown,All Night,Icona Pop,THIS IS... ICONA POP,spotify:track:1ADGydOgtuzBNmdFpcQGLB,,,,popup,trackdone,False,0.0,False,0,False
3,2013-11-24T03:17:31Z,OS X 10.9.0 [x86 4],204053,AU,unknown,Timber,Pitbull,Timber,spotify:track:1zHlj4dQ8ZAtrayhuDDmkY,,,,trackdone,trackdone,False,0.0,False,0,False
4,2013-11-24T03:57:10Z,"iOS 7.0.4 (iPhone4,1)",224295,AU,unknown,Roar,Katy Perry,PRISM,spotify:track:3bDGwl0X3EjQmIyFD1uif5,,,,,trackdone,False,0.0,False,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16965,2016-07-18T21:25:42Z,"iOS 9.3.2 (iPhone8,1)",203196,AU,unknown,So Sentimental,Violent Soho,WACO,spotify:track:2gPASVJHzAHYeGHeus8IBr,,,,trackdone,endplay,False,,False,0,False
16966,2016-07-18T21:25:46Z,"iOS 9.3.2 (iPhone8,1)",0,AU,unknown,Pokemon Theme,Pokémon,Pokemon X - Ten Years Of Pokemon,spotify:track:6xG2ZGudUgtV235xvDlSEt,,,,clickrow,endplay,False,,False,0,False
16967,2016-07-18T21:25:51Z,"iOS 9.3.2 (iPhone8,1)",4667,AU,unknown,Pokemon Theme,Pokémon,Pokemon X - Ten Years Of Pokemon,spotify:track:6xG2ZGudUgtV235xvDlSEt,,,,clickrow,endplay,False,,False,0,False
16968,2016-07-18T21:25:56Z,"iOS 9.3.2 (iPhone8,1)",4760,AU,unknown,Pokemon Theme,Pokémon,Pokemon X - Ten Years Of Pokemon,spotify:track:6xG2ZGudUgtV235xvDlSEt,,,,clickrow,endplay,False,,False,0,False


## 2️⃣ Preprocessing
### 🕰️ Convert to Timestamp & Localize

In [26]:
# Parse the UTC timestamp strings into timezone-aware datetimes
df['ts'] = pd.to_datetime(df['ts'], format="%Y-%m-%dT%H:%M:%SZ", utc=True)

# Set your local timezone
import pytz
my_tz = pytz.timezone("Australia/Sydney")

# Convert from UTC to local time
df['ts'] = df['ts'].dt.tz_convert(my_tz)
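
To see what the conversion does, here is a minimal sketch on two timestamps copied from the rows shown earlier. It assumes a timezone name string can be passed straight to `tz_convert`, which pandas accepts, so `pytz` is not imported here.

```python
import pandas as pd

# Two UTC timestamps taken from the sample rows above.
sample = pd.Series(["2013-11-24T03:10:53Z", "2016-07-18T21:25:42Z"])

# utc=True makes the result timezone-aware (UTC).
ts_utc = pd.to_datetime(sample, utc=True)

# Shift to Sydney local time; the offset follows daylight saving,
# so November lands on +11:00 and July on +10:00.
ts_local = ts_utc.dt.tz_convert("Australia/Sydney")

print(ts_local.iloc[0])
print(ts_local.iloc[1])
```

The first value comes out as `2013-11-24 14:10:53+11:00`, which matches the first row of the `ts` output further down.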

In [27]:
#sort data in chronological order
df.sort_values(by='ts', inplace=True)

### 🗓️ Extract the `year`

In [28]:
df['year'] = df['ts'].dt.year
df[['ts', 'year']]

Unnamed: 0,ts,year
0,2013-11-24 14:10:53+11:00,2013
1,2013-11-24 14:11:07+11:00,2013
2,2013-11-24 14:14:14+11:00,2013
3,2013-11-24 14:17:31+11:00,2013
4,2013-11-24 14:57:10+11:00,2013
...,...,...
243,2024-09-27 07:43:03+10:00,2024
244,2024-09-27 07:48:11+10:00,2024
245,2024-09-27 07:51:19+10:00,2024
246,2024-09-27 07:55:15+10:00,2024
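
The order of these steps matters: the year is taken from the local time, not from UTC. A quick sketch with one made-up New Year's Eve play (not from the real data) shows the difference.

```python
import pandas as pd

# A made-up play at 8pm UTC on New Year's Eve.
ts_utc = pd.to_datetime(pd.Series(["2023-12-31T20:00:00Z"]), utc=True)
ts_local = ts_utc.dt.tz_convert("Australia/Sydney")

year_utc = ts_utc.dt.year.iloc[0]      # year according to UTC
year_local = ts_local.dt.year.iloc[0]  # year according to Sydney time

print(ts_local.iloc[0])
print(year_utc, year_local)
```

In Sydney that play happens at 7am on 1 January 2024, so it counts toward 2024 rather than 2023.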


## 4️⃣ Who Was the Most Played Artist in 2024?

In [19]:
# Count the 2024 plays per artist, most played first
df[df['year']==2024].groupby('master_metadata_album_artist_name')['master_metadata_album_artist_name'].count().sort_values(ascending=False)

master_metadata_album_artist_name
Taylor Swift         3155
Charli xcx            975
Romy                  504
Chappell Roan         474
Sabrina Carpenter     330
                     ... 
Odie Leigh              1
F.HERO                  1
Okami Sky               1
Old Mervs               1
Laura Dreyfuss          1
Name: master_metadata_album_artist_name, Length: 1105, dtype: int64
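
Counting rows treats a five-second skip the same as a full listen. As a rough sketch of an alternative, the same filter can rank artists by total listening time using `ms_played`. The toy DataFrame and the artist names below are made up for illustration. Rows with no artist (such as podcast episodes) drop out of both rankings, because `value_counts` and `groupby` skip missing keys by default.

```python
import pandas as pd

# Toy data, made up for illustration only.
toy = pd.DataFrame({
    "year": [2024, 2024, 2024, 2024, 2023],
    "master_metadata_album_artist_name": [
        "Artist A", "Artist B", "Artist A", None, "Artist B",
    ],
    "ms_played": [200_000, 600_000, 1_000, 1_800_000, 250_000],
})

plays_2024 = toy[toy["year"] == 2024]

# Ranking 1: number of plays per artist (missing artists are skipped).
play_counts = plays_2024["master_metadata_album_artist_name"].value_counts()

# Ranking 2: total listening time per artist, in hours.
MS_PER_HOUR = 3_600_000
hours = (
    plays_2024.groupby("master_metadata_album_artist_name")["ms_played"]
    .sum()
    .div(MS_PER_HOUR)
    .sort_values(ascending=False)
)

print(play_counts)
print(hours)
```

In this toy example Artist A wins on play count, but Artist B wins on listening time, so the two rankings can disagree.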