# Spotify Listening History Initial Analysis
---
- This notebook loads the **raw streaming history data** exported from Spotify, split across two files covering **2022-2023** and **2023-2025**.

- The purpose of this notebook is to **import and preview the raw JSON data**, keep only the essential columns, and save a cleaned CSV version for further analysis.

In [1]:
#Import libraries
import pandas as pd
import json

In [4]:
#Load the first JSON file (2022-2023)
with open("Streaming_History_Audio_2022-2023_6.json", "r", encoding="utf-8") as f:
    data_2022_2023 = json.load(f)

#Convert to a DataFrame
df_2022_2023 = pd.DataFrame(data_2022_2023)

#Peek into the first 5 rows
df_2022_2023.head()

Unnamed: 0,ts,platform,ms_played,conn_country,ip_addr,master_metadata_track_name,master_metadata_album_artist_name,master_metadata_album_album_name,spotify_track_uri,episode_name,...,audiobook_uri,audiobook_chapter_uri,audiobook_chapter_title,reason_start,reason_end,shuffle,skipped,offline,offline_timestamp,incognito_mode
0,2022-07-08T14:28:19Z,"iOS 15.5 (iPhone13,2)",0,US,174.215.249.249,Rhyme Or Reason,Eminem,The Marshall Mathers LP2,spotify:track:45JxylFwaThjLsRBuzcfoL,,...,,,,playbtn,endplay,True,False,False,,False
1,2022-07-08T14:28:20Z,"iOS 15.5 (iPhone13,2)",661,US,174.215.249.249,My Fault,Eminem,The Slim Shady LP,spotify:track:3CNbDbMHxYHxEgrPGf89yc,,...,,,,playbtn,endplay,True,False,False,,False
2,2022-07-08T14:28:21Z,"iOS 15.5 (iPhone13,2)",725,US,174.215.249.249,Just the Way You Are,Bruno Mars,Doo-Wops & Hooligans,spotify:track:7BqBn9nzAq8spo5e7cZ0dJ,,...,,,,playbtn,endplay,True,False,False,,False
3,2022-07-08T14:28:21Z,"iOS 15.5 (iPhone13,2)",661,US,174.215.249.249,Marsh,Eminem,Music To Be Murdered By - Side B,spotify:track:1qFip2ddjfGlKkXDhfBnBr,,...,,,,playbtn,endplay,True,False,False,,False
4,2022-07-08T14:28:22Z,"iOS 15.5 (iPhone13,2)",682,US,174.215.249.249,"Can You Feel the Love Tonight - From ""The Lion...",Joseph Williams,Disney Summer Songs,spotify:track:2XQee7HP5QrDwUGHOl6GFf,,...,,,,playbtn,endplay,True,False,False,,False


### **^Raw data loaded (sensitive columns hidden for privacy)^**
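
The column selection in the next cell raises a `KeyError` if any requested field is absent from the export. A minimal sketch of a pre-check follows. The helper name `missing_columns` and the one-row sample frame are hypothetical stand-ins for illustration, not part of the original notebook.

```python
import pandas as pd

# The five fields this notebook keeps from the raw export.
ESSENTIAL_COLUMNS = [
    "ts",
    "ms_played",
    "master_metadata_track_name",
    "master_metadata_album_artist_name",
    "master_metadata_album_album_name",
]


def missing_columns(df, required=ESSENTIAL_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Hypothetical sample standing in for the loaded export; it lacks the album field.
sample = pd.DataFrame([
    {
        "ts": "2022-07-08T14:28:19Z",
        "ms_played": 0,
        "master_metadata_track_name": "Rhyme Or Reason",
        "master_metadata_album_artist_name": "Eminem",
    },
])

absent = missing_columns(sample)
print(absent)
```

An empty list means every essential column is present and the selection is safe to run.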

In [5]:
#Keep only the essential columns
df_2022_2023 = df_2022_2023[[
    "ts",
    "ms_played",
    "master_metadata_track_name",
    "master_metadata_album_artist_name",
    "master_metadata_album_album_name"
]]

#Rename columns for readability
df_2022_2023 = df_2022_2023.rename(columns={
    "ts": "endTime",
    "ms_played": "msPlayed",
    "master_metadata_track_name": "trackName",
    "master_metadata_album_artist_name": "artistName",
    "master_metadata_album_album_name": "albumName"
})

#Peek at the cleaned DataFrame
df_2022_2023.head()

Unnamed: 0,endTime,msPlayed,trackName,artistName,albumName
0,2022-07-08T14:28:19Z,0,Rhyme Or Reason,Eminem,The Marshall Mathers LP2
1,2022-07-08T14:28:20Z,661,My Fault,Eminem,The Slim Shady LP
2,2022-07-08T14:28:21Z,725,Just the Way You Are,Bruno Mars,Doo-Wops & Hooligans
3,2022-07-08T14:28:21Z,661,Marsh,Eminem,Music To Be Murdered By - Side B
4,2022-07-08T14:28:22Z,682,"Can You Feel the Love Tonight - From ""The Lion...",Joseph Williams,Disney Summer Songs


In [6]:
#Save filtered DataFrame as a .csv
df_2022_2023.to_csv("Spotify_2022_2023.csv", index=False)
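
The `endTime` values are ISO 8601 strings in UTC (note the trailing `Z`), and a CSV stores them as plain text. One way to recover real timestamps when the file is reloaded is sketched below. It reads from an in-memory string instead of `Spotify_2022_2023.csv` so that it runs on its own, and the variable names are illustrative assumptions.

```python
import io

import pandas as pd

# In-memory stand-in for the saved CSV (two rows copied from the preview above).
csv_text = (
    "endTime,msPlayed,trackName,artistName,albumName\n"
    "2022-07-08T14:28:19Z,0,Rhyme Or Reason,Eminem,The Marshall Mathers LP2\n"
    "2022-07-08T14:28:20Z,661,My Fault,Eminem,The Slim Shady LP\n"
)

reloaded = pd.read_csv(io.StringIO(csv_text))

# Convert the text timestamps into timezone-aware datetimes (UTC).
reloaded["endTime"] = pd.to_datetime(reloaded["endTime"], utc=True)

print(reloaded.dtypes)
```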

In [7]:
#Load the second JSON file (2023-2025)
with open("Streaming_History_Audio_2023-2025_7.json", "r", encoding="utf-8") as f:
    data_2023_2025 = json.load(f)

#Convert to DataFrame
df_2023_2025 = pd.DataFrame(data_2023_2025)

#Peek at the first few rows
df_2023_2025.head()

Unnamed: 0,ts,platform,ms_played,conn_country,ip_addr,master_metadata_track_name,master_metadata_album_artist_name,master_metadata_album_album_name,spotify_track_uri,episode_name,...,audiobook_uri,audiobook_chapter_uri,audiobook_chapter_title,reason_start,reason_end,shuffle,skipped,offline,offline_timestamp,incognito_mode
0,2023-05-03T22:30:21Z,ios,768,US,68.229.178.117,Grenade,Bruno Mars,Doo-Wops & Hooligans,spotify:track:2tJulUYLDKOg9XrtVkMgcJ,,...,,,,clickrow,endplay,False,True,False,1683153000.0,False
1,2023-05-03T22:30:25Z,ios,4160,US,174.215.248.238,The Lazy Song,Bruno Mars,Doo-Wops & Hooligans,spotify:track:1ExfPZEiahqhLyajhybFeS,,...,,,,clickrow,unexpected-exit-while-paused,False,False,False,1683153000.0,False
2,2023-05-05T19:54:16Z,ios,89493,US,174.198.68.254,Nikki Sixx,Doobie,Nikki Sixx,spotify:track:4U1j4Di9S3dSqckssFU8kl,,...,,,,clickrow,fwdbtn,False,True,False,1683316000.0,False
3,2023-05-05T20:02:18Z,ios,14826,US,174.198.68.254,Icy Titties,Doobie,Icy Titties,spotify:track:4MBxX1P21a0yEZJEHA7zw1,,...,,,,fwdbtn,endplay,False,True,False,1683316000.0,False
4,2023-05-05T20:02:19Z,ios,320,US,174.198.68.254,When the Drugs Don't Work,Doobie,When the Drugs Don't Work,spotify:track:0i39HdwYbAl5YCEbnBUWeG,,...,,,,clickrow,endplay,False,True,False,1683317000.0,False


### **^Raw data loaded (sensitive columns hidden for privacy)^**

In [9]:
#Keep only the essential columns
df_2023_2025 = df_2023_2025[[
    "ts",
    "ms_played",
    "master_metadata_track_name",
    "master_metadata_album_artist_name",
    "master_metadata_album_album_name"
]]

#Rename columns for readability
df_2023_2025 = df_2023_2025.rename(columns={
    "ts": "endTime",
    "ms_played": "msPlayed",
    "master_metadata_track_name": "trackName",
    "master_metadata_album_artist_name": "artistName",
    "master_metadata_album_album_name": "albumName"
})

#Peek at the cleaned DataFrame
df_2023_2025.head()

Unnamed: 0,endTime,msPlayed,trackName,artistName,albumName
0,2023-05-03T22:30:21Z,768,Grenade,Bruno Mars,Doo-Wops & Hooligans
1,2023-05-03T22:30:25Z,4160,The Lazy Song,Bruno Mars,Doo-Wops & Hooligans
2,2023-05-05T19:54:16Z,89493,Nikki Sixx,Doobie,Nikki Sixx
3,2023-05-05T20:02:18Z,14826,Icy Titties,Doobie,Icy Titties
4,2023-05-05T20:02:19Z,320,When the Drugs Don't Work,Doobie,When the Drugs Don't Work


In [10]:
#Save filtered DataFrame as a .csv
df_2023_2025.to_csv("Spotify_2023_2025.csv", index=False)

## Data Preparation Summary | 2022-2023 & 2023-2025 files
---
- Imported the Spotify streaming history JSON file for 2022-2023
- Selected essential features and renamed them for readability
    - **Timestamp when playback ended** (`endTime`)
    - **Time played, in milliseconds** (`msPlayed`)
    - **Track name, artist, and album** (`trackName`, `artistName`, and `albumName`)
- Saved the cleaned data as a CSV for later analysis
- Repeated the same steps on the 2023-2025 JSON file
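
Because both files go through identical steps, the select-and-rename logic could be wrapped in one function. The sketch below is one possible refactoring, not the notebook's actual code. `clean_streaming_history` and `RENAME_MAP` are hypothetical names, and the sample records are abbreviated from the previews above.

```python
import pandas as pd

# Raw export field -> readable column name, matching the renames in this notebook.
RENAME_MAP = {
    "ts": "endTime",
    "ms_played": "msPlayed",
    "master_metadata_track_name": "trackName",
    "master_metadata_album_artist_name": "artistName",
    "master_metadata_album_album_name": "albumName",
}


def clean_streaming_history(records):
    """Keep the essential columns from raw records and rename them (hypothetical helper)."""
    df = pd.DataFrame(records)
    # Select only the mapped columns, then rename them in one non-mutating step.
    return df[list(RENAME_MAP)].rename(columns=RENAME_MAP)


# Abbreviated sample records; the extra "platform" field is dropped by the helper.
sample_records = [
    {
        "ts": "2023-05-03T22:30:21Z",
        "platform": "ios",
        "ms_played": 768,
        "master_metadata_track_name": "Grenade",
        "master_metadata_album_artist_name": "Bruno Mars",
        "master_metadata_album_album_name": "Doo-Wops & Hooligans",
    },
    {
        "ts": "2023-05-03T22:30:25Z",
        "platform": "ios",
        "ms_played": 4160,
        "master_metadata_track_name": "The Lazy Song",
        "master_metadata_album_artist_name": "Bruno Mars",
        "master_metadata_album_album_name": "Doo-Wops & Hooligans",
    },
]

cleaned = clean_streaming_history(sample_records)
print(cleaned.columns.tolist())
```

With the real exports, `records` would be the list returned by `json.load` for each file, and the result could be written out with `to_csv(..., index=False)` as in the cells above.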