Step 1: Upload and Combine Your Spotify Streaming History Data
Spotify allows you to download your entire streaming history as multiple JSON files, often split by year or date ranges. To analyze your listening habits comprehensively, the first step is to combine these separate JSON files into a single dataset.

Assuming you have uploaded your exported Spotify streaming history files to your working environment (e.g., Google Colab or your local machine), you can load and merge them using Python’s built-in json module.

Here’s a straightforward approach:

In [None]:
import os
import json
from glob import glob
# Path to your folder with streaming history
folder_path = '/content/drive/MyDrive/Spotify Extended Streaming History'


# Use glob to find all .json files in the folder
json_files = sorted(glob(os.path.join(folder_path, '*.json')))


# Display found files (optional)
print(f"Found {len(json_files)} JSON files.")
# Create a list to store all streaming history records
combined_data = []


for file_path in json_files:
   with open(file_path, 'r', encoding='utf-8') as f:
       data = json.load(f)
       combined_data.extend(data)


print(f"Total records combined: {len(combined_data)}")
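
If you expect to rerun the merge, the same logic can be wrapped in a small function. The sketch below is one possible version and is not part of the original notebook: the name load_streaming_history and the file names in the demo are made up, and the list check rests on the assumption that every history file holds a JSON array of plays.

```python
import json
import tempfile
from pathlib import Path


def load_streaming_history(folder):
    """Merge every *.json file in `folder` into one list of records."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a history file is a JSON array of plays.
        # Anything else is skipped instead of being merged in.
        if isinstance(data, list):
            records.extend(data)
        else:
            print(f"Skipping {path.name}: top level is not a list")
    return records


# Tiny demo on made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part_0.json").write_text(
        json.dumps([{"ms_played": 1000}]), encoding="utf-8")
    Path(tmp, "part_1.json").write_text(
        json.dumps([{"ms_played": 2000}, {"ms_played": 3000}]), encoding="utf-8")
    Path(tmp, "notes.json").write_text(
        json.dumps({"comment": "not a history file"}), encoding="utf-8")
    demo_records = load_streaming_history(tmp)

print(f"Total records combined: {len(demo_records)}")
```

On your own data you would call load_streaming_history(folder_path) and use the returned list in place of combined_data.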

After merging all your individual Spotify streaming history files into one comprehensive dataset, it’s good practice to save this combined data for easy access and reproducibility in future analyses.

You can do this simply by writing the combined list back to a JSON file, as shown below:

In [None]:
# Path to save the combined file
output_path = '/content/drive/MyDrive/Spotify_Extended_Streaming_Combined.json'


# Save combined data
with open(output_path, 'w', encoding='utf-8') as f:
   json.dump(combined_data, f, ensure_ascii=False, indent=2)


print(f"Combined JSON saved to: {output_path}")

Step 2: Load the Combined Data into a DataFrame and Preview
With your combined Spotify streaming history saved, the next step is to load this data into a pandas DataFrame. This will make it easier to analyze and manipulate your listening records using Python’s data tools.

Here’s how you can convert your combined JSON data into a DataFrame and take an initial look at the first few rows:

In [None]:
import pandas as pd


# Convert to DataFrame for further analysis
df = pd.DataFrame(combined_data)
df.head()
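
Before going further, it can help to check how complete the columns are. The sketch below runs on a made-up three-row table rather than real export data. It assumes that rows without a track name (often podcast episodes or other non-music plays) can be set aside for a music-only analysis; whether that holds for your export is worth verifying.

```python
import pandas as pd

# Made-up rows that mimic the columns used later in this article
toy = pd.DataFrame({
    "ts": ["2024-01-01T10:00:00Z", "2024-01-01T11:00:00Z", "2024-01-02T09:30:00Z"],
    "ms_played": [180000, 2400000, 5000],
    "master_metadata_track_name": ["Song A", None, "Song B"],
    "master_metadata_album_artist_name": ["Artist A", None, "Artist B"],
})

# Share of missing values per column
missing_share = toy.isna().mean().round(2)
print(missing_share)

# Keep only rows that carry a track name
music_only = toy.dropna(subset=["master_metadata_track_name"])
print(len(toy), "->", len(music_only))
```

Note that pandas groupby drops missing keys by default, so the artist, album, and song rankings later on work with or without this filter. It mainly matters for totals such as listening hours per year.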

Step 3: Convert Timestamps and Extract Time Features
To perform time-based analysis on your Spotify listening history, we first need to convert the timestamp strings into datetime objects. Then, we can extract useful components like year, month, day, weekday, and hour. These new columns will help us explore listening trends across different time periods, such as which hours or days you listen the most.

Here’s how to do it with pandas:

In [None]:
df['ts'] = pd.to_datetime(df['ts'])
df['year'] = df['ts'].dt.year
df['month'] = df['ts'].dt.month
df['day'] = df['ts'].dt.day
df['weekday'] = df['ts'].dt.day_name()  # e.g. Monday
df['hour'] = df['ts'].dt.hour          # to analyze listening by hour
df[['ts', 'year', 'month', 'day', 'weekday', 'hour']].head()
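
One caveat for the hour and weekday columns: as far as I know, the ts field in the export is recorded in UTC, so the extracted hours may be shifted relative to your local clock. The sketch below shows one way to convert before extracting features. It uses made-up timestamps, and "Europe/Warsaw" is only a placeholder timezone, not something taken from the original data.

```python
import pandas as pd

toy = pd.DataFrame({"ts": ["2024-03-01T23:30:00Z", "2024-07-15T06:05:00Z"]})

# utc=True makes the result timezone-aware
toy["ts"] = pd.to_datetime(toy["ts"], utc=True)

# Placeholder timezone: replace with your own IANA timezone name
toy["ts_local"] = toy["ts"].dt.tz_convert("Europe/Warsaw")

toy["hour_utc"] = toy["ts"].dt.hour
toy["hour_local"] = toy["ts_local"].dt.hour
print(toy[["ts", "hour_utc", "hour_local"]])
```

If you go this route, derive year, month, weekday, and hour from the local column so that late-night plays land on the day you actually heard them.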

Step 4: Checking Basic Listening Statistics
After adding the time features, it’s a good idea to start with some basic statistics to understand your overall listening habits over the years.

We begin by calculating the total listening time (in milliseconds) per year and convert it into hours for easier interpretation:

In [None]:
# Total listening time per year, in milliseconds and in hours
ms_played_by_year = df.groupby('year')['ms_played'].sum().reset_index()
ms_played_by_year['hours_played'] = ms_played_by_year['ms_played'] / (1000 * 60 * 60)


# Keep recent years only (2021 and later) and round for readability
ms_played_by_year_filtered = ms_played_by_year[ms_played_by_year['year'] >= 2021].copy()
ms_played_by_year_filtered['hours_played'] = ms_played_by_year_filtered['hours_played'].round(2)
ms_played_by_year_filtered

Exploring Monthly Listening Patterns with a Pivot Table
To dig deeper into your listening habits, let’s create a pivot table that shows total listening hours broken down by year and month. This will help visualize seasonal trends or identify months with particularly high or low listening activity.

We start by filtering the data for recent years (2021 and later), then convert milliseconds played to hours. Next, we build a pivot table where rows are years and columns are months (1 to 12), with the sum of hours played as the values. To keep the output tidy and easy to read, we round the hours to one decimal place and make sure months are ordered correctly.

In [None]:
df_filtered = df[df['year'] >= 2021].copy()
# Add column with hours
df_filtered['hours_played'] = df_filtered['ms_played'] / (1000 * 60 * 60)


# Pivot table: years as rows, months as columns
pivot_table = df_filtered.pivot_table(
   index='year',
   columns='month',
   values='hours_played',
   aggfunc='sum'
)
pivot_table = pivot_table.round(1)
pivot_table = pivot_table.sort_index(axis=1)  # Ensure months are in order
pivot_table
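
A pivot table only contains columns for months that appear in the data, so a year with gaps can produce fewer than 12 columns. The sketch below, on made-up numbers, shows one way to force all 12 months and display empty ones as 0. The variable names are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2023, 2023, 2024],
    "month": [1, 3, 3],
    "hours_played": [1.5, 2.0, 4.0],
})

pivot = toy.pivot_table(
    index="year", columns="month", values="hours_played", aggfunc="sum"
)

# Force columns 1..12; months without plays become 0 instead of being omitted
pivot_full = pivot.reindex(columns=range(1, 13)).fillna(0).round(1)
print(pivot_full)
```

Whether 0 or a blank is the better display for a month without data is a judgment call; skipping fillna(0) keeps missing months as NaN.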

Displaying this pivot table provides a clear view of how listening hours vary across months and years, highlighting any seasonal trends or shifts in listening behavior.

Step 5: Drawing first insights
Let’s find out which artists you listened to the most. This part of the analysis focuses on your Spotify activity in 2024 and helps you see how much time you spent listening to each artist. You’ll start by filtering your dataset to include only plays from 2024. Then, you’ll group the data by artist name and calculate the total listening time in milliseconds. To make things easier to interpret, you’ll convert that time into hours.

Once you’ve got the total hours, you’ll sort the artists from most to least listened to and keep the top 10 based on total playtime. Finally, you’ll rename the columns so the output is clean and easy to read or export. This gives you a clear, ranked list of your most played artists in 2024 and shows who dominated your playlists throughout the year.



In [None]:
df_2024 = df[df['year'] == 2024].copy()


# Group by artist, sum ms_played
artists_2024 = df_2024.groupby('master_metadata_album_artist_name')['ms_played'].sum().reset_index()
# Convert ms_played to hours
artists_2024['hours_played'] = artists_2024['ms_played'] / (1000 * 60 * 60)
artists_2024['hours_played'] = artists_2024['hours_played'].round(2)


# Sort descending and get top 10
top_artists_2024 = artists_2024.sort_values(by='hours_played', ascending=False).head(10)
# Rename columns for clarity
top_artists_2024.columns = ['Artist', 'Milliseconds Played', 'Hours Played']
top_artists_2024

Analyzing Your Top Albums of 2024

To find out which albums took up most of your listening time in 2024, you’ll start by filtering your data to include only that year. This gives you a clean dataset focused on your 2024 activity. Then, you’ll group the data by album and artist, and calculate the total listening time in milliseconds for each album. To make the results easier to understand, you’ll convert the listening time into hours.

Once that’s done, you’ll sort the albums by total hours played, from highest to lowest, and display your top 10. The result is a quick, personalized snapshot of your most-played albums of the year — a simple way to see which records truly soundtracked your 2024.

In [None]:
df_2024 = df[df['year'] == 2024].copy()


albums_2024 = df_2024.groupby(
   ['master_metadata_album_artist_name', 'master_metadata_album_album_name']
)['ms_played'].sum().reset_index()


albums_2024['hours_played'] = albums_2024['ms_played'] / (1000 * 60 * 60)
albums_2024['hours_played'] = albums_2024['hours_played'].round(2)


top_albums_2024 = albums_2024.sort_values(by='hours_played', ascending=False).head(10)
top_albums_2024

You can also check your most-played songs with this code:

In [None]:
df_2024 = df[df['year'] == 2024].copy()


songs_2024 = df_2024.groupby(
   ['master_metadata_album_artist_name', 'master_metadata_track_name']
)['ms_played'].sum().reset_index()


songs_2024['hours_played'] = songs_2024['ms_played'] / (1000 * 60 * 60)
songs_2024['hours_played'] = songs_2024['hours_played'].round(2)

top_songs_2024 = songs_2024.sort_values(by='hours_played', ascending=False).head(10)

top_songs_2024.columns = ['Artist', 'Track', 'Milliseconds Played', 'Hours Played']
top_songs_2024

This analysis offers a focused snapshot of your listening habits in 2024, but it can be easily adapted to explore other years, like 2023 or 2021, or even applied to the entire dataset to uncover all-time trends and shifts in your music preferences over time.
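
Since the artist, album, and song rankings all follow the same filter, group, convert, and sort pattern, adapting them to another year can be reduced to a single function call. The helper below is a sketch of that idea. The name top_by_playtime and its parameters are my own, and the demo data is made up.

```python
import pandas as pd

MS_PER_HOUR = 1000 * 60 * 60


def top_by_playtime(df, group_cols, year=None, n=10):
    """Rank groups by total listening time; year=None uses the whole dataset."""
    subset = df if year is None else df[df["year"] == year]
    totals = subset.groupby(group_cols)["ms_played"].sum().reset_index()
    totals["hours_played"] = (totals["ms_played"] / MS_PER_HOUR).round(2)
    return totals.sort_values("hours_played", ascending=False).head(n)


# Made-up data with the same column names as the export
toy = pd.DataFrame({
    "year": [2023, 2023, 2024, 2024],
    "master_metadata_album_artist_name": ["Artist A", "Artist B", "Artist A", "Artist B"],
    "ms_played": [3_600_000, 7_200_000, 10_800_000, 1_800_000],
})

top_2023 = top_by_playtime(toy, "master_metadata_album_artist_name", year=2023, n=1)
top_all = top_by_playtime(toy, ["master_metadata_album_artist_name"], n=1)
print(top_2023)
print(top_all)
```

With the real DataFrame, passing ['master_metadata_album_artist_name', 'master_metadata_track_name'] as group_cols would give a song ranking, and leaving out year would give the all-time view.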

Step 6: Looking at trends
Now you can dig deeper into when you actually listen to music, not just what you listen to. This block of code helps you explore your listening patterns throughout 2024 by examining trends across the year, week, and even specific hours of the day.

First, make sure your timestamp column is in the correct datetime format, then extract useful details like the year, month, weekday, and hour. Focus only on plays from 2024 and calculate how many hours you spent listening across three time-related dimensions.

For monthly trends, group all streams by month and sum the total listening time in hours. This will help you spot any seasonal shifts in your habits — for example, if you listen more during the summer or certain busy months.

Next, look at day-of-week trends by summing the hours listened on each day, from Monday to Sunday. This shows whether you tend to play music more on weekends or weekdays.

Finally, break things down by hour to see when during the day you listen the most — whether that’s early morning, during work hours, or late at night.

Together, these insights give you a clear picture of your daily and weekly rhythms. It’s a great starting point for visualizing your personal listening habits and creating your own version of “Spotify Wrapped” focused on when you press play.

In [None]:
# Convert ts column to datetime if not done yet
df['ts'] = pd.to_datetime(df['ts'])


# Extract features
df['year'] = df['ts'].dt.year
df['month'] = df['ts'].dt.month
df['day_of_week'] = df['ts'].dt.day_name()  # e.g. Monday, Tuesday
df['hour'] = df['ts'].dt.hour

# Keep only 2024 plays
df_2024 = df[df['year'] == 2024].copy()


# Monthly totals
monthly = df_2024.groupby('month')['ms_played'].sum().reset_index()
monthly['hours_played'] = (monthly['ms_played'] / (1000*60*60)).round(2)
print(monthly)


# Day-of-week totals
dow = df_2024.groupby('day_of_week')['ms_played'].sum().reset_index()
dow['hours_played'] = (dow['ms_played'] / (1000*60*60)).round(2)

# Sort days in week order
days_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow['day_of_week'] = pd.Categorical(dow['day_of_week'], categories=days_order, ordered=True)
dow = dow.sort_values('day_of_week')

print(dow)


# Hourly totals
hourly = df_2024.groupby('hour')['ms_played'].sum().reset_index()
hourly['hours_played'] = (hourly['ms_played'] / (1000*60*60)).round(2)
print(hourly)

With these tables in hand, we can plot them to see the patterns more clearly. Let’s start with Monthly Listening Hours:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


# Monthly listening hours
plt.figure(figsize=(10,5))
sns.barplot(data=monthly, x='month', y='hours_played', palette='viridis')
plt.title('Monthly Listening Hours in 2024')
plt.xlabel('Month')
plt.ylabel('Hours Played')
plt.show()

Now we can jump to Listening Hours by Day of Week:

In [None]:
# Day of week listening hours
plt.figure(figsize=(10,5))
sns.barplot(data=dow, x='day_of_week', y='hours_played', palette='magma')
plt.title('Listening Hours by Day of Week in 2024')
plt.xlabel('Day of Week')
plt.ylabel('Hours Played')
plt.show()


And finally — Listening Hours by Hour of Day:

In [None]:
# Hourly listening hours
plt.figure(figsize=(12,5))
sns.lineplot(data=hourly, x='hour', y='hours_played', marker='o')
plt.title('Listening Hours by Hour of Day in 2024')
plt.xlabel('Hour of Day')
plt.ylabel('Hours Played')
plt.xticks(range(0,24))
plt.show()


Step 7: Defining streaks and having a look at seasonality
Let’s define some streaks. You can process your listening data to find the songs you played on the most consecutive days in 2024. Start by filtering your dataset to include only records from that year and convert each timestamp to just the date. Then remove any duplicate entries for the same song on the same day so repeated plays don’t affect the results.

Next, use a custom function to calculate the longest streak of consecutive listening days for each track. This function goes through the sorted list of dates for each song and counts how many days in a row you listened to it without missing a day.

After calculating the streaks, add up the total number of hours each song was played throughout the year. Combine both pieces of information — the longest streak and total playtime — into one table. Finally, sort the songs by their longest streaks and display the top ten. This will give you a clear view of which tracks stayed in your rotation most consistently over time.

In [None]:
df_2024 = df[df['year'] == 2024].copy()
df_2024['date'] = df_2024['ts'].dt.date
df_songs = df_2024[['master_metadata_track_name', 'date']].drop_duplicates()


def longest_streak(dates):
   dates = sorted(dates)
   max_streak = 1
   current_streak = 1
   for i in range(1, len(dates)):
       if (dates[i] - dates[i-1]).days == 1:
           current_streak += 1
       else:
           max_streak = max(max_streak, current_streak)
           current_streak = 1
   max_streak = max(max_streak, current_streak)
   return max_streak

# Calculate longest consecutive days streak per song
streaks = df_songs.groupby('master_metadata_track_name')['date'].apply(longest_streak).reset_index()
streaks.columns = ['track_name', 'longest_streak_days']

# Join with total hours played for extra info
total_played = df_2024.groupby('master_metadata_track_name')['ms_played'].sum().reset_index()
total_played['hours_played'] = total_played['ms_played'] / (1000 * 60 * 60)

result = pd.merge(streaks, total_played[['master_metadata_track_name', 'hours_played']],
                 left_on='track_name', right_on='master_metadata_track_name').drop('master_metadata_track_name', axis=1)

# Sort by longest streak and pick top 10
top_streak_songs = result.sort_values(by='longest_streak_days', ascending=False).head(10)


top_streak_songs
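
To see what the streak logic does, it helps to run it on a handful of made-up dates. The function below is a compact variant written for this illustration, not the exact function from the cell above. It is named count_longest_streak to keep the two apart, and it additionally returns 0 for an empty input.

```python
from datetime import date


def count_longest_streak(dates):
    """Length of the longest run of consecutive calendar days in `dates`."""
    days = sorted(set(dates))  # duplicates would otherwise break a run
    if not days:
        return 0
    best = current = 1
    for previous, today in zip(days, days[1:]):
        # Extend the run if the gap is exactly one day, otherwise restart it
        current = current + 1 if (today - previous).days == 1 else 1
        best = max(best, current)
    return best


# Three days in a row, a one-day gap, then two more days
plays = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3),
         date(2024, 1, 5), date(2024, 1, 6)]
print(count_longest_streak(plays))  # 3
```

The gap on January 4 splits the dates into a run of three and a run of two, so the longest streak is 3.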

Now, you can analyze your listening habits from 2023 by finding the songs you played on the highest number of distinct days throughout the year. Start by filtering your dataset to include only entries from 2023, then convert each timestamp to just the calendar date. With this cleaned data, calculate how many unique days you listened to each track, no matter how many times you played it on those days.

To give more context, also compute the total number of hours each song was played during the year. Then combine these two metrics — unique listening days and total listening time — into one table.

Finally, sort the songs by the number of unique days they appeared in your listening history and show the top ten. This will help you spot which tracks were part of your routine over time, not just ones you binged in short bursts.

In [None]:
df_2023 = df[df['year'] == 2023].copy()


# Extract date only from timestamp
df_2023['date'] = df_2023['ts'].dt.date


# Group by track and count unique days listened
unique_days = df_2023.groupby('master_metadata_track_name')['date'].nunique().reset_index()
unique_days.columns = ['track_name', 'unique_days_listened']


# Optional: add total hours played per song for context
total_played = df_2023.groupby('master_metadata_track_name')['ms_played'].sum().reset_index()
total_played['hours_played'] = total_played['ms_played'] / (1000 * 60 * 60)


result = pd.merge(unique_days, total_played[['master_metadata_track_name', 'hours_played']],
                 left_on='track_name', right_on='master_metadata_track_name').drop('master_metadata_track_name', axis=1)


# Sort by most unique days listened
top_songs_by_unique_days = result.sort_values(by='unique_days_listened', ascending=False).head(10)


print(top_songs_by_unique_days)

Now, let’s explore your top artists by season. This code will help you see which artists you listened to the most during each season of the year.

First, it sorts your data by season and listening time (hours played), making sure the artists with the highest playtime appear at the top within each season.

Next, it ranks the artists within each season based on how many hours you spent listening to them. This way, you can identify the leaders for each part of the year.

Then, it filters the data to keep only the top few artists per season — for example, the top three artists you listened to the most in winter, spring, summer, and fall.

After that, the code creates easy-to-read labels combining the season, the artist’s rank, their name, and total hours played.

Finally, it prints out these labels so you get a clear, formatted list showing your seasonal favorite artists and how much you listened to them. This is a great way to understand how your music taste changes with the seasons!

In [None]:
import pandas as pd

# Make sure 'ts' is a datetime column (no change if it was already converted)
df['ts'] = pd.to_datetime(df['ts'])

# Define a function to map month to season
def month_to_season(month):
   if month in [12, 1, 2]:
       return 'Winter'
   elif month in [3, 4, 5]:
       return 'Spring'
   elif month in [6, 7, 8]:
       return 'Summer'
   else:
       return 'Autumn'


# Add season column
df['season'] = df['ts'].dt.month.apply(month_to_season)

# Filter for a specific year if desired, e.g. 2024
df_2024 = df[df['ts'].dt.year == 2024].copy()

# Group by season and artist, summing listening time (ms_played)
season_artist = df_2024.groupby(
   ['season', 'master_metadata_album_artist_name']
)['ms_played'].sum().reset_index()

# Convert ms_played to hours
season_artist['hours_played'] = season_artist['ms_played'] / (1000 * 60 * 60)
season_artist['hours_played'] = season_artist['hours_played'].round(2)

# Sort by season and descending hours_played
season_artist_sorted = season_artist.sort_values(['season', 'hours_played'], ascending=[True, False])


# Rank artists within each season based on hours_played
season_artist_sorted['rank'] = season_artist_sorted.groupby('season')['hours_played'].rank(method='first', ascending=False)


# Filter top N artists per season (e.g., top 3)
top_n = 3
top_season_artists = season_artist_sorted[season_artist_sorted['rank'] <= top_n].copy()


# Create formatted label with season, rank and artist name
top_season_artists['label'] = top_season_artists.apply(
   lambda x: f"{x['season']} Top {int(x['rank'])}: {x['master_metadata_album_artist_name']} ({x['hours_played']} hrs)",
   axis=1
)


# Print the formatted labels
for label in top_season_artists['label']:
   print(label)
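
Because the season names are plain strings, sorting puts them in alphabetical order (Autumn, Spring, Summer, Winter). If you prefer calendar order, the same ordered-categorical approach used for weekdays earlier can be applied. The sketch below uses made-up rows, and the choice to start the order with Winter is an assumption you may want to change.

```python
import pandas as pd

toy = pd.DataFrame({
    "season": ["Winter", "Autumn", "Spring", "Summer", "Spring"],
    "artist": ["A", "B", "C", "D", "E"],
    "hours_played": [5.0, 4.0, 3.0, 2.0, 6.0],
})

# Assumed display order; rearrange the list to suit your own calendar
season_order = ["Winter", "Spring", "Summer", "Autumn"]
toy["season"] = pd.Categorical(toy["season"], categories=season_order, ordered=True)

# Seasons now sort by their position in season_order, not alphabetically
ordered = toy.sort_values(["season", "hours_played"], ascending=[True, False])
print(ordered)
```

Keep in mind that with the month mapping above, Winter within a single year combines January, February, and December of that same calendar year rather than one continuous winter.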

By digging into your Spotify listening data from different angles — whether it’s total playtime, unique listening days, streaks, or seasonal favorites — you get a much richer picture of your music habits. This kind of exploration not only reveals what you listened to but also when and how consistently those songs and artists showed up in your life.

As you experiment with these analyses, you might discover surprising patterns or rediscover forgotten favorites that marked certain moments of your year. Music is deeply personal, and tracking your listening journey can be a fun way to reflect on your moods, routines, and even how the seasons influence your vibe.

So why stop here? Use these methods as a starting point to customize your own musical story. Share your findings, compare with friends, or even challenge yourself to shake up your playlists. After all, understanding your listening habits is the first step to making your soundtrack even more meaningful.

What trends do you see in your data? I’d love to hear about your own discoveries!