<a href="https://colab.research.google.com/github/jelainhart/SpotifyAnalyzer/blob/main/Spotify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import textwrap
from collections import Counter
from google.colab import userdata
import google.generativeai as genai

In [2]:
df = pd.read_csv("spotify_history.csv")

df.head()

Unnamed: 0,spotify_track_uri,ts,platform,ms_played,track_name,artist_name,album_name,reason_start,reason_end,shuffle,skipped
0,2J3n32GeLmMjwuAzyhcSNe,2013-07-08 02:44:34,web player,3185,"Say It, Just Say It",The Mowgli's,Waiting For The Dawn,autoplay,clickrow,False,False
1,1oHxIPqJyvAYHy0PVrDU98,2013-07-08 02:45:37,web player,61865,Drinking from the Bottle (feat. Tinie Tempah),Calvin Harris,18 Months,clickrow,clickrow,False,False
2,487OPlneJNni3NWC8SYqhW,2013-07-08 02:50:24,web player,285386,Born To Die,Lana Del Rey,Born To Die - The Paradise Edition,clickrow,unknown,False,False
3,5IyblF777jLZj1vGHG2UD3,2013-07-08 02:52:40,web player,134022,Off To The Races,Lana Del Rey,Born To Die - The Paradise Edition,trackdone,clickrow,False,False
4,0GgAAB0ZMllFhbNc3mAodO,2013-07-08 03:17:52,web player,0,Half Mast,Empire Of The Sun,Walking On A Dream,clickrow,nextbtn,False,False


In [3]:
df['ts'] = pd.to_datetime(df['ts'], errors='coerce')
df = df.dropna(subset=['ts', 'artist_name'])
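
Because `errors='coerce'` turns unparseable timestamps into `NaT` without raising, it can help to count how many rows the cleaning step removes. The following is a minimal sketch on a made-up three-row frame; the `sample` data and the `clean_history` helper are hypothetical and not part of the notebook.

```python
import pandas as pd

def clean_history(frame):
    """Parse 'ts', drop rows missing 'ts' or 'artist_name', and report the drop count."""
    out = frame.copy()
    out['ts'] = pd.to_datetime(out['ts'], errors='coerce')  # bad values become NaT
    out = out.dropna(subset=['ts', 'artist_name'])
    return out, len(frame) - len(out)

# Hypothetical rows: one valid, one with a bad timestamp, one with no artist.
sample = pd.DataFrame({
    'ts': ['2013-07-08 02:44:34', 'not a date', '2013-07-08 02:50:24'],
    'artist_name': ["The Mowgli's", 'Calvin Harris', None],
})
cleaned, dropped = clean_history(sample)
print(f"Dropped {dropped} of {len(sample)} rows")
```

A large drop count would suggest inspecting the raw `ts` column before trusting the statistics computed later.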

In [4]:
top_artists = df['artist_name'].value_counts().head(5)
skip_rate = df['skipped'].mean()
df['hour'] = df['ts'].dt.hour
most_common_hour = df['hour'].mode()[0]
shuffle_rate = df['shuffle'].mean()

In [5]:
prompt = f"""
You are an AI that summarizes Spotify listening behavior in a fun and friendly tone.

Here’s the user’s behavior:
- Top 5 artists: {', '.join(top_artists.index)}
- Percentage of songs skipped: {skip_rate:.2%}
- Most common hour of listening: {most_common_hour}:00
- Shuffle usage rate: {shuffle_rate:.2%}

Write a paragraph summarizing their music habits.
"""

In [6]:
api_key = userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=api_key)
model = genai.GenerativeModel('gemini-2.5-flash')

response = model.generate_content(prompt)
wrapped = textwrap.fill(response.text, width=80)

print("Your Music Summary:\n")
print(wrapped)

Your Music Summary:

Alright, let's dive into your Spotify world! You've got an amazing ear for both
timeless classics and modern jams, with rock legends like The Beatles, Bob
Dylan, and Paul McCartney sharing your top spots alongside the arena-rock energy
of The Killers and the smooth stylings of John Mayer. Clearly, you appreciate a
good blend of old and new! When it comes to listening, you're quite the night
owl, with midnight being your most common jam session hour. And you love a good
surprise, letting the music lead the way by hitting shuffle almost 75% of the
time. Plus, you're rarely hitting skip (only 5.25%!), which means your ears are
usually very happy campers!


In [7]:
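# Count finished (False) vs. skipped (True) plays per artist
artist_skips = df.groupby('artist_name')['skipped'].value_counts().unstack(fill_value=0)
# Make sure both columns exist, in a known order, before renaming them
artist_skips = artist_skips.reindex(columns=[False, True], fill_value=0)
artist_skips.columns = ['finished', 'skipped']
# Keep only artists with at least 5 plays
artist_skips = artist_skips[artist_skips.sum(axis=1) >= 5].copy()
artist_skips['skip_rate'] = artist_skips['skipped'] / (artist_skips['skipped'] + artist_skips['finished']) * 100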
most_skipped_artist = artist_skips.sort_values(by='skip_rate', ascending=False).head(5)
most_finished_artist = artist_skips.sort_values(by='skip_rate', ascending=True).head(5)

print("Most Skipped Artists:")
print(most_skipped_artist[['skipped', 'finished', 'skip_rate']])

print("\nMost Finished Artists:")
print(most_finished_artist[['skipped', 'finished', 'skip_rate']])

Most Skipped Artists:
                skipped  finished  skip_rate
artist_name                                 
Will Champlin         8         0      100.0
Major Lazer           5         0      100.0
Frikstailers          7         0      100.0
OMI                   8         0      100.0
Ellie Goulding        6         0      100.0

Most Finished Artists:
                      skipped  finished  skip_rate
artist_name                                       
Natema                      0         6        0.0
J Alvarez                   0         7        0.0
Isla de Caras               0         9        0.0
Ludwig van Beethoven        0         5        0.0
Ludwig Göransson            0         5        0.0
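
Every artist in both tables sits at exactly 100% or 0%, so the order within each top five depends only on how the sort happens to break ties. One possible refinement, sketched here on hypothetical counts (the `artist_skips_demo` frame and its `plays` column are illustrative names, not part of the notebook), is to use the total play count as a secondary sort key so that artists with more plays rank first among ties.

```python
import pandas as pd

# Hypothetical counts, shaped like the artist_skips table above.
artist_skips_demo = pd.DataFrame(
    {'skipped': [5, 8, 0, 0], 'finished': [0, 0, 5, 9]},
    index=pd.Index(['A', 'B', 'C', 'D'], name='artist_name'),
)
artist_skips_demo['plays'] = artist_skips_demo['skipped'] + artist_skips_demo['finished']
artist_skips_demo['skip_rate'] = artist_skips_demo['skipped'] / artist_skips_demo['plays'] * 100

# Highest skip rate first; among equal rates, more plays first.
most_skipped_demo = artist_skips_demo.sort_values(
    by=['skip_rate', 'plays'], ascending=[False, False]
)
# Lowest skip rate first; among equal rates, more plays first.
most_finished_demo = artist_skips_demo.sort_values(
    by=['skip_rate', 'plays'], ascending=[True, False]
)

print(most_skipped_demo[['skipped', 'finished', 'skip_rate']])
print(most_finished_demo[['skipped', 'finished', 'skip_rate']])
```

Raising the minimum play threshold above 5 is another way to make the extremes less sensitive to small samples.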


In [8]:
from datetime import timedelta

df['date'] = df['ts'].dt.date

listening_days = sorted(df['date'].unique())

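def longest_streak(dates):
    """Return the longest run of consecutive days in a sorted list of unique dates."""
    if len(dates) == 0:
        return 0  # no listening days means no streak
    longest = 0
    current = 1
    for i in range(1, len(dates)):
        if dates[i] == dates[i-1] + timedelta(days=1):
            # Consecutive day: extend the current run
            current += 1
        else:
            # Gap found: record the finished run and start a new one
            longest = max(longest, current)
            current = 1
    return max(longest, current)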

streak = longest_streak(listening_days)
print(f"Longest listening streak: {streak} days")

Longest listening streak: 78 days
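
The same streak can also be computed without an explicit loop. The sketch below is an alternative under stated assumptions, not the notebook's method: `longest_streak_pandas` is a hypothetical helper that marks the start of each run wherever the gap to the previous day is not exactly one day, then counts the size of each run.

```python
from datetime import date

import pandas as pd

def longest_streak_pandas(dates):
    """Longest run of consecutive calendar days in an iterable of dates."""
    days = (
        pd.Series(pd.to_datetime(list(dates)))
        .drop_duplicates()
        .sort_values()
        .reset_index(drop=True)
    )
    if days.empty:
        return 0
    # A new run starts wherever the gap to the previous day is not exactly 1 day
    # (the first entry has no previous day, so it always starts a run).
    new_run = days.diff() != pd.Timedelta(days=1)
    run_id = new_run.cumsum()
    return int(run_id.value_counts().max())

# Hypothetical listening days: a 3-day run, a gap, then a 2-day run.
example_days = [
    date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3),
    date(2020, 1, 5), date(2020, 1, 6),
]
print(longest_streak_pandas(example_days))
```

Unlike the loop version, this helper sorts and de-duplicates its input itself, so it does not depend on the caller passing `sorted(df['date'].unique())`.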
