In [None]:
from google.colab import files
uploaded = files.upload()

Saving archive.zip to archive (1).zip


In [None]:
import pandas as pd
import zipfile

# Unzip the uploaded archive
# Assuming 'bts.csv' is directly inside 'archive (1).zip'
with zipfile.ZipFile('archive (1).zip', 'r') as zip_ref:
    zip_ref.extractall()

df = pd.read_csv('bts.csv')
print(df.shape)
df.head()

(444, 34)


Unnamed: 0,id,album_title,eng_album_title,album_rd,album_seq,track_title,raw_track_title,eng_track_title,lyrics,hidden_track,...,spotify_track_mode,spotify_track_speechiness,spotify_track_acousticness,spotify_track_instrumentalness,spotify_track_liveness,spotify_track_valence,spotify_track_tempo,spotify_track_time_signature,eng_lyrics_source_url,eng_lyrics_credits
0,BTS-1,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,1,Intro: 2 Cool 4 Skool (ft. DJ Friz),,Intro: 2 Cool 4 Skool (ft. DJ Friz),we're now going to progress to some steps\nwhi...,False,...,1.0,0.245,0.179,0.266,0.179,0.532,94.871,4.0,,
1,BTS-2,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,2,We Are Bulletproof Pt.2,,We Are Bulletproof Pt.2,(what) give it to me\n (what) be nervous\n (wh...,False,...,0.0,0.16,0.0104,6e-06,0.134,0.868,144.02,4.0,,
2,BTS-3,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,3,Skit: Circle Room Talk,,Skit: Circle Room Talk,rap monster: it was a big hit\nv: year 2006!\n...,False,...,1.0,0.802,0.912,0.0,0.913,0.817,121.045,3.0,,
3,BTS-4,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,4,No More Dream,,No More Dream,"hey, what's your dream?\n hey, what's your dre...",False,...,1.0,0.47,0.0118,2e-06,0.431,0.594,167.898,4.0,,
4,BTS-5,2 Cool 4 Skool,2 Cool 4 Skool,2013-06-12,5,Interlude,,Interlude,,False,...,0.0,0.319,0.494,0.762,0.392,0.854,125.897,4.0,,


In [None]:
# Keep only the columns we need
cols = [
    'eng_album_title', 'album_rd', 'track_title', 'eng_track_title',
    'lang', 'spotify_track_danceability', 'spotify_track_energy',
    'spotify_track_valence', 'spotify_track_tempo',
    'spotify_track_acousticness', 'spotify_track_liveness',
    'spotify_track_speechiness'
]

df_clean = df[cols].copy()

# Rename columns to simpler names
df_clean.columns = [
    'album', 'release_date', 'track_title', 'eng_title',
    'language', 'danceability', 'energy', 'valence', 'tempo',
    'acousticness', 'liveness', 'speechiness'
]

# Convert release_date to datetime and extract year
df_clean['release_date'] = pd.to_datetime(df_clean['release_date'])
df_clean['year'] = df_clean['release_date'].dt.year

# Drop rows where audio features are missing
df_clean = df_clean.dropna(subset=['danceability', 'energy', 'valence'])

print(f"Clean dataset: {df_clean.shape}")
print(f"Years covered: {df_clean['year'].min()} - {df_clean['year'].max()}")
print(f"Albums: {df_clean['album'].nunique()}")
df_clean.head()

Clean dataset: (421, 13)
Years covered: 2013 - 2025
Albums: 54


Unnamed: 0,album,release_date,track_title,eng_title,language,danceability,energy,valence,tempo,acousticness,liveness,speechiness,year
0,2 Cool 4 Skool,2013-06-12,Intro: 2 Cool 4 Skool (ft. DJ Friz),Intro: 2 Cool 4 Skool (ft. DJ Friz),KOR,0.894,0.835,0.532,94.871,0.179,0.179,0.245,2013
1,2 Cool 4 Skool,2013-06-12,We Are Bulletproof Pt.2,We Are Bulletproof Pt.2,KOR,0.753,0.95,0.868,144.02,0.0104,0.134,0.16,2013
2,2 Cool 4 Skool,2013-06-12,Skit: Circle Room Talk,Skit: Circle Room Talk,KOR,0.598,0.356,0.817,121.045,0.912,0.913,0.802,2013
3,2 Cool 4 Skool,2013-06-12,No More Dream,No More Dream,KOR,0.438,0.864,0.594,167.898,0.0118,0.431,0.47,2013
4,2 Cool 4 Skool,2013-06-12,Interlude,Interlude,,0.914,0.276,0.854,125.897,0.494,0.392,0.319,2013


# 🎵 BTS: A Data-Driven Story of Global Domination

BTS debuted in 2013 as a small K-pop group from South Korea.
By 2020, they were breaking Billboard records previously held by The Beatles.
But *how* did they do it? Was it luck, or is there a pattern hidden in the music itself?

In this project, we analyze BTS's full discography using Spotify audio features
to uncover how their sound evolved, what changed when they went global,
and what the data says about their formula for success.

**Dataset:** 421 BTS tracks | 2013–2025 | 54 albums  
**Features analyzed:** Energy, Valence (happiness), Danceability, Acousticness, Liveness

In [None]:
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## 📈 Chart 1: How Did BTS's Sound Change Over Time?

We start by looking at three core audio features across every year of their career:
- **Energy**: how intense and loud the music feels
- **Valence**: how happy or positive the song sounds
- **Danceability**: how easy it is to dance to

If BTS intentionally shifted their sound to appeal to Western audiences,
we should see changes in these features around 2019–2020 when they went mainstream globally.


In [None]:
yearly = df_clean.groupby('year')[['energy', 'valence', 'danceability']].mean().reset_index()

fig = px.line(
    yearly, x='year', y=['energy', 'valence', 'danceability'],
    title='🎵 How BTS\'s Sound Evolved (2013–2025)',
    markers=True,
    labels={'value': 'Score (0–1)', 'year': 'Year', 'variable': 'Feature'},
    color_discrete_map={
        'energy': '#8B5CF6',
        'valence': '#EC4899',
        'danceability': '#06B6D4'
    }
)
fig.update_layout(
    plot_bgcolor='#0f0f0f',
    paper_bgcolor='#0f0f0f',
    font_color='white',
    title_font_size=20,
    legend_title_text='Audio Feature'
)
fig.show()
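
A yearly average can mislead when a year contains only a few tracks. The sketch below runs on a small made-up frame standing in for `df_clean` (the numbers are illustrative, not taken from the dataset). It shows one way to report the track count next to each mean and to find the year with the lowest average valence.

```python
import pandas as pd

# Toy stand-in for df_clean; these numbers are made up for illustration only.
toy = pd.DataFrame({
    'year':    [2013, 2013, 2014, 2014, 2014, 2015],
    'valence': [0.80, 0.60, 0.50, 0.40, 0.30, 0.90],
    'energy':  [0.90, 0.70, 0.80, 0.60, 0.70, 0.50],
})

# Mean per year plus the number of tracks behind each mean.
yearly_stats = (
    toy.groupby('year')
       .agg(n_tracks=('valence', 'size'),
            valence=('valence', 'mean'),
            energy=('energy', 'mean'))
       .reset_index()
)

# Row for the year with the lowest average valence.
lowest = yearly_stats.loc[yearly_stats['valence'].idxmin()]
print(yearly_stats)
print(f"Lowest mean valence: {int(lowest['year'])} ({lowest['n_tracks']:.0f} tracks)")
```

Swapping `toy` for `df_clean` should give the same table for the real data. Years with very few tracks are best read with caution.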

## 🌍 Chart 2: Did Going English Change Their Sound?

BTS released their first all-English single *Dynamite* in 2020, followed by *Butter* in 2021,
both of which topped the Billboard Hot 100.

But did their English songs actually *sound* different from their Korean ones?
Here we compare the average audio features between Korean and English tracks.

In [None]:
lang_df = df_clean[df_clean['language'].isin(['KOR', 'ENG'])].copy()
lang_avg = lang_df.groupby('language')[['danceability', 'energy', 'valence']].mean().reset_index()

fig2 = px.bar(
    lang_avg.melt(id_vars='language', var_name='feature', value_name='score'),
    x='feature', y='score', color='language', barmode='group',
    title='🌍 Korean vs English Songs: Audio Feature Comparison',
    color_discrete_map={'KOR': '#8B5CF6', 'ENG': '#EC4899'},
    labels={'score': 'Average Score', 'feature': 'Audio Feature'}
)
fig2.update_layout(
    plot_bgcolor='#0f0f0f',
    paper_bgcolor='#0f0f0f',
    font_color='white',
    title_font_size=20
)
fig2.show()
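
A bar chart of means does not show how many tracks sit behind each bar, or whether a gap could arise by chance. As a rough check, the sketch below runs a permutation test on a difference in means. The helper `permutation_pvalue` and the toy values are assumptions made for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy stand-in for lang_df; values are made up for illustration only.
toy = pd.DataFrame({
    'language': ['KOR'] * 6 + ['ENG'] * 4,
    'danceability': [0.55, 0.60, 0.58, 0.62, 0.57, 0.59, 0.75, 0.78, 0.80, 0.77],
})

print(toy['language'].value_counts())  # group sizes matter as much as means
eng = toy.loc[toy['language'] == 'ENG', 'danceability']
kor = toy.loc[toy['language'] == 'KOR', 'danceability']
diff, p = permutation_pvalue(eng, kor)
print(f"ENG - KOR mean difference: {diff:.3f}, permutation p-value: {p:.4f}")
```

Applied to `lang_df`, a small p-value would support reading the gap as real. A large one, or a very small English group, would argue for caution.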

## 🕸️ Chart 3: Sound Profile Across Eras

BTS's career can be divided into four distinct creative eras:
- **Early Era (2013–2015):** Raw, hip-hop influenced debut years
- **HYYH / Wings Era (2016–2018):** Darker, more emotional and experimental
- **Map of the Soul (2019–2020):** Polished, global mainstream sound
- **Proof / Solo Era (2021–2025):** Members pursuing individual styles

Does the data reflect these artistic shifts?

In [None]:
era_map = {
    range(2013, 2016): 'Early Era (2013–2015)',
    range(2016, 2019): 'HYYH / Wings Era (2016–2018)',
    range(2019, 2021): 'Map of the Soul (2019–2020)',
    range(2021, 2026): 'Proof / Solo Era (2021–2025)'
}

def get_era(year):
    for r, label in era_map.items():
        if year in r:
            return label
    return 'Other'

df_clean['era'] = df_clean['year'].apply(get_era)

features = ['danceability', 'energy', 'valence', 'acousticness', 'liveness']
era_avg = df_clean.groupby('era')[features].mean()

fig3 = go.Figure()
colors = ['#8B5CF6', '#EC4899', '#06B6D4', '#F59E0B']

for i, era in enumerate(era_avg.index):
    values = era_avg.loc[era].tolist()
    values += values[:1]
    fig3.add_trace(go.Scatterpolar(
        r=values,
        theta=features + [features[0]],
        fill='toself',
        name=era,
        line_color=colors[i]
    ))

fig3.update_layout(
    polar=dict(radialaxis=dict(visible=True, range=[0, 1])),
    title='🕸️ BTS Sound Profile Across Eras',
    paper_bgcolor='#0f0f0f',
    font_color='white',
    title_font_size=20
)
fig3.show()
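
As an alternative to the dictionary-of-ranges lookup, pandas' `pd.cut` can assign eras in one vectorized call. The sketch below uses a toy series of years in place of `df_clean['year']`. The bin edges are an assumption chosen to match the four eras listed above.

```python
import pandas as pd

# Toy years standing in for df_clean['year'].
years = pd.Series([2013, 2015, 2016, 2018, 2019, 2020, 2021, 2025])

# Bins are right-inclusive: (2012, 2015], (2015, 2018], (2018, 2020], (2020, 2025].
era = pd.cut(
    years,
    bins=[2012, 2015, 2018, 2020, 2025],
    labels=[
        'Early Era (2013–2015)',
        'HYYH / Wings Era (2016–2018)',
        'Map of the Soul (2019–2020)',
        'Proof / Solo Era (2021–2025)',
    ],
)

# The result is an ordered categorical, so counts can be listed in era order.
print(era.value_counts(sort=False))
```

Because the categories are ordered, later grouping and plotting follow chronological order without relying on how the labels sort alphabetically. Years outside the bins become `NaN` rather than `'Other'`.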

## 🔥 Chart 4: Album Mood Heatmap

This heatmap gives us a bird's-eye view of every major BTS album's audio personality.
Darker purple = higher score in that feature.

Look for patterns: which albums were the most energetic?
Which were the most acoustic and mellow?
And does *PERMISSION TO DANCE ON STAGE* really stand out from the rest?

In [None]:
# Get top 15 albums by song count
top_albums = df_clean['album'].value_counts().head(15).index
heatmap_df = df_clean[df_clean['album'].isin(top_albums)].groupby('album')[features].mean()

fig4 = px.imshow(
    heatmap_df,
    title='🔥 BTS Album Mood Heatmap',
    color_continuous_scale='Purples',
    aspect='auto',
    labels=dict(color='Score')
)
fig4.update_layout(
    paper_bgcolor='#0f0f0f',
    font_color='white',
    title_font_size=20
)
fig4.show()
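
Each feature tends to occupy its own part of the 0–1 range, so a single shared color scale can flatten differences between albums within one feature. One optional variation, sketched here on made-up numbers rather than the real album means, rescales every column to its own minimum and maximum before plotting.

```python
import pandas as pd

# Toy album-by-feature means; made up for illustration only.
toy = pd.DataFrame(
    {'energy': [0.80, 0.70, 0.90], 'acousticness': [0.10, 0.30, 0.05]},
    index=['Album A', 'Album B', 'Album C'],
)

# Min-max scale each column: 0 marks the lowest album, 1 the highest, per feature.
# Note: a column with identical values would divide by zero and yield NaN.
scaled = (toy - toy.min()) / (toy.max() - toy.min())
print(scaled.round(2))
```

Passing a frame scaled this way to `px.imshow` in place of `heatmap_df` makes the colors show relative standing within a feature, not the absolute score, so the color bar label should change accordingly.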

## 💡 Key Findings: What the Data Revealed

After analyzing 421 BTS tracks across 12 years, here's what the numbers uncovered,
and some of it will surprise you.

---

### 🔍 Aha Moment #1: More Famous = Sadder Music?
Most people assume that as an artist gets more successful, their music gets happier.
BTS defies this completely.

Their **valence (happiness score) hit its lowest point in 2018–2019**,
the exact years they were breaking world records and selling out stadiums.
The data suggests that as the pressure and fame grew, their music got *darker and more emotional*,
and fans loved them even more for it.

> **The insight:** BTS's emotional vulnerability, not their cheerfulness, is what made them global superstars.

---

### 🔍 Aha Moment #2: English Songs Are a Different Product
When we compare Korean vs English tracks, the bar chart shows English songs averaging
higher in danceability and valence, a gap that looks deliberate rather than coincidental.

BTS essentially created **two parallel identities**:
deep, emotional Korean music for their core fanbase (ARMY),
and bright, radio-friendly English hits to capture Western charts.
Most artists pick one lane. BTS ran both simultaneously.

> **The insight:** Dynamite and Butter weren't just songs; they were a calculated sonic strategy.

---

### 🔍 Aha Moment #3: Their Sound Barely Changed, But Everything Else Did
Looking at the radar chart, all four eras have surprisingly similar shapes.
Energy and danceability stayed relatively consistent throughout their career.

What actually changed was their *production quality, storytelling, and global marketing*,
not the core DNA of their music.

> **The insight:** BTS didn't reinvent themselves to go global. They stayed true to their sound
> and brought the world to them instead.

---

### 🎯 Final Takeaway
BTS's rise wasn't luck. The data shows a group that strategically balanced
emotional depth with commercial appeal, vulnerability with high energy,
and Korean identity with global accessibility.

**That balance, backed by data, is the real formula behind their dominance.**