#### It was a summer evening in 2016 when I opened my Saavn account (Saavn is a now-outdated music app in India) and discovered "Love Story" by Taylor Swift. Since then, over 120 Taylor Swift songs have made their way onto my playlists, and I proclaim myself a mildly obsessed "Swiftie".

#### Like any sane data scientist/Swiftie, I searched near and far for projects that could combine both my loves, and that's when I decided to create a dataset from Spotify and build a notebook on it.

#### With *The Tortured Poets Department* album on the horizon, there couldn't be a better time for this analysis.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Exploring our data:

In [None]:
features = pd.read_csv("/kaggle/input/taylor-swift-the-myth-the-legend/taylorswift-Features.csv")
tracks = pd.read_csv("/kaggle/input/taylor-swift-the-myth-the-legend/taylorswift-Tracks.csv")

In [None]:
features.head()

In [None]:
tracks.head()

In [None]:
print(len(features)==len(tracks))

##### Both dataframes have the same number of rows, which suggests they cover the same tracks. We will now merge them into one large dataframe for easier analysis.
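
Equal row counts alone do not guarantee that the two tables list the same tracks. Below is a minimal sketch of a stricter check. It uses small made-up frames in place of the real CSVs: the `track_name` and `name` columns match the ones used in the merge, but the rows and the `_demo` names are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the real features/tracks frames (rows are invented)
features_demo = pd.DataFrame({"track_name": ["Love Story", "Style", "cardigan"],
                              "energy": [0.74, 0.79, 0.58]})
tracks_demo = pd.DataFrame({"name": ["Style", "cardigan", "Love Story"],
                            "popularity": [80, 85, 82]})

# Compare the sets of track names rather than just the row counts
same_tracks = set(features_demo["track_name"]) == set(tracks_demo["name"])
print(same_tracks)

# An outer merge with indicator=True flags rows that exist on only one side
check = pd.merge(features_demo, tracks_demo, how="outer",
                 left_on="track_name", right_on="name", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(len(unmatched))
```

If a track name appears more than once on either side (for example, the same song on two albums), passing `validate="one_to_one"` to `pd.merge` would raise an error instead of silently multiplying rows.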

In [None]:
df = pd.merge(features, tracks, how='inner', left_on='track_name', right_on='name')

In [None]:
df.info()

In [None]:
df['album_name'] = df['album_name_y']

In [None]:
df = df[['track_name', 'album_name', 'release_date', 'duration', 'popularity', 'explicit','energy', 'key', 'loudness', 'danceability', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]

In [None]:
df.head()

In [None]:
df.info()

#### We've obtained our relevant dataset.

## Let's start our analysis and visualisations.

In [None]:
import plotly.express as px

In [None]:
df = df.sort_values(by='popularity', ascending=False)
df_features = df[['popularity', 'explicit', 'energy', 'key', 'loudness', 'danceability', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]

In [None]:
correlation_matrix = df_features.corr()

# Create a heatmap using Plotly Express
fig = px.imshow(correlation_matrix,
                labels=dict(color="Correlation"),
                x=correlation_matrix.columns,
                y=correlation_matrix.index,
                title="Correlation Heatmap of Features")

# Display the plot
fig.show()
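
The heatmap shows every pairwise correlation at once, but the popularity column is the one that matters most here. As a rough sketch, that column can be pulled out and sorted. The `demo` frame and its values are invented stand-ins for a few of the notebook's feature columns.

```python
import pandas as pd

# Invented values standing in for a few of the notebook's feature columns
demo = pd.DataFrame({
    "popularity":   [60, 70, 75, 80, 90],
    "danceability": [0.45, 0.55, 0.60, 0.62, 0.75],
    "acousticness": [0.80, 0.60, 0.50, 0.30, 0.10],
})

# Pearson correlation of every feature with popularity, strongest positive first
popularity_corr = (demo.corr()["popularity"]
                   .drop("popularity")
                   .sort_values(ascending=False))
print(popularity_corr)
```

On the real data, the same pattern would be `df_features.corr()["popularity"]`.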

In [None]:
df['album_name'].unique()

In [None]:
# Filter DataFrame for selected albums
selected_albums = ['Taylor Swift', "Speak Now (Taylor's Version)", "Fearless (Taylor's Version)",
                   "Red (Taylor's Version)", "1989 (Taylor's Version) [Deluxe]", 'reputation',"Lover",
                   'folklore (deluxe version)', 'evermore (deluxe version)', 'Midnights (The Til Dawn Edition)']
df_selected = df[df['album_name'].isin(selected_albums)]

# Define custom colors for each album
custom_colors = {'Taylor Swift': 'lightgreen', 
                 "Speak Now (Taylor's Version)": 'purple', 
                 "Fearless (Taylor's Version)": 'yellow', 
                 "Red (Taylor's Version)": 'red', 
                 "1989 (Taylor's Version) [Deluxe]": 'lightblue', 
                 'reputation': 'black', 
                 'Lover':'pink',
                 'folklore (deluxe version)': 'grey', 
                 'evermore (deluxe version)': 'brown', 
                 'Midnights (The Til Dawn Edition)': 'darkblue'}

fig1 = px.box(data_frame=df_selected, x='album_name', y='popularity', 
                                  title='Popularity Distribution by Album',
                                  color='album_name',
                                  color_discrete_map=custom_colors,
                                  labels={'popularity': 'Popularity', 'album_name': 'Album'})

# Show the plot
fig1.show()


#### Based on the correlation matrix, let's take a closer look at how popularity varies with individual features:

In [None]:
fig2 = px.histogram(data_frame=df, x='danceability', y='popularity', 
                    title='Average Popularity by Danceability',
                    color_discrete_sequence=['pink'],
                    histfunc='avg',
                   nbins = 10)

fig2.update_layout(bargap=0.2)
fig2.show()

In [None]:
fig3 = px.histogram(data_frame=df, x='speechiness', y='popularity', 
                    title='Average Popularity by Speechiness',
                    color_discrete_sequence=['lightblue'],
                    histfunc='avg',
                   nbins=10)

fig3.update_layout(bargap=0.2)
fig3.show()

In [None]:
fig4 = px.histogram(data_frame=df, x='acousticness', y='popularity', 
                    title='Average Popularity by Acousticness',
                    color_discrete_sequence=['green'],
                    histfunc='avg',
                   nbins=10)

fig4.update_layout(bargap=0.2)
fig4.show()

In [None]:
fig5 = px.histogram(data_frame=df, x='liveness', y='popularity', 
                    title='Average Popularity by Liveness',
                    color_discrete_sequence=['purple'],
                    histfunc='avg',
                    nbins=10)

fig5.update_layout(bargap=0.2)
fig5.show()
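
The histograms above use `histfunc='avg'`, so each bar is the mean popularity of the tracks that fall into one bin of the x-axis feature. A rough pandas equivalent is sketched below on invented rows. Plotly treats `nbins` as a maximum and picks its own bin edges, so the numbers will not match the charts exactly.

```python
import pandas as pd

# Invented rows standing in for the notebook's df
demo = pd.DataFrame({
    "danceability": [0.31, 0.35, 0.52, 0.58, 0.71, 0.78],
    "popularity":   [60,   64,   70,   74,   82,   86],
})

# Split danceability into equal-width bins, then average popularity per bin.
# The count column shows how many tracks back each average.
bins = pd.cut(demo["danceability"], bins=3)
summary = demo.groupby(bins, observed=True)["popularity"].agg(["mean", "count"])
print(summary)
```

Keeping the count next to the mean helps when reading the charts: a tall bar backed by one or two tracks says much less than one backed by dozens.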

In [None]:
scatter_matrix = px.scatter_matrix(df, dimensions=['energy', 'danceability','loudness', 'liveness', 'key', 'acousticness', 'popularity'],
                                   color='popularity',
                                   title='Scatter Plot Matrix')
scatter_matrix.show()

#### These were some fun visualisations! We can draw the following observations from them:
1. There is a noticeable positive correlation between popularity and...
    * danceability
    * speechiness
2. There is a noticeable negative correlation between popularity and...
    * liveness
    * acousticness
3. There are other observable relations as well, visible from the correlation and scatter matrix.
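
Whether a correlation like the ones listed above is statistically significant depends on a test, not on the coefficient alone. One possible check is a permutation test, sketched below with NumPy. This is a swapped-in technique that the notebook itself does not run, and the `x` and `y` values are invented. On the real data they would be columns such as `df['danceability']` and `df['popularity']`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented values; stand-ins for a feature column and the popularity column
x = np.array([0.31, 0.35, 0.45, 0.52, 0.58, 0.63, 0.71, 0.78])
y = np.array([60, 64, 63, 70, 74, 72, 82, 86], dtype=float)

# Pearson correlation of the data as observed
observed_r = np.corrcoef(x, y)[0, 1]

# Shuffle y many times to see how often chance alone gives |r| this large
n_permutations = 2000
permuted_r = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                       for _ in range(n_permutations)])
p_value = np.mean(np.abs(permuted_r) >= abs(observed_r))
print(round(observed_r, 3), p_value)
```

A small `p_value` means shuffled data rarely produces a correlation as strong as the observed one, which is the usual sense of "significant".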

In [None]:
df_album = df[['popularity', 'explicit', 'energy', 'key', 'loudness', 'danceability', 
               'speechiness', 'acousticness', 'instrumentalness', 'liveness', 
               'valence', 'tempo', 'album_name']].groupby('album_name').mean().reset_index()
df_album = df_album.sort_values(by='popularity', ascending=False)
df_album = df_album.head(10)
df_album = df_album.reset_index(drop=True)
df_album

In [None]:
df_album['album_name']

In [None]:
album_genre_mapping = {
    "reputation": "Electropop",
    "Lover": "Pop",
    "Speak Now (Taylor's Version)": "Country",
    "Red (Taylor's Version)": "Rock",
    "folklore (deluxe version)": "Alternative",
    "evermore (deluxe version)": "Alternative",
    "Fearless (Taylor's Version)": "Country",
    "1989 (Taylor's Version) [Deluxe]": "Synth",
    "Midnights (The Til Dawn Edition)": "Pop",
    "Taylor Swift": "Country"
}

df_album['genre'] = df_album['album_name'].map(album_genre_mapping)
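
`Series.map` with a dictionary leaves `NaN` for any album that is missing from the mapping. That can happen here because `df_album` keeps the ten most popular albums, which need not be the ten listed in the dictionary. A small sketch of a check follows, using an invented table in which one album is deliberately unmapped.

```python
import pandas as pd

# Invented album table; "Some Other Album" is deliberately missing from the mapping
albums_demo = pd.DataFrame({"album_name": ["Lover", "reputation", "Some Other Album"]})
genre_demo = {"Lover": "Pop", "reputation": "Electropop"}

albums_demo["genre"] = albums_demo["album_name"].map(genre_demo)

# .map() leaves NaN for keys absent from the dictionary, so list them explicitly
unmapped = albums_demo.loc[albums_demo["genre"].isna(), "album_name"].tolist()
print(unmapped)
```

An empty list means every album received a genre. Anything else points to a row that would drop out of the genre chart.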

In [None]:
df_plot = df_album.groupby(['genre', 'album_name']).size().reset_index(name='count')

custom_colors = {
    'Taylor Swift': 'lightgreen',
    "Lover" : "pink",
    "Speak Now (Taylor's Version)": 'purple',
    "Fearless (Taylor's Version)": 'yellow',
    "Red (Taylor's Version)": 'red',
    "1989 (Taylor's Version) [Deluxe]": 'lightblue',
    'reputation': 'black',
    'folklore (deluxe version)': 'grey',
    'evermore (deluxe version)': 'brown',
    'Midnights (The Til Dawn Edition)': 'darkblue'
}

# Create a bar chart with custom colors
fig = px.bar(df_plot, x='genre', y='count', color='album_name', color_discrete_map=custom_colors,
             labels={'genre': 'Genre', 'count': 'Count', 'album_name': 'Album'}, barmode='stack')
fig.update_layout(title='Genre Distribution with Custom Colors')
fig.show()

## This was fun! 