## Import Packages And Import the Taylor Swift Dataset

In [1]:
# import packages

import pandas as pd
import plotly.express as px
import plotly.io as pio

# plotly.offline needs to be imported and the following code run in order to display visualizations offline

import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)

# import warnings to hide warnings in a jupyter notebook

import warnings
warnings.filterwarnings('ignore')


In [2]:
# Import the Taylor Swift dataset

tswift = pd.read_excel('TaylorSwift.xlsx')

## Simple Exploratory Data Analytics

The data is very straightforward: set number, song title, era, and scores from my wife and me. Running .head() on our data set shows the first five rows.

In [3]:
tswift.head()

,Set Number,Song Title,Era,Ben,Bethany
0,1,Miss Americana,Lover,9,6
1,2,Cruel Summer,Lover,8,10
2,3,The Man,Lover,9,7
3,4,You Need to Calm Down,Lover,10,8
4,5,Lover,Lover,6,9


tswift.info() gives us some additional information about our data. For example, we can see that every column has 45 non-null values (i.e. no missing data) and that some columns are int64 data types while others are object data types (text, in this case).

In [4]:
tswift.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Set Number  45 non-null     int64 
 1   Song Title  45 non-null     object
 2   Era         45 non-null     object
 3   Ben         45 non-null     int64 
 4   Bethany     45 non-null     int64 
dtypes: int64(3), object(2)
memory usage: 1.9+ KB


We can also check for any missing data by using .isnull() and .sum() on the data set. In this case, we don't have any missing values for any of our columns.

In [5]:
# checking for any missing values

tswift.isnull().sum()

Set Number    0
Song Title    0
Era           0
Ben           0
Bethany       0
dtype: int64

## How Did We Each Rank Songs As The Concert Progressed?

We rated songs on a scale of 0-10. Interestingly, at the end of the night I hadn't rated anything below a 5. In my head, a score of less than 5 meant I did not enjoy a song, whereas 5 meant my opinion was neutral. I rated plenty of songs highly, but Bethany, ever the Taylor Swift diehard, only dipped below a score of 7 once. Oddly enough, it was the first song - "Miss Americana"... I wonder what that was about?

In [6]:
# visualize ranking of songs as the concert progressed

color_discrete_map = {'Ben': 'rgb(77, 182, 255)',  # neon blue
                      'Bethany': 'rgb(255, 20, 147)'}  # neon pink

fig = px.line(tswift, 
              x='Song Title', 
              y=['Ben', 'Bethany'],
              title="Evolution of Song Ratings Over the Course of the Concert",
              color_discrete_map=color_discrete_map)

# Update the x-axis and y-axis labels
fig.update_layout(xaxis_title="Song Title", yaxis_title="Score")
fig.update_xaxes(tickangle=-45)

# Update theme and update the legend
fig.update_layout(template='plotly_white', title=dict(x=0.5, xanchor="center"), legend_title='Name')

# Increase the height of the figure
fig.update_layout(height=600)

# Remove the vertical grid lines
fig.update_xaxes(showgrid=False)

fig.show()
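
The observations about low scores can also be checked with a couple of boolean filters. Here's a minimal sketch that assumes only the five rows shown by tswift.head() earlier; `sample` is a stand-in name I made up for illustration, and on the real data the same filters would run against `tswift` instead.

```python
import pandas as pd

# Stand-in for the full dataset: only the five rows printed by tswift.head()
sample = pd.DataFrame({
    "Song Title": ["Miss Americana", "Cruel Summer", "The Man",
                   "You Need to Calm Down", "Lover"],
    "Ben": [9, 8, 9, 10, 6],
    "Bethany": [6, 10, 7, 8, 9],
})

# Lowest score each of us gave
lowest = sample[["Ben", "Bethany"]].min()

# Songs where Bethany's score dipped below 7
bethany_below_7 = sample.loc[sample["Bethany"] < 7, "Song Title"].tolist()

# Number of songs Ben rated below 5
ben_below_5 = int((sample["Ben"] < 5).sum())

print(lowest)
print(bethany_below_7)
print(ben_below_5)
```

Because this only covers the first five songs, it confirms the pattern for the opening of the show, not the whole set list.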

## Score Box Plots

It's pretty clear Bethany is the bigger Taylor Swift fan. 

In [7]:
# Reshape the scores to long format: one row per (Name, Score) pair
scores_melted = tswift.melt(value_vars=['Ben', 'Bethany'], var_name='Name', value_name='Score')

color_discrete_map = {'Ben': 'rgb(77, 182, 255)',  # neon blue
                      'Bethany': 'rgb(255, 20, 147)'}  # neon pink

fig = px.box(scores_melted, x='Name', y='Score',
            color='Name',
            color_discrete_map=color_discrete_map,
            title='Ben vs Bethany Score Summary')

# Update theme
fig.update_layout(template='plotly_white', title=dict(x=0.5, xanchor="center"))

fig.show()
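
A box plot is a picture of a five-number summary, so the same comparison can be printed as a table. This is a rough sketch on the five rows from tswift.head() only (`sample` and `sample_melted` are names I made up here). Those five songs are all from the Lover era, so the numbers won't match the full 45-song chart, and pandas may compute quartiles slightly differently than plotly does.

```python
import pandas as pd

# Stand-in data: the five rows printed by tswift.head()
sample = pd.DataFrame({"Ben": [9, 8, 9, 10, 6],
                       "Bethany": [6, 10, 7, 8, 9]})

# Same reshape as the box plot cell: one row per (Name, Score) pair
sample_melted = sample.melt(value_vars=["Ben", "Bethany"],
                            var_name="Name", value_name="Score")

# describe() returns count, mean, std, min, quartiles and max per person
summary = sample_melted.groupby("Name")["Score"].describe()

print(summary[["min", "25%", "50%", "75%", "max"]])
```

Swapping `sample` for `tswift` should give the numeric counterpart of the chart above.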

## Average Score by Era

For the uninitiated, Taylor's albums are often referred to as eras, symbolizing where she was in life at the time the songs were written. Each era tends to have its own distinct sound and feel to it, which is pretty cool. 

I wanted to see if maybe there was an era sampled at the concert where my scores made me a bigger fan than Bethany. To do this, the data had to be manipulated a bit to create a side-by-side bar chart visualizing average scores per era for both Bethany and me. Manipulation took two steps:

1. Group scores by era and obtain the mean of the scores for each era
2. Melt the data so that instead of separate columns for our scores, they were all listed in one column.

After manipulating the data, a bar chart can be created which shows our scores side by side for each era. 

### 1. Group scores by era and get the mean

In [8]:
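# Calculate average scores for each era
# numeric_only=True skips the text column (Song Title), which newer pandas versions refuse to average
df_avg = tswift.groupby('Era').mean(numeric_only=True).reset_index()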

df_avg

,Era,Set Number,Ben,Bethany
0,1989,34.0,9.0,8.8
1,Evermore,16.333333,6.833333,9.166667
2,Fearless,8.0,7.0,9.333333
3,Folklore,28.0,6.0,8.714286
4,Lover,3.5,8.0,7.833333
5,Midnights,42.0,8.428571,9.0
6,Red,22.5,8.5,8.0
7,Reputation,16.5,9.5,10.0
8,Speak Now,25.333333,6.333333,9.0
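
The average of Set Number isn't really meaningful, so one possible variation is to select just the score columns before taking the mean. This is a small sketch on the five rows from tswift.head(), not the full data; `sample` and `era_avg` are illustrative names, and since all five songs are from Lover the result has a single row.

```python
import pandas as pd

# Stand-in data: the five rows printed by tswift.head()
sample = pd.DataFrame({
    "Set Number": [1, 2, 3, 4, 5],
    "Song Title": ["Miss Americana", "Cruel Summer", "The Man",
                   "You Need to Calm Down", "Lover"],
    "Era": ["Lover"] * 5,
    "Ben": [9, 8, 9, 10, 6],
    "Bethany": [6, 10, 7, 8, 9],
})

# Average only the two score columns; as_index=False keeps Era as a regular column
era_avg = sample.groupby("Era", as_index=False)[["Ben", "Bethany"]].mean()

print(era_avg)
```

Picking the columns explicitly also sidesteps any question about how text columns are handled by .mean().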


### 2. Melt the dataframe 

In [9]:
# Melt the data to have it in long format
df_avg_melted = df_avg.melt(id_vars='Era', value_vars=['Ben', 'Bethany'],
                            var_name='ScoreBy', value_name='AverageScore')

df_avg_melted

,Era,ScoreBy,AverageScore
0,1989,Ben,9.0
1,Evermore,Ben,6.833333
2,Fearless,Ben,7.0
3,Folklore,Ben,6.0
4,Lover,Ben,8.0
5,Midnights,Ben,8.428571
6,Red,Ben,8.5
7,Reputation,Ben,9.5
8,Speak Now,Ben,6.333333
9,1989,Bethany,8.8


## Average Score By Era, Visualized!

Out of nine total eras, Bethany had a higher average score for six of them. The only ones I seemed to enjoy slightly more than her were 1989, Lover, and Red.

In [10]:
color_discrete_map = {'Ben': 'rgb(77, 182, 255)',  # neon blue
                      'Bethany': 'rgb(255, 20, 147)'}  # neon pink

fig = px.bar(df_avg_melted, 
             x='Era', 
             y='AverageScore', 
             color='ScoreBy',
             barmode='group', 
             height=400, 
             title='Average Score By Era',
            color_discrete_map=color_discrete_map)

# Update theme
fig.update_layout(template='plotly_white', title=dict(x=0.5, xanchor="center"), legend_title='Scorer')

fig.show()
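
The era-by-era comparison can also be read straight from the averages instead of from the bars. In this sketch the numbers are typed in from the df_avg output printed earlier (rounded as shown there); `era_avg` and the `Gap` column are names I made up for illustration.

```python
import pandas as pd

# Averages copied from the df_avg output (rounded as printed)
era_avg = pd.DataFrame({
    "Era": ["1989", "Evermore", "Fearless", "Folklore", "Lover",
            "Midnights", "Red", "Reputation", "Speak Now"],
    "Ben": [9.0, 6.833333, 7.0, 6.0, 8.0, 8.428571, 8.5, 9.5, 6.333333],
    "Bethany": [8.8, 9.166667, 9.333333, 8.714286, 7.833333,
                9.0, 8.0, 10.0, 9.0],
})

# Positive gap: Bethany's average is higher; negative gap: Ben's is higher
era_avg["Gap"] = era_avg["Bethany"] - era_avg["Ben"]

bethany_higher = era_avg.loc[era_avg["Gap"] > 0, "Era"].tolist()
ben_higher = era_avg.loc[era_avg["Gap"] < 0, "Era"].tolist()

print(len(bethany_higher), bethany_higher)
print(len(ben_higher), ben_higher)
```

For these averages, Bethany comes out ahead in six eras and I come out ahead in three: 1989, Lover, and Red.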

## Closing Thoughts

Bethany is clearly the bigger Taylor Swift fan. In each visualization, we saw that Bethany generally rated songs higher than I did. This is not to say I didn't enjoy myself though - I had a blast! 

Scores aside, it's neat to take a random life experience such as this and apply some data analytics to it. Data is everywhere if you're willing to look, and this was a fun little project. 