# Ed Sheeran's Spotify Song Analysis 


Welcome to the world of Ed Sheeran! In this project, we will read and analyze some interesting facts from Ed Sheeran's song list on Spotify. The dataset contains the following variables:




| Variable Name | Data Type | Explanation |
| -------- | ------- | ------- |
| `'uri'`  | str | Unique identifier for the song in Spotify. |
| `'id'`  | str | Unique id for the song in Spotify. |
| `'album'`  | str | Album name. |
| `'name'`  | str | Song name. |
| `'release_date'`  | str | Release date (year, month, and day). |
| `'track_number'`  | int | The number of the track on the specified disc. |
| `'popularity'`  | int | 0 to 100 scale of the current popularity of the song. |
| `'danceability'`  | float | 0 to 1 scale of how suitable a track is for dancing. |
| `'energy'`  | float | 0 to 1 scale of how energetic a track is. |
| `'loudness'`  | float | The average loudness of a track, measured on a relative scale in decibels. Values typically range between -60 (softer) and 0 (louder). |
| `'speechiness'`  | float | 0 to 1 scale measuring the prevalence of spoken words. |
| `'acousticness'`  | float | 0 to 1 scale measuring how likely a track is to be acoustic. |
| `'instrumentalness'`  | float | 0 to 1 scale measuring how likely a track is to be instrumental (without vocals). |
| `'liveness'`  | float | 0 to 1 scale measuring how likely a track is to have been recorded with a live audience.|
| `'valence'`  | float | 0 to 1 scale of how positive or happy a track is. |
| `'tempo'`  | float | The estimated number of beats per minute. |
| `'duration_ms'`  | int | Length of song in milliseconds. |
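Before analyzing the data, it can help to confirm that each column actually stays within the range documented in the table above. The sketch below is a minimal example, assuming the column names from the table; the helper `check_ranges` and the toy rows (with made-up values) are hypothetical and not part of the project's module.

```python
import pandas as pd

# Columns documented in the table as 0-to-1 scales.
UNIT_SCALE_COLS = ["danceability", "energy", "speechiness", "acousticness",
                   "instrumentalness", "liveness", "valence"]


def check_ranges(df: pd.DataFrame) -> list:
    """Return the names of columns whose values fall outside their documented range."""
    bad = []
    for col in UNIT_SCALE_COLS:
        # `col in df` tests whether the dataframe has that column.
        if col in df and not df[col].between(0, 1).all():
            bad.append(col)
    if "popularity" in df and not df["popularity"].between(0, 100).all():
        bad.append("popularity")
    return bad


# Toy rows with hypothetical values, for illustration only.
toy = pd.DataFrame({
    "name": ["Song A", "Song B"],
    "popularity": [80, 105],  # 105 is deliberately out of range
    "energy": [0.5, 0.7],
})
print(check_ranges(toy))  # ['popularity']
```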


## Install libraries


This notebook requires pandas, NumPy, and matplotlib; install them before running the cells below.


In [None]:
!pip install pandas numpy matplotlib

In [None]:
import pandas as pd
import numpy as np
import os

from matplotlib import pyplot as plt

plt.style.use('ggplot')
plt.rcParams["figure.figsize"] = (10, 5)






Import the analysis functions and test functions from my module, `My_module.functions`.

In [None]:


from My_module.functions import *



# Part 1: Data Processing
Running this cell reads the CSV file, passes it to `data_processing`, and stores the processed dataframe in the variable `data`.

In [None]:
file = pd.read_csv('ed_sheeran_spotify.csv')

data = data_processing(file)

data
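The body of `data_processing` is not shown in this notebook. As a hedged illustration only, a cleaning step for this dataset might drop duplicate tracks and parse the `'release_date'` string into a real date, so that months (and seasons) can be derived later. The function `data_processing_sketch` below is hypothetical; the real `data_processing` may do something different.

```python
import pandas as pd


def data_processing_sketch(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; the project's data_processing may differ."""
    # Keep one row per Spotify track id.
    df = raw.drop_duplicates(subset="id").copy()
    # release_date is stored as a string; parse it so the month can be extracted later.
    # errors="coerce" turns unparseable dates into NaT instead of raising.
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    return df.reset_index(drop=True)


# Toy rows with hypothetical values.
raw = pd.DataFrame({
    "id": ["a", "a", "b"],
    "name": ["Song A", "Song A", "Song B"],
    "release_date": ["2017-03-03", "2017-03-03", "2021-10-29"],
})
clean = data_processing_sketch(raw)
print(len(clean), clean["release_date"].dt.month.tolist())  # 2 [3, 10]
```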








# Part 2: Get the best album


The function `best_album` returns the name of the album with the highest average value of the input trait (variable) in the dataframe. Example traits include 'popularity', 'danceability', and 'valence'.

In the following example, we pass the trait 'popularity' as the parameter. The returned album name is the album whose songs have the highest average popularity.

In [6]:
best_album(data, 'popularity')



'÷ (Deluxe)'
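A minimal sketch of the same idea, assuming `best_album` averages the trait over each album's songs and picks the largest mean. `best_album_sketch` and the toy rows are hypothetical. Selecting the trait column before calling `mean()` matters: in pandas 2.0 and later, averaging a whole grouped dataframe that still contains string columns such as `'name'` raises a `TypeError` (older versions emitted a warning instead).

```python
import pandas as pd


def best_album_sketch(data: pd.DataFrame, trait: str) -> str:
    """Return the album whose songs have the highest mean value of `trait`."""
    # Select the trait column first so non-numeric columns never reach mean().
    album_means = data.groupby("album")[trait].mean()
    # idxmax returns the index label (album name) of the largest mean.
    return album_means.idxmax()


# Toy rows with hypothetical values.
toy = pd.DataFrame({
    "album": ["X", "X", "Y"],
    "name": ["s1", "s2", "s3"],
    "popularity": [50, 70, 65],
})
print(best_album_sketch(toy, "popularity"))  # Y (mean 65 vs. 60 for X)
```

If two albums tie, `idxmax` returns the first one in the grouped order, which is alphabetical by album name by default.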

# Part 3: Get a song recommendation

The function `song_recommender` returns the name of the song that is most similar to the input song with respect to the input trait.

In the following example, we use the trait 'loudness'. The returned song is the one whose loudness value is closest to that of the input song 'Supermarket Flowers'.

In [8]:

song_recommender('Supermarket Flowers', data, 'loudness')

'Firefly - Bravado Dubstep Remix'
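For a single trait, the Euclidean distance between two songs reduces to the absolute difference of their values, so a nearest-neighbor lookup only needs `np.abs` and `np.argmin`. The sketch below assumes that approach; `song_recommender_sketch` and the toy loudness values are hypothetical, and the project's own function (which also computes variance values) may differ. The input song is excluded first; otherwise it would always match itself at distance 0.

```python
import numpy as np
import pandas as pd


def song_recommender_sketch(song: str, data: pd.DataFrame, trait: str) -> str:
    """Return the song (other than `song`) whose `trait` value is closest to it."""
    target = data.loc[data["name"] == song, trait].iloc[0]
    # Drop every row with the input song's name so it cannot recommend itself.
    others = data[data["name"] != song]
    # 1-D Euclidean distance is just the absolute difference.
    distances = np.abs(others[trait].to_numpy() - target)
    # argmin gives a position, so use iloc (not loc) to fetch the name.
    return others["name"].iloc[int(np.argmin(distances))]


# Toy rows with hypothetical loudness values (dB).
toy = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "loudness": [-9.0, -6.5, -8.6],
})
print(song_recommender_sketch("Song A", toy, "loudness"))  # Song C
```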

# Part 4: Get the best season

The function `best_season` returns the season with the highest value of the input trait.

In the following example, we use the trait 'liveness'. The returned season 'summer' means that songs released in summer have the highest total liveness compared with songs released in spring, fall, or winter.

In [10]:
best_season(data, 'liveness')

'summer'
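How months are grouped into seasons is not stated here, so the sketch below assumes Northern Hemisphere meteorological seasons (December to February is winter, and so on) and sums the trait per season, matching the "highest total" wording above. `best_season_sketch`, `MONTH_TO_SEASON`, and the toy rows are hypothetical.

```python
import pandas as pd

# Assumed mapping: Northern Hemisphere meteorological seasons.
MONTH_TO_SEASON = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}


def best_season_sketch(data: pd.DataFrame, trait: str) -> str:
    """Return the season whose releases have the largest summed `trait`."""
    months = pd.to_datetime(data["release_date"]).dt.month
    seasons = months.map(MONTH_TO_SEASON)
    # Group the trait by the season label aligned on the same index, then sum.
    totals = data[trait].groupby(seasons).sum()
    return totals.idxmax()


# Toy rows with hypothetical values.
toy = pd.DataFrame({
    "release_date": ["2017-03-03", "2017-07-14", "2019-07-12", "2021-10-29"],
    "liveness": [0.10, 0.30, 0.25, 0.40],
})
print(best_season_sketch(toy, "liveness"))  # summer
```

A total favors seasons with more releases; replacing `.sum()` with `.mean()` would compare the typical song in each season instead.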

# Part 5: Get the correlation plot

The function `relation_graph` draws a plot showing the relationship between two input traits of our choice.

In the following example, it draws a scatter plot and the line of best fit showing the correlation between the trait 'valence' and the trait 'tempo'.

In [None]:
relation_graph(data, 'valence', 'tempo')
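A line of best fit is a degree-1 least-squares polynomial, which `np.polyfit` computes directly, and `np.corrcoef` gives Pearson's r for the same pair of columns. The sketch below assumes that is roughly what `relation_graph` does; `relation_graph_sketch` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def relation_graph_sketch(data: pd.DataFrame, x_trait: str, y_trait: str):
    """Scatter two traits, overlay a least-squares line, and return the fit."""
    x = data[x_trait].to_numpy(dtype=float)
    y = data[y_trait].to_numpy(dtype=float)
    # Degree-1 polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    # Pearson correlation coefficient between the two traits.
    r = np.corrcoef(x, y)[0, 1]

    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.6)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="black", label=f"best fit (r = {r:.2f})")
    ax.set_xlabel(x_trait)
    ax.set_ylabel(y_trait)
    ax.legend()
    return fig, slope, intercept, r


# Toy rows with hypothetical values lying exactly on tempo = 100 + 50 * valence.
toy = pd.DataFrame({"valence": [0.1, 0.4, 0.7, 0.9],
                    "tempo": [105.0, 120.0, 135.0, 145.0]})
fig, slope, intercept, r = relation_graph_sketch(toy, "valence", "tempo")
```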

# Extra credit 


For this project, I spent 6 to 8 hours learning from the NumPy and pandas documentation. Having had zero coding background before taking this class, I found it extremely challenging to use different libraries to achieve my goals. Knowing that many tasks could be achieved simply by passing parameters and checking the corresponding NumPy documentation, I tried to double my efforts and combine multiple tasks in each function. Going beyond plotting, finding the maximum value, and locating a specific row, I attempted to combine several tasks at once. For instance, in my function `song_recommender`, I had to not only write a for loop but also calculate distance and variance values, store values in NumPy arrays, and create a new dataframe to present the results more clearly. Throughout the design process, I went beyond the minimum requirements and learned techniques for plotting, sorting, and locating data, and even learned how the Euclidean distance formula is used in data analysis. It frustrated me a lot when I had to go back and forth fixing small errors involving `loc` and `iloc` and run pytest multiple times, but I found this period of exploration and trial and error rewarding and satisfying.

Side note!!
