# Introduction

### Who am I?
Hello 👋! 

My name is Mark Andal, and I am a 4th year computer engineering student at UCLA. 👨‍🔬🔧🔌💻

One of my interests is music 🎶🎵🎼, and so I thought it would be cool to explore it by asking:

### What Makes a Song Popular? 🔥
In this project, I hope to analyze what makes a hit song. Do hit songs share similarities in their metadata (e.g. bpm, variability)? What motivates people to listen to these types of songs? Are there specific target values that songwriters and producers aim for?

As someone who considers music an important part of their life and who is interested in music production (though I have not dabbled in it yet), analyzing music trends sounds like a very fun project. This kind of research could also help others understand the statistics behind trending songs and why the top songs they hear, and potentially enjoy, sound so similar. There is definitely no equation for creating a top song, but knowing the data and putting it in context could influence the music industry.

### Data Sources 📊
I will use Spotify data that was scraped and collected by other users. For example, the
[Top Spotify Songs from 2010-2019 Dataset from Kaggle](https://www.kaggle.com/leonardopena/top-spotify-songs-from-20102019-by-year)
includes a list of the songs as well as metadata provided by Spotify, such as bpm and duration.

Additionally, I may consider using the Spotify API to collect more data on each song if necessary.

### Scope & Visualizations 🔍
I hope to compute general summary statistics (mean, median, mode, max, etc.) for each metadata category across all the songs. I would also like to compare these statistics by year and look for trends or correlations in the metadata. This will probably involve a lot of line, bar, and scatter plots as well as histograms.


### Predictions and Insight 💡
Analyzing this data, as well as other potential areas of interest (such as how popularity differs globally), should provide insights into trends among popular songs. Breaking down the relationships and correlations might, for example, show that people prefer songs with a high 'danceability' rating and a high bpm. There are many insights to be found in the different information about these popular songs. Music is a science and an art. Understanding this can help interpret the scientific approach to this art, and it can also lead to a deeper appreciation of music making.

# Data Exploration

In [1]:
import pandas as pd

__Taking a look at some of the data__ 

In [4]:
top10s_df = pd.read_csv('top10s.csv')
top10s_df.head()

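| | Unnamed: 0 | title | artist | top genre | year | bpm | nrgy | dnce | dB | live | val | dur | acous | spch | pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Hey, Soul Sister | Train | neo mellow | 2010 | 97 | 89 | 67 | -4 | 8 | 80 | 217 | 19 | 4 | 83 |
| 1 | 2 | Love The Way You Lie | Eminem | detroit hip hop | 2010 | 87 | 93 | 75 | -5 | 52 | 64 | 263 | 24 | 23 | 82 |
| 2 | 3 | TiK ToK | Kesha | dance pop | 2010 | 120 | 84 | 76 | -3 | 29 | 71 | 200 | 10 | 14 | 80 |
| 3 | 4 | Bad Romance | Lady Gaga | dance pop | 2010 | 119 | 92 | 70 | -4 | 8 | 71 | 295 | 0 | 4 | 79 |
| 4 | 5 | Just the Way You Are | Bruno Mars | pop | 2010 | 109 | 84 | 64 | -5 | 9 | 43 | 221 | 2 | 4 | 78 |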


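__Spotify metadata column descriptions (cross-referenced with the Spotify website):__

Note: Spotify describes several of these features on a 0.0 to 1.0 scale, but in this dataset they appear as integers on a 0 to 100 scale (for example, an nrgy of 89 in the first sample row).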

__bpm__

Beats Per Minute - The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

__nrgy__

Energy - Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

__dnce__

Danceability - Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

__dB__

Loudness - The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

__live__

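Liveness - Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

__val__

Valence - A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).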

__dur__

Duration - The duration of the track in seconds.

__acous__

Acousticness - A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

__spch__

Speechiness - Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

__pop__

Popularity - The higher the value, the more popular the song is.
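Since the abbreviated column names are easy to mix up, one option is to rename them before analysis. The following is only a sketch: the readable names and the `tidy_columns` helper are my own hypothetical choices, not anything defined by Spotify or Kaggle, and the demo frame reuses a few values from the first sample row.

```python
import pandas as pd

# Hypothetical readable names for the abbreviated columns (my own choice).
READABLE_NAMES = {
    "top genre": "genre",
    "bpm": "tempo_bpm",
    "nrgy": "energy",
    "dnce": "danceability",
    "dB": "loudness_db",
    "live": "liveness",
    "val": "valence",
    "dur": "duration_s",
    "acous": "acousticness",
    "spch": "speechiness",
    "pop": "popularity",
}


def tidy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the leftover index column dropped and columns renamed."""
    # "Unnamed: 0" is the row counter saved in the CSV; ignore it if absent.
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    return df.rename(columns=READABLE_NAMES)


# Small demo frame using a subset of the columns.
demo = pd.DataFrame(
    {
        "Unnamed: 0": [1],
        "title": ["Hey, Soul Sister"],
        "top genre": ["neo mellow"],
        "bpm": [97],
        "dnce": [67],
        "pop": [83],
    }
)
tidy = tidy_columns(demo)
print(list(tidy.columns))
```

With the real data this would be called as `tidy_columns(top10s_df)`; the original frame is left unchanged.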

__Getting summary statistics__

In [7]:
top10s_df.describe()

| | Unnamed: 0 | year | bpm | nrgy | dnce | dB | live | val | dur | acous | spch | pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 |
| mean | 302.0 | 2014.59204 | 118.545605 | 70.504146 | 64.379768 | -5.578773 | 17.774461 | 52.225539 | 224.674959 | 14.3267 | 8.358209 | 66.52073 |
| std | 174.215384 | 2.607057 | 24.795358 | 16.310664 | 13.378718 | 2.79802 | 13.102543 | 22.51302 | 34.130059 | 20.766165 | 7.483162 | 14.517746 |
| min | 1.0 | 2010.0 | 0.0 | 0.0 | 0.0 | -60.0 | 0.0 | 0.0 | 134.0 | 0.0 | 0.0 | 0.0 |
| 25% | 151.5 | 2013.0 | 100.0 | 61.0 | 57.0 | -6.0 | 9.0 | 35.0 | 202.0 | 2.0 | 4.0 | 60.0 |
| 50% | 302.0 | 2015.0 | 120.0 | 74.0 | 66.0 | -5.0 | 12.0 | 52.0 | 221.0 | 6.0 | 5.0 | 69.0 |
| 75% | 452.5 | 2017.0 | 129.0 | 82.0 | 73.0 | -4.0 | 24.0 | 69.0 | 239.5 | 17.0 | 9.0 | 76.0 |
| max | 603.0 | 2019.0 | 206.0 | 98.0 | 97.0 | -2.0 | 74.0 | 98.0 | 424.0 | 99.0 | 48.0 | 99.0 |
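One thing that stands out in the summary is the min row: a bpm of 0 and a dB of -60 look more like a placeholder entry than a real song. That is only my guess, so it is worth inspecting those rows before dropping anything. Here is a rough sketch of how that check might look; the helper names and the tiny demo frame are made up for illustration and are not rows from `top10s.csv`.

```python
import pandas as pd


def find_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose tempo is 0, which a real song should not have."""
    return df[df["bpm"] == 0]


def drop_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without the zero-tempo rows, with a fresh index."""
    return df[df["bpm"] > 0].reset_index(drop=True)


# Made-up frame for illustration only.
demo = pd.DataFrame(
    {
        "title": ["song a", "song b", "song c"],
        "bpm": [120, 0, 97],
        "dB": [-5, -60, -4],
        "pop": [80, 0, 83],
    }
)
suspects = find_suspect_rows(demo)
cleaned = drop_suspect_rows(demo)
print(suspects)
print(cleaned.describe())
```

On the real data, `find_suspect_rows(top10s_df)` would show which songs are affected, and the summary statistics could then be recomputed on the cleaned frame to see how much the minimums change.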


Ideas:
* Make histograms to see how the values in each column are distributed
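As a starting point for the histogram idea, here is a minimal sketch that counts values in fixed-width bins and, optionally, draws them. The function names and the bin width of 10 are my own assumptions, and the demo uses only the five bpm values from the sample rows shown earlier, so it says nothing about the full dataset.

```python
import numpy as np
import pandas as pd


def histogram_counts(values: pd.Series, bin_width: int = 10):
    """Count how many values fall into fixed-width bins.

    Returns (counts, edges); edges has one more entry than counts.
    """
    lo = int(np.floor(values.min() / bin_width) * bin_width)
    hi = int(np.ceil(values.max() / bin_width) * bin_width)
    if hi == lo:  # all values identical and sitting on a bin boundary
        hi = lo + bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


def plot_histogram(values: pd.Series, bin_width: int = 10):
    """Draw the same bins as a bar chart (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    counts, edges = histogram_counts(values, bin_width)
    fig, ax = plt.subplots()
    ax.bar(edges[:-1], counts, width=bin_width, align="edge", edgecolor="black")
    ax.set_xlabel(values.name or "")
    ax.set_ylabel("number of songs")
    return ax


# bpm values from the five sample rows shown by head().
bpm = pd.Series([97, 87, 120, 119, 109], name="bpm")
counts, edges = histogram_counts(bpm)
print(dict(zip(edges[:-1].tolist(), counts.tolist())))
```

For the full data, `plot_histogram(top10s_df["bpm"])` would draw the chart. pandas also offers `top10s_df.hist()` as a quicker way to plot every numeric column at once.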

# Data Analysis
To be completed

Ideas:
* Make a table of the means for each column per year to see how the values change from year to year
* See which artists show up most frequently and what their songs have in common
* See which values correlate most strongly with popularity
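The ideas above could be sketched roughly as follows. The helper names and the `FEATURES` list are my own assumptions, and the demo frame holds only the five sample rows shown earlier (all from 2010), so its numbers are not meaningful results. It just shows the shape of the output.

```python
import pandas as pd

# Numeric columns to summarize (assumed from the column list shown earlier).
FEATURES = ["bpm", "nrgy", "dnce", "dB", "live", "val", "dur", "acous", "spch", "pop"]


def yearly_means(df: pd.DataFrame, features=FEATURES) -> pd.DataFrame:
    """Mean of each feature per year (one row per year)."""
    cols = [c for c in features if c in df.columns]
    return df.groupby("year")[cols].mean()


def top_artists(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """The n artists with the most songs in the list."""
    return df["artist"].value_counts().head(n)


def popularity_correlations(df: pd.DataFrame, features=FEATURES) -> pd.Series:
    """Pearson correlation of each feature with 'pop', strongest first."""
    cols = [c for c in features if c in df.columns and c != "pop"]
    corr = df[cols].corrwith(df["pop"])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# The five sample rows from head(), restricted to a few columns.
demo = pd.DataFrame(
    {
        "artist": ["Train", "Eminem", "Kesha", "Lady Gaga", "Bruno Mars"],
        "year": [2010, 2010, 2010, 2010, 2010],
        "bpm": [97, 87, 120, 119, 109],
        "nrgy": [89, 93, 84, 92, 84],
        "dnce": [67, 75, 76, 70, 64],
        "pop": [83, 82, 80, 79, 78],
    }
)
means = yearly_means(demo)
artists = top_artists(demo)
corr = popularity_correlations(demo)
print(means)
print(artists)
print(corr)
```

On the real data these would be called with `top10s_df` in place of `demo`. Correlation only measures linear association, so a scatter plot of each feature against `pop` would be a useful companion check.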