# Exploring Spotify and Other Stuff

## Research Questions

How have popular songs changed over the years in terms of their musical qualities, such as key, tempo, and rhythm?

How have new technologies influenced the music people listen to? Have danceability and energy changed over time?

Is there a correlation between specific qualities of music and artist success, whether measured by sales or by other indicators such as awards, news mentions, or social media followers?

Is there a correlation between artists' social media followers and sales/success?

Is there a correlation between artists' followers on social media and streams?
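
As a rough sketch of how the first question might be approached, the yearly averages can be grouped by decade to smooth out year-to-year noise. The small table below is a made-up stand-in for `data_by_year.csv` (only the column names `year`, `tempo`, and `danceability` come from the cleaning code), so the numbers carry no meaning.

```python
import pandas as pd

# Toy stand-in for data_by_year.csv; every value here is made up for illustration.
year_data = pd.DataFrame({
    "year": [1985, 1988, 1994, 1999, 2005, 2008],
    "tempo": [110.0, 114.0, 116.0, 120.0, 121.0, 123.0],
    "danceability": [0.50, 0.54, 0.56, 0.60, 0.61, 0.63],
})

# Bucket each year into its decade (1985 -> 1980, 1994 -> 1990, ...).
year_data["decade"] = (year_data["year"] // 10) * 10

# Average each audio feature within a decade.
decade_means = year_data.groupby("decade")[["tempo", "danceability"]].mean()
print(decade_means)
```

With the real file, the same grouping could be applied to any of the audio feature columns, and the result plotted as a line chart per feature.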

## Data cleaning
- Have an initial draft of your data cleaning appendix
- Document every step that takes your raw data files and turns them into the analysis-ready dataset that you would submit with your final project
- All of your data cleaning code should be found in this section and you may want to explain the steps of your data cleaning in words as well


In [24]:
import pandas as pd

raw_year_data = pd.read_csv("data_by_year.csv")
raw_genre_data = pd.read_csv("data_by_genres.csv")
raw_artist_data = pd.read_csv("data_by_artist.csv")

#Keeping only the relevant columns (.copy() gives independent frames, so later edits don't trigger SettingWithCopyWarning)
new_year_data = raw_year_data[['year','danceability', 'duration_ms','instrumentalness','liveness','loudness','tempo','valence','popularity']].copy()
new_genre_data = raw_genre_data[['genres','danceability', 'duration_ms','instrumentalness','liveness','loudness','tempo','valence','popularity']].copy()
new_artist_data = raw_artist_data[['artists','danceability', 'duration_ms','instrumentalness','liveness','loudness','tempo','valence','popularity']].copy()

new_artist_data.head()

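#Cleaning Artist Data

#iloc and drop_duplicates return new DataFrames, so the results must be assigned back
#slice first: the 27261 cutoff refers to row positions in the original file
new_artist_data = new_artist_data.iloc[0:27261] #keeps the first 27261 rows; the rows after that are artists with names that aren't English
new_artist_data = new_artist_data.drop_duplicates()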


#Cleaning Genre Data
new_genre_data = new_genre_data.iloc[2:].copy() #removing first two nonsensical rows
#do we want to try and consolidate some of these genres? There are over 2000 and many overlap ...
new_genre_data['genres'] = new_genre_data['genres'].str.lower() #lowercase the genre names
new_genre_data #display the cleaned table


| | genres | danceability | duration_ms | instrumentalness | liveness | loudness | tempo | valence | popularity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | a cappella | 0.577017 | 193652.204709 | 0.003799 | 0.127087 | -12.770211 | 111.813230 | 0.453186 | 43.351819 |
| 3 | abstract | 0.459500 | 343018.500000 | 0.791400 | 0.119480 | -14.092000 | 124.743200 | 0.304990 | 41.500000 |
| 4 | abstract beats | 0.694400 | 233824.400000 | 0.349403 | 0.102453 | -6.699800 | 119.398400 | 0.634187 | 58.600000 |
| 5 | abstract hip hop | 0.723132 | 249095.103216 | 0.002853 | 0.168032 | -7.216007 | 112.160287 | 0.584392 | 43.804971 |
| 6 | accordeon | 0.626333 | 162613.333333 | 0.616000 | 0.252667 | -10.736667 | 114.522000 | 0.543667 | 28.666667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2659 | zolo | 0.560365 | 267545.304878 | 0.152665 | 0.190792 | -11.499268 | 123.283566 | 0.596705 | 33.760410 |
| 2660 | zouglou | 0.834000 | 295147.000000 | 0.000000 | 0.082800 | -13.455000 | 119.039000 | 0.951000 | 56.000000 |
| 2661 | zouk | 0.752762 | 295109.952381 | 0.301195 | 0.083224 | -10.864476 | 101.681762 | 0.844381 | 42.476190 |
| 2662 | zouk riddim | 0.776000 | 229333.000000 | 0.565000 | 0.044500 | -14.316000 | 99.981000 | 0.966000 | 24.000000 |
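
The hard-coded row cutoff used for the artist data depends on the file's row order. One possible alternative, sketched here on made-up names, is to keep only names written entirely in ASCII characters. This is an assumption about what "English" means, and a crude one: it would also drop names with accented letters.

```python
import pandas as pd

# Toy stand-in for the artist table; names and values are invented for illustration.
artists = pd.DataFrame({
    "artists": ["The Examples", "Señor Ejemplo", "例のバンド", "Plain Name"],
    "popularity": [50.0, 45.0, 40.0, 35.0],
})

# Keep rows whose name contains only ASCII characters.
# Assumption: ASCII-only is used as a rough proxy for "English name".
is_ascii = artists["artists"].map(lambda name: name.isascii())
ascii_artists = artists[is_ascii].reset_index(drop=True)
print(ascii_artists)
```

Because the filter looks at the names themselves, it keeps working if the source file is re-sorted or updated.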


## Data description 
- Have an initial draft of your data description section
- Your data description should be about your analysis-ready data

## Data limitations 
- Identify any potential problems with your dataset
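
A few quick checks can help surface problems to discuss here. The sketch below runs on a made-up table with issues planted on purpose; which checks matter for the real Spotify files is an open question, and the zero-tempo check in particular is only an assumed example of an implausible value.

```python
import pandas as pd

# Toy table with deliberately planted problems; all values are made up.
df = pd.DataFrame({
    "genres": ["rock", "rock", "jazz", "pop"],
    "tempo": [120.0, 120.0, None, 0.0],
    "popularity": [55.0, 55.0, 40.0, 30.0],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Count rows that exactly repeat an earlier row.
n_duplicate_rows = int(df.duplicated().sum())

# Count implausible values, here a tempo of exactly 0 BPM.
n_zero_tempo = int((df["tempo"] == 0).sum())

print(missing_per_column)
print("duplicate rows:", n_duplicate_rows)
print("zero-tempo rows:", n_zero_tempo)
```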

## Exploratory data analysis
- Perform an (initial) exploratory data analysis
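
One possible starting point, sketched on made-up numbers, is a summary table plus the correlation of each audio feature with `popularity`. The column names match those kept in the cleaning code; the values do not come from the real data.

```python
import pandas as pd

# Toy stand-in for one of the cleaned tables; all values are made up.
df = pd.DataFrame({
    "danceability": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-5.0, -12.0, -8.0, -10.0],
    "popularity": [10.0, 20.0, 30.0, 40.0],
})

# Summary statistics (count, mean, std, quartiles) for every numeric column.
summary = df.describe()

# Pearson correlation of each feature with popularity, strongest first.
corr_with_popularity = (
    df.corr()["popularity"].drop("popularity").sort_values(ascending=False)
)

print(summary)
print(corr_with_popularity)
```

A correlation near zero here would not rule out a relationship, since Pearson correlation only captures linear association.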

## Questions for reviewers
- We want to consolidate the genres we have (currently there are over 2000 unique genres), but we're not sure of the best way to do this. Is it easiest to do by hand?
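
One option short of doing everything by hand is a keyword lookup that maps each fine-grained genre to a parent genre. The sketch below uses a few genre names from the table above; the rule list `PARENT_KEYWORDS` and the `consolidate` helper are hypothetical and would need to be extended and checked against the full list.

```python
import pandas as pd

# Hypothetical keyword -> parent genre rules; checked in order, first match wins.
PARENT_KEYWORDS = [
    ("hip hop", "hip hop"),
    ("rock", "rock"),
    ("jazz", "jazz"),
    ("zouk", "zouk"),
]

def consolidate(genre: str) -> str:
    """Return the parent genre for a genre name, or 'other' if no rule matches."""
    name = genre.lower()
    for keyword, parent in PARENT_KEYWORDS:
        if keyword in name:
            return parent
    return "other"

# A handful of genre names from the cleaned genre table.
genres = pd.DataFrame(
    {"genres": ["abstract hip hop", "zouk", "zouk riddim", "a cappella"]}
)
genres["parent_genre"] = genres["genres"].map(consolidate)
print(genres)
```

Genres that fall into `other` could then be reviewed by hand, which should be a much shorter list than the full 2000. Substring matching can misfire on names that merely contain a keyword, so a spot check of each parent group is worth doing.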
