## Data Analysis

Aaron Wollman, Albin Joseph, Kelsey Richardson Blackwell, Will Huang

In this notebook, we analyzed whether the measure of “musical positiveness” in the Top 100 hits and the U.S. unemployment data have a strong correlation. Is the correlation strong enough to predict next month? Are there other attributes besides happiness that have a stronger correlation, such as danceability, energy, tempo, or speech?

In [None]:
%matplotlib inline

In [None]:
# Dependencies
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statistics
import numpy as np
from scipy.stats import linregress
import scipy.stats as st

In [None]:
# Constants


In [None]:
# Pull data to a DataFrame
data = pd.read_csv('../data/music_and_unemployment.csv')
data.drop('Unnamed: 0',axis=1,inplace=True)
data.head()

In [None]:
# Aaron's Code

In [None]:
# End of Aaron's Code

In [None]:
# Albin's Code

In [None]:
# End of Albin's Code


## Unemployment Rate vs. Happiness

We ran a regression of happiness ("valence") against the unemployment rate and found that the unemployment rate has almost no linear relationship with the happiness of a Top 100 hit song. In the plot below, the data points are visibly scattered. The r value is 0.1, which means there is almost no relationship between happiness in a song and the unemployment rate.

So we decided to dig a little deeper and look at the other attributes.
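To put an r value of 0.1 in context: squaring it gives r² = 0.01, so a linear fit would account for only about 1% of the variance. The sketch below shows how r, r², and the p-value can be read from `linregress`. The numbers are made up for illustration and are not the notebook's data.

```python
import numpy as np
from scipy.stats import linregress

# Made-up illustrative values (assumption), not the notebook's dataset
rng = np.random.default_rng(0)
unemployment = rng.uniform(3.5, 10.0, size=200)
weighted_valence = 25 + 0.05 * unemployment + rng.normal(0, 3, size=200)

# linregress returns the slope, intercept, correlation coefficient and p-value
result = linregress(unemployment, weighted_valence)
r = result.rvalue
r_squared = r ** 2  # share of variance explained by the linear fit

print(f"r = {r:.2f}, r^2 = {r_squared:.3f}, p = {result.pvalue:.3f}")
```

A small r together with a large p-value suggests the fitted line is not distinguishable from a flat one.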

In [None]:
# Create a new column, "weighed valence": valence weighted by chart placement
# (placement 1 gets weight 100, placement 100 gets weight 1)
data["weighed valence"] = data["valence"] * (101 - data["Placement"])

In [None]:
# Group by the song's date
data_gb = data.groupby(["Year", "Month", "Day"])

# Find the average of unemployment rate and weighed valence for each date
# (select the columns before mean() so non-numeric columns are never averaged)
rate_v_valence = data_gb[["Unemployment Rate", "weighed valence"]].mean()

# Create a Scatter Graph
rate_v_valence.plot(kind="scatter", x = "Unemployment Rate", y = "weighed valence")

# Calculate the correlation coefficient and linear regression model 
x_values = rate_v_valence["Unemployment Rate"]
y_values = rate_v_valence["weighed valence"]

(slope, intercept, rvalue, pvalue, stderr) = linregress(x_values, y_values)
regress_values = x_values * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))

plt.plot(x_values,regress_values,"r-")
plt.annotate(line_eq,(5,22),fontsize=15,color="red")

plt.title("Unemployment Rate vs. Valence (Happiness)")
plt.xlabel("Unemployment Rate")
plt.ylabel("Weighed Valence (Happiness)")
plt.show()

In [None]:
# Weight energy and tempo by chart placement, the same way as valence
data["weighed energy"] = data["energy"] * (101 - data["Placement"])
data["weighed tempo"] = data["tempo"] * (101 - data["Placement"])

## Unemployment Rate vs. Energy

We ran a regression of weighted energy against the unemployment rate and found a positive relationship between the energy in a song and the unemployment rate.
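For reference, the weighted columns used in these regressions scale each feature by `101 - Placement`, so a number-one song counts 100 times as much as the song in position 100. A minimal sketch with three made-up rows (the values are assumptions; only the column names follow the dataset):

```python
import pandas as pd

# Made-up rows for illustration (assumption); column names follow the notebook
sample = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "Month": [7, 7, 7],
    "Day": [25, 25, 25],
    "Placement": [1, 50, 100],
    "energy": [0.80, 0.60, 0.40],
})

# Placement 1 gets weight 100, placement 100 gets weight 1
sample["weight"] = 101 - sample["Placement"]
sample["weighed energy"] = sample["energy"] * sample["weight"]

# Average the weighted values per chart date, as the regression cells do
per_date = sample.groupby(["Year", "Month", "Day"])["weighed energy"].mean()
print(per_date)
```

Note that this is a plain mean of weighted values, not a weighted average (which would divide by the sum of the weights), so the result is on a different scale from the raw 0 to 1 feature.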

In [None]:
# Group by the song's date
data_gb = data.groupby(["Year", "Month", "Day"])

# Find the average of unemployment rate and weighed energy for each date
rate_v_energy = data_gb[["Unemployment Rate", "weighed energy"]].mean()

# Create a Scatter Graph
rate_v_energy.plot(kind="scatter", x = "Unemployment Rate", y = "weighed energy")

# Calculate the correlation coefficient and linear regression model 
x_values = rate_v_energy["Unemployment Rate"]
y_values = rate_v_energy["weighed energy"]

(slope, intercept, rvalue, pvalue, stderr) = linregress(x_values, y_values)
regress_values = x_values * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))

plt.plot(x_values,regress_values,"r-")
plt.annotate(line_eq,(5,22),fontsize=15,color="red")

plt.title("Unemployment Rate vs. Energy")
plt.xlabel("Unemployment Rate")
plt.ylabel("Weighed Energy")
plt.show()

## Unemployment Rate vs. Tempo

We ran a regression of weighted tempo against the unemployment rate and found a slight negative relationship between tempo in a song and the unemployment rate.
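Since the same group, average, and regress steps are repeated for each attribute, they could be wrapped in one function so the r values can be compared side by side. The sketch below is one way to do that; `regress_on_unemployment` is a hypothetical helper, not part of the notebook, and the demo data is randomly generated.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def regress_on_unemployment(df, column):
    """Hypothetical helper: regress per-date means of `column` on unemployment rate."""
    per_date = df.groupby(["Year", "Month", "Day"])[["Unemployment Rate", column]].mean()
    fit = linregress(per_date["Unemployment Rate"], per_date[column])
    return fit.slope, fit.intercept, fit.rvalue

# Randomly generated demo data (assumption): 24 chart dates, 5 songs per date
rng = np.random.default_rng(1)
n_dates, songs_per_date = 24, 5
dates = pd.DataFrame({
    "Year": 2019 + np.arange(n_dates) // 12,
    "Month": np.arange(n_dates) % 12 + 1,
    "Day": 1,
    "Unemployment Rate": rng.uniform(3.5, 10.0, n_dates),
})
demo = dates.loc[dates.index.repeat(songs_per_date)].reset_index(drop=True)
demo["weighed energy"] = rng.uniform(10, 60, len(demo))
demo["weighed tempo"] = rng.uniform(2000, 8000, len(demo))

# Compare the fit for each attribute
for column in ["weighed energy", "weighed tempo"]:
    slope, intercept, r = regress_on_unemployment(demo, column)
    print(f"{column}: y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")
```

With the real DataFrame, the same loop could cover valence, energy, and tempo in one pass.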

In [None]:
# Group by the song's date
data_gb = data.groupby(["Year", "Month", "Day"])

# Find the average of unemployment rate and weighed tempo for each date
rate_v_tempo = data_gb[["Unemployment Rate", "weighed tempo"]].mean()

# Create a Scatter Graph
rate_v_tempo.plot(kind="scatter", x = "Unemployment Rate", y = "weighed tempo")

# Calculate the correlation coefficient and linear regression model 
x_values = rate_v_tempo["Unemployment Rate"]
y_values = rate_v_tempo["weighed tempo"]

(slope, intercept, rvalue, pvalue, stderr) = linregress(x_values, y_values)
regress_values = x_values * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))

plt.plot(x_values,regress_values,"r-")
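# Position the equation relative to the axes so it stays visible whatever the data range
plt.annotate(line_eq,(0.05,0.05),xycoords="axes fraction",fontsize=15,color="red")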

plt.title("Unemployment Rate vs. Tempo")
plt.xlabel("Unemployment Rate")
plt.ylabel("Weighed Tempo")
plt.show()

In [None]:
# End of Kelsey's Code

In [None]:
# Will's Code

In [None]:
# End of Will's Code

## Conclusion

Happiness in a song did not have a strong correlation with the U.S. unemployment rate. However, we did find that energy has a positive correlation with it. When the unemployment rate in the U.S. is high, the top Billboard songs are more likely to have higher energy than when the unemployment rate is low.

This is not great news for Taylor Swift's new album "folklore" that came out last week.