# The Pandemic's Effect on Music: Exploration

Authors: Natnael Mekonnen, Sonya, and Daniel Parker

## Introduction

Why did we choose this topic, and why is it important?
Do we anticipate any changes?

## 1. Data Scraping

To explore the pandemic's effect on music, we chose to compare US COVID data with the Spotify top 10 list since January 2020. The COVID data will be extracted from The Atlantic's [The COVID Tracking Project](https://covidtracking.com/data/national), which has been updating its data every day with figures from 50 states, 5 territories, and the District of Columbia. The Spotify top 10 will be extracted from [Spotify Charts](https://spotifycharts.com/) on a weekly basis starting in January 2020. The charts do not include detailed information about each song, so we will get more detail by querying the Spotify API and using a Kaggle dataset.

Import the necessary libraries: 

In [4]:
import pandas as pd
import matplotlib.pyplot as pyplt
import numpy as np
import seaborn

## 1.1 Load and View COVID data

In [5]:
covid_data = pd.read_csv('national-history.csv')
covid_data.head()

|  | date | death | deathIncrease | inIcuCumulative | inIcuCurrently | hospitalizedIncrease | hospitalizedCurrently | hospitalizedCumulative | negative | negativeIncrease | onVentilatorCumulative | onVentilatorCurrently | positive | positiveIncrease | recovered | states | totalTestResults | totalTestResultsIncrease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-07 | 274745.0 | 1347 | 32120.0 | 20098.0 | 3614 | 102148.0 | 589486.0 | 163378726 | 1386381 | 3328.0 | 7073.0 | 14717065 | 180193 | 5714557.0 | 56 | 205934174 | 1835388 |
| 1 | 2020-12-06 | 273398.0 | 1146 | 31946.0 | 20145.0 | 2311 | 101501.0 | 585872.0 | 161992345 | 1175503 | 3322.0 | 7095.0 | 14536872 | 177801 | 5624609.0 | 56 | 204098786 | 1648306 |
| 2 | 2020-12-05 | 272252.0 | 2461 | 31831.0 | 19947.0 | 3457 | 101192.0 | 583561.0 | 160816842 | 1530133 | 3321.0 | 7006.0 | 14359071 | 212880 | 5576152.0 | 56 | 202450480 | 2190899 |
| 3 | 2020-12-04 | 269791.0 | 2563 | 31608.0 | 19858.0 | 4652 | 101276.0 | 580104.0 | 159286709 | 1260657 | 3305.0 | 6999.0 | 14146191 | 224831 | 5470389.0 | 56 | 200259581 | 1854869 |
| 4 | 2020-12-03 | 267228.0 | 2706 | 31276.0 | 19723.0 | 5331 | 100755.0 | 575452.0 | 158026052 | 1238465 | 3280.0 | 6867.0 | 13921360 | 210204 | 5404018.0 | 56 | 198404712 | 1828230 |


The COVID data is detailed, but since this project looks at its relationship with music, the columns that are not needed for that comparison will be dropped.

In [6]:
# Drop the columns that are not used in the comparison with the Spotify data
covid_data = covid_data.drop(
    ['death', 'deathIncrease', 'inIcuCumulative', 'inIcuCurrently',
     'hospitalizedIncrease', 'hospitalizedCurrently', 'hospitalizedCumulative',
     'onVentilatorCumulative', 'onVentilatorCurrently', 'recovered', 'states',
     'totalTestResults'],
    axis=1)
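The same pruning can also be written as a selection of the columns to keep, which some readers find easier to audit. The following is a minimal sketch on a small made-up frame, not the project's actual data; the name `keep_cols` and all values in `toy` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for national-history.csv (values are made up for illustration).
toy = pd.DataFrame({
    "date": ["2020-12-07", "2020-12-06"],
    "death": [10.0, 9.0],
    "positive": [100, 90],
    "positiveIncrease": [10, 8],
    "states": [56, 56],
})

# Hypothetical list of the columns this toy analysis keeps.
keep_cols = ["date", "positive", "positiveIncrease"]

# Selecting by name gives the same frame as dropping everything else.
kept = toy[keep_cols]
dropped = toy.drop(columns=["death", "states"])

print(kept.equals(dropped))
```

Listing the kept columns makes the result independent of any extra columns the source file might gain later, while `drop` is shorter when only a few columns are removed.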

Because the Spotify data is grouped on a weekly basis, with each week starting on a Thursday, the next step is to group the COVID data the same way so the two are easy to merge.

In [7]:
# Create a custom interval index for grouping the data into weekly bins.
# Each bin is right-closed, (Wednesday, next Wednesday], so the first day
# it contains is a Thursday. The first bin opens on Wednesday 2020-01-08.
i = pd.to_datetime('01/08/2020')
bins = []
while i < pd.to_datetime('12/12/2020'):
    temp = i + pd.Timedelta('7 days')
    bins.append((i,temp))
    i = temp
bins = pd.IntervalIndex.from_tuples(bins)
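As a cross-check on the loop above, the same weekly bins can be sketched with `pd.interval_range`, and the weekday of the bin edges can be inspected directly. This is only an illustrative alternative; the names `weekly_bins`, `left_edge_day`, and `first_day_in_bin` are assumptions, and the count of 49 periods is derived from the loop's start date and stop condition.

```python
import pandas as pd

start = pd.Timestamp("2020-01-08")

# 49 right-closed, 7-day intervals, from (2020-01-08, 2020-01-15]
# through (2020-12-09, 2020-12-16], matching the loop's start and stop dates.
weekly_bins = pd.interval_range(start=start, periods=49, freq="7D")

# The left edge of each bin is excluded, so the first calendar day
# inside a bin is the day after the left edge.
left_edge_day = start.day_name()
first_day_in_bin = (start + pd.Timedelta(days=1)).day_name()

print(len(weekly_bins), left_edge_day, first_day_in_bin)
```

Because the intervals are open on the left, a Wednesday left edge is what makes each bin begin on a Thursday.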

In [8]:
# Convert the date column to pandas datetime so it can be used in the cut
covid_data['date'] = pd.to_datetime(covid_data['date'])

# Using the interval index created above, create a new column week which has the week interval of the data
covid_data['week'] = pd.cut(covid_data['date'], bins)

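# Now that every row has a week interval, group by week and total each column.
# numeric_only=True skips the datetime 'date' column, which recent pandas versions refuse to sum;
# observed=False keeps every bin, so the week numbering below lines up with len(bins).
grouped = covid_data.groupby(['week'], observed=False).sum(numeric_only=True)

# Number the weeks so they are easy to identify
grouped['week#'] = list(range(1,len(bins)+1))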

covid_data.head(20)

|  | date | negative | negativeIncrease | positive | positiveIncrease | totalTestResultsIncrease | week |
|---|---|---|---|---|---|---|---|
| 0 | 2020-12-07 | 163378726 | 1386381 | 14717065 | 180193 | 1835388 | (2020-12-02, 2020-12-09] |
| 1 | 2020-12-06 | 161992345 | 1175503 | 14536872 | 177801 | 1648306 | (2020-12-02, 2020-12-09] |
| 2 | 2020-12-05 | 160816842 | 1530133 | 14359071 | 212880 | 2190899 | (2020-12-02, 2020-12-09] |
| 3 | 2020-12-04 | 159286709 | 1260657 | 14146191 | 224831 | 1854869 | (2020-12-02, 2020-12-09] |
| 4 | 2020-12-03 | 158026052 | 1238465 | 13921360 | 210204 | 1828230 | (2020-12-02, 2020-12-09] |
| 5 | 2020-12-02 | 156787587 | 982032 | 13711156 | 195796 | 1459202 | (2020-11-25, 2020-12-02] |
| 6 | 2020-12-01 | 155805555 | 1941714 | 13515360 | 176753 | 2340996 | (2020-11-25, 2020-12-02] |
| 7 | 2020-11-30 | 153863841 | 1219808 | 13338607 | 147587 | 1603253 | (2020-11-25, 2020-12-02] |
| 8 | 2020-11-29 | 152644033 | 883148 | 13191020 | 135242 | 1289970 | (2020-11-25, 2020-12-02] |
| 9 | 2020-11-28 | 151760885 | 1276935 | 13055778 | 154522 | 1709566 | (2020-11-25, 2020-12-02] |
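The output shows that 2020-12-02 falls in the bin ending on that date, while 2020-12-03 opens the next one. The following small sketch reproduces that boundary behaviour and the weekly totalling on made-up numbers; the frame `daily` and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up daily counts that straddle two bin boundaries.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-02", "2020-12-03", "2020-12-09", "2020-12-10"]),
    "positiveIncrease": [1, 2, 3, 4],
})

# Three right-closed weekly bins starting at (2020-11-25, 2020-12-02].
bins = pd.interval_range(start=pd.Timestamp("2020-11-25"), periods=3, freq="7D")

# Assign each day to its weekly bin, then total the daily counts per bin.
daily["week"] = pd.cut(daily["date"], bins)
weekly = daily.groupby("week", observed=False)["positiveIncrease"].sum()
totals = weekly.tolist()

print(weekly)
```

A date equal to a bin's right edge belongs to that bin, so a Wednesday closes a week and the following Thursday starts the next.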
