* With your research proposal in hand, it's time to conduct the analysis in Jupyter. Provide a complete research report using the framework introduced in the previous module. The report should tell the story to your intended audience and should include compelling visualizations and actionable insights. Walk through the analysis using clean, reproducible code. Include plenty of notes and comments to guide others through your thinking.

* Along the way, consider issues in the experiment design. What bias might be influencing the analysis? Can you test for it? If you could collect new data in this domain, what changes would you make in the data collection process? What other variables or samples might be useful to test?

In [1]:
%reload_ext nb_black


In [2]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt
import ast

%matplotlib inline


In [3]:
hot_100 = pd.read_csv("data/Hot_Stuff.csv")
genres = pd.read_excel("data/Hot_100_Audio_Features.xlsx")
revenue = pd.read_csv("data/Revenue_Chart_Full_Data_data_2.csv")


In [4]:
revenue

Unnamed: 0,Year of Year Date,Adjusted for Inflation Notes,Adjusted for Inflation Title,Format,Metric,Year,Value (For Charting),Adjusted for Inflation Flag,Year Date,Format Value # (Billion),Format Value # (Million),Number of Records,Total Value # (Billion),Total Value # (Million),Total Value For Year,Value (Actual),Year (copy)
0,1973,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",8 - Track,Value (Adjusted),1973,2815.681824,"(adjusted for inflation, 2019 dollars)",1973,$2.8B,,1,$11.6B,,$11611.7B,2815.681824,1973
1,1974,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",8 - Track,Value (Adjusted),1974,2848.008609,"(adjusted for inflation, 2019 dollars)",1974,$2.8B,,1,$11.4B,,$11407.1B,2848.008609,1974
2,1975,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",8 - Track,Value (Adjusted),1975,2770.409498,"(adjusted for inflation, 2019 dollars)",1975,$2.8B,,1,$11.4B,,$11350.1B,2770.409498,1975
3,1976,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",8 - Track,Value (Adjusted),1976,3047.215772,"(adjusted for inflation, 2019 dollars)",1976,$3.0B,,1,$12.3B,,$12298.0B,3047.215772,1976
4,1977,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",8 - Track,Value (Adjusted),1977,3421.416287,"(adjusted for inflation, 2019 dollars)",1977,$3.4B,,1,$14.8B,,$14769.0B,3421.416287,1977
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
427,2015,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",Vinyl Single,Value (Adjusted),2015,6.205390,"(adjusted for inflation, 2019 dollars)",2015,,$6.2M,1,$7.2B,,$7238.6B,6.205390,2015
428,2016,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",Vinyl Single,Value (Adjusted),2016,5.198931,"(adjusted for inflation, 2019 dollars)",2016,,$5.2M,1,$8.1B,,$8072.8B,5.198931,2016
429,2017,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",Vinyl Single,Value (Adjusted),2017,6.339678,"(adjusted for inflation, 2019 dollars)",2017,,$6.3M,1,$9.2B,,$9174.7B,6.339678,2017
430,2018,Inflation adjustments based on US Bureau of La...,"(Adjusted for Inflation, 2019 Dollars)",Vinyl Single,Value (Adjusted),2018,5.386197,"(adjusted for inflation, 2019 dollars)",2018,,$5.4M,1,$10.0B,,$10024.6B,5.386197,2018



In [5]:
# genres.groupby("spotify_genre").count()


In [6]:
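# Assign the result back: calling fillna(inplace=True) on a selected column
# may not update the DataFrame in recent pandas versions.
genres["spotify_genre"] = genres["spotify_genre"].fillna("[]")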
genres["spotify_genre"] = genres["spotify_genre"].apply(ast.literal_eval)
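
The `spotify_genre` column arrives as text such as `"['dance pop', 'pop']"`, so it has to be parsed into real lists before membership tests work. Below is a minimal sketch of a more defensive parser, assuming that missing or malformed cells should become an empty list. `parse_genres` is a hypothetical helper, not part of the original notebook, and the sample cells are made up.

```python
import ast


def parse_genres(cell):
    """Turn a stringified list such as "['pop', 'rap']" into a real list.

    Assumption: missing values (NaN/None) and unparseable strings should
    become an empty list instead of raising.
    """
    if not isinstance(cell, str):
        return []  # NaN is a float, None is NoneType
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return list(parsed) if isinstance(parsed, (list, tuple)) else []


# Made-up sample cells standing in for genres["spotify_genre"]
samples = ["['dance pop', 'pop']", float("nan"), "[]", "not a list"]
parsed_samples = [parse_genres(s) for s in samples]
print(parsed_samples)
```

In the notebook this could be applied as `genres["spotify_genre"].apply(parse_genres)`, which would make the separate `fillna` step unnecessary.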


# Drop columns that are not needed

In [7]:
hot_100 = hot_100.drop(columns=["url"])


In [8]:
genres = genres.drop(
    columns=[
        "spotify_track_id",
        "spotify_track_preview_url",
        "spotify_track_duration_ms",
        "spotify_track_popularity",
        "danceability",
        "energy",
        "key",
        "loudness",
        "mode",
        "acousticness",
        "speechiness",
        "liveness",
        "instrumentalness",
        "valence",
        "tempo",
        "time_signature",
    ]
)


# Basic table description data

In [9]:
hot_100.shape

(320495, 9)


In [10]:
hot_100.isna().sum()

WeekID                        0
Week Position                 0
Song                          0
Performer                     0
SongID                        0
Instance                      0
Previous Week Position    30784
Peak Position                 0
Weeks on Chart                0
dtype: int64
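
`Previous Week Position` is the only column with missing values. One plausible explanation, still to be confirmed against the data, is that a song has no previous position in its first week on the chart. Below is a sketch of how that could be checked, using made-up rows in place of `hot_100`.

```python
import pandas as pd

# Made-up rows in the shape of hot_100
toy = pd.DataFrame(
    {
        "Previous Week Position": [None, 10.0, None, 4.0],
        "Weeks on Chart": [1, 2, 5, 6],
    }
)

missing_prev = toy["Previous Week Position"].isna()

# Cross-tabulate "previous position missing" against "first week on chart"
check = pd.crosstab(missing_prev, toy["Weeks on Chart"] == 1)
print(check)

# Rows where the previous position is missing but it is not the first week
unexplained = toy[missing_prev & (toy["Weeks on Chart"] > 1)]
print(len(unexplained))
```

Any rows left in `unexplained` on the real data would be worth inspecting by hand, for example for songs that re-entered the chart.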


In [11]:
hot_100.head()

Unnamed: 0,WeekID,Week Position,Song,Performer,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart
0,8/2/1958,1,Poor Little Fool,Ricky Nelson,Poor Little FoolRicky Nelson,1,,1,1
1,12/2/1995,1,One Sweet Day,Mariah Carey & Boyz II Men,One Sweet DayMariah Carey & Boyz II Men,1,,1,1
2,10/11/1997,1,Candle In The Wind 1997/Something About The Wa...,Elton John,Candle In The Wind 1997/Something About The Wa...,1,,1,1
3,7/1/2006,1,Do I Make You Proud,Taylor Hicks,Do I Make You ProudTaylor Hicks,1,,1,1
4,10/24/2009,1,3,Britney Spears,3Britney Spears,1,,1,1



In [12]:
hot_100.dtypes

WeekID                     object
Week Position               int64
Song                       object
Performer                  object
SongID                     object
Instance                    int64
Previous Week Position    float64
Peak Position               int64
Weeks on Chart              int64
dtype: object


In [13]:
genres.shape

(28492, 6)


In [14]:
genres.isna().sum()

SongID                       0
Performer                    0
Song                         0
spotify_genre                0
spotify_track_album       4755
spotify_track_explicit    4749
dtype: int64


In [15]:
genres.head()

Unnamed: 0,SongID,Performer,Song,spotify_genre,spotify_track_album,spotify_track_explicit
0,"AdictoTainy, Anuel AA & Ozuna","Tainy, Anuel AA & Ozuna",Adicto,[pop reggaeton],Adicto (with Anuel AA & Ozuna),0.0
1,The Ones That Didn't Make It Back HomeJustin M...,Justin Moore,The Ones That Didn't Make It Back Home,"[arkansas country, contemporary country, count...",,
2,ShallowLady Gaga & Bradley Cooper,Lady Gaga & Bradley Cooper,Shallow,"[dance pop, pop]",A Star Is Born Soundtrack,0.0
3,EnemiesPost Malone Featuring DaBaby,Post Malone Featuring DaBaby,Enemies,"[dfw rap, melodic rap, rap]",Hollywood's Bleeding,1.0
4,"Bacc At It AgainYella Beezy, Gucci Mane & Quavo","Yella Beezy, Gucci Mane & Quavo",Bacc At It Again,"[dfw rap, rap, southern hip hop, trap]",Bacc At It Again,1.0



In [16]:
genres.dtypes

SongID                     object
Performer                  object
Song                       object
spotify_genre              object
spotify_track_album        object
spotify_track_explicit    float64
dtype: object


# Join the two tables

In [17]:
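# Inner join on the shared key; the Song and Performer columns that exist in
# both tables get the _x / _y suffixes seen in the output.
full_table = hot_100.merge(genres, on="SongID")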
full_table

Unnamed: 0,WeekID,Week Position,Song_x,Performer_x,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album,spotify_track_explicit
0,2/2/2019,1,7 Rings,Ariana Grande,7 RingsAriana Grande,1,,1,1,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
1,5/25/2019,11,7 Rings,Ariana Grande,7 RingsAriana Grande,1,10.0,1,17,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
2,4/20/2019,4,7 Rings,Ariana Grande,7 RingsAriana Grande,1,3.0,1,12,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
3,6/1/2019,12,7 Rings,Ariana Grande,7 RingsAriana Grande,1,11.0,1,18,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
4,3/30/2019,1,7 Rings,Ariana Grande,7 RingsAriana Grande,1,1.0,1,9,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
321013,8/25/1962,18,What's A Matter Baby (Is It Hurting You),Timi Yuro,What's A Matter Baby (Is It Hurting You)Timi Yuro,1,23.0,18,7,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
321014,9/1/1962,12,What's A Matter Baby (Is It Hurting You),Timi Yuro,What's A Matter Baby (Is It Hurting You)Timi Yuro,1,18.0,12,8,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
321015,9/8/1962,14,What's A Matter Baby (Is It Hurting You),Timi Yuro,What's A Matter Baby (Is It Hurting You)Timi Yuro,1,12.0,12,9,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
321016,9/15/1962,31,What's A Matter Baby (Is It Hurting You),Timi Yuro,What's A Matter Baby (Is It Hurting You)Timi Yuro,1,14.0,12,10,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
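
The merged table's index runs to 321016, so it holds 321,017 rows, more than the 320,495 rows in `hot_100`. For an inner join, that suggests some `SongID` values appear more than once in `genres`. Below is a sketch of how that could be checked and guarded against, using small made-up frames (`chart` and `features` are illustrative stand-ins, not the real data).

```python
import pandas as pd

# Made-up stand-ins for hot_100 and genres
chart = pd.DataFrame(
    {"SongID": ["A", "A", "B"], "Week Position": [1, 2, 5]}
)
features = pd.DataFrame(
    {"SongID": ["A", "A", "B"], "spotify_genre": [["pop"], ["pop"], ["rap"]]}
)

# Count rows whose SongID already appeared earlier in the lookup table
dupes = features["SongID"].duplicated().sum()
print(dupes)

# Keep one row per SongID, then let pandas verify the join is many-to-one
features_unique = features.drop_duplicates(subset="SongID")
merged = chart.merge(
    features_unique, on="SongID", how="left", validate="many_to_one"
)
print(merged.shape)
```

`how="left"` is a design choice here that differs from the notebook's default inner join: it keeps chart rows with no Spotify match, so the row count stays equal to the chart table's.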



# Analysis

In [18]:
full_table.set_index("SongID")

SongID,WeekID,Week Position,Song_x,Performer_x,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album,spotify_track_explicit
7 RingsAriana Grande,2/2/2019,1,7 Rings,Ariana Grande,1,,1,1,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
7 RingsAriana Grande,5/25/2019,11,7 Rings,Ariana Grande,1,10.0,1,17,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
7 RingsAriana Grande,4/20/2019,4,7 Rings,Ariana Grande,1,3.0,1,12,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
7 RingsAriana Grande,6/1/2019,12,7 Rings,Ariana Grande,1,11.0,1,18,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
7 RingsAriana Grande,3/30/2019,1,7 Rings,Ariana Grande,1,1.0,1,9,Ariana Grande,7 Rings,"[dance pop, pop, post-teen pop]",thank,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
What's A Matter Baby (Is It Hurting You)Timi Yuro,8/25/1962,18,What's A Matter Baby (Is It Hurting You),Timi Yuro,1,23.0,18,7,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
What's A Matter Baby (Is It Hurting You)Timi Yuro,9/1/1962,12,What's A Matter Baby (Is It Hurting You),Timi Yuro,1,18.0,12,8,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
What's A Matter Baby (Is It Hurting You)Timi Yuro,9/8/1962,14,What's A Matter Baby (Is It Hurting You),Timi Yuro,1,12.0,12,9,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0
What's A Matter Baby (Is It Hurting You)Timi Yuro,9/15/1962,31,What's A Matter Baby (Is It Hurting You),Timi Yuro,1,14.0,12,10,Timi Yuro,What's A Matter Baby (Is It Hurting You),"[adult standards, brill building pop]",The Best Of Timi Yuro,0.0



In [19]:
full_table.groupby("spotify_track_explicit").count()

spotify_track_explicit,WeekID,Week Position,Song_x,Performer_x,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album
0.0,249580,249580,249580,249580,249580,249580,226746,249580,249580,249580,249580,249580,249540
1.0,31089,31089,31089,31089,31089,31089,28064,31089,31089,31089,31089,31089,31067
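
These counts are per chart week, so a song that stays on the chart for 30 weeks is counted 30 times. Below is a sketch, on made-up rows, of how the share of explicit entries could be compared per chart week and per unique song.

```python
import pandas as pd

# Made-up rows in the shape of full_table: one row per song per chart week
toy = pd.DataFrame(
    {
        "SongID": ["A", "A", "A", "B", "C"],
        "spotify_track_explicit": [1.0, 1.0, 1.0, 0.0, 0.0],
    }
)

# Share of chart weeks occupied by explicit tracks
share_by_week = toy["spotify_track_explicit"].mean()

# Share of distinct songs that are explicit
per_song = toy.drop_duplicates(subset="SongID")
share_by_song = per_song["spotify_track_explicit"].mean()

print(share_by_week, share_by_song)
```

The two shares answer different questions: the first measures chart presence, the second measures how many distinct explicit songs reached the chart.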



In [20]:
expl = full_table[full_table["spotify_track_explicit"] == 1.0]
expl.shape

(31089, 14)


In [21]:
rap = full_table[full_table["spotify_genre"].apply(lambda x: "rap" in x)]
rap.shape

(27733, 14)


In [22]:
hip_hop = full_table[full_table["spotify_genre"].apply(lambda x: "hip hop" in x)]
hip_hop.shape

(25736, 14)
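
Because each `spotify_genre` value is a list, `"rap" in x` matches only the exact tag `rap`, and `"hip hop" in x` matches only the exact tag `hip hop`. Tags such as `melodic rap` or `southern hip hop` do not count on their own. Below is a sketch of both behaviours so the choice is explicit; `has_tag` and `has_tag_containing` are hypothetical helpers.

```python
def has_tag(tags, tag):
    """True if the list contains exactly this genre tag."""
    return tag in tags


def has_tag_containing(tags, fragment):
    """True if any genre tag contains the fragment as a substring."""
    return any(fragment in t for t in tags)


tags = ["southern hip hop", "trap"]
exact = has_tag(tags, "hip hop")
loose = has_tag_containing(tags, "hip hop")
print(exact, loose)
```

Substring matching has its own pitfall: the fragment `rap` also occurs inside `trap`, so `has_tag_containing(tags, "rap")` is true for the list above even though no tag is a rap tag by name.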


In [23]:
# Filter rows with the boolean mask (the mask alone is not a DataFrame)
metal = full_table[full_table["spotify_genre"].apply(lambda x: "metal" in x)]
metal.shape


In [24]:
rap_weeks = rap.groupby("WeekID").count()
rap_weeks.head(20)

WeekID,Week Position,Song_x,Performer_x,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album,spotify_track_explicit
1/1/1966,1,1,1,1,1,1,1,1,1,1,1,1,1
1/1/1994,11,11,11,11,11,11,11,11,11,11,11,10,10
1/1/2000,8,8,8,8,8,7,8,8,8,8,8,8,8
1/1/2005,31,31,31,31,31,27,31,31,31,31,31,31,31
1/1/2011,18,18,18,18,18,15,18,18,18,18,18,18,18
1/10/1970,1,1,1,1,1,1,1,1,1,1,1,1,1
1/10/1987,3,3,3,3,3,3,3,3,3,3,3,3,3
1/10/1998,12,12,12,12,12,11,12,12,12,12,12,11,11
1/10/2004,24,24,24,24,24,22,24,24,24,24,24,23,23
1/10/2009,15,15,15,15,15,13,15,15,15,15,15,15,15
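
`WeekID` is stored as text, so the groups above sort as strings (`1/1/1966`, `1/1/1994`, `1/10/1970`) instead of chronologically. Below is a sketch of parsing the dates first, on made-up rows. The `%m/%d/%Y` format is an assumption based on the values shown.

```python
import pandas as pd

# Made-up rows in the shape of the rap table
toy = pd.DataFrame(
    {
        "WeekID": ["1/10/1970", "1/1/1994", "1/1/1966", "1/1/1994"],
        "SongID": ["A", "B", "C", "D"],
    }
)

toy["WeekID"] = pd.to_datetime(toy["WeekID"], format="%m/%d/%Y")

# size() gives one count per week instead of one count per column
songs_per_week = toy.groupby("WeekID").size()
print(songs_per_week)

# Yearly totals are often easier to plot than weekly ones
songs_per_year = toy.groupby(toy["WeekID"].dt.year).size()
print(songs_per_year)
```

With real dates in the index, the weekly counts come out in chronological order and can be passed directly to a line plot.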



In [34]:
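# Despite the variable name, this filter keeps chart positions 1 through 10.
top_25 = full_table[full_table["Week Position"] <= 10]
# numeric_only=True keeps the sum to numeric columns, matching the output below
top_25.groupby("SongID").sum(numeric_only=True)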

SongID,Week Position,Instance,Previous Week Position,Peak Position,Weeks on Chart,spotify_track_explicit
#9 DreamJohn Lennon,19,2,23.0,19,19,0.0
'03 Bonnie & ClydeJay-Z Featuring Beyonce Knowles,65,11,66.0,51,132,11.0
'65 Love AffairPaul Davis,54,7,58.0,51,84,0.0
('til) I Kissed YouThe Everly Brothers,33,7,41.0,31,49,0.0
(Can't Live Without Your) Love And AffectionNelson,38,7,40.0,22,91,0.0
...,...,...,...,...,...,...
Yummy Yummy YummyOhio Express,24,5,44.0,24,35,0.0
ZEZEKodak Black Featuring Travis Scott & Offset,102,13,99.0,26,94,13.0
Zip-A-Dee Doo-DahBob B. Soxx And The Blue Jeans,26,3,34.0,26,24,0.0
everything i wantedBillie Eilish,8,1,74.0,8,2,0.0
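
Summing columns such as `Peak Position` or `Weeks on Chart` across weeks gives totals that are hard to interpret. Below is a sketch of an alternative using named aggregations, on made-up rows. The output column names `weeks_in_top_10` and `best_position` are illustrative.

```python
import pandas as pd

# Made-up rows in the shape of full_table
toy = pd.DataFrame(
    {
        "SongID": ["A", "A", "B", "B", "B"],
        "Week Position": [9, 10, 1, 3, 12],
    }
)

top_10 = toy[toy["Week Position"] <= 10]

# One row per song: how many weeks it spent in the top 10 and its best rank
summary = top_10.groupby("SongID").agg(
    weeks_in_top_10=("Week Position", "count"),
    best_position=("Week Position", "min"),
)
print(summary)
```

Each output column then has a direct reading, which makes the table easier to use in a chart or in the written report.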

