* With your research proposal in hand, it's time to conduct the analysis in Jupyter. Provide a complete research report using the framework introduced in the previous module. The report should tell the story to your intended audience and should include compelling visualizations and actionable insights. Walk through the analysis using clean, reproducible code. Include plenty of notes and comments to guide others through your thinking.

* Along the way, consider issues in the experiment design. What bias might be influencing the analysis? Can you test for it? If you could collect new data in this domain, what changes would you make in the data collection process? What other variables or samples might be useful to test?
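One way to test for bias is to check whether Spotify metadata is missing more often in some eras than in others. If older songs are more often unlabeled, any comparison across time would be skewed. The sketch below works under stated assumptions. `missing_rate_by_decade` is a hypothetical helper. `WeekID` is assumed to be a month/day/year string, as in the chart data loaded below. The toy rows are made up only to show the mechanics and say nothing about the real data.

```python
import numpy as np
import pandas as pd


def missing_rate_by_decade(df: pd.DataFrame, date_col: str, flag_col: str) -> pd.Series:
    """Share of rows whose `flag_col` is missing, grouped by decade of `date_col`.

    Hypothetical helper; assumes `date_col` holds month/day/year strings.
    """
    dates = pd.to_datetime(df[date_col], format="%m/%d/%Y")
    decade = (dates.dt.year // 10) * 10  # e.g. 1958 -> 1950
    return df[flag_col].isna().groupby(decade).mean()


# Illustrative toy rows only; with the real data, pass the merged chart/genre table.
toy = pd.DataFrame(
    {
        "WeekID": ["8/2/1958", "9/6/1958", "2/2/2019", "5/25/2019"],
        "spotify_track_explicit": [np.nan, 0.0, 1.0, 1.0],
    }
)
print(missing_rate_by_decade(toy, "WeekID", "spotify_track_explicit"))
```

If the real missing rate climbs sharply for early decades, conclusions about trends in explicit content would rest mostly on recent songs.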

In [1]:
%reload_ext nb_black


In [2]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline


In [3]:
hot_100 = pd.read_csv("data/Hot_Stuff.csv")
genres = pd.read_excel("data/Hot_100_Audio_Features.xlsx")


# Drop columns that are not needed

In [4]:
hot_100 = hot_100.drop(columns=["url"])


In [6]:
genres = genres.drop(
    columns=[
        "spotify_track_id",
        "spotify_track_preview_url",
        "spotify_track_duration_ms",
        "spotify_track_popularity",
        "danceability",
        "energy",
        "key",
        "loudness",
        "mode",
        "acousticness",
        "speechiness",
        "liveness",
        "instrumentalness",
        "valence",
        "tempo",
        "time_signature",
    ]
)


# Basic table descriptions

In [7]:
hot_100.shape

(320495, 9)


In [8]:
hot_100.isna().sum()

WeekID                        0
Week Position                 0
Song                          0
Performer                     0
SongID                        0
Instance                      0
Previous Week Position    30784
Peak Position                 0
Weeks on Chart                0
dtype: int64

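About 30,784 rows have no `Previous Week Position`. A plausible reading, which is an assumption rather than something documented here, is that the song was not on the chart the week before, either because it debuted or because it re-entered. If that holds, and `Weeks on Chart` is a running count, a hypothetical helper like the one below separates the two cases. The toy rows are illustrative only.

```python
import numpy as np
import pandas as pd


def classify_missing_previous(df: pd.DataFrame) -> pd.Series:
    """Label each row as a debut, a re-entry, or 'charted last week'.

    Hypothetical helper. Assumes NaN in "Previous Week Position" means the song
    was off the chart the prior week, and "Weeks on Chart" == 1 means a debut.
    """
    missing = df["Previous Week Position"].isna()
    first_week = df["Weeks on Chart"] == 1
    labels = pd.Series("charted last week", index=df.index)
    labels.loc[missing & first_week] = "debut"
    labels.loc[missing & ~first_week] = "re-entry"
    return labels


# Illustrative toy rows only.
toy = pd.DataFrame(
    {"Previous Week Position": [np.nan, 3.0, np.nan], "Weeks on Chart": [1, 12, 5]}
)
print(classify_missing_previous(toy).value_counts())
```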

In [9]:
hot_100.head()

,WeekID,Week Position,Song,Performer,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart
0,8/2/1958,1,Poor Little Fool,Ricky Nelson,Poor Little FoolRicky Nelson,1,,1,1
1,12/2/1995,1,One Sweet Day,Mariah Carey & Boyz II Men,One Sweet DayMariah Carey & Boyz II Men,1,,1,1
2,10/11/1997,1,Candle In The Wind 1997/Something About The Wa...,Elton John,Candle In The Wind 1997/Something About The Wa...,1,,1,1
3,7/1/2006,1,Do I Make You Proud,Taylor Hicks,Do I Make You ProudTaylor Hicks,1,,1,1
4,10/24/2009,1,3,Britney Spears,3Britney Spears,1,,1,1



In [10]:
hot_100.dtypes

WeekID                     object
Week Position               int64
Song                       object
Performer                  object
SongID                     object
Instance                    int64
Previous Week Position    float64
Peak Position               int64
Weeks on Chart              int64
dtype: object

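`WeekID` is stored as text (`object`), so sorting by date or grouping by year will not behave as expected until it is parsed. The sketch below assumes the month/day/year order suggested by values such as `10/24/2009` in the preview above. It runs on toy values rather than the real table.

```python
import pandas as pd

# Illustrative values in the same format as the WeekID column.
toy = pd.DataFrame({"WeekID": ["8/2/1958", "12/2/1995", "10/24/2009"]})

# An explicit format avoids silent day/month guessing; this assumes month/day/year.
toy["WeekID"] = pd.to_datetime(toy["WeekID"], format="%m/%d/%Y")
toy["year"] = toy["WeekID"].dt.year  # convenient for decade or year-level grouping
print(toy.dtypes)
```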

In [11]:
genres.shape

(28492, 6)


In [12]:
genres.isna().sum()

SongID                       0
Performer                    0
Song                         0
spotify_genre             1232
spotify_track_album       4755
spotify_track_explicit    4749
dtype: int64


In [13]:
genres.head()

,SongID,Performer,Song,spotify_genre,spotify_track_album,spotify_track_explicit
0,"AdictoTainy, Anuel AA & Ozuna","Tainy, Anuel AA & Ozuna",Adicto,['pop reggaeton'],Adicto (with Anuel AA & Ozuna),0.0
1,The Ones That Didn't Make It Back HomeJustin M...,Justin Moore,The Ones That Didn't Make It Back Home,"['arkansas country', 'contemporary country', '...",,
2,ShallowLady Gaga & Bradley Cooper,Lady Gaga & Bradley Cooper,Shallow,"['dance pop', 'pop']",A Star Is Born Soundtrack,0.0
3,EnemiesPost Malone Featuring DaBaby,Post Malone Featuring DaBaby,Enemies,"['dfw rap', 'melodic rap', 'rap']",Hollywood's Bleeding,1.0
4,"Bacc At It AgainYella Beezy, Gucci Mane & Quavo","Yella Beezy, Gucci Mane & Quavo",Bacc At It Again,"['dfw rap', 'rap', 'southern hip hop', 'trap']",Bacc At It Again,1.0



In [14]:
genres.dtypes

SongID                     object
Performer                  object
Song                       object
spotify_genre              object
spotify_track_album        object
spotify_track_explicit    float64
dtype: object

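`spotify_track_explicit` is `float64` only because it contains missing values. Converting it to pandas' nullable `boolean` dtype keeps three states distinct: explicit, not explicit, and unknown. `value_counts(dropna=False)` then reports the unknowns, which a plain `groupby` on the column leaves out. The sketch uses toy values.

```python
import numpy as np
import pandas as pd

# Illustrative values: 0.0 / 1.0 flags with a gap, as in the real column.
explicit = pd.Series([0.0, 1.0, np.nan, 1.0], name="spotify_track_explicit")

# Float 0/1 with NaN -> nullable boolean (False / True / <NA>).
explicit = explicit.astype("boolean")

# dropna=False keeps a separate count for rows with no explicit flag.
counts = explicit.value_counts(dropna=False)
print(counts)
```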

# Join the two tables on SongID

In [15]:
# Inner join (the pandas default): keep only chart rows whose SongID also appears in the genre table
full_table = hot_100.merge(genres, on="SongID")

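The explicit-songs preview in the Analysis section shows row labels up to 320,936, so the merged table has more rows than the 320,495 in `hot_100`. An inner join can only grow past its left table when the right table repeats keys. Taken at face value, some `SongID` values therefore appear more than once in `genres`, which would double-count chart weeks.

The sketch below shows one possible remedy on toy tables. It first drops duplicate keys, keeping the first row. That choice is arbitrary and may discard differing metadata. It then merges with `validate` and `indicator`, so pandas enforces one metadata row per song and flags unmatched chart rows. Note that it uses a left join to keep unmatched rows visible, which differs from the inner join above.

```python
import pandas as pd

# Toy stand-ins for hot_100 and genres; SongID "B" is duplicated in the metadata.
charts = pd.DataFrame({"SongID": ["A", "B", "B", "C"], "Week Position": [1, 2, 3, 4]})
meta = pd.DataFrame({"SongID": ["A", "B", "B"], "spotify_track_explicit": [0.0, 1.0, 1.0]})

print("repeated keys:", meta["SongID"].duplicated().sum())

# Without deduplication, validate="many_to_one" refuses the merge.
raised = False
try:
    charts.merge(meta, on="SongID", validate="many_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("duplicate keys detected:", err)

meta_unique = meta.drop_duplicates(subset="SongID", keep="first")
merged = charts.merge(
    meta_unique,
    on="SongID",
    how="left",              # keep chart rows that have no Spotify match
    validate="many_to_one",  # raises MergeError if SongID is still duplicated
    indicator=True,          # adds a _merge column: "both" or "left_only"
)
print(merged["_merge"].value_counts())
```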

# Analysis

In [16]:
# Count chart-week rows for each explicit flag value; rows with a missing flag are dropped by groupby
full_table.groupby("spotify_track_explicit").count()

spotify_track_explicit,WeekID,Week Position,Song_x,Performer_x,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album
0.0,249580,249580,249580,249580,249580,249580,226746,249580,249580,249580,249580,248185,249540
1.0,31089,31089,31089,31089,31089,31089,28064,31089,31089,31089,31089,31011,31067


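These counts are chart-week rows, not songs. A track that charted for 20 weeks contributes 20 rows. To compare explicit and non-explicit songs rather than their time on the chart, also count distinct `SongID`s. The toy illustration below uses hypothetical IDs.

```python
import pandas as pd

# Hypothetical IDs: one explicit song charting 3 weeks, one clean song charting 2 weeks.
toy = pd.DataFrame(
    {
        "SongID": ["song_a"] * 3 + ["song_b"] * 2,
        "spotify_track_explicit": [1.0, 1.0, 1.0, 0.0, 0.0],
    }
)

weeks = toy.groupby("spotify_track_explicit").size()  # rows = chart weeks
songs = toy.groupby("spotify_track_explicit")["SongID"].nunique()  # distinct songs
print(pd.DataFrame({"chart_weeks": weeks, "songs": songs}))
```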

In [17]:
expl = full_table[full_table["spotify_track_explicit"] == 1.0]
expl

,WeekID,Week Position,Song_x,Performer_x,SongID,Instance,Previous Week Position,Peak Position,Weeks on Chart,Performer_y,Song_y,spotify_genre,spotify_track_album,spotify_track_explicit
0,2/2/2019,1,7 Rings,Ariana Grande,7 RingsAriana Grande,1,,1,1,Ariana Grande,7 Rings,"['dance pop', 'pop', 'post-teen pop']",thank,1.0
1,5/25/2019,11,7 Rings,Ariana Grande,7 RingsAriana Grande,1,10.0,1,17,Ariana Grande,7 Rings,"['dance pop', 'pop', 'post-teen pop']",thank,1.0
2,4/20/2019,4,7 Rings,Ariana Grande,7 RingsAriana Grande,1,3.0,1,12,Ariana Grande,7 Rings,"['dance pop', 'pop', 'post-teen pop']",thank,1.0
3,6/1/2019,12,7 Rings,Ariana Grande,7 RingsAriana Grande,1,11.0,1,18,Ariana Grande,7 Rings,"['dance pop', 'pop', 'post-teen pop']",thank,1.0
4,3/30/2019,1,7 Rings,Ariana Grande,7 RingsAriana Grande,1,1.0,1,9,Ariana Grande,7 Rings,"['dance pop', 'pop', 'post-teen pop']",thank,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
320933,9/16/2000,58,What'Chu Like,Da Brat Featuring Tyrese,What'Chu LikeDa Brat Featuring Tyrese,1,49.0,26,16,Da Brat Featuring Tyrese,What'Chu Like,"['chicago rap', 'hip hop', 'hip pop', 'new jac...",Unrestricted,1.0
320934,9/23/2000,62,What'Chu Like,Da Brat Featuring Tyrese,What'Chu LikeDa Brat Featuring Tyrese,1,58.0,26,17,Da Brat Featuring Tyrese,What'Chu Like,"['chicago rap', 'hip hop', 'hip pop', 'new jac...",Unrestricted,1.0
320935,9/30/2000,65,What'Chu Like,Da Brat Featuring Tyrese,What'Chu LikeDa Brat Featuring Tyrese,1,62.0,26,18,Da Brat Featuring Tyrese,What'Chu Like,"['chicago rap', 'hip hop', 'hip pop', 'new jac...",Unrestricted,1.0
320936,10/7/2000,68,What'Chu Like,Da Brat Featuring Tyrese,What'Chu LikeDa Brat Featuring Tyrese,1,65.0,26,19,Da Brat Featuring Tyrese,What'Chu Like,"['chicago rap', 'hip hop', 'hip pop', 'new jac...",Unrestricted,1.0



In [21]:
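# spotify_genre is missing (NaN, a float) for some songs, so `"rap" in x` raises
# TypeError: argument of type 'float' is not iterable.
# The .str accessor handles missing values; na=False treats them as non-matches.
# Note: this is a substring match, so tags such as 'trap' also match.
rap = full_table[full_table["spotify_genre"].str.contains("rap", na=False)]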

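Testing `"rap"` against the genre text is a substring match, so tags such as `trap` or `trap latino` also count as rap. The genre cells look like Python list literals, for example `['dfw rap', 'rap']`. That allows another option: parse them with `ast.literal_eval` and match whole tags. This sketch assumes every non-missing cell is a valid list literal. `parse_genres` is a hypothetical helper, and the sample values are illustrative.

```python
import ast

import numpy as np
import pandas as pd


def parse_genres(value) -> list:
    """Turn a stored string like "['dfw rap', 'rap']" into a list; missing -> []."""
    if isinstance(value, str):
        return ast.literal_eval(value)  # assumes a valid Python list literal
    return []


genre_col = pd.Series(
    ["['dfw rap', 'rap', 'trap']", "['dance pop', 'pop']", np.nan, "['trap latino']"]
)
genre_lists = genre_col.apply(parse_genres)

# Whole-tag match: the list contains exactly "rap".
exact_rap = genre_lists.apply(lambda tags: "rap" in tags)
# Substring match on each tag, closer to the text-based filter.
any_rap = genre_lists.apply(lambda tags: any("rap" in t for t in tags))
print(exact_rap.tolist())
print(any_rap.tolist())
```

Which definition of "rap" to use is an analytic choice. Stating it explicitly makes the genre comparison reproducible.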