# Data Collection and Cleaning

# Project summary and overview 

This overview summarizes the project's inputs and outputs to make the rest of the notebook easier to follow:

*   Goal: predict the popularity of hip-hop songs from audio features, lyrics, and artist background information.
*   The Spotify API exposes a per-track metric called "Popularity" that ranges from 0 to 100. We use it to assign each song a popularity category (low, medium, or high; we define these mathematically later). The prediction target is this category.
*   Input features come from three sources: the Spotify API (audio features), the MusixMatch API (lyrics), and web scraping of Wikipedia (early-life background information).
*   Instead of using an existing dataset, we created our own. We made 2 playlists on a personal Spotify account: one of popular hip-hop songs and one of underrated, less popular, or up-and-coming songs. Both were drawn from existing playlists with similar names. A function then extracts the features for each track in a playlist from the Spotify API, and the resulting DataFrames are concatenated row-wise into one large dataset.
*   Splitting the tracks across playlists let us work around the API's limit of 100 songs per playlist request, and the same approach can be extended to as many songs as needed.
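
To make the prediction target concrete, here is a minimal sketch of mapping a 0 to 100 popularity score to a category. The cut points `low_max` and `high_min` are hypothetical placeholders chosen only for illustration; they are not the thresholds this project defines later.

```python
def popularity_category(popularity, low_max=33, high_min=67):
    """Map a 0-100 Spotify popularity score to 'low', 'medium', or 'high'.

    The default cut points are hypothetical, not the project's thresholds.
    """
    if not 0 <= popularity <= 100:
        raise ValueError("popularity must be between 0 and 100")
    if popularity <= low_max:
        return "low"
    if popularity >= high_min:
        return "high"
    return "medium"


# Popularity scores taken from rows of the dataset shown later in this notebook
for score in (29, 46, 83):
    print(score, popularity_category(score))
```

With these placeholder cut points the three scores fall into different categories; any real analysis would substitute the thresholds actually derived from the data.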








# Setting up the Spotify API

In [19]:
!pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials



In [20]:
import os

# Reading the credentials provided by Spotify from environment variables instead of hardcoding them in the notebook
credentials = SpotifyClientCredentials(client_id=os.environ["SPOTIPY_CLIENT_ID"], client_secret=os.environ["SPOTIPY_CLIENT_SECRET"])
sp = spotipy.Spotify(client_credentials_manager = credentials, requests_timeout=10, retries=10)

# Getting the Playlist Data using Spotify API

In [21]:
import pandas as pd

columns = ["Title", "Popularity", "Artist", "Album", "Number of Tracks in Album", "Related Artists", "Explicit", "Number of Segments", "brightness", "flatness", "attack strength", "danceability", "energy", "key", "loudness" ,"mode", 
            "speechiness", "instrumentalness", "liveness", "valence", "tempo", "duration_ms", "time_signature", "end_of_fade_in", "start_of_fade_out"]

The following function accepts a link to a playlist and outputs a DataFrame in which each row is a track and each column is a feature of that track.
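
The function derives the playlist ID by splitting the link on `/` and `?`, which relies on the ID sitting at a fixed position in the URL. As a hedged alternative, the sketch below parses the URL and looks for the path segment that follows `playlist`. The helper name `playlist_id_from_link` is our own and is not part of spotipy.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Return the playlist ID from an open.spotify.com playlist URL."""
    # Keep only the non-empty path segments, e.g. ['playlist', '<id>']
    parts = [part for part in urlparse(link).path.split("/") if part]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"not a playlist link: {link!r}")
    return parts[parts.index("playlist") + 1]


link = "https://open.spotify.com/playlist/7jiIO3r04OiJrjgo3jyGI6?si=b479a9474fe14347"
print(playlist_id_from_link(link))
```

Because the query string is handled by `urlparse`, the `?si=...` suffix never reaches the ID, and a link that is not a playlist raises an error instead of silently returning the wrong segment.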

In [22]:
def createDfForPlaylist(link):
  df = pd.DataFrame(columns = columns)
  uri = link.split("/")[4].split("?")[0] 
  tracks = sp.playlist_tracks(uri)["items"]

  for track in tracks: 

    features = {}

    # Meta-information
    features["Title"] = track["track"]["name"]
    features["Popularity"] = track["track"]["popularity"]
    features["Artist"] = track["track"]["album"]["artists"][0]["name"]
    features["Explicit"] = track["track"]["explicit"]

    #About the Album 
    features["Album"] = track["track"]["album"]["name"]
    features["Number of Tracks in Album"] = track["track"]["album"]["total_tracks"]

    # Related Artists
    related_artists = ""
    for artist in sp.artist_related_artists(track["track"]["artists"][0]["id"])["artists"]:
      related_artists += ", " + artist["name"]
    features["Related Artists"] = related_artists[2:]

    # Audio Analysis (fetched once and reused for the segment, timbre, and fade features)
    audio_analysis = sp.audio_analysis(track["track"]["id"])
    segments = audio_analysis["segments"]
    features["Number of Segments"] = len(segments)

    # Timbre Analysis 
    # According to the Spotify website: Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments, or voices. 
    # It is a complex notion also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment’s spectro-temporal surface, 
    # independently of pitch and loudness. The timbre feature is a vector that includes 12 unbounded values roughly centered around 0. Those values are 
    # high level abstractions of the spectral surface, ordered by degree of importance.
    # Since they're ordered by importance, we initially took the first four. We then excluded the first value (index 0) because we already have loudness
    # as a feature, which leaves indices 1 to 3 (labelled brightness, flatness, and attack strength here).
    # Each segment has a timbre vector. By averaging the components of the timbre vectors (i.e. using corresponding indices), we can get the average 
    # timbre components across the track

    features["brightness"] = features["flatness"] = features["attack strength"] = 0
    for segment in segments:
      features["brightness"] += segment["timbre"][1]
      features["flatness"] += segment["timbre"][2]
      features["attack strength"] += segment["timbre"][3]

    # max(..., 1) avoids dividing by zero if a track has no segments
    for feature in columns[8:11]:
      features[feature] /= max(len(segments), 1)

    # Audio Features
    audio_features = sp.audio_features(track["track"]["id"])[0]
    for feature in columns[11:23]:
      features[feature] = audio_features[feature]

    # Track-level analysis (fade-in and fade-out positions), reusing the analysis fetched above
    for feature in columns[23: ]:
      features[feature] = audio_analysis["track"][feature]
    
    # Concatenating the data after every iteration
    track_df = pd.DataFrame(features, index = [0])
    df = pd.concat([df, track_df], ignore_index = True)

  return df
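
The timbre step in the function averages selected components of each segment's timbre vector over the whole track. The following is a minimal standalone sketch of that averaging on made-up segments; the helper name `mean_timbre` and the numbers are illustrative assumptions, not values returned by Spotify.

```python
def mean_timbre(segments, indices=(1, 2, 3)):
    """Average the chosen timbre components over all segments of a track."""
    if not segments:
        # No segments: return NaN placeholders instead of dividing by zero
        return [float("nan")] * len(indices)
    return [
        sum(segment["timbre"][i] for segment in segments) / len(segments)
        for i in indices
    ]


# Two made-up segments with 12-dimensional timbre vectors (illustrative only)
segments = [
    {"timbre": [40.0, 10.0, -20.0, 5.0] + [0.0] * 8},
    {"timbre": [42.0, 30.0, -10.0, 15.0] + [0.0] * 8},
]
brightness, flatness, attack_strength = mean_timbre(segments)
print(brightness, flatness, attack_strength)  # 20.0 -15.0 10.0
```

Index 0 is skipped by default, mirroring the function's choice to leave out the first timbre value because loudness is already a feature.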

Now we call the function on the 2 playlists we made and combine the results row-wise. We would have used one big playlist, but the API only returns 100 songs at a time, so splitting the tracks across playlists was how we worked around the limit.
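
A different way around the 100-song limit, which this notebook does not use, is to page through a single playlist. The sketch below assumes a spotipy client, whose `playlist_items` method returns one page of results and whose `next` method fetches the following page; the helper name `fetch_all_playlist_items` is our own.

```python
def fetch_all_playlist_items(client, playlist_id, page_size=100):
    """Collect every item of a playlist by following the paged results.

    `client` is assumed to behave like a spotipy.Spotify instance.
    """
    page = client.playlist_items(playlist_id, limit=page_size)
    items = list(page["items"])
    # Each page carries a 'next' URL until the last page, where it is None
    while page.get("next"):
        page = client.next(page)
        items.extend(page["items"])
    return items


# Hypothetical usage with the `sp` client created earlier in this notebook:
# tracks = fetch_all_playlist_items(sp, "7jiIO3r04OiJrjgo3jyGI6")
```

The returned list has the same shape as the `items` list that `createDfForPlaylist` iterates over, so in principle it could feed the same per-track loop.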

In [8]:
playlist_link1 = "https://open.spotify.com/playlist/7jiIO3r04OiJrjgo3jyGI6?si=b479a9474fe14347"
playlist_link2 = "https://open.spotify.com/playlist/6lUInvchqIvPe9vxCkrWPN?si=5dea856b01b5483b"

df1 = createDfForPlaylist(playlist_link1)
df2 = createDfForPlaylist(playlist_link2)

dfs = [df1, df2]
df = pd.concat(dfs, ignore_index = True)

df

Unnamed: 0,Title,Popularity,Artist,Album,Number of Tracks in Album,Related Artists,Explicit,Number of Segments,brightness,flatness,...,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,end_of_fade_in,start_of_fade_out
0,God's Plan,83,Drake,Scorpion,25,"Big Sean, J. Cole, DJ Khaled, Meek Mill, Futur...",True,769,39.845464,21.759221,...,1,0.1090,0.000083,0.5520,0.357,77.169,198973,4,0.00000,185.37651
1,Lucid Dreams,83,Juice WRLD,Goodbye & Good Riddance,17,"NAV, Comethazine, Trippie Redd, Lil Tecca, Blu...",True,1002,26.918125,-22.883286,...,0,0.2000,0.0,0.3400,0.218,83.903,239836,4,0.07537,226.81831
2,SICKO MODE,81,Travis Scott,ASTROWORLD,17,"A$AP Rocky, A$AP Ferg, Chief Keef, Joey Bada$$...",True,1234,58.408027,-24.692282,...,1,0.2220,0.0,0.1240,0.446,155.008,312820,4,0.00000,307.04907
3,a lot,80,21 Savage,i am > i was,15,"Quavo, Gunna, Young Thug, Don Toliver, Huncho ...",True,1365,10.461452,-52.394945,...,1,0.0860,0.00125,0.3420,0.274,145.972,288624,4,0.06612,275.79500
4,Plug Walk,69,Rich The Kid,The World Is Yours,15,"Quality Control, Famous Dex, Baka Not Nice, Yu...",True,652,24.667761,6.185388,...,1,0.1430,0.0,0.1080,0.158,94.981,175230,4,0.09909,156.18323
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,Gwendolynn's Apprehension,46,Mick Jenkins,Pieces of a Man,17,"Saba, Pivot Gang, Kirk Knight, Rejjie Snow, AK...",True,996,9.048543,-10.534785,...,1,0.3030,0,0.1200,0.680,91.352,225533,4,0.09882,217.04272
125,on the rocks,29,Mereba,The Jungle Is The Only Way Out,13,"Marco McKinnis, Leven Kali, Jean Deaux, Rayana...",False,193,-52.385554,-33.543290,...,0,0.0561,0,0.0871,0.160,91.110,53760,4,0.31705,48.79093
126,Truman,35,Lil Dicky,Professional Rapper,20,"VIC MENSA, Bryce Vine, Chris Webby, Hoodie All...",True,2946,1.527897,13.110137,...,1,0.3470,0,0.3390,0.636,88.748,614564,4,0.08150,608.26996
127,Harlem Renaissance,28,Immortal Technique,The 3rd World,18,"Jedi Mind Tricks, Army Of The Pharaohs, R.A. T...",True,1030,66.654272,6.501915,...,0,0.2580,0,0.3920,0.752,90.736,224227,4,0.17410,221.15265


# Cleaning the Playlist Data

This step applies some miscellaneous cleaning to the playlist (audio features) part of the dataset.

In [9]:
# Milliseconds are hard to interpret, so we convert the duration to minutes
df['duration_ms'] = df['duration_ms']/60/1000
df.rename(columns = {'duration_ms':'duration'}, inplace = True)

# Converting the boolean in the explicit column to explicit/clean instead of True/False for better readability
df["Explicit"] = df["Explicit"].replace({True: 'Explicit', False: 'Clean'})

# Rather than the timestamps at which the fades occur, it's more relevant to know how long the fade-in and fade-out are relative to the track length
df["end_of_fade_in"] = df["end_of_fade_in"]/(df["duration"]*60)
df.rename(columns = {'end_of_fade_in':'fade in'}, inplace = True)

# since the start of fade out is given by the spotify API in terms of when in the song it occurs, we need to subtract it from the track length 
# to get how long the fade out is
df["start_of_fade_out"] = (df["duration"]*60 - df["start_of_fade_out"])/(df["duration"]*60)
df.rename(columns = {'start_of_fade_out':'fade out'}, inplace = True)

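# Since longer tracks are likely to have more segments, we normalize the number of segments by the track duration.
# Note: this gives the number of segments per minute (the higher the value, the shorter the average segment), even though the column is renamed 'Average Duration of Segments'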
df["Number of Segments"] = df["Number of Segments"]/df["duration"]
df.rename(columns = {'Number of Segments':'Average Duration of Segments'}, inplace = True)

df

Unnamed: 0,Title,Popularity,Artist,Album,Number of Tracks in Album,Related Artists,Explicit,Average Duration of Segments,brightness,flatness,...,mode,speechiness,instrumentalness,liveness,valence,tempo,duration,time_signature,fade in,fade out
0,God's Plan,83,Drake,Scorpion,25,"Big Sean, J. Cole, DJ Khaled, Meek Mill, Futur...",Explicit,231.890759,39.845464,21.759221,...,1,0.1090,0.000083,0.5520,0.357,77.169,3.316217,4,0.0,0.068333
1,Lucid Dreams,83,Juice WRLD,Goodbye & Good Riddance,17,"NAV, Comethazine, Trippie Redd, Lil Tecca, Blu...",Explicit,250.671292,26.918125,-22.883286,...,0,0.2000,0.0,0.3400,0.218,83.903,3.997267,4,0.000314,0.054277
2,SICKO MODE,81,Travis Scott,ASTROWORLD,17,"A$AP Rocky, A$AP Ferg, Chief Keef, Joey Bada$$...",Explicit,236.685634,58.408027,-24.692282,...,1,0.2220,0.0,0.1240,0.446,155.008,5.213667,4,0.0,0.018448
3,a lot,80,21 Savage,i am > i was,15,"Quavo, Gunna, Young Thug, Don Toliver, Huncho ...",Explicit,283.760186,10.461452,-52.394945,...,1,0.0860,0.00125,0.3420,0.274,145.972,4.8104,4,0.000229,0.044449
4,Plug Walk,69,Rich The Kid,The World Is Yours,15,"Quality Control, Famous Dex, Baka Not Nice, Yu...",Explicit,223.249444,24.667761,6.185388,...,1,0.1430,0.0,0.1080,0.158,94.981,2.9205,4,0.000565,0.108696
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,Gwendolynn's Apprehension,46,Mick Jenkins,Pieces of a Man,17,"Saba, Pivot Gang, Kirk Knight, Rejjie Snow, AK...",Explicit,264.97231,9.048543,-10.534785,...,1,0.3030,0,0.1200,0.680,91.352,3.758883,4,0.000438,0.037645
125,on the rocks,29,Mereba,The Jungle Is The Only Way Out,13,"Marco McKinnis, Leven Kali, Jean Deaux, Rayana...",Clean,215.401786,-52.385554,-33.543290,...,0,0.0561,0,0.0871,0.160,91.110,0.896,4,0.005898,0.092431
126,Truman,35,Lil Dicky,Professional Rapper,20,"VIC MENSA, Bryce Vine, Chris Webby, Hoodie All...",Explicit,287.618539,1.527897,13.110137,...,1,0.3470,0,0.3390,0.636,88.748,10.242733,4,0.000133,0.010241
127,Harlem Renaissance,28,Immortal Technique,The 3rd World,18,"Jedi Mind Tricks, Army Of The Pharaohs, R.A. T...",Explicit,275.613552,66.654272,6.501915,...,0,0.2580,0,0.3920,0.752,90.736,3.737117,4,0.000776,0.013711
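
As a worked check of the transformations above, this sketch applies the same arithmetic to the raw values of the first row ("God's Plan") from the earlier output. The function name `clean_track` and the key `segments per minute` are our own labels for illustration.

```python
def clean_track(duration_ms, end_of_fade_in, start_of_fade_out, n_segments):
    """Apply the notebook's cleaning arithmetic to one track's raw values."""
    duration_s = duration_ms / 1000
    duration_min = duration_s / 60
    return {
        "duration": duration_min,  # minutes
        "fade in": end_of_fade_in / duration_s,  # fraction of the track
        "fade out": (duration_s - start_of_fade_out) / duration_s,  # fraction
        "segments per minute": n_segments / duration_min,
    }


# Raw values for "God's Plan" as shown in the first output table
row = clean_track(
    duration_ms=198973,
    end_of_fade_in=0.0,
    start_of_fade_out=185.37651,
    n_segments=769,
)
for name, value in row.items():
    print(f"{name}: {value:.6f}")
```

The results agree with the cleaned row above: a duration of about 3.32 minutes, a fade-out covering roughly 6.8% of the track, and about 232 segments per minute.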


In [10]:
#Checking if there are any missing values
df.isnull().sum()


Title                           0
Popularity                      0
Artist                          0
Album                           0
Number of Tracks in Album       0
Related Artists                 0
Explicit                        0
Average Duration of Segments    0
brightness                      0
flatness                        0
attack strength                 0
danceability                    0
energy                          0
key                             0
loudness                        0
mode                            0
speechiness                     0
instrumentalness                0
liveness                        0
valence                         0
tempo                           0
duration                        0
time_signature                  0
fade in                         0
fade out                        0
dtype: int64

# Gathering and merging other data 

Here we collect the lyrics of each track using the MusixMatch API, and the early-life information of each track's artist through web scraping.











One factor we thought might influence how popular an artist becomes is their background, including education, family, and where they grew up. To get this information, we scraped the first 5 paragraphs of each artist's Wikipedia page and added the text to our main DataFrame in the column `info`.

In [11]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# We first create a new DataFrame with the Title and Artist columns from the main DataFrame
# (.copy() ensures we modify an independent DataFrame rather than a view of df)
df_artistInfo = df[["Title", "Artist"]].copy()
df_artistInfo.insert(2, "info", '')

# Extracting text from Wikipedia
for artist in range(len(df_artistInfo)):
  # forming the Wikipedia URL for each artist by joining the words of the name with underscores
  wiki_link = "https://en.wikipedia.org/wiki/" + "_".join(df.loc[artist, "Artist"].split())

  # using the requests library to fetch the webpage
  response = requests.get(wiki_link)
  soup = BeautifulSoup(response.text, "html")

  # scraping the text of the first 5 <p> tags
  paras = [para.text for para in soup.find_all("p")[:5]]

  # joining all the words from the paragraphs into a single string separated by single spaces
  # (this yields an empty string, rather than an error, if no text was found)
  wiki_intro = " ".join(" ".join(paras).split())

  # inserting the info string into the DataFrame
  df_artistInfo.loc[artist, "info"] = wiki_intro
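
Artist names can contain characters that are not safe to place directly in a URL. The sketch below, which is an illustration and not part of the original scraper, builds the article URL with underscores and percent-encoding; the helper name `wikipedia_url` is our own.

```python
from urllib.parse import quote


def wikipedia_url(artist_name):
    """Build an English Wikipedia article URL from an artist name."""
    # Wikipedia titles use underscores in place of spaces
    title = "_".join(artist_name.split())
    # Percent-encode characters such as '$' that are unsafe in a URL path
    return "https://en.wikipedia.org/wiki/" + quote(title)


print(wikipedia_url("Juice WRLD"))  # https://en.wikipedia.org/wiki/Juice_WRLD
print(wikipedia_url("A$AP Rocky"))  # https://en.wikipedia.org/wiki/A%24AP_Rocky
```

A well-formed URL still does not guarantee the right article: a stage name may lead to a different subject or to a disambiguation page.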

In [12]:
import numpy as np

# adding the info column to main DataFrame
df["info"] = df_artistInfo["info"]

# checking if there are any artists for whom there are no wikipedia pages
conditions = [
    df["info"].str.startswith("Other") == True,
    df["info"].str.startswith("Other") == False
]

df['info'] = np.select(conditions, ["NaN", df["info"]], default=0)

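# Printing the number of "NaN" values. If zero, no artist was flagged as lacking a Wikipedia entry.
# Note that this check does not catch disambiguation pages (e.g. an info text of "Drake may refer to:"), which do appear in the output below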
(df["info"] == "NaN").sum()

0

In [13]:
df

Unnamed: 0,Title,Popularity,Artist,Album,Number of Tracks in Album,Related Artists,Explicit,Average Duration of Segments,brightness,flatness,...,speechiness,instrumentalness,liveness,valence,tempo,duration,time_signature,fade in,fade out,info
0,God's Plan,83,Drake,Scorpion,25,"Big Sean, J. Cole, DJ Khaled, Meek Mill, Futur...",Explicit,231.890759,39.845464,21.759221,...,0.1090,0.000083,0.5520,0.357,77.169,3.316217,4,0.0,0.068333,Drake may refer to:
1,Lucid Dreams,83,Juice WRLD,Goodbye & Good Riddance,17,"NAV, Comethazine, Trippie Redd, Lil Tecca, Blu...",Explicit,250.671292,26.918125,-22.883286,...,0.2000,0.0,0.3400,0.218,83.903,3.997267,4,0.000314,0.054277,"Jarad Anthony Higgins (December 2, 1998 – Dece..."
2,SICKO MODE,81,Travis Scott,ASTROWORLD,17,"A$AP Rocky, A$AP Ferg, Chief Keef, Joey Bada$$...",Explicit,236.685634,58.408027,-24.692282,...,0.2220,0.0,0.1240,0.446,155.008,5.213667,4,0.0,0.018448,"Jacques Bermon Webster II (born April 30, 1991..."
3,a lot,80,21 Savage,i am > i was,15,"Quavo, Gunna, Young Thug, Don Toliver, Huncho ...",Explicit,283.760186,10.461452,-52.394945,...,0.0860,0.00125,0.3420,0.274,145.972,4.8104,4,0.000229,0.044449,"Shéyaa Bin Abraham-Joseph (born October 22, 19..."
4,Plug Walk,69,Rich The Kid,The World Is Yours,15,"Quality Control, Famous Dex, Baka Not Nice, Yu...",Explicit,223.249444,24.667761,6.185388,...,0.1430,0.0,0.1080,0.158,94.981,2.9205,4,0.000565,0.108696,"Dimitri Leslie Roger (born July 13, 1992),[3][..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,Gwendolynn's Apprehension,46,Mick Jenkins,Pieces of a Man,17,"Saba, Pivot Gang, Kirk Knight, Rejjie Snow, AK...",Explicit,264.97231,9.048543,-10.534785,...,0.3030,0,0.1200,0.680,91.352,3.758883,4,0.000438,0.037645,Mick Jenkins may refer to:
125,on the rocks,29,Mereba,The Jungle Is The Only Way Out,13,"Marco McKinnis, Leven Kali, Jean Deaux, Rayana...",Clean,215.401786,-52.385554,-33.543290,...,0.0561,0,0.0871,0.160,91.110,0.896,4,0.005898,0.092431,"Marian Azeb Mereba (born September 19, 1990), ..."
126,Truman,35,Lil Dicky,Professional Rapper,20,"VIC MENSA, Bryce Vine, Chris Webby, Hoodie All...",Explicit,287.618539,1.527897,13.110137,...,0.3470,0,0.3390,0.636,88.748,10.242733,4,0.000133,0.010241,"David Andrew Burd (born March 15, 1988), bette..."
127,Harlem Renaissance,28,Immortal Technique,The 3rd World,18,"Jedi Mind Tricks, Army Of The Pharaohs, R.A. T...",Explicit,275.613552,66.654272,6.501915,...,0.2580,0,0.3920,0.752,90.736,3.737117,4,0.000776,0.013711,"Felipe Andres Coronel (born February 19, 1978)..."
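
The table above shows that some lookups landed on disambiguation pages, for example "Drake may refer to:" and "Mick Jenkins may refer to:". A minimal sketch of flagging such rows follows; the phrase check is a heuristic assumption based on those two examples, and the helper name is hypothetical.

```python
def looks_like_disambiguation(info):
    """Heuristically flag text scraped from a Wikipedia disambiguation page."""
    # Only the opening of the text is inspected, since the phrase appears first
    return "may refer to" in info[:200].lower()


examples = [
    "Drake may refer to:",
    "Jarad Anthony Higgins (December 2, 1998 ...",
]
for text in examples:
    print(repr(text), looks_like_disambiguation(text))
```

Rows flagged this way could then be re-scraped with a more specific page title or treated as missing background information.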


Next, we retrieve the lyrics for each track from the Musixmatch API.

In [14]:
#Installing the musixmatch API
!pip install pymusixmatch



In [15]:
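import os
from musixmatch import Musixmatch

# Reading the Musixmatch API key from an environment variable instead of hardcoding it
# (the variable name MUSIXMATCH_API_KEY is our own choice)
key = os.environ["MUSIXMATCH_API_KEY"]
musixmatch = Musixmatch(key)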

df["Lyrics"] = np.nan

allLyrics = []
songs = list(df["Title"])
artists = list(df["Artist"])

# A function that fetches the lyrics for a given track; the artist is a parameter so that the search matches the right track.
# It returns the lyrics as a one-element list, or None if no match was found
def generateLyrics(track, artist):
  body = musixmatch.matcher_lyrics_get(track, artist)
  if (body['message']['body'] == '' or body['message']['body'] == []):
    return None
  else:
    lyrics = body['message']['body']['lyrics']['lyrics_body'].split('***')[0]
    return [lyrics]


In [16]:
for song, artist in zip(songs, artists):
    body = generateLyrics(song, artist)

    # storing an empty string when no lyrics were found
    if body is None:
        allLyrics.append("")
    else:
        allLyrics.append(body[0])
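
The lookup logic above depends on the nested shape of the response. The sketch below isolates that parsing in a pure function that can be checked without calling the API; the response layout is assumed to be the one `generateLyrics` already relies on, and the sample dictionaries are made up.

```python
def extract_lyrics(response):
    """Return the lyrics text from a matcher response, or '' if none matched."""
    body = response.get("message", {}).get("body")
    if not body:  # '' or [] when the API found no match
        return ""
    text = body["lyrics"]["lyrics_body"]
    # Keep only the part before the '***' marker, as the notebook does
    return text.split("***")[0].strip()


# Made-up responses that mimic the structure used in this notebook
found = {"message": {"body": {"lyrics": {"lyrics_body": "Line one\nLine two\n*** notice ***"}}}}
missing = {"message": {"body": []}}

print(repr(extract_lyrics(found)))
print(repr(extract_lyrics(missing)))
```

Returning an empty string for a missing match keeps the column type consistent, which matches how the loop above fills `allLyrics`.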


In [17]:
df["Lyrics"] = allLyrics

# replacing new line characters (\n) with spaces, so each track's lyrics become a single line
df["Lyrics"] = df["Lyrics"].apply(lambda text: "".join(line + " " for line in text.splitlines()))
df

Unnamed: 0,Title,Popularity,Artist,Album,Number of Tracks in Album,Related Artists,Explicit,Average Duration of Segments,brightness,flatness,...,instrumentalness,liveness,valence,tempo,duration,time_signature,fade in,fade out,info,Lyrics
0,God's Plan,83,Drake,Scorpion,25,"Big Sean, J. Cole, DJ Khaled, Meek Mill, Futur...",Explicit,231.890759,39.845464,21.759221,...,0.000083,0.5520,0.357,77.169,3.316217,4,0.0,0.068333,Drake may refer to:,"Yeah, they wishin' and wishin' and wishin' and..."
1,Lucid Dreams,83,Juice WRLD,Goodbye & Good Riddance,17,"NAV, Comethazine, Trippie Redd, Lil Tecca, Blu...",Explicit,250.671292,26.918125,-22.883286,...,0.0,0.3400,0.218,83.903,3.997267,4,0.000314,0.054277,"Jarad Anthony Higgins (December 2, 1998 – Dece...","Enviyon on the mix No, no, no, no No, no, no,..."
2,SICKO MODE,81,Travis Scott,ASTROWORLD,17,"A$AP Rocky, A$AP Ferg, Chief Keef, Joey Bada$$...",Explicit,236.685634,58.408027,-24.692282,...,0.0,0.1240,0.446,155.008,5.213667,4,0.0,0.018448,"Jacques Bermon Webster II (born April 30, 1991...","Astro' Yeah Sun is down, freezing cold That's..."
3,a lot,80,21 Savage,i am > i was,15,"Quavo, Gunna, Young Thug, Don Toliver, Huncho ...",Explicit,283.760186,10.461452,-52.394945,...,0.00125,0.3420,0.274,145.972,4.8104,4,0.000229,0.044449,"Shéyaa Bin Abraham-Joseph (born October 22, 19...",I love you Turn my headphone down a little bit...
4,Plug Walk,69,Rich The Kid,The World Is Yours,15,"Quality Control, Famous Dex, Baka Not Nice, Yu...",Explicit,223.249444,24.667761,6.185388,...,0.0,0.1080,0.158,94.981,2.9205,4,0.000565,0.108696,"Dimitri Leslie Roger (born July 13, 1992),[3][...","Ayy, ayy, plug walk (plug walk, plug, plug) I ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,Gwendolynn's Apprehension,46,Mick Jenkins,Pieces of a Man,17,"Saba, Pivot Gang, Kirk Knight, Rejjie Snow, AK...",Explicit,264.97231,9.048543,-10.534785,...,0,0.1200,0.680,91.352,3.758883,4,0.000438,0.037645,Mick Jenkins may refer to:,"We, real cool We, left school We, lurk late We..."
125,on the rocks,29,Mereba,The Jungle Is The Only Way Out,13,"Marco McKinnis, Leven Kali, Jean Deaux, Rayana...",Clean,215.401786,-52.385554,-33.543290,...,0,0.0871,0.160,91.110,0.896,4,0.005898,0.092431,"Marian Azeb Mereba (born September 19, 1990), ...",
126,Truman,35,Lil Dicky,Professional Rapper,20,"VIC MENSA, Bryce Vine, Chris Webby, Hoodie All...",Explicit,287.618539,1.527897,13.110137,...,0,0.3390,0.636,88.748,10.242733,4,0.000133,0.010241,"David Andrew Burd (born March 15, 1988), bette...",Ay Man Man Ay I guess ya'll on board Young ma...
127,Harlem Renaissance,28,Immortal Technique,The 3rd World,18,"Jedi Mind Tricks, Army Of The Pharaohs, R.A. T...",Explicit,275.613552,66.654272,6.501915,...,0,0.3920,0.752,90.736,3.737117,4,0.000776,0.013711,"Felipe Andres Coronel (born February 19, 1978)...","""Let me welcome both of you Uh, to the show th..."


We download the DataFrame as a CSV file to be used in the next notebook.

In [18]:
# downloading the DataFrame as CSV
from google.colab import files
df.to_csv('df_tracks.csv', encoding = 'utf-8-sig') 
files.download('df_tracks.csv')
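
When a DataFrame is written with its index and read back, pandas labels the index column `Unnamed: 0`, which is the extra column visible in the tables in this notebook. The sketch below demonstrates the effect on a small made-up DataFrame using in-memory buffers; the helper name `roundtrip_columns` is our own.

```python
import io

import pandas as pd


def roundtrip_columns(frame, index):
    """Write a DataFrame to CSV in memory, read it back, and list the columns."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=index)
    buffer.seek(0)
    return pd.read_csv(buffer).columns.tolist()


df_demo = pd.DataFrame({"Title": ["God's Plan", "a lot"], "Popularity": [83, 80]})

print(roundtrip_columns(df_demo, index=True))   # ['Unnamed: 0', 'Title', 'Popularity']
print(roundtrip_columns(df_demo, index=False))  # ['Title', 'Popularity']
```

Passing `index=False` to `to_csv`, or `index_col=0` to `read_csv` in the next notebook, would avoid carrying the extra column forward.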


We're also downloading a file here for our concluding extension (which we'll explain in the ML notebook). For now we're just downloading the DataFrame, because the `createDfForPlaylist` function is defined in this notebook.

In [23]:
dfTest = createDfForPlaylist("https://open.spotify.com/playlist/1XpxD4svANv1h22V1rHTzf?si=72cdf2dcbac84c61")
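# Cleaning: renaming columns to match the main DataFrame. The values are left unconverted here
# (duration stays in milliseconds and the fade columns stay in seconds)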
dfTest.rename(columns = {'duration_ms':'duration'}, inplace = True)
dfTest.rename(columns = {'end_of_fade_in':'fade in'}, inplace = True)
dfTest.rename(columns = {'start_of_fade_out':'fade out'}, inplace = True)

dfTest

Unnamed: 0,Title,Popularity,Artist,Album,Number of Tracks in Album,Related Artists,Explicit,Number of Segments,brightness,flatness,...,mode,speechiness,instrumentalness,liveness,valence,tempo,duration,time_signature,fade in,fade out
0,POPPIN,15,SUHAS,POPPIN,1,"Reyaan Luthra, K4Y, Ronn, Rohan Prakash, Mm Sr...",True,410,55.505144,-42.721707,...,0,0.102,0,0.115,0.423,133.956,101050,4,0.0,98.02304


In [24]:
# downloading the DataFrame as CSV
from google.colab import files
dfTest.to_csv('suhas.csv', encoding = 'utf-8-sig') 
files.download('suhas.csv')
