![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

<body>
    <p style="font-size:28px;text-align:center"><b>Project 03 - Part 01 | Web Scraping & API</b></p>
</body>

# Introduction

The objective of the first part of this project was to collect data to answer a problem statement, practicing web scraping and the use of APIs.

---

<body>
    <p style="font-size:20px"><b>Problem Statement</b></p>
</body>

_What do some of the TikTok viral songs have in common?_

---

To answer this question, 69 songs obtained from the **PopSugar** website were analyzed. The post containing this list was published on March 27th, 2020 by Hedy Phillips, around the time people started to quarantine because of the COVID-19 pandemic and began to use TikTok more to spend their time at home.

The sources of information used to gather data were **Spotify**, **Last.fm**, **Chartmetric** and **MusicBrainz**.

---

Sources:
- Websites:
  - PopSugar: https://www.popsugar.com/entertainment/popular-tiktok-songs-47289804?stream_view=1#photo-47289832
  - MusicBrainz: https://musicbrainz.org/genres

- APIs:
  - Spotify API: https://developer.spotify.com/
  - Spotipy (Spotify API wrapper for Python): https://spotipy.readthedocs.io/en/2.15.0/
  - Last.fm API: https://www.last.fm/api
  - Chartmetric API: https://api.chartmetric.com/apidoc/

# Setup

## Import

In [1]:
import os
import re
import requests

from ast import literal_eval
from time import sleep

import numpy as np
import pandas as pd
import spotipy

from bs4 import BeautifulSoup
from dotenv import load_dotenv, find_dotenv
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
from tqdm.auto import tqdm

# Web Scraping

Web scraping was necessary to collect the following data:

<table>
  <thead>
    <tr>
      <th>INFORMATION</th>
      <th>SOURCE</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>69 TikTok viral songs</td>
      <td>PopSugar</td>
    </tr>
    <tr>
      <td>Music genres</td>
      <td>MusicBrainz</td>
    </tr>
  </tbody>
</table>



## List of viral songs on TikTok

### Get response

In [2]:
# Get response from the url and check it
url = 'https://www.popsugar.com/entertainment/popular-tiktok-songs-47289804?stream_view=1#photo-47289832'
response = requests.get(url)
response

<Response [200]>

### Data Collection

In [3]:
# Get the content in the url
content_popsugar = BeautifulSoup(response.text, 'html.parser')

# Get the date the post was made
popsugar_date = content_popsugar.find('time').text.replace('\n', '').strip()

# Get only the songs and artists
popsugar_html = content_popsugar.find_all('span', attrs={'class': 'count-copy'})

In [4]:
# Convert the list 'popsugar_html' to a Pandas DataFrame
df_base = pd.DataFrame([re.split(' by ', song.text.replace('"', '').strip()) for song in popsugar_html], 
                       columns=['song', 'artists'])

# Check the result
df_base

Unnamed: 0,song,artists
0,Roxanne,Arizona Zervas
1,Say So,Doja Cat
2,My Oh My,Camila Cabello feat. DaBaby
3,Moon,Kid Francescoli
4,Vibe,Cookiee Kawaii
...,...,...
64,What the Hell,Avril Lavigne
65,Towards the Sun,Rihanna
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker"
67,Myself,Bazzi
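
The split of each entry into a song and its artists can be sketched without the scraper. This is a minimal illustration, assuming each entry has the form `"Song" by Artist`; `parse_entry` and the sample strings are hypothetical and not part of the notebook. Splitting from the right is a design choice of this sketch: it keeps a title that itself contains " by " intact, at the cost of assuming that no artist name does.

```python
def parse_entry(text):
    """Split one entry of the form '"Song" by Artist' into (song, artists)."""
    cleaned = text.replace('"', '').strip()
    # Split on the LAST ' by ' so a title such as 'Stand by Me' stays whole
    song, artists = cleaned.rsplit(' by ', 1)
    return song, artists


# Hand-written sample strings (assumed format, not scraped data)
samples = [
    '"Say So" by Doja Cat',
    '\n"My Oh My" by Camila Cabello feat. DaBaby\n',
    '"Stand by Me" by Ben E. King',
]
parsed = [parse_entry(sample) for sample in samples]
```

Each item of `parsed` is a `(song, artists)` tuple, which maps directly onto the `song` and `artists` columns used above.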


### Data Cleaning

In [5]:
# Create a column with a list of artists for each song
df_base['artists_list'] = [re.split(',* and |, * | [Ff]eat. ', artists.strip()) for artists in df_base.artists]

# Create a column with the number of artists for each song
df_base['number_artists'] = df_base.artists_list.apply(len)

In [6]:
# Check possible number of artists for one song
df_base.number_artists.value_counts()

1    48
2    18
3     3
Name: number_artists, dtype: int64

Seeing the result above, the maximum number of artists for a song is 3.
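
The counts above come from splitting the `artists` string on three kinds of separators: " and " (optionally preceded by a comma), a comma, and "feat." / "Feat.". The sketch below isolates that step using only the standard library. The pattern is the one from the cell above, written as a raw string with the dot escaped; `split_artists` is a hypothetical helper name.

```python
import re

# Separators between artist names: ', and ' / ' and ', ', ', ' feat. ' / ' Feat. '
ARTIST_SEPARATORS = re.compile(r',* and |, * | [Ff]eat\. ')


def split_artists(artists):
    """Return the list of individual artist names found in one 'artists' string."""
    return ARTIST_SEPARATORS.split(artists.strip())


solo = split_artists('Doja Cat')
duo = split_artists('Camila Cabello feat. DaBaby')
trio = split_artists('Machine Gun Kelly, YUNGBLUD, and Travis Barker')
```

Note that an artist whose own name contains " and " would be split in two by this pattern, so the result is worth checking by eye on a new list.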

In [7]:
# Check the dataframe
df_base.head()

Unnamed: 0,song,artists,artists_list,number_artists
0,Roxanne,Arizona Zervas,[Arizona Zervas],1
1,Say So,Doja Cat,[Doja Cat],1
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2
3,Moon,Kid Francescoli,[Kid Francescoli],1
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1


### Make backup dataframe

In [8]:
df_base_raw_bck = df_base.copy()

## List of music genre

### Get the response

In [9]:
'''# Get response from the url and check it
url = 'https://musicbrainz.org/genres'
response = requests.get(url)
response'''

"# Get response from the url and check it\nurl = 'https://musicbrainz.org/genres'\nresponse = requests.get(url)\nresponse"

### Data collection

In [10]:
'''# Get the content in the url
musicbrainz_content = BeautifulSoup(response.content)

# Create a list with the music genres listed in the url
musicbrainz_genre = [genre.text for genre in musicbrainz_content.find_all('bdi')]'''

"# Get the content in the url\nmusicbrainz_content = BeautifulSoup(response.content)\n\n# Create a list with the music genres listed in the url\nmusicbrainz_genre = [genre.text for genre in musicbrainz_content.find_all('bdi')]"

The data cleaning for this list will be done later.

# Spotify

With the help of Spotipy, some data about each song will be gathered from Spotify. Note that some songs may not be in Spotify's library.

In [11]:
# Create a copy of the dataframe
df_sp = df_base.copy()

# Check the result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists
0,Roxanne,Arizona Zervas,[Arizona Zervas],1
1,Say So,Doja Cat,[Doja Cat],1
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2
3,Moon,Kid Francescoli,[Kid Francescoli],1
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1


## Connecting to the API

In [270]:
load_dotenv(find_dotenv())

True

In [271]:
cid = os.getenv('spotify_p03_key')
csecret = os.getenv('spotify_p03_secret')
cc_manager = SpotifyClientCredentials(client_id=cid, client_secret=csecret)
sp = spotipy.Spotify(client_credentials_manager=cc_manager)
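
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only surfaces later as an authentication error. A small guard can make that failure explicit. This is a sketch; `require_env` is a hypothetical helper and not part of Spotipy or python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'Environment variable {name!r} is not set; check the .env file')
    return value


# Intended usage with the variable names from the cell above (left as comments so
# the sketch does not fail when the variables are absent):
# cid = require_env('spotify_p03_key')
# csecret = require_env('spotify_p03_secret')
```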

## Songs

### Search information about each song

In [14]:
# Search information about each song, using the Spotipy
spotify_songs = [sp.search(q=df_base.iloc[index, 0], type='track', limit=50) for index in tqdm(df_base.index)]

In [15]:
# Check if there are 69 items in this list
len(spotify_songs)

69

In [16]:
# Add a column in the dataframe with the data that were just collected
df_sp['spotify_search'] = spotify_songs

In [17]:
# Check the result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...


In [18]:
# Function to add new information in a copy of the dataframe
def get_spotify_track_info(df):
    
    '''
    Filters some data of the songs and adds them to a copy of the dataframe
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''
    
    # Create auxiliary empty lists (final lists)
    list_spotify_track_id = []
    list_spotify_track_duration = []
    list_spotify_track_popularity = []
    list_spotify_album_release_date = []
    list_spotify_track_explicit = []
    lists_spotify = [list_spotify_track_id, list_spotify_track_duration, list_spotify_track_popularity,
                     list_spotify_album_release_date, list_spotify_track_explicit]
    
    
    # Check for each row of the dataframe
    for index in df.index:
        
        # Information necessary from the dataframe to use during the process
        song_name = df.iloc[index, 0][:3].lower()
        artists_list = [artist.lower() for artist in df.iloc[index, 2]]
        total_artists = df.iloc[index, 3]
        mask = df.iloc[index, 4]['tracks']['items']
        
        # If the track was not found in the Spotify library, a 'not-found' string is added to the final lists
        if len(mask) == 0:
            for lst in lists_spotify:
                lst.append('not-found')
            #print(f'{index} - {song_name} - NOT FOUND')
        
        # If the track was found in the Spotify
        else:
            
            # Variable necessary to check if the information about a song has been added to the final lists
            added = 0
            
            # Up to 50 tracks related to the query were returned for each song
            for idx, each_found in enumerate(mask):
                
                # Information necessary from the Spotify API to use during the process
                track_name = mask[idx]['name'].lower()
                track_id = mask[idx]['id']
                track_duration = mask[idx]['duration_ms']
                track_popularity = mask[idx]['popularity']
                album_release_date = mask[idx]['album']['release_date']
                track_explicit = mask[idx]['explicit']
                n_artists = len(mask[idx]['artists'])
                first_artist_name = mask[idx]['artists'][0]['name'].lower()
            
                # Check that the song name and the artists from both sources match and that no information
                # about this song has been added to the final lists yet
                if ((song_name in track_name) and (total_artists == n_artists) and
                    ((first_artist_name in artists_list) or (artists_list[0][:5] in first_artist_name))
                    and (added == 0)):
                    list_spotify_track_id.append(track_id)
                    list_spotify_track_duration.append(track_duration)
                    list_spotify_track_popularity.append(track_popularity)
                    list_spotify_album_release_date.append(album_release_date)
                    list_spotify_track_explicit.append(track_explicit)
                    added += 1
                    #print(f'{index} - {track_name} - {track_id}')
                        
                
                # If the track found in the search is not a match, it is the last one and information about the
                # track has not been added to the final lists, then add a 'not-found' string to the final lists
                elif (idx == len(mask) - 1) and (added == 0):
                    for lst in lists_spotify:
                        lst.append('not-found')
                    #print(f'{index} - {song_name} - NOT FOUND')
    
    # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_id'] = list_spotify_track_id
    df_copy['sp_duration_ms'] = list_spotify_track_duration
    df_copy['sp_popularity'] = list_spotify_track_popularity
    df_copy['sp_release_date'] = list_spotify_album_release_date
    df_copy['sp_explicit'] = list_spotify_track_explicit
                    
    return df_copy

In [19]:
# Add desired information to the dataframe
df_sp = get_spotify_track_info(df_sp)

# Check the result
df_sp

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True
...,...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False
65,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True
67,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False


In [20]:
# Check if songs were not found in Spotify
df_sp[df_sp.sp_id == 'not-found']

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit
11,How About Now,Bryson Tiller,[Bryson Tiller],1,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found
36,Shibuya — Chanel Funk Remix,Frank Ocean and L.Dre,"[Frank Ocean, L.Dre]",2,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found
39,WOP,J. Dash feat. Flo Rida,"[J. Dash, Flo Rida]",2,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found


Three songs were not found on Spotify. They were checked manually using the Spotify desktop app. The first two do not exist there. The third song exists, but not in the version with the featured artist listed in the dataframe.
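
The matching rule applied inside `get_spotify_track_info` can be read more easily as a standalone predicate. The sketch below restates it under the same assumptions: the first three characters of the title, the number of artists, and the first artist must agree. `is_match` is a hypothetical helper, and both sample dictionaries are hand-written and contain only the keys the rule reads.

```python
def is_match(song, artists_list, track):
    """Return True if a Spotify search result looks like the wanted song."""
    song_prefix = song[:3].lower()
    wanted = [artist.lower() for artist in artists_list]
    first_artist = track['artists'][0]['name'].lower()
    return (
        song_prefix in track['name'].lower()
        and len(track['artists']) == len(wanted)
        and (first_artist in wanted or wanted[0][:5] in first_artist)
    )


# Hand-written samples shaped like items of sp.search(...)['tracks']['items']
sample_hit = {'name': 'ROXANNE', 'artists': [{'name': 'Arizona Zervas'}]}
sample_solo = {'name': 'Wop', 'artists': [{'name': 'J. Dash'}]}

found = is_match('Roxanne', ['Arizona Zervas'], sample_hit)
missed = is_match('WOP', ['J. Dash', 'Flo Rida'], sample_solo)
```

The second call shows why a track that exists only without its featured artist is reported as not found: the artist counts differ, so the rule rejects it.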

### Find the audio features for each song

In [21]:
# Search in the API wrapper
spotify_audio_features = [sp.audio_features(track_id) if track_id != 'not-found' else 'not-found'
                          for track_id in tqdm(df_sp.sp_id)]

In [22]:
# Check if there are 69 items in this list
len(spotify_audio_features)

69

In [23]:
# Function to add new information in a copy of the dataframe
def get_spotify_audio_features(df, audio_features: list):
    
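    '''
    Adds new information from a list to a copy of the dataframe
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
        audio_features (list): one item per row of df, holding either the Spotipy audio features
                               response for the track or the string 'not-found'
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''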
    
    # Create auxiliary empty lists (final lists)
    list_danceability = []
    list_energy = []
    list_key = []
    list_loudness = []
    list_mode = []
    list_speechiness = []
    list_acousticness = []
    list_instrumentalness = []
    list_liveness = []
    list_valence = []
    list_tempo = []
    list_time_signature = []
    lists_features = [list_danceability, list_energy, list_key, list_loudness, list_mode, list_speechiness, 
                      list_acousticness, list_instrumentalness, list_liveness, list_valence, list_tempo,
                      list_time_signature]
    
    # Check for each row of the dataframe
    for index in df.index:
        
        # Get the track's Spotify id
        track_id = df.iloc[index, 5]
        
        # If the track was not found in the Spotify library, a 'not-found' string is added to the final lists
        if track_id == 'not-found':
            
            for lst in lists_features:
                lst.append('not-found')
    
        # If the track was found in the Spotify library
        else:
            
            # Add the information to the final lists
            list_danceability.append(audio_features[index][0]['danceability'])
            list_energy.append(audio_features[index][0]['energy'])
            list_key.append(audio_features[index][0]['key'])
            list_loudness.append(audio_features[index][0]['loudness'])
            list_mode.append(audio_features[index][0]['mode'])
            list_speechiness.append(audio_features[index][0]['speechiness'])
            list_acousticness.append(audio_features[index][0]['acousticness'])
            list_instrumentalness.append(audio_features[index][0]['instrumentalness'])
            list_liveness.append(audio_features[index][0]['liveness'])
            list_valence.append(audio_features[index][0]['valence'])
            list_tempo.append(audio_features[index][0]['tempo'])
            list_time_signature.append(audio_features[index][0]['time_signature'])
     
    # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_danceability'] = list_danceability
    df_copy['sp_energy'] = list_energy
    df_copy['sp_key'] = list_key
    df_copy['sp_loudness'] = list_loudness
    df_copy['sp_mode'] = list_mode
    df_copy['sp_speechiness'] = list_speechiness
    df_copy['sp_acousticness'] = list_acousticness
    df_copy['sp_instrumentalness'] = list_instrumentalness
    df_copy['sp_liveness'] = list_liveness
    df_copy['sp_valence'] = list_valence
    df_copy['sp_tempo'] = list_tempo
    df_copy['sp_time_signature'] = list_time_signature
    
    return df_copy
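
The twelve parallel lists in the function above can also be expressed as one dictionary per track. The sketch below shows the idea on a subset of the feature names; `flatten_audio_features` and `FEATURE_KEYS` are hypothetical names. The sample mimics the shape the function above indexes (a one-item list holding a dictionary), with values copied from the first row of the output that follows.

```python
# Subset of the audio feature names, for illustration only
FEATURE_KEYS = ['key', 'loudness', 'mode', 'tempo', 'time_signature']


def flatten_audio_features(response, keys=FEATURE_KEYS):
    """Map one audio features response to {'sp_<key>': value} for the given keys."""
    if response == 'not-found':
        # Same placeholder convention as the notebook
        return {f'sp_{key}': 'not-found' for key in keys}
    features = response[0]
    return {f'sp_{key}': features[key] for key in keys}


# Hand-written sample with only the keys read above
sample_response = [{'key': 6, 'loudness': -5.616, 'mode': 0,
                    'tempo': 116.735, 'time_signature': 5}]

row = flatten_audio_features(sample_response)
missing = flatten_audio_features('not-found')
```

A list of such dictionaries can be turned into columns in one step, which avoids keeping many lists in sync by hand.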

In [24]:
# Add new information to the dataframe
df_sp = get_spotify_audio_features(df_sp, spotify_audio_features)

# Check result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0.0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4


### Final Dataframe

In [25]:
df_tracks_sp = df_sp[~(df_sp.sp_id == 'not-found')]

# Check the result
df_tracks_sp

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False,...,6,-3.689,0,0.0548,0.00472,0.0127,0.14,0.877,149.976,4
65,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False,...,4,-6.207,0,0.0392,0.0531,0,0.152,0.263,170.18,4
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True,...,7,-4.718,1,0.0379,0.0257,0,0.313,0.277,119.921,4
67,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False,...,9,-5.513,0,0.072,0.465,1.12e-06,0.0338,0.902,195.918,4


In [237]:
# Dataframe metadata
df_tracks_sp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66 entries, 0 to 68
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   song                 66 non-null     object
 1   artists              66 non-null     object
 2   artists_list         66 non-null     object
 3   number_artists       66 non-null     int64 
 4   spotify_search       66 non-null     object
 5   sp_id                66 non-null     object
 6   sp_duration_ms       66 non-null     object
 7   sp_popularity        66 non-null     object
 8   sp_release_date      66 non-null     object
 9   sp_explicit          66 non-null     object
 10  sp_danceability      66 non-null     object
 11  sp_energy            66 non-null     object
 12  sp_key               66 non-null     object
 13  sp_loudness          66 non-null     object
 14  sp_mode              66 non-null     object
 15  sp_speechiness       66 non-null     object
 16  sp_acousti

### Export dataset

In [240]:
# Export
df_tracks_sp.to_csv('exported_df/sp_tracks_info.csv', index=False)

## Artists

### Get artists ID

In [27]:
# Function to add new information in a copy of the dataframe
def get_spotify_artists_id(df):
    
    '''
    Creates a new dataframe about the artists.
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
    
    Returns:
    --------
        df_artists_id (Pandas DataFrame): a new dataframe with the name and Spotify ID of the artists
    '''
    
    # Create auxiliary empty dictionary
    dict_artists_ids = {}
      
    # Check for each row of the dataframe
    for index in df.index:
                      
        # Information necessary from the dataframe to use during the process
        song_name = df.iloc[index, 0][:3].lower()
        artists_list = [artist.lower() for artist in df.iloc[index, 2]]
        total_artists = df.iloc[index, 3]
        mask = df.iloc[index, 4]['tracks']['items']
        added = 0
        
        # Up to 50 tracks related to the query were returned for each song
        for idx, each_found in enumerate(mask):
                
            # Information necessary from the Spotify API to use during the process
            track_name = mask[idx]['name'].lower()
            n_artists = len(mask[idx]['artists'])
            first_artist_name = mask[idx]['artists'][0]['name'].lower()
            
            # Check that the song name and the artists from both sources match and that the track was found
            # on Spotify. Note: 'added' is never incremented in this function, so (added == 0) is always
            # true and every matching search result contributes its artists
            if ((song_name in track_name) and (total_artists == n_artists) and
                ((first_artist_name in artists_list) or (artists_list[0][:5] in first_artist_name))
                and (added == 0) and (df.iloc[index, 5] != 'not-found')):
                
                for artist in mask[idx]['artists']:
                    artist_name = artist['name']
                    artist_id = artist['id']
                        
                    # Add to dict
                    dict_artists_ids[artist_name] = artist_id
                    
    # Create a Pandas DataFrame
    
    df_artists_id = pd.DataFrame(dict_artists_ids.items(), columns=['artist', 'sp_id'])
                    
    return df_artists_id

In [28]:
# Create a Pandas DataFrame with artists
df_artists_sp = get_spotify_artists_id(df_sp)

# Check result
df_artists_sp

Unnamed: 0,artist,sp_id
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF
3,DaBaby,4r63FhuTkUYltbVAg5TQnk
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD
...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW
85,Travis Barker,4exLIFE8sISLr28sqG1qNX
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl


In [29]:
# Search in the API wrapper
spotify_artists_info = [sp.artist(artist) for artist in tqdm(df_artists_sp.sp_id)]

In [30]:
# Add a column in the dataframe with the data that were just collected
df_artists_sp['spotify_artist'] = spotify_artists_info

# Check the result
df_artists_sp.head()

Unnamed: 0,artist,sp_id,spotify_artist
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...


### Add information about the artists to the dataframe

In [31]:
# Function to add new information in a copy of the dataframe
def get_spotify_artist_info(df):
    
    '''
    Adds information about the artists to a copy of the dataframe.
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the artists and their Spotify ID
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''
    
    # Create auxiliary empty lists (final lists)
    list_spotify_artist_genres = []
    list_spotify_artist_popularity = []
    list_spotify_artist_followers = []
    
    # Check for each row of the dataframe
    for index in df.index:
        
        sp_id = df.iloc[index, 1]
        mask = df.iloc[index, 2]
        search_sp_id = mask['id']
        #artist_name = mask['name']
        
        # Check if 'id's match
        if sp_id == search_sp_id:
            #print(f'{index} - {artist_name}: OK - {n_artist}')
            
            search_sp_genres = mask['genres']
            list_spotify_artist_genres.append(search_sp_genres)
            
            search_sp_popularity = mask['popularity']
            list_spotify_artist_popularity.append(search_sp_popularity)
            
            search_sp_followers = mask['followers']['total']
            list_spotify_artist_followers.append(search_sp_followers)
    
    # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_genres'] = list_spotify_artist_genres
    df_copy['sp_popularity'] = list_spotify_artist_popularity
    df_copy['sp_followers'] = list_spotify_artist_followers
                    
    return df_copy
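
In the function above, a row whose IDs do not match appends nothing, so the three lists could end up shorter than the dataframe. One way around this, sketched below, is to return a value for every row and use `None` when the IDs differ. `extract_artist_info` is a hypothetical helper, and the sample dictionary is hand-written from the first row of the output that follows.

```python
def extract_artist_info(sp_id, artist):
    """Return genres, popularity and followers for one artist, or None values if the ids differ."""
    if artist.get('id') != sp_id:
        return {'sp_genres': None, 'sp_popularity': None, 'sp_followers': None}
    return {
        'sp_genres': artist['genres'],
        'sp_popularity': artist['popularity'],
        'sp_followers': artist['followers']['total'],
    }


# Hand-written sample with only the keys read above
sample_artist = {
    'id': '0vRvGUQVUjytro0xpb26bs',
    'genres': ['pop rap', 'rap', 'rhode island rap'],
    'popularity': 80,
    'followers': {'total': 486779},
}

info = extract_artist_info('0vRvGUQVUjytro0xpb26bs', sample_artist)
mismatch = extract_artist_info('some-other-id', sample_artist)
```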

In [244]:
# Add desired information to the dataframe
df_artists_sp = get_spotify_artist_info(df_artists_sp)

# Check the result
df_artists_sp.head()

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,486779
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3445699
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17989469
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4326193
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82296


### Final Dataframe

In [245]:
df_artists_sp

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,486779
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3445699
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17989469
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4326193
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82296
...,...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,{'external_urls': {'spotify': 'https://open.sp...,"[ohio hip hop, pop rap, rap]",86,2454191
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,{'external_urls': {'spotify': 'https://open.sp...,"[british indie rock, modern alternative rock, ...",78,1085134
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,{'external_urls': {'spotify': 'https://open.sp...,[rap rock],77,252326
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,{'external_urls': {'spotify': 'https://open.sp...,"[pop, post-teen pop]",83,3671755


### Export the dataframe

In [246]:
# Export
df_artists_sp.to_csv('exported_df/sp_artists_info.csv', index=False)

## Playlists

In [33]:
# Search in the API wrapper
spotify_tiktok = sp.search(q='tiktok', type='playlist', limit=50)

In [34]:
# Check the number of playlists
len(spotify_tiktok['playlists']['items'])

50

### Get playlist ID

In [268]:
# Get the playlists IDs and convert to a dataframe
sp_playlists = pd.DataFrame([playlist['id'] for playlist in spotify_tiktok['playlists']['items']], columns=['sp_playlist_id'])
sp_playlists.head()

Unnamed: 0,sp_playlist_id
0,37i9dQZF1DX2L0iB23Enbq
1,65LdqYCLcsV0lJoxpeQ6fW
2,0JFatPoPq82gNcPa4esOzj
3,2NNzPH70CakBbbU8JHrZRG
4,4FLeoROn5GT7n2tZq5XB4V


### Get information about the playlists

In [272]:
# Get additional information about the playlists
sp_playlists['sp_playlist_info'] = [sp.playlist(playlist_id) for playlist_id in tqdm(sp_playlists.sp_playlist_id)]

# Check the result
sp_playlists.head()


# Chartmetric

## Connecting to the API

In [224]:
load_dotenv(find_dotenv())

True

In [225]:
def get_token_cmc():
    url = "https://api.chartmetric.com/api/token"
    payload = r'{"refreshtoken":"%s"}' % os.getenv('chartmetric_rftoken')

    headers = {
      'Content-Type': 'application/json',
      'Cookie': 'connect.sid=s%3A96446210-f75b-11ea-bb65-c97242076514.wtHGb%2BZnACRtIERUXXeoBVsOmhNDPPnBzgPW9UL%2Bhpc'
    }

    response = requests.request("POST", url, headers=headers, data=payload)
    print(response)
    
    token = re.findall("[A-Za-z0-9._-]+", response.text)[1]
    
    return token
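
Taking the second regex match works only while the token is the first value in the response body. Parsing the body as JSON does not depend on key order. The comparison below is a sketch: the key name `token` and both sample bodies are assumptions made for illustration, not output captured from the Chartmetric API.

```python
import json
import re


def token_by_regex(body):
    # Approach used in get_token_cmc: take the second run of token-like characters
    return re.findall('[A-Za-z0-9._-]+', body)[1]


def token_by_json(body, key='token'):
    # Assumption: the body is a JSON object and the access token is stored under `key`
    return json.loads(body)[key]


# Hand-written bodies with the same content in a different key order
body_a = '{"token": "abc.def-ghi", "expires_in": 3600}'
body_b = '{"expires_in": 3600, "token": "abc.def-ghi"}'

regex_a, regex_b = token_by_regex(body_a), token_by_regex(body_b)
json_a, json_b = token_by_json(body_a), token_by_json(body_b)
```

With `body_b` the regex approach returns the expiry value instead of the token, while the JSON approach returns the token for both bodies.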

In [226]:
token = get_token_cmc()

<Response [200]>


## Songs

In [38]:
# Create a dataframe with songs for the Chartmetric dataset
df_cmc_tracks = df_sp.iloc[:, 0:6].drop(columns='spotify_search')
df_cmc_tracks

Unnamed: 0,song,artists,artists_list,number_artists,sp_id
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q
...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3


### Get Chartmetric track ID

In [39]:
def get_cmc_id_spotify(query: str, limit=10, offset=0):
    
    '''
    Searches for a Chartmetric track using its Spotify ID
    
    Args: Spotify track ID, number of results (limit), starting position (offset)
    ------
    
    Returns: json with the search results
    '''
    
    url = 'https://api.chartmetric.com/api/search'
    
    # Passing the parameters separately lets requests URL-encode the Spotify link
    params = {
        'q': f'https://open.spotify.com/track/{query}',
        'limit': limit,
        'offset': offset,
        'type': 'tracks'
    }
    
    headers = {
        'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers, params=params)
    
    return response.json()
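
The search query here is itself a URL, so characters such as `:` and `/` should be percent-encoded before they go into the query string. As a small illustration using only the standard library (the `build_search_url` helper is hypothetical and not part of the notebook):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(spotify_id, limit=10, offset=0):
    # Hypothetical helper: builds the search URL with an encoded query string
    params = {
        'q': f'https://open.spotify.com/track/{spotify_id}',
        'limit': limit,
        'offset': offset,
        'type': 'tracks',
    }
    return 'https://api.chartmetric.com/api/search?' + urlencode(params)


url = build_search_url('696DnlkuDOXcMAnKlTgXXK')
print(url)

# Decoding the query string gives back the original Spotify link
decoded = parse_qs(urlsplit(url).query)
print(decoded['q'][0])
```

Passing a `params` dictionary to `requests.get` applies an equivalent encoding, which avoids building the query string by hand.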

In [40]:
cmc_tracks_id = [get_cmc_id_spotify(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc_tracks.sp_id)]





In [41]:
len(cmc_tracks_id)

69

In [42]:
df_cmc_tracks['cmc_song_seach'] = cmc_tracks_id
df_cmc_tracks

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '..."
...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '..."
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '..."
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""..."
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '..."


In [47]:
# Take the ID of the first search result, skipping songs that were not found on Spotify
df_cmc_tracks['cmc_id'] = [cmc_search['obj']['tracks'][0]['id'] if sp_id != 'not-found' 
                           else 'not-found' 
                           for sp_id, cmc_search in zip(df_cmc_tracks.sp_id, df_cmc_tracks.cmc_song_seach)]

df_cmc_tracks.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356
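
Indexing `['tracks'][0]` fails with an `IndexError` when a search returns no results. A defensive variant is sketched below, under the assumption that responses have the `{'obj': {'tracks': [...]}}` shape shown above; `first_track_id` is a hypothetical helper.

```python
def first_track_id(search, default='not-found'):
    # Return the ID of the first track in a search response, or `default` if there is none
    if not isinstance(search, dict):
        return default
    obj = search.get('obj') or {}
    tracks = obj.get('tracks') or []
    return tracks[0]['id'] if tracks else default


found = {'obj': {'tracks': [{'id': 27228348}]}}
empty = {'obj': {'tracks': []}}

print(first_track_id(found))        # 27228348
print(first_track_id(empty))        # not-found
print(first_track_id('not-found'))  # not-found
```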


### Get track metadata

In [43]:
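def get_cmc_metadata(cmc_id: str):
    
    '''
    Gets the metadata of a track from Chartmetric
    
    Args: Chartmetric track ID
    ------
    
    Returns: json
    '''
    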
    url = f'https://api.chartmetric.com/api/track/{cmc_id}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [48]:
cmc_track_metadata = [get_cmc_metadata(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc_tracks.cmc_id)]





In [50]:
df_cmc_tracks['cmc_track_metadata'] = cmc_track_metadata

In [57]:
df_cmc_tracks.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_metadata
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."


### Get track stats - TikTok

In [58]:
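def get_cmc_track_stats_tiktok(cmc_id, since="2020-01-01"):
    
    '''
    Gets the TikTok stats of a track from Chartmetric
    
    Args: Chartmetric track ID, start date (YYYY-MM-DD)
    ------
    
    Returns: json
    '''
    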
    url = f'https://api.chartmetric.com/api/track/{cmc_id}/tiktok/stats?since={since}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [None]:
cmc_track_stats_tiktok = [get_cmc_track_stats_tiktok(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc_tracks.cmc_id)]

In [None]:
df_cmc_tracks['cmc_track_stats_tiktok'] = cmc_track_stats_tiktok
df_cmc_tracks.head()

### Export dataset

In [52]:
df_cmc_tracks.to_csv('cmc_tracks_info.csv', index=False)

## Artists

In [134]:
df_cmc_artists = df_artists.drop(columns=['spotify_artist'])
df_cmc_artists

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486779
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3445699
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17989469
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4326193
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82296
...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,"[ohio hip hop, pop rap, rap]",86,2454191
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,"[british indie rock, modern alternative rock, ...",78,1085134
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,[rap rock],77,252326
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,"[pop, post-teen pop]",83,3671755


### Get Artists IDs

In [135]:
def cmc_search_artist_spotify(query: str, limit=10, offset=0):
    
    '''
    Searches for a Chartmetric artist using the Spotify ID
    
    Args: Spotify artist ID, number of results (limit), starting position (offset)
    ------
    
    Returns: json with the search results
    '''
    
    url = 'https://api.chartmetric.com/api/search'
    
    # Passing the parameters separately lets requests URL-encode the Spotify link
    params = {
        'q': f'https://open.spotify.com/artist/{query}',
        'limit': limit,
        'offset': offset,
        'type': 'artists'
    }
    
    headers = {
        'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers, params=params)
    
    return response.json()

In [227]:
cmc_artist_id = [cmc_search_artist_spotify(id) for id in tqdm(df_cmc_artists['sp_id'])]





In [228]:
df_cmc_artists['cmc_search'] = cmc_artist_id

In [229]:
df_cmc_artists.head()

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers,cmc_search
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486779,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar..."
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3445699,"{'obj': {'artists': [{'id': 217671, 'name': 'D..."
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17989469,"{'obj': {'artists': [{'id': 454302, 'name': 'C..."
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4326193,"{'obj': {'artists': [{'id': 398544, 'name': 'D..."
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82296,"{'obj': {'artists': [{'id': 147365, 'name': 'K..."


In [230]:
df_cmc_artists.cmc_search[0]['obj']['artists'][0]

{'id': 64150,
 'name': 'Arizona Zervas',
 'image_url': 'https://i.scdn.co/image/d549aadbb8b3a254fdc8e5ac93535a706463dce6',
 'isni': None,
 'code2': 'us',
 'hometown_city': None,
 'current_city': None,
 'sp_followers': 485529,
 'sp_popularity': 80,
 'sp_monthly_listeners': 15313268,
 'deezer_fans': 29274,
 'tags': ['pop', 'pop rap'],
 'spotify_artist_ids': ['0vRvGUQVUjytro0xpb26bs'],
 'itunes_artist_ids': [1026196272],
 'deezer_artist_ids': ['8650540'],
 'cm_artist_rank': 353,
 'amazon_artist_ids': ['B013CV8F1E']}

In [231]:
df_cmc_artists['cmc_id'] = [search['obj']['artists'][0]['id'] for search in df_cmc_artists.cmc_search]
df_cmc_artists

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_id
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486779,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3445699,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17989469,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4326193,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82296,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365
...,...,...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,"[ohio hip hop, pop rap, rap]",86,2454191,"{'obj': {'artists': [{'id': 3991, 'name': 'Mac...",3991
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,"[british indie rock, modern alternative rock, ...",78,1085134,"{'obj': {'artists': [{'id': 558951, 'name': 'Y...",558951
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,[rap rock],77,252326,"{'obj': {'artists': [{'id': 216839, 'name': 'T...",216839
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,"[pop, post-teen pop]",83,3671755,"{'obj': {'artists': [{'id': 213807, 'name': 'B...",213807


### Get artist metadata

In [232]:
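def get_cmc_artist_metadata(cmc_id: str):
    
    '''
    Gets the metadata of an artist from Chartmetric
    
    Args: Chartmetric artist ID
    ------
    
    Returns: json
    '''
    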
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [233]:
cmc_art_metadata = [get_cmc_artist_metadata(id) for id in tqdm(df_cmc_artists.cmc_id)]





In [234]:
df_cmc_artists['cmc_art_metadata'] = cmc_art_metadata
df_cmc_artists

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_id,cmc_art_metadata
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486779,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150,"{'obj': {'id': 64150, 'name': 'Arizona Zervas'..."
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3445699,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671,"{'obj': {'id': 217671, 'name': 'Doja Cat', 'cr..."
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17989469,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302,"{'obj': {'id': 454302, 'name': 'Camila Cabello..."
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4326193,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544,"{'obj': {'id': 398544, 'name': 'DaBaby', 'crea..."
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82296,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365,"{'obj': {'id': 147365, 'name': 'Kid Francescol..."
...,...,...,...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,"[ohio hip hop, pop rap, rap]",86,2454191,"{'obj': {'artists': [{'id': 3991, 'name': 'Mac...",3991,"{'obj': {'id': 3991, 'name': 'Machine Gun Kell..."
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,"[british indie rock, modern alternative rock, ...",78,1085134,"{'obj': {'artists': [{'id': 558951, 'name': 'Y...",558951,"{'obj': {'id': 558951, 'name': 'YUNGBLUD', 'cr..."
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,[rap rock],77,252326,"{'obj': {'artists': [{'id': 216839, 'name': 'T...",216839,"{'obj': {'id': 216839, 'name': 'Travis Barker'..."
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,"[pop, post-teen pop]",83,3671755,"{'obj': {'artists': [{'id': 213807, 'name': 'B...",213807,"{'obj': {'id': 213807, 'name': 'Bazzi', 'creat..."


### Get Spotify monthly listeners by city

In [235]:
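def get_cmc_artist_sp_city(cmc_id: str):
    
    '''
    Gets the cities where people listen to an artist on Spotify
    
    Args: Chartmetric artist ID
    ------
    
    Returns: json
    '''
    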
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}/where-people-listen'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [236]:
cmc_art_sp_city = [get_cmc_artist_sp_city(id) for id in tqdm(df_cmc_artists.cmc_id)]


JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [None]:
df_cmc_artists['cmc_art_sp_city'] = cmc_art_sp_city
df_cmc_artists.head()
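
A `JSONDecodeError` like the one above means that a response body was not valid JSON, for example an empty or plain-text reply. The notebook does not record the cause; a rejected or throttled request is one possibility. The sketch below shows a loop that keeps going when a single call fails. `fetch_all`, its `pause` argument and the fake fetcher are hypothetical; in the notebook `fetch` would be a function such as `get_cmc_artist_sp_city`.

```python
import json
from time import sleep


def fetch_all(ids, fetch, pause=0.0):
    """Call fetch(id) for every ID, storing None when the response is not valid JSON."""
    results = []
    for item_id in ids:
        try:
            results.append(fetch(item_id))
        except ValueError:
            # JSON decoding errors are subclasses of ValueError
            results.append(None)
        sleep(pause)  # optional pause between calls to ease the load on the API
    return results


# Fake fetcher standing in for an API call; the second body is empty and cannot be decoded
bodies = {64150: '{"obj": []}', 217671: ''}


def fake_fetch(item_id):
    return json.loads(bodies[item_id])


results = fetch_all([64150, 217671], fake_fetch)
print(results)
```

The `None` entries mark the IDs that can be retried later, and the list keeps the same length as the dataframe, so it can still be assigned as a column.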

### Get TikTok audience data


In [None]:
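def get_cmc_artist_tiktok_audience_data(cmc_id: str):
    
    '''
    Gets the TikTok audience stats of an artist from Chartmetric
    
    Args: Chartmetric artist ID
    ------
    
    Returns: json
    '''
    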
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}/tiktok-audience-stats'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [None]:
cmc_art_tiktok_audience = [get_cmc_artist_tiktok_audience_data(id) for id in tqdm(df_cmc_artists.cmc_id)]

In [None]:
df_cmc_artists['cmc_art_tiktok_audience'] = cmc_art_tiktok_audience
df_cmc_artists.head()

### Export dataset

In [None]:
df_cmc_artists.to_csv('cmc_artists.csv', index=False)

In [None]:
df_teste2 = pd.read_csv('cmc_artists.csv')
df_teste2
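
A CSV file stores list columns such as `sp_genres` as plain text, so they come back as strings when the file is read. One way to restore them, sketched here on a small in-memory sample rather than the real file, is to pass `literal_eval` as a converter:

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Small sample with a list column, mirroring the shape of df_cmc_artists
sample = pd.DataFrame({
    'artist': ['Doja Cat', 'Travis Barker'],
    'sp_genres': [['la indie', 'pop'], ['rap rock']],
})

csv_text = sample.to_csv(index=False)

# Without a converter the lists are read back as strings
as_text = pd.read_csv(StringIO(csv_text))

# With literal_eval the column holds Python lists again
as_lists = pd.read_csv(StringIO(csv_text), converters={'sp_genres': literal_eval})

print(type(as_text.sp_genres[0]).__name__, type(as_lists.sp_genres[0]).__name__)
```

The same `converters` argument would apply to the dictionary columns, provided every cell holds a valid Python literal.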

## Charts

In [None]:
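def get_cmc_tiktok_charts(date):
    
    '''
    Gets the weekly TikTok tracks chart from Chartmetric
    
    Args: chart date (YYYY-MM-DD)
    ------
    
    Returns: json
    '''
    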
    url = f'https://api.chartmetric.com/api/charts/tiktok/tracks?date={date}&interval=weekly'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [None]:
cmc_tiktok_chart = get_cmc_tiktok_charts('2020-09-08')
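
To look at more than one week, the function can be called once per date. A sketch of generating weekly dates in the `YYYY-MM-DD` format used above follows; `weekly_dates` is a hypothetical helper, and whether the API expects a particular weekday is not checked here.

```python
from datetime import date, timedelta


def weekly_dates(start, weeks):
    # Return `weeks` ISO-formatted dates, one week apart, beginning at `start`
    first = date.fromisoformat(start)
    return [(first + timedelta(weeks=i)).isoformat() for i in range(weeks)]


dates = weekly_dates('2020-09-08', 3)
print(dates)  # ['2020-09-08', '2020-09-15', '2020-09-22']
```

Each date could then be passed to `get_cmc_tiktok_charts` in a list comprehension, in the same way the track and artist IDs are processed.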

In [None]:
df_cmc_chart = pd.DataFrame(cmc_tiktok_chart).reset_index()
df_cmc_chart.to_csv('cmc_chart.csv', index=False)

# Final Dataframe

## Songs

### Songs - Spotify

In [126]:
# Check the dataframe
df_tracks_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0.0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4


In [152]:
# Filter only the songs that were found
df_tracks_sp_found = df_tracks_sp[df_tracks_sp.sp_id != 'not-found']

# Check the result
df_tracks_sp_found

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False,...,6,-3.689,0,0.0548,0.00472,0.0127,0.14,0.877,149.976,4
65,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False,...,4,-6.207,0,0.0392,0.0531,0,0.152,0.263,170.18,4
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True,...,7,-4.718,1,0.0379,0.0257,0,0.313,0.277,119.921,4
67,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False,...,9,-5.513,0,0.072,0.465,1.12e-06,0.0338,0.902,195.918,4


### Songs - Chartmetric

In [87]:
# Check the dataframe
df_cmc_tracks.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_metadata
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."


In [89]:
# Filter only the songs that were found in Spotify
df_cmc_tracks_found = df_cmc_tracks[df_cmc_tracks.sp_id != 'not-found']
df_cmc_tracks_found

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_metadata
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."
...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '...",15674355,"{'obj': {'id': 15674355, 'name': 'What The Hel..."
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '...",13795060,"{'obj': {'id': 13795060, 'name': 'Towards The ..."
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""...",23903635,"{'obj': {'id': 23903635, 'name': 'I Think I'm ..."
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '...",19040153,"{'obj': {'id': 19040153, 'name': 'Myself', 'is..."


In [124]:
# Add Chartmetric information to the dataframe

# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
df_cmc_tracks_found = df_cmc_tracks_found.copy()


def get_cmc_field(row, field, group=None):
    
    '''
    Reads one field from a Chartmetric track metadata response
    
    Args: metadata response, field name, optional nested group
    ------
    
    Returns: field value, 'not-found' (no metadata) or 'not-value' (no group)
    '''
    
    if not isinstance(row, dict):
        return 'not-found'
    if group is None:
        return row['obj'][field]
    values = row['obj'][group]
    return values[field] if isinstance(values, dict) else 'not-value'


metadata = df_cmc_tracks_found.cmc_track_metadata

# Duration (ms), tags and release date
for field in ['duration_ms', 'tags', 'release_date']:
    df_cmc_tracks_found[f'cmc_track_{field}'] = [get_cmc_field(row, field) for row in metadata]

# Chartmetric audio features
for field in ['key', 'mode', 'danceability', 'energy', 'speechiness', 'acousticness',
              'instrumentalness', 'liveness', 'valence', 'tempo', 'loudness']:
    df_cmc_tracks_found[f'cmc_track_feat_{field}'] = [get_cmc_field(row, field, 'cm_audio_features') 
                                                      for row in metadata]

# Chartmetric statistics: Spotify popularity, number of TikTok posts and YouTube views
# (the check now looks at cm_statistics itself instead of cm_audio_features)
stats = {'sp_popularity': 'sp_pop', 'tiktok_counts': 'tiktok_counts', 'youtube_views': 'youtube_views'}
for field, name in stats.items():
    df_cmc_tracks_found[f'cmc_track_stat_{name}'] = [get_cmc_field(row, field, 'cm_statistics') 
                                                     for row in metadata]

In [125]:
# Check the result
df_cmc_tracks_found.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_metadata,cmc_track_duration_ms,cmc_track_tags,...,cmc_track_feat_speechiness,cmc_track_feat_acousticness,cmc_track_feat_instrumentalness,cmc_track_feat_liveness,cmc_track_feat_valence,cmc_track_feat_tempo,cmc_track_feat_loudness,cmc_track_stat_sp_pop,cmc_track_stat_tiktok_counts,cmc_track_stat_youtube_views
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i...",163636,"Hip-Hop/Rap,Music,Pop,Música,Pop,Musik",...,0.148,0.0522,0.0,0.46,0.457,116.735,-5.616,88.0,2300000,96679674.0
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': {'id': 26951096, 'name': 'Say So', 'is...",237893,"Dance,Musica,R&B/Soul,Music",...,0.158,0.256,3.57e-06,0.0904,0.786,110.962,-4.577,89.0,19300000,198853609.0
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': {'id': 27597895, 'name': 'My Oh My', '...",170746,"Pop,Music",...,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,-6.024,,1900000,49709340.0
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': {'id': 12263271, 'name': 'Moon', 'isrc...",390638,"Pop,Music",...,0.0345,0.291,0.863,0.101,0.0578,117.984,-10.002,20.0,759300,
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc...",84008,"Dance,Music,R&B/Soul",...,0.384,0.0676,0.00951,0.117,0.169,159.995,-8.74,21.0,1900000,


### Merge the two dataframes

In [159]:
# Merge the dataframes
df_tracks_final_raw = pd.merge(left=df_tracks_sp_found, right=df_cmc_tracks_found, on='song')

# Check the result
df_tracks_final_raw

Unnamed: 0,song,artists_x,artists_list_x,number_artists_x,spotify_search,sp_id_x,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,cmc_track_feat_speechiness,cmc_track_feat_acousticness,cmc_track_feat_instrumentalness,cmc_track_feat_liveness,cmc_track_feat_valence,cmc_track_feat_tempo,cmc_track_feat_loudness,cmc_track_stat_sp_pop,cmc_track_stat_tiktok_counts,cmc_track_stat_youtube_views
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,0.148,0.0522,0,0.46,0.457,116.735,-5.616,88,2300000,96679674
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,...,0.158,0.256,3.57e-06,0.0904,0.786,110.962,-4.577,89,19300000,198853609
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,-6.024,,1900000,49709340
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,0.0345,0.291,0.863,0.101,0.0578,117.984,-10.002,20,759300,
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,0.384,0.0676,0.00951,0.117,0.169,159.995,-8.74,21,1900000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False,...,0.0584,0.00483,0.00956,0.163,0.885,150.081,-5.162,74,1600000,337047363
62,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False,...,0.0408,0.0559,0,0.156,0.241,170.211,-6.143,59,,912388
63,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True,...,0.036,0.0327,0,0.3,0.24,119.94,-4.691,20,254900,50808170
64,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False,...,0.072,0.465,1.12e-06,0.0338,0.902,195.918,-5.513,77,3700000,43521205
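Because both dataframes share columns other than `song`, the merge above labels the overlapping ones with pandas' default `_x`/`_y` suffixes (for example `artists_x`). A minimal sketch of an alternative, assuming song titles are unique in both tables: the `suffixes` argument keeps the Spotify column names unchanged, and `validate` raises an error if a title is duplicated on either side. The two toy dataframes are stand-ins, not the real `df_tracks_sp_found` and `df_cmc_tracks_found`, and the `_cmc` suffix is a hypothetical naming choice.

```python
import pandas as pd

# Toy stand-ins for the Spotify and Chartmetric track dataframes
sp = pd.DataFrame({
    'song': ['Roxanne', 'Say So'],
    'artists': ['Arizona Zervas', 'Doja Cat'],
    'sp_popularity': [88, 88],
})
cmc = pd.DataFrame({
    'song': ['Roxanne', 'Say So'],
    'artists': ['Arizona Zervas', 'Doja Cat'],
    'cmc_track_stat_tiktok_counts': [2300000, 19300000],
})

merged = pd.merge(
    left=sp,
    right=cmc,
    on='song',
    how='inner',
    suffixes=('', '_cmc'),   # keep left names as-is, tag right-side duplicates
    validate='one_to_one',   # raise MergeError if a song title repeats on either side
)

print(merged.columns.tolist())
```

If two different songs shared a title, `validate='one_to_one'` would fail loudly instead of silently duplicating rows, which is the main reason to add it here.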


In [160]:
# Remove unwanted columns
df_tracks_final = df_tracks_final_raw.drop(columns=['spotify_search', 'cmc_song_seach', 'cmc_track_metadata'])

# Check the result
df_tracks_final

Unnamed: 0,song,artists_x,artists_list_x,number_artists_x,sp_id_x,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,sp_danceability,...,cmc_track_feat_speechiness,cmc_track_feat_acousticness,cmc_track_feat_instrumentalness,cmc_track_feat_liveness,cmc_track_feat_valence,cmc_track_feat_tempo,cmc_track_feat_loudness,cmc_track_stat_sp_pop,cmc_track_stat_tiktok_counts,cmc_track_stat_youtube_views
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,0.621,...,0.148,0.0522,0,0.46,0.457,116.735,-5.616,88,2300000,96679674
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,237893,88,2019-11-07,True,0.787,...,0.158,0.256,3.57e-06,0.0904,0.786,110.962,-4.577,89,19300000,198853609
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,0.724,...,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,-6.024,,1900000,49709340
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,0.662,...,0.0345,0.291,0.863,0.101,0.0578,117.984,-10.002,20,759300,
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,0.754,...,0.384,0.0676,0.00951,0.117,0.169,159.995,-8.74,21,1900000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False,0.578,...,0.0584,0.00483,0.00956,0.163,0.885,150.081,-5.162,74,1600000,337047363
62,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False,0.261,...,0.0408,0.0559,0,0.156,0.241,170.211,-6.143,59,,912388
63,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True,0.628,...,0.036,0.0327,0,0.3,0.24,119.94,-4.691,20,254900,50808170
64,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False,0.745,...,0.072,0.465,1.12e-06,0.0338,0.902,195.918,-5.513,77,3700000,43521205


### Export the dataframe

In [165]:
# Export the raw and the cleaned dataframes
df_tracks_final_raw.to_csv('01-tracks_data_final_raw.csv', index=False)
df_tracks_final.to_csv('01-tracks_data_final.csv', index=False)

## Artists

### Artists - Spotify

In [163]:
# Check the dataframe
df_artists

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,486779
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3445699
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17989469
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4326193
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82296
...,...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,{'external_urls': {'spotify': 'https://open.sp...,"[ohio hip hop, pop rap, rap]",86,2454191
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,{'external_urls': {'spotify': 'https://open.sp...,"[british indie rock, modern alternative rock, ...",78,1085134
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,{'external_urls': {'spotify': 'https://open.sp...,[rap rock],77,252326
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,{'external_urls': {'spotify': 'https://open.sp...,"[pop, post-teen pop]",83,3671755


### Artists - Chartmetric

In [184]:
# Import the dataset
cmc_art_teste = pd.read_csv('cmc_artists_info.csv')

# Check the result
cmc_art_teste

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_id,cmc_art_metadata,cmc_art_sp_city,cmc_art_tiktok_audience
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"['pop rap', 'rap', 'rhode island rap']",80,486779,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150,"{'obj': {'id': 64150, 'name': 'Arizona Zervas'...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,{'obj': {'top_countries': [{'name': 'United St...
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"['la indie', 'pop']",88,3445699,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671,"{'obj': {'id': 217671, 'name': 'Doja Cat', 'cr...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,{'obj': {'top_countries': [{'name': 'United St...
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"['dance pop', 'pop', 'post-teen pop']",87,17989469,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302,"{'obj': {'id': 454302, 'name': 'Camila Cabello...",{'obj': {'Brisbane': [{'timestp': '2020-03-21T...,{'obj': {'top_countries': [{'name': 'United St...
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"['north carolina hip hop', 'rap']",95,4326193,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544,"{'obj': {'id': 398544, 'name': 'DaBaby', 'crea...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,"{'obj': {'top_countries': [], 'audience_gender..."
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"['electronica', 'french indie pop', 'french in...",61,82296,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365,"{'obj': {'id': 147365, 'name': 'Kid Francescol...",{'obj': {'Berlin': [{'timestp': '2020-03-21T00...,"{'obj': {'top_countries': [], 'audience_gender..."
...,...,...,...,...,...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,"['ohio hip hop', 'pop rap', 'rap']",86,2454191,"{'obj': {'artists': [{'id': 3991, 'name': 'Mac...",3991,"{'obj': {'id': 3991, 'name': 'Machine Gun Kell...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,{'obj': {'top_countries': [{'name': 'United St...
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,"['british indie rock', 'modern alternative roc...",78,1085134,"{'obj': {'artists': [{'id': 558951, 'name': 'Y...",558951,"{'obj': {'id': 558951, 'name': 'YUNGBLUD', 'cr...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,{'obj': {'top_countries': [{'name': 'United St...
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,['rap rock'],77,252326,"{'obj': {'artists': [{'id': 216839, 'name': 'T...",216839,"{'obj': {'id': 216839, 'name': 'Travis Barker'...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,"{'obj': {'top_countries': [], 'audience_gender..."
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,"['pop', 'post-teen pop']",83,3671755,"{'obj': {'artists': [{'id': 213807, 'name': 'B...",213807,"{'obj': {'id': 213807, 'name': 'Bazzi', 'creat...",{'obj': {'Atlanta': [{'timestp': '2020-03-21T0...,{'obj': {'top_countries': [{'name': 'United St...


In [186]:
# Convert the strings to lists or dictionaries
cmc_art_teste.sp_genres = cmc_art_teste.sp_genres.apply(literal_eval)
cmc_art_teste.cmc_search = cmc_art_teste.cmc_search.apply(literal_eval)
cmc_art_teste.cmc_art_metadata = cmc_art_teste.cmc_art_metadata.apply(literal_eval)
cmc_art_teste.cmc_art_sp_city = cmc_art_teste.cmc_art_sp_city.apply(literal_eval)
cmc_art_teste.cmc_art_tiktok_audience = cmc_art_teste.cmc_art_tiktok_audience.apply(literal_eval)

In [189]:
# Check the results
print(f'sp_genres: {type(cmc_art_teste.sp_genres[0])}')
print(f'cmc_search: {type(cmc_art_teste.cmc_search[0])}')
print(f'cmc_art_metadata: {type(cmc_art_teste.cmc_art_metadata[0])}')
print(f'cmc_art_sp_city: {type(cmc_art_teste.cmc_art_sp_city[0])}')
print(f'cmc_art_tiktok_audience: {type(cmc_art_teste.cmc_art_tiktok_audience[0])}')

sp_genres: <class 'list'>
cmc_search: <class 'dict'>
cmc_art_metadata: <class 'dict'>
cmc_art_sp_city: <class 'dict'>
cmc_art_tiktok_audience: <class 'dict'>
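The conversion above is needed because a CSV stores lists and dictionaries as plain strings. As a sketch of an equivalent approach, `pd.read_csv` accepts a `converters` mapping that applies `literal_eval` while the file is being read. The in-memory CSV below is a two-row stand-in for `cmc_artists_info.csv`, and it assumes the column has no empty cells (`literal_eval` raises on an empty string).

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Small in-memory stand-in for the exported CSV file
csv_text = '''artist,sp_genres
Arizona Zervas,"['pop rap', 'rap', 'rhode island rap']"
Doja Cat,"['la indie', 'pop']"
'''

# Parse the stringified lists at load time instead of in a second step
df = pd.read_csv(StringIO(csv_text), converters={'sp_genres': literal_eval})

print(type(df.sp_genres[0]))
```

The same mapping could list the dictionary columns as well, one entry per column name.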


In [201]:
cmc_art_teste.cmc_search[0]['obj']['artists'][0]

{'id': 64150,
 'name': 'Arizona Zervas',
 'image_url': 'https://i.scdn.co/image/d549aadbb8b3a254fdc8e5ac93535a706463dce6',
 'isni': None,
 'code2': 'us',
 'hometown_city': None,
 'current_city': None,
 'sp_followers': 485529,
 'sp_popularity': 80,
 'sp_monthly_listeners': 15313268,
 'deezer_fans': 29274,
 'tags': ['pop', 'pop rap'],
 'spotify_artist_ids': ['0vRvGUQVUjytro0xpb26bs'],
 'itunes_artist_ids': [1026196272],
 'deezer_artist_ids': ['8650540'],
 'cm_artist_rank': 353,
 'amazon_artist_ids': ['B013CV8F1E']}
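Nested records like the one above can be flattened into ordinary columns with `pd.json_normalize`. The sketch below takes the first search hit per artist and keeps a few fields. The record is a trimmed copy of the output above, the `cmc_` prefix is a hypothetical naming choice, and the code assumes every search returned at least one hit.

```python
import pandas as pd

# Trimmed copy of one cmc_search value (only some keys kept)
search_results = [
    {'obj': {'artists': [{'id': 64150, 'name': 'Arizona Zervas', 'code2': 'us',
                          'sp_followers': 485529, 'sp_popularity': 80,
                          'tags': ['pop', 'pop rap']}]}},
]

# Take the first hit of each search result
first_hits = [result['obj']['artists'][0] for result in search_results]

# One row per artist, one column per selected key
flat = pd.json_normalize(first_hits)[['id', 'name', 'code2', 'sp_followers']]
flat = flat.add_prefix('cmc_')

print(flat)
```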

In [205]:
cmc_art_teste.cmc_art_metadata[0]['obj']

{'id': 64150,
 'name': 'Arizona Zervas',
 'created_at': '2016-11-26T00:00:00.000Z',
 'code2': 'us',
 'gender': 1,
 'isni': None,
 'cm_artist_rank': 366,
 'cover_url': 'https://i.scdn.co/image/ab67616d0000b273bf30549de1c332630a11133f',
 'image_url': 'https://i.scdn.co/image/d549aadbb8b3a254fdc8e5ac93535a706463dce6',
 'hometown_city': None,
 'current_city': None,
 'current_city_id': None,
 'record_label': None,
 'band_members': None,
 'press_contact': None,
 'booking_agent': None,
 'description': 'Without a label or mainstream presence, hip-hop- and R&B-influenced singer and rapper Arizona Zervas amassed a devoted following that boosted streams of his early singles into the millions, culminating in 2019\'s viral smash single "Roxanne."\r\n\nDespite his name, Zervas was born and raised in Maryland in 1995. Writing since high school, he uploaded his early songs online, building an audience with his freestyles and tracks including his 2016 debut "Don\'t Hit My Line." Blending the styles and

In [223]:
cmc_a

[{'timestp': '2020-03-21T00:00:00.000Z', 'code2': 'US', 'listeners': 437048},
 {'timestp': '2020-03-22T00:00:00.000Z', 'code2': 'US', 'listeners': 432350},
 {'timestp': '2020-03-23T00:00:00.000Z', 'code2': 'US', 'listeners': 427598},
 {'timestp': '2020-03-24T00:00:00.000Z', 'code2': 'US', 'listeners': 422772},
 {'timestp': '2020-03-25T00:00:00.000Z', 'code2': 'US', 'listeners': 419548},
 {'timestp': '2020-03-26T00:00:00.000Z', 'code2': 'US', 'listeners': 416165},
 {'timestp': '2020-03-27T00:00:00.000Z', 'code2': 'US', 'listeners': 413550},
 {'timestp': '2020-03-28T00:00:00.000Z', 'code2': 'US', 'listeners': 409917},
 {'timestp': '2020-03-29T00:00:00.000Z', 'code2': 'US', 'listeners': 404642},
 {'timestp': '2020-03-30T00:00:00.000Z', 'code2': 'US', 'listeners': 398183},
 {'timestp': '2020-03-31T00:00:00.000Z', 'code2': 'US', 'listeners': 392530},
 {'timestp': '2020-04-01T00:00:00.000Z', 'code2': 'US', 'listeners': 388646},
 {'timestp': '2020-04-23T00:00:00.000Z', 'code2': 'US', 'listene
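A list of daily listener records such as the one above is easier to analyze as a time-indexed dataframe. The following sketch, using only the first three records from the output, parses the `timestp` strings into timestamps and computes the change in listeners over the period. The variable names are illustrative and are not taken from the notebook.

```python
import pandas as pd

# First three records copied from the output above
records = [
    {'timestp': '2020-03-21T00:00:00.000Z', 'code2': 'US', 'listeners': 437048},
    {'timestp': '2020-03-22T00:00:00.000Z', 'code2': 'US', 'listeners': 432350},
    {'timestp': '2020-03-23T00:00:00.000Z', 'code2': 'US', 'listeners': 427598},
]

city_listeners = pd.DataFrame(records)

# Convert the ISO 8601 strings to timezone-aware timestamps and index by date
city_listeners['timestp'] = pd.to_datetime(city_listeners['timestp'])
city_listeners = city_listeners.set_index('timestp').sort_index()

# Difference between the last and the first day in the sample
latest = int(city_listeners['listeners'].iloc[-1])
change = latest - int(city_listeners['listeners'].iloc[0])

print(latest, change)
```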

### Export the dataframe

# PostgreSQL