![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

<body>
    <p style="font-size:28px;text-align:center"><b>Project 03 - Part 01 | Web Scrapping & API</b></p>
</body>

# Introduction

The objective of the first part of this project was collect data to answer a problem statement, practicing web scrapping and using API.

---

<body>
    <p style="font-size:20px"><b>Problem Statement</b></p>
</body>

_What some the TikTok viral songs have in common?_

---

To answer this problem, 69 songs, which were obtained from the **PopSugar** website, were analyzed. The post that contained this list of songs was made in March 27th, 2020 by Hedy Phillips around the same time people started to quarantine, because of the COVIVD-19 pandemic, and people began to use it more to spend their time at home.

The sources of information used to gather data were **Spotify**,  **Last.fm**, **Chartmetric** and **MusicBrainz**.

---

Sources:
- Websites:
  - PopSugar: https://www.popsugar.com/entertainment/popular-tiktok-songs-47289804?stream_view=1#photo-47289832
  - MusicBrainz:https://musicbrainz.org/genres
 
- APIs
  - Spotify API: https://developer.spotify.com/
  - Spotipy (Spotify API wrapper for Python): https://spotipy.readthedocs.io/en/2.15.0/
  - Last.fm API: https://www.last.fm/api
  - Chartmetric API: https://api.chartmetric.com/apidoc/

# Setup

## Import

In [92]:
import os
import re
import requests
from time import sleep

import numpy as np
import pandas as pd
import spotipy

from bs4 import BeautifulSoup
from dotenv import load_dotenv, find_dotenv
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
from tqdm.auto import tqdm

# Web Scrapping

The web scrapping was necessary to collect the following data:

<table>
  <thead>
    <tr>
      <th>INFORMATION</th>
      <th>SOURCE</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>69 TikTok viral songs</td>
      <td>PopSugar</td>
    </tr>
    <tr>
      <td>Music genres</td>
      <td>MusicBrainz</td>
    </tr>
  </tbody>
</table>



## List of viral songs on TikTok

### Get response

In [93]:
# Get response from the url and check it
url = 'https://www.popsugar.com/entertainment/popular-tiktok-songs-47289804?stream_view=1#photo-47289832'
response = requests.get(url)
response

<Response [200]>

### Data Collection

In [94]:
# Get the content in the url
content_popsugar = BeautifulSoup(response.text)

# Get the date the post was made
popsugar_date = content_popsugar.find('time').text.replace('\n', '').strip()

# Get only the songs and artists
popsugar_html = content_popsugar.find_all('span', attrs={'class': 'count-copy'})

In [95]:
# Conver the list 'html_popsugar' to a Pandas DataFrame
df_base = pd.DataFrame([re.split(' by ', song.text.replace('"', '').strip()) for song in popsugar_html], 
                       columns=['song', 'artists'])

# Check the result
df_base

Unnamed: 0,song,artists
0,Roxanne,Arizona Zervas
1,Say So,Doja Cat
2,My Oh My,Camila Cabello feat. DaBaby
3,Moon,Kid Francescoli
4,Vibe,Cookiee Kawaii
...,...,...
64,What the Hell,Avril Lavigne
65,Towards the Sun,Rihanna
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker"
67,Myself,Bazzi


### Data Cleaning

In [96]:
# Create a column with a list of artists for each song
df_base['artists_list'] = [re.split(',* and |, * | [Ff]eat. ', artists.strip()) for artists in df_base.artists]

# Create a column with the number of artists for each song
df_base['number_artists'] = df_base.artists_list.apply(len)

In [97]:
# Check possible number of artists for one song
df_base.number_artists.value_counts()

1    48
2    18
3     3
Name: number_artists, dtype: int64

Seeing the result above, the maximum number of artists for a song is 3.

In [98]:
# Check the dataframe
df_base.head()

Unnamed: 0,song,artists,artists_list,number_artists
0,Roxanne,Arizona Zervas,[Arizona Zervas],1
1,Say So,Doja Cat,[Doja Cat],1
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2
3,Moon,Kid Francescoli,[Kid Francescoli],1
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1


### Make backup dataframe

In [99]:
df_base_raw_bck = df_base.copy()

## List of music genre

### Get the response

In [100]:
'''# Get response from the url and check it
url = 'https://musicbrainz.org/genres'
response = requests.get(url)
response'''

"# Get response from the url and check it\nurl = 'https://musicbrainz.org/genres'\nresponse = requests.get(url)\nresponse"

### Data collection

In [101]:
'''# Get the content in the url
musicbrainz_content = BeautifulSoup(response.content)

# Create a list with the music genres listed in the url
musicbrainz_genre = [genre.text for genre in musicbrainz_content.find_all('bdi')]'''

"# Get the content in the url\nmusicbrainz_content = BeautifulSoup(response.content)\n\n# Create a list with the music genres listed in the url\nmusicbrainz_genre = [genre.text for genre in musicbrainz_content.find_all('bdi')]"

The data cleaning for this list will be made later.

# Spotify

From the Spotify and with the Spotipy's help, some data about each song will be gathered. It is relevant to point that there is a possibility that some songs may not be in the Spotify's library.

In [102]:
# Create a copy of the dataframe
df_sp = df_base.copy()

# Check the result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists
0,Roxanne,Arizona Zervas,[Arizona Zervas],1
1,Say So,Doja Cat,[Doja Cat],1
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2
3,Moon,Kid Francescoli,[Kid Francescoli],1
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1


## Connecting to the API

In [103]:
load_dotenv(find_dotenv())

True

In [104]:
cid = os.getenv('spotify_p03_key')
csecret = os.getenv('spotify_p03_secret')
cc_manager = SpotifyClientCredentials(client_id=cid, client_secret=csecret)
sp = spotipy.Spotify(client_credentials_manager=cc_manager)

## Songs

### Search information about each song

In [105]:
# Search information about each song, using the Spotipy
spotify_songs = [sp.search(q=df_base.iloc[index, 0], type='track', limit=50) for index in tqdm(df_base.index)]

HBox(children=(FloatProgress(value=0.0, max=69.0), HTML(value='')))




In [106]:
# Check if there are 69 items in this list
len(spotify_songs)

69

In [107]:
# Add a column in the dataframe with the data that were just collected
df_sp['spotify_search'] = spotify_songs

In [108]:
# Check the result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...


In [109]:
# Function to add new information in a copy of the dataframe
def get_spotify_track_info(df):
    
    '''
    Filters some data of the songs and adds them to a copy of the dataframe
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''
    
    # Create auxiliary empty lists (final lists)
    list_spotify_track_id = []
    list_spotify_track_duration = []
    list_spotify_track_popularity = []
    list_spotify_album_release_date = []
    list_spotify_track_explicit = []
    lists_spotify = [list_spotify_track_id, list_spotify_track_duration, list_spotify_track_popularity,
                     list_spotify_album_release_date, list_spotify_track_explicit]
    
    
    # Check for each row of the dataframe
    for index in df.index:
        
        # Information necessary from the dataframe to use during the process
        song_name = df.iloc[index, 0][:3].lower()
        artists_list = [artist.lower() for artist in df.iloc[index, 2]]
        total_artists = df.iloc[index, 3]
        mask = df.iloc[index, 4]['tracks']['items']
        
        # If the track was not found in the Spotify library, a 'not-found' string is added to the final lists
        if len(mask) == 0:
            for lst in lists_spotify:
                lst.append('not-found')
            #print(f'{index} - {song_name} - NOT FOUND')
        
        # If the track was found in the Spotify
        else:
            
            # Variable necessary to check if the information about a song has been added to the final lists
            added = 0
            
            # For each track it was listed 50 tracks related to the query 
            for idx, each_found in enumerate(mask):
                
                # Information necessary from the Spotify API to use during the process
                track_name = mask[idx]['name'].lower()
                track_id = mask[idx]['id']
                track_duration = mask[idx]['duration_ms']
                track_popularity = mask[idx]['popularity']
                album_release_date = mask[idx]['album']['release_date']
                track_explicit = mask[idx]['explicit']
                n_artists = len(mask[idx]['artists'])
                first_artist_name = mask[idx]['artists'][0]['name'].lower()
            
                # Check if the name of the song, the artists from both sources match and if an information about the
                # song has been added to the final lists 
                if ((song_name in track_name) & (total_artists == n_artists) & 
                    ((first_artist_name in artists_list) | (artists_list[0][:5] in first_artist_name)) & (added == 0)):
                    list_spotify_track_id.append(track_id)
                    list_spotify_track_duration.append(track_duration)
                    list_spotify_track_popularity.append(track_popularity)
                    list_spotify_album_release_date.append(album_release_date)
                    list_spotify_track_explicit.append(track_explicit)
                    added += 1
                    #print(f'{index} - {track_name} - {track_id}')
                        
                
                # If the track found in the search is not a math, itis the last one and information about the track 
                # has not been added to the final list, then add a 'not-found' string to the final lists
                elif (idx == len(mask) - 1) & (added == 0):
                    for lst in lists_spotify:
                        lst.append('not-found')
                    #print(f'{index} - {song_name} - NOT FOUND')
    
    # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_id'] = list_spotify_track_id
    df_copy['sp_duration_ms'] = list_spotify_track_duration
    df_copy['sp_popularity'] = list_spotify_track_popularity
    df_copy['sp_release_date'] = list_spotify_album_release_date
    df_copy['sp_explicit'] = list_spotify_track_explicit
                    
    return df_copy

In [110]:
# Add desired information to the dataframe
df_sp = get_spotify_track_info(df_sp)

# Check the result
df_sp

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,89,2019-11-07,True
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True
...,...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False
65,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True
67,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False


In [111]:
# Check if songs were not found in Spotify
df_sp[df_sp.sp_id == 'not-found']

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit
11,How About Now,Bryson Tiller,[Bryson Tiller],1,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found
36,Shibuya — Chanel Funk Remix,Frank Ocean and L.Dre,"[Frank Ocean, L.Dre]",2,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found
39,WOP,J. Dash feat. Flo Rida,"[J. Dash, Flo Rida]",2,{'tracks': {'href': 'https://api.spotify.com/v...,not-found,not-found,not-found,not-found,not-found


3 songs were not found in Spotify. There were checked manually using the Spotify Desktop. The first two do not exist there. Finally, the third song exists, but the version with the featured artist in the dataframe does not exist there.

### Find the audio features for each song

In [112]:
# Search in the API wrapper
spotify_audio_features = [sp.audio_features(track_id)  if track_id != 'not-found'else 'not-found' 
                          for track_id in tqdm(df_sp.sp_id)]

HBox(children=(FloatProgress(value=0.0, max=69.0), HTML(value='')))




In [113]:
# Check if there are 69 items in this list
len(spotify_audio_features)

69

In [114]:
# Function to add new information in a copy of the dataframe
def get_spotify_audio_features(df, audio_features: list):
    
    '''
    Adds new information from a list to a copy of the dataframe
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''
    
    # Create auxiliary empty lists (final lists)
    list_danceability = []
    list_energy = []
    list_key = []
    list_loudness = []
    list_mode = []
    list_speechiness = []
    list_acousticness = []
    list_instrumentalness = []
    list_liveness = []
    list_valence = []
    list_tempo = []
    list_time_signature = []
    lists_features = [list_danceability, list_energy, list_key, list_loudness, list_mode, list_speechiness, 
                      list_acousticness, list_instrumentalness, list_liveness, list_valence, list_tempo,
                      list_time_signature]
    
    # Check for each row of the dataframe
    for index in df.index:
        
        # Get the track's Spotify id
        track_id = df.iloc[index, 5]
        
        # If the track was not found in the Spotify library, a 'not-found' string is added to the final lists
        if track_id == 'not-found':
            
            for lst in lists_features:
                lst.append('not-found')
    
        # If the track was found in the Spotify library
        else:
            
            # Add the information to the final lists
            list_danceability.append(audio_features[index][0]['danceability'])
            list_energy.append(audio_features[index][0]['energy'])
            list_key.append(audio_features[index][0]['key'])
            list_loudness.append(audio_features[index][0]['loudness'])
            list_mode.append(audio_features[index][0]['mode'])
            list_speechiness.append(audio_features[index][0]['speechiness'])
            list_acousticness.append(audio_features[index][0]['acousticness'])
            list_instrumentalness.append(audio_features[index][0]['instrumentalness'])
            list_liveness.append(audio_features[index][0]['liveness'])
            list_valence.append(audio_features[index][0]['valence'])
            list_tempo.append(audio_features[index][0]['tempo'])
            list_time_signature.append(audio_features[index][0]['time_signature'])
     
     # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_danceability'] = list_danceability
    df_copy['sp_energy'] = list_energy
    df_copy['sp_key'] = list_key
    df_copy['sp_loudness'] = list_loudness
    df_copy['sp_mode'] = list_mode
    df_copy['sp_speechiness'] = list_speechiness
    df_copy['sp_acousticness'] = list_acousticness
    df_copy['sp_instrumentalness'] = list_instrumentalness
    df_copy['sp_liveness'] = list_liveness
    df_copy['sp_valence'] = list_valence
    df_copy['sp_tempo'] = list_tempo
    df_copy['sp_time_signature'] = list_time_signature
    
    return df_copy

In [115]:
# Add new information to the dataframe
df_sp = get_spotify_audio_features(df_sp, spotify_audio_features)

# Check result
df_sp.head()

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0.0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,89,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4


### Final Dataframe

In [116]:
df_tracks_sp = df_sp[~(df_sp.sp_id == 'not-found')]

# Check the result
df_tracks_sp

Unnamed: 0,song,artists,artists_list,number_artists,spotify_search,sp_id,sp_duration_ms,sp_popularity,sp_release_date,sp_explicit,...,sp_key,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,{'tracks': {'href': 'https://api.spotify.com/v...,696DnlkuDOXcMAnKlTgXXK,163636,88,2019-10-10,True,...,6,-5.616,0,0.148,0.0522,0,0.46,0.457,116.735,5
1,Say So,Doja Cat,[Doja Cat],1,{'tracks': {'href': 'https://api.spotify.com/v...,3Dv1eDb0MEgF93GpLXlucZ,237893,89,2019-11-07,True,...,11,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,{'tracks': {'href': 'https://api.spotify.com/v...,3yOlyBJuViE2YSGn3nVE1K,170746,82,2019-12-06,False,...,8,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4
3,Moon,Kid Francescoli,[Kid Francescoli],1,{'tracks': {'href': 'https://api.spotify.com/v...,24upABZ8A0sAepfu91sEYr,390638,70,2017-03-03,False,...,7,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,{'tracks': {'href': 'https://api.spotify.com/v...,4gOgQTv9RYYFZ1uQNnlk3q,83940,72,2019-03-29,True,...,10,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,{'tracks': {'href': 'https://api.spotify.com/v...,2z4U9d5OAA4YLNXoCgioxo,220706,74,2011-03-08,False,...,6,-3.689,0,0.0548,0.00472,0.0127,0.14,0.877,149.976,4
65,Towards the Sun,Rihanna,[Rihanna],1,{'tracks': {'href': 'https://api.spotify.com/v...,1UuZhGTon3gzXQAJzNa2A4,273293,55,2015-03-23,False,...,4,-6.207,0,0.0392,0.0531,0,0.152,0.263,170.18,4
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,{'tracks': {'href': 'https://api.spotify.com/v...,2gTdDMpNxIRFSiu7HutMCg,169397,81,2019-07-05,True,...,7,-4.718,1,0.0379,0.0257,0,0.313,0.277,119.921,4
67,Myself,Bazzi,[Bazzi],1,{'tracks': {'href': 'https://api.spotify.com/v...,5YLHLxoZsodDWjqSgjhBf3,167552,76,2018-04-12,False,...,9,-5.513,0,0.072,0.465,1.12e-06,0.0338,0.902,195.918,4


In [117]:
# Dataframe metadata
df_tracks_sp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66 entries, 0 to 68
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   song                 66 non-null     object
 1   artists              66 non-null     object
 2   artists_list         66 non-null     object
 3   number_artists       66 non-null     int64 
 4   spotify_search       66 non-null     object
 5   sp_id                66 non-null     object
 6   sp_duration_ms       66 non-null     object
 7   sp_popularity        66 non-null     object
 8   sp_release_date      66 non-null     object
 9   sp_explicit          66 non-null     object
 10  sp_danceability      66 non-null     object
 11  sp_energy            66 non-null     object
 12  sp_key               66 non-null     object
 13  sp_loudness          66 non-null     object
 14  sp_mode              66 non-null     object
 15  sp_speechiness       66 non-null     object
 16  sp_acousti

## Artists

### Get information about each artist

In [118]:
# Function to add new information in a copy of the dataframe
def get_spotify_artists_id(df):
    
    '''
    Creates a new dataframe about the artists.
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the songs and their artists
    
    Returns:
    --------
        df_artists_id (Pandas DataFrame): a new dataframe with the name and Spotify ID of the artists
    '''
    
    # Create auxiliary empty dictionary
    dict_artists_ids = {}
      
    # Check for each row of the dataframe
    for index in df.index:
                      
        # Information necessary from the dataframe to use during the process
        song_name = df.iloc[index, 0][:3].lower()
        artists_list = [artist.lower() for artist in df.iloc[index, 2]]
        total_artists = df.iloc[index, 3]
        mask = df.iloc[index, 4]['tracks']['items']
        added = 0
        
        # For each track it was listed 50 tracks related to the query 
        for idx, each_found in enumerate(mask):
                
            # Information necessary from the Spotify API to use during the process
            track_name = mask[idx]['name'].lower()
            n_artists = len(mask[idx]['artists'])
            first_artist_name = mask[idx]['artists'][0]['name'].lower()
            
            # Check if the name of the song, the artists from both sources match and if an information about the
            # song has been added to the final lists 
            if ((song_name in track_name) & (total_artists == n_artists) & 
                ((first_artist_name in artists_list) | (artists_list[0][:5] in first_artist_name)) & (added == 0)
                & (df.iloc[index, 5] != 'not-found')):
                
                for artist in mask[idx]['artists']:
                    artist_name = artist['name']
                    artist_id = artist['id']
                        
                    # Add to dict
                    dict_artists_ids[artist_name] = artist_id
                    
    # Create a Pandas DataFrame
    
    df_artists_id = pd.DataFrame(dict_artists_ids.items(), columns=['artist', 'sp_id'])
                    
    return df_artists_id

In [119]:
# Create a Pandas DataFrame with artists
df_artists_sp = get_spotify_artists_id(df_sp)

# Check result
df_artists_sp

Unnamed: 0,artist,sp_id
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF
3,DaBaby,4r63FhuTkUYltbVAg5TQnk
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD
...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW
85,Travis Barker,4exLIFE8sISLr28sqG1qNX
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl


In [120]:
# Search in the API wrapper
spotify_artists_info = [sp.artist(artist) for artist in  tqdm(df_artists_sp.sp_id)]

HBox(children=(FloatProgress(value=0.0, max=88.0), HTML(value='')))




In [121]:
# Add a column in the dataframe with the data that were just collected
df_artists_sp['spotify_artist'] = spotify_artists_info

# Check the result
df_artists_sp.head()

Unnamed: 0,artist,sp_id,spotify_artist
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...


In [122]:
# Function to add new information in a copy of the dataframe
def get_spotify_artist_info(df):
    
    '''
    Adds information about the artists to a copy of the dataframe.
    
    Args:
    -----
        df (Pandas DataFrame): a dataframe containing the artists and their Spotify ID
    
    Returns:
    --------
        df_copy (Pandas DataFrame): a copy of the dataframe with some new information appended
    '''
    
    # Create auxiliary empty lists (final lists)
    list_spotify_artist_genres = []
    list_spotify_artist_popularity = []
    list_spotify_artist_followers = []
    
    # Check for each row of the dataframe
    for index in df.index:
        
        sp_id = df.iloc[index, 1]
        mask = df.iloc[index, 2]
        search_sp_id = mask['id']
        #artist_name = mask['name']
        
        # Check if 'id's match
        if sp_id == search_sp_id:
            #print(f'{index} - {artist_name}: OK - {n_artist}')
            
            search_sp_genres = mask['genres']
            list_spotify_artist_genres.append(search_sp_genres)
            
            search_sp_popularity = mask['popularity']
            list_spotify_artist_popularity.append(search_sp_popularity)
            
            search_sp_followers = mask['followers']['total']
            list_spotify_artist_followers.append(search_sp_followers)
    
    # Make a copy of the dataframe
    df_copy = df.copy()
    
    # Add columns with the desired information
    # Not an inplace process
    df_copy['sp_genres'] = list_spotify_artist_genres
    df_copy['sp_popularity'] = list_spotify_artist_popularity
    df_copy['sp_followers'] = list_spotify_artist_followers
                    
    return df_copy

In [123]:
# Add desired information to the dataframe
df_artists = get_spotify_artist_info(df_artists_sp)

# Check the result
df_artists.head()

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,486223
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3435338
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17978647
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4317753
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82230


## Playlists

In [124]:
# Search in the API wrapper
spotify_tiktok = sp.search(q='tiktok', type='playlist', limit=50)

In [125]:
len(spotify_tiktok['playlists']['items'])

50

# Chartmetric

## Connecting to the API

In [180]:
load_dotenv(find_dotenv())

True

In [181]:
def get_token_cmc():
    url = "https://api.chartmetric.com/api/token"
    payload = r'{"refreshtoken":"%s"}' % os.getenv('chartmetric_rftoken')

    headers = {
      'Content-Type': 'application/json',
      'Cookie': 'connect.sid=s%3A96446210-f75b-11ea-bb65-c97242076514.wtHGb%2BZnACRtIERUXXeoBVsOmhNDPPnBzgPW9UL%2Bhpc'
    }

    response = requests.request("POST", url, headers=headers, data=payload)
    print(response)
    
    token = re.findall("[A-Za-z0-9._-]+", response.text)[1]
    
    return token

In [182]:
token = get_token_cmc()

<Response [200]>


## Songs

In [47]:
# Create a dataframe with songs for the Chartmetric dataset
df_cmc = df_sp.iloc[:, 0:6].drop(columns='spotify_search')
df_cmc

Unnamed: 0,song,artists,artists_list,number_artists,sp_id
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q
...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3


### Get Chartmetric track ID

In [137]:
def get_cmc_id_spotify(query: str, limit=10, offset=0):
    
    '''
    Searches for Chartmetric track ID using Spotify ID
    
    Args: spotify ID
    ------
    
    Returns: json
    '''
    
    search = f'https://open.spotify.com/track/{query}'
    url = f'https://api.chartmetric.com/api/search?q={search}&limit={limit}&offset={offset}&type=tracks'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [138]:
cmc_songs_id = [get_cmc_id_spotify(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc.sp_id)]

HBox(children=(FloatProgress(value=0.0, max=69.0), HTML(value='')))

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [44]:
len(cmc_songs_id)

69

In [48]:
df_cmc['cmc_song_seach'] = cmc_songs_id
df_cmc

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '..."
...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '..."
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '..."
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""..."
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '..."


In [52]:
df_cmc['cmc_id'] = [cmc_search['obj']['tracks'][0]['id'] if df_cmc.iloc[index, 4] != 'not-found' else 'not-found' 
                    for index, cmc_search in enumerate(df_cmc.cmc_song_seach)]

df_cmc.cmc_id.head()

0    27228348
1    26951096
2    27597895
3    12263271
4    25138356
Name: cmc_id, dtype: object

In [55]:
df_cmc.to_csv('cmc_test.csv', index=False)

In [56]:
df_cmc.to_csv('cmc_test.csv', index=False)
df_teste = pd.read_csv('cmc_test.csv')
df_teste

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id
0,Roxanne,Arizona Zervas,['Arizona Zervas'],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348
1,Say So,Doja Cat,['Doja Cat'],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096
2,My Oh My,Camila Cabello feat. DaBaby,"['Camila Cabello', 'DaBaby']",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895
3,Moon,Kid Francescoli,['Kid Francescoli'],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271
4,Vibe,Cookiee Kawaii,['Cookiee Kawaii'],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356
...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,['Avril Lavigne'],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '...",15674355
65,Towards the Sun,Rihanna,['Rihanna'],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '...",13795060
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","['Machine Gun Kelly', 'YUNGBLUD', 'Travis Bark...",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""...",23903635
67,Myself,Bazzi,['Bazzi'],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '...",19040153


### Get track metadata

In [57]:
def get_cmc_metadata(cmc_id: str):
    
    
    
    url = f'https://api.chartmetric.com/api/track/{cmc_id}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [58]:
cmc_track_metadata = cmc_track_metadata = [get_cmc_metadata(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc.cmc_id)]

HBox(children=(FloatProgress(value=0.0, max=69.0), HTML(value='')))




In [152]:
df_cmc['cmc_track_metadata'] = cmc_track_metadata

In [153]:
df_cmc.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_stats_tiktok,cmc_track_metadata
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': [{'value': 2100000, 'timestp': '2020-0...","{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': [{'value': 5300000, 'timestp': '2020-0...","{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': [{'value': 791700, 'timestp': '2020-01...","{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': [{'value': 382197, 'timestp': '2020-02...","{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': [{'value': 948913, 'timestp': '2020-03...","{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."


### Get track stats - TikTok

In [59]:
def get_cmc_track_stats_tiktok(cmc_id, since="2020-01-01"):
    
    
    
    url = f'https://api.chartmetric.com/api/track/{cmc_id}/tiktok/stats?since={since}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [60]:
cmc_track_stats_tiktok = [get_cmc_track_stats_tiktok(id) if id != 'not-found' else 'not-found' for id in tqdm(df_cmc.cmc_id)]

HBox(children=(FloatProgress(value=0.0, max=69.0), HTML(value='')))




In [62]:
df_cmc['cmc_track_stats_tiktok'] = cmc_track_stats_tiktok
df_cmc.head()

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_stats_tiktok
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': [{'value': 2100000, 'timestp': '2020-0..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': [{'value': 5300000, 'timestp': '2020-0..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': [{'value': 791700, 'timestp': '2020-01..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': [{'value': 382197, 'timestp': '2020-02..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': [{'value': 948913, 'timestp': '2020-03..."


In [159]:
df_cmc

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_stats_tiktok,cmc_track_metadata
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': [{'value': 2100000, 'timestp': '2020-0...","{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,[Doja Cat],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': [{'value': 5300000, 'timestp': '2020-0...","{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': [{'value': 791700, 'timestp': '2020-01...","{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,[Kid Francescoli],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': [{'value': 382197, 'timestp': '2020-02...","{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': [{'value': 948913, 'timestp': '2020-03...","{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."
...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,[Avril Lavigne],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '...",15674355,"{'obj': [{'value': 315800, 'timestp': '2020-02...","{'obj': {'id': 15674355, 'name': 'What The Hel..."
65,Towards the Sun,Rihanna,[Rihanna],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '...",13795060,"{'obj': [{'value': 0, 'timestp': '2020-01-07T0...","{'obj': {'id': 13795060, 'name': 'Towards The ..."
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","[Machine Gun Kelly, YUNGBLUD, Travis Barker]",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""...",23903635,"{'obj': [{'value': 60100, 'timestp': '2020-01-...","{'obj': {'id': 23903635, 'name': 'I Think I'm ..."
67,Myself,Bazzi,[Bazzi],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '...",19040153,"{'obj': [{'value': 1348745, 'timestp': '2020-0...","{'obj': {'id': 19040153, 'name': 'Myself', 'is..."


In [162]:
df_cmc.to_csv('cmc_tracks_data.csv', index=False)

In [163]:
df_teste = pd.read_csv('cmc_tracks_data.csv')
df_teste

Unnamed: 0,song,artists,artists_list,number_artists,sp_id,cmc_song_seach,cmc_id,cmc_track_stats_tiktok,cmc_track_metadata
0,Roxanne,Arizona Zervas,['Arizona Zervas'],1,696DnlkuDOXcMAnKlTgXXK,"{'obj': {'tracks': [{'id': 27228348, 'name': '...",27228348,"{'obj': [{'value': 2100000, 'timestp': '2020-0...","{'obj': {'id': 27228348, 'name': 'ROXANNE', 'i..."
1,Say So,Doja Cat,['Doja Cat'],1,3Dv1eDb0MEgF93GpLXlucZ,"{'obj': {'tracks': [{'id': 26951096, 'name': '...",26951096,"{'obj': [{'value': 5300000, 'timestp': '2020-0...","{'obj': {'id': 26951096, 'name': 'Say So', 'is..."
2,My Oh My,Camila Cabello feat. DaBaby,"['Camila Cabello', 'DaBaby']",2,3yOlyBJuViE2YSGn3nVE1K,"{'obj': {'tracks': [{'id': 27597895, 'name': '...",27597895,"{'obj': [{'value': 791700, 'timestp': '2020-01...","{'obj': {'id': 27597895, 'name': 'My Oh My', '..."
3,Moon,Kid Francescoli,['Kid Francescoli'],1,24upABZ8A0sAepfu91sEYr,"{'obj': {'tracks': [{'id': 12263271, 'name': '...",12263271,"{'obj': [{'value': 382197, 'timestp': '2020-02...","{'obj': {'id': 12263271, 'name': 'Moon', 'isrc..."
4,Vibe,Cookiee Kawaii,['Cookiee Kawaii'],1,4gOgQTv9RYYFZ1uQNnlk3q,"{'obj': {'tracks': [{'id': 25138356, 'name': '...",25138356,"{'obj': [{'value': 948913, 'timestp': '2020-03...","{'obj': {'id': 25138356, 'name': 'Vibe', 'isrc..."
...,...,...,...,...,...,...,...,...,...
64,What the Hell,Avril Lavigne,['Avril Lavigne'],1,2z4U9d5OAA4YLNXoCgioxo,"{'obj': {'tracks': [{'id': 15674355, 'name': '...",15674355,"{'obj': [{'value': 315800, 'timestp': '2020-02...","{'obj': {'id': 15674355, 'name': 'What The Hel..."
65,Towards the Sun,Rihanna,['Rihanna'],1,1UuZhGTon3gzXQAJzNa2A4,"{'obj': {'tracks': [{'id': 13795060, 'name': '...",13795060,"{'obj': [{'value': 0, 'timestp': '2020-01-07T0...","{'obj': {'id': 13795060, 'name': 'Towards The ..."
66,I Think I'm OKAY,"Machine Gun Kelly, YUNGBLUD, and Travis Barker","['Machine Gun Kelly', 'YUNGBLUD', 'Travis Bark...",3,2gTdDMpNxIRFSiu7HutMCg,"{'obj': {'tracks': [{'id': 23903635, 'name': ""...",23903635,"{'obj': [{'value': 60100, 'timestp': '2020-01-...","{'obj': {'id': 23903635, 'name': ""I Think I'm ..."
67,Myself,Bazzi,['Bazzi'],1,5YLHLxoZsodDWjqSgjhBf3,"{'obj': {'tracks': [{'id': 19040153, 'name': '...",19040153,"{'obj': [{'value': 1348745, 'timestp': '2020-0...","{'obj': {'id': 19040153, 'name': 'Myself', 'is..."


### Export dataset

In [None]:
df_cmc.to_csv('cmc_tracks_info.csv', index=False)

In [None]:
df_teste = pd.read_csv('cmc_tracks_info.csv')
df_teste

## Artists

In [139]:
df_cmc_artists = df_artists.drop(columns=['spotify_artist'])
df_cmc_artists

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486223
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3435338
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17978647
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4317753
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82230
...,...,...,...,...,...
83,Machine Gun Kelly,6TIYQ3jFPwQSRmorSezPxX,"[ohio hip hop, pop rap, rap]",86,2451137
84,YUNGBLUD,6Ad91Jof8Niiw0lGLLi3NW,"[british indie rock, modern alternative rock, ...",78,1083768
85,Travis Barker,4exLIFE8sISLr28sqG1qNX,[rap rock],77,252209
86,Bazzi,4GvEc3ANtPPjt1ZJllr5Zl,"[pop, post-teen pop]",83,3669318


### Get Artists IDs

In [183]:
teste = cmc_search_artist_spotify('5cj0lLjcoR7YOSnhnX0Po5')

In [184]:
teste

{'obj': {'artists': [{'id': 217671,
    'name': 'Doja Cat',
    'image_url': 'https://i.scdn.co/image/c0492ddbdf41c4595ee1334d3f896ea786005fe9',
    'isni': '0000000465726144',
    'code2': 'us',
    'hometown_city': 'Los Angeles',
    'current_city': 'Los Angeles',
    'sp_followers': 3414060,
    'sp_popularity': 88,
    'sp_monthly_listeners': 31770199,
    'deezer_fans': 345458,
    'tags': ['pop', 'pop rap', 'la indie', 'electropop'],
    'spotify_artist_ids': ['5cj0lLjcoR7YOSnhnX0Po5'],
    'itunes_artist_ids': [830588310],
    'deezer_artist_ids': ['5578942'],
    'cm_artist_rank': 58,
    'amazon_artist_ids': ['B00IUPTW5G']}]}}

In [164]:
def cmc_search_artist_spotify(query: str, limit=10, offset=0):
    
    '''
    Searches for Chartmetric artist ID using Spotify ID
    
    Args: spotify ID
    ------
    
    Returns: json
    '''
    
    search = f'https://open.spotify.com/artist/{query}'
    url = f'https://api.chartmetric.com/api/search?q={search}&limit={limit}&offset={offset}&type=artists'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [185]:
cmc_artist_id = [cmc_search_artist_spotify(id) for id in tqdm(df_cmc_artists['sp_id'])]

HBox(children=(FloatProgress(value=0.0, max=88.0), HTML(value='')))

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [176]:
df_cmc_artists['cmc_search'] = cmc_artist_id

In [177]:
df_cmc_artists.head()

Unnamed: 0,artist,sp_id,sp_genres,sp_popularity,sp_followers,cmc_search
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,"[pop rap, rap, rhode island rap]",80,486223,{'error': 'jwt expired'}
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,"[la indie, pop]",88,3435338,{'error': 'jwt expired'}
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,"[dance pop, pop, post-teen pop]",87,17978647,{'error': 'jwt expired'}
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,"[north carolina hip hop, rap]",95,4317753,{'error': 'jwt expired'}
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,"[electronica, french indie pop, french indietr...",61,82230,{'error': 'jwt expired'}


### Get artist metadata

In [110]:
def get_cmc_artist_metadata(cmc_id: str):
    
    
    
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [111]:
cmc_art_metadata = [get_cmc_artist_metadata(id) for id in tqdm(df_cmc_artists.cmc_art_id)]

HBox(children=(FloatProgress(value=0.0, max=83.0), HTML(value='')))




In [112]:
df_cmc_artists['cmc_art_metadata'] = cmc_art_metadata
df_cmc_artists.head()

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_art_id,cmc_art_metadata
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,485529,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150,"{'obj': {'id': 64150, 'name': 'Arizona Zervas'..."
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3424764,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671,"{'obj': {'id': 217671, 'name': 'Doja Cat', 'cr..."
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17967991,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302,"{'obj': {'id': 454302, 'name': 'Camila Cabello..."
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4308915,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544,"{'obj': {'id': 398544, 'name': 'DaBaby', 'crea..."
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82134,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365,"{'obj': {'id': 147365, 'name': 'Kid Francescol..."


### Get spotify Monthly Listeners by City

In [114]:
def get_cmc_artist_sp_city(cmc_id: str):
    
    
    
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}/where-people-listen'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [115]:
cmc_art_sp_city = [get_cmc_artist_sp_city(id) for id in tqdm(df_cmc_artists.cmc_art_id)]

HBox(children=(FloatProgress(value=0.0, max=83.0), HTML(value='')))




In [116]:
df_cmc_artists['cmc_art_sp_city'] = cmc_art_sp_city
df_cmc_artists.head()

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_art_id,cmc_art_metadata,cmc_art_sp_city
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,485529,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150,"{'obj': {'id': 64150, 'name': 'Arizona Zervas'...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3424764,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671,"{'obj': {'id': 217671, 'name': 'Doja Cat', 'cr...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17967991,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302,"{'obj': {'id': 454302, 'name': 'Camila Cabello...",{'obj': {'Brisbane': [{'timestp': '2020-03-20T...
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4308915,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544,"{'obj': {'id': 398544, 'name': 'DaBaby', 'crea...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82134,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365,"{'obj': {'id': 147365, 'name': 'Kid Francescol...",{'obj': {'Berlin': [{'timestp': '2020-03-20T00...


## Tiktok Audience Data


In [118]:
def get_cmc_artist_tiktok_audience_data(cmc_id: str):
    
    
    
    url = f'https://api.chartmetric.com/api/artist/{cmc_id}/tiktok-audience-stats'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [119]:
cmc_art_tiktok_audience = [get_cmc_artist_tiktok_audience_data(id) for id in tqdm(df_cmc_artists.cmc_art_id)]

HBox(children=(FloatProgress(value=0.0, max=83.0), HTML(value='')))




In [120]:
df_cmc_artists['cmc_art_tiktok_audience'] = cmc_art_tiktok_audience
df_cmc_artists.head()

Unnamed: 0,artist,sp_id,spotify_artist,sp_genres,sp_popularity,sp_followers,cmc_search,cmc_art_id,cmc_art_metadata,cmc_art_sp_city,cmc_art_tiktok_audience
0,Arizona Zervas,0vRvGUQVUjytro0xpb26bs,{'external_urls': {'spotify': 'https://open.sp...,"[pop rap, rap, rhode island rap]",80,485529,"{'obj': {'artists': [{'id': 64150, 'name': 'Ar...",64150,"{'obj': {'id': 64150, 'name': 'Arizona Zervas'...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...,{'obj': {'top_countries': [{'name': 'United St...
1,Doja Cat,5cj0lLjcoR7YOSnhnX0Po5,{'external_urls': {'spotify': 'https://open.sp...,"[la indie, pop]",88,3424764,"{'obj': {'artists': [{'id': 217671, 'name': 'D...",217671,"{'obj': {'id': 217671, 'name': 'Doja Cat', 'cr...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...,{'obj': {'top_countries': [{'name': 'United St...
2,Camila Cabello,4nDoRrQiYLoBzwC5BhVJzF,{'external_urls': {'spotify': 'https://open.sp...,"[dance pop, pop, post-teen pop]",87,17967991,"{'obj': {'artists': [{'id': 454302, 'name': 'C...",454302,"{'obj': {'id': 454302, 'name': 'Camila Cabello...",{'obj': {'Brisbane': [{'timestp': '2020-03-20T...,{'obj': {'top_countries': [{'name': 'United St...
3,DaBaby,4r63FhuTkUYltbVAg5TQnk,{'external_urls': {'spotify': 'https://open.sp...,"[north carolina hip hop, rap]",95,4308915,"{'obj': {'artists': [{'id': 398544, 'name': 'D...",398544,"{'obj': {'id': 398544, 'name': 'DaBaby', 'crea...",{'obj': {'Atlanta': [{'timestp': '2020-03-20T0...,"{'obj': {'top_countries': [], 'audience_gender..."
4,Kid Francescoli,2G7QgTep5IsJHGHm1hXygD,{'external_urls': {'spotify': 'https://open.sp...,"[electronica, french indie pop, french indietr...",61,82134,"{'obj': {'artists': [{'id': 147365, 'name': 'K...",147365,"{'obj': {'id': 147365, 'name': 'Kid Francescol...",{'obj': {'Berlin': [{'timestp': '2020-03-20T00...,"{'obj': {'top_countries': [], 'audience_gender..."


### Export dataset

In [None]:
df_cmc_artists.to_csv('cmc_artists_info.csv', index=False)

In [None]:
df_teste2 = pd.read_csv('cmc_artists_info.csv')
df_teste2

# Charts

In [174]:
def get_cmc_tiktok_charts(date):
    
    
    
    url = f'https://api.chartmetric.com/api/charts/tiktok/tracks?date={date}&interval=weekly'
    
    headers = {
    'Authorization': 'Bearer ' + token
    }
    
    response = requests.get(url, headers=headers)
    
    return response.json()

In [None]:
cmc_tiktok_chart = get_cmc_tiktok_charts('2020-09-08')

In [None]:
df_cmc_chart = pd.DataFrame(cmc_tiktok_chart).reset_index()
df_cmc_chart.to_csv('cmc_chart.csv', index=False)

# Final Dataframe

In [103]:
'''df = df_base.drop(['spotify_search'], axis=1)
df.head()'''

Unnamed: 0,song,artists,artists_list,number_artists,artist_1,artist_2,artist_3,sp_id,sp_duration_ms,sp_popularity,...,sp_loudness,sp_mode,sp_speechiness,sp_acousticness,sp_instrumentalness,sp_liveness,sp_valence,sp_tempo,sp_time_signature,lastfm_tags
0,Roxanne,Arizona Zervas,[Arizona Zervas],1,Arizona Zervas,no-artist,no-artist,696DnlkuDOXcMAnKlTgXXK,163636,89,...,-5.616,0,0.148,0.0522,0.0,0.46,0.457,116.735,5,[hip hop]
1,Say So,Doja Cat,[Doja Cat],1,Doja Cat,no-artist,no-artist,3Dv1eDb0MEgF93GpLXlucZ,237893,89,...,-4.577,0,0.158,0.256,3.57e-06,0.0904,0.786,110.962,4,"[pop, disco, hip hop]"
2,My Oh My,Camila Cabello feat. DaBaby,"[Camila Cabello, DaBaby]",2,Camila Cabello,DaBaby,no-artist,3yOlyBJuViE2YSGn3nVE1K,170746,83,...,-6.024,1,0.0296,0.018,1.29e-05,0.0887,0.383,105.046,4,[pop]
3,Moon,Kid Francescoli,[Kid Francescoli],1,Kid Francescoli,no-artist,no-artist,24upABZ8A0sAepfu91sEYr,390638,70,...,-10.002,1,0.0345,0.288,0.856,0.102,0.0584,117.986,4,"[chillout, indie pop]"
4,Vibe,Cookiee Kawaii,[Cookiee Kawaii],1,Cookiee Kawaii,no-artist,no-artist,4gOgQTv9RYYFZ1uQNnlk3q,83940,73,...,-8.719,1,0.344,0.0635,0.00932,0.118,0.175,159.947,4,[no-tag]
