# ETL: Paraiso Records with Deezer Music

This project starts at the first stage of a data analysis workflow: ETL (extract, transform, load). I used the Deezer for Developers API as a challenge, and to keep building my data-collection skills instead of relying on preprocessed Kaggle datasets. The artists belong to the record label [Paraiso Records](https://paraisorecords.be/).

## Starting the ETL process

### Preparing the variables
First, we import the libraries we are going to need.

In [1]:
# Import libraries
import pandas as pd
import requests
import json
import time

Next, we create the list of the 10 artists signed to [Paraiso Records](https://paraisorecords.be/artists), each one given as its Deezer API URL.

In [2]:
# List of the label's 10 artists
artistas_del_sello = [
    'https://api.deezer.com/artist/61577792',
    'https://api.deezer.com/artist/238244491',
    'https://api.deezer.com/artist/154687631',
    'https://api.deezer.com/artist/131863182',
    'https://api.deezer.com/artist/13052145',
    'https://api.deezer.com/artist/120701732',
    'https://api.deezer.com/artist/283613031',
    'https://api.deezer.com/artist/4843148',
    'https://api.deezer.com/artist/11327858',
    'https://api.deezer.com/artist/132110672'
]
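
Since every entry shares the same prefix, the list could also be derived from the bare artist IDs. The following is only a minimal sketch of that idea; `API_BASE`, `artist_ids`, and `artist_url` are hypothetical names that do not appear in the notebook.

```python
API_BASE = 'https://api.deezer.com'  # prefix shared by every URL in the list above

# A few of the IDs from the list above; the remaining ones would be added the same way
artist_ids = [61577792, 238244491, 154687631]

def artist_url(artist_id):
    """Build the API URL for one artist ID (hypothetical helper)."""
    return f'{API_BASE}/artist/{artist_id}'

artist_urls = [artist_url(artist_id) for artist_id in artist_ids]
print(artist_urls[0])
```

Keeping the IDs separate from the URL prefix makes it easier to add or remove an artist without retyping the full address.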

We define the list of field names (keys) to keep for each track; it is used later, when fetching the details of each track.

In [3]:
# Fields to keep for each track
campos_track = ['id', 'title', 'isrc', 'link', 'duration', 'track_position', 'rank', 'release_date', 'bpm', 'gain']
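
To illustrate how this field list is applied later, here is a small sketch that filters a response dictionary down to those keys. The `sample_response` values are made up for illustration and `select_fields` is a hypothetical helper; only `campos_track` comes from the notebook.

```python
campos_track = ['id', 'title', 'isrc', 'link', 'duration', 'track_position',
                'rank', 'release_date', 'bpm', 'gain']

# Made-up response for illustration; a real response carries many more keys
sample_response = {
    'id': 1,
    'title': 'Example track',
    'duration': 180,
    'preview': 'dropped-because-not-in-campos_track',
}

def select_fields(payload, fields):
    """Keep only the requested keys; keys missing from the payload become None."""
    return {field: payload.get(field) for field in fields}

track_row = select_fields(sample_response, campos_track)
print(track_row['title'], track_row['isrc'])
```

Using `.get()` means a track that lacks one of the fields still produces a complete row, with `None` in the missing position, rather than raising a `KeyError`.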

We create the lists that will hold the data fetched for the artists, their playlists, and their tracks.

In [4]:
# Lists to store the data
lista_artistas = []
lista_playlists = []
lista_tracks = []

We define a helper that pauses between requests, to avoid problems (such as rate limiting) when talking to the API.

In [5]:
# Helper to pause between requests (avoids rate limiting)
def hacer_pausa(segundos=0.5):
    time.sleep(segundos)
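
A fixed pause lowers the request rate, but it does not recover a request that has already failed. One possible complement is a retry wrapper with exponential backoff, sketched below. `fetch_with_retries` and its parameters are hypothetical, and the check for an `error` key in the response body rests on the assumption that the API may report some failures inside the JSON payload rather than through the HTTP status code. The fetch function is passed in as an argument so the sketch runs without network access.

```python
import time

def fetch_with_retries(fetch, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url) until it returns a payload without an 'error' key.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    last_error = None
    for attempt in range(retries):
        try:
            payload = fetch(url)
        except Exception as exc:  # network or parsing failure
            last_error = exc
        else:
            # Assumption: an 'error' key in the body marks a failed request
            if isinstance(payload, dict) and 'error' in payload:
                last_error = RuntimeError(str(payload['error']))
            else:
                return payload
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Giving up on {url} after {retries} attempts: {last_error}")

# Demonstration with a stub that fails once and then succeeds
responses = iter([{'error': {'message': 'temporary failure'}}, {'id': 1}])
demo_result = fetch_with_retries(lambda url: next(responses), 'stub-url',
                                 sleep=lambda seconds: None)
print(demo_result)
```

In the notebook, the `fetch` argument could be something like `lambda u: requests.get(u, timeout=10).json()`.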

### Step 1: Get each artist's data

In [6]:
# 1. FETCH THE ARTIST DATA
print("Fetching artist data...")
for i, url_artista in enumerate(artistas_del_sello):
    try:
        # A timeout keeps a stalled connection from hanging the loop
        response = requests.get(url_artista, timeout=10)
        if response.status_code == 200:
            data_artista = response.json()

            # Add the artist to the list
            data_artista['artist_url'] = url_artista  # Keep the original URL
            lista_artistas.append(data_artista)

            print(f"Artist {i+1}/{len(artistas_del_sello)}: {data_artista.get('name', 'Name not available')}")

            # Short pause between requests
            hacer_pausa(0.3)
        else:
            print(f"Error fetching artist {url_artista}: status code {response.status_code}")
    except Exception as e:
        print(f"Error processing artist {url_artista}: {e}")

Fetching artist data...
Artist 1/10: THIERRY VON DER WARTH
Artist 2/10: Chiara Meloni
Artist 3/10: Jorden Dux
Artist 4/10: Jonas Dufrasne
Artist 5/10: C3DRIC
Artist 6/10: Voltage DJ
Artist 7/10: Jack David
Artist 8/10: Alon
Artist 9/10: Alber-K
Artist 10/10: Mecdoux


Once the data has been fetched correctly, we load the result into a DataFrame.

In [7]:
# Create the artists DataFrame
df_artistas = pd.json_normalize(lista_artistas)
print(f"\nData was fetched for {len(df_artistas)} artists")


Data was fetched for 10 artists


In [8]:
df_artistas.head()

| | id | name | link | share | picture | picture_small | picture_medium | picture_big | picture_xl | nb_album | nb_fan | radio | tracklist | type | artist_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 61577792 | THIERRY VON DER WARTH | https://www.deezer.com/artist/61577792 | https://www.deezer.com/artist/61577792?utm_sou... | https://api.deezer.com/artist/61577792/image | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | 53 | 298 | True | https://api.deezer.com/artist/61577792/top?lim... | artist | https://api.deezer.com/artist/61577792 |
| 1 | 238244491 | Chiara Meloni | https://www.deezer.com/artist/238244491 | https://www.deezer.com/artist/238244491?utm_so... | https://api.deezer.com/artist/238244491/image | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | 7 | 3 | True | https://api.deezer.com/artist/238244491/top?li... | artist | https://api.deezer.com/artist/238244491 |
| 2 | 154687631 | Jorden Dux | https://www.deezer.com/artist/154687631 | https://www.deezer.com/artist/154687631?utm_so... | https://api.deezer.com/artist/154687631/image | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | 15 | 32 | True | https://api.deezer.com/artist/154687631/top?li... | artist | https://api.deezer.com/artist/154687631 |
| 3 | 131863182 | Jonas Dufrasne | https://www.deezer.com/artist/131863182 | https://www.deezer.com/artist/131863182?utm_so... | https://api.deezer.com/artist/131863182/image | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | 4 | 4 | True | https://api.deezer.com/artist/131863182/top?li... | artist | https://api.deezer.com/artist/131863182 |
| 4 | 13052145 | C3DRIC | https://www.deezer.com/artist/13052145 | https://www.deezer.com/artist/13052145?utm_sou... | https://api.deezer.com/artist/13052145/image | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | 40 | 48 | True | https://api.deezer.com/artist/13052145/top?lim... | artist | https://api.deezer.com/artist/13052145 |


Now we save the result to a CSV file for later use in the analysis.

In [9]:
df_artistas.to_csv('artistas.csv', index=False)

### Step 2: Get each artist's playlist data

In [10]:
# 2. FETCH THE PLAYLIST (TRACKLIST) DATA FOR EACH ARTIST
print("\nFetching the artists' playlists...")
for artista in lista_artistas:
    try:
        artist_id = artista['id']
        artist_name = artista['name']
        tracklist_url = artista['tracklist']

        # A timeout keeps a stalled connection from hanging the loop
        response_playlist = requests.get(tracklist_url, timeout=10)
        if response_playlist.status_code == 200:
            data_playlist = response_playlist.json()

            if data_playlist.get('data'):
                # Attach the artist information to each track in the playlist
                for track in data_playlist['data']:
                    track['artist_id'] = artist_id
                    track['artist_name'] = artist_name
                    track['artist_tracklist_url'] = tracklist_url
                    lista_playlists.append(track)

                print(f"Playlist of {artist_name}: {len(data_playlist['data'])} tracks")
            else:
                print(f"Artist {artist_name}: no tracks in their playlist")
        else:
            # Report failed requests instead of skipping them silently
            print(f"Error fetching playlist of {artist_name}: status code {response_playlist.status_code}")

        # Short pause between requests
        hacer_pausa(0.3)

    except Exception as e:
        print(f"Error fetching playlist of {artista.get('name', 'Unknown artist')}: {e}")


Fetching the artists' playlists...
Playlist of THIERRY VON DER WARTH: 37 tracks
Playlist of Chiara Meloni: 6 tracks
Playlist of Jorden Dux: 7 tracks
Playlist of Jonas Dufrasne: 2 tracks
Playlist of C3DRIC: 24 tracks
Artist Voltage DJ: no tracks in their playlist
Playlist of Jack David: 4 tracks
Playlist of Alon: 15 tracks
Artist Alber-K: no tracks in their playlist
Playlist of Mecdoux: 50 tracks
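
The last artist returns exactly 50 tracks, and the `tracklist` URLs shown in the artists table carry a query parameter that begins with `lim`, so that list may have been cut off at a page limit. The sketch below shows one way to follow pagination links, under the assumption that a paginated response includes a `next` URL while more results remain; that assumption should be checked against the API documentation. `collect_all_pages` is a hypothetical helper, and the stub pages stand in for real responses.

```python
def collect_all_pages(fetch, first_url, max_pages=20):
    """Follow 'next' links and return the concatenated 'data' items."""
    items = []
    url = first_url
    pages = 0
    # max_pages is a safety cap so a misbehaving 'next' chain cannot loop forever
    while url and pages < max_pages:
        payload = fetch(url)
        items.extend(payload.get('data', []))
        url = payload.get('next')  # assumed to be absent on the last page
        pages += 1
    return items

# Stub pages standing in for the API, keyed by a made-up URL
fake_pages = {
    'page-1': {'data': [{'id': 1}, {'id': 2}], 'next': 'page-2'},
    'page-2': {'data': [{'id': 3}]},
}
all_tracks = collect_all_pages(fake_pages.__getitem__, 'page-1')
print(len(all_tracks))
```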


After this quick check of the data, we can store it in another DataFrame.

In [11]:
# Create the playlists DataFrame
columnas_playlist = ['id', 'title', 'link', 'duration', 'rank', 'artist_id', 'artist_name']
df_playlists = pd.DataFrame(lista_playlists)
if not df_playlists.empty:
    # Keep only the relevant columns
    df_playlists = df_playlists[columnas_playlist]
    print(f"\n{len(df_playlists)} tracks were fetched in total across all playlists")
else:
    # Fall back to an empty DataFrame with the same columns
    df_playlists = pd.DataFrame(columns=columnas_playlist)
    print("\nNo playlist data could be fetched")


145 tracks were fetched in total across all playlists


Now we display the DataFrame.

In [12]:
df_playlists.head()

| | id | title | link | duration | rank | artist_id | artist_name |
|---|---|---|---|---|---|---|---|
| 0 | 2396806575 | Paradise | https://www.deezer.com/track/2396806575 | 132 | 463335 | 61577792 | THIERRY VON DER WARTH |
| 1 | 1825965807 | Sunset Lovers | https://www.deezer.com/track/1825965807 | 176 | 479212 | 61577792 | THIERRY VON DER WARTH |
| 2 | 2286552947 | I Don't Wanna Know | https://www.deezer.com/track/2286552947 | 133 | 374467 | 61577792 | THIERRY VON DER WARTH |
| 3 | 3375647031 | Mina Kupenda | https://www.deezer.com/track/3375647031 | 124 | 454490 | 61577792 | THIERRY VON DER WARTH |
| 4 | 2321698585 | Waiting For You | https://www.deezer.com/track/2321698585 | 157 | 286123 | 61577792 | THIERRY VON DER WARTH |


And we save it to a CSV file:

In [13]:
df_playlists.to_csv('playlists.csv', index=False)

### Step 3: Get each track's data

In [14]:
# 3. FETCH DETAILED DATA FOR EACH TRACK
print("\nFetching detailed track data...")
tracks_procesados = 0

for i, track_id in enumerate(df_playlists['id'].unique()):
    try:
        url_track = f'https://api.deezer.com/track/{track_id}'
        # A timeout keeps a stalled connection from hanging the loop
        response_track = requests.get(url_track, timeout=10)

        if response_track.status_code == 200:
            data_track = response_track.json()

            # Extract only the fields we are interested in
            track_data = {campo: data_track.get(campo) for campo in campos_track}

            # Add the artist information (first playlist row with this track id)
            track_info_playlist = df_playlists[df_playlists['id'] == track_id].iloc[0]
            track_data['artist_id'] = track_info_playlist['artist_id']
            track_data['artist_name'] = track_info_playlist['artist_name']

            lista_tracks.append(track_data)
            tracks_procesados += 1

            if (i + 1) % 10 == 0:
                print(f"Processed {i + 1} tracks...")
        else:
            # Report failed requests instead of skipping them silently
            print(f"Error fetching track {track_id}: status code {response_track.status_code}")

        # Longer pause between track requests to avoid rate limiting
        hacer_pausa(0.5)

    except Exception as e:
        print(f"Error fetching track {track_id}: {e}")


Fetching detailed track data...
Processed 10 tracks...
Processed 20 tracks...
Processed 30 tracks...
Processed 40 tracks...
Processed 50 tracks...
Processed 60 tracks...
Processed 70 tracks...
Processed 80 tracks...
Processed 90 tracks...
Processed 100 tracks...
Processed 110 tracks...
Processed 120 tracks...
Processed 130 tracks...

In [15]:
# Create the tracks DataFrame
df_tracks = pd.DataFrame(lista_tracks)
print(f"\nDetailed data was fetched for {len(df_tracks)} tracks")


Detailed data was fetched for 139 tracks
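
The detail loop iterates over `df_playlists['id'].unique()`, so it makes one request per distinct track ID. That would account for the drop from 145 playlist rows to 139 detailed tracks if six rows are repeats, for example a collaboration listed under more than one artist. This is an inference from the counts and not something the notebook verifies. Note also that a shared track is attributed only to the first artist it appears under, because of the `.iloc[0]` lookup. The sketch below shows one way to find such repeated IDs; the rows are made up for illustration.

```python
from collections import Counter

# Made-up rows shaped like the entries of lista_playlists
playlist_rows = [
    {'id': 101, 'artist_name': 'Artist A'},
    {'id': 102, 'artist_name': 'Artist A'},
    {'id': 101, 'artist_name': 'Artist B'},  # same track under a second artist
]

# Count how many times each track id appears across all playlists
id_counts = Counter(row['id'] for row in playlist_rows)
shared_ids = sorted(track_id for track_id, n in id_counts.items() if n > 1)

print(len(playlist_rows), len(id_counts), shared_ids)
```

With the real data, the equivalent pandas check would be `df_playlists['id'].duplicated().sum()`, which counts the rows whose ID has already appeared earlier.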


In [16]:
df_tracks.head()

| | id | title | isrc | link | duration | track_position | rank | release_date | bpm | gain | artist_id | artist_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2396806575 | Paradise | BE8LH2300176 | https://www.deezer.com/track/2396806575 | 132 | 1 | 463335 | 2023-09-01 | 0 | -9.8 | 61577792 | THIERRY VON DER WARTH |
| 1 | 1825965807 | Sunset Lovers | BE8LH2200090 | https://www.deezer.com/track/1825965807 | 176 | 1 | 479212 | 2022-08-12 | 0 | -9.3 | 61577792 | THIERRY VON DER WARTH |
| 2 | 2286552947 | I Don't Wanna Know | BE8LH2300136 | https://www.deezer.com/track/2286552947 | 133 | 1 | 374467 | 2023-06-16 | 0 | -10.0 | 61577792 | THIERRY VON DER WARTH |
| 3 | 3375647031 | Mina Kupenda | NL2J92505861 | https://www.deezer.com/track/3375647031 | 124 | 1 | 454490 | 2025-05-23 | 0 | -8.7 | 61577792 | THIERRY VON DER WARTH |
| 4 | 2321698585 | Waiting For You | BE8LH2300143 | https://www.deezer.com/track/2321698585 | 157 | 1 | 286123 | 2023-07-14 | 0 | -8.7 | 61577792 | THIERRY VON DER WARTH |
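
In the rows shown, `bpm` is 0 for every track, which more plausibly means "not provided" than a real tempo. The sketch below replaces such zeros with `None` before analysis, assuming that 0 really does mean missing; the source does not confirm this. `clean_bpm` is a hypothetical helper and the second row is made up.

```python
def clean_bpm(track):
    """Return a copy of the track with a zero or absent bpm replaced by None."""
    cleaned = dict(track)
    if not cleaned.get('bpm'):
        cleaned['bpm'] = None
    return cleaned

rows = [
    {'title': 'Paradise', 'bpm': 0},                 # value as shown in the table above
    {'title': 'Hypothetical track', 'bpm': 124.0},   # made-up value
]
cleaned_rows = [clean_bpm(row) for row in rows]
print([row['bpm'] for row in cleaned_rows])
```

Marking the value as missing keeps zeros from dragging down averages or distorting plots in the analysis stage.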


In [17]:
df_tracks.to_csv('tracks.csv', index=False)

Finally, with this last DataFrame saved as well, we display the three DataFrames we obtained.

### Summary

In [18]:
df_artistas.head()

| | id | name | link | share | picture | picture_small | picture_medium | picture_big | picture_xl | nb_album | nb_fan | radio | tracklist | type | artist_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 61577792 | THIERRY VON DER WARTH | https://www.deezer.com/artist/61577792 | https://www.deezer.com/artist/61577792?utm_sou... | https://api.deezer.com/artist/61577792/image | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | https://cdn-images.dzcdn.net/images/artist/b65... | 53 | 298 | True | https://api.deezer.com/artist/61577792/top?lim... | artist | https://api.deezer.com/artist/61577792 |
| 1 | 238244491 | Chiara Meloni | https://www.deezer.com/artist/238244491 | https://www.deezer.com/artist/238244491?utm_so... | https://api.deezer.com/artist/238244491/image | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | https://cdn-images.dzcdn.net/images/artist/b92... | 7 | 3 | True | https://api.deezer.com/artist/238244491/top?li... | artist | https://api.deezer.com/artist/238244491 |
| 2 | 154687631 | Jorden Dux | https://www.deezer.com/artist/154687631 | https://www.deezer.com/artist/154687631?utm_so... | https://api.deezer.com/artist/154687631/image | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | https://cdn-images.dzcdn.net/images/artist/f7d... | 15 | 32 | True | https://api.deezer.com/artist/154687631/top?li... | artist | https://api.deezer.com/artist/154687631 |
| 3 | 131863182 | Jonas Dufrasne | https://www.deezer.com/artist/131863182 | https://www.deezer.com/artist/131863182?utm_so... | https://api.deezer.com/artist/131863182/image | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | https://cdn-images.dzcdn.net/images/artist/8aa... | 4 | 4 | True | https://api.deezer.com/artist/131863182/top?li... | artist | https://api.deezer.com/artist/131863182 |
| 4 | 13052145 | C3DRIC | https://www.deezer.com/artist/13052145 | https://www.deezer.com/artist/13052145?utm_sou... | https://api.deezer.com/artist/13052145/image | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | https://cdn-images.dzcdn.net/images/artist/4cb... | 40 | 48 | True | https://api.deezer.com/artist/13052145/top?lim... | artist | https://api.deezer.com/artist/13052145 |


In [19]:
df_playlists.head()

| | id | title | link | duration | rank | artist_id | artist_name |
|---|---|---|---|---|---|---|---|
| 0 | 2396806575 | Paradise | https://www.deezer.com/track/2396806575 | 132 | 463335 | 61577792 | THIERRY VON DER WARTH |
| 1 | 1825965807 | Sunset Lovers | https://www.deezer.com/track/1825965807 | 176 | 479212 | 61577792 | THIERRY VON DER WARTH |
| 2 | 2286552947 | I Don't Wanna Know | https://www.deezer.com/track/2286552947 | 133 | 374467 | 61577792 | THIERRY VON DER WARTH |
| 3 | 3375647031 | Mina Kupenda | https://www.deezer.com/track/3375647031 | 124 | 454490 | 61577792 | THIERRY VON DER WARTH |
| 4 | 2321698585 | Waiting For You | https://www.deezer.com/track/2321698585 | 157 | 286123 | 61577792 | THIERRY VON DER WARTH |


In [20]:
df_tracks.head()

| | id | title | isrc | link | duration | track_position | rank | release_date | bpm | gain | artist_id | artist_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2396806575 | Paradise | BE8LH2300176 | https://www.deezer.com/track/2396806575 | 132 | 1 | 463335 | 2023-09-01 | 0 | -9.8 | 61577792 | THIERRY VON DER WARTH |
| 1 | 1825965807 | Sunset Lovers | BE8LH2200090 | https://www.deezer.com/track/1825965807 | 176 | 1 | 479212 | 2022-08-12 | 0 | -9.3 | 61577792 | THIERRY VON DER WARTH |
| 2 | 2286552947 | I Don't Wanna Know | BE8LH2300136 | https://www.deezer.com/track/2286552947 | 133 | 1 | 374467 | 2023-06-16 | 0 | -10.0 | 61577792 | THIERRY VON DER WARTH |
| 3 | 3375647031 | Mina Kupenda | NL2J92505861 | https://www.deezer.com/track/3375647031 | 124 | 1 | 454490 | 2025-05-23 | 0 | -8.7 | 61577792 | THIERRY VON DER WARTH |
| 4 | 2321698585 | Waiting For You | BE8LH2300143 | https://www.deezer.com/track/2321698585 | 157 | 1 | 286123 | 2023-07-14 | 0 | -8.7 | 61577792 | THIERRY VON DER WARTH |


With the three DataFrames saved, we can now move on to the analysis...