# SpinEDA: Exploratory analysis, NLP and more on the work of Luis Alberto Spinetta

Disclaimer: the goal of this project is to help me learn and practise various Python concepts (using APIs, for example) and data science. There are probably mistakes and better ways of doing things. Suggestions of any kind are welcome.

# Libraries and functions

In [1]:
import pandas as pd
#import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
import feather
from IPython.display import clear_output
from config import SPOT_CLIENT_ID, SPOT_CLIENT_SECRET, GENIUS_CLIENT_ID, GENIUS_CLIENT_SECRET, GENIUS_CLIENT_TOKEN

%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
def print_full(df):
    """
    Show the full dataframe, not just the head and tail.
    
    df: pandas.DataFrame object
    """
    #pd.set_option('display.max_rows', len(df))
    #display(df)
    #pd.reset_option('display.max_rows')
    with pd.option_context("display.max_rows", len(df), "display.max_columns", len(df.columns)):
        display(df)

***

# Spotify "scraping"

The idea is:
- Learn to use the Spotify API, in this case through the Spotipy library (a Python-friendly wrapper around the API).
- Initially I thought of using the [Wikipedia](https://es.wikipedia.org/wiki/Anexo:Canciones_de_Luis_Alberto_Spinetta#1966_-_2020) table that lists his songs, but I ran into several problems: it does not contain every track of each album (there are no repetitions; for example, if track X is on a live album it may not appear under its studio version), it has errors of various kinds, track number and duration are not split out per track, and the posthumous album from early 2020 was missing (I had to add it 💪). So the hope is that, with perhaps a bit more work, we can get more information from Spotify.

In [3]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
scope = 'user-library-read'
client_credentials_manager = SpotifyClientCredentials(client_id=SPOT_CLIENT_ID,
                                                      client_secret=SPOT_CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
#Hide the request/response trace (headers, GET, etc.)
sp.trace=False 

In [5]:
artistas_id = {'Luis Alberto Spinetta': '', 'Invisible': '', 
               'Spinetta Jade': '', 
               'Almendra': '',
               'Pescado Rabioso': '',
               'Spinetta y los socios del desierto': ''}

Let's look at a sample of the response

In [6]:
sp.search(q='Almendra', type='artist')

{'artists': {'href': 'https://api.spotify.com/v1/search?query=Almendra&type=artist&offset=0&limit=10',
  'items': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7x2a9uyqlWbE9LwcoQWDTo'},
    'followers': {'href': None, 'total': 156007},
    'genres': ['argentine indie',
     'argentine rock',
     'latin alternative',
     'latin rock',
     'rock en espanol',
     'rock nacional'],
    'href': 'https://api.spotify.com/v1/artists/7x2a9uyqlWbE9LwcoQWDTo',
    'id': '7x2a9uyqlWbE9LwcoQWDTo',
    'images': [{'height': 640,
      'url': 'https://i.scdn.co/image/ca4882e0e7595714deeaefe3f66b9c17680c7d3c',
      'width': 640},
     {'height': 300,
      'url': 'https://i.scdn.co/image/31fd6d92426b3f223eb0a3dcd19468d94a002eac',
      'width': 300},
     {'height': 64,
      'url': 'https://i.scdn.co/image/205be44e74d4e045acb6e4a5dd495adab7caf5d4',
      'width': 64}],
    'name': 'Almendra',
    'popularity': 54,
    'type': 'artist',
    'uri': 'spotify:artist:7x2a9uyqlWbE9Lw

From Spotipy docs: Spotipy supports a number of different ID types:

        Spotify URI - The resource identifier that you can enter, for example, in the Spotify Desktop client’s search box to locate an artist, album, or track. Example: spotify:track:6rqhFgbbKwnb9MLmUQDhG6
        Spotify URL - An HTML link that opens a track, album, app, playlist or other Spotify resource in a Spotify client. Example: http://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6
        Spotify ID - A base-62 number that you can find at the end of the Spotify URI (see above) for an artist, track, album, etc. Example: 6rqhFgbbKwnb9MLmUQDhG6

In general, any Spotipy method that needs an artist, album, track or playlist ID will accept ids in any of the above form
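
As a rough illustration of how the three forms relate, the sketch below reduces a URI, a URL or a bare ID to the bare ID. `to_spotify_id` is a hypothetical helper written for this notebook, not part of Spotipy; per the quote above, Spotipy already accepts all three forms, so this is only meant to make the relationship explicit.

```python
from urllib.parse import urlparse


def to_spotify_id(value):
    """Return the bare Spotify ID from a URI, a URL or an ID (hypothetical helper)."""
    if value.startswith('spotify:'):
        # URI form: spotify:<type>:<id>
        return value.split(':')[-1]
    if value.startswith(('http://', 'https://')):
        # URL form: the ID is the last path segment; any query string is ignored
        return urlparse(value).path.rstrip('/').split('/')[-1]
    # Anything else is assumed to already be a bare ID
    return value


print(to_spotify_id('spotify:track:6rqhFgbbKwnb9MLmUQDhG6'))
print(to_spotify_id('http://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6'))
```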

***

Spotipy returns a somewhat nested dictionary with assorted information about the matching artists. Given how popular El Flaco is, we will limit ourselves to the first result; if we are getting it wrong, we will find out later.
We will get the **id** that Spotify assigns to each artist, and then use it to look up all of their albums and tracks.

In [7]:
for banda in artistas_id:
    artistas_id[banda] = sp.search(q=banda, type='artist', limit=1)[
        'artists']['items'][0]['id']

artistas_id

{'Luis Alberto Spinetta': '1MuQ2m2tg7naeRGAOxYZer',
 'Invisible': '3FjdJbt6Myq32uv7P4owM1',
 'Spinetta Jade': '3WxVICwFDAWMTWH8sELmRe',
 'Almendra': '7x2a9uyqlWbE9LwcoQWDTo',
 'Pescado Rabioso': '3q1NXsv9XypOUCJfEatXH9',
 'Spinetta y los socios del desierto': '7F5m9Jw4sg853wTDmmJvLi'}
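
The lookup above trusts the first hit blindly. A minimal sketch of a slightly more defensive version follows, assuming the response has the shape shown in the sample earlier. `first_artist_id` is a hypothetical helper, and the case-insensitive name check is my own assumption about what counts as a good match.

```python
def first_artist_id(search_response, expected_name):
    """Return the id of the first artist hit, or None if there is no hit or the name differs."""
    items = search_response['artists']['items']
    if not items:
        return None
    top = items[0]
    # Case-insensitive comparison: Spotify may capitalise names differently
    if top['name'].casefold() != expected_name.casefold():
        return None
    return top['id']


# Trimmed-down response with the same shape as the sample shown earlier
sample = {'artists': {'items': [{'name': 'Almendra', 'id': '7x2a9uyqlWbE9LwcoQWDTo'}]}}
print(first_artist_id(sample, 'almendra'))
print(first_artist_id(sample, 'Invisible'))
```

With the real client, the first argument would be the result of `sp.search(q=banda, type='artist', limit=1)`.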

Let's get the albums: `artist_albums` returns a dict whose `'items'` key holds a list with one entry per album. For each album we care about `id`, `name`, `album_type`, `total_tracks` and `release_date`. The API has a limit of 50 results per request. Also, Spotify has several albums miscatalogued (they are listed as albums but are actually compilations), etc.

In [8]:
def get_albums_info(artist_id, album_type=None):
    '''
    artist_id: Spotify id (https://spotipy.readthedocs.io/en/2.7.1/#ids-uris-and-urls)
    album_type: 'album', 'single', 'appears_on', 'compilation'. 
    To request more than one album_type, pass a comma-separated string: 'album,single'
    '''
    albums = []
    next_p = '_'
    offset = 0
    while next_p:
        alb_page = sp.artist_albums(
            artist_id, limit=50, offset=offset, album_type=album_type)

        albums.extend([(album['artists'][0]['name'], album['name'], album['album_type'],
                        album['release_date'], album['total_tracks'], album['id']) for album in alb_page['items']])
        offset += 50
        next_p = alb_page['next']
    return albums
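
The paging logic in `get_albums_info` can be isolated from Spotify altogether. The sketch below shows the same offset-based loop against a stand-in page function, so it runs without credentials. `fetch_all` and `fake_page` are hypothetical names, and the fake data is invented purely to exercise the loop.

```python
def fetch_all(fetch_page, page_size=50):
    """Collect the items of every page returned by fetch_page(limit, offset)."""
    items = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        items.extend(page['items'])
        # The last page has an empty 'next', as in the loop above
        if not page['next']:
            return items
        offset += page_size


# Stand-in for sp.artist_albums: 120 fake albums served in pages
FAKE_ALBUMS = [f'album {n}' for n in range(120)]


def fake_page(limit, offset):
    chunk = FAKE_ALBUMS[offset:offset + limit]
    has_more = offset + limit < len(FAKE_ALBUMS)
    return {'items': chunk, 'next': 'more' if has_more else None}


print(len(fetch_all(fake_page)))
```

With the real client, `fetch_page` could be something like `lambda limit, offset: sp.artist_albums(artist_id, limit=limit, offset=offset, album_type=album_type)`.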

In [9]:
full_list = []
# artistas_id maps band name -> artist id, so iterate over the artist ids
for artist_id in artistas_id.values():
    full_list.extend(get_albums_info(artist_id=artist_id, album_type='album,single'))

In [10]:
albums_df = pd.DataFrame(data=full_list, columns=[
                         'Artista', 'Album', 'Tipo', 'Año', 'Total tracks', 'id'])
albums_df['Año'] = albums_df['Año'].apply(lambda x: x[:4])
print_full(albums_df)

| | Artista | Album | Tipo | Año | Total tracks | id |
|---|---|---|---|---|---|---|
| 0 | Luis Alberto Spinetta | Ya No Mires Atrás | album | 2020 | 7 | 6o3S8QBSO9oVNAbcVxdJsa |
| 1 | Luis Alberto Spinetta | Spinetta y Las Bandas Eternas | album | 2010 | 48 | 3kuu5DGs4n815JjDI5dlNx |
| 2 | Luis Alberto Spinetta | Sí o Sí - Diario del Rock Argentino - Spinetta | album | 2008 | 18 | 4F6R2CwxpEbdCkZuP6j65E |
| 3 | Luis Alberto Spinetta | Un Mañana | album | 2008 | 12 | 4x9cchTAtN2KEyoE3Mg7FT |
| 4 | Fito Paez | La La La | album | 2007 | 20 | 7MF4MER3ediE5kYLD6NrQ2 |
| 5 | Luis Alberto Spinetta | Obras Cumbres | album | 2006 | 35 | 2jRV6FSPqiJaNSwgVBD4G4 |
| 6 | Luis Alberto Spinetta | Pan | album | 2005 | 12 | 0xF4fvy8iMh3sT2O4bSgjA |
| 7 | Luis Alberto Spinetta | Para Los Arboles | album | 2003 | 12 | 2Sx9o6kk0gRleobOzhSxkd |
| 8 | Luis Alberto Spinetta | Argentina Sorgo Films (Live At Obras / 2001) | album | 2002 | 12 | 40YUyQFHCZ4JaRbfZlMras |
| 9 | Luis Alberto Spinetta | Silver Sorgo | album | 2001 | 12 | 0iSxQ0pzlYkXTOJdJ6rXWa |


It's not perfect, but it's what we have. Let's remove the compilations and a few duplicates.

In [11]:
albums_df.drop([2,4, 5, 10, 12, 17, 19, 33, 39, 40, 41, 42, 43, 47, 49], inplace=True)
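
Dropping by row position only works as long as Spotify returns the albums in the same order. As a hedged alternative, the sketch below filters by album `id` instead. It uses a three-row stand-in DataFrame with values copied from the table above; `unwanted_ids` is a hypothetical name, and only two of the dropped rows are listed to keep the example short.

```python
import pandas as pd

# Three rows copied from the table above, enough to show the idea
albums = pd.DataFrame({
    'Album': ['Un Mañana', 'La La La', 'Obras Cumbres'],
    'id': ['4x9cchTAtN2KEyoE3Mg7FT', '7MF4MER3ediE5kYLD6NrQ2', '2jRV6FSPqiJaNSwgVBD4G4'],
})

# Album ids to discard (rows 4 and 5 of the drop list above)
unwanted_ids = {'7MF4MER3ediE5kYLD6NrQ2', '2jRV6FSPqiJaNSwgVBD4G4'}

# Keep every row whose id is not in the unwanted set
kept = albums[~albums['id'].isin(unwanted_ids)]
print(kept['Album'].tolist())
```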

For now, let's also drop the live albums and the singles/EPs, and keep the result in a separate df

In [12]:
albums_studio_df = albums_df.drop(albums_df[albums_df['Tipo'] == 'single'].index)
albums_studio_df.drop([1,8,11,15, 25, 45], inplace=True)

In [13]:
print_full(albums_studio_df)

| | Artista | Album | Tipo | Año | Total tracks | id |
|---|---|---|---|---|---|---|
| 0 | Luis Alberto Spinetta | Ya No Mires Atrás | album | 2020 | 7 | 6o3S8QBSO9oVNAbcVxdJsa |
| 3 | Luis Alberto Spinetta | Un Mañana | album | 2008 | 12 | 4x9cchTAtN2KEyoE3Mg7FT |
| 6 | Luis Alberto Spinetta | Pan | album | 2005 | 12 | 0xF4fvy8iMh3sT2O4bSgjA |
| 7 | Luis Alberto Spinetta | Para Los Arboles | album | 2003 | 12 | 2Sx9o6kk0gRleobOzhSxkd |
| 9 | Luis Alberto Spinetta | Silver Sorgo | album | 2001 | 12 | 0iSxQ0pzlYkXTOJdJ6rXWa |
| 13 | Luis Alberto Spinetta | Fuego Gris | album | 1993 | 17 | 5I7prWCiB53GD5hgkI0Br3 |
| 14 | Luis Alberto Spinetta | Peluson Of Milk | album | 1991 | 15 | 0sEqp7Del2dp8HmXE8Geqv |
| 16 | Luis Alberto Spinetta | Don Lucero | album | 1989 | 9 | 4L5xTcKgy8CrAc2H1kN9VY |
| 18 | Luis Alberto Spinetta | Tester de Violencia | album | 1988 | 11 | 3TU8Q7HQS3lNjkVIz5J2yv |
| 20 | Luis Alberto Spinetta | Prive | album | 1986 | 11 | 11NynTWjF8rIm2Eu4RWiNS |


Let's fix the album name in row 48

In [14]:
albums_studio_df.at[48, 'Album'] = 'Almendra'

***

# Tracks features

In [143]:
# example of the response
pd.DataFrame(sp.album_tracks('3XxDmLiEBKqFu1MiZ3fayw')['items']).head()

Unnamed: 0,artists,available_markets,disc_number,duration_ms,explicit,external_urls,href,id,is_local,name,preview_url,track_number,type,uri
0,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AR, AT, AU, BE, BG, BO, BR, CA, CH, CL, C...",1,226226,False,{'spotify': 'https://open.spotify.com/track/2r...,https://api.spotify.com/v1/tracks/2ri0lTjVeBkW...,2ri0lTjVeBkWAmRVkJoHXL,False,Post-Crucifixion,https://p.scdn.co/mp3-preview/7b6382a37b3a21d8...,1,track,spotify:track:2ri0lTjVeBkWAmRVkJoHXL
1,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AR, AT, AU, BE, BG, BO, BR, CA, CH, CL, C...",1,182773,False,{'spotify': 'https://open.spotify.com/track/6w...,https://api.spotify.com/v1/tracks/6wGKkvlvUHbd...,6wGKkvlvUHbdi6hlp3C8q7,False,Nena Boba,https://p.scdn.co/mp3-preview/b03f6a3e1e386d2a...,2,track,spotify:track:6wGKkvlvUHbdi6hlp3C8q7
2,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AR, AT, AU, BE, BG, BO, BR, CA, CH, CL, C...",1,209293,False,{'spotify': 'https://open.spotify.com/track/3A...,https://api.spotify.com/v1/tracks/3AIa10wSpvJp...,3AIa10wSpvJpugn8gZmeYv,False,Blues de Cris,https://p.scdn.co/mp3-preview/fe11c6425ee3afb2...,3,track,spotify:track:3AIa10wSpvJpugn8gZmeYv
3,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AR, AT, AU, BE, BG, BO, BR, CA, CH, CL, C...",1,306040,False,{'spotify': 'https://open.spotify.com/track/1q...,https://api.spotify.com/v1/tracks/1qf2RbvAdkOr...,1qf2RbvAdkOrjKeBmSYzXV,False,Como el Viento Voy a Ver,https://p.scdn.co/mp3-preview/751460e6a36e37fd...,4,track,spotify:track:1qf2RbvAdkOrjKeBmSYzXV
4,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AR, AT, AU, BE, BG, BO, BR, CA, CH, CL, C...",1,196160,False,{'spotify': 'https://open.spotify.com/track/2J...,https://api.spotify.com/v1/tracks/2JwfGxkK4BrW...,2JwfGxkK4BrWTCw0QQjGXN,False,Despiértate Nena,https://p.scdn.co/mp3-preview/0321c92cbbf42580...,5,track,spotify:track:2JwfGxkK4BrWTCw0QQjGXN


In [144]:
df_tracks = pd.DataFrame()
col_utiles = zip(albums_studio_df['Artista'], albums_studio_df['Album'],
                 albums_studio_df['Año'], albums_studio_df['id'])

for art, alb, año, alb_id in col_utiles:
    sp_response = pd.DataFrame(sp.album_tracks(alb_id)['items'])
    sp_response['Artista'], sp_response['Album'], sp_response['Año'] = art, alb, año
    df_tracks = pd.concat([df_tracks, sp_response], axis=0,
                          join='outer', ignore_index=True)
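
Calling `pd.concat` inside the loop copies the accumulated frame on every iteration. A common alternative is to collect the per-album frames in a list and concatenate once at the end. The sketch below shows that pattern on invented stand-in data (`fake_responses` and its track names are placeholders, not real API output).

```python
import pandas as pd

# Stand-ins for the per-album responses of sp.album_tracks(...)['items']
fake_responses = {
    'Pan': [{'name': 'track a', 'track_number': 1}, {'name': 'track b', 'track_number': 2}],
    'Silver Sorgo': [{'name': 'track c', 'track_number': 1}],
}

frames = []
for album, items in fake_responses.items():
    frame = pd.DataFrame(items)
    frame['Album'] = album  # tag every track with its album
    frames.append(frame)

# A single concat at the end instead of one per iteration
tracks = pd.concat(frames, ignore_index=True)
print(tracks)
```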

In [145]:
df_tracks

Unnamed: 0,artists,available_markets,disc_number,duration_ms,explicit,external_urls,href,id,is_local,name,preview_url,track_number,type,uri,Artista,Album,Año
0,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,179130,False,{'spotify': 'https://open.spotify.com/track/17...,https://api.spotify.com/v1/tracks/17a6EMs038tu...,17a6EMs038tuh87eggDJ1G,False,Veinte Ciudades,https://p.scdn.co/mp3-preview/81f84acf06679ce9...,1,track,spotify:track:17a6EMs038tuh87eggDJ1G,Luis Alberto Spinetta,Ya No Mires Atrás,2020
1,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,223532,False,{'spotify': 'https://open.spotify.com/track/53...,https://api.spotify.com/v1/tracks/535rcofAUexz...,535rcofAUexzCqS9xHkVkv,False,Ya No Mires Atrás,https://p.scdn.co/mp3-preview/bc1bb2c34d329c7c...,2,track,spotify:track:535rcofAUexzCqS9xHkVkv,Luis Alberto Spinetta,Ya No Mires Atrás,2020
2,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,326647,False,{'spotify': 'https://open.spotify.com/track/1S...,https://api.spotify.com/v1/tracks/1Sa25LKUNSTC...,1Sa25LKUNSTCxfd1METeBT,False,Agua de Río,https://p.scdn.co/mp3-preview/415c24347e8ae116...,3,track,spotify:track:1Sa25LKUNSTCxfd1METeBT,Luis Alberto Spinetta,Ya No Mires Atrás,2020
3,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,353587,False,{'spotify': 'https://open.spotify.com/track/1z...,https://api.spotify.com/v1/tracks/1zZCrhlS7oUs...,1zZCrhlS7oUsAs3rjGfmJK,False,"Nueva Luna, Mundo Arjo",https://p.scdn.co/mp3-preview/eac1fbb2da4acb46...,4,track,spotify:track:1zZCrhlS7oUsAs3rjGfmJK,Luis Alberto Spinetta,Ya No Mires Atrás,2020
4,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,267527,False,{'spotify': 'https://open.spotify.com/track/1r...,https://api.spotify.com/v1/tracks/1rabqQO8uUz8...,1rabqQO8uUz86di4NkbVPo,False,Merecer,https://p.scdn.co/mp3-preview/212e0d4e527aec6f...,5,track,spotify:track:1rabqQO8uUz86di4NkbVPo,Luis Alberto Spinetta,Ya No Mires Atrás,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
338,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,216506,False,{'spotify': 'https://open.spotify.com/track/2r...,https://api.spotify.com/v1/tracks/2rGHar8C29r3...,2rGHar8C29r3t1wExKeuQb,False,2 de Enero,https://p.scdn.co/mp3-preview/f695b9e973583d79...,12,track,spotify:track:2rGHar8C29r3t1wExKeuQb,Spinetta Y Los Socios Del Desierto,Spinetta y los Socios del Desierto,1997
339,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,49133,False,{'spotify': 'https://open.spotify.com/track/0L...,https://api.spotify.com/v1/tracks/0Lnk6DhCuPb0...,0Lnk6DhCuPb0UmAXMZWz2O,False,Cuenco de Sal,https://p.scdn.co/mp3-preview/3709b57a58097f3b...,13,track,spotify:track:0Lnk6DhCuPb0UmAXMZWz2O,Spinetta Y Los Socios Del Desierto,Spinetta y los Socios del Desierto,1997
340,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,235866,False,{'spotify': 'https://open.spotify.com/track/40...,https://api.spotify.com/v1/tracks/40V9rPDytBeE...,40V9rPDytBeEp7xutQjtDR,False,El Sol y la Afeitadora Eléctrica,https://p.scdn.co/mp3-preview/686770ee5b79c748...,14,track,spotify:track:40V9rPDytBeEp7xutQjtDR,Spinetta Y Los Socios Del Desierto,Spinetta y los Socios del Desierto,1997
341,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,145560,False,{'spotify': 'https://open.spotify.com/track/2y...,https://api.spotify.com/v1/tracks/2ygmx2TzNLGe...,2ygmx2TzNLGeHp4DNvgXAQ,False,Collar,https://p.scdn.co/mp3-preview/846c45ee8ca737ab...,15,track,spotify:track:2ygmx2TzNLGeHp4DNvgXAQ,Spinetta Y Los Socios Del Desierto,Spinetta y los Socios del Desierto,1997


In [146]:
for idx, row in df_tracks.iterrows():
    
    #audio_analysis = sp.audio_analysis(row['id'])['track']
    audio_features = sp.audio_features([row['id']])[0]
    df_tracks.at[idx, 'Duration'] = audio_features['duration_ms']
    df_tracks.at[idx, 'Acousticness'] = audio_features['acousticness']
    df_tracks.at[idx, 'Danceability'] = audio_features['danceability']
    df_tracks.at[idx, 'Energy'] = audio_features['energy']
    df_tracks.at[idx, 'Instrumentalness'] = audio_features['instrumentalness']
    df_tracks.at[idx, 'Liveness'] = audio_features['liveness']
    df_tracks.at[idx, 'Loudness'] = audio_features['loudness']
    df_tracks.at[idx, 'Speechiness'] = audio_features['speechiness']
    df_tracks.at[idx, 'Key'] = audio_features['key']
    df_tracks.at[idx, 'Tempo'] = audio_features['tempo']
    df_tracks.at[idx, 'Time_signature'] = audio_features['time_signature']
    df_tracks.at[idx, 'Valence'] = audio_features['valence']
    df_tracks.at[idx, 'Mode'] = audio_features['mode']
    clear_output(wait=True)
    print(idx)
    #print(idx, df_tracks.iloc[idx])

342
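
The loop above makes one request per track, 343 in total. `sp.audio_features` takes a list of ids, and as far as I know the underlying endpoint accepts up to 100 ids per call, so the requests could be batched. The sketch below shows the batching with a stand-in fetch function; `chunks`, `features_in_batches` and `fake_fetch` are hypothetical names, and the batch size of 100 is an assumption to check against the API documentation.

```python
def chunks(seq, size):
    """Yield consecutive slices of seq with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def features_in_batches(track_ids, fetch_features, batch_size=100):
    """Call fetch_features once per batch and return one flat list of results."""
    results = []
    for batch in chunks(track_ids, batch_size):
        results.extend(fetch_features(batch))
    return results


# Stand-in for sp.audio_features: records each call and echoes the ids back
calls = []


def fake_fetch(ids):
    calls.append(len(ids))
    return [{'id': track_id} for track_id in ids]


ids = [f'id{n}' for n in range(343)]
features = features_in_batches(ids, fake_fetch)
print(calls)
```

With the real client, `fetch_features` would be `sp.audio_features`, and the resulting list of dicts could be turned into a DataFrame and joined to `df_tracks` on `id`.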


In [147]:
df_tracks

Unnamed: 0,artists,available_markets,disc_number,duration_ms,explicit,external_urls,href,id,is_local,name,...,Energy,Instrumentalness,Liveness,Loudness,Speechiness,Key,Tempo,Time_signature,Valence,Mode
0,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,179130,False,{'spotify': 'https://open.spotify.com/track/17...,https://api.spotify.com/v1/tracks/17a6EMs038tu...,17a6EMs038tuh87eggDJ1G,False,Veinte Ciudades,...,0.794,0.000046,0.2210,-9.396,0.0350,6.0,123.029,4.0,0.694,0.0
1,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,223532,False,{'spotify': 'https://open.spotify.com/track/53...,https://api.spotify.com/v1/tracks/535rcofAUexz...,535rcofAUexzCqS9xHkVkv,False,Ya No Mires Atrás,...,0.524,0.002480,0.1500,-11.140,0.0276,6.0,79.985,4.0,0.860,0.0
2,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,326647,False,{'spotify': 'https://open.spotify.com/track/1S...,https://api.spotify.com/v1/tracks/1Sa25LKUNSTC...,1Sa25LKUNSTCxfd1METeBT,False,Agua de Río,...,0.644,0.009090,0.2770,-10.334,0.0318,7.0,114.989,4.0,0.696,1.0
3,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,353587,False,{'spotify': 'https://open.spotify.com/track/1z...,https://api.spotify.com/v1/tracks/1zZCrhlS7oUs...,1zZCrhlS7oUsAs3rjGfmJK,False,"Nueva Luna, Mundo Arjo",...,0.472,0.000008,0.2700,-11.822,0.0318,2.0,116.831,3.0,0.279,0.0
4,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",1,267527,False,{'spotify': 'https://open.spotify.com/track/1r...,https://api.spotify.com/v1/tracks/1rabqQO8uUz8...,1rabqQO8uUz86di4NkbVPo,False,Merecer,...,0.669,0.000014,0.0984,-10.747,0.0492,2.0,141.955,4.0,0.808,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
338,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,216506,False,{'spotify': 'https://open.spotify.com/track/2r...,https://api.spotify.com/v1/tracks/2rGHar8C29r3...,2rGHar8C29r3t1wExKeuQb,False,2 de Enero,...,0.596,0.000018,0.1550,-6.132,0.0329,9.0,135.126,4.0,0.390,0.0
339,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,49133,False,{'spotify': 'https://open.spotify.com/track/0L...,https://api.spotify.com/v1/tracks/0Lnk6DhCuPb0...,0Lnk6DhCuPb0UmAXMZWz2O,False,Cuenco de Sal,...,0.533,0.004760,0.0922,-10.536,0.0424,9.0,139.666,4.0,0.148,1.0
340,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,235866,False,{'spotify': 'https://open.spotify.com/track/40...,https://api.spotify.com/v1/tracks/40V9rPDytBeE...,40V9rPDytBeEp7xutQjtDR,False,El Sol y la Afeitadora Eléctrica,...,0.483,0.000002,0.2220,-7.699,0.0282,2.0,145.406,4.0,0.411,1.0
341,[{'external_urls': {'spotify': 'https://open.s...,"[AD, AE, AR, AT, AU, BE, BG, BH, BO, BR, CA, C...",2,145560,False,{'spotify': 'https://open.spotify.com/track/2y...,https://api.spotify.com/v1/tracks/2ygmx2TzNLGe...,2ygmx2TzNLGeHp4DNvgXAQ,False,Collar,...,0.405,0.000026,0.1490,-9.758,0.0230,7.0,93.375,3.0,0.193,1.0


In [148]:
df_tracks.columns

Index(['artists', 'available_markets', 'disc_number', 'duration_ms',
       'explicit', 'external_urls', 'href', 'id', 'is_local', 'name',
       'preview_url', 'track_number', 'type', 'uri', 'Artista', 'Album', 'Año',
       'Duration', 'Acousticness', 'Danceability', 'Energy',
       'Instrumentalness', 'Liveness', 'Loudness', 'Speechiness', 'Key',
       'Tempo', 'Time_signature', 'Valence', 'Mode'],
      dtype='object')

Let's delete the columns we don't need

In [149]:
df_tracks.drop(['artists', 'available_markets', 'explicit', 'duration_ms',
                'external_urls', 'href', 'is_local',
                'preview_url', 'type', 'uri'], axis=1, inplace=True)

Rename & reorder

In [150]:
df_tracks.columns

Index(['disc_number', 'id', 'name', 'track_number', 'Artista', 'Album', 'Año',
       'Duration', 'Acousticness', 'Danceability', 'Energy',
       'Instrumentalness', 'Liveness', 'Loudness', 'Speechiness', 'Key',
       'Tempo', 'Time_signature', 'Valence', 'Mode'],
      dtype='object')

In [151]:
df_tracks.columns = ['Nro_disco', 'id', 'Nombre', 'Track_nro', 'Artista', 'Album', 
                     'Año', 'Duracion', 'Acousticness', 'Danceability', 'Energy',
                     'Instrumentalness', 'Liveness', 'Loudness', 'Speechiness', 'Key',
                     'Tempo', 'Time_signature', 'Valence', 'Mode']
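
Assigning the whole `columns` list depends on the exact column order. A sketch of a less order-sensitive alternative is `DataFrame.rename` with a mapping, which only names the columns that change. The empty DataFrame below is a stand-in with a subset of the columns.

```python
import pandas as pd

df = pd.DataFrame(columns=['disc_number', 'id', 'name', 'track_number', 'Duration'])

# Only the columns that change need to be listed; the rest keep their names
new_names = {
    'disc_number': 'Nro_disco',
    'name': 'Nombre',
    'track_number': 'Track_nro',
    'Duration': 'Duracion',
}
df = df.rename(columns=new_names)
print(list(df.columns))
```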

In [153]:
df_tracks = df_tracks[['Nombre', 'Album', 'Artista', 'Track_nro', 'Nro_disco', 'Año',
                       'Duracion', 'Acousticness', 'Danceability', 'Energy',
                       'Instrumentalness', 'Liveness', 'Loudness', 'Speechiness', 'Key',
                       'Tempo', 'Time_signature', 'Valence', 'Mode','id']]

In [154]:
df_tracks.columns

Index(['Nombre', 'Album', 'Artista', 'Track_nro', 'Nro_disco', 'Año',
       'Duracion', 'Acousticness', 'Danceability', 'Energy',
       'Instrumentalness', 'Liveness', 'Loudness', 'Speechiness', 'Key',
       'Tempo', 'Time_signature', 'Valence', 'Mode', 'id'],
      dtype='object')

In [155]:
df_tracks.loc[:, 'Duracion'] = df_tracks.loc[:, 'Duracion'] / (1000*60)
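
Note that the result is in decimal minutes, so 2.9855 means just under three minutes, not 2 min 98 s. The sketch below shows the same conversion next to a clock-style format; `ms_to_minutes` and `ms_to_clock` are hypothetical helpers, not part of the notebook.

```python
def ms_to_minutes(duration_ms):
    """Milliseconds to decimal minutes, the unit used in the Duracion column."""
    return duration_ms / (1000 * 60)


def ms_to_clock(duration_ms):
    """Milliseconds to an 'm:ss' string, dropping the fractional second."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f'{minutes}:{seconds:02d}'


# 179130 ms is the duration_ms of 'Veinte Ciudades' in the tables above
print(ms_to_minutes(179130))
print(ms_to_clock(179130))
```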

In [157]:
print_full(df_tracks)

Unnamed: 0,Nombre,Album,Artista,Track_nro,Nro_disco,Año,Duracion,Acousticness,Danceability,Energy,Instrumentalness,Liveness,Loudness,Speechiness,Key,Tempo,Time_signature,Valence,Mode,id
0,Veinte Ciudades,Ya No Mires Atrás,Luis Alberto Spinetta,1,1,2020,2.9855,0.0872,0.751,0.794,4.6e-05,0.221,-9.396,0.035,6.0,123.029,4.0,0.694,0.0,17a6EMs038tuh87eggDJ1G
1,Ya No Mires Atrás,Ya No Mires Atrás,Luis Alberto Spinetta,2,1,2020,3.72555,0.433,0.65,0.524,0.00248,0.15,-11.14,0.0276,6.0,79.985,4.0,0.86,0.0,535rcofAUexzCqS9xHkVkv
2,Agua de Río,Ya No Mires Atrás,Luis Alberto Spinetta,3,1,2020,5.444117,0.246,0.722,0.644,0.00909,0.277,-10.334,0.0318,7.0,114.989,4.0,0.696,1.0,1Sa25LKUNSTCxfd1METeBT
3,"Nueva Luna, Mundo Arjo",Ya No Mires Atrás,Luis Alberto Spinetta,4,1,2020,5.893117,0.645,0.556,0.472,8e-06,0.27,-11.822,0.0318,2.0,116.831,3.0,0.279,0.0,1zZCrhlS7oUsAs3rjGfmJK
4,Merecer,Ya No Mires Atrás,Luis Alberto Spinetta,5,1,2020,4.4588,0.194,0.75,0.669,1.4e-05,0.0984,-10.747,0.0492,2.0,141.955,4.0,0.808,0.0,1rabqQO8uUz86di4NkbVPo
5,Luces y Sombras,Ya No Mires Atrás,Luis Alberto Spinetta,6,1,2020,3.381833,0.0681,0.64,0.421,0.000662,0.107,-11.478,0.0285,2.0,81.497,4.0,0.617,0.0,4RPeRJIDDEEuIhu3g3STq1
6,Diadema,Ya No Mires Atrás,Luis Alberto Spinetta,7,1,2020,4.752583,0.582,0.772,0.271,0.000323,0.113,-16.008,0.0523,1.0,110.004,4.0,0.188,0.0,1nOcIY9qJquHzgyw90IwS5
7,La Mendiga,Un Mañana,Luis Alberto Spinetta,1,1,2008,5.813333,0.641,0.316,0.537,0.0264,0.141,-7.971,0.0341,2.0,80.833,3.0,0.244,1.0,4EdJEG1WBMCuJyxCK3a6Ls
8,Vacío Sideral,Un Mañana,Luis Alberto Spinetta,2,1,2008,4.359117,0.744,0.233,0.523,0.000231,0.184,-9.495,0.102,7.0,77.791,5.0,0.258,0.0,0cnuTB89ZHCbsqg8RfgHiM
9,No Quiere Decir,Un Mañana,Luis Alberto Spinetta,3,1,2008,3.985333,0.676,0.483,0.453,0.00107,0.102,-10.789,0.0377,7.0,141.753,4.0,0.286,1.0,4qwaQvUrRwK2t1G6Q5pwbY


In [158]:
df_tracks.at[49, 'Nombre'] = 'Abrázame Inocentemente'

Let's save the dataframe

In [159]:
# same lowercase file name as the later read/write cells
df_tracks.to_feather('tracks')

# Track lyrics

In [160]:
#df_tracks = pd.read_feather('tracks')

In [161]:
import lyricsgenius

In [163]:
genius = lyricsgenius.Genius(GENIUS_CLIENT_TOKEN, sleep_time=1)
genius.verbose = False # Turn off status messages
genius.remove_section_headers = True # Remove section headers (e.g. [Chorus]) from lyrics when searching
genius.skip_non_songs = False # Include hits thought to be non-songs (e.g. track lists)
genius.excluded_terms = ["(Remix)", "(Live)"] # Exclude songs with these words in their title

In [164]:
df_tracks['Letra'] = ''

In [167]:
def get_lyrics():
    """
    Fill the empty 'Letra' cells of df_tracks with lyrics from Genius.

    Returns the (index, name) pairs that were not found, so we can analyse later why they failed.
    """
    # TODO: check why not_found gets emptied on the last iteration
    not_found = []

    for i in df_tracks.index:
        art = df_tracks.loc[i, 'Artista']
        nom = df_tracks.loc[i, 'Nombre']
        
        if df_tracks.loc[i, 'Letra'] == '':
            try:
                letra = genius.search_song(nom, artist=art, get_full_info=False)
                df_tracks.at[i, 'Letra'] = letra.lyrics
                print(letra.lyrics)
                clear_output(wait=True)
            except AttributeError:
                # if the lyrics are not found, an AttributeError is raised, so catch it
                not_found.append((i,nom))
                print(*not_found,sep='\n')
                clear_output(wait=True)
                continue

    # not_found is local to the function: return it so later cells can use it
    return not_found

In [168]:
not_found = get_lyrics()

(118, 'Paquidermo De Luxe - Instrumental')
(122, 'Símil Bahion - Instrumental')
(127, 'El Bálsamo - Instrumental')
(128, 'Para Valen - Instrumental')
(130, 'Tango Cromado - Instrumental')
(135, 'Almendra')
(146, 'Interlude (Jade I)')
(147, 'Interlude (Jade II)')
(152, 'Interlude (Jade III)')
(154, 'Telgopor')
(156, 'A Dieciocho Minutos del Sol')
(163, 'Tema de Pedro')
(165, 'Estrella')
(169, 'Alteración de Tiempo')
(174, 'Alarma Entre Los Angeles')
(188, 'Tema de Elmo Lesto')
(198, 'Mula Alma - Instrumental')
(204, 'Ping Pong - Instrumental')
(214, 'Siguiendo Los Pasos Del Maestro')
(218, 'Influjo Estelar')
(220, 'Amenábar')
(225, 'Digital Ayatollah')
(248, 'Obertura (Obertura Beat)')
(250, 'Verde Llano')
(273, 'Panadero Ensoñado')
(283, 'Señorita Zapada')
(290, 'Aguas Claras de Olimpos')
(296, 'Ven Vení')
(297, 'Ave Seca')
(298, 'Donde No Se Lee')
(299, 'Extiéndete Una Vez Más')
(300, 'No Me Alcanza')
(301, 'Ekathé I')
(302, 'Nómbrala')
(303, 'Bahía Final')
(305, 'Bagatelle')
(306, 'E

In [170]:
len(not_found)

57

That failed for quite a few tracks. Let's first mark the instrumentals

In [188]:
instrumentales = [118, 122, 127, 128, 130, 135, 146, 147, 152, 154, 156,
                  163, 165, 169, 174, 188, 198, 204, 214, 218, 220, 225, 248, 250, 273, 283,334]
df_tracks.loc[instrumentales, 'Letra'] = '[INSTRUMENTAL]'
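
Several of the tracks in that list carry an "- Instrumental" suffix in their Spotify title, so part of the list could be derived from the names instead of typed by hand. The sketch below does that on a three-row stand-in DataFrame, using names taken from the output above. Untagged instrumentals such as 'Telgopor' would still need to be marked manually.

```python
import pandas as pd

df = pd.DataFrame({
    'Nombre': ['Paquidermo De Luxe - Instrumental', 'Telgopor', 'Ven Vení'],
    'Letra': ['', '', ''],
})

# Titles that are explicitly tagged as instrumental in their name
tagged = df['Nombre'].str.contains('Instrumental', case=False, regex=False)
df.loc[tagged, 'Letra'] = '[INSTRUMENTAL]'
print(df)
```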

Let's try another library to see whether it finds the ones that are not instrumental

In [189]:
import lyricwikia

not_found = []

for i in df_tracks.index:
    art = df_tracks.loc[i, 'Artista']
    nom = df_tracks.loc[i, 'Nombre']

    if df_tracks.loc[i, 'Letra'] == '':
        try:
            # get_lyrics takes the artist and the song, and returns the lyrics as a string
            letra = lyricwikia.get_lyrics(art, nom)
            df_tracks.at[i, 'Letra'] = letra
            #print(letra)
            #clear_output(wait=True)
        except Exception:
            # lyrics not found (or the request failed): record the track and move on
            not_found.append((i,nom))
            print(*not_found,sep='\n')
            clear_output(wait=True)
            continue

(290, 'Aguas Claras de Olimpos')
(296, 'Ven Vení')
(297, 'Ave Seca')
(298, 'Donde No Se Lee')
(300, 'No Me Alcanza')
(301, 'Ekathé I')
(302, 'Nómbrala')
(305, 'Bagatelle')
(306, 'Ekathé II - Mono')
(308, 'Guíame')
(312, 'Los Duendes')
(313, 'Sub Rebaño')
(322, 'Así Nunca Encontrarás el Mar')
(323, 'Cuentas de un Collar')
(325, 'Zonda')
(336, 'El Rebaño del Pastor')
(337, 'Puyen De Abril')
(339, 'Cuenco de Sal')
(342, 'Duende')


In [190]:
len(not_found)

19

No luck. Apparently some lyrics are on Genius but under a different artist name (`Spinetta y los socios del desierto` vs `Luis Alberto Spinetta`), so let's try several

In [183]:
not_found = []
for i in df_tracks.index:
    #Try Luis Alberto Spinetta, Spinetta y los Socios del Desierto
    art = 'Spinetta y los Socios del Desierto'
    nom = df_tracks.loc[i, 'Nombre']

    if df_tracks.loc[i, 'Letra'] == '':
        try:
            letra = genius.search_song(nom, artist=art, get_full_info=False)
            df_tracks.at[i, 'Letra'] = letra.lyrics
            print(letra.lyrics)
            clear_output(wait=True)
        except AttributeError:
            not_found.append((i,nom))
            print(*not_found,sep='\n')
            clear_output(wait=True)
            continue


(290, 'Aguas Claras de Olimpos')
(296, 'Ven Vení')
(297, 'Ave Seca')
(298, 'Donde No Se Lee')
(300, 'No Me Alcanza')
(301, 'Ekathé I')
(302, 'Nómbrala')
(305, 'Bagatelle')
(306, 'Ekathé II - Mono')
(308, 'Guíame')
(312, 'Los Duendes')
(313, 'Sub Rebaño')
(322, 'Así Nunca Encontrarás el Mar')
(323, 'Cuentas de un Collar')
(325, 'Zonda')
(334, 'Wasabi Flash')
(336, 'El Rebaño del Pastor')
(337, 'Puyen De Abril')
(339, 'Cuenco de Sal')
(342, 'Duende')
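
Instead of editing `art` by hand and rerunning the cell for each artist name, the aliases could be tried in order for every title. The sketch below shows that idea with a stand-in search function, so it runs without a Genius token. `search_with_aliases`, `fake_search` and the catalogue contents are all hypothetical.

```python
def search_with_aliases(title, aliases, search):
    """Return (alias, result) for the first alias that yields a hit, else (None, None)."""
    for alias in aliases:
        result = search(title, alias)
        if result is not None:
            return alias, result
    return None, None


# Stand-in for genius.search_song: a tiny in-memory catalogue (made-up content)
FAKE_CATALOGUE = {('Some Title', 'Luis Alberto Spinetta'): 'placeholder lyrics'}


def fake_search(title, artist):
    return FAKE_CATALOGUE.get((title, artist))


aliases = ['Spinetta y los Socios del Desierto', 'Luis Alberto Spinetta']
print(search_with_aliases('Some Title', aliases, fake_search))
print(search_with_aliases('Missing Title', aliases, fake_search))
```

With the real client, `search` could be `lambda title, artist: genius.search_song(title, artist=artist, get_full_info=False)`, the same call used in the cells above.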


In [191]:
len(not_found)

19

In [186]:
df_tracks.to_feather('tracks')