## Genre Test
In this notebook, we analyze whether the models can be improved by splitting the songs by genre.

In [1]:
import pandas as pd

In [2]:
df_final=pd.read_csv('df_final.csv')

In [3]:
df_final['Genero']

0              italian hip hop, italian pop, trap italiana
1                                                      NaN
2        ambient, braindance, electronica, intelligent ...
3                           jam band, neo mellow, pop rock
4                        argentine hip hop, trap argentino
                               ...                        
16861                                        j-pop, j-rock
16862                                                k-pop
16863                                    k-pop, korean r&b
16864                              k-pop, k-pop girl group
16865                                    k-pop, korean r&b
Name: Genero, Length: 16866, dtype: object

In [4]:
df_final['Stream'].describe()

count    1.686600e+04
mean     1.362465e+08
std      2.433885e+08
min      6.574000e+03
25%      1.748466e+07
50%      4.993142e+07
75%      1.391041e+08
max      3.386520e+09
Name: Stream, dtype: float64

Here we rank the genres in descending order by number of songs.

In [5]:
df_generos = df_final.assign(Genero=df_final['Genero'].str.split(', ')).explode('Genero')
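To make the split-and-explode step concrete, here is a minimal sketch on made-up data (the `toy` frame and its `Track` column are hypothetical, not part of `df_final`). It shows that a track with several genres becomes one row per genre, and that a missing genre stays as a single missing row.

```python
import pandas as pd

# Hypothetical three-track example; only the 'Genero' column mirrors df_final
toy = pd.DataFrame({
    "Track": ["A", "B", "C"],
    "Genero": ["k-pop, korean r&b", None, "rock"],
})

# Same pattern as the cell above: split the string, then one row per genre
exploded = toy.assign(Genero=toy["Genero"].str.split(", ")).explode("Genero")
print(exploded)
```

Note that the original index is repeated for exploded rows, so track "A" appears twice under index 0.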



In [6]:
# Group by 'Genero' and count the number of tracks
df_generos_count = df_generos.groupby('Genero', as_index=False).size()

# Rename the columns for clarity
df_generos_count.columns = ['Genero', 'Num_Tracks']

# Sort by number of tracks in descending order
df_generos_count = df_generos_count.sort_values(by='Num_Tracks', ascending=False)

# Reset the index for a clean layout
df_generos_count.reset_index(drop=True, inplace=True)

# Show the DataFrame
print(df_generos_count)



                           Genero  Num_Tracks
0                            rock        1565
1                             pop        1549
2                             rap        1184
3                       dance pop         887
4                         hip hop         816
...                           ...         ...
1107          neue deutsche harte           3
1108                uk doom metal           2
1109  australian children's music           2
1110                   afghan pop           2
1111                rock keyboard           1

[1112 rows x 2 columns]
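As a cross-check, the same ranking can likely be obtained more compactly with `value_counts`, which sorts in descending order and skips missing values by default. This is a sketch on made-up data (the `toy` frame is hypothetical), not the notebook's own method.

```python
import pandas as pd

# Hypothetical data with the same comma-separated layout as 'Genero'
toy = pd.DataFrame({"Genero": ["pop, rock", None, "rock", "k-pop, pop, rock"]})

counts = (
    toy["Genero"]
    .str.split(", ")      # string -> list of genres
    .explode()            # one row per genre
    .value_counts()       # descending counts, NaN dropped
    .rename_axis("Genero")
    .reset_index(name="Num_Tracks")
)
print(counts)
```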


In [7]:
df_generos_count.head(20)

|   | Genero | Num_Tracks |
|---|---|---|
| 0 | rock | 1565 |
| 1 | pop | 1549 |
| 2 | rap | 1184 |
| 3 | dance pop | 887 |
| 4 | hip hop | 816 |
| 5 | latin pop | 777 |
| 6 | classic rock | 773 |
| 7 | filmi | 684 |
| 8 | soft rock | 683 |
| 9 | album rock | 650 |


In [8]:
df_final['Genero'] = df_final['Genero'].apply(
    lambda x: [genero.strip() for genero in x.split(',')]  # Strip surrounding whitespace
    if isinstance(x, str)
    else []  # Handle NaN or other types by assigning an empty list
)

In [9]:
df_final['Genero']

0            [italian hip hop, italian pop, trap italiana]
1                                                       []
2        [ambient, braindance, electronica, intelligent...
3                         [jam band, neo mellow, pop rock]
4                      [argentine hip hop, trap argentino]
                               ...                        
16861                                      [j-pop, j-rock]
16862                                              [k-pop]
16863                                  [k-pop, korean r&b]
16864                            [k-pop, k-pop girl group]
16865                                  [k-pop, korean r&b]
Name: Genero, Length: 16866, dtype: object

This function reports whether a specific genre appears in a track's list of genres, returning 1 if it does and 0 otherwise.

In [10]:
def busca_genero(genero_buscado, lista_generos):
    # If the value is NaN (a float), return 0
    if isinstance(lista_generos, float):
        return 0
    # If it is a list or a string, check for the genre
    if genero_buscado in lista_generos:
        return 1
    else:
        return 0
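A short usage sketch may help show why the function is best applied to the list form of `Genero` rather than to the raw comma-separated string. The function body is repeated here in condensed form so the example runs on its own; the inputs are made up.

```python
def busca_genero(genero_buscado, lista_generos):
    # Condensed copy of the function above, same behavior
    if isinstance(lista_generos, float):
        return 0
    return 1 if genero_buscado in lista_generos else 0

en_lista = busca_genero("pop", ["k-pop", "rock"])   # list: exact element match
en_cadena = busca_genero("pop", "k-pop, rock")      # string: substring match
en_nan = busca_genero("pop", float("nan"))          # missing value

print(en_lista, en_cadena, en_nan)  # prints: 0 1 0
```

With a list, "pop" does not match "k-pop"; with a raw string, Python's `in` performs a substring search, so it does.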

We now use LazyClassifier to find the best-performing models for each of the top 20 genres.
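Before the full loop, the labelling and balancing step it applies to each genre can be illustrated in isolation. The sketch below uses synthetic data (the `toy` frame and its values are made up) and a direct comparison against Q3, which should label rows the same way as the `pd.cut` call with bins `[0, q3, max]` in the next cell.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one genre's tracks (hypothetical values)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Stream": rng.integers(1_000, 1_000_000, size=200),
    "Energy": rng.random(200),
})

# Label 1 = streams above the third quartile, 0 = everything else
q3 = toy["Stream"].quantile(0.75)
toy["Categoria"] = (toy["Stream"] > q3).astype(int)

# Undersample the majority class (0) down to the size of class 1
top = toy[toy["Categoria"] == 1]
rest = toy[toy["Categoria"] == 0].sample(n=len(top), random_state=42)

# Concatenate and shuffle to get a balanced dataset
balanced = pd.concat([rest, top]).sample(frac=1, random_state=42).reset_index(drop=True)
print(balanced["Categoria"].value_counts())
```

Because roughly three quarters of each genre is discarded by the undersampling, the balanced dataset is about half the size of the genre's original track count.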

In [12]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier

# Genres to evaluate: the 20 with the most tracks
generos = df_generos_count.head(20)['Genero']

# Dictionary to store the results, keyed by genre
test = {}

for genero in generos:
    # Flag the tracks of this genre on a copy, so df_final is left untouched
    df_temp = df_final.copy()
    df_temp[genero] = df_temp['Genero'].apply(lambda x: busca_genero(genero, x))
    df_gen = df_temp[df_temp[genero] == 1].copy()

    # Compute the third quartile (Q3) of streams within the genre
    q3 = df_gen['Stream'].quantile(0.75)

    # Define the categories: 0 = up to Q3, 1 = above Q3
    bins = [0, q3, df_gen['Stream'].max()]
    labels = [0, 1]

    df_gen['Categoria'] = pd.cut(df_gen['Stream'], bins=bins, labels=labels, include_lowest=True)

    # Check that category 1 has at least one sample
    count_cat1 = df_gen['Categoria'].value_counts().get(1, 0)

    if count_cat1 == 0:
        print(f"⚠️ genre {genero}: not enough samples in category 1, skipping...")
        continue

    # Split by category
    categoria_0 = df_gen[df_gen['Categoria'] == 0]
    categoria_1 = df_gen[df_gen['Categoria'] == 1]

    # Sample category 0 down to the same number of rows as category 1
    categoria_0_sample = categoria_0.sample(n=min(count_cat1, len(categoria_0)), random_state=42)

    # Concatenate and shuffle to obtain a balanced dataset
    df_balanceado = pd.concat([categoria_0_sample, categoria_1]).sample(frac=1, random_state=42).reset_index(drop=True)

    # Define the predictor variables and the target
    X = df_balanceado[['Acousticness', 'Danceability', 'Duration_min', 'Energy',
                       'Instrumentalness', 'Key', 'Liveness', 'Loudness',
                       'Speechiness', 'Tempo', 'Valence']]
    y = df_balanceado['Categoria']

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Evaluate with LazyClassifier
    lazy_clf = LazyClassifier(verbose=0, ignore_warnings=False, custom_metric=None)
    models, predictions = lazy_clf.fit(X_train, X_test, y_train, y_test)

    # Store the results in the dictionary
    test[genero] = models

print("✅ Evaluation completed for all genres.")

Each of the 20 runs (one per genre, in loop order) reported the same six model failures:

CategoricalNB model failed to execute
Negative values in data passed to CategoricalNB (input X).
FixedThresholdClassifier model failed to execute
FixedThresholdClassifier.__init__() missing 1 required positional argument: 'estimator'
SelfTrainingClassifier model failed to execute
You must pass an estimator to SelfTrainingClassifier. Use `estimator`.
StackingClassifier model failed to execute
StackingClassifier.__init__() missing 1 required positional argument: 'estimators'
TunedThresholdClassifierCV model failed to execute
TunedThresholdClassifierCV.__init__() missing 1 required positional argument: 'estimator'
XGBClassifier model failed to execute
'super' object has no attribute '__sklearn_tags__'

LightGBM [Info] lines, condensed to one row per run (11 features used in every run):

| Run | Positive | Negative | Train set size | Total bins |
|---|---|---|---|---|
| 1 | 314 | 311 | 625 | 2071 |
| 2 | 312 | 307 | 619 | 1956 |
| 3 | 241 | 232 | 473 | 1442 |
| 4 | 182 | 173 | 355 | 1105 |
| 5 | 176 | 150 | 326 | 1001 |
| 6 | 159 | 151 | 310 | 958 |
| 7 | 151 | 157 | 308 | 986 |
| 8 | 132 | 141 | 273 | 873 |
| 9 | 132 | 141 | 273 | 878 |
| 10 | 127 | 133 | 260 | 842 |
| 11 | 123 | 136 | 259 | 798 |
| 12 | 126 | 126 | 252 | 786 |
| 13 | 116 | 124 | 240 | 770 |
| 14 | 110 | 125 | 235 | 761 |
| 15 | 114 | 118 | 232 | 725 |
| 16 | 116 | 114 | 230 | 715 |
| 17 | 113 | 112 | 225 | 738 |
| 18 | 105 | 111 | 216 | 702 |
| 19 | 105 | 99 | 204 | 646 |
| 20 | 98 | 102 | 200 | 644 |

✅ Evaluation completed for all genres.




In [14]:
for genre, results in test.items():
    print(f'Modelos del género {genre}')
    display(results)  # Show the LazyClassifier results table for this genre

Modelos del género rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
NearestCentroid,0.6,0.6,0.6,0.6,0.02
LinearSVC,0.59,0.59,0.59,0.59,0.02
RidgeClassifier,0.59,0.59,0.59,0.59,0.01
LinearDiscriminantAnalysis,0.59,0.59,0.59,0.59,0.04
RidgeClassifierCV,0.59,0.59,0.59,0.59,0.01
CalibratedClassifierCV,0.59,0.59,0.59,0.59,0.06
LogisticRegression,0.58,0.58,0.58,0.58,0.02
AdaBoostClassifier,0.58,0.58,0.58,0.58,0.11
GaussianNB,0.56,0.57,0.57,0.53,0.01
BernoulliNB,0.53,0.53,0.53,0.53,0.02


Modelos del género pop


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
LGBMClassifier,0.61,0.61,0.61,0.61,0.05
SVC,0.61,0.61,0.61,0.61,0.02
DecisionTreeClassifier,0.61,0.61,0.61,0.61,0.02
QuadraticDiscriminantAnalysis,0.6,0.61,0.61,0.59,0.01
NuSVC,0.6,0.6,0.6,0.6,0.04
RidgeClassifier,0.6,0.6,0.6,0.6,0.01
LinearSVC,0.6,0.6,0.6,0.6,0.02
LinearDiscriminantAnalysis,0.6,0.6,0.6,0.6,0.03
LogisticRegression,0.59,0.59,0.59,0.59,0.03
RidgeClassifierCV,0.59,0.59,0.59,0.59,0.02


Modelos del género rap


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
BernoulliNB,0.55,0.56,0.56,0.54,0.01
DecisionTreeClassifier,0.53,0.53,0.53,0.53,0.02
ExtraTreesClassifier,0.52,0.52,0.52,0.52,0.1
QuadraticDiscriminantAnalysis,0.53,0.52,0.52,0.52,0.01
SGDClassifier,0.5,0.51,0.51,0.47,0.01
BaggingClassifier,0.51,0.51,0.51,0.51,0.04
SVC,0.5,0.5,0.5,0.5,0.02
DummyClassifier,0.46,0.5,0.5,0.29,0.01
LGBMClassifier,0.5,0.5,0.5,0.5,0.04
RandomForestClassifier,0.5,0.5,0.5,0.5,0.16


Modelos del género dance pop


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
LabelSpreading,0.61,0.62,0.62,0.6,0.02
LabelPropagation,0.6,0.6,0.6,0.59,0.02
KNeighborsClassifier,0.55,0.57,0.57,0.53,0.01
NuSVC,0.55,0.57,0.57,0.54,0.02
RandomForestClassifier,0.54,0.55,0.55,0.53,0.14
ExtraTreeClassifier,0.54,0.54,0.54,0.54,0.01
LGBMClassifier,0.53,0.54,0.54,0.52,0.03
PassiveAggressiveClassifier,0.51,0.52,0.52,0.49,0.01
AdaBoostClassifier,0.49,0.51,0.51,0.49,0.12
SVC,0.48,0.5,0.5,0.47,0.01


Modelos del género hip hop


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
AdaBoostClassifier,0.55,0.59,0.59,0.56,0.09
ExtraTreeClassifier,0.52,0.55,0.55,0.53,0.01
QuadraticDiscriminantAnalysis,0.41,0.54,0.54,0.34,0.02
BaggingClassifier,0.54,0.54,0.54,0.55,0.04
SGDClassifier,0.5,0.53,0.53,0.51,0.01
NearestCentroid,0.52,0.53,0.53,0.54,0.02
RandomForestClassifier,0.5,0.53,0.53,0.51,0.13
RidgeClassifierCV,0.49,0.53,0.53,0.49,0.01
BernoulliNB,0.48,0.52,0.52,0.48,0.02
SVC,0.49,0.52,0.52,0.5,0.01


Modelos del género latin pop


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
RandomForestClassifier,0.63,0.63,0.63,0.63,0.13
LinearSVC,0.62,0.62,0.62,0.62,0.01
LinearDiscriminantAnalysis,0.62,0.62,0.62,0.62,0.02
LogisticRegression,0.62,0.62,0.62,0.62,0.05
RidgeClassifierCV,0.62,0.62,0.62,0.62,0.01
RidgeClassifier,0.62,0.62,0.62,0.62,0.01
CalibratedClassifierCV,0.6,0.6,0.6,0.6,0.03
ExtraTreesClassifier,0.59,0.59,0.59,0.59,0.1
AdaBoostClassifier,0.58,0.58,0.58,0.58,0.1
KNeighborsClassifier,0.56,0.58,0.58,0.56,0.02


Modelos del género classic rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
BernoulliNB,0.58,0.58,0.58,0.58,0.01
LinearSVC,0.55,0.56,0.56,0.55,0.11
RidgeClassifier,0.55,0.56,0.56,0.55,0.01
LogisticRegression,0.55,0.56,0.56,0.55,0.02
LinearDiscriminantAnalysis,0.55,0.56,0.56,0.55,0.02
RidgeClassifierCV,0.55,0.56,0.56,0.55,0.01
AdaBoostClassifier,0.55,0.56,0.56,0.55,0.1
PassiveAggressiveClassifier,0.54,0.54,0.54,0.54,0.02
BaggingClassifier,0.53,0.54,0.54,0.52,0.04
GaussianNB,0.53,0.53,0.53,0.53,0.01


Modelos del género filmi


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
ExtraTreesClassifier,0.68,0.67,0.67,0.68,0.09
CalibratedClassifierCV,0.67,0.66,0.66,0.67,0.03
LogisticRegression,0.67,0.66,0.66,0.67,0.06
LinearSVC,0.67,0.66,0.66,0.67,0.01
AdaBoostClassifier,0.64,0.64,0.64,0.64,0.09
RandomForestClassifier,0.64,0.64,0.64,0.64,0.12
LGBMClassifier,0.64,0.63,0.63,0.64,0.03
NearestCentroid,0.65,0.63,0.63,0.64,0.01
RidgeClassifierCV,0.62,0.62,0.62,0.62,0.01
SVC,0.62,0.62,0.62,0.62,0.01


Modelos del género soft rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
LGBMClassifier,0.65,0.66,0.66,0.65,0.04
ExtraTreeClassifier,0.67,0.66,0.66,0.66,0.01
AdaBoostClassifier,0.62,0.62,0.62,0.62,0.08
LabelPropagation,0.57,0.57,0.57,0.57,0.02
LabelSpreading,0.57,0.57,0.57,0.57,0.01
DecisionTreeClassifier,0.57,0.56,0.56,0.57,0.02
NearestCentroid,0.57,0.55,0.55,0.56,0.02
CalibratedClassifierCV,0.49,0.55,0.55,0.4,0.03
KNeighborsClassifier,0.55,0.54,0.54,0.55,0.02
PassiveAggressiveClassifier,0.52,0.54,0.54,0.51,0.02


Modelos del género album rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
NuSVC,0.53,0.54,0.54,0.53,0.02
ExtraTreeClassifier,0.52,0.52,0.52,0.52,0.01
CalibratedClassifierCV,0.48,0.51,0.51,0.43,0.03
DummyClassifier,0.45,0.5,0.5,0.28,0.01
BaggingClassifier,0.48,0.5,0.5,0.48,0.04
GaussianNB,0.48,0.49,0.49,0.48,0.01
KNeighborsClassifier,0.48,0.49,0.49,0.48,0.02
NearestCentroid,0.5,0.48,0.48,0.48,0.02
PassiveAggressiveClassifier,0.47,0.48,0.48,0.46,0.02
LGBMClassifier,0.47,0.47,0.47,0.47,0.03


Modelos del género trap


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
PassiveAggressiveClassifier,0.68,0.71,0.71,0.68,0.02
RandomForestClassifier,0.66,0.69,0.69,0.66,0.12
NuSVC,0.66,0.67,0.67,0.66,0.02
LGBMClassifier,0.65,0.66,0.66,0.65,0.03
AdaBoostClassifier,0.62,0.64,0.64,0.62,0.1
LogisticRegression,0.62,0.63,0.63,0.62,0.03
LinearDiscriminantAnalysis,0.62,0.63,0.63,0.62,0.01
LinearSVC,0.62,0.63,0.63,0.62,0.01
RidgeClassifierCV,0.62,0.63,0.63,0.62,0.01
SVC,0.6,0.63,0.63,0.6,0.01


Modelos del género urbano latino


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
SVC,0.61,0.61,0.61,0.6,0.01
NearestCentroid,0.61,0.61,0.61,0.61,0.02
LinearSVC,0.59,0.59,0.59,0.59,0.02
LinearDiscriminantAnalysis,0.59,0.59,0.59,0.59,0.01
RidgeClassifierCV,0.59,0.59,0.59,0.59,0.01
RidgeClassifier,0.59,0.59,0.59,0.59,0.01
BernoulliNB,0.59,0.59,0.59,0.59,0.01
LogisticRegression,0.59,0.59,0.59,0.59,0.02
DecisionTreeClassifier,0.56,0.56,0.56,0.56,0.01
CalibratedClassifierCV,0.53,0.53,0.53,0.51,0.03


Modelos del género alternative metal


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
GaussianNB,0.6,0.58,0.58,0.59,0.01
QuadraticDiscriminantAnalysis,0.58,0.57,0.57,0.58,0.01
LabelSpreading,0.57,0.56,0.56,0.57,0.01
NearestCentroid,0.57,0.56,0.56,0.57,0.02
RandomForestClassifier,0.57,0.55,0.55,0.56,0.12
LabelPropagation,0.55,0.54,0.54,0.55,0.01
KNeighborsClassifier,0.53,0.54,0.54,0.53,0.02
SVC,0.53,0.54,0.54,0.53,0.01
NuSVC,0.53,0.54,0.54,0.53,0.02
BernoulliNB,0.53,0.52,0.52,0.53,0.01


Modelos del género modern rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
DummyClassifier,0.37,0.5,0.5,0.2,0.01
GaussianNB,0.49,0.49,0.49,0.5,0.01
SVC,0.41,0.49,0.49,0.35,0.01
KNeighborsClassifier,0.46,0.48,0.48,0.46,0.01
ExtraTreeClassifier,0.44,0.47,0.47,0.44,0.01
AdaBoostClassifier,0.44,0.47,0.47,0.44,0.09
BernoulliNB,0.42,0.47,0.47,0.41,0.01
BaggingClassifier,0.39,0.44,0.44,0.37,0.03
LogisticRegression,0.39,0.44,0.44,0.37,0.05
RidgeClassifierCV,0.39,0.44,0.44,0.37,0.01


Modelos del género pop rap


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
NearestCentroid,0.67,0.68,0.68,0.67,0.04
AdaBoostClassifier,0.67,0.67,0.67,0.67,0.1
SVC,0.67,0.67,0.67,0.67,0.01
ExtraTreesClassifier,0.67,0.66,0.66,0.67,0.09
RandomForestClassifier,0.66,0.66,0.66,0.65,0.12
CalibratedClassifierCV,0.64,0.64,0.64,0.64,0.03
LinearSVC,0.64,0.64,0.64,0.64,0.01
LogisticRegression,0.64,0.64,0.64,0.64,0.04
RidgeClassifierCV,0.64,0.64,0.64,0.64,0.01
RidgeClassifier,0.64,0.64,0.64,0.64,0.01


Modelos del género musica mexicana


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
RandomForestClassifier,0.69,0.69,0.69,0.69,0.12
CalibratedClassifierCV,0.67,0.67,0.67,0.67,0.03
ExtraTreesClassifier,0.67,0.67,0.67,0.67,0.09
BernoulliNB,0.67,0.67,0.67,0.67,0.01
NuSVC,0.67,0.67,0.67,0.67,0.02
LogisticRegression,0.66,0.65,0.65,0.65,0.02
SVC,0.66,0.65,0.65,0.65,0.01
AdaBoostClassifier,0.64,0.64,0.64,0.64,0.1
RidgeClassifierCV,0.64,0.64,0.64,0.64,0.01
PassiveAggressiveClassifier,0.62,0.62,0.62,0.62,0.01


Modelos del género hard rock


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
ExtraTreesClassifier,0.65,0.65,0.65,0.65,0.09
LGBMClassifier,0.63,0.63,0.63,0.63,0.03
GaussianNB,0.6,0.6,0.6,0.58,0.01
RandomForestClassifier,0.6,0.6,0.6,0.6,0.13
SVC,0.6,0.6,0.6,0.59,0.01
Perceptron,0.58,0.58,0.58,0.58,0.01
BaggingClassifier,0.58,0.58,0.58,0.58,0.04
DecisionTreeClassifier,0.56,0.56,0.56,0.56,0.01
SGDClassifier,0.54,0.55,0.55,0.54,0.01
ExtraTreeClassifier,0.54,0.54,0.54,0.54,0.01


Modelos del género mellow gold


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
NuSVC,0.57,0.58,0.58,0.57,0.02
LGBMClassifier,0.56,0.57,0.57,0.55,0.03
QuadraticDiscriminantAnalysis,0.57,0.56,0.56,0.57,0.01
ExtraTreesClassifier,0.54,0.55,0.55,0.54,0.08
KNeighborsClassifier,0.54,0.54,0.54,0.54,0.01
BernoulliNB,0.52,0.53,0.53,0.52,0.01
DecisionTreeClassifier,0.52,0.53,0.53,0.52,0.01
AdaBoostClassifier,0.5,0.51,0.51,0.5,0.1
SVC,0.5,0.5,0.5,0.5,0.01
CalibratedClassifierCV,0.5,0.5,0.5,0.5,0.04


Modelos del género reggaeton


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
AdaBoostClassifier,0.63,0.64,0.64,0.64,0.09
DecisionTreeClassifier,0.6,0.58,0.58,0.59,0.02
LGBMClassifier,0.58,0.57,0.57,0.58,0.03
SVC,0.56,0.57,0.57,0.56,0.01
PassiveAggressiveClassifier,0.58,0.56,0.56,0.57,0.02
NearestCentroid,0.56,0.56,0.56,0.56,0.02
RandomForestClassifier,0.56,0.55,0.55,0.56,0.12
BaggingClassifier,0.56,0.55,0.55,0.56,0.03
QuadraticDiscriminantAnalysis,0.5,0.54,0.54,0.43,0.01
LabelPropagation,0.54,0.54,0.54,0.54,0.01


Modelos del género r&b


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
RandomForestClassifier,0.62,0.62,0.62,0.62,0.12
AdaBoostClassifier,0.6,0.6,0.6,0.6,0.09
QuadraticDiscriminantAnalysis,0.6,0.59,0.59,0.59,0.01
Perceptron,0.58,0.58,0.58,0.58,0.01
LogisticRegression,0.54,0.55,0.55,0.54,0.02
RidgeClassifierCV,0.54,0.55,0.55,0.54,0.01
ExtraTreesClassifier,0.54,0.54,0.54,0.54,0.09
SVC,0.52,0.52,0.52,0.52,0.01
LabelPropagation,0.5,0.52,0.52,0.48,0.02
LabelSpreading,0.5,0.52,0.52,0.48,0.01
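
Twenty separate tables are hard to compare at a glance. One possible way to condense them is to keep only the best model per genre, ranked by balanced accuracy. The sketch below assumes each value of `test` is a DataFrame indexed by model name with a `Balanced Accuracy` column, as the tables above suggest. `test_example` is a hypothetical stand-in holding the first two rows of three of those tables, and the `summary` column names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the `test` dict: genre -> LazyClassifier results table.
# Values are copied from the first two rows of three of the tables above.
test_example = {
    "rock": pd.DataFrame(
        {"Balanced Accuracy": [0.60, 0.59]},
        index=pd.Index(["NearestCentroid", "LinearSVC"], name="Model"),
    ),
    "trap": pd.DataFrame(
        {"Balanced Accuracy": [0.71, 0.69]},
        index=pd.Index(
            ["PassiveAggressiveClassifier", "RandomForestClassifier"], name="Model"
        ),
    ),
    "modern rock": pd.DataFrame(
        {"Balanced Accuracy": [0.50, 0.49]},
        index=pd.Index(["DummyClassifier", "GaussianNB"], name="Model"),
    ),
}

# One row per genre: the model with the highest balanced accuracy
rows = [
    {
        "Genero": genre,
        "Best_Model": results["Balanced Accuracy"].idxmax(),
        "Balanced_Accuracy": results["Balanced Accuracy"].max(),
    }
    for genre, results in test_example.items()
]

# Rank genres from most to least predictable
summary = (
    pd.DataFrame(rows)
    .sort_values("Balanced_Accuracy", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

A summary like this also makes it easy to spot genres such as modern rock, where the top row is `DummyClassifier` and no model beats the trivial baseline.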


## Filmi Genre

In [15]:
from sklearn.model_selection import train_test_split

genero = 'filmi'
df_temp = df_final.copy()
df_temp[genero] = df_temp['Genero'].apply(lambda x: busca_genero(genero, x))
df_gen = df_temp[df_temp[genero] == 1].copy()

# Compute the third quartile (Q3) of streams within the genre
q3 = df_gen['Stream'].quantile(0.75)

# Define the categories: 0 = up to Q3, 1 = above Q3
bins = [0, q3, df_gen['Stream'].max()]
labels = [0, 1]

df_gen['Categoria'] = pd.cut(df_gen['Stream'], bins=bins, labels=labels, include_lowest=True)

# Count the samples in category 1
count_cat1 = df_gen['Categoria'].value_counts().get(1, 0)

# Split by category
categoria_0 = df_gen[df_gen['Categoria'] == 0]
categoria_1 = df_gen[df_gen['Categoria'] == 1]

# Undersample category 0 to the same number of rows as category 1
categoria_0_sample = categoria_0.sample(n=min(count_cat1, len(categoria_0)), random_state=42)

# Concatenate and shuffle to obtain a balanced dataset
df_balanceado = pd.concat([categoria_0_sample, categoria_1]).sample(frac=1, random_state=42).reset_index(drop=True)

# Define the predictor variables and the target
X = df_balanceado[['Acousticness', 'Danceability', 'Duration_min', 'Energy',
                   'Instrumentalness', 'Key', 'Liveness', 'Loudness',
                   'Speechiness', 'Tempo', 'Valence']]
y = df_balanceado['Categoria']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [20]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Define the model
clf = ExtraTreesClassifier(random_state=42)

# Hyperparameter grid to search
param_grid = {
    'n_estimators': [50, 100, 200],  # Number of trees
    'max_depth': [None, 10, 20, 30],  # Maximum tree depth
    'min_samples_split': [2, 5, 10],  # Minimum samples required to split a node
    'min_samples_leaf': [1, 2, 4],  # Minimum samples per leaf
    'criterion': ['gini', 'entropy'],  # Split-quality criterion
}

# Configure GridSearchCV
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=1, scoring='accuracy')

# Fit on the training data
grid_search.fit(X_train, y_train)

# Best hyperparameter combination
print("Best parameters found:", grid_search.best_params_)

# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Test-set accuracy:", accuracy_score(y_test, y_pred))

# Per-class classification metrics
print("\nClassification report:")
print(classification_report(y_test, y_pred))

Fitting 5 folds for each of 216 candidates, totalling 1080 fits
Best parameters found: {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 10, 'n_estimators': 50}
Test-set accuracy: 0.5652173913043478

Classification report:
              precision    recall  f1-score   support

           0       0.50      0.53      0.52        30
           1       0.62      0.59      0.61        39

    accuracy                           0.57        69
   macro avg       0.56      0.56      0.56        69
weighted avg       0.57      0.57      0.57        69



Conclusions for the ExtraTreesClassifier model
1️⃣ The tuned model performs modestly, with a test accuracy of 56.5% (39 of 69 songs).

Overall accuracy remains low: it is below the 0.68 reported for ExtraTreesClassifier in the filmi table above, and it equals the share of class 1 in the test split (39 of 69), so the model is not separating the classes well.
Precision for class 1 is 0.62: of the songs predicted as class 1, 62% really belong to it, so a substantial share of positive predictions is still wrong.
Recall for class 1 is 0.59, meaning the model detects 59% of the songs that actually belong to class 1.

2️⃣ The model scores higher on class 1 than on class 0.

Class 1 has the higher F1-score (0.61) compared with class 0 (0.52).
Class 0 has a precision of 0.50 and a recall of 0.53, so the model identifies this class barely better than chance.
This suggests a mild bias toward class 1, helped by class 1 being more frequent in the test split (39 versus 30 songs), but the model is still not accurate enough.
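
The figures in the classification report can be cross-checked with a little arithmetic. The sketch below rebuilds the confusion matrix from the per-class support and recall printed above, then compares the accuracy with a majority-class baseline. It relies only on the rounded numbers in the report, and the variable names are illustrative.

```python
# Per-class support and recall, copied from the classification report above
support = {0: 30, 1: 39}
recall = {0: 0.53, 1: 0.59}

# recall = correct / support, so rounding recovers the integer counts
correct = {c: round(support[c] * recall[c]) for c in support}

total = sum(support.values())
accuracy = sum(correct.values()) / total

# Accuracy of a model that always predicts the most frequent test class
majority_baseline = max(support.values()) / total

# Rows are true classes, columns are predicted classes
confusion = [
    [correct[0], support[0] - correct[0]],
    [support[1] - correct[1], correct[1]],
]

print("Confusion matrix:", confusion)
print("Accuracy:", accuracy)
print("Majority-class baseline:", majority_baseline)
```

The recovered accuracy matches the 0.5652 printed by the notebook and coincides with the baseline, which puts the per-class scores in context.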

## Rock
For rock we tune NearestCentroid, the top-ranked model for this genre in the LazyClassifier table above.
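
As background, a nearest-centroid classifier stores the mean feature vector of each class and assigns a new sample to the class whose centroid is closest. Below is a minimal NumPy sketch of that idea, using Euclidean distance and no shrinkage. `nearest_centroid_predict` and the toy data are illustrative and are not the scikit-learn implementation.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_new):
    """Assign each row of X_new to the class with the closest mean vector."""
    classes = np.unique(y_train)
    # One centroid per class: the mean of that class's training rows
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every new sample to every centroid
    dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy data: two well-separated classes in two dimensions
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])
X_query = np.array([[0.5, 0.5], [4.0, 5.0]])

pred = nearest_centroid_predict(X_toy, y_toy, X_query)
print(pred)
```

Because the decision depends on raw distances, features on larger scales (such as `Tempo` or `Loudness` compared with `Valence`) dominate unless the data is standardized first.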

In [26]:
from sklearn.model_selection import train_test_split

genero = 'rock'
df_temp = df_final.copy()
df_temp[genero] = df_temp['Genero'].apply(lambda x: busca_genero(genero, x))
df_gen = df_temp[df_temp[genero] == 1].copy()

# Compute the third quartile (Q3) of streams within the genre
q3 = df_gen['Stream'].quantile(0.75)

# Define the categories: 0 = up to Q3, 1 = above Q3
bins = [0, q3, df_gen['Stream'].max()]
labels = [0, 1]

df_gen['Categoria'] = pd.cut(df_gen['Stream'], bins=bins, labels=labels, include_lowest=True)

# Count the samples in category 1
count_cat1 = df_gen['Categoria'].value_counts().get(1, 0)

# Split by category
categoria_0 = df_gen[df_gen['Categoria'] == 0]
categoria_1 = df_gen[df_gen['Categoria'] == 1]

# Undersample category 0 to the same number of rows as category 1
categoria_0_sample = categoria_0.sample(n=min(count_cat1, len(categoria_0)), random_state=42)

# Concatenate and shuffle to obtain a balanced dataset
df_balanceado = pd.concat([categoria_0_sample, categoria_1]).sample(frac=1, random_state=42).reset_index(drop=True)

# Define the predictor variables and the target
X = df_balanceado[['Acousticness', 'Danceability', 'Duration_min', 'Energy',
                   'Instrumentalness', 'Key', 'Liveness', 'Loudness',
                   'Speechiness', 'Tempo', 'Valence']]
y = df_balanceado['Categoria']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
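
The preparation cell above repeats the filmi one with only the value of `genero` changed. A function along the following lines could remove the copy. This is a sketch under assumptions: `balanced_genre_dataset` and its parameters are hypothetical names, and the exact-match test on the comma-separated genre list stands in for `busca_genero`, which is defined elsewhere and may behave differently. The `Stream > q3` comparison gives the same split as the `pd.cut` bins used above.

```python
import numpy as np
import pandas as pd

def balanced_genre_dataset(df, genre, stream_col="Stream", genre_col="Genero", seed=42):
    """Return a shuffled, class-balanced subset of the songs tagged with `genre`."""
    # Assumed stand-in for busca_genero: exact match within the comma-separated list
    has_genre = df[genre_col].fillna("").str.split(", ").apply(lambda tags: genre in tags)
    df_gen = df[has_genre].copy()

    # Class 1 = streams above the genre's third quartile, class 0 = the rest
    q3 = df_gen[stream_col].quantile(0.75)
    df_gen["Categoria"] = (df_gen[stream_col] > q3).astype(int)

    top = df_gen[df_gen["Categoria"] == 1]
    rest = df_gen[df_gen["Categoria"] == 0]

    # Undersample class 0 to the size of class 1, then shuffle
    rest_sample = rest.sample(n=min(len(top), len(rest)), random_state=seed)
    return (
        pd.concat([rest_sample, top])
        .sample(frac=1, random_state=seed)
        .reset_index(drop=True)
    )

# Synthetic demo data: 100 rock songs and 100 k-pop songs with distinct stream counts
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Genero": ["rock, pop", "k-pop"] * 100,
    "Stream": rng.permutation(200) + 1,
})

balanced = balanced_genre_dataset(demo, "rock")
print(balanced["Categoria"].value_counts())
```

With a helper like this, the per-genre experiments reduce to one call each, and any change to the labeling rule is made in a single place.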

In [29]:
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report, accuracy_score

# Define the NearestCentroid model
clf = NearestCentroid()

# Hyperparameter grid to search
param_grid = {
    # Distance metrics to try; recent scikit-learn releases accept only
    # 'euclidean' and 'manhattan', so 'cosine' candidates may fail to fit there
    'metric': ['euclidean', 'manhattan', 'cosine'],
    'shrink_threshold': [None, 0.1, 0.5, 1.0, 2.0]  # Centroid shrinkage (None disables it)
}

# Configure GridSearchCV
grid_search = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1,
    scoring='f1'  # Could also be 'accuracy', 'recall', etc.
)

# Fit on the training data
grid_search.fit(X_train, y_train)

# 🔹 Best hyperparameter combination
print("Best parameters found:", grid_search.best_params_)

# Evaluate on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# 🔹 Metrics report
print("\nClassification report:")
print(classification_report(y_test, y_pred))
print("Test-set accuracy:", accuracy_score(y_test, y_pred))


Fitting 5 folds for each of 15 candidates, totalling 75 fits
Best parameters found: {'metric': 'euclidean', 'shrink_threshold': 2.0}

Classification report:
              precision    recall  f1-score   support

           0       0.59      0.49      0.53        80
           1       0.55      0.65      0.60        77

    accuracy                           0.57       157
   macro avg       0.57      0.57      0.56       157
weighted avg       0.57      0.57      0.56       157

Test-set accuracy: 0.5668789808917197


In [30]:
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler

# Standardize the features before fitting NearestCentroid
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the NearestCentroid model
clf = NearestCentroid()

# Hyperparameter grid to search
param_grid = {
    # Recent scikit-learn releases accept only 'euclidean' and 'manhattan',
    # so 'cosine' candidates may fail to fit there
    'metric': ['euclidean', 'manhattan', 'cosine'],
    'shrink_threshold': [0.0, 0.1, 0.5, 1.0, 2.0]
}

# Configure GridSearchCV to optimize recall for class 1
grid_search = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1,
    scoring='recall'
)

# Fit on the training data
grid_search.fit(X_train, y_train)

# 🔹 Best hyperparameter combination
print("Best parameters found:", grid_search.best_params_)

# Evaluate on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# 🔹 Metrics report
print("\nClassification report:")
print(classification_report(y_test, y_pred))
print("Test-set accuracy:", accuracy_score(y_test, y_pred))


Fitting 5 folds for each of 15 candidates, totalling 75 fits
Best parameters found: {'metric': 'euclidean', 'shrink_threshold': 0.1}

Classification report:
              precision    recall  f1-score   support

           0       0.61      0.55      0.58        80
           1       0.58      0.64      0.60        77

    accuracy                           0.59       157
   macro avg       0.59      0.59      0.59       157
weighted avg       0.59      0.59      0.59       157

Test-set accuracy: 0.5923566878980892
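
In the cell above the scaler is fit on all of `X_train` before the grid search runs, so every validation fold has already contributed to the scaling statistics. A common way to avoid this is to wrap the scaler and the classifier in a `Pipeline`, so that scaling is refit inside each fold. The sketch below uses synthetic data from `make_classification` as a stand-in for the song features. The `balanced_accuracy` scoring and the reduced grid are illustrative choices, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11 audio features and the binary target
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline, so it is refit on each training fold
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", NearestCentroid()),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {"clf__shrink_threshold": [None, 0.1, 0.5, 1.0, 2.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="balanced_accuracy")
search.fit(X_tr, y_tr)

print("Best parameters found:", search.best_params_)
test_score = search.score(X_te, y_te)
print("Test balanced accuracy:", test_score)
```

The fitted `search` object can be used directly for prediction on raw features, since the best pipeline applies its own scaler.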


Conclusions for the NearestCentroid model

1️⃣ The model reached an overall accuracy of 59.2%.

Test accuracy is 0.5924, so the model classifies about 59% of the songs correctly (93 of 157).
This is not a strong result: on a nearly balanced test split (80 versus 77 songs) it sits roughly nine points above the 50% chance level, which may or may not be acceptable depending on the use case.

2️⃣ The model performs similarly on both classes.

Class 0: precision 0.61, recall 0.55, F1-score 0.58.
Class 1: precision 0.58, recall 0.64, F1-score 0.60.
The gap between the classes is small, but class 1 has the higher recall (0.64 versus 0.55), meaning the model recovers a larger share of the true positives than of the true negatives.
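
With only 157 test songs, the accuracy estimate itself carries noticeable uncertainty. As a rough check, the sketch below computes a normal-approximation 95% confidence interval for the reported accuracy. This interval is a simplifying assumption that treats test predictions as independent trials, and it is not part of the original analysis.

```python
import math

# 93 correct predictions out of 157 test songs reproduces the reported 0.5924 accuracy
correct, total = 93, 157
p = correct / total

# Standard error of a proportion and the 95% normal-approximation interval
se = math.sqrt(p * (1 - p) / total)
z = 1.96
lower, upper = p - z * se, p + z * se

print(f"accuracy = {p:.3f}, 95% CI ~ [{lower:.3f}, {upper:.3f}]")
```

The interval spans roughly 0.52 to 0.67, so its lower end is close to chance. Differences of a few points between models or settings on this split should be read with caution.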