This notebook continued off of rough_rag, and implements a prediction model for populairty prediction. The model uses the target lyric embeddings, plus the embeddings, popularity, and librosa collect audio features of its three nearest embedding neighbors by FAISS similarity search. We also use SHAP to reveal the impact of certain features for each prediction, and adding it to our retreval content for our llm to better explain.

There is sitll work to be done:

- we dont test the LightBGM model or show performace. we should review its accuracy and other metrics for justifiability
- we edited the prompt for new context, also added parts to not mention the limitations of a lyric-only prediction for popularity. work can still be done
- we also need to include song level meta data (genre, year, other things)
- deep eval to evaluate the perforamnce of RAG
- maybe the llm can become a critiquer over an explainer. reprompt in order to get suggetions and reflect the results of our methods for prediction popularity
    - this would require the llm to better learn how to make the lyrics better (llm could also give counterfactual lyrics, generating better lyrics that do actually increase popularity)
- other future work involve allowing the llm to generate counterfactual explanations, include average popularity of neighbors, similarity-weighted popularity, and variance of neighbor popularity in prediction, SHAP visualizations, streamlit ui

### Downloading the model first

In [103]:
import os, certifi
os.environ["SSL_CERT_FILE"] = certifi.where()
os.environ["REQUESTS_CA_BUNDLE"] = certifi.where()

In [104]:
from sentence_transformers import SentenceTransformer
import tqdm as tqdm

model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")
print("Model loaded")

Model loaded


### Reading in data and making df

In [105]:
import numpy as np
import pandas as pd
from typing import List, Dict, Any

df1 = pd.read_parquet("lyric_embeddings/librosa_shard_0.parquet")
df2 = pd.read_parquet("lyric_embeddings/librosa_shard_1.parquet")
df3 = pd.read_parquet("lyric_embeddings/librosa_shard_2.parquet")
df4 = pd.read_parquet("lyric_embeddings/librosa_shard_3.parquet")
df5 = pd.read_parquet("lyric_embeddings/librosa_shard_4.parquet")

df = pd.concat([df1, df2, df3, df4, df5])

### REAL embedding functions

In [106]:
import faiss

emb_list = [np.asarray(x, dtype="float32") for x in df["lyrics_embedding"].values]
emb_matrix = np.stack(emb_list, axis=0)
dimension = emb_matrix.shape[1]
index = faiss.IndexFlatL2(dimension)

index.add(emb_matrix.astype("float32"))

print("FAISS index built with", index.ntotal, "vectors.")

def retrieve_similar_songs(query_embedding: np.ndarray, k: int = 5) -> List[Dict[str, Any]]:
    query_vector = np.array([query_embedding]).astype('float32')
    D, I = index.search(query_vector, k)

    neighbors = []
    for idx, dist in zip(I[0], D[0]):
        if idx != -1:
            neighbors.append({
                "index": int(idx),
                "similarity": float(dist)
            })

    return neighbors


FAISS index built with 20740 vectors.


In [107]:
import re
import unicodedata

def clean_lyrics_for_query(text: str) -> str:

    if not isinstance(text, str):
        return ""

    text = text.lower()

    # remove headers like [chorus], [verse 1], etc.
    text = re.sub(r"\[.*?\]", " ", text)

    # handle real and escaped newlines
    text = text.replace("\\n", " ").replace("\n", " ")

    # remove (prod. ...), (remix ...)
    text = re.sub(r"\(.*?prod.*?\)", " ", text)
    text = re.sub(r"\(.*?remix.*?\)", " ", text)

    # remove x2, x3, etc.
    text = re.sub(r"\bx\d+\b", " ", text)

    # keep letters (any language), numbers, spaces, apostrophes
    chars = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat.startswith("L") or cat.startswith("N") or ch in [" ", "'", "’"]:
            chars.append(ch)

    text = "".join(chars)

    # collapse multiple spaces
    text = re.sub(r"\s+", " ", text).strip()
    return text


def embed_lyrics(text: str) -> np.ndarray:
    cleaned = clean_lyrics_for_query(text)
    emb = model.encode(
        [cleaned],
        convert_to_numpy=True,
        normalize_embeddings=True
    )

    vec = emb[0]

    vec = np.asarray(vec, dtype="float32").reshape(-1)

    return vec



In [108]:
audio_feature_cols = df.columns[df.columns.get_loc("duration") : df.columns.get_loc("tonnetz_6") + 1].tolist()
df.head()

Unnamed: 0,song_id,title,artist,query_title,query_artist,track_genre,popularity,lyrics,preview_url,track_id,...,spectral_contrast_6,spectral_contrast_7,tonnetz_1,tonnetz_2,tonnetz_3,tonnetz_4,tonnetz_5,tonnetz_6,lyrics_clean,lyrics_embedding
0,4845,State of Mind,Scooter,state of mind,scooter,happy,24.0,The world seems not the same...\n\nIntroducing...,https://audio-ssl.itunes.apple.com/itunes-asse...,1692327616,...,18.328021,39.053367,0.197966,-0.116721,0.142559,-0.069539,-0.044986,-0.047523,the world seems not the same introducing twist...,"[0.07519827783107758, -0.023364899680018425, -..."
1,462,Reptilia,The Strokes,reptilia,the strokes,alt-rock,75.0,[Verse 1]\nHe seemed impressed by the way you ...,https://audio-ssl.itunes.apple.com/itunes-asse...,302987569,...,17.382681,39.012014,0.078138,-0.077754,0.063345,0.036541,-0.011976,-0.014041,he seemed impressed by the way you came in tel...,"[-0.08670999109745026, -0.025700576603412628, ..."
2,16017,None Of My Business,Cher Lloyd,none of my business,cher lloyd,electro,64.0,"[Chorus]\nDamn, I heard that you and her been ...",https://audio-ssl.itunes.apple.com/itunes-asse...,1438630505,...,18.248683,39.966514,0.013912,0.1729,-0.092766,-0.056323,-0.004173,-0.014388,damn i heard that you and her been having prob...,"[0.01792941242456436, 0.001567921251989901, 0...."
3,9478,Trouble Sleeping,The Perishers,trouble sleeping,the perishers,acoustic,48.0,I'm having trouble sleeping\nYou're jumping in...,https://audio-ssl.itunes.apple.com/itunes-asse...,89335271,...,16.969837,28.947224,-0.118755,0.195544,0.025169,-0.130705,0.024176,0.005865,i'm having trouble sleeping you're jumping in ...,"[0.012034112587571144, -0.0008498362149111927,..."
4,2822,Shot in the Dark,Ozzy Osbourne,shot in the dark,ozzy osbourne,hard-rock,65.0,[Verse 1]\nOut on the streets I'm stalking the...,https://audio-ssl.itunes.apple.com/itunes-asse...,158711416,...,17.184653,35.540522,-0.113671,0.023209,-0.029743,-0.051142,0.003486,-0.011837,out on the streets i'm stalking the night i ca...,"[-0.05440174415707588, 0.0212415661662817, -0...."


In [109]:
def construct_feature_vector(
    target_embedding: np.ndarray,
    neighbors: List[Dict[str, Any]],
    audio_feature_cols: List[str],
    k: int = 3
) -> np.ndarray:

    vec = []

    target_embedding = np.asarray(target_embedding, dtype="float32").reshape(-1)
    EMB_DIM = target_embedding.shape[0]
    vec.extend(target_embedding.tolist())

    AUDIO_DIM = len(audio_feature_cols)
    NEIGHBOR_BLOCK = EMB_DIM + 2 + AUDIO_DIM  # emb + similarity + popularity + audio to ensure they all the same size

    for i in range(k):
        if i < len(neighbors):
            nb = neighbors[i]

            # neighbor embedding
            nb_emb = df.iloc[nb["index"]]["lyrics_embedding"]
            nb_emb = np.asarray(nb_emb, dtype="float32").reshape(-1)

            # fill nans if neighbros dont exist
            if nb_emb.shape[0] != EMB_DIM:
                fixed_emb = np.full(EMB_DIM, np.nan, dtype="float32")
                fixed_emb[:min(EMB_DIM, len(nb_emb))] = nb_emb[:EMB_DIM]
                nb_emb = fixed_emb

            vec.extend(nb_emb.tolist())

            # similarity
            sim = nb.get("similarity", np.nan)
            vec.append(float(sim))

            # popularity
            vec.append(float(nb["popularity"]))

            # audio features
            af = nb["audio_features"]
            for col in audio_feature_cols:
                val = af.get(col, np.nan)
                if isinstance(val, (float, int, np.floating)):
                    vec.append(float(val))
                else:
                    vec.append(np.nan)

        else:
            vec.extend([np.nan] * NEIGHBOR_BLOCK)

    return np.asarray(vec, dtype="float32")


In [93]:
def get_top_k_neighbors(df, query_embedding, k=5):
    raw_neighbors = retrieve_similar_songs(query_embedding, k=k)
    neighbors = []

    for n in raw_neighbors:
        idx = n["index"]
        row = df.iloc[idx]

        audio_features = {}

        for col in audio_feature_cols:
            val = row[col]

            # keep if scalar
            if np.isscalar(val):
                audio_features[col] = float(val)
            
            # flatten if array
            elif isinstance(val, np.ndarray):
                val = val.flatten()
                for j, v in enumerate(val):
                    audio_features[f"{col}_{j}"] = float(v)
            
            # flatten if list
            elif isinstance(val, list):
                for j, v in enumerate(val):
                    audio_features[f"{col}_{j}"] = float(v)

            else:
                try:
                    audio_features[col] = float(val)
                except Exception:
                    audio_features[col] = None

        neighbor_data = {
            "index": idx,
            "song_id": row["song_id"],
            "title": row["title"],
            "artist": row["artist"],
            "similarity": n.get("similarity", None),
            "popularity": float(row["popularity"]),
            "lyrics_snippet": row["lyrics"][:400].replace("\n", " ") + "...",
            "audio_features": audio_features
        }
        
        neighbors.append(neighbor_data)

    return neighbors


In [110]:
X = []
y = []

for i, row in df.iterrows():
    target_lyric = row["lyrics"]
    target_embedding = np.asarray(row["lyrics_embedding"], dtype="float32")

    neighbors = get_top_k_neighbors(df, target_embedding, k=3)

    x_vec = construct_feature_vector(target_embedding, neighbors, audio_feature_cols, k=3)

    X.append(x_vec)
    y.append(row["popularity"])

X = np.vstack(X)
y = np.array(y, dtype="float32")


In [111]:
import lightgbm as lgb

train_data = lgb.Dataset(X, label=y)

params = {
    "objective": "regression",
    "metric": ["rmse", "mae"],
    "learning_rate": 0.05,
    "num_leaves": 63,
    "max_depth": -1,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
}

model_lgb = lgb.train(params, train_data, num_boost_round=500)

print("LightGBM model trained!")


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.444943 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 584457
[LightGBM] [Info] Number of data points in the train set: 20740, number of used features: 2299
[LightGBM] [Info] Start training from score 38.835680
LightGBM model trained!


In [112]:
import shap

explainer = shap.TreeExplainer(model_lgb)
shap_values = explainer.shap_values(x_vec.reshape(1, -1))

def summarize_shap_for_sample(
    shap_values: np.ndarray,
    feature_names: List[str] = None,
    top_n: int = 15
) -> List[Dict[str, Any]]:

    if shap_values.ndim == 2:
        shap_vals = shap_values[0]
    else:
        shap_vals = shap_values

    abs_vals = np.abs(shap_vals)
    top_idx = np.argsort(abs_vals)[::-1][:top_n]

    if feature_names is None:
        feature_names = [f"feature_{i}" for i in range(len(shap_vals))]

    summary = []
    for idx in top_idx:
        summary.append({
            "feature": feature_names[idx],
            "shap_value": float(shap_vals[idx])
        })
    return summary


def group_shap_fully(
    shap_vals: np.ndarray,
    EMB_DIM: int,
    audio_feature_cols: List[str],
    k_neighbors: int = 3
):

    if shap_vals.ndim == 2:
        shap_vals = shap_vals[0]

    idx = 0
    groups = {}

    AUDIO_DIM = len(audio_feature_cols)

    target_emb_shap = shap_vals[idx : idx + EMB_DIM]
    groups["target_embedding"] = float(np.sum(target_emb_shap))
    idx += EMB_DIM

    groups["neighbors"] = []

    for nb in range(k_neighbors):

        # group by embeddings
        emb_block = shap_vals[idx : idx + EMB_DIM]
        emb_sum = float(np.sum(emb_block))
        idx += EMB_DIM

        # group by similarity
        sim_shap = float(shap_vals[idx])
        idx += 1

        # group by popularity
        pop_shap = float(shap_vals[idx])
        idx += 1

        # dont group by audio features
        audio_block = shap_vals[idx : idx + AUDIO_DIM]
        idx += AUDIO_DIM
        
        audio_dict = {
            feature_name: float(audio_block[j])
            for j, feature_name in enumerate(audio_feature_cols)
        }

        groups["neighbors"].append({
            "embedding": emb_sum,
            "similarity": sim_shap,
            "popularity": pop_shap,
            "audio_features": audio_dict
        })

    return groups




In [113]:
# continue working on promping
def build_rag_prompt_for_lyric_popularity(
    user_lyric: str,
    neighbors: List[Dict[str, Any]],
    predicted_popularity: float,
    shap_summary: List[Dict[str, Any]]
):


    lines = []
    lines.append("You are an expert in music analytics, audio features, and lyric interpretation.")
    lines.append("Your task is to EXPLAIN a predicted popularity score for a NEW lyric.")
    lines.append("")
    lines.append("CRITICAL INSTRUCTIONS:")
    lines.append(" - DO NOT provide disclaimers about limitations of predicting popularity from lyrics.")
    lines.append(" - Ground every part of your explanation in the retrieved similar songs.")
    lines.append(" - Quote specific phrases from the neighbor lyrics when helpful.")
    lines.append(" - Explain audio features in simple, everyday terms.")
    lines.append(" - Use the SHAP feature-attribution summary as evidence for WHY the model made its prediction.")
    lines.append(" - Structure your explanation into multiple paragraphs:")
    lines.append("      Paragraph 1: Lyric similarity analysis using retrieved neighbors.")
    lines.append("      Paragraph 2: Audio feature comparisons (brightness, timbre, tempo, etc.).")
    lines.append("      Paragraph 3: Interpretation of SHAP results for this specific lyric.")
    lines.append("      Paragraph 4: Final justification tying all evidence together.")
    lines.append("")
    lines.append("Return ONLY valid JSON with this format:")
    lines.append("{")
    lines.append('  "predicted_popularity": <number>,')
    lines.append('  "explanation": "<multi-paragraph explanation grounded in evidence>"')
    lines.append("}")
    lines.append("")
    lines.append("IMPORTANT:")
    lines.append("Return ONLY raw JSON.")
    lines.append("Do NOT include any code fences such as ``` json")
    lines.append("Do NOT include any explanation text outside the JSON.")
    lines.append("Do NOT add commentary before or after the JSON.")
    lines.append("Return JSON ONLY.")
    lines.append("")
    lines.append("------------------------------------------------------------")
    lines.append("NEW LYRIC:")
    lines.append(user_lyric.strip())
    lines.append("------------------------------------------------------------")
    lines.append("")
    lines.append(f"Predicted Popularity Score: {predicted_popularity:.2f}")
    lines.append("")
    lines.append("------------------------------------------------------------")
    lines.append("SHAP FEATURE-ATTRIBUTION SUMMARY (TOP CONTRIBUTORS):")
    lines.append("These features influenced the model's prediction and should be used in the explanation:")
    lines.append("SHAP FEATURE ATTRIBUTION SUMMARY (GROUPED):")
    lines.append(f"  Target embedding contribution: {shap_summary['target_embedding']:+.3f}")

    lines.append("\nNeighbor Contributions:")
    for i, nb in enumerate(shap_summary["neighbors"], start=1):
        lines.append(f"  Neighbor {i}:")
        lines.append(f"    embedding: {nb['embedding']:+.3f}")
        lines.append(f"    similarity: {nb['similarity']:+.3f}")
        lines.append(f"    popularity: {nb['popularity']:+.3f}")
        lines.append("    audio_features:")
        for feat_name, val in nb["audio_features"].items():
            lines.append(f"      {feat_name}: {val:+.3f}")

    lines.append("------------------------------------------------------------")
    lines.append("")
    lines.append("SIMILAR SONGS RETRIEVED FROM THE DATASET:")
    lines.append("Use these songs as evidence for lyrical themes, audio patterns, and overall justification.")

    for i, nb in enumerate(neighbors, start=1):
        lines.append(f"\nNeighbor #{i}:")
        lines.append(f"  song_id: {nb['song_id']}")
        lines.append(f"  title: {nb['title']}")
        lines.append(f"  artist: {nb['artist']}")
        if nb.get("similarity") is not None:
            lines.append(f"  similarity_score: {nb['similarity']:.4f}  (lower = more similar lyrics)")
        lines.append(f"  popularity: {nb['popularity']:.2f}")
        lines.append(f"  lyrics_snippet: {nb['lyrics_snippet']}")
        lines.append("  audio_features:")
        for feat_name, feat_val in nb["audio_features"].items():
            if isinstance(feat_val, (int, float)):
                lines.append(f"    {feat_name}: {feat_val:.4f}")
            else:
                lines.append(f"    {feat_name}: {feat_val}")

    lines.append("")
    lines.append("------------------------------------------------------------")
    lines.append(
        "Using ONLY the information above — the new lyric, retrieved neighbors, "
        "the predicted popularity score, and the SHAP feature-attribution summary — "
        "produce a multi-paragraph explanation grounded in the dataset evidence. "
        "Do not speculate beyond what is shown. Do not include disclaimers. "
        "Focus on clear, real-world intuition about audio features, lyrical patterns, "
        "genre cues, and model attribution."
    )

    return "\n".join(lines)



In [114]:
import os
import re
import json
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

def call_llm_for_popularity_and_explanation(prompt: str) -> dict:

    response = client.responses.create(
        model="gpt-4o",
        input=prompt,
        temperature=0.2,
        max_output_tokens=900
    )

    raw_text = response.output[0].content[0].text.strip()

    # Remove any ```json ...``` or ```
    raw_text = raw_text.replace("```json", "")
    raw_text = raw_text.replace("```", "")
    raw_text = raw_text.strip()

    # first try direct json parse
    try:
        return json.loads(raw_text)
    except:
        pass

    # else, find outside json block using regex
    json_matches = re.findall(r"\{(?:[^{}]|(?:\{[^{}]*\}))*\}", raw_text, flags=re.DOTALL)

    if json_matches:
        for match in json_matches:
            try:
                return json.loads(match)
            except:
                continue

    # try to repair json with trailing commas
    repaired = re.sub(r",\s*([}\]])", r"\1", raw_text)

    try:
        return json.loads(repaired)
    except:
        pass

    # else, say failed
    print("Could not parse JSON from LLM output. Returning raw text.")
    return {
        "predicted_popularity": None,
        "explanation": raw_text
    }





In [115]:
def rag_lyric_popularity_system(
    df: pd.DataFrame,
    user_lyric: str,
    k_neighbors: int = 3,
    top_shap_features: int = 15,
    feature_names: List[str] = None
) -> Dict[str, Any]:

    # 1) embed target lyric
    query_embedding = embed_lyrics(user_lyric)

    # 2) retrieve neighbors using FAISS + your helper
    neighbors = get_top_k_neighbors(df, query_embedding, k=k_neighbors)

    # 3) construct feature vector for prediction
    x_vec = construct_feature_vector(
        target_embedding=query_embedding,
        neighbors=neighbors,
        audio_feature_cols=audio_feature_cols,
        k=k_neighbors
    )

    # 4) predict popularity with LightGBM
    pred_pop = float(model_lgb.predict(x_vec.reshape(1, -1))[0])

    # 5) compute SHAP values for this sample
    shap_vals = explainer.shap_values(x_vec.reshape(1, -1))

    EMB_DIM = len(query_embedding)
    shap_grouped = group_shap_fully(
        shap_vals,
        EMB_DIM=EMB_DIM,
        audio_feature_cols=audio_feature_cols,
        k_neighbors=k_neighbors
    )


    # 6) build prompt for explanation (we reconstructed this earlier)
    prompt = build_rag_prompt_for_lyric_popularity(
        user_lyric=user_lyric,
        neighbors=neighbors,
        predicted_popularity=pred_pop,
        shap_summary=shap_grouped
    )

    # 7) call LLM for explanation ONLY
    llm_output = call_llm_for_popularity_and_explanation(prompt)

    explanation = llm_output.get("explanation", "")

    return {
        "predicted_popularity": pred_pop,
        "explanation": explanation,
        "neighbors_used": neighbors,
        "prompt_sent": prompt,
        "raw_llm_output": llm_output,
    }


# Now we can test the system

In [100]:
import textwrap

test_lyric = "im not cute anymore"
result = rag_lyric_popularity_system(df, test_lyric, k_neighbors=3)

print("Predicted popularity:", result["predicted_popularity"])
print("\nExplanation:\n", textwrap.fill(result["explanation"], width=125))


Predicted popularity: 68.35083016646738

Explanation:
 The lyric 'im not cute anymore' shares thematic elements with the retrieved neighbors, particularly in its exploration of
self-perception and identity. Neighbor 1, 'LIKEY' by TWICE, discusses themes of self-image and the desire to be perceived as
attractive, with lines like '자꾸 드러내고 싶지' (I want to show off). Similarly, Neighbor 2, 'Mateo' by Tove Lo, touches on feelings
of inadequacy and the pressure to fit in, as seen in 'I act so cool, but that's not me.' These thematic overlaps suggest that
the new lyric resonates with popular topics in contemporary music, contributing to its predicted popularity score.  In terms
of audio features, the new lyric's predicted popularity is influenced by its similarity to the audio characteristics of the
neighbors. Neighbor 1 has a bright and energetic sound, with a high spectral centroid and tempo, which are common in upbeat
pop songs. Neighbor 2, while slightly less energetic, still maintains a l

## Trems lyrics

In [101]:
trem_lyrics = "i'll take what you say. the wrong way on purpose. just to make me think. someone's paying attention. i'll take what you think. and sink under the surface. just to make me fee. like i'm worthy of viewing. delusional no. its desperate thinking. illiterate no. i just can’t read you. all that i do. is wait for you to notice. all that i get. is nothing short of. it's not what i say"

result = rag_lyric_popularity_system(df, trem_lyrics, k_neighbors=3)

print("Predicted popularity:", result["predicted_popularity"])
print("\nExplanation:\n", textwrap.fill(result["explanation"], width=125))

Predicted popularity: 67.6863339419813

Explanation:
 The new lyric shares thematic elements with its neighbors, particularly in its introspective and emotional tone. For
instance, Neighbor 1, 'So What' by BTS, features lines about dealing with internal struggles and seeking validation, similar
to the new lyric's exploration of feeling 'worthy of viewing' and 'desperate thinking.' Neighbor 2, 'Right Here' by Staind,
also deals with themes of waiting and seeking attention, as seen in the repeated phrase 'right here waiting.' These thematic
similarities suggest a resonance with listeners who appreciate introspective and emotionally charged lyrics.  In terms of
audio features, the new lyric's predicted popularity is influenced by its similarity to songs with certain sonic
characteristics. Neighbor 1 has a bright and energetic sound, with a high spectral centroid and tempo, which often correlates
with engaging and lively tracks. Neighbor 2, while having a lower tempo, shares a similar spec

In [102]:
trem_lyrics = "Feels like im running around. Feel like im running 'round again. Don't know how Or where it started. Feel like im goin crazy. Outta my mind. Outta space. Outta time. By the time I survive. I don't wanna move. I don't wanna stay. And either way. It always ends in heartbreak. I don't wanna be here anyway. You make it hard You make it hard for me to Stay why don't 'Cha give it up for me. And I just wanna feel Something. Stuck in a rut. never knowing what. I'm outta touch. Mmm mmm. I'm outta luck. Save me now. I'm outta touch. And I don't even want touch this I don't wanna do anything more. Anything much. I betcha know. I betcha know. I betcha know. I…I.  think I'd start over again. I think I'd start over again. I think I'd start over again oh. I think I'd start over again. I would I will if I could then I should.Start over. It's my time. My life. It's my right. My crime.I 'm alright. No I'm fine I'm fine I'm fine! "

result = rag_lyric_popularity_system(df, trem_lyrics, k_neighbors=3)

print("Predicted popularity:", result["predicted_popularity"])
print("\nExplanation:\n", textwrap.fill(result["explanation"], width=125))

Predicted popularity: 50.53201421975274

Explanation:
 The new lyric shares thematic elements with its closest neighbors, particularly the feelings of confusion and desire for
change. For instance, Neighbor 1, 'No Way Out' by Bullet for My Valentine, expresses a struggle with internal thoughts and a
desire to escape, similar to the lines 'Feel like im goin crazy. Outta my mind.' Neighbor 2, 'Glass' by Nekoi, also conveys a
sense of being lost and tired of feeling stuck, which resonates with 'Stuck in a rut. never knowing what.' These thematic
similarities contribute to the predicted popularity score by aligning with common emotional narratives in popular music.  In
terms of audio features, the new lyric is predicted to have a moderate tempo and energy level, akin to Neighbor 1, which has
a tempo of 112.5 BPM and a relatively high energy level. The spectral features such as spectral centroid and zero-crossing
rate suggest a bright and dynamic sound, similar to the audio characteristics 