<h1 style="color:rgb(0,120,170)">344.038, KV Multimedia Search and Retrieval (WS2023/24)</h1>
<h2 style="color:rgb(0,120,170)">Task 1_Group B</h2>

| First Name | Family Name  | Matr.Nr   |
|:-----------|:-------------|:----------|
| Harald     | Eibensteiner | K01300179 |
| Sanaei     | Moghaddam    | K11733444 |
| Lukas      | Troyer       | K12006666 |
| Lukas      | Wagner       | K01357626 |
| Branko     | Paunović     | K12046370 |

### Load Data & Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
np.random.seed(42)

import pandas as pd

from song import Song, songs
from retrieval import Retrieval, SimilarityMeasure

pd.set_option("display.expand_frame_repr", False) # prevent pandas from wrapping cells

In [3]:
# Number of similar songs to retrieve
N = 10

In [4]:
#
# Example:
#   - Title: Letterman
#   - Artist: Wiz Khalifa
#
query, query_row_id = songs.prompt_song_query()


### Random Baseline

In [5]:
results = Retrieval(n=N).random_baseline(query)

print("\n\n", "random_baseline() retrieval based on your query: ", "\n\n", results)



 random_baseline() retrieval based on your query:  

                                                     song                artist
6254                                     Beyond the Down   Black Label Society
568                                               Motion           Boy Harsher
8269                                     Hole In My Soul         Kaiser Chiefs
5951                                        Sans Logique         Mylène Farmer
10070                                             Mirror            Kat Dahlia
33                                                 Flash  Cigarettes After Sex
576    Long Cool Woman (In A Black Dress) - 1999 Rema...           The Hollies
1512                                     Natural Harmony             The Byrds
7855                                               Candy          Paolo Nutini
532                                            You Don't                 GFOTY
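
A minimal sketch of what a random baseline of this kind might do, assuming it draws `N` distinct songs uniformly at random and excludes the query. The function name, the toy catalogue, and the use of `np.random.default_rng` are illustrative assumptions. The actual `Retrieval.random_baseline` implementation is not shown in this notebook and may differ (the notebook itself seeds the global `np.random` state instead).

```python
import numpy as np


def random_baseline(song_ids, query_id, n, rng):
    """Return n distinct song ids drawn uniformly at random, excluding the query."""
    candidates = [s for s in song_ids if s != query_id]
    # Sample n positions without replacement so that no song appears twice.
    picked = rng.choice(len(candidates), size=n, replace=False)
    return [candidates[i] for i in picked]


# Hypothetical catalogue of 100 songs.
catalogue = [f"song_{i}" for i in range(100)]

# Two generators created with the same seed yield the same "random" result list.
first = random_baseline(catalogue, "song_7", 10, np.random.default_rng(42))
second = random_baseline(catalogue, "song_7", 10, np.random.default_rng(42))
print(first == second)
```

Fixing the seed is what makes such a baseline reproducible between runs.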


###  Text-based(cos-sim, tf-idf)

In [6]:
# Perform tf_idf retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)

print("\n\n", "cosine_similarity() on tf_idf representations based on your query: ", "\n\n", results)



 cosine_similarity() on tf_idf representations based on your query:  

                  id  similarity                                    song          artist
9  zmukOoB01ASq2Y6r    0.474141  California And The Slipping Of The Sun        Gorillaz
8  qtI2udJpwde94hTQ    0.403079                           Open the Door    Otis Redding
3  K7amDTeZDJg7Ozxw    0.377162                                One Foot         Airways
7  pSOYlgWNzpZWpN9j    0.362179                       At Last I Am Free    Robert Wyatt
5  YT1ZsHvpwaukvRQv    0.344942                    San Francisco Street         Sun Rai
0  981Lm983YaJcbQV4    0.344636                              Street Joy     White Denim
2  EJBdc9p2QSyFT2Be    0.337130              Let It Go - Single Version     Demi Lovato
6  lTJVFhGsiLggKCEq    0.328499                     I Let Him Get to Me  Beat Happening
4  NCONeOaMBitsyTJt    0.318821                          Let It Will Be         Madonna
1  B1w512ol1loTbSBL    0.302504               
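
The ranking step behind the cosine-similarity variants can be sketched as follows. This assumes that every song is one row of a dense feature matrix (tf-idf, BERT, or word2vec). `top_n_cosine` and the toy matrix are hypothetical and are not taken from the project's `retrieval` module.

```python
import numpy as np


def top_n_cosine(features, query_idx, n):
    """Rank all rows by cosine similarity to the query row and return the top n."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Normalise each row to unit length; the clip guards against all-zero rows.
    unit = features / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    # Exclude the query itself from the ranking.
    sims[query_idx] = -np.inf
    order = np.argsort(-sims)[:n]
    return [(int(i), float(sims[i])) for i in order]


# Toy feature matrix: row 1 points in the same direction as row 0,
# row 2 is orthogonal to row 0, and row 3 overlaps with it partially.
features = np.array([
    [1.0, 0.0, 1.0],
    [2.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
ranked = top_n_cosine(features, query_idx=0, n=2)
print(ranked)
```

Because the rows are normalised first, vector length does not matter: row 1 is twice as long as the query but still ranks first.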

###  Text-based(cos-sim, BERT)

In [7]:
# Perform BERT retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)

# Print the list of similar songs
print("\n\n", "cosine_similarity() on BERT representations based on your query: ", "\n\n", results)



 cosine_similarity() on BERT representations based on your query:  

                  id  similarity                           song            artist
0  5S5J81wX2tJKJnUF    0.736194                        Mansion         lil skies
7  iiUveYG1BM5KROVd    0.728700               Lights Turned On  Childish Gambino
1  BsztDgeUHXH5xMdG    0.725406  I Want It All (feat. Mack 10)          Warren G
6  dDNB0NTVZ9G5ENNX    0.725316                        Ballin'        Chief Keef
5  Q837q2LPqffyecBB    0.724774                    I Get Money           50 Cent
2  F2e2v4hi3OplM8Vw    0.723561                         STAINS      BROCKHAMPTON
9  wL6STNKK7Q0thTFC    0.722057                  Barbie Dreams       Nicki Minaj
8  sGD0zOgtbcrwZZO6    0.716462                     1997 DIANA      BROCKHAMPTON
3  M65mU1UIozrDxcvu    0.715846                      Fast Lane    Bad Meets Evil
4  OD6sfVyohxUZAO36    0.715089                       Top Back              T.I.


###  Text-based(cos-sim, word2vec)

In [8]:
# Perform word2vec retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)

# Print the list of similar songs
print("\n\n", "List of songs that share similarities with the song you queried: ", "\n\n", results)



 List of songs that share similarities with the song you queried:  

                  id  similarity                    song          artist
3  WP4RkO51CCZnKBlQ    0.905416             Wyclef Jean      Young Thug
0  CfCJl3HWjZCvaTv3    0.901050     Let Me Blow Ya Mind             Eve
8  pK2EZxyujhM6yp7Z    0.899793                  W.T.P.          Eminem
2  Q8HUKzo5muVvgIvP    0.896035        Marshall Mathers          Eminem
9  qtI2udJpwde94hTQ    0.894491           Open the Door    Otis Redding
4  dVLg34HafHN2B9lW    0.889872             Candy Paint     Post Malone
6  lPAW2IulFyjZtuaH    0.889726  can't leave without it       21 Savage
1  PoTQ9felmQHjpYDi    0.887894                Lighters  Bad Meets Evil
7  mgjDCaqqLfxz1070    0.887509             Drug Ballad          Eminem
5  gmFBqh7nAv05B4zE    0.887238                  Swords          M.I.A.


## Qualitative Analysis

In [9]:
#
# Reset the random seed once more to make reproducibility possible with the Lab Report
# Only (re-)done here once, and not for each song.
#
np.random.seed(42)

### \#1: _Waka Waka (This Time for Africa)_ by _Shakira_
#### 1.1. Random baseline

In [10]:
query, query_row_id = songs.get_match(
    Song(title="Waka Waka (This Time for Africa)", artist="Shakira")
)
songs.info.loc[Retrieval(n=N).random_baseline(query).index, songs.info.columns != "id"]

|  | artist | song | album_name |
|:--|:--|:--|:--|
| 6254 | Black Label Society | Beyond the Down | Catacombs of the Black Vatican (Deluxe) |
| 568 | Boy Harsher | Motion | Country Girl (Extended Version) |
| 8269 | Kaiser Chiefs | Hole In My Soul | Stay Together |
| 5951 | Mylène Farmer | Sans Logique | Ainsi Soit Je |
| 10070 | Kat Dahlia | Mirror | My Garden |
| 33 | Cigarettes After Sex | Flash | Cigarettes After Sex |
| 576 | The Hollies | Long Cool Woman (In A Black Dress) - 1999 Rema... | Distant Light [1999 - Remaster] (1999 Remaster... |
| 1513 | Elton John | Midnight Creeper | Don't Shoot Me I'm Only The Piano Player |
| 7855 | Paolo Nutini | Candy | Sunny Side Up |
| 532 | GFOTY | You Don't | Call Him A Doctor |


As expected, the Random Baseline algorithm returns completely random results. There does not seem to be a common theme.

#### 1.2. Text-based(cos-sim, tf-idf)

In [11]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 6 | 0.942619 | Under My Umbrella | Margo Guryan | Yellow House |
| 0 | 0.872472 | Teenage Love Affair | Alicia Keys | We As Human |
| 7 | 0.848522 | Blame It on the Boom Boom | Black Stone Cherry | Year Of The Gentleman (Bonus Track Edition) |
| 4 | 0.797242 | Charlie Brown | Benito Di Paula | The Best of Laura Pausini - E Ritorno Da Te |
| 5 | 0.646342 | Walpurgisnacht | Faun | Oral Fixation, Vol. 2 (Expanded Edition) |
| 8 | 0.63878 | Barco a Venus | Mecano | 24 Hour Revenge Therapy (Remastered) |
| 2 | 0.611976 | Mariô | Criolo | In Our Bones |
| 1 | 0.605905 | Dirt | Alice in Chains | Life After Death (Remastered Edition) |
| 9 | 0.568361 | Shine Ya Light | Rita Ora | Michael Bublé (US Version) |
| 3 | 0.53645 | Auld Lang Syne (The New Year's Anthem) | Mariah Carey | Trance - The Early Years (1997-2002) |

_Under My Umbrella_ is also retrieved by word2vec and seems to share many words with the query. The same goes for _Blame It on the Boom Boom_. _Charlie Brown_ is Portuguese, _Walpurgisnacht_ is German, and _Barco a Venus_ is Spanish. _Mariô_ is Portuguese (and also retrieved by BERT). _Dirt_ has many short words like “ah”.
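
The effect of repeated interjections on tf-idf similarity can be illustrated with a small, self-contained sketch. The three "lyrics" below are invented toy lines, not real song texts. The smoothed idf formula is one common variant and is an assumption here, since the notebook does not show how the project's tf-idf features were computed.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) with a smoothed idf term."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each token occur?
    df = Counter(t for toks in tokenised for t in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "oh oh eh eh this time for us",          # query-like toy text
    "oh oh oh eh eh baby dance tonight",     # shares only the interjections
    "the river runs through quiet valleys",  # shares no token with the query
]
v_query, v_interjections, v_unrelated = tf_idf_vectors(docs)
sim_interjections = cosine(v_query, v_interjections)
sim_unrelated = cosine(v_query, v_unrelated)
print(sim_interjections, sim_unrelated)
```

In this toy setting the second text scores well above the third purely because of the repeated “oh” and “eh” tokens, even though it has no thematic link to the query.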

#### 1.3. Text-based(cos-sim, BERT)

In [12]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 9 | 0.648514 | Sol da Liberdade | Daniela Mercury | Michael Bublé (US Version) |
| 2 | 0.629005 | El Vals del Obrero | Ska-P | In Our Bones |
| 5 | 0.617841 | Breakin'...There's No Stopping Us | Ollie & Jerry | Oral Fixation, Vol. 2 (Expanded Edition) |
| 0 | 0.617587 | BEAUTIFUL HANGOVER | Bigbang | We As Human |
| 7 | 0.610629 | Auf Anderen Wegen | Andreas Bourani | Year Of The Gentleman (Bonus Track Edition) |
| 8 | 0.607511 | Feel Good Inc. | Gorillaz | 24 Hour Revenge Therapy (Remastered) |
| 6 | 0.606259 | VIVID | BROCKHAMPTON | Yellow House |
| 1 | 0.603091 | The Bomb | Pigeon John | Life After Death (Remastered Edition) |
| 4 | 0.601648 | Mariô | Criolo | The Best of Laura Pausini - E Ritorno Da Te |
| 3 | 0.601368 | Free Me | Joss Stone | Trance - The Early Years (1997-2002) |
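
The `album_name` column in the result tables of this notebook shows the same ten albums for every query, always attached to the same index labels (for example, label 6 is paired with _Yellow House_ in both 1.2 and 1.3). This suggests that `songs.info["album_name"].loc[results.index]` looks up rows 0 to 9 of `songs.info` rather than the retrieved songs. A possible alternative is sketched below, under the assumption that both frames share an `id` column. The toy frames are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical stand-in for songs.info.
info = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "song": ["Song A", "Song B", "Song C", "Song D"],
    "album_name": ["Album A", "Album B", "Album C", "Album D"],
})

# Hypothetical retrieval result whose index holds rank labels, not catalogue rows.
results = pd.DataFrame({"id": ["d4", "b2"], "similarity": [0.9, 0.8]})

# Label-based lookup on results.index (0, 1) picks the first rows of info.
by_position = info["album_name"].loc[results.index].tolist()

# Lookup through the shared "id" column picks the albums of the retrieved songs.
album_by_id = info.set_index("id")["album_name"]
by_id = results["id"].map(album_by_id).tolist()

print(by_position)
print(by_id)
```

Mapping through `id` keeps the album attached to the retrieved song regardless of how the result frame is indexed.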


_Sol da Liberdade_ is Portuguese, _El Vals del Obrero_ is Spanish. _Breakin'...There's No Stopping Us_ is about achieving dreams. _BEAUTIFUL HANGOVER_ is Japanese, _Auf Anderen Wegen_ is German.

#### 1.4. Text-based(cos-sim, word2vec)

In [13]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 9 | 0.861111 | Blame It on the Boom Boom | Black Stone Cherry | Michael Bublé (US Version) |
| 5 | 0.847494 | Metaphors | San Cisco | Oral Fixation, Vol. 2 (Expanded Edition) |
| 7 | 0.847086 | Under My Umbrella | Margo Guryan | Year Of The Gentleman (Bonus Track Edition) |
| 4 | 0.842948 | Baby's on Fire | Die Antwoord | The Best of Laura Pausini - E Ritorno Da Te |
| 8 | 0.842721 | Bamboreea | Inna | 24 Hour Revenge Therapy (Remastered) |
| 3 | 0.8408 | Royal | Waterparks | Trance - The Early Years (1997-2002) |
| 0 | 0.839828 | God Lives Through | A Tribe Called Quest | We As Human |
| 6 | 0.839047 | Kiss This | The Struts | Yellow House |
| 1 | 0.838812 | Patterns | Simon & Garfunkel | Life After Death (Remastered Edition) |
| 2 | 0.838438 | Sorry - Latino Remix | Justin Bieber | In Our Bones |


_Blame It on the Boom Boom_ and _Metaphors_ share many words with _Waka Waka_, such as “eh”, “oh”, and “yeah”, but neither the genre nor the theme is similar. _Bamboreea_ is also about sports. _Royal_ again contains many “oh”s and similar interjections, and _God Lives Through_ contains many “la”s.
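
The overlap between the three text-based result lists for _Waka Waka_ can be checked mechanically. The sketch below copies the song titles from the tables in 1.2, 1.3, and 1.4. The helper `overlaps` is a hypothetical name and not part of the project code.

```python
from itertools import combinations

# Top-10 titles per feature, copied from the result tables for song #1.
top10 = {
    "tf_idf": {
        "Under My Umbrella", "Teenage Love Affair", "Blame It on the Boom Boom",
        "Charlie Brown", "Walpurgisnacht", "Barco a Venus", "Mariô", "Dirt",
        "Shine Ya Light", "Auld Lang Syne (The New Year's Anthem)",
    },
    "bert": {
        "Sol da Liberdade", "El Vals del Obrero",
        "Breakin'...There's No Stopping Us", "BEAUTIFUL HANGOVER",
        "Auf Anderen Wegen", "Feel Good Inc.", "VIVID", "The Bomb", "Mariô",
        "Free Me",
    },
    "word2vec": {
        "Blame It on the Boom Boom", "Metaphors", "Under My Umbrella",
        "Baby's on Fire", "Bamboreea", "Royal", "God Lives Through",
        "Kiss This", "Patterns", "Sorry - Latino Remix",
    },
}


def overlaps(result_sets):
    """Return the titles shared by every pair of result sets."""
    return {
        (a, b): result_sets[a] & result_sets[b]
        for a, b in combinations(result_sets, 2)
    }


shared = overlaps(top10)
for pair, titles in shared.items():
    print(pair, sorted(titles))
```

For this query, tf-idf and word2vec share two titles, tf-idf and BERT share one, and BERT and word2vec share none.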

### \#2: _Diamond Heart_ by _Lady Gaga_
#### 2.1. Random baseline

In [14]:
# No seed reset here: the seed was set once above, before song #1
query, query_row_id = songs.get_match(
    Song(title="Diamond Heart", artist="Lady Gaga")
)
songs.info.loc[Retrieval(n=N).random_baseline(query).index, songs.info.columns != "id"]

|  | artist | song | album_name |
|:--|:--|:--|:--|
| 3463 | Waterparks | Royal | Double Dare |
| 8079 | Sparks | My Other Voice | No.1 In Heaven |
| 1607 | Kasabian | Neon Noon | Velociraptor! |
| 93 | 4minute | Cold Rain | Crazy |
| 2033 | Live | All Over You | Throwing Copper |
| 1115 | Haken | 1985 | Affinity |
| 7919 | 2Pac | I Get Around | Strictly 4 My N.I.G.G.A.Z... |
| 1713 | Chad VanGaalen | Rabid Bits of Time | Soft Airplane |
| 5023 | Editors | Sugar | The Weight of Your Love |
| 9058 | Demi Lovato | For You | Tell Me You Love Me (Deluxe) |


As expected, the Random Baseline algorithm returns completely random results. There does not seem to be a common theme.

#### 2.2. Text-based(cos-sim, tf-idf)

In [15]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 5 | 0.480732 | Dimestore Diamond | Gossip | Oral Fixation, Vol. 2 (Expanded Edition) |
| 2 | 0.450549 | Diamonds Are Forever | Shirley Bassey | In Our Bones |
| 7 | 0.422778 | Gloe | Kiiara | Year Of The Gentleman (Bonus Track Edition) |
| 9 | 0.394226 | Diamond Ring | Bon Jovi | Michael Bublé (US Version) |
| 4 | 0.3911 | Dear Diamond | Miranda Lambert | The Best of Laura Pausini - E Ritorno Da Te |
| 6 | 0.318929 | See Saw | Devendra Banhart | Yellow House |
| 3 | 0.317348 | Diamond in the Rough | The Nearly Deads | Trance - The Early Years (1997-2002) |
| 8 | 0.297933 | Wild Child | Ace Wilder | 24 Hour Revenge Therapy (Remastered) |
| 1 | 0.277761 | Wild Wild Life | Talking Heads | Life After Death (Remastered Edition) |
| 0 | 0.275118 | First to Say Goodnight | Oh Land | We As Human |


- "Dimestore Diamond" by Gossip (Similarity: 0.481): This song's inclusion could be due to the word "diamond" in the title, which is a thematic similarity with Lady Gaga's "Diamond Heart." Additionally, Gossip's music is related to the alternative rock and punk genres.
- "Diamonds Are Forever" by Shirley Bassey (Similarity: 0.451): While Shirley Bassey's song title contains the word "diamonds," it doesn't have a direct connection to Lady Gaga's "Diamond Heart." The inclusion could be due to the thematic similarity and genre relevance, as both are in the pop and adult contemporary genres.
- "Diamond Ring" by Bon Jovi (Similarity: 0.394): The word "diamond" in the title may be a reason for its inclusion, as it relates thematically to "Diamond Heart." Bon Jovi's music falls within the rock genre.
- "Dear Diamond" by Miranda Lambert (Similarity: 0.391): The word "diamond" in the title could be the primary reason for its inclusion. Miranda Lambert's music is related to country, which is a different genre from Lady Gaga's pop.
- "Diamond in the Rough" by The Nearly Deads (Similarity: 0.317): The word "diamond" in the title is likely the reason for its inclusion, but there may not be strong thematic or genre relevance.

In this list, there are varying degrees of similarity, with some songs possibly included due to shared words in the titles or potential thematic connections. However, the connection to Lady Gaga's "Diamond Heart" remains somewhat indirect.

#### 2.3. Text-based(cos-sim, BERT)

In [16]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 4 | 0.712682 | Love Me Like You | Little Mix | The Best of Laura Pausini - E Ritorno Da Te |
| 6 | 0.709689 | Loyal | Chris Brown | Yellow House |
| 2 | 0.706896 | Lifeline | Jamiroquai | In Our Bones |
| 0 | 0.700668 | Moments | Tove Lo | We As Human |
| 9 | 0.700466 | Misfit Love | Queens of the Stone Age | Michael Bublé (US Version) |
| 5 | 0.700023 | Gloe | Kiiara | Oral Fixation, Vol. 2 (Expanded Edition) |
| 3 | 0.69246 | Bing Bing | Crayon Pop | Trance - The Early Years (1997-2002) |
| 8 | 0.688697 | A Little Party Never Killed Nobody (All We Got) | Fergie, Q-Tip & GoonRock | 24 Hour Revenge Therapy (Remastered) |
| 7 | 0.687543 | Just Like Jesse James | Cher | Year Of The Gentleman (Bonus Track Edition) |
| 1 | 0.687287 | Pretty Girl - Cheat Codes X Cade Remix | Maggie Lindemann | Life After Death (Remastered Edition) |


- "Moments" by Tove Lo (Similarity: 0.701): The word "moments" could be a reason for its inclusion, as songs often capture specific moments or memories, similar to the concept in "Diamond Heart."
- "Misfit Love" by Queens of the Stone Age (Similarity: 0.700): While the title is different, the inclusion could be related to the concept of love, as it appears in both songs.
- "Just Like Jesse James" by Cher (Similarity: 0.688): The inclusion may be related to the idea of romance and relationships, which is a theme in many songs, including Lady Gaga's.
- "Pretty Girl - Cheat Codes X Cade Remix" by Maggie Lindemann (Similarity: 0.687): The inclusion might be based on the thematic connection of relationships or the word "pretty" in the title, which can be related to Lady Gaga's themes and style.

In this list, the connections to Lady Gaga's "Diamond Heart" are not immediately clear, and the songs may be included based on thematic or lyrical associations rather than direct artist or genre connections.

#### 2.4. Text-based(cos-sim, word2vec)

In [17]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 1 | 0.903783 | Till The Right Man Comes Along | Tina Turner | Life After Death (Remastered Edition) |
| 3 | 0.903415 | I'm On One | DJ Khaled | Trance - The Early Years (1997-2002) |
| 8 | 0.898276 | ***Flawless | Beyoncé | 24 Hour Revenge Therapy (Remastered) |
| 4 | 0.898103 | Lighters | Bad Meets Evil | The Best of Laura Pausini - E Ritorno Da Te |
| 0 | 0.898078 | Countdown | Beyoncé | We As Human |
| 6 | 0.898065 | We Made You | Eminem | Yellow House |
| 9 | 0.897623 | Gettin' Up | Q-Tip | Michael Bublé (US Version) |
| 7 | 0.897282 | Love Somebody | Rick Springfield | Year Of The Gentleman (Bonus Track Edition) |
| 2 | 0.897209 | So Much Better | Eminem | In Our Bones |
| 5 | 0.89658 | Mr. Right | A Rocket to the Moon | Oral Fixation, Vol. 2 (Expanded Edition) |


- "Till The Right Man Comes Along" by Tina Turner (Similarity: 0.904): This song's inclusion may be related to the concept of waiting for the right person, which could be thematically connected to "Diamond Heart."
- "***Flawless" by Beyoncé (Similarity: 0.898): The inclusion might be related to the word "***Flawless" and the idea of self-confidence, which is a theme found in both songs.
- "Mr. Right" by A Rocket to the Moon (Similarity: 0.897): This song's inclusion could be based on the theme of finding the right person or being with the perfect partner, which can be thematically related to "Diamond Heart."

In this list, the connections to Lady Gaga's "Diamond Heart" are not immediately clear, and the songs may be included based on thematic or lyrical associations rather than direct artist or genre connections.

### \#3: _Wake Me Up When September Ends_ by _Green Day_
#### 3.1. Random baseline

In [18]:
query, query_row_id = songs.get_match(
    Song(title="Wake Me Up When September Ends", artist="Green Day")
)
songs.info.loc[Retrieval(n=N).random_baseline(query).index, songs.info.columns != "id"]

|  | artist | song | album_name |
|:--|:--|:--|:--|
| 165 | Amy Winehouse | Take the Box | Frank |
| 2276 | Snowmine | The Hill | Laminate Pet Animal |
| 1919 | Katatonia | July | The Great Cold Distance (10th Anniversary Edit... |
| 901 | The Sound | Sense of Purpose | From The Lion's Mouth |
| 7499 | fromis_9 | Think of You | To. Day |
| 1073 | The Jesus and Mary Chain | About You | Darklands (Expanded Version) |
| 8719 | Rita Ora | Velvet Rope | Velvet Rope |
| 6150 | Karen Dalton | How Sweet It Is | In My Own Time |
| 9575 | Rammstein | Adios | Mutter |
| 2715 | Wiz Khalifa | Letterman | Laugh Now, Fly Later |


As expected, the Random Baseline algorithm returns completely random results. There does not seem to be a common theme.

#### 3.2. Text-based(cos-sim, tf-idf)

In [19]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 1 | 0.535697 | Wake Up | Emigrate | Life After Death (Remastered Edition) |
| 8 | 0.521682 | Wake Up (Make a Move) | Lostprophets | 24 Hour Revenge Therapy (Remastered) |
| 3 | 0.501407 | Open Your Eyes | Goldfinger | Trance - The Early Years (1997-2002) |
| 4 | 0.482077 | All the King's Men | The Rigs | The Best of Laura Pausini - E Ritorno Da Te |
| 6 | 0.421284 | Wasteland | Against the Current | Yellow House |
| 2 | 0.41167 | End of All Hope | Nightwish | In Our Bones |
| 7 | 0.404016 | Bring Me to Life | Evanescence | Year Of The Gentleman (Bonus Track Edition) |
| 5 | 0.403344 | Black Moon | Cellar Darling | Oral Fixation, Vol. 2 (Expanded Edition) |
| 9 | 0.397533 | Wake up and Live | Youth of Today | Michael Bublé (US Version) |
| 0 | 0.391455 | I Want to Wake Up | Pet Shop Boys | We As Human |


- "Wake Up (Make a Move)" by Lostprophets (Similarity: 0.522): This song's title contains "Wake Up," which also appears in the title of Green Day's song. The shared title words and the song's relevance to the rock genre may have led to its inclusion.
- "Wake up and Live" by Youth of Today (Similarity: 0.398): This song's title directly contains "Wake up," similar to Green Day's song. The band's music is more in the realm of hardcore punk, which is distinct from Green Day's style.
- "I Want to Wake Up" by Pet Shop Boys (Similarity: 0.391): The song title "I Want to Wake Up" shares the theme of waking up with Green Day's song. However, Pet Shop Boys' music belongs to the synth-pop genre, which is quite different from Green Day's punk rock.

In this list, some songs are included due to thematic similarities in their titles, while others may have genre relevance to Green Day's music. However, some inclusions remain unclear, possibly due to shared words in the titles but differing musical styles.

#### 3.3. Text-based(cos-sim, BERT)

In [20]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 0 | 0.735971 | Fat Old Sun | Pink Floyd | We As Human |
| 4 | 0.724557 | Where Hope and Daylight Die | Summoning | The Best of Laura Pausini - E Ritorno Da Te |
| 8 | 0.720513 | Déjate Caer | Los Tres | 24 Hour Revenge Therapy (Remastered) |
| 5 | 0.719888 | 4 Seasons Of Loneliness | Boyz II Men | Oral Fixation, Vol. 2 (Expanded Edition) |
| 3 | 0.716033 | When the Seasons Change | Five Finger Death Punch | Trance - The Early Years (1997-2002) |
| 7 | 0.713933 | Soul of the Sea | Heart | Year Of The Gentleman (Bonus Track Edition) |
| 9 | 0.713904 | Falling Down | Oasis | Michael Bublé (US Version) |
| 2 | 0.712668 | Summer and Lightning | Electric Light Orchestra | In Our Bones |
| 1 | 0.711576 | September in the Rain | Dinah Washington | Life After Death (Remastered Edition) |
| 6 | 0.710427 | Rivers Between Us | Draconian | Yellow House |


- "Fat Old Sun" by Pink Floyd (Similarity: 0.736): Pink Floyd is a well-known rock band, and their music shares some genre relevance with Green Day. The inclusion in the list could be due to this genre similarity.
- "When the Seasons Change" by Five Finger Death Punch (Similarity: 0.716): This song's title directly references the changing of seasons, which is related to time, possibly explaining its inclusion.
- "Summer and Lightning" by Electric Light Orchestra (Similarity: 0.713): Electric Light Orchestra is known for their rock and orchestral elements, and the word "summer" in the title may have led to its inclusion due to the connection with a season.
- "September in the Rain" by Dinah Washington (Similarity: 0.712): This song's title directly references September, which is similar to the title of Green Day's song. Dinah Washington's music is jazz, making its inclusion less obvious.

In this list, the songs seem to be loosely connected by thematic elements related to time, seasons, and potentially genre relevance with Green Day's rock music. However, the inclusion of some songs, especially those from different genres and with little thematic similarity, remains unclear.

#### 3.4. Text-based(cos-sim, word2vec)

In [21]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)
results["album_name"] = songs.info["album_name"].loc[results.index]
results.drop("id", axis=1)

|  | similarity | song | artist | album_name |
|:--|:--|:--|:--|:--|
| 7 | 0.827592 | Children Of The Grounds | Midlake | Year Of The Gentleman (Bonus Track Edition) |
| 1 | 0.823171 | Ghost of Days Gone By | Alter Bridge | Life After Death (Remastered Edition) |
| 6 | 0.819843 | Zakochany Czlowiek | Better Person | Yellow House |
| 5 | 0.819494 | All the King's Men | The Rigs | Oral Fixation, Vol. 2 (Expanded Edition) |
| 0 | 0.817667 | Wake Up | Emigrate | We As Human |
| 9 | 0.812723 | End Come Too Soon | Wild Beasts | Michael Bublé (US Version) |
| 2 | 0.804929 | Before Later Becomes Never | Caliban | In Our Bones |
| 3 | 0.802736 | Long Gone Day | Mad Season | Trance - The Early Years (1997-2002) |
| 4 | 0.800547 | I'm Free (At Last) | Masked Intruder | The Best of Laura Pausini - E Ritorno Da Te |
| 8 | 0.800521 | Song Z | Tracy Ate A Bug | 24 Hour Revenge Therapy (Remastered) |


- "Ghost of Days Gone By" by Alter Bridge (Similarity: 0.823): While Alter Bridge is known for their rock music, the song title doesn't share similarities with Green Day's song. Its inclusion could be due to genre relevance.
- "Before Later Becomes Never" by Caliban (Similarity: 0.805): The song title doesn't directly relate to Green Day's song, but Caliban's music is in a similar genre (metal), which might explain its inclusion.
- "Long Gone Day" by Mad Season (Similarity: 0.803): Mad Season's music is related to alternative rock, which is relevant to Green Day's style. The title may evoke a sense of time, like "Wake Me Up When September Ends."
- "Song Z" by Tracy Ate A Bug (Similarity: 0.801): While the title "Song Z" doesn't directly relate to Green Day's song, the inclusion may be based on artist name similarity or genre relevance.

In this list, there are varying degrees of similarity, with some songs possibly included due to shared words in the titles or genre relevance. However, the connection to "Wake Me Up When September Ends" by Green Day remains tenuous.

### Qualitative Analysis: Review of our Initial Findings

#### Song #1: "Waka Waka (This Time for Africa)" by Shakira:
_Waka Waka (This Time for Africa)_ contains many short words such as “eh”, “oh”, and “ah”. It also mixes several languages, mostly West African and English. This explains most of the songs chosen by the text-based algorithms.

#### Song #2: "Diamond Heart" by Lady Gaga:
In the baseline analysis, there is no common theme or connection between "Diamond Heart" and the listed songs. They are diverse in artist and genre. In the text-based (cos-sim) analysis with TF-IDF, the thematic connection is primarily based on the presence of the word "diamond" in song titles. However, the thematic relevance varies, and some songs are in different genres from Lady Gaga's pop music.

In the BERT and word2vec-based analyses, the songs are included based on thematic or lyrical associations rather than direct artist or genre connections. The thematic connections are diverse, and the inclusion may be due to shared words in the titles.


#### Song #3: "Wake Me Up When September Ends" by Green Day:
In the baseline analysis, there is no common theme or connection between "Wake Me Up When September Ends" and the listed songs. They are diverse in artist and genre.

In the text-based (cos-sim) analysis with TF-IDF, some songs like "Wake Up" by Emigrate and "Wake Up (Make a Move)" by Lostprophets share the word "Wake" with Green Day's song. Their inclusion may be due to thematic similarity and genre relevance to rock music. However, some inclusions remain unclear, possibly due to shared words in the titles but differing musical styles.

In the BERT and word2vec-based analyses, the songs are included based on thematic or genre associations. The thematic connections vary, and some inclusions may be due to shared words in the titles. While some songs have genre relevance, others have less clear connections to Green Day's music.