<h1 style="color:rgb(0,120,170)">344.038, KV Multimedia Search and Retrieval (WS2023/24)</h1>
<h2 style="color:rgb(0,120,170)">Task 1_Group B</h2>

| First Name | Family Name  | Matr.Nr   |
|:-----------|:-------------|:----------|
| Harald     | Eibensteiner | K01300179 |
| Hadi       | Sanaei       | K11733444 |
| Lukas      | Troyer       | K12006666 |
| Lukas      | Wagner       | K01357626 |
| Branko     | Paunović     | K12046370 |

### Load Data & Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
np.random.seed(42)

import pandas as pd

from song import Song, songs
from retrieval import Retrieval, SimilarityMeasure

pd.set_option("display.expand_frame_repr", False) # prevent pandas from wrapping cells

In [3]:
# Number of similar songs to retrieve
N = 10

In [4]:
artists = songs.info["artist"].unique()

In [5]:
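# Write the sorted artist names to a file; UTF-8 because names such as "Mylène Farmer" are non-ASCII
artists = {artist: True for artist in sorted(artists)}
with open("artists.txt", "w", encoding="utf-8") as fp:
    fp.write("\n".join(artists.keys()))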

In [6]:
#
# Example:
#   - Title: Letterman
#   - Artist: Wiz Khalifa
#
query, query_row_id = songs.prompt_song_query()


### Random Baseline

In [7]:
results = Retrieval(n=N).random_baseline(query)

print("random_baseline() retrieval based on your query: ", "\n\n", results)

random_baseline() retrieval based on your query:  

                                                     song                artist
6254                                     Beyond the Down   Black Label Society
568                                               Motion           Boy Harsher
8269                                     Hole In My Soul         Kaiser Chiefs
5951                                        Sans Logique         Mylène Farmer
10070                                             Mirror            Kat Dahlia
33                                                 Flash  Cigarettes After Sex
576    Long Cool Woman (In A Black Dress) - 1999 Rema...           The Hollies
1512                                     Natural Harmony             The Byrds
7855                                               Candy          Paolo Nutini
532                                            You Don't                 GFOTY
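A random baseline simply draws N distinct songs other than the query, which gives a floor that the text-based methods should beat. The following is a minimal sketch assuming the catalogue is a pandas DataFrame with one row per song; `random_baseline_sketch` is a hypothetical helper for illustration, not the implementation of `Retrieval.random_baseline`.

```python
import numpy as np
import pandas as pd


def random_baseline_sketch(catalogue: pd.DataFrame, query_idx: int, n: int, seed: int = 42) -> pd.DataFrame:
    """Return n randomly chosen rows of `catalogue`, excluding the query row (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    candidates = catalogue.index.drop(query_idx)              # never return the query itself
    picked = rng.choice(candidates, size=n, replace=False)    # n distinct row labels
    return catalogue.loc[picked]
```

Excluding the query is a design choice of this sketch; the notebook's own baseline may or may not do so.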


### Text-based (cos-sim, tf-idf)

In [8]:
# Perform tf_idf retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)

print("\n\n", "cosine_similarity() on tf_idf representations based on your query: ", "\n\n", results)



 cosine_similarity() on tf_idf representations based on your query:  

                  id  similarity                                    song          artist
9  zmukOoB01ASq2Y6r    0.474141  California And The Slipping Of The Sun        Gorillaz
8  qtI2udJpwde94hTQ    0.403079                           Open the Door    Otis Redding
3  K7amDTeZDJg7Ozxw    0.377162                                One Foot         Airways
7  pSOYlgWNzpZWpN9j    0.362179                       At Last I Am Free    Robert Wyatt
5  YT1ZsHvpwaukvRQv    0.344942                    San Francisco Street         Sun Rai
0  981Lm983YaJcbQV4    0.344636                              Street Joy     White Denim
2  EJBdc9p2QSyFT2Be    0.337130              Let It Go - Single Version     Demi Lovato
6  lTJVFhGsiLggKCEq    0.328499                     I Let Him Get to Me  Beat Happening
4  NCONeOaMBitsyTJt    0.318821                          Let It Will Be         Madonna
1  B1w512ol1loTbSBL    0.302504               
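tf-idf weighting followed by cosine similarity can be sketched in a few lines of NumPy. This assumes lyrics are already tokenised into lists of words and uses the plain variant idf = log(N / df) with raw term counts; the precomputed `tf_idf` features used by `Retrieval` may use a different weighting or normalisation, and both helper names are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs: list[list[str]]) -> tuple[np.ndarray, list[str]]:
    """Build a (documents x vocabulary) tf-idf matrix from tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    col = {tok: j for j, tok in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))   # document frequency per term
    X = np.zeros((n_docs, len(vocab)))
    for i, doc in enumerate(docs):
        for tok, tf in Counter(doc).items():                # raw term frequency
            X[i, col[tok]] = tf * math.log(n_docs / df[tok])
    return X, vocab


def cosine_top_n(X: np.ndarray, query_idx: int, n: int) -> list[tuple[int, float]]:
    """Rank all other rows of X by cosine similarity to row `query_idx`."""
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0                                  # empty rows get similarity 0 instead of NaN
    sims = (X @ X[query_idx]) / (norms * norms[query_idx])
    order = [i for i in np.argsort(-sims) if i != query_idx][:n]  # skip the query itself
    return [(int(i), float(sims[i])) for i in order]
```

For example, with the documents `[["love", "you", "baby"], ["love", "you", "tonight"], ["war", "guns", "fire"], ["baby", "love"]]`, the closest document to the first one is the fourth, and the third (no shared words) has similarity 0.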

### Text-based (cos-sim, BERT)

In [9]:
# Perform BERT retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)

# Print the list of similar songs
print("\n\n", "cosine_similarity() on BERT representations based on your query: ", "\n\n", results)



 cosine_similarity() on BERT representations based on your query:  

                  id  similarity                           song            artist
0  5S5J81wX2tJKJnUF    0.736194                        Mansion         lil skies
7  iiUveYG1BM5KROVd    0.728700               Lights Turned On  Childish Gambino
1  BsztDgeUHXH5xMdG    0.725406  I Want It All (feat. Mack 10)          Warren G
6  dDNB0NTVZ9G5ENNX    0.725316                        Ballin'        Chief Keef
5  Q837q2LPqffyecBB    0.724774                    I Get Money           50 Cent
2  F2e2v4hi3OplM8Vw    0.723561                         STAINS      BROCKHAMPTON
9  wL6STNKK7Q0thTFC    0.722057                  Barbie Dreams       Nicki Minaj
8  sGD0zOgtbcrwZZO6    0.716462                     1997 DIANA      BROCKHAMPTON
3  M65mU1UIozrDxcvu    0.715846                      Fast Lane    Bad Meets Evil
4  OD6sfVyohxUZAO36    0.715089                       Top Back              T.I.
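BERT and word2vec features are dense vectors. A common way to rank them is to L2-normalise every row once, after which cosine similarity is a plain dot product, and to use `np.argpartition` so that only the N best candidates need to be sorted. This is a sketch assuming a precomputed (songs x dimensions) embedding matrix without all-zero rows; `top_n_dense` is hypothetical and not part of the `Retrieval` class.

```python
import numpy as np


def top_n_dense(embeddings: np.ndarray, query_idx: int, n: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (row indices, cosine similarities) of the n rows most similar to the query row."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # L2-normalise each row
    sims = unit @ unit[query_idx]          # cosine similarity = dot product of unit vectors
    sims[query_idx] = -np.inf              # never retrieve the query itself
    top = np.argpartition(-sims, n)[:n]    # the n best, in arbitrary order, without a full sort
    top = top[np.argsort(-sims[top])]      # order only those n by descending similarity
    return top, sims[top]
```

The partial sort matters little for ten results, but it keeps the cost linear in the catalogue size when the number of songs grows; it requires `n` to be smaller than the number of rows.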


### Text-based (cos-sim, word2vec)

In [10]:
# Perform word2vec retrieval
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)

# Print the list of similar songs
print("\n\n", "cosine_similarity() on word2vec representations based on your query: ", "\n\n", results)



 cosine_similarity() on word2vec representations based on your query:  

                  id  similarity                    song          artist
3  WP4RkO51CCZnKBlQ    0.905416             Wyclef Jean      Young Thug
0  CfCJl3HWjZCvaTv3    0.901050     Let Me Blow Ya Mind             Eve
8  pK2EZxyujhM6yp7Z    0.899793                  W.T.P.          Eminem
2  Q8HUKzo5muVvgIvP    0.896035        Marshall Mathers          Eminem
9  qtI2udJpwde94hTQ    0.894491           Open the Door    Otis Redding
4  dVLg34HafHN2B9lW    0.889872             Candy Paint     Post Malone
6  lPAW2IulFyjZtuaH    0.889726  can't leave without it       21 Savage
1  PoTQ9felmQHjpYDi    0.887894                Lighters  Bad Meets Evil
7  mgjDCaqqLfxz1070    0.887509             Drug Ballad          Eminem
5  gmFBqh7nAv05B4zE    0.887238                  Swords          M.I.A.
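word2vec yields one vector per word, so a song-level vector has to be aggregated from them. A common choice is the mean of the vectors of all in-vocabulary words; whether the notebook's `word2vec` features were built this way is an assumption, since the construction is not shown here. The helper `mean_word_vector` and the toy vectors are hypothetical.

```python
import numpy as np


def mean_word_vector(tokens: list[str], word_vectors: dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the vectors of all in-vocabulary tokens; return a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]  # skip out-of-vocabulary words
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)
```

The resulting song vectors can then be ranked with the same cosine similarity as the other representations.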


In [11]:
# TODO:
# For each query track, qualitatively compare the retrieved tracks
# 1. with the query and with the other tracks in the result list,
# 2. by analyzing whether the list includes tracks by the same artist or of the same genre, and
# 3. by investigating the relevance of the retrieved tracks for the query
#    (i.e., given the query, can you speculate why the tracks in the result list have been retrieved?)
#
# (optional) Use an interactive UI (e.g. Dash) instead of input()
# (optional) Add an "I'm feeling lucky" (random) button when prompted to query a song
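Item 2 of this list can be partly automated. A minimal sketch, assuming a result frame with an `artist` column like the ones printed above; `same_artist_share` is a hypothetical helper, and genre is not covered because no genre column appears in this notebook.

```python
import pandas as pd


def same_artist_share(results: pd.DataFrame, query_artist: str) -> float:
    """Fraction of retrieved tracks whose artist matches the query artist (case-insensitive)."""
    if results.empty:
        return 0.0
    matches = results["artist"].str.casefold() == query_artist.casefold()
    return float(matches.mean())  # mean of a boolean Series = share of True values
```

Case-insensitive matching is used because artist names in the data are not consistently capitalised (e.g. "lil skies").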

## Qualitative Analysis

### \#1: The Mission: Deliverance
#### 1.1. Random baseline

In [12]:
query, query_row_id = songs.get_match(
    Song(title="Deliverance", artist="The Mission")
)
songs.info.loc[Retrieval(n=N).random_baseline(query).index, songs.info.columns != "id"]

|      | artist            | song                | album_name                       |
|:-----|:------------------|:--------------------|:---------------------------------|
| 3462 | The Three Degrees | Dirty Ol' Man       | The Three Degrees                |
| 8078 | Alice in Chains   | Junkhead            | Dirt                             |
| 1606 | Alicia Keys       | How It Feels to Fly | The Element Of Freedom           |
| 93   | 4minute           | Cold Rain           | Crazy                            |
| 2032 | Saint PHNX        | King                | King                             |
| 1114 | Jamiroquai        | Cloud 9             | Automaton                        |
| 7918 | Eminem            | Drug Ballad         | The Marshall Mathers LP          |
| 1712 | Good Charlotte    | Misery              | Good Morning Revival             |
| 5022 | Wolfheart         | Everlasting Fall    | Constellation Of The Black Light |
| 9057 | Saor              | Hearth              | Guardians                        |


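<strong>Comment:</strong> As expected, the random baseline returns tracks unrelated to the query: none of them is by The Mission, and they span unrelated artists and styles, from The Three Degrees and Alicia Keys to Eminem and Wolfheart.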

#### 1.2. Text-based (cos-sim, tf-idf)

In [13]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="tf_idf",
    similarity_measure=SimilarityMeasure.COSINE
)
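# Look up album names via the song id; results.index is only the positional index (0..N-1)
results["album_name"] = results["id"].map(songs.info.set_index("id")["album_name"])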
results.drop("id", axis=1)

|   | similarity | song                                             | artist                  | album_name                                  |
|:--|:-----------|:-------------------------------------------------|:------------------------|:--------------------------------------------|
| 5 | 0.805807   | Give Up                                          | Diana Ross              | Oral Fixation, Vol. 2 (Expanded Edition)    |
| 7 | 0.776143   | You Think I Ain't Worth a Dollar, but I Feel L... | Queens of the Stone Age | Year Of The Gentleman (Bonus Track Edition) |
| 9 | 0.617047   | Scorned                                          | Rawthang                | Michael Bublé (US Version)                  |
| 2 | 0.616081   | Gimme All Your Love                              | Alabama Shakes          | In Our Bones                                |
| 0 | 0.608036   | Te regalo                                        | Carlos Baute            | We As Human                                 |
| 4 | 0.60708    | Give It Away                                     | Red Hot Chili Peppers   | The Best of Laura Pausini - E Ritorno Da Te |
| 3 | 0.59773    | Unspoken                                         | The Ghost Inside        | Trance - The Early Years (1997-2002)        |
| 8 | 0.574476   | Swamp Song                                       | Blur                    | 24 Hour Revenge Therapy (Remastered)        |
| 6 | 0.571491   | Give It Up                                       | Etherwood               | Yellow House                                |
| 1 | 0.555318   | Give Me Love                                     | Ciara                   | Life After Death (Remastered Edition)       |
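In the table above, the `album_name` column does not fit the songs (e.g. "Give It Away" by Red Hot Chili Peppers next to *The Best of Laura Pausini - E Ritorno Da Te*), and in the BERT table below the same index again carries the same album (index 5 is *Oral Fixation, Vol. 2* in both). This is consistent with looking up catalogue rows by the result frame's positional index (0 to N-1) instead of by song `id`. The self-contained illustration below uses made-up data and assumes, as `songs.info.columns != "id"` above suggests, that the catalogue has an `id` column.

```python
import pandas as pd

# Toy catalogue with a default integer index and a string id column (made-up data)
catalogue = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "song": ["S0", "S1", "S2", "S3"],
    "album_name": ["A0", "A1", "A2", "A3"],
})
# A result frame as a retrieval step might return it: positional index 0..n-1,
# holding the ids of catalogue rows 3 and 2
results = pd.DataFrame({"id": ["d4", "c3"], "similarity": [0.9, 0.8]})

# Misaligned: uses the result positions 0 and 1 as catalogue row labels
wrong = catalogue["album_name"].loc[results.index]

# Aligned: maps each retrieved id to its album via an id-indexed Series
album_by_id = catalogue.set_index("id")["album_name"]
results["album_name"] = results["id"].map(album_by_id)
```

Here `wrong` holds the albums of catalogue rows 0 and 1 (`A0`, `A1`), while the id-based lookup yields the albums of the retrieved songs (`A3`, `A2`).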


#### 1.3. Text-based (cos-sim, BERT)

In [14]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="bert",
    similarity_measure=SimilarityMeasure.COSINE
)
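# Look up album names via the song id; results.index is only the positional index (0..N-1)
results["album_name"] = results["id"].map(songs.info.set_index("id")["album_name"])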
results.drop("id", axis=1)

|   | similarity | song              | artist         | album_name                                  |
|:--|:-----------|:------------------|:---------------|:--------------------------------------------|
| 7 | 0.729136   | The Seven Angels  | Avantasia      | Year Of The Gentleman (Bonus Track Edition) |
| 3 | 0.718301   | Hail and Kill     | Manowar        | Trance - The Early Years (1997-2002)        |
| 4 | 0.712816   | Linda Menina      | Rosa de Saron  | The Best of Laura Pausini - E Ritorno Da Te |
| 1 | 0.709174   | Ghost Dance       | Patti Smith    | Life After Death (Remastered Edition)       |
| 0 | 0.704484   | Terremoto         | Eyshila        | We As Human                                 |
| 2 | 0.704345   | Ashes of Eternity | Blind Guardian | In Our Bones                                |
| 8 | 0.703635   | Unchain My Soul   | Dark Funeral   | 24 Hour Revenge Therapy (Remastered)        |
| 9 | 0.69492    | Forget Me Knot    | Serj Tankian   | Michael Bublé (US Version)                  |
| 5 | 0.689081   | Reza              | Elis Regina    | Oral Fixation, Vol. 2 (Expanded Edition)    |
| 6 | 0.688819   | Maid of Lorraine  | Leaves' Eyes   | Yellow House                                |


#### 1.4. Text-based (cos-sim, word2vec)

In [15]:
results = Retrieval(n=N).text_based_similarity(
    song_id=query_row_id,
    feature="word2vec",
    similarity_measure=SimilarityMeasure.COSINE
)
# Look up album names via the song id; results.index is only the positional index (0..N-1)
results["album_name"] = results["id"].map(songs.info.set_index("id")["album_name"])
results.drop("id", axis=1)
