# MySpotify

## Initializing the MySpotify Module

In [7]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [8]:
%reload_ext autoreload
%autoreload 2

DEBUG_MODE = True

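import os

import requests

from MySpotify import MySpotify

# The download link expires, so paste a fresh one here before fetching the data.
if not os.path.exists("data.zip"):
    download_link = ""

    response = requests.get(download_link)
    # Stop early on an HTTP error instead of saving an error page as data.zip.
    response.raise_for_status()
    with open("data.zip", "wb") as file:
        file.write(response.content)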


SpotifyModule = MySpotify("data.zip", DEBUG_MODE)

Unzipping data.zip to /workspaces/MySpotify/data/
Unzipping /workspaces/MySpotify/data/mxm_dataset_train.txt.zip to /workspaces/MySpotify/data/
Unzipping /workspaces/MySpotify/data/train_triplets.txt.zip to /workspaces/MySpotify/data/
All files are present!


[nltk_data] Downloading package wordnet to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
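
The log above shows `data.zip` being unpacked, followed by the archives it contains. How `MySpotify` does this is not shown in the notebook; the following is only a minimal sketch of that two-pass extraction using the standard library, and `extract_nested` is a hypothetical helper name, not part of the module.

```python
import zipfile
from pathlib import Path


def extract_nested(archive_path, target_dir):
    """Extract an archive, then extract any .zip files found inside it.

    Hypothetical helper: handles one level of nesting only.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as outer:
        print(f"Unzipping {archive_path} to {target}/")
        outer.extractall(target)

    # Second pass: unpack inner archives such as train_triplets.txt.zip.
    for inner_path in sorted(target.rglob("*.zip")):
        with zipfile.ZipFile(inner_path) as inner:
            print(f"Unzipping {inner_path} to {inner_path.parent}/")
            inner.extractall(inner_path.parent)

    # Report every file now present under the target directory.
    return sorted(p.name for p in target.rglob("*") if p.is_file())
```

The sketch assumes the outer archive lives outside `target_dir`, as `data.zip` does here; otherwise the second pass would try to unpack it again.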


## Preprocessing Data

### Convert files from CSV to Parquet

In [9]:
SpotifyModule.convert_files()

Reading msd_tagtraum_cd2 file...


Converting msd_tagtraum_cd2: 100%|██████████| 280831/280831 [00:04<00:00, 59093.92it/s] 


msd_tagtraum_cd2 file converted to parquet
Reading train_triplets file...


Converting train_triplets: 100%|██████████| 48373586/48373586 [02:45<00:00, 292708.43it/s]


train_triplets file converted to parquet
Reading unique_tracks file...


Converting unique_tracks: 100%|██████████| 1000000/1000000 [00:03<00:00, 262347.19it/s]


unique_tracks file converted to parquet
Creating a tmp txt file...
Reading mxm_dataset_train file...
Processing Data...


Converting mxm_dataset_train: 100%|██████████| 210519/210519 [05:35<00:00, 627.43it/s] 


Saving Dataframe to parquet...
mxm_dataset_train file converted to parquet
All files converted to parquet!!


### Merge the Data into one parquet file

In [10]:
SpotifyModule.PreProcess_Data()

Finding play count: 100%|██████████| 4838/4838 [01:08<00:00, 70.82it/s]


Play count data prepared successfully!!


Merging Data: 100%|██████████| 100/100 [00:47<00:00,  2.10it/s]


Song Data Merged successfully!!


Merging Genre Data: 100%|██████████| 100/100 [00:39<00:00,  2.54it/s]


Tracks Data Merged successfully!!
Data Sorted successfully!!
play_count.parquet removed!!
Merged_Song_data.parquet removed!!


## Display DataFrame

In [11]:
from IPython.display import HTML, display, clear_output

def display_scrollable_df(df, height=400):
    html = df.to_html()
    scrollable = f'<div style="height: {height}px; overflow: auto;">{html}</div>'
    display(HTML(scrollable))

## Top Tracks

### Top 100 tracks by play count

In [12]:
TopTracksNum = 100
best_tracks = SpotifyModule.get_Top_Tracks(TopTracksNum)
display_scrollable_df(best_tracks)

Unnamed: 0,artist_name,title,play_count
0,Dwight Yoakam,You're The One,726885
1,Björk,Undo,648239
2,Kings Of Leon,Revelry,527893
3,Harmonia,Sehr kosmisch,425463
4,Barry Tuckwell/Academy of St Martin-in-the-Fields/Sir Neville Marriner,Horn Concerto No. 4 in E flat K495: II. Romance (Andante cantabile),389880
5,Florence + The Machine,Dog Days Are Over (Radio Edit),356533
6,OneRepublic,Secrets,292642
7,Five Iron Frenzy,Canada,274627
8,Tub Ring,Invalid,268353
9,Sam Cooke,Ain't Misbehavin,244730
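
The ranking above is ordered by total play count. The internals of `get_Top_Tracks` are not shown, so the following is only a sketch of the underlying idea, assuming listening data arrives as `(user_id, song_id, play_count)` triplets; `top_tracks` and the sample rows are made up for illustration.

```python
from collections import Counter


def top_tracks(triplets, n):
    """Sum play counts per song and return the n most played songs."""
    totals = Counter()
    for _user_id, song_id, plays in triplets:
        totals[song_id] += plays
    return totals.most_common(n)


# Illustrative rows only, not taken from the dataset.
triplets = [
    ("user_a", "song_1", 3),
    ("user_b", "song_1", 2),
    ("user_a", "song_2", 4),
    ("user_c", "song_3", 1),
]
print(top_tracks(triplets, 2))  # [('song_1', 5), ('song_2', 4)]
```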


### Top 100 tracks by genre

In [13]:

TopTracksByGenreNum = 100
genre_list = ["Rock", "Rap", "Jazz", "Electronic", "Pop", "Blues", "Country", "Reggae", "New Age"]
for genre in genre_list:
    clear_output(wait=True)
    best_tracks_by_genre = SpotifyModule.Get_TopTracks_By_Genre(TopTracksByGenreNum, genre)
    display_scrollable_df(best_tracks_by_genre)
    key = input("Press Enter to continue to the next genre, or type something and press Enter to exit: ")
    if key != "":
        break


Unnamed: 0,track_id,title,play_count,majority_genre,index_number
0,TRGXQES128F42BA5EB,Undo,648239,Rock,0
1,TRONYHY128F92C9D11,Revelry,527893,Rock,1
2,TRDMBIJ128F4290431,Sehr kosmisch,425463,Rock,2
3,TROAQBZ128F9326213,Secrets,292642,Rock,3
4,TRIXAZF128F421EE64,Invalid,268353,Rock,4
5,TRIEXMF128F92FDD60,Use Somebody,145725,Rock,5
6,TRRVFSI128EF34A1AE,16 Candles,129069,Rock,6
7,TRENTGL128E0780C8E,Clocks,114362,Rock,7
8,TRIKGRK128E0780DB0,Yellow,109566,Rock,8
9,TRCBRTN12903CC4BD1,The Only Exception (Album Version),103653,Rock,9
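
The per-genre table above keeps only tracks with a given `majority_genre`, orders them by `play_count`, and numbers them from 0 in `index_number`. A minimal sketch of that filter-and-rank step follows; `top_tracks_by_genre` is a hypothetical stand-in for `Get_TopTracks_By_Genre`, whose implementation is not shown, and the Jazz row is invented for the example.

```python
def top_tracks_by_genre(tracks, genre, n):
    """Return the n most played tracks whose majority_genre equals genre."""
    matching = [t for t in tracks if t["majority_genre"] == genre]
    matching.sort(key=lambda t: t["play_count"], reverse=True)
    # index_number is the rank within the genre, starting at 0.
    return [{**t, "index_number": rank} for rank, t in enumerate(matching[:n])]


tracks = [
    {"title": "Revelry", "play_count": 527893, "majority_genre": "Rock"},
    {"title": "Undo", "play_count": 648239, "majority_genre": "Rock"},
    {"title": "Sehr kosmisch", "play_count": 425463, "majority_genre": "Rock"},
    # Hypothetical row, added only to show that other genres are filtered out.
    {"title": "Example Jazz Track", "play_count": 10, "majority_genre": "Jazz"},
]
for row in top_tracks_by_genre(tracks, "Rock", 2):
    print(row["index_number"], row["title"], row["play_count"])
```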


## Collections

### Using Baseline

In [14]:
Collections = ["love", "war","happiness", "loneliness", "money"]
Num_of_tracks = 50
for theme in Collections:
    clear_output(wait=True)
    Baseline = SpotifyModule.Baseline(theme, Num_of_tracks)
    display_scrollable_df(Baseline)
    key = input("Press Enter to continue to the next theme, or type something and press Enter to exit: ")
    if key != "":
        break

Finding words for love theme: 100%|██████████| 5000/5000 [00:00<00:00, 25126.76it/s]
Getting love theme scores: 100%|██████████| 22/22 [00:04<00:00,  4.43it/s]


Unnamed: 0,artist_name,title,play_count,index_number
118,Black Eyed Peas,Imma Be,62438,0
2033,Christina Aguilera,Genie In A Bottle,9065,1
3463,50 Cent,I Get Money,6140,2
4711,Men Without Hats,Safety Dance,4861,3
7449,Fatboy Slim,Gangster Tripping,3326,4
9508,Jill Scott,It's Love,2648,5
10029,Sisters Of Mercy,This Corrosion,2515,6
33961,Lauryn Hill,Interlude 5,664,7
51696,Shaggy,All About Love,374,8
61096,All Saints,Love Is Love,290,9
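
The notebook does not show how `Baseline` scores a track for a theme. One plausible baseline, sketched here purely as an assumption, is to measure how much of a track's bag-of-words lyrics falls inside a set of theme-related words. The names `theme_score` and `rank_by_theme`, the word set, and the lyric counts are all hypothetical.

```python
def theme_score(bag_of_words, theme_words):
    """Fraction of a track's word occurrences that belong to the theme."""
    total = sum(bag_of_words.values())
    if total == 0:
        return 0.0
    hits = sum(count for word, count in bag_of_words.items() if word in theme_words)
    return hits / total


def rank_by_theme(lyrics, theme_words, n):
    """Return the n (track, score) pairs with the highest theme score."""
    scored = [(track, theme_score(bag, theme_words)) for track, bag in lyrics.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical theme vocabulary and lyric counts.
love_words = {"love", "heart", "kiss"}
lyrics = {
    "track_a": {"love": 4, "you": 4},
    "track_b": {"money": 3, "love": 1},
}
print(rank_by_theme(lyrics, love_words, 2))  # [('track_a', 0.5), ('track_b', 0.25)]
```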


### Using Word2Vec

In [15]:
Collections = ["love", "war","happiness", "loneliness", "money"]
Num_of_tracks = 50
for theme in Collections:
    clear_output(wait=True)
    Word2Vec = SpotifyModule.Word2Vec(theme, Num_of_tracks)
    display_scrollable_df(Word2Vec)
    key = input("Press Enter to continue to the next theme, or type something and press Enter to exit: ")
    if key != "":
        break

Getting love theme scores: 100%|██████████| 22/22 [00:04<00:00,  5.12it/s]


Unnamed: 0,artist_name,title,play_count,index_number
587,The B-52's,Love Shack,20226,0
7449,Fatboy Slim,Gangster Tripping,3326,1
9508,Jill Scott,It's Love,2648,2
25971,2Pac,Lord Knows,921,3
26294,Sohodolls,Bang Bang Bang Bang,908,4
32062,Ashanti,VooDoo,714,5
33961,Lauryn Hill,Interlude 5,664,6
35112,Karyn White,The Way You Love Me (LP Version),636,7
51696,Shaggy,All About Love,374,8
61096,All Saints,Love Is Love,290,9
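
A common way to use word embeddings for this kind of theme search is to average the vectors of a track's words and compare the result with the theme word's vector by cosine similarity. Whether `SpotifyModule.Word2Vec` works this way is not shown, so treat the following as a sketch under that assumption; the two-dimensional embeddings are toy values, not output of a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lyric_vector(bag_of_words, embeddings):
    """Count-weighted average of the embeddings of the words in a track."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight = 0
    for word, count in bag_of_words.items():
        vec = embeddings.get(word)
        if vec is None:
            continue  # skip words without an embedding
        for i, x in enumerate(vec):
            total[i] += count * x
        weight += count
    return [x / weight for x in total] if weight else total


# Toy embeddings chosen by hand for illustration.
embeddings = {"love": [1.0, 0.0], "heart": [0.8, 0.6], "money": [0.0, 1.0]}
theme = embeddings["love"]
for name, bag in {"track_a": {"heart": 2}, "track_b": {"money": 1}}.items():
    print(name, round(cosine(lyric_vector(bag, embeddings), theme), 2))
```

Unlike exact keyword matching, this scores `track_a` highly even though the word "love" never appears in it.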


### Using Classifier

In [16]:
Collections = ["love", "war","happiness", "loneliness", "money"]
Num_of_tracks = 50
for theme in Collections:
    clear_output(wait=True)
    Classification = SpotifyModule.Classification(Collections, theme, Num_of_tracks)
    display_scrollable_df(Classification)
    key = input("Press Enter to continue to the next theme, or type something and press Enter to exit: ")
    if key != "":
        break


Getting love theme scores: 100%|██████████| 22/22 [00:04<00:00,  5.30it/s]
Getting war theme scores: 100%|██████████| 22/22 [00:03<00:00,  5.59it/s]
Getting happiness theme scores: 100%|██████████| 22/22 [00:03<00:00,  5.69it/s]
Getting loneliness theme scores: 100%|██████████| 22/22 [00:03<00:00,  5.62it/s]
Getting money theme scores: 100%|██████████| 22/22 [00:03<00:00,  5.77it/s]
Merging lyrics with them theme label: 100%|██████████| 22/22 [00:04<00:00,  4.85it/s]


Fitting the model
Training samples: 2000, Testing samples: 500
MLP Accuracy: 0.9100
              precision    recall  f1-score   support

   happiness       0.92      0.90      0.91       103
  loneliness       0.94      0.95      0.95       110
        love       0.83      0.90      0.86        87
       money       0.93      0.89      0.91       102
         war       0.93      0.90      0.91        98

    accuracy                           0.91       500
   macro avg       0.91      0.91      0.91       500
weighted avg       0.91      0.91      0.91       500



Getting love theme scores: 100%|██████████| 22/22 [00:16<00:00,  1.33it/s]


Unnamed: 0,artist_name,title,play_count,index_number
15847,Garbage,Bleed Like Me,1603,0
21217,The Clash,English Civil War,1165,1
21743,KT Tunstall,Change,1134,2
25903,The Classic Crime,Solar Powered Life,924,3
29737,Shawn McDonald,Simply Nothing,783,4
43502,Saliva,Black Sheep,479,5
62894,the bird and the bee,I'm Into Something Good,277,6
66520,Comeback Kid,Come Around (Album Version),253,7
70245,Iron Butterfly,Unconscious Power (LP Version),231,8
90197,Gogol Bordello,Pala Tute,151,9
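
The summary rows of the classification report above can be checked by hand. The F1 score is the harmonic mean of precision and recall, the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The sketch below recomputes them from the printed values; because those inputs are already rounded to two decimals, a recomputed figure can differ from the report in the last digit.

```python
# (precision, recall, support) copied from the report above.
report = {
    "happiness": (0.92, 0.90, 103),
    "loneliness": (0.94, 0.95, 110),
    "love": (0.83, 0.90, 87),
    "money": (0.93, 0.89, 102),
    "war": (0.93, 0.90, 98),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_scores = {label: f1(p, r) for label, (p, r, _support) in report.items()}
total_support = sum(support for _p, _r, support in report.values())

macro_f1 = sum(f1_scores.values()) / len(f1_scores)
weighted_f1 = sum(f1_scores[label] * report[label][2] for label in report) / total_support

print(total_support)
print(round(macro_f1, 2), round(weighted_f1, 2))
```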


## Recommendations

### Tracks similar to a user's listening history

In [17]:
user_id = "19a63dbfea18904260bfe9c35cabb711ce01d2a6"
SpotifyModule._Recommendations.get_similar_tracks(user_id)

Reading Data


Mapping Users: 100%|██████████| 1019318/1019318 [00:00<00:00, 1752492.37it/s]
Mapping Songs: 100%|██████████| 384546/384546 [00:00<00:00, 1870335.85it/s]


Fitting Model


100%|██████████| 20/20 [26:05<00:00, 78.25s/it]
Calculating Precision: 100%|██████████| 9885/9885 [02:08<00:00, 76.99it/s]


Precision at k: 24.27/100


Unnamed: 0,song_id,likelihood
0,SOQWYAQ12A6D4FB9A3,0.969169
1,SOHSXAV12A67ADF7E7,0.951313
2,SOSTTPA12A8AE47622,0.923543
3,SOXKGUD12A58A7C687,0.911624
4,SOTFAGD12A8C13C43C,0.860013
5,SONHVVE12AB018D038,0.858061
6,SOWKQYL12AB0183B15,0.856945
7,SOAAFAC12A67ADF7EB,0.84722
8,SOOGANI12A8C139E6B,0.837166
9,SOOGGMF12A8C131953,0.833086
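
The run above reports a "Precision at k" figure without stating the value of k or how the score is averaged. In its usual definition, precision at k is the share of the top k recommendations that the user actually listened to in held-out data, averaged over users. The sketch below follows that usual definition, not necessarily the notebook's exact computation; the function names and sample IDs are illustrative.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(recommendations, held_out, k):
    """Average precision at k over all users that have held-out items."""
    users = [user for user in recommendations if held_out.get(user)]
    if not users:
        return 0.0
    return sum(precision_at_k(recommendations[u], held_out[u], k) for u in users) / len(users)


# Illustrative data: two users, four recommendations each.
recommendations = {"u1": ["s1", "s2", "s3", "s4"], "u2": ["s5", "s6", "s7", "s8"]}
held_out = {"u1": {"s2", "s4", "s9"}, "u2": {"s5"}}
print(mean_precision_at_k(recommendations, held_out, 4))  # 0.375
```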


In [18]:
song_id = "SOWCEMD12A8C137121"

SpotifyModule._Recommendations.get_common_listened_tracks(song_id)

Unnamed: 0,song_id,likelihood
0,SOWCEMD12A8C137121,1.0
1,SOHBIEQ12A8C13710F,0.917874
2,SONQQMT12A8C13711A,0.912367
3,SORQJVE12AB018184F,0.909913
4,SOEHCHY12A58A7C960,0.907307
5,SOFUZMG12A8C137116,0.904212
6,SOBNBWX12A8C13E6BA,0.904135
7,SONDUWO12A8C13712C,0.900458
8,SOYPYFT12AF72A1D58,0.894676
9,SOZAOJF12B0B80BCBA,0.88377
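
In the table above the queried song scores 1.0 against itself, which is what a normalized similarity measure would produce. The fitted model behind `get_common_listened_tracks` is not shown, so the following is a simpler stand-in rather than the notebook's method: cosine similarity between the sets of users who listened to each song. All names and data are hypothetical.

```python
import math


def co_listen_similarity(listeners_a, listeners_b):
    """Cosine similarity between two songs, each given as a set of listeners."""
    if not listeners_a or not listeners_b:
        return 0.0
    shared = len(listeners_a & listeners_b)
    return shared / math.sqrt(len(listeners_a) * len(listeners_b))


def most_co_listened(song_id, listeners_by_song, n):
    """Rank all songs by similarity to song_id, highest first."""
    base = listeners_by_song[song_id]
    scored = [(other, co_listen_similarity(base, users))
              for other, users in listeners_by_song.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical listening sets.
listeners_by_song = {
    "song_x": {"u1", "u2", "u3", "u4"},
    "song_y": {"u1", "u2", "u3", "u5"},
    "song_z": {"u4"},
}
print(most_co_listened("song_x", listeners_by_song, 3))
# [('song_x', 1.0), ('song_y', 0.75), ('song_z', 0.5)]
```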
