# Project 1: Methods For High-Dimensional Space Visualization Using Spotify Song Data
In this project, I carried out the following tasks:
- Scraped Spotify song data for multiple playlists
- Built a dataset from the extracted information
- Used t-SNE to visualize relationships between data points in high-dimensional spaces
- Told certain genres apart by looking at the regions where data points were more densely packed

In [44]:
# To begin with, we load the dataset built from the data I scraped from Spotify's API
import pandas as pd
import plotly.express as px
df = pd.read_csv('SpotifyMexScored.csv')
df.head()

Unnamed: 0,cancion,artista,playlist,track_id,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,Lista Rep,liked
0,Immortal Rites,Morbid Angel,4tszLL7NTfLCoIz39Zsiy1,5hmek3mrSYvfSElBsPNbxo,0.19,0.934,1,-7.796,0.056,8e-06,0.107,0.363,0.36,Death Metal,1
1,Chopped in Half,Obituary,4tszLL7NTfLCoIz39Zsiy1,01cGujYWGF7JchJLSgf6Ta,0.257,0.989,10,-5.918,0.0867,7e-06,0.521,0.263,0.385,Death Metal,1
2,Left Hand Path,Entombed,4tszLL7NTfLCoIz39Zsiy1,5faD0zZ9fMa3J5ZN3lIWtp,0.166,0.927,7,-8.797,0.0736,4e-06,0.651,0.344,0.219,Death Metal,1
3,Pull the Plug,Death,4tszLL7NTfLCoIz39Zsiy1,2l0h4aBFLp9HdoaNdCTlbW,0.226,0.978,9,-5.729,0.209,4e-06,0.533,0.0436,0.242,Death Metal,1
4,Into The Grave,Grave,4tszLL7NTfLCoIz39Zsiy1,4bAIIhqJeOTDcyeo1GvIMo,0.295,0.915,1,-6.968,0.0954,3e-06,0.91,0.0772,0.343,Death Metal,1
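
The `Unnamed: 0` column in the output above looks like a row index that was written to the CSV when the file was saved. A minimal sketch, assuming that is the case here, of reading such a column back as the index instead of as data. The inline CSV is a made-up two-row stand-in for `SpotifyMexScored.csv`, and the names `csv_text`, `raw` and `clean` are illustrative only:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: the first header cell is empty,
# which is what pandas writes when a DataFrame is saved together with its index
csv_text = ",cancion,energy\n0,Immortal Rites,0.934\n1,Chopped in Half,0.989\n"

# Default read: the unnamed first column is kept as a regular column
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

The later cells select feature columns by name, so they would work the same way with either version of the DataFrame.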


In [45]:
# Next, we select the numeric columns that will be used in the subsequent steps
X = df[['danceability', 'energy',
       'key', 'loudness', 'speechiness', 'acousticness', 'instrumentalness',
       'liveness', 'valence']]

In [46]:
# Since all of them are numeric, the only preprocessing needed is to standardize them (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
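
To make the scaling step concrete, here is a small sketch of what `StandardScaler` does to each column: it subtracts the column mean and divides by the column standard deviation. The data is synthetic (one column in the 0 to 1 range, loosely like `energy`, and one around -7, loosely like `loudness`), and `X_demo`, `X_demo_scaled` and `manual` are names made up for this illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for two features on very different scales
X_demo = np.column_stack([
    rng.uniform(0, 1, 200),   # bounded feature, similar in range to 'energy'
    rng.normal(-7, 2, 200),   # negative, wider feature, similar in range to 'loudness'
])

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# The same transformation written by hand (population standard deviation, ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_demo_scaled, manual))
print(X_demo_scaled.mean(axis=0).round(6), X_demo_scaled.std(axis=0).round(6))
```

Without this step, features with a wider numeric range, such as `loudness` or `key`, would dominate the distances that t-SNE works with.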

In [47]:
# Then we use t-SNE to reduce the data to 2 components so that all of it, and its potential relationships, can be visualized
from sklearn.manifold import TSNE
model = TSNE(n_components=2)
X_transformed = model.fit_transform(X_scaled)



The default initialization in TSNE will change from 'random' to 'pca' in 1.2.


The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.



In [48]:
# The two t-SNE coordinates are assigned to every row
df['TSNE_1'] = X_transformed[:, 0]
df['TSNE_2'] = X_transformed[:, 1]
df.head()

Unnamed: 0,cancion,artista,playlist,track_id,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,Lista Rep,liked,TSNE_1,TSNE_2
0,Immortal Rites,Morbid Angel,4tszLL7NTfLCoIz39Zsiy1,5hmek3mrSYvfSElBsPNbxo,0.19,0.934,1,-7.796,0.056,8e-06,0.107,0.363,0.36,Death Metal,1,-20.362423,-21.135359
1,Chopped in Half,Obituary,4tszLL7NTfLCoIz39Zsiy1,01cGujYWGF7JchJLSgf6Ta,0.257,0.989,10,-5.918,0.0867,7e-06,0.521,0.263,0.385,Death Metal,1,-8.467403,-45.638783
2,Left Hand Path,Entombed,4tszLL7NTfLCoIz39Zsiy1,5faD0zZ9fMa3J5ZN3lIWtp,0.166,0.927,7,-8.797,0.0736,4e-06,0.651,0.344,0.219,Death Metal,1,-5.670158,-54.009987
3,Pull the Plug,Death,4tszLL7NTfLCoIz39Zsiy1,2l0h4aBFLp9HdoaNdCTlbW,0.226,0.978,9,-5.729,0.209,4e-06,0.533,0.0436,0.242,Death Metal,1,-11.409132,-51.696171
4,Into The Grave,Grave,4tszLL7NTfLCoIz39Zsiy1,4bAIIhqJeOTDcyeo1GvIMo,0.295,0.915,1,-6.968,0.0954,3e-06,0.91,0.0772,0.343,Death Metal,1,-0.22388,-50.622578


In [49]:
# Finally, an interactive scatter plot of these results is generated
# It is interesting to see that certain genres can be told apart from others visually, merely by looking at their t-SNE coordinates
fig = px.scatter(df, x="TSNE_1", y="TSNE_2", color="Lista Rep", hover_data=['cancion'], template="plotly_dark")
fig.show()
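
The separation seen in the plot is judged by eye. One possible way to put a number on it, which was not part of the original analysis, is the silhouette score: it ranges from -1 to 1, and higher values mean points sit closer to their own group than to other groups. The sketch below uses two synthetic, well-separated groups as a stand-in. With the real data one would pass the scaled features and the playlist labels instead:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two synthetic groups of 50 points each, centered far apart
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
points = np.vstack([group_a, group_b])

# Integer labels standing in for the playlist names
labels = np.repeat([0, 1], 50)

score = silhouette_score(points, labels)
print(round(score, 3))
```

A score near 0 would suggest that the groups overlap heavily, which is what one would expect for genres that do not form distinct regions in the plot.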