# Spotify Music Recommendation System
**By: Yoga Mileniandi**

## Introduction
![spotify](https://user-images.githubusercontent.com/61934759/137764151-d27729b5-7145-4df8-97e2-168e7bbb0caf.png)
This project is a music recommendation system aimed at users of the Spotify application. It uses a content-based filtering approach. Content-based filtering makes recommendations by learning a new user's interest profile from data about the items that user has already rated.
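
To make the idea concrete, here is a minimal sketch of content-based filtering on a toy catalogue. It is not the model built in this notebook: the song names, the three feature columns, and the `recommend` helper are all hypothetical, and cosine similarity is just one common choice for comparing feature vectors.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: each row holds (danceability, energy, acousticness).
song_names = ["Song A", "Song B", "Song C", "Song D"]
features = np.array([
    [0.80, 0.90, 0.05],
    [0.75, 0.85, 0.10],
    [0.20, 0.15, 0.95],
    [0.30, 0.25, 0.80],
])


def recommend(liked_index, k=2):
    """Return the k songs whose feature vectors are most similar to the liked song."""
    # Similarity between the liked song and every song in the catalogue.
    scores = cosine_similarity(features[liked_index:liked_index + 1], features)[0]
    # Exclude the liked song itself from the ranking.
    scores[liked_index] = -np.inf
    top = np.argsort(scores)[::-1][:k]
    return [song_names[i] for i in top]


recommendations = recommend(0, k=2)
print(recommendations)
```

The recommendation depends only on the attributes of the songs themselves, which is what distinguishes this approach from collaborative filtering based on other users' behaviour.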


## 1. Preparing the Libraries and Dataset

### 1.1 Importing Libraries

In [7]:
# Libraries for data processing
import numpy as np
import pandas as pd
from zipfile import ZipFile

# Libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Libraries for modeling
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import calinski_harabasz_score

### 1.2 Downloading the Dataset from Kaggle

In [5]:
# Set up the Kaggle API
! pip install -q kaggle
from google.colab import files
files.upload()
# -p avoids an error when ~/.kaggle already exists
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle (1).json


In [6]:
# Download the dataset from Kaggle
!kaggle datasets download -d edalrami/19000-spotify-songs

Downloading 19000-spotify-songs.zip to /content
  0% 0.00/1.18M [00:00<?, ?B/s]
100% 1.18M/1.18M [00:00<00:00, 39.0MB/s]


### 1.3 Loading the Dataset

In [8]:
# Extract the archive
path = '/content/19000-spotify-songs.zip'
with ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall('working')

In [9]:
# Load the dataset with pandas
song_data = pd.read_csv("/content/working/song_data.csv")
song_info = pd.read_csv("/content/working/song_info.csv")

## 2. Data Understanding

In [10]:
# Inspect the contents of song_data
song_data.head()

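|   | song_name | song_popularity | song_duration_ms | acousticness | danceability | energy | instrumentalness | key | liveness | loudness | audio_mode | speechiness | tempo | time_signature | audio_valence |
|---|-----------|-----------------|------------------|--------------|--------------|--------|------------------|-----|----------|----------|------------|-------------|-------|----------------|---------------|
| 0 | Boulevard of Broken Dreams | 73 | 262333 | 0.00552 | 0.496 | 0.682 | 2.9e-05 | 8 | 0.0589 | -4.095 | 1 | 0.0294 | 167.06 | 4 | 0.474 |
| 1 | In The End | 66 | 216933 | 0.0103 | 0.542 | 0.853 | 0.0 | 3 | 0.108 | -6.407 | 0 | 0.0498 | 105.256 | 4 | 0.37 |
| 2 | Seven Nation Army | 76 | 231733 | 0.00817 | 0.737 | 0.463 | 0.447 | 0 | 0.255 | -7.828 | 1 | 0.0792 | 123.881 | 4 | 0.324 |
| 3 | By The Way | 74 | 216933 | 0.0264 | 0.451 | 0.97 | 0.00355 | 0 | 0.102 | -4.938 | 1 | 0.107 | 122.444 | 4 | 0.198 |
| 4 | How You Remind Me | 56 | 223826 | 0.000954 | 0.447 | 0.766 | 0.0 | 10 | 0.113 | -5.065 | 1 | 0.0313 | 172.011 | 4 | 0.574 |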


In [11]:
# Inspect the contents of song_info
song_info.head()

|   | song_name | artist_name | album_names | playlist |
|---|-----------|-------------|-------------|----------|
| 0 | Boulevard of Broken Dreams | Green Day | Greatest Hits: God's Favorite Band | 00s Rock Anthems |
| 1 | In The End | Linkin Park | Hybrid Theory | 00s Rock Anthems |
| 2 | Seven Nation Army | The White Stripes | Elephant | 00s Rock Anthems |
| 3 | By The Way | Red Hot Chili Peppers | By The Way (Deluxe Version) | 00s Rock Anthems |
| 4 | How You Remind Me | Nickelback | Silver Side Up | 00s Rock Anthems |


In [15]:
song_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18835 entries, 0 to 18834
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   song_name         18835 non-null  object 
 1   song_popularity   18835 non-null  int64  
 2   song_duration_ms  18835 non-null  int64  
 3   acousticness      18835 non-null  float64
 4   danceability      18835 non-null  float64
 5   energy            18835 non-null  float64
 6   instrumentalness  18835 non-null  float64
 7   key               18835 non-null  int64  
 8   liveness          18835 non-null  float64
 9   loudness          18835 non-null  float64
 10  audio_mode        18835 non-null  int64  
 11  speechiness       18835 non-null  float64
 12  tempo             18835 non-null  float64
 13  time_signature    18835 non-null  int64  
 14  audio_valence     18835 non-null  float64
dtypes: float64(9), int64(5), object(1)
memory usage: 2.2+ MB


In [16]:
song_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18835 entries, 0 to 18834
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   song_name    18835 non-null  object
 1   artist_name  18835 non-null  object
 2   album_names  18835 non-null  object
 3   playlist     18835 non-null  object
dtypes: object(4)
memory usage: 588.7+ KB
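
Both frames report 18835 rows with no missing values, and both carry a `song_name` column. The notebook does not state whether the rows line up one-to-one, so the following is only a sketch of how one might check that before combining the two tables by position. The frames `toy_data` and `toy_info` are small hypothetical stand-ins, not the real files.

```python
import pandas as pd

# Hypothetical two-row stand-ins for song_data and song_info.
toy_data = pd.DataFrame({
    "song_name": ["In The End", "By The Way"],
    "song_popularity": [66, 74],
    "energy": [0.853, 0.970],
})
toy_info = pd.DataFrame({
    "song_name": ["In The End", "By The Way"],
    "artist_name": ["Linkin Park", "Red Hot Chili Peppers"],
})

# Combine by position only if both frames list the same songs in the same order.
# The length check runs first so the element-wise comparison is always valid.
rows_aligned = (
    len(toy_data) == len(toy_info)
    and (toy_data["song_name"] == toy_info["song_name"]).all()
)

if rows_aligned:
    # Drop the repeated song_name column before placing the frames side by side.
    combined = pd.concat([toy_info, toy_data.drop(columns="song_name")], axis=1)
else:
    combined = None

print(rows_aligned)
print(combined)
```

If the check fails on the real data, a positional concatenation would pair metadata with the wrong audio features, and a key-based merge would need a key that is actually unique.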


In [18]:
print("Number of unique song names in song_data:", song_data.song_name.nunique())
print("Number of unique song names in song_info:", song_info.song_name.nunique())

Number of unique song names in song_data: 13070
Number of unique song names in song_info: 13070
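
The counts above show 13070 unique song names against 18835 rows, so many names occur more than once. The notebook does not say why. Two plausible causes are the same song appearing in several playlists and different songs sharing a title. The sketch below illustrates how both cases can be told apart on a hypothetical toy frame; "Artist X", "Artist Y", and the playlist names are made up.

```python
import pandas as pd

# Hypothetical toy frame: one song listed in two playlists,
# and two different artists sharing the title "Intro".
toy = pd.DataFrame({
    "song_name": ["In The End", "In The End", "Intro", "Intro"],
    "artist_name": ["Linkin Park", "Linkin Park", "Artist X", "Artist Y"],
    "playlist": ["Playlist 1", "Playlist 2", "Playlist 1", "Playlist 1"],
})

n_rows = len(toy)
n_unique_names = toy["song_name"].nunique()

# Rows whose song_name already appeared in an earlier row.
n_repeated_names = int(toy.duplicated(subset="song_name").sum())

# Keying on name plus artist keeps different songs with the same title apart.
deduplicated = toy.drop_duplicates(subset=["song_name", "artist_name"])

print(n_rows, n_unique_names, n_repeated_names, len(deduplicated))
```

Deduplicating on `song_name` alone would merge the two "Intro" tracks, which is why the choice of key matters before building a recommender on top of the data.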


## 3. Data Exploration

### 3.1 EDA - Univariate Analysis