# Main Notebook
This notebook gathers the code from all the other notebooks:
- nothotsongs
- 100hotsongs
- spotipy_api_webscrapper
- clustering_songs_from_dataframes

## Importing the Libraries

In [4]:
import pandas as pd
import numpy as np
import pprint
import sys
# sys.path entries must be directories, so point at the folder that contains config.py
sys.path.insert(1, '/Users/Hector_Martin/Documents/Labs/music_recommender_project')
import config
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
from bs4 import BeautifulSoup
import requests
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import pickle
import functions
%matplotlib inline

## Getting the Not Hot Songs DataFrame
Let's obtain a DataFrame containing a list of more than 5,000 songs from the past.

In [5]:
pd.set_option('display.max_columns', None)
music = pd.read_csv('/Users/Hector_Martin/Documents/Labs/music_recommender_project/data/EvolutionPopUSA_MainData.csv')

### Since the DataFrame consists of more than 17,000 songs, we are going to work with a fraction of it

In [6]:
nothotsongs = music.sample(frac=0.3, replace=False, random_state=1)
nothotsongs = nothotsongs[['artist_name', 'track_name']]
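A minimal sketch of how `sample` behaves with `frac` and `random_state`, using a small made-up DataFrame (the rows are placeholders, not the real dataset). Fixing `random_state` should make the draw repeatable, which is why rerunning the cell above returns the same subset.

```python
import pandas as pd

# Placeholder data standing in for the real CSV (hypothetical values).
toy = pd.DataFrame({
    "artist_name": [f"artist_{i}" for i in range(10)],
    "track_name": [f"track_{i}" for i in range(10)],
})

# Draw 30% of the rows without replacement; the seed fixes which rows are drawn.
first_draw = toy.sample(frac=0.3, replace=False, random_state=1)
second_draw = toy.sample(frac=0.3, replace=False, random_state=1)

print(len(first_draw))
print(first_draw.index.equals(second_draw.index))
```

Changing or omitting `random_state` gives a different subset on each run, so any downstream file built from the sample would change as well.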

### Renaming the columns to keep consistency with the 100 Hot Songs DataFrame.

In [7]:
nothotsongs = nothotsongs.rename(columns={'artist_name': 'artists', 'track_name': 'songs'})
nothotsongs = nothotsongs[['songs', 'artists']]
nothotsongs

| | songs | artists |
|---|---|---|
| 4614 | Georgia On My Mind | Michael Bolton |
| 12993 | Insane In The Brain | Cypress Hill |
| 9999 | I Don't Like Mondays | The Boomtown Rats |
| 7081 | Goodbye | Night Ranger |
| 1806 | Rooms On Fire | Stevie Nicks |
| ... | ... | ... |
| 7440 | Angel | Natasha Bedingfield |
| 5856 | Automatically Sunshine | The Supremes |
| 15036 | Vaquero (Cowboy) | The Fireballs |
| 16082 | The Blizzard | Jim Reeves |


### Storing the DataFrame into a csv file

In [8]:
nothotsongs.to_csv("data/nothotsongs.csv", index=False)

## Web Scraping of 100 Hot Songs
Our goal is to scrape the current top 100 songs listed at https://www.billboard.com/charts/hot-100 together with their artists, put the information into a pandas DataFrame, and save that DataFrame to a CSV file in the current folder.
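The body of `functions.hot100` is not shown in this notebook. As a rough sketch of the parsing step only, the snippet below extracts titles and artists from an inline HTML sample with BeautifulSoup. The markup and the class names (`chart-row`, `title`, `artist`) are invented for illustration and do not match the real Billboard page, whose layout changes often.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Billboard page uses different class names.
SAMPLE_HTML = """
<ul>
  <li class="chart-row"><h3 class="title">Song A</h3><span class="artist">Artist A</span></li>
  <li class="chart-row"><h3 class="title">Song B</h3><span class="artist">Artist B &amp; Friend</span></li>
</ul>
"""

def parse_chart(html):
    """Return a list of {'songs': ..., 'artists': ...} dicts, one per chart row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select("li.chart-row"):
        title = row.select_one(".title")
        artist = row.select_one(".artist")
        if title is None or artist is None:
            continue  # skip rows that do not match the assumed layout
        rows.append({"songs": title.get_text(strip=True),
                     "artists": artist.get_text(strip=True)})
    return rows

chart = parse_chart(SAMPLE_HTML)
print(chart)
```

With a live page, the HTML would come from `requests.get(url).text`, and the resulting list could be passed to `pd.DataFrame` and saved with `to_csv`.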

In [9]:
hot100_df = functions.hot100("https://www.billboard.com/charts/hot-100/")
hot100_df

| | songs | artists |
|---|---|---|
| 0 | Wait For U | Future Featuring Drake & Tems |
| 1 | As It Was | Harry Styles |
| 2 | First Class | Jack Harlow |
| 3 | Puffin On Zootiez | Future |
| 4 | Heat Waves | Glass Animals |
| ... | ... | ... |
| 95 | Ahhh Ha | Lil Durk |
| 96 | Rumors | Gucci Mane Featuring Lil Durk |
| 97 | Over | Lucky Daye |
| 98 | Shake It | Kay Flock, Cardi B, Dougie B & Bory300 |


## Web Scraping with the Spotipy API

To get the audio features for all the songs we have, we are going to use the Spotify Web API through the Spotipy library. Our final goal is to get 3 DataFrames containing the names of the songs, the artists, and their respective audio features obtained from Spotify:
- **100 Hot Songs DataFrame:** DataFrame containing the 100 hottest current mainstream hits
- **Not Hot Songs DataFrame:** DataFrame containing an extensive list of songs from the past
- **All Songs DataFrame:** DataFrame containing the contents of the previous two
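The lookup for a single song can be sketched as follows. This is not the code of `functions.get_audio_features`, which is not shown here. The helper name `fetch_audio_features`, the `make_client` wrapper, and the `FakeSpotify` stand-in are all hypothetical. Only `search` and `audio_features` are actual Spotipy client methods, and the credentials would come from wherever `config` stores them.

```python
def fetch_audio_features(sp, title, artist=None):
    """Look up one track and return its audio-features dict, or None if nothing is found.

    `sp` is any object exposing Spotipy's `search` and `audio_features` methods.
    """
    query = f"track:{title} artist:{artist}" if artist else title
    result = sp.search(q=query, type="track", limit=1)
    items = result["tracks"]["items"]
    if not items:
        return None
    features = sp.audio_features([items[0]["id"]])
    return features[0] if features else None


def make_client(client_id, client_secret):
    """Build a real Spotipy client (needs the spotipy package and valid credentials)."""
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials
    return spotipy.Spotify(
        auth_manager=SpotifyClientCredentials(client_id=client_id,
                                              client_secret=client_secret))


class FakeSpotify:
    """Offline stand-in so the helper can be exercised without network access."""

    def search(self, q, type, limit):
        return {"tracks": {"items": [{"id": "abc123"}]}}

    def audio_features(self, tracks):
        return [{"id": tracks[0], "danceability": 0.5, "energy": 0.7}]


print(fetch_audio_features(FakeSpotify(), "Some Song", "Some Artist"))
```

Passing the client in as an argument keeps the lookup testable offline. In the notebook, the object returned by `make_client` would take the place of `FakeSpotify()`.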

### Reading the files to get the DataFrames:

In [10]:
hotsongs = pd.read_csv('/Users/Hector_Martin/Documents/Labs/music_recommender_project/data/hot100.csv')
nothotsongs = pd.read_csv('/Users/Hector_Martin/Documents/Labs/music_recommender_project/data/nothotsongs.csv')

### Getting the list of hot songs:

In [11]:
hot_songs_list = hotsongs['songs'].tolist()

### Let's do the same with the Not Hot Songs:

In [12]:
not_hot_songs_list = nothotsongs['songs'].tolist()

### Getting the audio features based on the lists of songs

In [None]:
hot_songs_af = functions.get_audio_features(hot_songs_list, hotsongs)
not_hot_songs_af = functions.get_audio_features(not_hot_songs_list)

Looking for song:  0
Looking for song:  1
Looking for song:  2
...
Looking for song:  44
Looking for song:  4

### Concatenating the audio features to the song DataFrames

In [None]:
hotconcat_df = functions.add_audio_features(hotsongs, hot_songs_af, 'hot_songs')
nothotconcat_df = functions.add_audio_features(nothotsongs, not_hot_songs_af, 'not_hot_songs')
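How `functions.add_audio_features` joins the two tables is not shown. A plausible sketch, assuming the feature rows come back in the same order as the songs, is a column-wise `pd.concat`. The toy values below are invented, and the index reset is there because `concat` aligns on index labels rather than on row position.

```python
import pandas as pd

# Songs with a non-default index, as left behind by an earlier sample() call.
songs = pd.DataFrame(
    {"songs": ["Song A", "Song B"], "artists": ["Artist A", "Artist B"]},
    index=[10, 20],
)
# Hypothetical feature rows, in the same order as `songs`.
features = pd.DataFrame({"danceability": [0.5, 0.8], "energy": [0.7, 0.4]})

# Reset both indexes so rows are paired by position rather than by stale index labels.
combined = pd.concat(
    [songs.reset_index(drop=True), features.reset_index(drop=True)], axis=1
)
print(combined)
```

Without the reset, the mismatched labels (10, 20 versus 0, 1) would produce four half-empty rows instead of two complete ones.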

### Concatenating the Hot Songs DataFrame with the Not Hot Songs one

In [None]:
allsongs_df = functions.concatallsongs(hotconcat_df, nothotconcat_df, 'allsongs_df')

## Clustering the songs from dataframes

### Reading the file

In [None]:
all_songs = pd.read_csv('/Users/Hector_Martin/Documents/Labs/music_recommender_project/data/allsongs_df.csv')

### Removing the columns that are not needed for clustering:
Most of these are links, identifiers, and other metadata that have nothing to do with audio qualities; `duration_ms` and `time_signature` are dropped as well.

In [None]:
all_songs_clean = all_songs.drop(['analysis_url', 'id', 'uri', 'track_href', 'duration_ms', 'time_signature', 'type'], axis=1)

### Store this cleaned Dataframe in a csv file:

In [None]:
all_songs_clean.to_csv("data/all_songs_clean.csv", index=False)

### Numerical and Categorical split:
- X_num will be for Numerical columns
- X_cat will be for Categorical ones

In [None]:
X_num = all_songs_clean.drop(['songs', 'artists'], axis =1)

In [None]:
X_cat = all_songs_clean[['songs', 'artists']]
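An alternative to naming the columns by hand is to split by dtype with `select_dtypes`. The sketch below uses an invented toy frame, and it assumes every non-numeric column is one of the text columns, which should hold once the link and identifier columns have been dropped.

```python
import pandas as pd

# Invented rows with the same kind of columns as the cleaned DataFrame.
toy = pd.DataFrame({
    "songs": ["Song A", "Song B"],
    "artists": ["Artist A", "Artist B"],
    "danceability": [0.5, 0.8],
    "key": [1, 7],
})

numeric_part = toy.select_dtypes(include="number")  # plays the role of X_num
text_part = toy.select_dtypes(exclude="number")     # plays the role of X_cat

print(list(numeric_part.columns))
print(list(text_part.columns))
```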

### Scaling the features

In [None]:
# The pickle is expected to hold an already fitted scaler, despite the file name.
with open("encoders/onehotencoder.pkl", "rb") as f:
    scaler = pickle.load(f)
X_scaled = scaler.transform(X_num)
X_scaled_df = pd.DataFrame(X_scaled, columns = X_num.columns)
print('Data before the transformation')
print('------------------------------')
display(X_num.head())
print()
print('Data after the transformation')
print('------------------------------')
display(X_scaled_df.head())
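Assuming the pickled object is a fitted `StandardScaler` (it is imported at the top, but the pickle itself is not shown), the transformation amounts to subtracting each column's mean and dividing by its population standard deviation. A small sketch on invented values:

```python
import numpy as np
import pandas as pd

# Invented values for two audio features on very different scales.
X_toy = pd.DataFrame({"tempo": [90.0, 120.0, 150.0],
                      "loudness": [-12.0, -8.0, -4.0]})

mean = X_toy.mean()
std = X_toy.std(ddof=0)  # population standard deviation, as StandardScaler uses

X_toy_scaled = (X_toy - mean) / std
print(X_toy_scaled)
```

After scaling, every column has mean 0 and unit variance, so features measured in large units such as tempo do not dominate the Euclidean distances that K-Means relies on.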

### Training Models with different K values to assess which offers the best performance:

In [None]:
functions.k_means_trainer(X_scaled_df)
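The internals of `functions.k_means_trainer` are not shown. One common way to compare K values, sketched here on synthetic data rather than the real features, is to fit a `KMeans` model per candidate K and record both the inertia and the silhouette score. The range of K, the seed, and the blob layout are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs standing in for the scaled audio features.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    scores[k] = {
        "inertia": model.inertia_,  # within-cluster sum of squared distances
        "silhouette": silhouette_score(X_demo, model.labels_),
    }

# Pick the K with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k)
```

Inertia keeps falling as K grows, so it is usually read through an elbow plot, whereas the silhouette score peaks at a specific K and can be compared directly.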