## Importing libraries

In [1]:
import os
import sys
import json
import spotipy
import requests
import pandas as pd
from math import ceil
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

### First things first:
#### 1. You need to create an app on the Spotify developer dashboard; that is where you will get your client ID and client secret.
#### 2. The app is also where you will enter your redirect URI.
    Just enter localhost as the base URL and use port 8080 (or any port you are not currently using, such as 5000 or 8888).
#### 3. The best thing to do is to create environment variables for that information (client_id, client_secret and redirect_uri).
    I also created an environment variable with my Spotify username URI.
    To find your Spotify username URI, go to your Spotify profile > under share profile > choose Spotify URI > then remove the `spotify:user:` at the beginning of the gibberish code.
    What is left of that gibberish code is your Spotify username URI; create an environment variable with it.
#### 4. Using the spotipy library we can request our credentials.
    To request a token we will need:
       1. a scope
       2. the environment variables we have created

#### 5. Using the spotipy library we can request a token, and authenticate with it as well, by calling our function.
      Make sure to save the token as a global variable so you can use it outside the function.

In [3]:
def get_token(scope=None):
    
    redirect_uri = os.environ['SPOTIPY_REDIRECT_URI']
    username = os.environ['SPOTIPY_USERNAME']
    client_id = os.environ['SPOTIPY_CLIENT_ID']
    client_secret = os.environ['SPOTIPY_CLIENT_SECRET']
    token = util.prompt_for_user_token(username, scope, client_id, client_secret, redirect_uri)
    return token

token = get_token(scope='user-library-read')
sp = spotipy.Spotify(auth=token)
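
If one of the four environment variables is missing, `get_token` stops with a bare `KeyError`. A minimal sketch of a check that could run before it, assuming the same variable names as above; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, not part of spotipy.

```python
import os

# Variable names taken from get_token above.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
    "SPOTIPY_USERNAME",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first: {}".format(", ".join(missing)))
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.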

#### 6. Now that we have our token, authentication, and scope, we can access our liked songs library
*Although you could use spotipy to access the liked library, I found using requests and json to be easier.*

In [4]:
url = "https://api.spotify.com/v1/me/tracks"
headers = {'Authorization': "Bearer {}".format(token)}
r = requests.get(url, headers=headers)
parsed = json.loads(r.text)

count_songs = parsed["total"]
print ("Total number of songs: {}".format(count_songs))

Total number of songs: 1973


#### 7. Now, it's one thing to access the library and another to scrape it. The Spotify API returns at most 50 songs per request, but you can move the offset to grab as many songs as you like.
*Using `ceil` to compute the number of pages, you can write a loop that advances the offset automatically.*
*Once again I used requests and json to scrape my liked library.*

In [None]:
all_songs = []
for i in range(int(ceil(count_songs/50.0))):
    offset = 50*i
    url = "https://api.spotify.com/v1/me/tracks?limit=50&offset={}".format(offset)
    headers = {'Authorization': "Bearer {}".format(token)}
    r = requests.get(url, headers=headers)
    parsed = json.loads(r.text)

    all_songs.extend(parsed["items"])
print ("Number of gathered songs: {}".format(len(all_songs)))
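
An alternative to computing offsets is to follow the `next` URL that each page of the response carries (it is `null` on the last page). Here is a rough sketch of that idea. `collect_items` and its `fetch` argument are hypothetical names; `fetch` stands in for whatever turns a URL into the parsed JSON dict, for example a small wrapper around `requests.get` with the headers used above.

```python
def collect_items(fetch, first_url):
    """Follow 'next' links page by page and gather every entry of 'items'."""
    items = []
    url = first_url
    while url:
        page = fetch(url)          # parsed JSON for one page
        items.extend(page["items"])
        url = page.get("next")     # None on the last page, which ends the loop
    return items


# Tiny offline stand-in for the API, so the loop can be tried without a token.
fake_pages = {
    "page1": {"items": ["a", "b"], "next": "page2"},
    "page2": {"items": ["c"], "next": None},
}
print(collect_items(fake_pages.get, "page1"))
```

This way the loop does not depend on the total count fetched earlier, which can be handy if songs are liked or removed while the scrape runs.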

In [None]:
all_songs[:1]

#### 8. The information I grabbed is a list of dicts, one per saved song. I only need the 'track' information, so I will isolate that

In [None]:
liked_songs = []
for song in all_songs:
    track = song['track']  # the nested dict holding the track's metadata
    liked_songs.append(track)
print("Number of tracks: {}".format(len(liked_songs)))

In [None]:
liked_songs

#### 9. Now I will create a list of the song IDs so I can later use it as my dataframe's index

In [None]:
song_ids= []
for song in liked_songs:
    song_id = song['id']
    song_ids.append(song_id)
print ("Number of song_ids: {}".format(len(song_ids)))
song_ids[:1]
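
Steps 8 and 9 can also be written as list comprehensions. The sketch below runs on a made-up sample shaped like the entries of `all_songs` (all values are placeholders); dropping entries whose `track` is missing is an extra precaution that the loops above do not take.

```python
# Made-up sample with the same shape as the entries of all_songs.
sample_items = [
    {"added_at": "2020-01-01T00:00:00Z", "track": {"id": "id1", "name": "Song One"}},
    {"added_at": "2020-01-02T00:00:00Z", "track": {"id": "id2", "name": "Song Two"}},
    {"added_at": "2020-01-03T00:00:00Z", "track": None},  # assumed edge case
]

# Step 8: keep only the 'track' dict, dropping entries without one.
tracks = [item["track"] for item in sample_items if item.get("track")]

# Step 9: pull the id out of every track.
track_ids = [track["id"] for track in tracks]

print(track_ids)
```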

#### 10. While I'm at it, I will create a dataframe from the names of the songs and their IDs

In [21]:
df_song_info = pd.DataFrame(liked_songs, columns=['name'], index=song_ids)
df_song_info.head()

Unnamed: 0,name
75nN4kH1uzSsUdMrdUVfrq,Siren 042
71GBQ7iVnffAGkNuTDxCoH,Make Me A Song
4VqPOruhp5EdPBeR92t6lQ,Uprising
2takcwOaAZWiXQijPHIx7B,Time Is Running Out
7ouMYWpwJ422jRcDASZB7P,Knights of Cydonia


In [20]:
df_song_info.shape

(1973, 1)

     At the time I scraped my liked songs, I had 1973 songs!

#### 11. Once I had my dataframe, I decided that I needed the names of the artists as well, so I created a list of them. I will add them to my dataframe next


In [None]:
songs_artits=[]
for song in liked_songs:
    artist= song['artists'][0]['name']
    
    songs_artits.append(artist)
print("Number of artists: {}".format(len(songs_artits)))
songs_artits[:1]

***Here is a dataframe of the artists. There is a much easier way of creating a dataframe, but I only decided later that I needed the artist names as well, so here is this dataframe now!***

In [None]:
df_song = pd.DataFrame(songs_artits, columns=['artists'], index=song_ids)
df_song.head()

In [23]:
df_song.head()

Unnamed: 0,artists
75nN4kH1uzSsUdMrdUVfrq,Lala Lala
71GBQ7iVnffAGkNuTDxCoH,Eleanor Friedberger
4VqPOruhp5EdPBeR92t6lQ,Muse
2takcwOaAZWiXQijPHIx7B,Muse
7ouMYWpwJ422jRcDASZB7P,Muse
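
One version of the easier way mentioned above is to collect the name and the artist in a single pass, so only one dataframe is needed. A sketch under a few assumptions: `track_rows` is a hypothetical helper, the sample track is made up, and joining every artist name (instead of keeping only the first one) is a deliberate change from the loop above.

```python
def track_rows(tracks):
    """Build one flat dict per track with its id, name and artist names."""
    rows = []
    for track in tracks:
        rows.append({
            "id": track["id"],
            "name": track["name"],
            # Join all credited artists, not only the first one.
            "artists": ", ".join(artist["name"] for artist in track["artists"]),
        })
    return rows


# Made-up track with the same shape as the entries of liked_songs.
sample_tracks = [
    {"id": "id1", "name": "Song One",
     "artists": [{"name": "Artist A"}, {"name": "Artist B"}]},
]
rows = track_rows(sample_tracks)
print(rows)
```

With the real data, `pd.DataFrame(track_rows(liked_songs)).set_index('id')` should give the name and artists columns in one go.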


#### 12. Once again we will use the spotipy library, this time to grab the audio features of the songs we already have. All we need are the song IDs to look them up.
***The great thing about this library is that it waits and retries when the API rate-limits a request, so you will not overwhelm the server!***

In [None]:
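feature_songs = []
for song_id in song_ids:
    try:
        # audio_features returns a list, here holding the features of one song
        analysis = sp.audio_features(song_id)
        feature_songs.append(analysis)
    except Exception as error:
        # Report the failing ID; a skipped song leaves feature_songs shorter than song_ids
        print(song_id, error)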
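
Calling the API once per song works, but `sp.audio_features` also accepts a list of IDs (spotipy documents a maximum of 100 per call), which would cut the 1973 requests down to 20. A sketch of the batching, where `chunked` and `fetch_features` are hypothetical helpers and `client` is anything with an `audio_features` method, such as the `sp` object:

```python
def chunked(seq, size):
    """Yield consecutive slices of seq holding at most size elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_features(client, ids, batch_size=100):
    """Request audio features batch by batch and return one flat list."""
    features = []
    for batch in chunked(ids, batch_size):
        features.extend(client.audio_features(batch))
    return features


class FakeClient:
    """Offline stand-in for sp, so the batching can be tried without a token."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        return [{"id": track_id} for track_id in tracks]


fake = FakeClient()
result = fetch_features(fake, ["id{}".format(n) for n in range(250)])
print(len(result), fake.calls)
```

The result is a flat list of dicts rather than a list of lists, so the inner loop of the next step would not be needed with it.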

#### 13. The features come as a list of lists of dicts (once again), but this time I was smart about it: I wrote a loop to get the key-value pairs of each dict.
***The keys are going to serve as the columns later, and the values are the data I am collecting on the songs.***

In [None]:
both =[]
lis_of_features=[]
list_of_info =[]
for lists in feature_songs:
    for dict_list in lists:
        both.append(dict_list.items())
        lis_of_features.append(dict_list.keys())
        list_of_info.append(dict_list.values())

#### 14. Well, here are the names of the columns (aka the dict keys):

In [15]:
cols = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature']

#### 15. Now we are going to put the keys and values together to create a dataframe. And remember the list of song IDs we made earlier? Well, it's our index now!

In [27]:
features = pd.DataFrame(list_of_info, columns=cols,index=song_ids)

In [28]:
features.shape

(1973, 18)
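
Steps 13 to 15 can be shortened, because pandas builds the columns from the dict keys on its own when it is given a list of dicts. A sketch, assuming `feature_songs` has the list-of-lists shape described above; `flatten_features` is a hypothetical helper, the sample values are made up, and skipping `None` entries is a precaution for songs that come back without features.

```python
def flatten_features(feature_lists):
    """Turn a list of lists of dicts into one flat list, skipping empty entries."""
    return [
        features
        for sublist in feature_lists
        for features in (sublist or [])
        if features is not None
    ]


# Made-up sample shaped like feature_songs.
sample = [
    [{"id": "id1", "tempo": 120.0}],
    [None],                          # assumed case: no features for this song
    [{"id": "id2", "tempo": 98.5}],
]
flat = flatten_features(sample)
columns = list(flat[0].keys())       # column names come straight from the keys
print(columns, len(flat))
```

With the real data, `pd.DataFrame(flatten_features(feature_songs)).set_index('id', drop=False)` should give the same table without the hand-written `cols` list.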

#### 16. Lastly, concatenate the dataframes ***song name and ID***, ***artist and ID***, and ***features***
***I will save this dataframe as an output because I DO NOT WANT TO HAVE TO DO THIS ALL OVER AGAIN!***

In [29]:
df = pd.concat([df_song_info, df_song, features], axis=1)
df.head()

Unnamed: 0,name,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
75nN4kH1uzSsUdMrdUVfrq,Siren 042,Lala Lala,0.601,0.596,11,-7.373,1,0.0251,0.694,0.00579,0.107,0.457,108.0,audio_features,75nN4kH1uzSsUdMrdUVfrq,spotify:track:75nN4kH1uzSsUdMrdUVfrq,https://api.spotify.com/v1/tracks/75nN4kH1uzSs...,https://api.spotify.com/v1/audio-analysis/75nN...,161043,4
71GBQ7iVnffAGkNuTDxCoH,Make Me A Song,Eleanor Friedberger,0.727,0.575,2,-8.295,1,0.0292,0.0548,0.0131,0.123,0.622,129.993,audio_features,71GBQ7iVnffAGkNuTDxCoH,spotify:track:71GBQ7iVnffAGkNuTDxCoH,https://api.spotify.com/v1/tracks/71GBQ7iVnffA...,https://api.spotify.com/v1/audio-analysis/71GB...,332400,4
4VqPOruhp5EdPBeR92t6lQ,Uprising,Muse,0.602,0.905,2,-4.046,1,0.0775,0.000202,0.064,0.117,0.411,128.019,audio_features,4VqPOruhp5EdPBeR92t6lQ,spotify:track:4VqPOruhp5EdPBeR92t6lQ,https://api.spotify.com/v1/tracks/4VqPOruhp5Ed...,https://api.spotify.com/v1/audio-analysis/4VqP...,304840,4
2takcwOaAZWiXQijPHIx7B,Time Is Running Out,Muse,0.585,0.842,9,-5.883,0,0.0556,0.00242,0.00686,0.0866,0.428,118.211,audio_features,2takcwOaAZWiXQijPHIx7B,spotify:track:2takcwOaAZWiXQijPHIx7B,https://api.spotify.com/v1/tracks/2takcwOaAZWi...,https://api.spotify.com/v1/audio-analysis/2tak...,237040,4
7ouMYWpwJ422jRcDASZB7P,Knights of Cydonia,Muse,0.366,0.963,11,-5.301,0,0.142,0.000273,0.0122,0.115,0.211,137.114,audio_features,7ouMYWpwJ422jRcDASZB7P,spotify:track:7ouMYWpwJ422jRcDASZB7P,https://api.spotify.com/v1/tracks/7ouMYWpwJ422...,https://api.spotify.com/v1/audio-analysis/7ouM...,366213,4


In [30]:
df.to_csv('./Data/liked_songs.csv',index=False)

## Next we will move to EDA