# <b>Section 1: Data Crawling</b>

### <b><u>Step 1</u>: Import libraries</b>

These are the main libraries used for data crawling:
- `spotipy`: a lightweight Python library for the Spotify Web API. It provides access to the music data exposed by the Spotify platform.
- `dotenv`: loads spotipy's client ID, client secret, and redirect URI from a `.env` file.
- `os`: reads those values from the environment once the `.env` file has been loaded.

In [None]:
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
from dotenv import load_dotenv
import spotipy
import os

### <b><u>Step 2</u>: Request access to the Spotify API using the OAuth method</b>

First, we load the `.env` file to get spotipy's client ID, client secret, and redirect URI, which are needed to access Spotify's API service through OAuth.

Next, we create a `SpotifyOAuth` object from these three values and pass it to `spotipy.Spotify` as the `auth_manager` to obtain permission to use the API.

In [None]:
load_dotenv()

client_id = os.getenv('SPOTIPY_CLIENT_ID')
client_secret = os.getenv('SPOTIPY_CLIENT_SECRET')
redirect_uri = os.getenv('SPOTIPY_REDIRECT_URI')

# SpotifyOAuth handles the OAuth flow on its own, so a separate
# SpotifyClientCredentials manager is not needed alongside it.
sp = spotipy.Spotify(
    auth_manager=SpotifyOAuth(
        client_id=client_id,
        client_secret=client_secret,
        redirect_uri=redirect_uri,
    )
)
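Because `os.getenv` silently returns `None` for a missing variable, a typo in the `.env` file only surfaces later as an authentication error. The check below is a minimal sketch of failing early instead; `read_spotify_credentials` and `REQUIRED_VARS` are hypothetical names introduced for illustration, not part of spotipy or dotenv.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def read_spotify_credentials(environ=os.environ):
    """Return the three credentials as a dict, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `read_spotify_credentials()` right after `load_dotenv()` would raise a readable error that names the missing variables.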

### <b><u>Step 3</u>: Crawl the top 3000 songs on Spotify from 2020 to 2022</b>

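We start by searching for tracks released in 2022. A single search request returns at most 50 results, so this call fetches only the first page.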

In [None]:
result = sp.search(q='year:2022', limit=50)

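We keep the items from the first page, then use `sp.next` to fetch the following 19 pages, for up to 1000 tracks from 2022.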

In [None]:
songs_data = result['tracks']['items']

for _ in range(19):
    result = sp.next(result['tracks'])
    songs_data.extend(result['tracks']['items'])

We repeat the process for 2021, this time paging explicitly with the `offset` parameter.

In [None]:
for i in range(20):
    result = sp.search(q='year:2021', limit=50, offset=i*50)
    songs_data.extend(result['tracks']['items'])

Finally, we do the same for 2020.

In [None]:
for i in range(20):
    result = sp.search(q='year:2020', limit=50, offset=i*50)
    songs_data.extend(result['tracks']['items'])
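The three per-year loops share the same shape, so they could be folded into one helper. The sketch below assumes only that the search callable accepts `q`, `limit`, and `offset` keyword arguments and returns a dict shaped like the `sp.search` results used above. `crawl_year` and `fake_search` are hypothetical names, and `fake_search` serves made-up items so the sketch runs without network access.

```python
def crawl_year(search, year, pages=20, page_size=50):
    """Collect up to pages * page_size track items for one release year.

    `search` is any callable taking the q, limit and offset keyword arguments.
    """
    items = []
    for page in range(pages):
        result = search(q=f"year:{year}", limit=page_size, offset=page * page_size)
        batch = result["tracks"]["items"]
        items.extend(batch)
        if len(batch) < page_size:  # a short page means there is nothing left
            break
    return items


def fake_search(q, limit, offset):
    """Stand-in for sp.search that serves 120 made-up tracks per query."""
    ids = range(offset, min(offset + limit, 120))
    return {"tracks": {"items": [{"id": f"{q}-{i}"} for i in ids]}}


demo_items = crawl_year(fake_search, 2022)
print(len(demo_items))
```

With the real client, `crawl_year(sp.search, 2021)` would stand in for the corresponding loop above. Passing the search function as an argument keeps the helper testable offline.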

### <b><u>Step 4</u>: Getting artists' IDs from the list of tracks</b>

In [None]:
# One inner list of artist IDs per track (a track can have several artists)
artists_id = [[artist['id'] for artist in track['artists']] for track in songs_data]
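To make the resulting shape concrete, here is a small sketch on made-up data. `sample_tracks` only mimics the `artists` field of the track items and is not real Spotify output. It also shows one possible way to keep the link between a track and each of its artists as flat pairs.

```python
# Made-up track items with the same 'artists' structure as entries of songs_data
sample_tracks = [
    {"name": "Song A", "artists": [{"id": "a1"}, {"id": "a2"}]},
    {"name": "Song B", "artists": [{"id": "a2"}]},
]

# One inner list of artist IDs per track
ids_per_track = [[artist["id"] for artist in track["artists"]] for track in sample_tracks]
print(ids_per_track)  # [['a1', 'a2'], ['a2']]

# Flat (track name, artist ID) pairs, one per track-artist link
track_artist_pairs = [
    (track["name"], artist["id"])
    for track in sample_tracks
    for artist in track["artists"]
]
print(track_artist_pairs)  # [('Song A', 'a1'), ('Song A', 'a2'), ('Song B', 'a2')]
```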

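Next, we look up each artist and keep the ID, name, genres, follower count, and popularity. Because `artists_id` holds one list per track, we flatten it and drop duplicates first so that each artist is requested only once.

In [None]:
# Flatten the nested ID lists and drop duplicates, keeping first-seen order
unique_artist_ids = list(dict.fromkeys(
    artist_id for track_ids in artists_id for artist_id in track_ids
))

artists = []
for artist_id in unique_artist_ids:
    artist_info = sp.artist(artist_id)
    artists.append({
        'id': artist_info['id'],
        'name': artist_info['name'],
        'genres': ', '.join(artist_info['genres']),      # list of genres -> one string
        'followers': artist_info['followers']['total'],  # nested follower count
        'popularity': artist_info['popularity'],
    })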
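Requesting artists one at a time means one HTTP call per artist. spotipy also offers `sp.artists`, which takes a list of IDs, and the Spotify Web API documents a limit of 50 IDs per call for that endpoint. The sketch below batches the lookups under that assumption. `chunked`, `fetch_artists`, and `fake_get_artists` are hypothetical helpers, and the fake client returns made-up records so the example runs offline.

```python
def chunked(seq, size):
    """Yield consecutive slices of seq with at most size elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_artists(get_artists, artist_ids, batch_size=50):
    """Fetch artist records in batches and keep only the columns of interest.

    `get_artists` is any callable that takes a list of IDs and returns a dict
    with an 'artists' list (assumed here to match what sp.artists returns).
    """
    rows = []
    for batch in chunked(artist_ids, batch_size):
        for info in get_artists(batch)["artists"]:
            rows.append({
                "id": info["id"],
                "name": info["name"],
                "genres": ", ".join(info["genres"]),
                "followers": info["followers"]["total"],
                "popularity": info["popularity"],
            })
    return rows


def fake_get_artists(ids):
    """Stand-in for sp.artists that fabricates one record per requested ID."""
    return {"artists": [
        {"id": i, "name": i.upper(), "genres": ["pop", "rock"],
         "followers": {"total": 10}, "popularity": 50}
        for i in ids
    ]}


demo_rows = fetch_artists(fake_get_artists, [f"a{n}" for n in range(120)])
print(len(demo_rows), demo_rows[0]["genres"])
```

With the real client, `fetch_artists(sp.artists, ids)` would need roughly one request per 50 artists instead of one per artist.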