# <b>Section 1: Data Crawling</b>

### <b><u>Step 1</u>: Import libraries</b>

These are the main libraries used for data crawling:
- `spotipy`: Spotipy is a lightweight Python library for the Spotify Web API. With Spotipy you get full access to all of the music data provided by the Spotify platform.
- `dotenv`: used to load Spotipy's client ID, client secret and redirect URI from a `.env` file into the environment
- `os`: used to read those values back from the environment variables

In [1]:
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
from dotenv import load_dotenv
import spotipy
import os

### <b><u>Step 2</u>: Request access to the Spotify API using OAuth</b>

First, we load the `.env` file to get Spotipy's client ID, client secret and redirect URI, which are required to access Spotify's API through the OAuth flow.

Then we create a `spotipy.Spotify` client whose `auth_manager` is a `SpotifyOAuth` object built from those three values; this is what grants the client permission to call the API.

In [2]:
load_dotenv()

client_id = os.getenv('SPOTIPY_CLIENT_ID')
client_secret = os.getenv('SPOTIPY_CLIENT_SECRET')
redirect_uri = os.getenv('SPOTIPY_REDIRECT_URI')

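# SpotifyOAuth handles the token flow and is passed as the client's auth_manager.
# A separate SpotifyClientCredentials manager is not needed: when auth_manager
# is given, spotipy uses it and ignores client_credentials_manager.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=client_id,
                                               client_secret=client_secret,
                                               redirect_uri=redirect_uri))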
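If a variable is missing from `.env`, `os.getenv` silently returns `None` and the problem only surfaces later as an authentication error. A minimal sketch of an early check, assuming the three `SPOTIPY_*` variable names used above; `read_spotify_settings` is a hypothetical helper, not part of Spotipy or python-dotenv:

```python
import os

# Variables expected in the .env file (values are placeholders):
#   SPOTIPY_CLIENT_ID=...
#   SPOTIPY_CLIENT_SECRET=...
#   SPOTIPY_REDIRECT_URI=...
REQUIRED_VARS = ('SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET', 'SPOTIPY_REDIRECT_URI')


def read_spotify_settings(names=REQUIRED_VARS):
    """Hypothetical helper: return {name: value}, raising if any variable is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return values
```

Calling it right after `load_dotenv()` turns a vague OAuth failure into an error message that names the missing variable.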

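### <b><u>Step 3</u>: Crawl 3000 songs from Spotify released in 2020-2022</b>

Spotify's search endpoint returns at most 50 results per request, so we start with the first page of tracks matching the filter `year:2022`.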

In [3]:
result = sp.search(q='year:2022', limit=50)

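The search response is paginated: `result['tracks']` holds the current page of items together with a `next` link. We keep the first 50 tracks and call `sp.next` 19 more times, collecting 20 pages (1,000 tracks) for 2022.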

In [4]:
songs_data = result['tracks']['items']

for _ in range(19):
    result = sp.next(result['tracks'])
    songs_data.extend(result['tracks']['items'])
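The loop above assumes there are always 19 more pages. Spotipy's `Spotify.next` returns `None` when the current page has no `next` link, so with fewer results `result['tracks']` would raise a `TypeError`. A minimal sketch of a loop that stops cleanly instead; `collect_pages` is a hypothetical helper, and `get_next` is expected to behave like `sp.next`:

```python
def collect_pages(first_response, get_next, max_pages=20, key='tracks'):
    """Collect items from a paged search response.

    first_response: dict shaped like sp.search(...) output,
                    e.g. {'tracks': {'items': [...], 'next': url_or_None}}
    get_next: callable taking the inner paging object and returning the next
              full response, or None when there is no next page (like sp.next).
    """
    items = list(first_response[key]['items'])
    response = first_response
    for _ in range(max_pages - 1):
        response = get_next(response[key])
        if response is None:  # last page reached: stop instead of indexing None
            break
        items.extend(response[key]['items'])
    return items
```

With the client from Step 2 this would be used as `songs_data = collect_pages(sp.search(q='year:2022', limit=50), sp.next)`.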

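For 2021 we request the same 20 pages directly, passing `offset` (0, 50, ..., 950) to `sp.search` instead of following `next`.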

In [5]:
for i in range(20):
    result = sp.search(q='year:2021', limit=50, offset=i*50)
    songs_data.extend(result['tracks']['items'])

The same offset loop collects another 1,000 tracks released in 2020.

In [6]:
for i in range(20):
    result = sp.search(q='year:2020', limit=50, offset=i*50)
    songs_data.extend(result['tracks']['items'])
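The 2021 and 2020 cells differ only in the year. A minimal sketch that runs the offset loop once per year and stops early when a page comes back empty; `search_years` is a hypothetical helper, and `sp` only needs a `search(q=..., limit=..., offset=...)` method like `spotipy.Spotify.search`. Spotify's Web API reference limits search paging (limit up to 50, offset up to about 1,000), which is why 20 pages of 50 is the ceiling per query:

```python
def search_years(sp, years, pages_per_year=20, page_size=50):
    """Run one paged track search per year and return all track items."""
    tracks = []
    for year in years:
        for page in range(pages_per_year):
            result = sp.search(q=f'year:{year}', limit=page_size, offset=page * page_size)
            items = result['tracks']['items']
            if not items:  # this year has fewer results than requested
                break
            tracks.extend(items)
    return tracks
```

For example, `songs_data = search_years(sp, [2022, 2021, 2020])` would replace the three collection cells.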

To check the result, print the name of every collected track (the output below is truncated).

In [7]:
for i in songs_data:
    print(i['name'])

Dreamers [Music from the FIFA World Cup Qatar 2022 Official Soundtrack]
Em Là
dự báo thời tiết hôm nay mưa
Chết Trong Em
Wild Flower (with youjeen)
Có Đâu Ai Ngờ
Tại Vì Sao
Chìm Sâu
ThichThich
Lâu Lâu Nhắc Lại
Shut Down
Waiting For You
The Astronaut
Ngày Đầu Tiên
Left and Right (Feat. Jung Kook of BTS)
Bên Trên Tầng Lầu
Lonely
Mặt Mộc
Still Life (with Anderson .Paak)
Có Em (feat. Low G)
Chạy Khỏi Thế Giới Này
Run BTS
Change pt.2
Một Ngàn Nỗi Đau
Yêu Người Có Ước Mơ
No.2 (with parkjiyoon)
vaicaunoicokhiennguoithaydoi
double take
Vì Anh Đâu Có Biết
Yet To Come
Closer (with Paul Blanco, Mahalia)
Butter
Hectic (with Colde)
Yun (with Erykah Badu)
Ngã Tư Không Đèn
All Day (with Tablo)
Forg_tful (with Kim Sawol)
willow
Christmas Tree Farm
willow
cardigan
Anh Nhớ Ra (feat. TRANG)
willow
willow
Christmas Tree Farm
Christmas Tree Farm
willow
Lover
Christmas Tree Farm
Christmas Tree Farm
Anti-Hero
có hẹn với thanh xuân
Anti-Hero
Anti-Hero
Anti-Hero
Anti-Hero
Anti-Hero
Anti-Hero
Anti-Hero
Anti-Her
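Several titles (`willow`, `Christmas Tree Farm`, `Anti-Hero`) appear many times in the list. Some repeats may be different releases of the same song with different track IDs, so the sketch below only removes exact repeats by Spotify track `id`; whether to also merge same-title versions is a separate choice. `unique_tracks` is a hypothetical helper:

```python
def unique_tracks(tracks):
    """Drop repeated tracks, keeping the first occurrence of each track id."""
    seen = set()
    result = []
    for track in tracks:
        if track['id'] not in seen:
            seen.add(track['id'])
            result.append(track)
    return result
```

It preserves the original order, so `songs_data = unique_tracks(songs_data)` can be applied before Step 4.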

### <b><u>Step 4</u>: Store song data in the 'songs_data.tsv' file</b>

In [8]:
artists_uri = [[artist['uri'] for artist in track['artists']] for track in songs_data]
len(artists_uri)

3000
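`artists_uri` holds one list of artist URIs per track, and the same artist URI can appear in many rows, so one `sp.artist` call per entry repeats requests. A sketch, assuming Spotipy's `Spotify.artists` (which takes a list of artist IDs or URIs and returns `{'artists': [...]}`) and the Web API limit of 50 artists per request; `fetch_artists` is a hypothetical helper:

```python
def fetch_artists(sp, artists_uri, batch_size=50):
    """Fetch each distinct artist once, in batches, and return {uri: artist_object}.

    artists_uri: list of per-track URI lists, as built in the cell above.
    sp: any object with an artists(list_of_uris) method, like spotipy.Spotify.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique = list(dict.fromkeys(uri for row in artists_uri for uri in row))
    by_uri = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        for artist in sp.artists(batch)['artists']:
            if artist is not None:  # skip any missing entries
                by_uri[artist['uri']] = artist
    return by_uri
```

Inside the writing loop, `artists_data = [artist_cache[a['uri']] for a in track['artists']]` would then replace the per-artist `sp.artist` calls, where `artist_cache = fetch_artists(sp, artists_uri)`.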

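For each track we look up every artist with `sp.artist` (name, genres, follower count and popularity) and the album with `sp.album` (album popularity), then write one tab-separated row per track to `../../data/songs_data.tsv`.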


In [11]:
import csv

header = ['id', 'name', 'artist', 'genres', 'artist_followers', 'artist_popularity', 'markets',
          'album', 'released_date', 'album_popularity', 'duration', 'explicit', 'popularity']

# utf-8 so non-ASCII (e.g. Vietnamese) titles are written correctly;
# csv.writer escapes any tab or newline that appears inside a field.
with open("../../data/songs_data.tsv", 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    for track in songs_data:
        # One API call per artist for genres, followers and popularity
        artists_data = [sp.artist(artist['uri']) for artist in track['artists']]
        names = [a['name'] for a in artists_data]
        genres = {g for a in artists_data for g in a['genres']}  # de-duplicated
        followers = [str(a['followers']['total']) for a in artists_data]
        artist_popularity = [str(a['popularity']) for a in artists_data]

        # Album popularity is not part of the track object, so fetch the album
        album_popularity = sp.album(track['album']['uri'])['popularity']

        writer.writerow([track['id'], track['name'], ','.join(names), ','.join(genres),
                         ','.join(followers), ','.join(artist_popularity),
                         len(track['available_markets']),
                         track['album']['name'], track['album']['release_date'], album_popularity,
                         track['duration_ms'], track['explicit'], track['popularity']])
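Once the file is written, reading it back is a cheap way to catch rows broken by a stray tab or newline in a title. A minimal sketch using Python's `csv` module with a tab delimiter, assuming the 13-column header above; `check_tsv` is a hypothetical helper:

```python
import csv


def check_tsv(path, expected_columns=13):
    """Read a TSV back and return (header, rows); raise if any row has the wrong width."""
    with open(path, encoding='utf-8', newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader)
        rows = list(reader)
    if len(header) != expected_columns:
        raise ValueError(f'header has {len(header)} columns, expected {expected_columns}')
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if len(row) != expected_columns:
            raise ValueError(f'line {line_no} has {len(row)} columns')
    return header, rows
```

`header, rows = check_tsv("../../data/songs_data.tsv")` should then report one row per collected track.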
