# Data Collection

Import the required packages and the Python scripts containing my custom data-collection functions, then load the environment variables.

In [None]:
import os 
import json
import requests
import pandas as pd

from dotenv import load_dotenv
from tqdm.notebook import tqdm  
from IPython.display import Image

import spotify_auth
import data_collection

tqdm.pandas()
load_dotenv()

## 1. Get Spotify Token to Access the API

The access token is valid for one hour and authorizes every request made to the Spotify API in the steps below. If data collection runs longer than an hour, request a new token.

In [2]:
token = spotify_auth.get_token()
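
`spotify_auth.get_token()` is a custom helper whose source is not shown in this notebook. As a minimal sketch, assuming it uses Spotify's client-credentials flow and reads the credentials loaded by `load_dotenv()`, it might look like the following. The function names `build_token_request` and `get_token_sketch` and the variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` are assumptions made for illustration.

```python
import base64
import os

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (url, headers, form_data) for a client-credentials token request."""
    # Spotify expects "client_id:client_secret", Base64-encoded, in a Basic auth header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return TOKEN_URL, headers, {"grant_type": "client_credentials"}


def get_token_sketch():
    """Request an access token; a hypothetical stand-in for spotify_auth.get_token()."""
    import requests  # imported lazily so build_token_request has no third-party dependency

    url, headers, data = build_token_request(
        os.environ["SPOTIFY_CLIENT_ID"],      # assumed variable name
        os.environ["SPOTIFY_CLIENT_SECRET"],  # assumed variable name
    )
    response = requests.post(url, headers=headers, data=data, timeout=10)
    response.raise_for_status()
    return response.json()["access_token"]
```

Separating the request construction from the network call keeps the credential encoding easy to check without contacting Spotify.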

## 2. Load Data on the UK Unemployment Rate

- Collected from [The Office for National Statistics](https://www.ons.gov.uk/employmentandlabourmarket/peoplenotinwork/unemployment/timeseries/mgsx/lms)

Load the downloaded CSV and save a copy to the raw data folder as JSON.

In [4]:
unemployment_rate = pd.read_csv("../data/raw/unemployment_statistics.csv")
unemployment_rate.to_json("../data/raw/unemployment.json", orient="records")
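
To illustrate what `orient="records"` produces, here is a small sketch using an in-memory CSV. The column names `year` and `rate` and the values are placeholders, not the actual ONS columns or figures.

```python
import io
import json

import pandas as pd

# Placeholder CSV content; the real file's columns and values are not shown in this notebook.
csv_text = "year,rate\n2000,1.5\n2001,2.5\n"

df = pd.read_csv(io.StringIO(csv_text))

# Called without a path, to_json returns the JSON as a string instead of writing a file.
records_json = df.to_json(orient="records")

# orient="records" yields a list with one JSON object per row, keyed by column name.
records = json.loads(records_json)
print(records)
```

This layout drops the DataFrame index and keeps each row self-describing, which makes the file easy to read back with plain `json.load`.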

## 3. Load Data with Top Charting Songs from 2000-2023 in the UK

I created a spreadsheet (manually, by copying and pasting) with the artist, song title, and year of popularity for each song, and saved it to the data folder as a CSV. The sources were:

- 2011-2023: collected from [Official Charts top 40 biggest songs of (year) in the UK](https://www.officialcharts.com/chart-news/the-official-top-40-biggest-songs-of-2022__38203/)
- 2000-2010: collected from [(year) in British music charts](https://en.wikipedia.org/wiki/2010_in_British_music_charts)

In [4]:
top_charts = pd.read_csv("../data/raw/top_charts.csv")

## 4. Get Song ID, Release Date, and Genres by Searching the Spotify API

Creates **song_data**, a dictionary keyed by a unique song/artist/trending-year string. Each value is a nested dictionary with the title, artist, year the song was in the top 40 charts, Spotify track ID, Spotify artist ID, release date, and the artist's genres.

In [5]:
song_data = {}

for _, row in top_charts.iterrows():
    song = row['song']
    artist = row['artist']
    year = row['year']
    unique_key = f"{song} by {artist} in {year}"
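
    # Search Spotify for the track to get its ID, release date, and artist ID.
    track_id, release_date, artist_id = data_collection.get_spotify_id(song, artist, token)

    # Look up genres only when an artist ID was found; otherwise store an empty list.
    genres = data_collection.get_artist_genres(artist_id, token) if artist_id else []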

    song_data[unique_key] = {
        "song": song,
        "artist": artist,
        "track_id": track_id,
        "artist_id": artist_id,
        "trending_year": year,
        "release_date": release_date,
        "genres": genres
    }
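
`data_collection.get_spotify_id` is also a custom helper that is not shown here. Below is a minimal sketch of the two steps it presumably performs, building a search query and extracting the IDs from the first match, assuming it calls Spotify's `/v1/search` endpoint. The function names are hypothetical, and the sample response is a trimmed placeholder in the shape of a Spotify track search result.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.spotify.com/v1/search"


def build_search_url(song, artist):
    """Build a track search URL that filters by track title and artist name."""
    query = f"track:{song} artist:{artist}"
    return f"{SEARCH_URL}?{urlencode({'q': query, 'type': 'track', 'limit': 1})}"


def parse_first_track(search_response):
    """Return (track_id, release_date, artist_id), or (None, None, None) if no match."""
    items = search_response.get("tracks", {}).get("items", [])
    if not items:
        return None, None, None
    track = items[0]
    return track["id"], track["album"]["release_date"], track["artists"][0]["id"]


# Placeholder response containing only the fields read above.
sample = {
    "tracks": {
        "items": [
            {
                "id": "TRACK_ID",
                "album": {"release_date": "2000-01-01"},
                "artists": [{"id": "ARTIST_ID"}],
            }
        ]
    }
}

print(build_search_url("Example Song", "Example Artist"))
print(parse_first_track(sample))
print(parse_first_track({"tracks": {"items": []}}))
```

In a real request the token from step 1 would be sent in an `Authorization: Bearer <token>` header. Returning `None` values for unmatched songs is consistent with the `if artist_id else []` guard in the loop above.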

## 5. Save Data to a JSON File
I saved this data as charts_song_data.json in the raw folder within my data folder.

In [6]:
with open("../data/raw/charts_song_data.json", 'w') as json_file:
    json.dump(song_data, json_file)
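
As a quick check, the saved file can be loaded back and compared with the in-memory dictionary. The sketch below does this round trip with a temporary file and a single placeholder entry rather than the real chart data.

```python
import json
import os
import tempfile

# Placeholder entry in the same shape as song_data; not a real chart record.
song_data_example = {
    "Example Song by Example Artist in 2000": {
        "song": "Example Song",
        "artist": "Example Artist",
        "track_id": None,
        "artist_id": None,
        "trending_year": 2000,
        "release_date": None,
        "genres": [],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "charts_song_data.json")
    with open(path, "w") as json_file:
        json.dump(song_data_example, json_file)
    with open(path) as json_file:
        reloaded = json.load(json_file)

# None is written as JSON null and read back as None, so unmatched songs survive the round trip.
print(reloaded == song_data_example)
```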