<div class="alert alert-block alert-warning">

### Is country hot in 2025? 🤠🌵🏜🐴

**Author:** Ellie Taagen   

**Date:** March 2025

**Read Me:** For this notebook's features to render, please open it in [nbviewer](https://nbviewer.org/github.com/etaagen/python-projects/blob/main/healthSurveyProject.ipynb) instead of GitHub.  

### 📓 Table of contents <a class='anchor' id='top'></a>
- [Quickstart](#quickstart)
- [Data cleaning](#data-cleaning)
- [Summary](#summary)
    
</div>

### Quickstart <a class="anchor" id="quickstart"></a>
We are going to load the Billboard Hot 100 year-end charts for 2014 through 2024, along with genre information for the charting artists.

In [31]:
###### Import libraries ######
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import billboard
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import re
import requests
import json

In [32]:
###### Import Billboard Hot 100 year-end charts from 2014-2024 ######

# Create an empty list to store one DataFrame per year
all_years_data = []

for my_year in range(2014, 2025):  # range() excludes the stop value, so this covers 2014-2024

    chart = billboard.ChartData('hot-100-songs', year=my_year)

    # Collect one record per chart entry
    chart_data = [
        {
            "year": chart.year,
            "rank": song.rank,
            "title": song.title,
            "artist": song.artist,
        }
        for song in chart.entries
    ]

    all_years_data.append(pd.DataFrame(chart_data))

# Concatenate all yearly DataFrames into a single DataFrame
all_years_data_df = pd.concat(all_years_data, ignore_index=True)



In [33]:
## get a list of artists in the Billboard Hot 100
hot_100_artists = all_years_data_df['artist'].unique()

# Keep only the first-listed artist of each credit by splitting on common separators.
# Caveat: names that legitimately contain "&" or "," are truncated too.
# " [Xx] " needs a space on both sides, so a name that ends in X or has a word starting with X is left intact.
pattern = r" Featuring| &|,| [Ww]ith | [Xx] | /| \("

# Apply re.split() with the combined pattern to each artist
hot_100_artists_clean = [re.split(pattern, artist)[0] for artist in hot_100_artists]

# remove duplicates (keeps first-seen order and returns a NumPy array)
hot_100_artists_clean = pd.Series(hot_100_artists_clean).unique()
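
To see what the separator split does to different kinds of credits, here is a minimal sketch. It uses a pattern in which the collaborator marker `x` must have a space on both sides, and the credit strings are made up for illustration rather than taken from the charts. `first_listed_artist` is a hypothetical helper name.

```python
import re

# Separators that usually introduce a second artist in a chart credit.
# The collaborator "x" must be a standalone token (space on both sides).
SEPARATORS = r" Featuring| &|,| [Ww]ith | [Xx] | /| \("

def first_listed_artist(credit: str) -> str:
    """Return the text before the first separator in a chart credit."""
    return re.split(SEPARATORS, credit)[0]

# Made-up credit strings used only to exercise the pattern
examples = [
    "Artist A Featuring Artist B",
    "Artist A & Artist B",
    "Artist A x Artist B",
    "Artist Ending In X",
    "Artist A (Remix)",
]
for credit in examples:
    print(f"{credit!r} -> {first_listed_artist(credit)!r}")
```

A credit whose name simply ends in "X" passes through unchanged, while a name that really contains "&" or "," would still be cut short, so the cleaned list is worth a quick manual check.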

In [94]:
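##### connect to Spotify API to get genres #####
# may need to refresh this hourly
# credentials come from the SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET environment variables, not the notebook
import os
auth_manager = SpotifyClientCredentials(client_id=os.environ['SPOTIPY_CLIENT_ID'],
                                        client_secret=os.environ['SPOTIPY_CLIENT_SECRET'])
sp = spotipy.Spotify(auth_manager=auth_manager)  # note: rebinds `sp`, the scipy alias from the import cell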

In [96]:
# get the artist info from Spotify
# Create an empty list to store one record per artist
all_artist_data = []

for artist in hot_100_artists_clean:

    searchResults = sp.search(q="artist:" + artist, type="artist", market="US", limit=1)

    # Access the first artist's details
    artist_items = searchResults.get('artists', {}).get('items', [])

    if artist_items:
        first_artist = artist_items[0]
        all_artist_data.append({
            "Name": first_artist.get('name'),
            "ID": first_artist.get('id'),
            "Popularity": first_artist.get('popularity'),
            "Genres": first_artist.get('genres'),
            "Followers": first_artist.get('followers', {}).get('total')
        })
    else:
        # No match: keep a placeholder row so the misses can be counted below
        all_artist_data.append({
            "Name": np.nan,
            "ID": np.nan,
            "Popularity": np.nan,
            "Genres": np.nan,
            "Followers": np.nan
        })

# Build a single DataFrame from the list of records
all_artist_data_df = pd.DataFrame(all_artist_data)

# Report how many searches returned no artist, then drop rows with missing values
print(all_artist_data_df["Name"].isna().sum())
all_artist_data_df.dropna(inplace=True)

OK, so notably a lot of the `genre` data is not populating, even for artists like Rihanna. This is a confirmed issue with the API, reported [here](https://community.spotify.com/t5/Spotify-for-Developers/Get-Artist-API-is-not-returning-any-or-all-Genres/td-p/6880841). A couple of things to try: 
- use the Spotify API to search by song, but according to the API docs this won't return genre
- use a different resource like Last FM API 
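
Before switching data sources, it may help to put a number on how much genre data is missing. The sketch below counts rows whose genre list is empty or absent. The records are hypothetical placeholders shaped like the rows collected from the Spotify search, not real API output, and `empty_genre_share` is an illustrative helper name.

```python
# Hypothetical records shaped like the rows collected from the Spotify search;
# the names and genre values are placeholders, not real API output.
artist_rows = [
    {"Name": "Artist A", "Genres": []},
    {"Name": "Artist B", "Genres": ["placeholder genre"]},
    {"Name": "Artist C", "Genres": []},
]

def empty_genre_share(rows):
    """Return the fraction of rows whose 'Genres' list is missing or empty."""
    if not rows:
        return 0.0
    empty = sum(1 for row in rows if not row.get("Genres"))
    return empty / len(rows)

share = empty_genre_share(artist_rows)
print(f"{share:.0%} of artists have no genres")
```

On the real DataFrame, something like `all_artist_data_df["Genres"].str.len().eq(0).mean()` should give the same kind of fraction, assuming every remaining `Genres` value is a list.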

In [None]:
#### Try Last FM API ####
#Application name	HotCountry
#Registered to etaagen
import os
API_KEY = os.environ["LASTFM_API_KEY"]  # assumed variable name; keeps the key out of the notebook
ARTIST_NAME = "Pharrell Williams"

# Pass the query as params so requests URL-encodes the artist name
url = "http://ws.audioscrobbler.com/2.0/"
params = {"method": "artist.getInfo", "artist": ARTIST_NAME, "api_key": API_KEY, "format": "json"}

# Make the request
response = requests.get(url, params=params)
last_fm_data = response.json()
test = list(last_fm_data.values())
pd.DataFrame(test)
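
Artist names that contain spaces or `&` need percent-encoding before they go into a query string, otherwise the `&` is read as the start of a new parameter. Here is a minimal sketch of building the request URL with the standard library. `build_artist_info_url` is a hypothetical helper, and the artist name and `"YOUR_API_KEY"` are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "http://ws.audioscrobbler.com/2.0/"

def build_artist_info_url(artist_name: str, api_key: str) -> str:
    """Build an artist.getInfo URL with every query value percent-encoded."""
    query = urlencode({
        "method": "artist.getInfo",
        "artist": artist_name,
        "api_key": api_key,
        "format": "json",
    })
    return f"{BASE_URL}?{query}"

# "YOUR_API_KEY" and the artist name are placeholders
print(build_artist_info_url("Artist A & Artist B", "YOUR_API_KEY"))
```

Passing the same dictionary as `params=` to `requests.get` has the same effect without building the string by hand.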

In [34]:
#### Try Last FM API ####
#Application name	HotCountry
#Registered to etaagen
import os
API_KEY = os.environ["LASTFM_API_KEY"]  # assumed variable name; keeps the key out of the notebook

last_fm_all_artist_data = []

for ARTIST_NAME in hot_100_artists_clean:

    # Pass the query as params so requests URL-encodes the artist name
    url = "http://ws.audioscrobbler.com/2.0/"
    params = {"method": "artist.getInfo", "artist": ARTIST_NAME, "api_key": API_KEY, "format": "json"}

    # Make the request
    response = requests.get(url, params=params)
    last_fm_data = response.json()

    # Extract relevant data from the response
    if 'artist' in last_fm_data:  # Ensure the 'artist' key exists
        artist_info = last_fm_data['artist']
        artist_data = {
            "name": artist_info.get("name"),
            "listeners": artist_info.get("stats", {}).get("listeners"),
            "playcount": artist_info.get("stats", {}).get("playcount"),
            "url": artist_info.get("url"),
            "tags": [tag['name'] for tag in artist_info.get("tags", {}).get("tag", [])] if "tags" in artist_info else None
        }
        # Append the artist data as a dictionary to the list
        last_fm_all_artist_data.append(artist_data)

# Convert the list of dictionaries into a single DataFrame (once, after the loop)
last_fm_all_artist_data_df = pd.DataFrame(last_fm_all_artist_data)

    

In [36]:
#### write to csv to avoid always reloading ####
import os
os.makedirs('./data', exist_ok=True)  # to_csv does not create missing directories
last_fm_all_artist_data_df.to_csv('./data/last_fm_artist_data.csv', index=False)
all_years_data_df.to_csv('./data/hot_100_rank.csv', index=False)

In [37]:
test = pd.read_csv('./data/last_fm_artist_data.csv')
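
One thing to watch when reloading: a column of Python lists, such as `tags`, is written to CSV as text, so it comes back as strings like `"['a', 'b']"` rather than lists. Here is a minimal sketch of a parser for those cells. `parse_tags` is a hypothetical helper, and the cell values are placeholders.

```python
import ast

def parse_tags(cell):
    """Turn the CSV text of a saved tag list back into a Python list.

    Empty cells (artists saved with no tags) become an empty list.
    """
    if cell is None or str(cell).strip() == "":
        return []
    return ast.literal_eval(cell)

# Placeholder cell values shaped like what to_csv writes for a list column
print(parse_tags("['tag one', 'tag two']"))
print(parse_tags(""))
```

It is meant to be passed to the reader, for example `pd.read_csv('./data/last_fm_artist_data.csv', converters={'tags': parse_tags})`, so that each cell is parsed as it is loaded.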

We still need some way to make a prediction for 2025.
Beyond the Hot 100 data, other variables that could help include an economic indicator and measures of political and religious lean.
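
One possible starting point for a trend is the share of each year's chart entries whose artist carries a "country" tag. The sketch below assumes each chart row has already been joined to its artist's Last FM tags, which this notebook does not do yet. The `(year, tags)` pairs are placeholders for illustration only, not real chart data, and `country_share_by_year` is a hypothetical helper.

```python
from collections import defaultdict

def country_share_by_year(rows, tag="country"):
    """Return {year: fraction of chart entries whose artist tags include `tag`}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, tags in rows:
        totals[year] += 1
        # Compare case-insensitively; a missing tag list counts as no match
        if any(t.lower() == tag for t in (tags or [])):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Placeholder (year, tags) pairs for illustration only; not real chart data
rows = [
    (2001, ["country", "pop"]),
    (2001, ["pop"]),
    (2002, ["Country"]),
    (2002, None),
]
print(country_share_by_year(rows))
```

A yearly share like this could then sit alongside the other candidate variables when looking for a 2025 signal.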



<div class="alert alert-block alert-warning">

### Summary <a class="anchor" id="summary"></a>

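In this notebook we practiced:  
- pulling Billboard Hot 100 year-end charts with the `billboard` package
- reducing artist credits to the first-listed artist with a regular expression
- querying the Spotify and Last FM APIs for artist information
- saving the results to CSV to avoid reloading

Some additional exploration and coding could include:  
- building some predictive ability for 2025
- adding variables beyond the Hot 100, such as an economic value and political and religious lean
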


### That's all for now!  
<img src="snoopy.jpg" width="100" height="100" style="vertical-align:bottom">

</div>
