# Analysis of Musical and Lyrical Trends at Intelligent Interactive Systems (MIIS)

#### Oktay Ozan Güner   -  ID : OZAN_ID
#### Juan Miguel Alfonso Habana   -  ID : MIGUEL_ID

# Introduction

Since 1990, the way people interact with music has evolved significantly. This period marks a transition from the tangible, physical media of CDs and vinyl to the intangible, yet infinitely accessible world of digital music. 
* How have digitalization and streaming influenced listeners?
* How have listening habits changed over time?

We examine how these habits have shifted, from the duration of songs to the sentiment of their lyrics.


In [1]:
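## DOWNLOADING PACKAGES
# datetime, random and time are part of the Python standard library; only third-party packages need installing
#!pip install requests beautifulsoup4 pandas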

In [2]:
# Importing the packages that we need
from datetime import datetime, timedelta
import random
import time
import pandas as pd
from warnings import filterwarnings

filterwarnings("ignore")

In [5]:
def find_dates(start_date, end_date, weekday=None):
    """
    Return weekly date strings ("%Y-%m-%d") from start_date up to and including end_date.

    If `weekday` is given (Monday=0 ... Sunday=6), the first date is moved forward
    to the first such weekday on or after start_date; otherwise start_date is used as is.
    """
    first_date = start_date
    if weekday is not None:
        # Days to move forward to reach the requested weekday (0 if already on it)
        days_ahead = (weekday - start_date.weekday()) % 7
        first_date = start_date + timedelta(days=days_ahead)

    dates = []
    current_date = first_date
    # Step one week at a time, never going past end_date
    while current_date <= end_date:
        dates.append(current_date.strftime("%Y-%m-%d"))
        current_date += timedelta(days=7)

    return dates
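pandas can generate the same kind of weekly date list directly. The sketch below is an alternative to `find_dates`, not what the notebook used; `weekly_dates` is a hypothetical helper. With `freq="7D"` it steps seven days from the start date, while an anchored frequency such as `"W-SAT"` snaps every date to a Saturday, which would only matter if the charts are assumed to be dated on a fixed weekday.

```python
import pandas as pd

def weekly_dates(start, end, freq="7D"):
    """Weekly "%Y-%m-%d" strings from start to end (inclusive).

    freq="7D" steps seven days from `start`; an anchored frequency such as
    "W-SAT" instead returns every Saturday between start and end.
    """
    return pd.date_range(start=start, end=end, freq=freq).strftime("%Y-%m-%d").tolist()

print(weekly_dates("1960-01-01", "1960-01-29"))
# ['1960-01-01', '1960-01-08', '1960-01-15', '1960-01-22', '1960-01-29']
```

Since 1 January 1960 was a Friday, `weekly_dates("1960-01-01", "1960-01-29", freq="W-SAT")` starts one day later, on 2 January.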



In [6]:
# We initially collected charts from 1960; because of computational limits, the analysis was later restricted to 1990 onwards.
start_date = datetime(1960, 1, 1)
end_date = datetime.now()
all_dates_str = find_dates(start_date, end_date)  # Getting all dates since start date until end date.
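Since the analysis was later restricted to 1990 onwards, one option is to trim the date list before scraping so that no requests are spent on weeks that will be discarded. This is a minimal sketch, assuming the dates are `"%Y-%m-%d"` strings as returned by `find_dates`; `dates_since` is a hypothetical helper, not part of the original notebook. Zero-padded ISO dates sort lexicographically in chronological order, so plain string comparison is enough.

```python
from datetime import datetime, timedelta

def dates_since(dates, cutoff):
    """Keep only "%Y-%m-%d" strings on or after `cutoff` (same format)."""
    # ISO dates compare correctly as strings because every field is zero-padded
    return [d for d in dates if d >= cutoff]

# Hypothetical stand-in for all_dates_str: three weekly dates around New Year 1990
sample = [(datetime(1989, 12, 23) + timedelta(days=7 * i)).strftime("%Y-%m-%d") for i in range(3)]
print(sample)                             # ['1989-12-23', '1989-12-30', '1990-01-06']
print(dates_since(sample, "1990-01-01"))  # ['1990-01-06']
```

In the notebook this would amount to `all_dates_str = dates_since(all_dates_str, "1990-01-01")` before the scraping loop.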

In [19]:
# Base URL of the Billboard Hot 100 chart pages; each week's date is appended to it
base_url = 'https://www.billboard.com/charts/hot-100/'

In [None]:
# We inspected the HTML of the chart pages to find the elements that hold each week's Hot 100 entries, then scraped them with requests and BeautifulSoup.

import requests
from bs4 import BeautifulSoup

start_time = time.time()  # Measuring processing time

weekly_frames = []  # One DataFrame per chart week, concatenated once at the end
for d in all_dates_str:
    print(d)
    # Random 1-5 second pause between requests to avoid overloading the server
    time.sleep(random.randint(1, 5))
    url = f'{base_url}{d}/'

    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        # Skip this week rather than re-appending the previous week's lists
        print(f"{d}: HTTP {response.status_code}")
        continue

    soup = BeautifulSoup(response.text, 'html.parser')
    # The first <span> inside each row container is read as the chart position
    pos_divs = soup.find_all("div", {"class": "o-chart-results-list-row-container"})
    # From each of these <ul> elements, the <h3> is read as the song title and the first <span> as the artist
    artist_divs = soup.find_all("ul", {"class": "lrv-a-unstyle-list lrv-u-flex lrv-u-height-100p lrv-u-flex-direction-column@mobile-max"})

    date_list, pos_list = [], []
    for div in pos_divs:
        date_list.append(d)
        pos_list.append(div.find('span').get_text().strip())

    artist_list, song_list = [], []
    for div in artist_divs:
        artist_list.append(div.find('span').get_text().strip())
        song_list.append(div.find('h3').get_text().strip())

    zipped_list = list(zip(date_list, pos_list, artist_list, song_list))
    weekly_frames.append(pd.DataFrame(zipped_list, columns=["Week", "Position", "Artist_Name", "Song"]))

# DataFrame.append was removed in pandas 2.0; pd.concat combines all weeks in one pass
df = pd.concat(weekly_frames, ignore_index=True)

end_time = time.time()

processing_time = end_time - start_time
print(f"Processing time: {processing_time:.1f} seconds")
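One fragile point in the loop is `zip()`: it silently stops at the shortest input, so if one selector matched a different number of elements than the other, rows would be dropped or paired with the wrong artist. The sketch below shows a length check that could be run before building each week's frame. `build_week_frame` is a hypothetical helper and the input values are made up.

```python
import pandas as pd

COLUMNS = ["Week", "Position", "Artist_Name", "Song"]

def build_week_frame(week, positions, artists, songs):
    """Build one week's chart DataFrame, refusing misaligned lists."""
    # zip() would truncate silently, so check the lengths explicitly first
    if not (len(positions) == len(artists) == len(songs)):
        raise ValueError(
            f"{week}: {len(positions)} positions, {len(artists)} artists, {len(songs)} songs"
        )
    rows = [(week, p, a, s) for p, a, s in zip(positions, artists, songs)]
    return pd.DataFrame(rows, columns=COLUMNS)

# Toy input (made-up values, not scraped data)
frame = build_week_frame("1990-01-06", ["1", "2"], ["Artist A", "Artist B"], ["Song A", "Song B"])
print(frame.shape)  # (2, 4)
```

Raising an error makes a changed page layout visible immediately instead of producing a quietly shorter or misaligned dataset.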

In [41]:
# Save the Hot 100 charts to a CSV file
path_name = "Billboard_Lists_1960-01-01_2024-02-23.csv"
#df.to_csv(path_name)
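By default `to_csv` also writes the DataFrame index, which comes back as an extra unnamed column when the file is read again. The sketch below shows a save-and-reload round trip on toy data in a temporary directory, writing with `index=False` and parsing the `Week` column back into dates. The file name and rows are made up, and `sample_df` is used so that the scraped `df` is not overwritten.

```python
import os
import tempfile

import pandas as pd

# Toy chart rows standing in for the scraped df (made-up values)
sample_df = pd.DataFrame(
    [("1990-01-06", "1", "Artist A", "Song A"), ("1990-01-06", "2", "Artist B", "Song B")],
    columns=["Week", "Position", "Artist_Name", "Song"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "billboard_sample.csv")
    # index=False avoids writing the RangeIndex as an extra unnamed column
    sample_df.to_csv(path, index=False)
    # parse_dates turns "Week" into datetime64; Position is inferred as an integer column
    loaded = pd.read_csv(path, parse_dates=["Week"])

print(loaded.dtypes)
```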