## Most Synced Songs in Film & TV
Who doesn't love a good soundtrack? Many of our favorite songs are discovered when we hear them synced to picture. This dashboard explores the most-synced songs in film and TV shows according to What-Song.com. This notebook scrapes the required data from the website.

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
from bs4 import BeautifulSoup
import time
import regex as re

The URL below is a sitemap index of about 20 pages, each of which links to individual artist pages.

In [2]:
url = 'https://www.what-song.com/sitemap/artist-index.xml'

res = requests.get(url)

In [3]:
res.status_code

200

In [4]:
soup = BeautifulSoup(res.content, 'lxml')

In [5]:
directory = soup.find_all('loc')

Each page below contains a couple thousand individual artists to scrape:

In [6]:
directory = [x.text.split('sitemap/')[1] for x in directory]
directory[:5]

['artist-0.xml',
 'artist-1.xml',
 'artist-2.xml',
 'artist-3.xml',
 'artist-4.xml']
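
To see what the `loc` extraction does without hitting the network, here is a minimal sketch that swaps BeautifulSoup for the standard library's `xml.etree.ElementTree`. The sample XML is a made-up stand-in for the sitemap index (its exact layout is an assumption), and `extract_locs` is a hypothetical helper name, not something from the notebook.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the sitemap index; the real file's layout is assumed.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.what-song.com/sitemap/artist-0.xml</loc></sitemap>
  <sitemap><loc>https://www.what-song.com/sitemap/artist-1.xml</loc></sitemap>
</sitemapindex>"""

def extract_locs(xml_text):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    locs = []
    for element in root.iter():
        # Namespaced tags look like '{namespace}loc', so compare the local name only
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'loc' and element.text:
            locs.append(element.text.strip())
    return locs

# Same split as the cell above: keep only the part after 'sitemap/'
pages = [u.split('sitemap/')[1] for u in extract_locs(SAMPLE_INDEX)]
print(pages)
```

Unlike the `lxml` HTML parser used by BeautifulSoup in this notebook, `ElementTree` keeps the XML namespace on each tag, which is why the sketch compares local names.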

In [7]:
artists_to_loop = []

#For each artist directory page
for page in directory:
    
    #go to the site and get the content
    site = 'https://www.what-song.com/sitemap/'
    art = requests.get(site + page)
    soup = BeautifulSoup(art.content, 'lxml')
    
    #grab all the 'loc' tags, which will be individual artists
    people = soup.find_all('loc')
    for p in people:
        artists_to_loop.append(p.text.split('Artist/')[1])

We now have a list of every artist on the site, which we can loop over to scrape each artist's placements.

In [8]:
len(artists_to_loop)

43986

In [9]:
artists_to_loop[:5]

['1/Dusty-Springfield',
 '3/The-Kinks',
 '2/The-Fray',
 '4/Black-Summer-Crush',
 '5/Frank-Sinatra']
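
Each entry has the form `id/Artist-Slug`. As a rough sketch of how an entry can be split into its parts, the hypothetical helper `parse_artist_path` below separates the numeric id from a display name. It assumes the id is always an integer, which holds for the five samples above but is not verified for the full list.

```python
def parse_artist_path(path):
    """Split 'id/Artist-Slug' into (numeric id, display name)."""
    # Split only on the first slash so the slug is kept whole
    artist_id, slug = path.split('/', 1)
    # Hyphens in the slug stand in for spaces in the artist name
    return int(artist_id), slug.replace('-', ' ')

print(parse_artist_path('1/Dusty-Springfield'))
```

One limitation to keep in mind: replacing every hyphen with a space also alters names that genuinely contain a hyphen, so the display name is an approximation of the artist's real name.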

Now that we have the pages for each artist, we'll loop through to grab the songs.

In [None]:
#Empty list to hold each song
song_list = []
#Artist pages that could not be retrieved or parsed
failed = []
#Count to keep track of how many gathered
count = 0

artist_page = 'https://www.what-song.com/Artist/'

for a in artists_to_loop:
    #Setting this up as a try/except so the loop doesn't stop if one doesn't work
    try:
        artist = a.split('/')[1].replace('-', ' ')
        #The timeout keeps one unresponsive page from hanging the whole run
        person = requests.get(artist_page + a, timeout=30)
        #Treat HTTP errors (404, 500, ...) as failures instead of parsing an error page
        person.raise_for_status()
        soup = BeautifulSoup(person.content, 'lxml')
        title = soup.find_all('div', {'class' : 'song__title'})
        info = soup.find_all('div', {'class' : 'song__info'})
        show = soup.find_all('div', {'class' : 'song__name-like'})
        #Collect this artist's rows first so a mid-page error leaves no partial data
        rows = []
        for t in range(len(title)):
            sync_dict = {
                'artist' : artist,
                'song_title' : title[t].text,
                'use' : info[t].text,
                'show' :  show[t].text
            }
            rows.append(sync_dict)
        song_list.extend(rows)
        count += 1
        if count % 1000 == 0:
            print(f'{count} artists gathered so far.')
    #Save the artist page to 'failed' if unable to retrieve the information.
    #Catching Exception (not a bare except) still lets Ctrl+C interrupt the loop.
    except Exception:
        failed.append(a)

1000 artists gathered so far.
2000 artists gathered so far.
3000 artists gathered so far.
4000 artists gathered so far.
5000 artists gathered so far.
6000 artists gathered so far.
7000 artists gathered so far.
8000 artists gathered so far.
9000 artists gathered so far.
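
The loop collects unreachable pages in `failed` but never revisits them. One possible follow-up, sketched here under assumptions, is a retry pass with a growing pause between attempts. `retry_failed` and `flaky_fetch` are hypothetical names; `fetch` stands for any function that downloads and parses one artist page, and it is passed in so the sketch runs without network access.

```python
import time

def retry_failed(paths, fetch, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Retry each path up to max_attempts times, pausing longer after each failure."""
    recovered = {}
    still_failed = []
    for path in paths:
        for attempt in range(1, max_attempts + 1):
            try:
                recovered[path] = fetch(path)
                break
            except Exception:
                # Wait before the next attempt; the pause grows linearly
                if attempt < max_attempts:
                    sleep(delay * attempt)
        else:
            # The inner loop never hit 'break', so every attempt failed
            still_failed.append(path)
    return recovered, still_failed

# Demo with a fake fetcher: one path fails once and recovers, one always fails
calls = {}

def flaky_fetch(path):
    calls[path] = calls.get(path, 0) + 1
    if path == '3/The-Kinks' and calls[path] < 2:
        raise ConnectionError('temporary failure')
    if path == '0/Missing':
        raise ValueError('page not found')
    return f'<html>{path}</html>'

recovered, still_failed = retry_failed(
    ['3/The-Kinks', '0/Missing'], flaky_fetch, delay=0, sleep=lambda seconds: None
)
print(sorted(recovered), still_failed)
```

In the notebook, `fetch` could wrap the `requests.get` and parsing steps from the loop above, and `paths` would be the `failed` list.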


Print the total gathered and turn our list into a DataFrame:

In [None]:
print(count)
what_song = pd.DataFrame(song_list)

In [None]:
what_song.to_csv('./raw_songs.csv', index = False)

Checking out what was gathered:

In [14]:
what_song

| Unnamed: 0 | artist | song_title | use | show |
|---|---|---|---|---|
| 0 | Dusty Springfield | Girls It Ain't Easy | 1:22A jackrabbit appears as Crystal gets up fr... | The Hunt12 Mar 20200 |
| 1 | Dusty Springfield | Wishin' and Hopin' | | Sex Education • S2E816 Jan 20200 |
| 2 | Dusty Springfield | Spooky | 0:23Maddie and Chimney run into Tara and Vince... | 9-1-1 • S3E627 Oct 20190 |
| 3 | Dusty Springfield | I Can't Make It Alone | | The Deuce • S3E429 Sep 20190 |
| 4 | Dusty Springfield | No Easy Way Down | | The Deuce • S3E429 Sep 20190 |
| ... | ... | ... | ... | ... |
| 212 | Frank Sinatra | The Girl from Ipanema | Song plays at the reception and then again whe... | Friends • S2E2415 May 19960 |
| 213 | Frank Sinatra | Strangers In the Night | Ross dances with Susan at the wedding. | Friends • S2E1117 Jan 19960 |
| 214 | Frank Sinatra | Young at Heart | | Dream A Little Dream29 Nov 19880 |
| 215 | Frank Sinatra | It Had To Be You | | When Harry Met Sally29 Nov 19881 |
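
The `show` values run three things together: the title (with episode, for TV), an air date, and a trailing number. A possible way to separate them is sketched below with a regular expression. It assumes the trailing digits are a like count (suggested by the `song__name-like` class, but not confirmed) and that dates always look like `12 Mar 2020`. `split_show` is a hypothetical helper name.

```python
import re

# Title (lazy), then a 'D Mon YYYY' date, then the trailing digits (assumed likes)
SHOW_PATTERN = re.compile(
    r'^(?P<show>.*?)'
    r'(?P<date>\d{1,2} [A-Z][a-z]{2} \d{4})'
    r'(?P<likes>\d+)$'
)

def split_show(raw):
    """Split a fused 'show + date + likes' string into its three parts."""
    match = SHOW_PATTERN.match(raw)
    if match is None:
        # Leave unparseable values untouched rather than guessing
        return {'show': raw, 'date': None, 'likes': None}
    return {
        'show': match['show'].strip(),
        'date': match['date'],
        'likes': int(match['likes']),
    }

print(split_show('The Hunt12 Mar 20200'))
print(split_show('9-1-1 • S3E627 Oct 20190'))
```

The split is ambiguous where an episode number meets the day of the month: a string ending in `S1E112 Mar 2020` followed by the count could mean episode 1 on 12 March or episode 11 on 2 March. The lazy title match picks the shorter title (episode 1, 12 March), so such rows would need checking against the site.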
