<h1>Intro</h1>
On June 17, 1983, The Police released "Synchronicity," an album that twice interrupted Michael Jackson's "Thriller" at the top of the Billboard charts. Synchronicity was my favorite album when it was released, and it remains one of my favorites today.
<p>
This project looks at the staying power of this 40-year-old record. At a minimum, I'll look at record sales. But I also hope to uncover signs of deeper engagement that might predict longer-term durability: how often are the lyrics discussed, and how many musicians are trying to learn these songs?

<h1>Questions</h1>
<ol>
<li>Which tracks have shown users discussing lyrics? How has this changed over time?
<li>Which songs have been covered? How has this changed over time?
<li>How have record sales/followers for Synchronicity compared with more recent albums by former members of The Police?
</ol>

<h1>Lyric Comments: Web Scraping Second Hand Songs</h1>
This could have been done with an API, but I'm web scraping to build my skills.

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [158]:
#grab the page for Synchronicity by the Police
url = "https://secondhandsongs.com/release/413"
http_response = requests.get(url)
html = http_response.text
#name the parser explicitly so results don't depend on which parsers are installed
soup_songlist = BeautifulSoup(html, "html.parser")

<h3>select the list of songs, then create a list of corresponding URLs</h3>

In [212]:
list_song_urls = []
#loop all html links in the striped table, and grab urls
for html_link in soup_songlist.select(".field-title a"):
    list_song_urls.append("https://secondhandsongs.com"+str(html_link["href"]))

#remove the header row
list_song_urls.pop(0)

#remove any submissions that aren't yet verified
#(build a new list: removing items while looping over the same list can skip entries)
list_song_urls = [song_url for song_url in list_song_urls if "submission" not in song_url]

list_song_urls

['https://secondhandsongs.com/performance/656',
 'https://secondhandsongs.com/performance/23164',
 'https://secondhandsongs.com/performance/326045',
 'https://secondhandsongs.com/performance/35963',
 'https://secondhandsongs.com/performance/71166',
 'https://secondhandsongs.com/performance/2204',
 'https://secondhandsongs.com/performance/52439',
 'https://secondhandsongs.com/performance/173743',
 'https://secondhandsongs.com/performance/11041']
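
A note on filtering the URL list: calling `remove()` on a list while looping over that same list can skip the item right after each removal. The sketch below shows the pitfall next to a list-comprehension filter. The URLs and the helper name `drop_submissions` are made up for illustration and are not taken from the site.

```python
def drop_submissions(urls):
    """Return a new list without unverified 'submission' URLs."""
    return [url for url in urls if "submission" not in url]


# Hypothetical URLs, with two submissions next to each other
sample_urls = [
    "https://example.com/performance/1",
    "https://example.com/submission/2",
    "https://example.com/submission/3",
    "https://example.com/performance/4",
]

# Pitfall: removing from the list being iterated skips the neighbouring item
buggy = list(sample_urls)
for url in buggy:
    if "submission" in url:
        buggy.remove(url)

# Safer: build a new list and leave the original untouched
clean = drop_submissions(sample_urls)

print(buggy)
print(clean)
```

With two submissions in a row, the loop version leaves the second one in place, while the comprehension drops both.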

<h1>For each song, count the number of covers per year</h1>
I would also like to look at song name, written by, and language,<br>
but for now, to keep it simple, I only collect the song title and the year of each cover.

In [211]:
#create an empty data frame
df_covers = pd.DataFrame ({'song':[],'year':[]})
df_covers

Unnamed: 0,song,year


In [213]:
#put it all together
for song_url in list_song_urls:
    http_response = requests.get(song_url)
    html = http_response.text
    soup_covers = BeautifulSoup(html, "html.parser")
    song_title = soup_covers.select('.entity-title a')[0].text
    df_next_song = pd.DataFrame([{
        "song": song_title,
        "year": next_date.text[-4:] if next_date.text != "Release date " else ""
    } for next_date in soup_covers.select(".field-date")])
    df_covers=pd.concat([df_covers,df_next_song])
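
The loop above takes the last four characters of each date cell as the year. A slightly more defensive variant, sketched here with only the standard library, pulls a four-digit year off the end of the string and ignores trailing whitespace. The helper name `extract_year` and the sample strings are assumptions for illustration; the real cell text on the site may look different.

```python
import re

# A four-digit number at the end of the text, allowing trailing whitespace
YEAR_AT_END = re.compile(r"(\d{4})\s*$")


def extract_year(date_text):
    """Return the trailing four-digit year as a string, or "" if none is found."""
    match = YEAR_AT_END.search(date_text)
    return match.group(1) if match else ""


# Hypothetical cell contents, including the table's header cell
samples = ["June 1983", "2004 ", "Release date ", ""]
years = [extract_year(text) for text in samples]
print(years)
```

Header cells and empty cells both come back as `""`, so the later "remove blank years" step still applies unchanged.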

<h3>clean up data</h3>

In [214]:
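#remove blank years
#(a boolean mask is used because index labels repeat after concat, so dropping by label could remove rows from other songs)
df_covers = df_covers[df_covers["year"] != ""]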
df_covers

Unnamed: 0,song,year
1,Every Breath You Take,1983
2,Every Breath You Take,1983
3,Every Breath You Take,1983
4,Every Breath You Take,1983
5,Every Breath You Take,1983
...,...,...
54,Wrapped Around Your Finger,2014
55,Wrapped Around Your Finger,2016
56,Wrapped Around Your Finger,2020
57,Wrapped Around Your Finger,2022


In [215]:
df_covers_count = df_covers.groupby("song").value_counts().to_frame().reset_index()
df_covers_count

Unnamed: 0,song,year,count
0,Every Breath You Take,2008,17
1,Every Breath You Take,2017,16
2,Every Breath You Take,2014,15
3,Every Breath You Take,2015,15
4,Every Breath You Take,2004,14
...,...,...,...
150,Wrapped Around Your Finger,1996,1
151,Wrapped Around Your Finger,1983,1
152,Wrapped Around Your Finger,1994,1
153,Wrapped Around Your Finger,2014,1


In [224]:
df_covers_by_year = df_covers_count.pivot_table(index="song",columns="year",values="count")
df_covers_by_year

song,1983,1984,1985,1986,1987,1989,1990,1991,1993,1994,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
Every Breath You Take,5.0,3.0,4.0,3.0,,,,4.0,5.0,2.0,...,9.0,15.0,15.0,11.0,16.0,11.0,7.0,10.0,5.0,2.0
King of Pain,1.0,1.0,,,,1.0,1.0,,1.0,1.0,...,2.0,2.0,1.0,2.0,2.0,4.0,,,,1.0
Mother,1.0,,,,,,,,,,...,,,,1.0,,,,,,
Murder by Numbers,1.0,,,,,1.0,,,,,...,3.0,1.0,,,,,1.0,,,
Synchronicity I,1.0,,,,,,,,,,...,,,,1.0,,2.0,,,,
Synchronicity II,1.0,,,,,,,,,1.0,...,,,,1.0,,2.0,,,,
Tea in the Sahara,1.0,,,1.0,,,,,,,...,1.0,,,2.0,1.0,1.0,,,,
Walking in Your Footsteps,1.0,,,,,,1.0,,,,...,,,,1.0,,,,1.0,,2.0
Wrapped Around Your Finger,1.0,,,,1.0,,,1.0,,1.0,...,3.0,1.0,2.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0
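
As a possible shortcut, and not what this notebook does, `pd.crosstab` can go from the long song/year table straight to a wide table of counts in one step. It fills missing song/year pairs with 0 rather than NaN, which may be easier to chart. The toy data below is invented for illustration only.

```python
import pandas as pd

# Invented long-format data: one row per release, years stored as strings
toy = pd.DataFrame({
    "song": ["Song A", "Song A", "Song A", "Song B"],
    "year": ["1983", "1983", "2004", "1983"],
})

# One row per song, one column per year, cell = number of rows for that pair
wide = pd.crosstab(toy["song"], toy["year"])
print(wide)
```

Because the counts stay integers, later arithmetic such as subtracting the original release from the 1983 column does not have to deal with NaN.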


<h3>data cleaning</h3>

In [227]:
#subtract 1 from each song's 1983 count, because that count includes the original release
df_covers_by_year["1983"] = df_covers_by_year["1983"] - 1

<h3>export</h3>

In [229]:
df_covers_by_year.to_csv('police_covers_by_year.csv')

<h3>data review</h3>