# Web Scraping with Beautiful Soup - Lab

## Introduction

Now that you've read and seen some documentation regarding the use of Beautiful Soup, it's time to practice and put that to work! In this lab you'll formalize some of our example code into functions and scrape the lyrics from an artist of your choice.

## Objectives
You will be able to:
* Scrape static webpages
* Select specific elements from the DOM

## Link Scraping

Write a function to collect the links to each of the song pages from a given artist page.

In [4]:
#Starter Code

from bs4 import BeautifulSoup
import requests


url = 'https://www.azlyrics.com/m/moontaxi.html' #Put the URL of your AZLyrics Artist Page here!

html_page = requests.get(url) #Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser') #Pass the page contents to beautiful soup for parsing

albums = soup.find_all("div", class_="album")

#The example from our lecture/reading
data = [] #Create a storage container
for album_n in range(len(albums)):
    #On the last album, we won't be able to look forward
    if album_n == len(albums)-1:
        cur_album = albums[album_n]
        album_songs = cur_album.find_next_siblings('a') #find_next_siblings is the current name for the deprecated findNextSiblings
        for song in album_songs:
            page = song.get('href')
            title = song.text
            album = cur_album.text
            data.append((title, page, album))
    else:
        cur_album = albums[album_n]
        next_album = albums[album_n+1]
        saca = cur_album.find_next_siblings('a') #songs after current album
        sbna = next_album.find_previous_siblings('a') #songs before next album
        album_songs = [song for song in saca if song in sbna] #album songs are those listed after the current album but before the next one!
        for song in album_songs:
            page = song.get('href')
            title = song.text
            album = cur_album.text
            data.append((title, page, album))
data[:2]

[('Gimme A Light',
  '../lyrics/moontaxi/gimmealight.html',
  'album: "Melodica" (2007)'),
 ('Maybe', '../lyrics/moontaxi/maybe.html', 'album: "Melodica" (2007)')]

In [5]:
album = albums[0]
album.find_next_siblings('a')

[<a href="../lyrics/moontaxi/gimmealight.html" target="_blank">Gimme A Light</a>,
 <a href="../lyrics/moontaxi/maybe.html" target="_blank">Maybe</a>,
 <a id="51773"></a>,
 <a href="../lyrics/moontaxi/mercury.html" target="_blank">Mercury</a>,
 <a href="../lyrics/moontaxi/alltherage.html" target="_blank">All The Rage</a>,
 <a href="../lyrics/moontaxi/letsgoback.html" target="_blank">Let's Go Back</a>,
 <a href="../lyrics/moontaxi/southerntrance.html" target="_blank">Southern Trance</a>,
 <a href="../lyrics/moontaxi/whiskeysunsets.html" target="_blank">Whiskey Sunsets</a>,
 <a href="../lyrics/moontaxi/squarecircles.html" target="_blank">Square Circles</a>,
 <a href="../lyrics/moontaxi/cabaret.html" target="_blank">Cabaret</a>,
 <a href="../lyrics/moontaxi/gunflower.html" target="_blank">Gunflower</a>,
 <a id="51774"></a>,
 <a href="../lyrics/moontaxi/runningwild.html" target="_blank">Running Wild</a>,
 <a href="../lyrics/moontaxi/morocco.html" target="_blank">Morocco</a>,
 <a href="../ly
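
The starter code above can be formalized into a function along the following lines. This is a minimal sketch, not the lab's reference solution: the function name `collect_song_links` and the `sample_html` snippet are hypothetical, and it assumes the artist page keeps the layout seen above (each `div` with class `album` followed by that album's `a` tags). Instead of intersecting two sibling lists, it walks forward from each album heading and stops at the next one.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for an artist page, so the sketch runs offline.
sample_html = """
<div id="listAlbum">
<div class="album">album: <b>"First"</b> (2007)</div>
<a href="../lyrics/demo/one.html" target="_blank">One</a><br/>
<a id="123"></a>
<a href="../lyrics/demo/two.html" target="_blank">Two</a><br/>
<div class="album">album: <b>"Second"</b> (2010)</div>
<a href="../lyrics/demo/three.html" target="_blank">Three</a><br/>
</div>
"""

def collect_song_links(html):
    """Return (title, href, album) tuples for every song link on the page."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for album in soup.find_all("div", class_="album"):
        # Walk the tags that follow this album heading, in document order.
        for sibling in album.find_next_siblings():
            # Reaching the next album heading ends the current album.
            if sibling.name == "div" and "album" in sibling.get("class", []):
                break
            # Skip placeholder anchors such as <a id="123"></a> (no href).
            if sibling.name == "a" and sibling.get("href"):
                records.append((sibling.get_text(), sibling["href"], album.get_text()))
    return records

song_records = collect_song_links(sample_html)
```

With a live page you would pass `html_page.content` from the starter code in place of `sample_html`.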

## Text Scraping
Write a secondary function that scrapes the lyrics for each song page.

In [7]:
#Remember to open up the webpage in a browser and control-click/right-click and go to inspect!
from bs4 import BeautifulSoup
import requests

#Example page
url = 'https://www.azlyrics.com/lyrics/moontaxi/gimmealight.html'

html_page = requests.get(url)
soup = BeautifulSoup(html_page.content, 'html.parser')
soup.prettify()[:1000]




'<!DOCTYPE html>\n<html lang="en">\n <head>\n  <meta charset="utf-8"/>\n  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>\n  <meta content="width=device-width, initial-scale=1" name="viewport"/>\n  <meta content=\'Lyrics to "Gimme A Light" song by Moon Taxi: There was a book and they called it the Good Word When people still read books There was a code of a...\' name="description"/>\n  <meta content="Gimme A Light lyrics, Moon Taxi Gimme A Light lyrics, Moon Taxi lyrics" name="keywords"/>\n  <meta content="noarchive" name="robots"/>\n  <meta content="//www.azlyrics.com/az_logo_tr.png" property="og:image"/>\n  <title>\n   Moon Taxi - Gimme A Light Lyrics | AZLyrics.com\n  </title>\n  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet"/>\n  <link href="//www.azlyrics.com/bsaz.css" rel="stylesheet"/>\n  <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->\n  <!--[if lt IE 9]>\r\n<script src="https://o

In [11]:
divs = soup.find_all("div")
divs

[<div id="fb-root"></div>, <div class="container">
 <!-- Brand and toggle get grouped for better mobile display -->
 <div class="navbar-header">
 <button class="navbar-toggle collapsed" data-target="#search-collapse" data-toggle="collapse" type="button">
 <span class="glyphicon glyphicon-search"></span>
 </button>
 <button class="navbar-toggle collapsed" data-target="#artists-collapse" data-toggle="collapse" type="button">
 <span class="glyphicon glyphicon-th-list"></span>
 </button>
 <a class="navbar-brand" href="//www.azlyrics.com"><img alt="AZLyrics.com" class="pull-left" src="//www.azlyrics.com/az_logo_tr.png" style="max-height:40px; margin-top:-10px;"/></a>
 </div>
 <ul class="collapse navbar-collapse nav navbar-nav" id="artists-collapse">
 <li>
 <div class="btn-group text-center" role="group">
 <a class="btn btn-menu" href="//www.azlyrics.com/a.html">A</a>
 <a class="btn btn-menu" href="//www.azlyrics.com/b.html">B</a>
 <a class="btn btn-menu" href="//www.azlyrics.com/c.html">C</

In [28]:
section = soup.find("div", {"class": "col-xs-12 col-lg-8 text-center"})
section.find_all("div")[6]

<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
There was a book and they called it the Good Word<br/>
When people still read books<br/>
There was a code of a certain moral nature<br/>
When people still believed in crooks<br/>
There was a time when the people they all got sick of<br/>
The ethical bullshit based on the lie of some guy<br/>
They set ablaze the house of god with a smile and a nod<br/>
<br/>
They said all I've got is a match so won't you gimme gimme gimme a light<br/>
They said all I've got is a match so won't you gimme gimme gimme a light<br/>
They said all I've got is a match so won't you gimme gimme gimme a light<br/>
It's how we're changing what we know of what is wrong and what is right<br/>
<br/>
There was a thing and they called it love<br/>
When people, when they still read books<br/>
There was a book and they called it the Good Word<br/>
When people still believed in crooks
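
Selecting the lyrics with a hard-coded index such as `[6]` depends on the exact number of `div` tags that precede them, which may differ between pages. As a hedged alternative, the sketch below picks the first `div` that carries no attributes and converts it to plain text. The function name `lyrics_from_html`, the "no attributes" rule, and the placeholder `sample_html` are assumptions for illustration, not something the site guarantees.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a song page.
sample_html = """
<div class="col-xs-12 col-lg-8 text-center">
<div class="ringtone">ad</div>
<div>
<!-- licensing notice -->
First line<br/>
Second line<br/>
<br/>
Third line
</div>
</div>
"""

def lyrics_from_html(html):
    """Return the lyrics as newline-separated text, or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("div", class_="text-center")
    if section is None:
        return None
    for div in section.find_all("div"):
        # Assumption: the lyrics div is the first one without attributes.
        if not div.attrs:
            # get_text() drops the tags; strip each line and skip empty ones.
            lines = [line.strip() for line in div.get_text().splitlines()]
            return "\n".join(line for line in lines if line)
    return None

lyrics_text = lyrics_from_html(sample_html)
```

Note that this version discards the blank lines between stanzas; keep them if stanza boundaries matter for your analysis.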

In [98]:
def scrapeLyrics(song_url):
    html_page = requests.get(song_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    
    section = soup.find("div", {"class": "col-xs-12 col-lg-8 text-center"})
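    lyrics = section.find_all("div")[6].get_text() #get_text() returns the text inside the tag; .text is a property, so calling .text() raises a TypeError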
    return lyrics

In [99]:
scrapeLyrics("https://www.azlyrics.com/lyrics/moontaxi/morocco.html")

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
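
The `ConnectionError` above means the server closed the connection without sending a response. One possible cause, which the output does not confirm, is that the site drops rapid automated requests. A minimal sketch of a retry helper with a growing pause is shown below; `fetch_with_retries` and its parameters are hypothetical names. It takes the fetching function as an argument, so `requests.get` can be passed in, and it relies on the fact that the exceptions raised by `requests` are subclasses of `OSError`.

```python
import time

def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url); on a connection-type error, wait and try again."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError as error:  # covers requests' ConnectionError as well
            last_error = error
            if attempt < retries - 1:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * (attempt + 1))
    raise last_error

# Offline demonstration with a fake fetcher that fails twice, then succeeds.
attempts = []

def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("Connection aborted.")
    return "page contents"

result = fetch_with_retries(flaky_fetch, "https://example.com", delay=0)
```

In the lab you could call `fetch_with_retries(requests.get, song_url)` inside `scrapeLyrics`. Retrying will not help if the site is blocking the scraper outright.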

## Synthesizing
Create a script using your two functions above to scrape all of the song lyrics for a given artist.
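
One way to combine the two steps is sketched below, under stated assumptions: `scrape_artist` is a hypothetical name, `song_records` is a list of `(title, href, album)` tuples like the `data` list from the link-scraping section, and `scrape_lyrics` is any function that takes a song URL and returns its lyrics. The relative links are resolved with `urljoin` from the standard library rather than by slicing strings, and a pause between requests is included as a courtesy to the server.

```python
import time
from urllib.parse import urljoin

def scrape_artist(artist_url, song_records, scrape_lyrics, pause=1.0):
    """Return one dict per song with its title, album, absolute link, and lyrics."""
    rows = []
    for title, href, album in song_records:
        # Resolve '../lyrics/...' against the artist page URL.
        song_url = urljoin(artist_url, href)
        rows.append({
            "Song": title,
            "Album": album,
            "Link": song_url,
            "Lyrics": scrape_lyrics(song_url),
        })
        time.sleep(pause)  # pause between requests
    return rows

# Offline demonstration with a stub in place of a real lyrics scraper.
demo_records = [("Maybe", "../lyrics/moontaxi/maybe.html", 'album: "Melodica" (2007)')]
demo_rows = scrape_artist(
    "https://www.azlyrics.com/m/moontaxi.html",
    demo_records,
    scrape_lyrics=lambda url: "stub lyrics",
    pause=0,
)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`.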


In [42]:
from splinter import Browser

browser = Browser("chrome", headless = False)
url = "https://www.azlyrics.com/m/moontaxi.html"
browser.visit(url)

html = browser.html #Grab the rendered page source from the browser session
soup = BeautifulSoup(html, 'html.parser') #Pass the page contents to beautiful soup for parsing

albums = soup.find_all("div", class_="album")


In [90]:
songs = []
links = []
lyrics = []

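for url in album.find_next_siblings('a'):
    try:
        my_link = "https://www.azlyrics.com" + url["href"][2:] #Drop the leading ".." from the relative link
        songs.append(url.getText())
        links.append(my_link)
    except KeyError:
        #Placeholder anchors such as <a id="51773"></a> have no href, so skip them
        pass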

for link in links:
    lyrics.append(scrapeLyrics(link))
lyrics   

[<div>
 <!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
 There was a book and they called it the Good Word<br/>
 When people still read books<br/>
 There was a code of a certain moral nature<br/>
 When people still believed in crooks<br/>
 There was a time when the people they all got sick of<br/>
 The ethical bullshit based on the lie of some guy<br/>
 They set ablaze the house of god with a smile and a nod<br/>
 <br/>
 They said all I've got is a match so won't you gimme gimme gimme a light<br/>
 They said all I've got is a match so won't you gimme gimme gimme a light<br/>
 They said all I've got is a match so won't you gimme gimme gimme a light<br/>
 It's how we're changing what we know of what is wrong and what is right<br/>
 <br/>
 There was a thing and they called it love<br/>
 When people, when they still read books<br/>
 There was a book and they called it the Good Word<br/>
 When people still

In [93]:
import pandas as pd 

df = pd.DataFrame({"Song": songs, "Link": links, "Lyrics": lyrics})
df.head()

|   | Song | Link | Lyrics |
|---|------|------|--------|
| 0 | Gimme A Light | https://www.azlyrics.com/lyrics/moontaxi/gimme... | `<div> <!-- Usage of azlyrics.com content by an...` |
| 1 | Maybe | https://www.azlyrics.com/lyrics/moontaxi/maybe... | `<div> <!-- Usage of azlyrics.com content by an...` |
| 2 | Mercury | https://www.azlyrics.com/lyrics/moontaxi/mercu... | `<div> <!-- Usage of azlyrics.com content by an...` |
| 3 | All The Rage | https://www.azlyrics.com/lyrics/moontaxi/allth... | `<div> <!-- Usage of azlyrics.com content by an...` |
| 4 | Let's Go Back | https://www.azlyrics.com/lyrics/moontaxi/letsg... | `<div> <!-- Usage of azlyrics.com content by an...` |


## Visualizing
Generate two bar graphs to compare lyrical changes for the artist of your choice. For example, the two bar charts could compare the lyrics for two different songs or two different albums.
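
As a starting point, the sketch below counts the most common words in a lyrics string and draws two bar charts side by side. The function names `top_words` and `plot_word_counts` and the placeholder strings are hypothetical; the plotting function assumes matplotlib is installed and that both count lists are non-empty.

```python
import re
from collections import Counter

def top_words(lyrics, n=10):
    """Return the n most common lowercase words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(words).most_common(n)

def plot_word_counts(counts_a, counts_b, label_a, label_b):
    """Draw one bar chart per list of (word, count) pairs and return the figure."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for ax, counts, label in zip(axes, (counts_a, counts_b), (label_a, label_b)):
        words, freqs = zip(*counts)
        ax.bar(words, freqs)
        ax.set_title(label)
        ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return fig

# Placeholder text standing in for scraped lyrics.
counts_one = top_words("la la la hey hey you", n=2)
counts_two = top_words("go go stop", n=2)
```

Calling `plot_word_counts(counts_one, counts_two, "Song A", "Song B")` would then produce the two charts. You may also want to remove very common words such as "the" and "a" before counting.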

In [None]:
#Use this block for your code!

## Level - Up

Think about how you structured the data from your web scraper. Did you scrape the entire song lyrics verbatim? Did you simply store the words and their frequency counts, or did you do something else entirely? List out a few different options for how you could have stored this data. What are advantages and disadvantages of each? Be specific and think about what sort of analyses each representation would lend itself to.
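
To make the trade-off concrete, here is a small sketch of two of the options named above: storing the lyrics verbatim and storing only word counts. The record layout and the function name `to_bag_of_words` are hypothetical. The verbatim form preserves word order and line breaks, so it supports analyses such as rhyme or repetition of whole lines. The counts form is more compact and ready for frequency charts, but the original text cannot be rebuilt from it.

```python
import json
from collections import Counter

# Option 1: keep the full text (placeholder lyrics).
verbatim_record = {"title": "Demo Song", "lyrics": "one two\ntwo three"}

def to_bag_of_words(record):
    """Option 2: keep only how often each word occurs."""
    counts = Counter(record["lyrics"].lower().split())
    return {"title": record["title"], "counts": dict(counts)}

bag_record = to_bag_of_words(verbatim_record)

# Both forms are plain dicts, so either can be saved as JSON.
verbatim_json = json.dumps(verbatim_record)
bag_json = json.dumps(bag_record)
```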

In [None]:
#Use this block for your code!

## Summary

Congratulations! You've now practiced your Beautiful Soup knowledge!