# Web Scraping with Beautiful Soup - Lab

## Introduction

Now that you've read and seen some documentation regarding the use of Beautiful Soup, it's time to practice and put that to work! In this lab, you'll formalize some of our example code into functions and scrape the lyrics from an artist of your choice.

## Objectives
You will be able to:
* Scrape static webpages
* Select specific elements from the DOM

## Link Scraping

Write a function to collect the links to each of the song pages from a given artist page.

In [1]:
!pip install bs4 
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/1d/5d/3260694a59df0ec52f8b4883f5d23b130bc237602a1411fa670eae12351e/beautifulsoup4-4.7.1-py3-none-any.whl (94kB)
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/bf/b3/2473abf05c4950c6a829ed5dcbc40d8b56d4351d15d6939c8ffb7c6b1a14/soupsieve-1.7.3-py2.py3-none-any.whl
Building wheels for collected packages: bs4
  Running setup.py bdist_wheel for bs4 ... done
  Stored in directory: /home/ntk38/.cache/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.7.1 bs4

In [2]:
def grab_song_links(artist_page_url):
    """Return a list of (title, page, album) tuples for the songs on an artist page."""
    html_page = requests.get(artist_page_url)  # Make a GET request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser')  # Pass the page contents to Beautiful Soup for parsing

    # The example from our lecture/reading
    data = []  # Create a storage container

    # Get album divs
    albums = soup.find_all("div", class_="album")
    for album_n, cur_album in enumerate(albums):
        # Songs listed after the current album heading
        album_songs = cur_album.find_next_siblings('a')
        # On the last album, we won't be able to look forward
        if album_n < len(albums) - 1:
            next_album = albums[album_n + 1]
            sbna = next_album.find_previous_siblings('a')  # songs before next album
            # Album songs are those listed after the current album but before the next one
            album_songs = [song for song in album_songs if song in sbna]
        for song in album_songs:
            page = song.get('href')
            title = song.text
            album = cur_album.text
            data.append((title, page, album))
    return data



In [3]:
grab_song_links('https://www.azlyrics.com/e/eagles.html')

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
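
A `ConnectionError` like the one above means the server closed the connection before responding. One common cause (an assumption here, not something the output confirms) is a site rejecting requests that arrive without a descriptive `User-Agent`. The sketch below wraps the request in a small helper that sets a header, applies a timeout, and returns `None` instead of raising. The names `make_session` and `fetch_html` and the user-agent string are hypothetical.

```python
import requests


def make_session(user_agent="lyrics-lab/0.1 (educational scraping exercise)"):
    """Create a requests session that identifies itself on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})  # hypothetical identifier
    return session


def fetch_html(url, session=None, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    session = session or make_session()
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.exceptions.RequestException as error:
        print(f"Request for {url} failed: {error}")
        return None
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., 'html.parser')` in place of `html_page.content`. Whether this resolves the error depends on the site, so check its terms of use before scraping.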

## Text Scraping
Write a secondary function that scrapes the lyrics for each song page.

In [4]:
#Remember to open up the webpage in a browser and control-click/right-click and go to inspect!
from bs4 import BeautifulSoup
import requests

#Example page
url = 'https://www.azlyrics.com/lyrics/gomez/getmiles.html'


html_page = requests.get(url)
soup = BeautifulSoup(html_page.content, 'html.parser')
soup.prettify()[:1000]

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

In [12]:
divs = soup.findAll('div')
divs

[<div id="fb-root"></div>, <div class="container">
 <!-- Brand and toggle get grouped for better mobile display -->
 <div class="navbar-header">
 <button class="navbar-toggle collapsed" data-target="#search-collapse" data-toggle="collapse" type="button">
 <span class="glyphicon glyphicon-search"></span>
 </button>
 <button class="navbar-toggle collapsed" data-target="#artists-collapse" data-toggle="collapse" type="button">
 <span class="glyphicon glyphicon-th-list"></span>
 </button>
 <a class="navbar-brand" href="//www.azlyrics.com"><img alt="AZLyrics.com" class="pull-left" src="//www.azlyrics.com/az_logo_tr.png" style="max-height:40px; margin-top:-10px;"/></a>
 </div>
 <ul class="collapse navbar-collapse nav navbar-nav" id="artists-collapse">
 <li>
 <div class="btn-group text-center" role="group">
 <a class="btn btn-menu" href="//www.azlyrics.com/a.html">A</a>
 <a class="btn btn-menu" href="//www.azlyrics.com/b.html">B</a>
 <a class="btn btn-menu" href="//www.azlyrics.com/c.html">C</

In [14]:
div = divs[0]
div

<div id="fb-root"></div>

In [15]:
for n, div in enumerate(divs):
    # HTML comments are not included in div.text, so search the div's markup instead
    if "<!-- Usage of azlyrics.com content by any " in str(div):
        print(n)

In [16]:
main_page = soup.find('div', {"class": "container main-page"})
main_l2 = main_page.find('div', {"class" : "row"})
main_l3 = main_l2.find('div', {"class" : "col-xs-12 col-lg-8 text-center"})

In [17]:
lyrics = main_l3.findAll('div')[6].text
lyrics

"\n\r\nI love this island but this island's killing me\nSitting here in silence, man, I don't get no peace\nThe waves upon my shore take me away piece by piece\nGonna leave everything I know gonna head out towards the sea\nJump off this island gonna head out towards the sea\n\nI love this city man, but this city's killing me\nSitting here in all this noise man, I don't get no peace\nThe cars below my street take me away piece by piece\nGonna leave everything I know gonna head out towards the sea\nGonna leave this city man, gonna head out towards the sea\n\nGet miles away, get miles away\nGet miles away, get miles\n\nI love this planet but this planet's killin' me\nSitting here in all this grass man I don't get no weed\nThe sweat comin' from my pores take me away piece by piece\nGonna leave everything I know gonna head to the Galaxy\nGonna leave this planet man, gonna head to the Galaxy\n\nGet miles away, get miles away\nGet miles away, get miles away\nGet miles away, get miles away\nGe

In [18]:
def scrape_lyrics(song_page_url):
    html_page = requests.get(song_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    main_page = soup.find('div', {"class": "container main-page"})
    main_l2 = main_page.find('div', {"class" : "row"})
    main_l3 = main_l2.find('div', {"class" : "col-xs-12 col-lg-8 text-center"})
    lyrics = main_l3.findAll('div')[6].text
    return lyrics
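
Selecting the lyrics with `findAll('div')[6]` depends on the exact number of divs that precede them, so it may break if the page layout shifts. A possible alternative, sketched below, is to locate the licensing comment searched for earlier and take its parent div. The sample HTML is a simplified, hypothetical stand-in for a song page, and `find_lyrics_div` is an illustrative name rather than part of the lab.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for a song page (not the real markup)
SAMPLE_HTML = """
<div class="col-xs-12 col-lg-8 text-center">
  <div class="ringtone">ad</div>
  <div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited -->
First line<br/>
Second line
  </div>
</div>
"""


def find_lyrics_div(soup):
    """Return the tag that directly contains the licensing comment, or None."""
    marker = soup.find(
        string=lambda s: isinstance(s, Comment) and "Usage of azlyrics.com content" in s
    )
    return marker.parent if marker is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
lyrics_div = find_lyrics_div(soup)
lyrics = lyrics_div.get_text().strip()  # get_text() leaves the comment itself out
print(lyrics)
```

Returning `None` when the marker is missing lets the caller decide how to handle pages with an unexpected layout.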

In [19]:
scrape_lyrics('https://www.azlyrics.com/lyrics/eagles/hotelcalifornia.html')

'\n\r\nOn a dark desert highway, cool wind in my hair\nWarm smell of colitas, rising up through the air\nUp ahead in the distance, I saw a shimmering light\nMy head grew heavy and my sight grew dim\nI had to stop for the night\nThere she stood in the doorway\nI heard the mission bell\nAnd I was thinking to myself\n"This could be Heaven or this could be Hell"\nThen she lit up a candle and she showed me the way\nThere were voices down the corridor\nI thought I heard them say\n\nWelcome to the Hotel California\nSuch a lovely place (Such a lovely place)\nSuch a lovely face\nPlenty of room at the Hotel California\nAny time of year (Any time of year)\nYou can find it here\n\nHer mind is Tiffany-twisted, she got the Mercedes bends\nShe got a lot of pretty, pretty boys she calls friends\nHow they dance in the courtyard, sweet summer sweat\nSome dance to remember, some dance to forget\n\nSo I called up the Captain\n"Please bring me my wine."\nHe said, "We haven\'t had that spirit here since nin

## Synthesizing
Create a script using your two functions above to scrape all of the song lyrics for a given artist.
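
One way to combine the two functions is sketched below. The name `scrape_artist`, the dictionary keys, and the two-second delay are assumptions made for illustration. The scrapers are passed in as arguments so the loop can be tried with stand-in functions before any real requests are made.

```python
import time
from urllib.parse import urljoin


def scrape_artist(artist_page_url, link_scraper, lyric_scraper, delay_seconds=2.0):
    """Collect lyrics for every song linked from an artist page.

    link_scraper should return (title, page, album) tuples and lyric_scraper
    should return the lyrics for a song URL (for example grab_song_links and
    scrape_lyrics).
    """
    songs = []
    for title, page, album in link_scraper(artist_page_url):
        song_url = urljoin(artist_page_url, page)  # resolves relative hrefs
        try:
            lyrics = lyric_scraper(song_url)
        except Exception as error:  # keep going if a single page fails
            print(f"Skipping {title}: {error}")
            lyrics = None
        songs.append({"title": title, "album": album, "url": song_url, "lyrics": lyrics})
        time.sleep(delay_seconds)  # pause between requests to go easy on the server
    return songs


# Stand-in scrapers with made-up data so the sketch runs without network access
def fake_links(url):
    return [("Song A", "../lyrics/x/a.html", "Album 1"),
            ("Song B", "../lyrics/x/b.html", "Album 1")]


def fake_lyrics(url):
    if url.endswith("b.html"):
        raise ValueError("page layout not recognized")
    return "la la la"


demo = scrape_artist("https://example.com/x/artist.html", fake_links, fake_lyrics, delay_seconds=0)
```

With the lab's own functions, the call would look like `scrape_artist('https://www.azlyrics.com/e/eagles.html', grab_song_links, scrape_lyrics)`, and the resulting list of dictionaries can be passed to `pd.DataFrame` for analysis.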


In [8]:
#Use this block for your code!
links = soup.findAll('a')
links

[<a class="navbar-brand" href="//www.azlyrics.com"><img alt="AZLyrics.com" class="pull-left" src="//www.azlyrics.com/az_logo_tr.png" style="max-height:40px; margin-top:-10px;"/></a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/a.html">A</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/b.html">B</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/c.html">C</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/d.html">D</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/e.html">E</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/f.html">F</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/g.html">G</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/h.html">H</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/i.html">I</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/j.html">J</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/k.html">K</a>,
 <a class="btn btn-menu" href="//www.azlyrics.com/l.html">L</a>,
 <a class="btn btn-menu" href="//www.

In [10]:
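# soup.get('href') returns None because the soup object itself has no href; read it from each <a> tag instead
links1 = [link.get('href') for link in links]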
links1

## Visualizing
Generate two bar graphs to compare lyrical changes for the artist of your choice. For example, the two bar charts could compare the lyrics for two different songs or two different albums.
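
A minimal sketch of one possible approach: count word frequencies with `collections.Counter`, then draw one bar chart per song. The helper names `word_counts` and `plot_top_words`, the tokenizing regex, and the two short snippets are assumptions for illustration; matplotlib is imported inside the plotting function so the counting step works without it.

```python
import re
from collections import Counter


def word_counts(lyrics):
    """Count lowercase words in a lyrics string (apostrophes count as word characters)."""
    return Counter(re.findall(r"[a-z']+", lyrics.lower()))


def plot_top_words(counts_by_label, top_n=10):
    """Draw one bar chart per label, side by side, of the top_n most common words."""
    import matplotlib.pyplot as plt  # imported here so word_counts works without it

    n_charts = len(counts_by_label)
    fig, axes = plt.subplots(1, n_charts, figsize=(6 * n_charts, 4), squeeze=False)
    for ax, (label, counts) in zip(axes[0], counts_by_label.items()):
        words, freqs = zip(*counts.most_common(top_n))  # assumes counts is not empty
        ax.bar(words, freqs)
        ax.set_title(label)
        ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return fig


# Short snippets standing in for two scraped songs
song_a = "Get miles away, get miles away"
song_b = "Such a lovely place, such a lovely face"
counts = {"Song A": word_counts(song_a), "Song B": word_counts(song_b)}
```

Calling `plot_top_words(counts)` in a notebook cell would then render the two charts next to each other for comparison.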

In [None]:
#Use this block for your code!

## Level Up

Think about how you structured the data from your web scraper. Did you scrape the entire song lyrics verbatim? Did you simply store the words and their frequency counts, or did you do something else entirely? List out a few different options for how you could have stored this data. What are advantages and disadvantages of each? Be specific and think about what sort of analyses each representation would lend itself to.
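
As a starting point for that comparison, the sketch below stores the same made-up lyric in two ways. The dictionary layouts and the `bigrams` helper are illustrative assumptions, not a prescribed answer.

```python
from collections import Counter

lyrics = "get miles away get miles away"  # made-up scraped text

# Option 1: verbatim text keeps word order, so phrases and line structure stay available
verbatim = {"title": "Song A", "lyrics": lyrics}

# Option 2: word frequencies are compact and ready for bar charts, but discard order
frequencies = {"title": "Song A", "counts": dict(Counter(lyrics.split()))}


def bigrams(text):
    """Return consecutive word pairs; this needs the original word order."""
    words = text.split()
    return list(zip(words, words[1:]))


# Phrase-level analysis is possible from the verbatim form only
bigram_counts = Counter(bigrams(verbatim["lyrics"]))
```

The frequency form is smaller and maps directly onto the bar charts, while the verbatim form can always be reduced to counts later but not the other way around.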

In [None]:
#Use this block for your code!

## Summary

Congratulations! You've now practiced your Beautiful Soup knowledge!