# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to view the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [12]:
import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

In [22]:
r = requests.get("https://www.residentadvisor.net/events/us/losangeles")
soup = BeautifulSoup(r.content, 'html.parser')
events = soup.find('div', id="event-listing")

In [23]:
listings = events.findAll('li')
print(len(listings), listings[0])

72 <li><p class="eventDate date"><a href="/events.aspx?ai=23&amp;v=day&amp;mn=11&amp;yr=2019&amp;dy=13"><span>Wed, 13 Nov 2019 /</span></a></p></li>


In [15]:
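rows = []
cur_date = None  # most recent date header seen; the events listed after it share this date
for listing in listings:
    # Each <li> is either a date header or an event entry
    date = listing.find('p', class_="eventDate date")
    event = listing.find('h1', class_="event-title")
    if event:
        # Split on the last ' at ' so an event name containing ' at ' stays intact
        details = event.text.rsplit(' at ', 1)
        event_name = details[0].strip()
        venue = details[1].strip()
        try:
            n_attendees = int(re.match(r"(\d*)", listing.find('p', class_="attending").text)[0])
        except (AttributeError, ValueError):
            # No attendance element, or its text has no leading digits
            n_attendees = np.nan
        rows.append([event_name, venue, cur_date, n_attendees])
    elif date:
        cur_date = date.text
df = pd.DataFrame(rows)
df.head()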

Unnamed: 0,0,1,2,3
0,"EDIT Room _ Cocodisco, Inbal Lankry and Very S...",Bar Franca,"Wed, 13 Nov 2019 /",27.0
1,Clinic with Ramon Tapia (Drumcode),The Sayers Club,"Wed, 13 Nov 2019 /",7.0
2,Theta Grooves...A Deep/Dub/Minimal Techno Even...,Gravlax,"Wed, 13 Nov 2019 /",2.0
3,Conflict of Interest with Shel—b (Rhythm Rapport),Bar Henry,"Wed, 13 Nov 2019 /",2.0
4,Made to Move with Kosmik + Stacy Christine,General Lee's Cocktail House,"Thu, 14 Nov 2019 /",15.0


In [30]:
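def scrape_events(events_page_url):
    r = requests.get(events_page_url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Locate the listing in this page's soup rather than reusing the global `events`
    events = soup.find('div', id="event-listing")
    listings = events.findAll('li')
    rows = []
    cur_date = None  # most recent date header seen
    for listing in listings:
        # Each <li> is either a date header or an event entry
        date = listing.find('p', class_="eventDate date")
        event = listing.find('h1', class_="event-title")
        if event:
            # Split on the last ' at ' so an event name containing ' at ' stays intact
            details = event.text.rsplit(' at ', 1)
            event_name = details[0].strip()
            venue = details[1].strip()
            try:
                n_attendees = int(re.match(r"(\d*)", listing.find('p', class_="attending").text)[0])
            except (AttributeError, ValueError):
                # No attendance element, or its text has no leading digits
                n_attendees = np.nan
            rows.append([event_name, venue, cur_date, n_attendees])
        elif date:
            cur_date = date.text
    # Passing the column names here also works when no events were found
    columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return pd.DataFrame(rows, columns=columns)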

In [31]:
scrape_events('https://www.residentadvisor.net/events/us/losangeles')

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,"EDIT Room _ Cocodisco, Inbal Lankry and Very S...",Bar Franca,"Wed, 13 Nov 2019 /",27.0
1,Clinic with Ramon Tapia (Drumcode),The Sayers Club,"Wed, 13 Nov 2019 /",7.0
2,Theta Grooves...A Deep/Dub/Minimal Techno Even...,Gravlax,"Wed, 13 Nov 2019 /",2.0
3,Conflict of Interest with Shel—b (Rhythm Rapport),Bar Henry,"Wed, 13 Nov 2019 /",2.0
4,Made to Move with Kosmik + Stacy Christine,General Lee's Cocktail House,"Thu, 14 Nov 2019 /",15.0
5,L.A. 909: Chrome Mamí & Urenium,Paper Tiger Bar,"Thu, 14 Nov 2019 /",6.0
6,Sylvan Esso presents With,Walt Disney Concert Hall,"Thu, 14 Nov 2019 /",5.0
7,AFI Fest 2019 presented by Audi,TCL Chinese Theatres,"Thu, 14 Nov 2019 /",2.0
8,Jupiter by Stealth Nights,TBA - Los Angeles,"Thu, 14 Nov 2019 /",1.0
9,You Know What! 1 Year Anniversary Feat. Floog ...,TBA - Downtown LA,"Fri, 15 Nov 2019 /",70.0


## Write a Function to Retrieve the URL for the Next Page

In [32]:
soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']

'/events/us/losangeles/week/2019-11-20'

In [33]:
def next_page(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # The attribute value on the site includes a trailing space: "Next "
    link = soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']
    # The href is site-relative, so prepend the domain
    return "https://www.residentadvisor.net" + link
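
If a page has no "Next" link, `soup.find(...)` returns `None` and `next_page` raises an `AttributeError`. A minimal sketch of a more defensive variant is shown here. The helper name `next_page_from_html`, the `BASE_URL` constant, and the sample HTML are illustrative assumptions, not part of the lab; the helper parses an HTML string so it can be tried without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base for resolving site-relative links (taken from the lab's URL)
BASE_URL = "https://www.residentadvisor.net"


def next_page_from_html(html, base_url=BASE_URL):
    """Return the absolute URL of the 'Next' link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as in the lab; note the trailing space in "Next "
    link = soup.find("a", attrs={"ga-event-action": "Next "})
    if link is None or not link.get("href"):
        return None
    # urljoin resolves a site-relative href against the base URL
    return urljoin(base_url, link["href"])


# Hypothetical snippet that mimics the link found on the events page
sample = '<a ga-event-action="Next " href="/events/us/losangeles/week/2019-11-20">Next</a>'
print(next_page_from_html(sample))
print(next_page_from_html("<p>no link here</p>"))
```

Returning `None` instead of raising lets a calling loop stop cleanly when it reaches the last page.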

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [34]:
next_page('https://www.residentadvisor.net/events/us/losangeles')

'https://www.residentadvisor.net/events/us/losangeles/week/2019-11-20'

In [38]:
df = scrape_events('https://www.residentadvisor.net/events/us/losangeles/week/2019-11-20')
df

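
One way to chain the two functions into the requested pipeline is sketched below, under several assumptions. The names `scrape_n_events` and `sort_events` and the `pause` and `max_pages` parameters are hypothetical; the page scraper and next-page function are passed in as arguments so the loop can be exercised without network access. The sketch also assumes that Event_Date strings look like "Wed, 13 Nov 2019 /" and that "sorted by the number of attendees" means largest first.

```python
import time

import pandas as pd

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def scrape_n_events(start_url, scrape_fn, next_fn, n_events=1000, pause=0.5, max_pages=200):
    """Follow next-page links, collecting events until n_events are gathered."""
    frames, total, pages, url = [], 0, 0, start_url
    while url and total < n_events and pages < max_pages:
        page_df = scrape_fn(url)   # DataFrame of events on this page
        frames.append(page_df)
        total += len(page_df)
        pages += 1
        url = next_fn(url)         # None (or empty) ends the loop
        time.sleep(pause)          # be polite to the server between pages
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True).head(n_events)


def sort_events(df):
    """Sort by attendees (largest first), breaking ties by event date (earliest first)."""
    out = df.copy()
    # Assumed date format: "Wed, 13 Nov 2019 /" -> strip the trailing " /" before parsing
    out["_date"] = pd.to_datetime(out["Event_Date"].str.rstrip(" /"), format="%a, %d %b %Y")
    out = out.sort_values(["Number_of_Attendees", "_date"], ascending=[False, True])
    return out.drop(columns="_date").reset_index(drop=True)
```

With the functions defined earlier in this lab, a call might look like `sort_events(scrape_n_events('https://www.residentadvisor.net/events/us/losangeles', scrape_events, next_page))`. The `max_pages` cap is a safeguard against looping forever if pages keep returning no events.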


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!