# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.
import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

url = 'https://www.residentadvisor.net/events'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [2]:
# Open the inspect element feature in your browser
event_listings = soup.find('div', id='event-listing')
entries = event_listings.findAll('li')
print(len(entries), entries[0].prettify())

18 <li>
 <p class="eventDate date">
  <a href="/events.aspx?ai=8&amp;v=day&amp;mn=11&amp;yr=2020&amp;dy=25">
   <span>
    Wed, 25 Nov 2020 /
   </span>
  </a>
 </p>
</li>



## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [3]:
def scrape_events(url):
    rows = []
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    entry = soup.find('div', id='event-listing')
    s = entry.findAll('li')
    for ele in s:
        event = entry.find('h1', class_="event-title")
        if event:
            tmp = event.text.split(' at ')
            title = tmp[0].strip()
            loc = tmp[1].strip()
            date = ele.text[:10]
            num = ele.text[-11]          
            rows.append([title.strip(), loc.strip(), date, num])            
    df = pd.DataFrame(rows, columns=['Event_title', 'Venue', 'Event_Date', '#_of_Attendees'])
    return df

## Write a Function to Retrieve the URL for the Next Page

In [4]:
def next_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    nxt = soup.find('a', {'ga-event-action':'Next '}).attrs['href']
    next_page_url = 'https://www.residentadvisor.net'+nxt
    return next_page_url

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
def scrape_events(events_page_url):
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    entries = event_listings.findAll('li')
    rows = []
    for entry in entries:
        #Is it a date? If so, set current date.
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if event:
            details = event.text.split(' at ')
            event_name = details[0].strip()
            venue = details[1].strip()
            try:
                n_attendees = int(re.match("(\d*)", entry.find('p', class_="attending").text)[0])
            except:
                n_attendees = np.nan
            rows.append([event_name, venue, cur_date, n_attendees])
        elif date:
            cur_date = date.text
    df = pd.DataFrame(rows, columns=['Event_title', 'Venue', 'Event_Date', '#_of_Attendees'])
    return df

In [5]:
df = pd.DataFrame(columns=['Event_title', 'Venue', 'Event_Date', '#_of_Attendees'])
url = 'https://www.residentadvisor.net/events'
while len(df) < 10:
    df = df.append(scrape_events(url))
    url = next_page(url)
    time.sleep(.1)
print(len(df))
df.head()

18


Unnamed: 0,Event_title,Venue,Event_Date,#_of_Attendees
0,Unter Baths: A Holiday Merch Drop,Unter Baths,"Wed, 25 No",
1,Unter Baths: A Holiday Merch Drop,Unter Baths,2020-11-25,1.0
2,Unter Baths: A Holiday Merch Drop,Unter Baths,"Thu, 26 No",
3,Unter Baths: A Holiday Merch Drop,Unter Baths,2020-11-26,1.0
4,Unter Baths: A Holiday Merch Drop,Unter Baths,"Fri, 27 No",


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!