# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site. In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's inspect element feature to view the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [20]:
import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

In [8]:
response = requests.get('https://www.residentadvisor.net/events/us/washingtondc')
soup = BeautifulSoup(response.content, 'html.parser')
soup.prettify()

'<!DOCTYPE html>\n<html lang="en,ja,es">\n <head id="_x1">\n  <title>\n   RA: Events in Washington DC, United States of America\n  </title>\n  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>\n  <meta content="en,ja,es" http-equiv="content-language"/>\n  <meta content="RA: Resident Advisor" name="Description"/>\n  <meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, washington, dc, united, states, america" name="Keywords"/>\n  <meta content="Resident Advisor" name="Author"/>\n  <meta content="Resident Advisor" property="og:site_name"/>\n  <meta content="712773712080127" property="fb:app_id"/>\n  <link href="/bundles/default-css?v=FkfRVAlFvpndxqgZliJaJOXD-OhkiRFP8nrBK9Pg2R01" rel="stylesheet"/>\n  <meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/>\n  <link href="/bundles/cat-listings-css?v=qgpSmyPbylOKeJFqy2yvCrTgAsw9yQYcJtLKS_vPO6s1" rel="stylesheet"/>\n  <link href="/favicon.ico" rel="icon" type="imag

In [18]:
events = soup.find('div', id='event-listing')
# find_all is the current name for the deprecated findAll alias
entries = events.find_all('li')
print(len(entries), entries[0])

2 <li><p class="eventDate date"><a href="/events.aspx?ai=22&amp;v=day&amp;mn=5&amp;yr=2020&amp;dy=7"><span>Thu, 07 May 2020 /</span></a></p></li>


In [21]:
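rows = []
cur_date = None  # most recent date row seen; event rows follow their date row
for entry in entries:
    date = entry.find('p', class_='eventDate date')
    event = entry.find('h1', class_='event-title')
    if event:
        # This code assumes titles of the form "<event name> at <venue>"
        details = event.text.split(' at ')
        event_name = details[0].strip()
        venue = details[1].strip()
        try:
            # Take the leading digits of the attending text as the head count
            n_attendees = int(re.match(r"(\d*)", entry.find('p', class_="attending").text)[0])
        except (AttributeError, ValueError):
            # No attending element, or its text has no leading digits
            n_attendees = np.nan
        rows.append([event_name, venue, cur_date, n_attendees])
    elif date:
        cur_date = date.text
df = pd.DataFrame(rows)
df.head()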

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | [POSTPONED] Hernan Cattaneo | Flash | Thu, 07 May 2020 / | 16 |


In [None]:
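def scrape_events(events_page_url):
    # Fetch the page that was passed in rather than a hard-coded URL
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    entries = soup.find('div', id='event-listing').find_all('li')
    rows = []
    # Your code here: fill rows using the parsing loop from the cell above
    df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
    return df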
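
A fuller version of `scrape_events` can separate fetching from parsing, so the parsing logic can be checked against a small HTML string without hitting the site. The sketch below assumes the markup seen in the cells above (a `div` with id `event-listing` whose `li` items are either date rows or event rows). The names `parse_events`, `scrape_events_sketch`, and `COLUMNS` are hypothetical helpers for illustration, and splitting on the last " at " is a design choice that differs slightly from the exploratory cell.

```python
import re

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def parse_events(html):
    """Parse one events page (as an HTML string) into a DataFrame.

    Assumes date rows and event rows are <li> items inside
    <div id="event-listing">, as in the output inspected above.
    """
    soup = BeautifulSoup(html, "html.parser")
    listing = soup.find("div", id="event-listing")
    rows = []
    cur_date = None  # date of the most recent date row
    for entry in (listing.find_all("li") if listing else []):
        date = entry.find("p", class_="eventDate date")
        event = entry.find("h1", class_="event-title")
        if event:
            # rpartition splits on the last " at ", so an event name that
            # itself contains " at " keeps its full text.
            name, sep, venue = event.text.rpartition(" at ")
            if not sep:
                name, venue = event.text, ""
            attending = entry.find("p", class_="attending")
            match = re.match(r"\d+", attending.text.strip()) if attending else None
            n_attendees = int(match[0]) if match else np.nan
            rows.append([name.strip(), venue.strip(), cur_date, n_attendees])
        elif date:
            cur_date = date.text.strip()
    return pd.DataFrame(rows, columns=COLUMNS)


def scrape_events_sketch(events_page_url):
    """Fetch a page and parse it; raises on HTTP errors or timeouts."""
    response = requests.get(events_page_url, timeout=10)
    response.raise_for_status()
    return parse_events(response.text)
```

Calling `scrape_events_sketch('https://www.residentadvisor.net/events/us/washingtondc')` would then return one row per event on that page, provided the live markup still matches these assumptions.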

## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url
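
A minimal sketch of one way to approach `next_page`, assuming the events page exposes a pagination link whose visible text contains the word "Next". That link text is an assumption, not something confirmed by the output above, so check the real element with the inspect tool and adjust the test inside the loop. The names `find_next_url` and `next_page_sketch` are hypothetical.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_next_url(html, current_url):
    """Return the absolute URL of the first "Next" link in html, or None.

    Assumption: the pagination link's text contains the word "Next".
    """
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        if "next" in link.get_text().lower():
            # The hrefs in the page output above are relative, so resolve
            # them against the URL of the page they came from.
            return urljoin(current_url, link["href"])
    return None


def next_page_sketch(url):
    """Fetch url and return the next page's URL, or None if there is none."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return find_next_url(response.text, url)
```

Returning `None` when no link is found gives the calling loop a clear stopping condition. Matching on link text is deliberately loose; a more specific attribute or class from the inspected HTML would be more reliable.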

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
#Your code here
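
A sketch of the collection loop and the final sort. It takes the page scraper and the next-page function as arguments, so it does not depend on how those were implemented. The names `collect_events` and `sort_events`, the one-second delay, and the choice to stop at the first error are illustrative assumptions. The date parsing assumes the `Thu, 07 May 2020 /` format shown in the output above, and sorting attendance from highest to lowest is one reading of the instructions.

```python
import time

import pandas as pd


def collect_events(start_url, scrape_fn, next_fn, target=1000, delay=1.0):
    """Follow next-page links until at least `target` events are collected."""
    frames, total, url = [], 0, start_url
    while url and total < target:
        try:
            page = scrape_fn(url)
            frames.append(page)
            total += len(page)
            url = next_fn(url)
        except Exception as err:  # broad on purpose: report and stop cleanly
            print(f"Stopping at {url}: {err}")
            break
        time.sleep(delay)  # pause between requests to go easy on the server
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True).head(target)


def sort_events(df):
    """Sort by attendance (highest first), breaking ties by event date."""
    out = df.copy()
    # Strip the trailing " /" before parsing; unparseable dates become NaT.
    cleaned = out["Event_Date"].str.rstrip(" /")
    out["Event_Date"] = pd.to_datetime(cleaned, format="%a, %d %b %Y", errors="coerce")
    return out.sort_values(
        ["Number_of_Attendees", "Event_Date"], ascending=[False, True]
    ).reset_index(drop=True)
```

With a working `scrape_events` and `next_page`, usage would look like `sort_events(collect_events(start_url, scrape_events, next_page))`. Converting the date strings to datetimes matters here: sorting the raw strings would order them alphabetically by weekday name rather than chronologically.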

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!