# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.
import re
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import time

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date and Number_of_Attendees.

In [2]:
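def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, "html.parser")
    # Date headers and events are sibling <li> items inside the listing
    events = soup.find("div", id="event-listing").find_all("li")
    rows = []
    date = None  # most recent date header seen; applies to the events after it

    for event in events:
        event_date = event.find("p", class_="eventDate date")
        title = event.find("h1", class_="event-title")

        if title:
            # Titles read "<name> at <venue>"; split on the first " at " only
            name, _, venue = title.text.partition(" at ")
            try:
                # The attendee text starts with the count
                attending = int(re.match(r"(\d+)",
                                         event.find("p", class_="attending").text)[0])
            except (AttributeError, TypeError, ValueError):
                # No attendee element, or its text has no leading number
                attending = np.nan
            rows.append([name, venue, date, attending])
        elif event_date:
            date = event_date.text

    # Passing the column names here also works when no events were found
    return pd.DataFrame(rows, columns=['Event_Name', 'Venue', 'Event_Date',
                                       'Number_of_Attendees'])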

In [3]:
scrape_events("https://www.residentadvisor.net/events")

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Mor Elian by BauHaus Houston | Bauhaus, Houston | Fri, 20 Dec 2019 / | 4.0 |
| 1 | DeepAUS presents: End of the Decade After Part... | TBA - Austin, Austin | Sat, 21 Dec 2019 / | 2.0 |
| 2 | Erick Rosales | Bauhaus, Houston | Sat, 21 Dec 2019 / | 2.0 |
| 3 | House of Tones presents: Sandy Rivera & GAWP | The Parish, Austin | Sat, 21 Dec 2019 / | 1.0 |
| 4 | SHADED | Vulcan Gas Company, Austin | Sat, 21 Dec 2019 / | 1.0 |
| 5 | Gritsy presents Sicaria Sound | The Dive, Houston | Sat, 21 Dec 2019 / | NaN |
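
The parsing steps inside `scrape_events` can be tried offline on a small HTML fragment before running them against the live site. The sketch below is a minimal illustration, assuming markup shaped the way the function expects (an `h1` with class `event-title` holding "name at venue", and a `p` with class `attending` that starts with a number). The `SAMPLE_HTML` string, its event names, and the `parse_event` helper are hypothetical and not taken from the site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the structure scrape_events relies on
SAMPLE_HTML = """
<div id="event-listing"><ul>
  <li><p class="eventDate date">Fri, 20 Dec 2019 /</p></li>
  <li><h1 class="event-title">Night One at Club A, Houston</h1>
      <p class="attending">4 Attending</p></li>
  <li><h1 class="event-title">Night Two at Club B, Austin</h1></li>
</ul></div>
"""

def parse_event(li):
    """Return (name, venue, attending) for one <li>, or None if it has no title."""
    title = li.find("h1", class_="event-title")
    if title is None:
        return None
    # Split on the first " at " only
    name, _, venue = title.text.partition(" at ")
    counter = li.find("p", class_="attending")
    # Look for a leading number only when the attendee element exists
    match = re.match(r"\d+", counter.text.strip()) if counter else None
    attending = int(match.group()) if match else None
    return name, venue, attending

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find("div", id="event-listing").find_all("li")
# Date-header items have no title, so parse_event returns None for them
parsed = [p for p in (parse_event(li) for li in items) if p is not None]
print(parsed)
```

Checking for a missing element explicitly, rather than relying on a bare `except`, keeps unrelated errors visible while still tolerating events that have no attendee count.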


## Write a Function to Retrieve the URL for the Next Page

In [4]:
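def next_page(url):
    """Return the absolute URL of the page behind the "Next" link."""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    # The "Next" link is identified by its ga-event-action attribute
    # (the value "Next " includes a trailing space)
    next_link = soup.find("a", attrs={"ga-event-action": "Next "})
    # Raises AttributeError if the page has no such link
    return "https://www.residentadvisor.net" + next_link.attrs["href"]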
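
Joining the domain and the `href` by string concatenation works when the link is site-relative. As a hedged alternative, `urllib.parse.urljoin` from the standard library also copes with absolute links, and returning `None` when there is no link gives the calling loop a clear stop signal. The `resolve_next` helper and the example path below are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

def resolve_next(current_url, href):
    """Resolve a "Next" link's href against the current page URL.

    Returns None when there is no link (href is None or empty).
    """
    if not href:
        return None
    return urljoin(current_url, href)

base = "https://www.residentadvisor.net/events/us/texas"
# A site-relative href (made-up path) is resolved against the domain
relative = resolve_next(base, "/events/us/texas/week/2019-12-27")
# An already absolute href is returned unchanged
absolute = resolve_next(base, "https://www.residentadvisor.net/events")
# A missing link yields None instead of raising
missing = resolve_next(base, None)
print(relative, absolute, missing)
```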

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [5]:
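df_list = []
rows = 0
url = "https://www.residentadvisor.net/events/us/texas"

# Keep following "Next" links until 1000 events have been collected
while rows < 1000:
    try:
        df = scrape_events(url)
        df_list.append(df)
        rows += len(df)
        url = next_page(url)
        time.sleep(1)  # pause between requests to avoid hammering the server
    except Exception:
        # Stop when a page fails to load or parse, or when there is no next page
        break
df = pd.concat(df_list)
# Most-attended first; ties are broken by Event_Date, which is compared as text
df = df.iloc[:1000].sort_values(by=['Number_of_Attendees', 'Event_Date'], 
                                ascending=[False, True])
print(len(df))
df.head()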

22


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 2 | Bas_mrkt 2020 NYE! Barbuto, Sara Landry, Nymbl... | The Oven, Austin | Tue, 31 Dec 2019 / | 23.0 |
| 0 | Mor Elian by BauHaus Houston | Bauhaus, Houston | Fri, 20 Dec 2019 / | 4.0 |
| 0 | Debbie Does Disco: Disco Dive Sessions PT. 5 | Tradewinds Social Club, Dallas/Fort Worth | Fri, 27 Dec 2019 / | 4.0 |
| 1 | Midwest Sessions, 4yr Anniversary with Red Eye | Plush, Austin | Fri, 27 Dec 2019 / | 3.0 |
| 1 | Lunar Eclipse: A Drone Night | 523 Thompson Warehouse, Austin | Fri, 10 Jan 2020 / | 2.0 |
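
Because Event_Date holds text such as "Fri, 20 Dec 2019 /", sorting on it compares strings rather than calendar dates. One possible refinement, sketched below on three rows copied from the output above, is to parse the text into datetimes and use that column as the tie-breaker. The `Parsed_Date` column name is an assumption made for this example, and the date format string is inferred from the sample values shown here.

```python
import pandas as pd

# Three rows copied from the output above, deliberately out of order
events = pd.DataFrame({
    "Event_Name": ["Debbie Does Disco: Disco Dive Sessions PT. 5",
                   "Lunar Eclipse: A Drone Night",
                   "Mor Elian by BauHaus Houston"],
    "Event_Date": ["Fri, 27 Dec 2019 /", "Fri, 10 Jan 2020 /", "Fri, 20 Dec 2019 /"],
    "Number_of_Attendees": [4.0, 2.0, 4.0],
})

# Drop the trailing " /" and parse the rest as a real date
events["Parsed_Date"] = pd.to_datetime(
    events["Event_Date"].str.rstrip(" /"), format="%a, %d %b %Y"
)

# Most-attended first; ties go to the earlier calendar date
ranked = events.sort_values(
    by=["Number_of_Attendees", "Parsed_Date"], ascending=[False, True]
).reset_index(drop=True)
print(ranked[["Event_Name", "Event_Date", "Number_of_Attendees"]])
```

Resetting the index after sorting also removes the repeated index labels that appear when per-page DataFrames are concatenated.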


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!