# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element feature to view the underlying HTML of the page.

In [2]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [3]:
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup

In [4]:
# just a bit of exploration

# html_page = requests.get('https://www.residentadvisor.net/events')
# soup = BeautifulSoup(html_page.content, 'html.parser')
# events_div = soup.find('div', id='event-listing')
# events_list = events_div.findAll('article')
# events_list[0].find('p', 'attending').find('span').text

In [5]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    html_page.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(html_page.content, 'html.parser')
    events_div = soup.find('div', id='event-listing')
    # find_all is the current name for the deprecated findAll
    events_list = events_div.find_all('article') if events_div else []

    rows = []
    for event in events_list:
        # Some events have no attendee count; treat those as 0
        attending = event.find('p', 'attending')
        rows.append({
            'Event_Name': event.find('h1', class_='event-title').find('a').text,
            'Venue': event.find_all('a')[-1].text,  # last link in the listing
            'Event_Date': event.find('time')['datetime'],
            'Number_of_Attendees': int(attending.find('span').text) if attending else 0,
        })

    df = pd.DataFrame(rows, columns=['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees'])
    df['Number_of_Attendees'] = df['Number_of_Attendees'].astype('int')
    df['Event_Date'] = pd.to_datetime(df['Event_Date'])
    return df
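
To check the selectors without hitting the live site, it can help to run them against a small inline document. The sketch below assumes a hypothetical HTML snippet (`SAMPLE_HTML`) that mimics the structure the function above expects; the real page markup may differ. The `parse_event` helper is also illustrative and not part of the lab.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the structure the lab's selectors rely on.
SAMPLE_HTML = """
<div id="event-listing">
  <article>
    <time datetime="2020-02-27T00:00"></time>
    <h1 class="event-title"><a href="/events/1">Field Trip 109: Eli Brown</a>
      <span>at <a href="/club/1">Q Nightclub</a></span></h1>
    <p class="attending"><span>1</span> Attending</p>
  </article>
  <article>
    <time datetime="2020-02-29T00:00"></time>
    <h1 class="event-title"><a href="/events/2">Haute Sauce</a>
      <span>at <a href="/club/1">Q Nightclub</a></span></h1>
  </article>
</div>
"""


def parse_event(article):
    """Return (name, venue, date, attendees) for one <article> tag."""
    name = article.find('h1', class_='event-title').find('a').text
    venue = article.find_all('a')[-1].text           # venue is the last link
    date = article.find('time')['datetime']
    attending = article.find('p', 'attending')       # may be missing
    count = int(attending.find('span').text) if attending else 0
    return name, venue, date, count


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
articles = soup.find('div', id='event-listing').find_all('article')
rows = [parse_event(article) for article in articles]
for row in rows:
    print(row)
```

The second article has no `attending` paragraph, so its count falls back to 0, which is the same default the scraping function uses.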

In [6]:
df = scrape_events('https://www.residentadvisor.net/events')
df

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Field Trip 109: Eli Brown | Q Nightclub | 2020-02-27 | 1 |
| 1 | Haüsed: Diet Panda B2B David Hayes, Aivi Lo & ... | Kremwerk | 2020-02-27 | 1 |
| 2 | Shameless 17 Year Anniversary Party with DeWal... | Re-Bar | 2020-02-28 | 9 |
| 3 | People Music presents // Stylust Activated Tou... | Kremwerk | 2020-02-28 | 1 |
| 4 | 5K Feat. Arnold & Lane | Kremwerk | 2020-02-28 | 1 |
| 5 | Bottom Forty presents Alison Swing | Kremwerk | 2020-02-29 | 4 |
| 6 | Diggin' Deep with Dave Seaman | The Monkey Loft | 2020-02-29 | 1 |
| 7 | Haute Sauce: Falcons & Andre Power | Q Nightclub | 2020-02-29 | 0 |


## Write a Function to Retrieve the URL for the Next Page

In [7]:
# more exploration to make sure what's going into my function works, step by step.

# url = 'https://www.residentadvisor.net/events/'
# html_page = requests.get(url)
# soup = BeautifulSoup(html_page.content, 'html.parser')
# next_location = soup.find('li', id='liNext').find('a')['href']
# next_page_url = url[:url.rfind('/events')] + next_location
# display(next_page_url)

In [8]:
def next_page(url):
    """Return the URL of the next events page, or '' if there is none."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    next_item = soup.find('li', id='liNext')
    # No "next" control, or a disabled one, means this is the last page
    if next_item is None or 'disabled' in next_item.get('class', []):
        return ''
    next_link = next_item.find('a')
    if next_link is None or not next_link.get('href'):
        return ''
    return url[:url.rfind('/events')] + next_link['href']
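
The function above builds the next URL by slicing the current one at `/events` and appending the link's `href`. As an alternative sketch, the standard library's `urllib.parse.urljoin` resolves a relative `href` against the current page URL. The `href` value below is an assumed example of what the "next" link might contain, not a value taken from the live page.

```python
from urllib.parse import urljoin

current = 'https://www.residentadvisor.net/events/us/seattle/week/2020-03-03'
href = '/events/us/seattle/week/2020-03-10'  # assumed example of a "next" href

# urljoin keeps the scheme and host of `current` and swaps in the new path.
next_url = urljoin(current, href)

# The slicing approach used in the lab gives the same result for this input.
sliced_url = current[:current.rfind('/events')] + href

print(next_url)
print(next_url == sliced_url)
```

`urljoin` also copes with `href` values that are already absolute URLs, which the slicing approach would not.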

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
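
Once the events are in a DataFrame, the sort described above might look like the following sketch. It uses a few rows copied from the sample output in this lab and assumes the most-attended events should come first (descending), with ties broken by the earlier date; adjust `ascending` if the opposite order is wanted.

```python
import pandas as pd

# A handful of rows from the lab's sample output, deliberately out of order.
sample = pd.DataFrame({
    'Event_Name': [
        '5K Feat. Arnold & Lane',
        'Shameless 17 Year Anniversary Party with DeWal...',
        'Field Trip 109: Eli Brown',
        'Haüsed: Flava D',
        'Bottom Forty presents Alison Swing',
    ],
    'Event_Date': pd.to_datetime([
        '2020-02-28', '2020-02-28', '2020-02-27', '2020-03-05', '2020-02-29',
    ]),
    'Number_of_Attendees': [1, 9, 1, 2, 4],
})

# Sort by attendance (highest first), then by date (earliest first) for ties.
ranked = sample.sort_values(
    ['Number_of_Attendees', 'Event_Date'],
    ascending=[False, True],
).reset_index(drop=True)

print(ranked)
```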

In [9]:
# url = 'https://www.residentadvisor.net/events/us/seattle/week/2020-06-23'
# next_page(url)
# html_page = requests.get(url)
# soup = BeautifulSoup(html_page.content, 'html.parser')
# soup.find('li', id='liNext')

# next_page_url = ''
# if soup.find('li', id='liNext').find('a')['href']:
#     next_location = soup.find('li', id='liNext').find('a')['href']
#     next_page_url = url[:url.rfind('/events')] + next_location
# else:
#     next_page_url = ''

In [23]:
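url = 'https://www.residentadvisor.net/events'
print(url)
frames = [scrape_events(url)]  # one DataFrame per page

# Follow the "next" link until it runs out or 1000 events are collected.
# next_page() is called once per iteration to avoid a duplicate request.
url = next_page(url)
while url and sum(len(frame) for frame in frames) < 1000:
    print(url)
    frames.append(scrape_events(url))
    url = next_page(url)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat(frames, ignore_index=True)
df.shape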

https://www.residentadvisor.net/events
https://www.residentadvisor.net/events/us/seattle/week/2020-03-03
https://www.residentadvisor.net/events/us/seattle/week/2020-03-10
https://www.residentadvisor.net/events/us/seattle/week/2020-03-17
https://www.residentadvisor.net/events/us/seattle/week/2020-03-24
https://www.residentadvisor.net/events/us/seattle/week/2020-03-31
https://www.residentadvisor.net/events/us/seattle/week/2020-04-07
https://www.residentadvisor.net/events/us/seattle/week/2020-04-14
https://www.residentadvisor.net/events/us/seattle/week/2020-04-21
https://www.residentadvisor.net/events/us/seattle/week/2020-04-28
https://www.residentadvisor.net/events/us/seattle/week/2020-05-05
https://www.residentadvisor.net/events/us/seattle/week/2020-05-12
https://www.residentadvisor.net/events/us/seattle/week/2020-05-19
https://www.residentadvisor.net/events/us/seattle/week/2020-05-26
https://www.residentadvisor.net/events/us/seattle/week/2020-06-02
https://www.residentadvisor.net/event

(56, 4)
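
A crawl over many pages can fail partway through because of a single bad response. The sketch below shows one possible way to guard each request with a few retries. `fetch_with_retries` and `flaky_fetch` are hypothetical helpers written for illustration, and the retry count and delay are arbitrary assumptions.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on any exception up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:
            last_error = error
            if attempt < retries:
                time.sleep(delay * attempt)  # wait a little longer each time
    raise RuntimeError(f'giving up on {url} after {retries} attempts') from last_error


# Stand-in fetcher that fails twice before succeeding, to exercise the retries.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError('simulated network failure')
    return '<html>ok</html>'


page = fetch_with_retries(flaky_fetch, 'https://example.com/events', delay=0)
print(page, len(calls))
```

In this lab, `fetch` could be something like `lambda u: requests.get(u, timeout=10)`, so that both connection errors and timeouts trigger a retry.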

In [24]:
df

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Field Trip 109: Eli Brown | Q Nightclub | 2020-02-27 | 1 |
| 1 | Haüsed: Diet Panda B2B David Hayes, Aivi Lo & ... | Kremwerk | 2020-02-27 | 1 |
| 2 | Shameless 17 Year Anniversary Party with DeWal... | Re-Bar | 2020-02-28 | 9 |
| 3 | People Music presents // Stylust Activated Tou... | Kremwerk | 2020-02-28 | 1 |
| 4 | 5K Feat. Arnold & Lane | Kremwerk | 2020-02-28 | 1 |
| 5 | Bottom Forty presents Alison Swing | Kremwerk | 2020-02-29 | 4 |
| 6 | Diggin' Deep with Dave Seaman | The Monkey Loft | 2020-02-29 | 1 |
| 7 | Haute Sauce: Falcons & Andre Power | Q Nightclub | 2020-02-29 | 0 |
| 8 | Night School: Gattuso | Q Nightclub | 2020-03-03 | 0 |
| 9 | Haüsed: Flava D | Kremwerk | 2020-03-05 | 2 |


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!