# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

In [2]:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
import numpy as np
import time

# Inspect event element formatting
response = requests.get("https://www.residentadvisor.net/events/us/washingtondc")
soup = BeautifulSoup(response.content, 'html.parser')
events_list = soup.find('div', id="event-listing")
events = events_list.find_all('li')
print('Number of Events Scraped:', len(events), '\nFirst Entry Format:', events[0])

Number of Events Scraped: 28 
First Entry Format: <li><p class="eventDate date"><a href="/events.aspx?ai=22&amp;v=day&amp;mn=10&amp;yr=2019&amp;dy=2"><span>Wed, 02 Oct 2019 /</span></a></p></li>


In [3]:
# Solution template format for one events page
rows = []
event_date = None # most recent date header seen so far
for event in events:
    date = event.find('p', class_="eventDate date")
    name = event.find('h1', class_="event-title")
    if name:
        details = name.text.split(' at ') # EVENT_NAME at VENUE
        event_name = details[0].strip()
        venue = details[1].strip()
        try:
            # Leading digits of the "attending" text give the attendee count
            number_of_attendees = int(re.match(r"(\d*)", event.find('p', class_="attending").text)[0])
        except (AttributeError, ValueError):
            number_of_attendees = np.nan # no attendees found
        rows.append([event_name, venue, event_date, number_of_attendees])
    elif date:
        # Date rows precede the events they apply to
        event_date = date.text
df = pd.DataFrame(rows, columns=['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees'])
df.head()

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Meute (Live) | U Street Music Hall | Wed, 02 Oct 2019 / | 2 |
| 1 | Roosevelt with Blindstares | Flash | Wed, 02 Oct 2019 / | 1 |
| 2 | Contact: Shed | Flash | Thu, 03 Oct 2019 / | 22 |
| 3 | Meute (Live) | U Street Music Hall | Thu, 03 Oct 2019 / | 2 |
| 4 | Beauty in the Backyard | Camp Ramblewood | Thu, 03 Oct 2019 / | 2 |
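
The `Event_Date` values above still carry a trailing ` /` from the page markup. One possible way to clean them, shown here as a minimal sketch rather than the lab's official solution, is to strip that suffix and parse the rest with the standard library. The helper name `parse_event_date` is hypothetical, and the format string assumes English day and month abbreviations.

```python
from datetime import date, datetime

def parse_event_date(raw):
    """Convert a scraped string such as 'Wed, 02 Oct 2019 /' to a date."""
    # Drop the trailing slash and the whitespace around it.
    cleaned = raw.strip().rstrip('/').strip()
    # %a and %b expect English abbreviations (the default C locale).
    return datetime.strptime(cleaned, '%a, %d %b %Y').date()

print(parse_event_date('Wed, 02 Oct 2019 /'))  # 2019-10-02
```

Storing real dates instead of strings makes later sorting and filtering behave chronologically.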


In [4]:
def scrape_events(events_page_url):
    """Scrape one events page and return its events as a DataFrame."""
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    event_listings = soup.find('div', id="event-listing")
    events = event_listings.find_all('li')
    rows = []
    event_date = None # most recent date header seen so far
    for event in events:
        date = event.find('p', class_="eventDate date")
        name = event.find('h1', class_="event-title")
        if name:
            details = name.text.split(' at ') # EVENT_NAME at VENUE
            event_name = details[0].strip()
            venue = details[1].strip()
            try:
                # Leading digits of the "attending" text give the attendee count
                number_of_attendees = int(re.match(r"(\d*)", event.find('p', class_="attending").text)[0])
            except (AttributeError, ValueError):
                number_of_attendees = np.nan # no attendees found
            rows.append([event_name, venue, event_date, number_of_attendees])
        elif date:
            # Date rows precede the events they apply to
            event_date = date.text
    df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
    return df
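
The attendee count in `scrape_events` relies on the "attending" text starting with digits. As a rough illustration of that parsing step in isolation, the sketch below pulls a leading integer out of a string and returns `None` when there is none. The helper name `parse_attendees` and the sample strings are assumptions made for this example; they are not taken from the site.

```python
import re

def parse_attendees(text):
    """Return the leading integer in `text`, or None when there is none."""
    # Treat a missing element (None) the same as an empty string.
    match = re.match(r'\s*(\d+)', text or '')
    return int(match.group(1)) if match else None

print(parse_attendees('22 Attending'))  # 22
print(parse_attendees('Attending'))     # None
```

Returning `None` (or `np.nan`, as the lab code does) keeps events without a count in the data instead of dropping them.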

## Write a Function to Retrieve the URL for the Next Page

In [5]:
def next_page(url):
    """Return the absolute URL that the "Next" link on the given page points to."""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # The link's href is relative to the site root, so prepend the domain
    next_page_part = soup.find('a', attrs={'ga-event-action': "Next "}).attrs['href']
    next_page_url = "https://www.residentadvisor.net" + next_page_part

    return next_page_url
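
`next_page` builds the absolute URL by string concatenation. An alternative, sketched here under the assumption that the link's `href` may be either relative or absolute, is `urllib.parse.urljoin` from the standard library. The `next_href` value below is a hypothetical placeholder, not a path taken from the site.

```python
from urllib.parse import urljoin

current_page = 'https://www.residentadvisor.net/events/us/washingtondc'
# Hypothetical href, standing in for whatever the "Next" link contains.
next_href = '/events/us/washingtondc/week/2019-10-09'

# urljoin resolves the href against the page it was found on.
next_url = urljoin(current_page, next_href)
print(next_url)
# https://www.residentadvisor.net/events/us/washingtondc/week/2019-10-09
```

If the `href` is already absolute, `urljoin` returns it unchanged, which avoids doubling the domain.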

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [6]:
# Scrape page by page until we have 1000 events or a page adds no new rows
df_dc = []
total_rows = 0
prior_rows = -1
max_events = 1000
current_page = "https://www.residentadvisor.net/events/us/washingtondc"
while total_rows < max_events and total_rows > prior_rows:
    df = scrape_events(current_page)
    df_dc.append(df)
    prior_rows = total_rows
    total_rows += len(df)
    current_page = next_page(current_page)
    print('current rows:', total_rows)
    time.sleep(.25) # pause between requests to go easy on the server

# Combine the per-page results and show the most-attended events
df = pd.concat(df_dc)
df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
print('Number of Events Scraped:', len(df))
# Event_Date is still a string here, so ties are ordered alphabetically, not chronologically
df.sort_values(by=['Number_of_Attendees', 'Event_Date'], ascending=False).head()

current rows: 20
current rows: 34
current rows: 44
current rows: 58
current rows: 70
current rows: 78
current rows: 81
current rows: 85
current rows: 88
current rows: 90
current rows: 96
current rows: 98
current rows: 99
current rows: 99
Number of Events Scraped: 99


|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 5 | Theo Parrish | Dark Room - Baltimore | Sat, 09 Nov 2019 / | 146.0 |
| 11 | hothouse | TBA - Washington DC | Sat, 05 Oct 2019 / | 76.0 |
| 10 | Eddie C - Chris Nitti - Jacob Herschel | Dark Room - Baltimore | Sat, 12 Oct 2019 / | 71.0 |
| 17 | Sunday Love: The Finale: DJ Three - Oona Dahl ... | Flash | Sun, 06 Oct 2019 / | 64.0 |
| 0 | DJ Seinfeld | Flash | Wed, 09 Oct 2019 / | 31.0 |
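
Because `Event_Date` is scraped as text, sorting on it compares strings: "Thu, 03 Oct" sorts before "Wed, 02 Oct". A possible way to break ties chronologically, offered as a sketch and not as the lab's reference solution, is to parse the dates into a separate column first. The rows below are copied from the first sample output in this lab; the column name `Event_Date_Parsed` is an assumption made for the example.

```python
import pandas as pd

# Rows copied from the first sample output in this lab.
df = pd.DataFrame(
    [
        ['Meute (Live)', 'U Street Music Hall', 'Wed, 02 Oct 2019 /', 2],
        ['Roosevelt with Blindstares', 'Flash', 'Wed, 02 Oct 2019 /', 1],
        ['Contact: Shed', 'Flash', 'Thu, 03 Oct 2019 /', 22],
        ['Meute (Live)', 'U Street Music Hall', 'Thu, 03 Oct 2019 /', 2],
        ['Beauty in the Backyard', 'Camp Ramblewood', 'Thu, 03 Oct 2019 /', 2],
    ],
    columns=['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees'],
)

# Strip the trailing " /" and parse the remainder into timestamps.
df['Event_Date_Parsed'] = pd.to_datetime(
    df['Event_Date'].str.rstrip(' /'), format='%a, %d %b %Y'
)

# Most-attended first; ties broken by the earliest date.
ranked = df.sort_values(
    by=['Number_of_Attendees', 'Event_Date_Parsed'],
    ascending=[False, True],
)
print(ranked[['Event_Name', 'Event_Date', 'Number_of_Attendees']])
```

Passing a list to `ascending` lets each sort column have its own direction, so attendance can run high to low while dates run early to late.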


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!