# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [8]:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
import time
import numpy as np

In [2]:
htmlpage = requests.get('https://www.residentadvisor.net/events/us/texas')
soup = BeautifulSoup(htmlpage.content, 'html.parser')

In [10]:
eventlist = soup.find('div', id='event-listing')

In [48]:
events = eventlist.findAll('li')
events[1].prettify()

'<li class="">\n <article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event">\n  <span style="display:none;">\n   <time datetime="2020-08-17T00:00" itemprop="startDate">\n    2020-08-17T00:00\n   </time>\n  </span>\n  <a href="/events/1397904">\n   <img height="76" src="/images/listing-default.gif" width="152"/>\n  </a>\n  <div class="bbox">\n   <h1 class="event-title" itemprop="summary">\n    <a href="/events/1397904" itemprop="url" title="Event details of The Airport Session Electro-Sax Jams with Noah Peterson">\n     The Airport Session Electro-Sax Jams with Noah Peterson\n    </a>\n    <span>\n     at\n     <a href="/club.aspx?id=182539">\n      The San Antonio International Airport\n     </a>\n    </span>\n   </h1>\n   <div class="grey event-lineup">\n    Noah Peterson\n   </div>\n  </div>\n </article>\n</li>\n'

In [58]:

for event in events:
    eventinfo = event.find('h1', class_='event-title')
    eventdate = event.find('p', class_='eventDate date')
eventname = eventinfo.text.split(' at ')[0]
venue = eventinfo.text.split(' at ')[1]


AttributeError: 'NoneType' object has no attribute 'text'

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [83]:

rows = []

def scrape_events(events_page_url):
    htmlpage = requests.get(events_page_url)
    soup = BeautifulSoup(htmlpage.content, 'html.parser')
    eventlist = soup.find('div', id='event-listing')
    events = eventlist.findAll('li')
    
    
    for event in events:
        eventinfo = event.find('h1', class_='event-title')
        eventdate = event.find('p', class_='eventDate date')
        if eventinfo:
            try:
                eventname = eventinfo.text.split(' at ')[0]
            except:
                eventname = np.nan
            try:
                venue = eventinfo.text.split(' at ')[1]
            except:
                venue = np.nan
            try:
                attendeenumber = int(re.match("(\d*)", event.find('p', class_="attending").text)[0])
            except:
                attendeenumber = np.nan
            rows.append([eventname, venue, eventDate, attendeenumber])
        elif eventdate:
            eventDate = eventdate.text
        
    df = pd.DataFrame(rows)
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_attendees"]
    return df

## Write a Function to Retrieve the URL for the Next Page

In [85]:
def next_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    url_next = soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']
    next_page_url = "https://www.residentadvisor.net" + url_next
    return next_page_url

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [92]:
rows = []
dfs = []
df = []
startrow = 0
url = 'https://www.residentadvisor.net/events/us/texas'

while startrow <= 2:
    df = scrape_events(url)
    dfs.append(df)
    startrow += len(df)
    url = next_page(url)
    time.sleep(.5)
df = pd.concat(dfs)
df.head()

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_attendees
0,The Airport Session Electro-Sax Jams with Noah...,The San Antonio International Airport,"Mon, 17 Aug 2020 /",
0,The Airport Session Electro-Sax Jams with Noah...,The San Antonio International Airport,"Mon, 17 Aug 2020 /",
1,Third Fridays with Noah Peterson,Sanchos Cantina,"Fri, 21 Aug 2020 /",1.0


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!