# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.
# https://www.residentadvisor.net/events/us/michigan

In [None]:
# <div class="clearfix pt1">

In [1]:
from bs4 import BeautifulSoup
import requests

In [6]:
html_page = requests.get("https://www.residentadvisor.net/events/us/michigan")
soup = BeautifulSoup(html_page.content, 'html.parser')

In [39]:
main.prettify

<bound method Tag.prettify of <div class="content clearfix">
<div class="plus8">
<div>
<script src="/bundles/area-country-date-filter-js?v=Q02f5CsIXWPDeOtgJVocYJrwjcQrf96m1qneHFerLLA1"></script>
<ul class="clearfix classic" data-enableevent="True" id="ulButtons">
<li class="but arrow-down right" id="liCountry">
<input id="hdnCountryId" name="ctl00$ctl00$cphContent$_contentmain$ucAreaCountryDateFilter$hdnCountryId" type="hidden" value="2"/>
<span>US</span><input id="country-name" type="hidden" value="United States of America"/><ul class="links" style="display: none;"><li data-id="7"><a href="/events">Canada</a></li><li data-id="15"><a href="/events">France</a></li><li data-id="12"><a href="/events">Germany</a></li><li data-id="20"><a href="/events">Italy</a></li><li data-id="6"><a href="/events">Netherlands</a></li><li data-id="5"><a href="/events">Spain</a></li><li data-id="3"><a href="/events">United Kingdom</a></li><li class="selected" data-id="2"><a href="/events">United States of A

## Open the Inspect Element Feature

Next, open the inspect element feature in your web browser to view the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser
# <p class="eventDate date">

In [38]:
main = soup.find('div', class_="content clearfix")
main.findAll('p', class_="eventDate date")[0].find('span').text

'Thu, 13 Aug 2020 /'

In [41]:
dates = []  # avoid naming this `list`, which would shadow the built-in
for element in main.findAll('p', class_="eventDate date"):
    date = element.find('span').text
    dates.append(date[:-2])  # drop the trailing " /"
dates

['Thu, 13 Aug 2020', 'Sat, 15 Aug 2020']

In [82]:
def get_dates(main):
    """Return the text of each date heading found on the page."""
    dates = []
    for element in main.findAll('p', class_="eventDate date"):
        date = element.find('span').text
        dates.append(date[:-2])  # drop the trailing " /"
    return dates

In [49]:
get_dates(main)

['Thu, 13 Aug 2020', 'Sat, 15 Aug 2020']
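
The dates come back as plain strings, which sort alphabetically rather than chronologically. A minimal sketch, assuming every date follows the `'Thu, 13 Aug 2020'` pattern seen above, converts them with the standard library. `parse_event_date` is a hypothetical helper name, not part of the lab.

```python
from datetime import datetime

def parse_event_date(text):
    """Convert a scraped string such as 'Thu, 13 Aug 2020 /' to a date.

    Assumes the English day and month abbreviations shown in the output above.
    """
    # Remove surrounding whitespace and the trailing slash, if present.
    cleaned = text.strip().rstrip('/').strip()
    return datetime.strptime(cleaned, '%a, %d %b %Y').date()

print(parse_event_date('Thu, 13 Aug 2020 /'))  # 2020-08-13
```

Stripping the slash this way also works on strings that have already been trimmed with `date[:-2]`.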

In [None]:
#  <h1 class="event-title" itemprop="summary"><a href="/events/1420700" itemprop="url" 
# title="Event details of Gettoblaster">Gettoblaster</a> 
# <span>at <a href="/club.aspx?id=184007">The Aretha: Lake Lounge</a>

In [55]:
main.findAll('h1', class_="event-title")[1].find('a').text

'Kai Alcé'

In [56]:
def get_event_names(main):
    """Return the name of each event listed on the page."""
    names = []
    for element in main.findAll('h1', class_="event-title"):
        name = element.find('a').text  # the first link holds the event name
        names.append(name)
    return names

In [57]:
get_event_names(main)

['Gettoblaster', 'Kai Alcé', 'Omar S, John FM']

In [61]:
main.findAll('h1', class_="event-title")[1].find('span').find('a').text

'The Aretha: Lake Lounge'

In [62]:
def get_venues(main):
    """Return the venue of each event listed on the page."""
    venues = []
    for element in main.findAll('h1', class_="event-title"):
        venue = element.find('span').find('a').text  # link inside the "at ..." span
        venues.append(venue)
    return venues

In [63]:
get_venues(main)

['The Aretha: Lake Lounge', 'The Aretha: Lake Lounge', 'Detroit']

In [67]:
main.findAll('p', class_="attending")[0].find('span').text

'1'

In [71]:
def get_num_attendees(main):
    """Return the attendee count of each event as an integer."""
    counts = []
    for element in main.findAll('p', class_="attending"):
        num = element.find('span').text
        counts.append(int(num))
    return counts

In [72]:
get_num_attendees(main)

[1, 1, 1]

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [110]:
import pandas as pd

def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    main = soup.find('div', class_="content clearfix")
    dates = get_dates(main)
    # Workaround for this page only: get_dates returned two dates for three
    # events, so the last date is repeated to make the columns line up.
    dates.append(dates[1])
    event_names = get_event_names(main)
    venues = get_venues(main)
    attendees = get_num_attendees(main)
    df = pd.DataFrame([event_names, venues, dates, attendees]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df


In [111]:
scrape_events("https://www.residentadvisor.net/events/us/michigan")

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Gettoblaster | The Aretha: Lake Lounge | Thu, 13 Aug 2020 | 1 |
| 1 | Kai Alcé | The Aretha: Lake Lounge | Sat, 15 Aug 2020 | 1 |
| 2 | Omar S, John FM | Detroit | Sat, 15 Aug 2020 | 1 |
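
The date workaround in `scrape_events` exists because `get_dates` returned two dates for three events. A mismatch like this is easy to miss, so it can help to compare column lengths before building the DataFrame. The following is a rough sketch only; `check_equal_lengths` is a hypothetical helper, and the sample values are copied from the outputs earlier in this lab.

```python
def check_equal_lengths(**columns):
    """Raise ValueError if the named lists differ in length (hypothetical helper)."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths

# Values copied from the outputs above: three events but only two dates.
event_names = ['Gettoblaster', 'Kai Alcé', 'Omar S, John FM']
dates = ['Thu, 13 Aug 2020', 'Sat, 15 Aug 2020']

try:
    check_equal_lengths(Event_Name=event_names, Event_Date=dates)
except ValueError as error:
    print(error)  # reports 3 event names against 2 dates
```

Failing loudly here points to the real problem, which is that each date needs to be matched to the events it belongs to rather than padded by hand.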


## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url
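
Whichever element turns out to hold the next-page link, its `href` is likely to be relative, like the event link `/events/1420700` seen earlier. A small sketch, using only the standard library, shows one way to turn such a value into a full URL. `to_absolute` is a hypothetical helper name, and how to locate the link itself is left to the exercise.

```python
from urllib.parse import urljoin

def to_absolute(page_url, href):
    """Resolve a possibly relative href against the page it was found on.

    Returns None when no href was found, which can signal the last page.
    """
    if href is None:
        return None
    return urljoin(page_url, href)

page = "https://www.residentadvisor.net/events/us/michigan"
print(to_absolute(page, "/events/1420700"))
# https://www.residentadvisor.net/events/1420700
```

Returning `None` for a missing link gives the calling code a simple way to stop paging.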

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
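
Collecting 1000 events means following next-page links until enough rows have accumulated, without letting one bad page end the run. The loop below is a sketch under stated assumptions: `collect_events`, `scrape_page`, and `get_next_url` are hypothetical names, and the two callables are passed in so the loop can be tried offline with a made-up two-page site.

```python
def collect_events(start_url, scrape_page, get_next_url, target=1000, max_pages=200):
    """Follow next-page links, gathering rows until `target` is reached.

    `scrape_page(url)` should return a list of rows and `get_next_url(url)`
    the next URL or None. `max_pages` guards against an endless loop.
    """
    rows, url, pages = [], start_url, 0
    while url and len(rows) < target and pages < max_pages:
        try:
            rows.extend(scrape_page(url))
        except Exception as error:  # skip a page that fails to parse
            print(f"Skipping {url}: {error}")
        url = get_next_url(url)
        pages += 1
    return rows[:target]

# Made-up site used only to exercise the loop: page -> (rows, next page).
fake_site = {"page1": (["a", "b"], "page2"), "page2": (["c"], None)}
rows = collect_events("page1",
                      lambda u: fake_site[u][0],
                      lambda u: fake_site[u][1])
print(rows)  # ['a', 'b', 'c']
```

With the functions from this lab, `scrape_page` could be something like `lambda u: scrape_events(u).to_dict("records")`, since extending a list with a DataFrame directly would add its column names rather than its rows.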

In [None]:
#Your code here
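
Once the rows are collected, the sort described above can be sketched with pandas. This example assumes "sorted by the number of attendees" means largest first, with earlier dates winning ties. The three rows are copied from the `scrape_events` output and deliberately shuffled.

```python
import pandas as pd

# Rows copied from the scrape_events output above, in shuffled order.
df = pd.DataFrame({
    "Event_Name": ["Kai Alcé", "Gettoblaster", "Omar S, John FM"],
    "Venue": ["The Aretha: Lake Lounge", "The Aretha: Lake Lounge", "Detroit"],
    "Event_Date": ["Sat, 15 Aug 2020", "Thu, 13 Aug 2020", "Sat, 15 Aug 2020"],
    "Number_of_Attendees": [1, 1, 1],
})

# Parse the date strings so ties are broken chronologically, not alphabetically.
df["Event_Date"] = pd.to_datetime(df["Event_Date"], format="%a, %d %b %Y")

ranked = (
    df.sort_values(["Number_of_Attendees", "Event_Date"], ascending=[False, True])
      .reset_index(drop=True)
)
print(ranked)
```

All three events have one attendee, so the date decides the order and the 13 August event comes first.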

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!