# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site! In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [8]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [9]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.
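One way to approach this is to parse each event's `<article>` once and build a full row from it, so that names, venues, dates and attendee counts cannot drift out of alignment. The following is a minimal sketch, not a reference solution: the function name `scrape_events` and the trimmed `SAMPLE_HTML` are illustrative assumptions, and the selectors assume the markup examined in the steps that follow.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Trimmed sample of the events listing markup (an illustrative assumption).
SAMPLE_HTML = """
<div id="events-listing">
  <li class="">
    <article class="highlight-top">
      <p>Fri, 10 Jan 2020</p>
      <p class="counter nohide"><span>1</span> attending</p>
      <a href="/events/1363481"><h1>Dining in the Dark - DJ Charles Martin</h1></a>
      <p class="copy nohide">Greenhouse Bistro</p>
    </article>
  </li>
  <li class="">
    <article class="highlight-top">
      <p>Sat, 11 Jan 2020</p>
      <p class="counter nohide"><span>2</span> attending</p>
      <a href="/events/1365223"><h1>Lift x Rhythmfire Sessions: Rometti, Owen Alek, Philophonic</h1></a>
      <p class="copy nohide"><a>Poor Boys</a></p>
    </article>
  </li>
</div>
"""

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def scrape_events(soup):
    """Return a DataFrame with one row per event in the events listing."""
    rows = []
    listing = soup.find("div", id="events-listing")
    for article in listing.find_all("article", class_="highlight-top"):
        counter = article.find("p", class_="counter")
        rows.append({
            "Event_Name": article.find("h1").get_text(strip=True),
            "Venue": article.find("p", class_="copy").get_text(strip=True),
            # The date is the first <p> inside the article.
            "Event_Date": article.find("p").get_text(strip=True),
            "Number_of_Attendees": int(counter.find("span").get_text(strip=True)),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


events_df = scrape_events(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(events_df)
```

Building rows per article, rather than zipping four separate lists, means an event with a missing field raises an error at that event instead of silently shifting later values.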

In [10]:
# import required libraries
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import re

In [14]:
# set variables for parsing
url = "https://www.residentadvisor.net/events"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
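
Since the pipeline will eventually request many pages, it may help to wrap the fetch in a small helper that handles failed requests. This is a rough sketch under assumptions: the helper name `get_soup` and its optional `session` parameter are hypothetical and not part of the lab.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url, session=None, timeout=10):
    """Fetch `url` and return parsed soup, or None if the request fails."""
    http = session or requests  # hypothetical hook: pass a requests.Session to reuse connections
    try:
        response = http.get(url, timeout=timeout)
        response.raise_for_status()  # raise an exception on 4xx/5xx responses
    except requests.RequestException as error:
        print(f"Request failed for {url}: {error}")
        return None
    return BeautifulSoup(response.content, "html.parser")
```

Returning `None` on failure lets a multi-page loop skip or retry a bad page instead of stopping the whole scrape.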

### Find the Page's Event Names

In [127]:
# collect every <li> entry inside the events listing
container = soup.find('div', id="events-listing").findAll('li')
container

#entries = event_listings.findAll('li')

[<li class="">
 <article class="highlight-top">
 <p>Fri, 10 Jan 2020</p>
 <a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1363481"><img class="nohide" src="/images/events/flyer/2020/1/us-0110-1363481-list.jpg"/></a>
 <p class="counter nohide">
 <span>1</span> attending
 </p>
 <a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1363481">
 <h1>
 Dining in the Dark - DJ Charles Martin
 </h1>
 </a>
 <p class="copy nohide">
 Greenhouse Bistro
 </p>
 </article>
 </li>, <li class="">
 <article class="highlight-top">
 <p>Sat, 11 Jan 2020</p>
 <a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1365223"><img class="nohide" src="/images/events/flyer/2020/1/us-0111-1365223-list.jpg"/></a>
 <p class="counter nohide">
 <span>2</span> attending
 </p>
 <a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1365223">
 <h1>
 Lif

In [129]:
# find the <h1> tag that holds each event name
page_events_box = soup.find('div', id="events-listing").findAll('h1')
page_events_box

[<h1>
 Dining in the Dark - DJ Charles Martin
 </h1>, <h1>
 Lift x Rhythmfire Sessions: Rometti, Owen Alek, Philophonic
 </h1>]

In [80]:
# find the first event name with newline characters removed
first_event = page_events_box[0].text.strip()
first_event

'Dining in the Dark - DJ Charles Martin'

In [42]:
# create a list of dictionaries for conversion to a pandas dataframe
event_list = []

for event in page_events_box:
    event_dict = {}
    event_dict["event_name"] = event.text.strip()
    event_list.append(event_dict)
event_list

[{'event_name': 'Dining in the Dark - DJ Charles Martin'},
 {'event_name': 'Lift x Rhythmfire Sessions: Rometti, Owen Alek, Philophonic'}]

### Produce the Events DataFrame

In [44]:
df = pd.DataFrame(event_list)
df.columns = ['Event_Name']
df.head()

|   | Event_Name |
|---|---|
| 0 | Dining in the Dark - DJ Charles Martin |
| 1 | Lift x Rhythmfire Sessions: Rometti, Owen Alek... |


### Venues

In [83]:
# find page events venues
#events_venues = (soup.findAll('p', {"class": "copy nohide"}))
events_venues = (soup.find('div', id="events-listing").findAll('p', {"class": "copy nohide"}))
events_venues

[<p class="copy nohide">
 Greenhouse Bistro
 </p>, <p class="copy nohide">
 <a href="\club.aspx?id=172593">Poor Boys</a>
 </p>]

In [84]:
# create a list of dictionaries for conversion to a pandas dataframe
venues_list = []

for venue in events_venues:
    venues_dict = {}
    venues_dict["venue"] = venue.text.strip()
    venues_list.append(venues_dict)
venues_list

[{'venue': 'Greenhouse Bistro'}, {'venue': 'Poor Boys'}]

### Event Dates

In [118]:
# find each event's <article> container; the date is the first <p> inside it
events_dates = soup.find('div', id="events-listing").findAll('article', {"class": "highlight-top"})
events_dates[0]

<article class="highlight-top">
<p>Fri, 10 Jan 2020</p>
<a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1363481"><img class="nohide" src="/images/events/flyer/2020/1/us-0110-1363481-list.jpg"/></a>
<p class="counter nohide">
<span>1</span> attending
</p>
<a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1363481">
<h1>
Dining in the Dark - DJ Charles Martin
</h1>
</a>
<p class="copy nohide">
Greenhouse Bistro
</p>
</article>

In [None]:
# collect each <li> entry inside the events listing
entries = soup.find('div', id="events-listing").findAll('li')


In [77]:
# find the first event date (the first <p> in the article) with surrounding whitespace removed
first_event_date = events_dates[0].find('p').text.strip()
first_event_date

'Fri, 10 Jan 2020'

In [79]:
# create a list of dictionaries for conversion to a pandas dataframe
dates_list = []

for article in events_dates:
    dates_dict = {}
    # the date is the first <p> inside each event's <article>
    dates_dict["event_date"] = article.find('p').text.strip()
    dates_list.append(dates_dict)
dates_list

[{'event_date': 'Fri, 10 Jan 2020'}, {'event_date': 'Sat, 11 Jan 2020'}]
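
Date strings such as `'Fri, 10 Jan 2020'` sort alphabetically rather than chronologically, so it can be useful to convert them to real dates before any sorting. A small sketch using only the standard library follows; the helper name `parse_event_date` is an assumption, and the format string assumes every date follows the pattern seen above and that the default English locale is in use.

```python
from datetime import datetime

# Matches strings like "Fri, 10 Jan 2020" (abbreviated weekday and month names).
DATE_FORMAT = "%a, %d %b %Y"


def parse_event_date(text):
    """Convert a scraped date string into a datetime.date object."""
    return datetime.strptime(text.strip(), DATE_FORMAT).date()


parsed_dates = [parse_event_date(d) for d in ["Fri, 10 Jan 2020", "Sat, 11 Jan 2020"]]
print(parsed_dates)
```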

### Number of Event Attendees

In [None]:
# get number of attendees
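
One possible approach, based on the markup shown earlier, is to read the `<span>` inside each `<p class="counter nohide">` element and convert it to an integer. This is a sketch on a trimmed sample rather than the live page; the names `COUNTER_HTML`, `sample_soup` and `attendees` are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the attendee counters from the events listing.
COUNTER_HTML = """
<div id="events-listing">
  <article class="highlight-top">
    <p class="counter nohide">
      <span>1</span> attending
    </p>
  </article>
  <article class="highlight-top">
    <p class="counter nohide">
      <span>2</span> attending
    </p>
  </article>
</div>
"""

sample_soup = BeautifulSoup(COUNTER_HTML, "html.parser")
listing = sample_soup.find("div", id="events-listing")

# class_="counter" matches any <p> whose class list includes "counter".
counters = listing.find_all("p", class_="counter")
attendees = [int(p.find("span").get_text(strip=True)) for p in counters]
print(attendees)
```

Converting to `int` at scrape time keeps the later sort numeric; as strings, `'10'` would sort before `'2'`.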

## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url
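
The exact markup of the site's pagination link is not shown in this lab, so the sketch below is only an illustration of the general pattern: locate the link, then resolve its relative `href` against the current URL. It assumes a hypothetical `<a rel="next">` link and a hypothetical `?page=2` URL; inspect the real page and adjust the selector. The helper name `find_next_url` is also an assumption.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(soup, current_url):
    """Return the absolute URL of the next page, or None if no link is found."""
    # Hypothetical selector: the real site may mark its next-page link differently.
    link = soup.find("a", attrs={"rel": "next"})
    if link is None or not link.get("href"):
        return None
    # Resolve a relative href such as "/events?page=2" against the current page.
    return urljoin(current_url, link["href"])


# Hypothetical markup, used only to exercise the helper.
page = BeautifulSoup('<a rel="next" href="/events?page=2">Next</a>', "html.parser")
next_url = find_next_url(page, "https://www.residentadvisor.net/events")
print(next_url)
```

A `next_page(url)` function could fetch the page, parse it, and delegate to a helper like this one; returning `None` on the last page gives the scraping loop a natural stopping condition.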

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
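
The sorting step could look roughly like the sketch below. It assumes the most-attended events should come first and that ties go to the earlier date, since the sort direction is not specified above. The third row is made-up sample data added only to demonstrate the tie-break.

```python
import pandas as pd

events = pd.DataFrame({
    "Event_Name": [
        "Dining in the Dark - DJ Charles Martin",
        "Lift x Rhythmfire Sessions: Rometti, Owen Alek, Philophonic",
        "Hypothetical Event",  # made-up row to create a tie on attendees
    ],
    "Event_Date": ["Fri, 10 Jan 2020", "Sat, 11 Jan 2020", "Thu, 09 Jan 2020"],
    "Number_of_Attendees": [1, 2, 2],
})

# Parse the date strings so that ties are broken chronologically, not alphabetically.
events["Event_Date"] = pd.to_datetime(events["Event_Date"], format="%a, %d %b %Y")

ranked = events.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],  # assumption: most attendees first, earlier date first
).reset_index(drop=True)
print(ranked)
```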

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!