# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [18]:
# Open the inspect element feature in your browser
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

In [2]:
html_page = requests.get('https://www.residentadvisor.net/events/us/georgia/week/2019-07-01')
soup = BeautifulSoup(html_page.content, 'lxml')

In [3]:
soup.prettify

<bound method Tag.prettify of <!DOCTYPE html>
<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in Georgia, United States of America
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, georgia, united, states, america" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=nA9uavMPeFtRavUj5m2TICqAtGhp7tbC5z-iw5vW6k81" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=qgpSmyPbylOKeJFqy2yvCrTgAsw9yQYcJtLKS_vPO6s1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color="#00

In [4]:
event_name = soup.find_all('h1', class_='event-title')

In [17]:
event_name
for event in event_name:
    print(event.text)

  


Foolish Behaviour 4th of July Pre-Party at The Music Room
Souled Out Party at Gallery 992
Proper Taste: Feat. João & Elio Stereo at Midcity Cafe
Desert Hearts feat. Mikey Lion, Lee Reynolds, & Marbs at Ravine


<h1 class="event-title" itemprop="summary"><a href="/events/1273253" itemprop="url" title="Event details of Foolish Behaviour 4th of July Pre-Party">Foolish Behaviour 4th of July Pre-Party</a> <span>at <a href="/club.aspx?id=41317">The Music Room</a></span></h1>

In [6]:
event_date = soup.find_all('p', class_='eventDate date')
for date in event_date:
    print(date.text[0:-2])

Wed, 03 Jul 2019
Thu, 04 Jul 2019
Fri, 05 Jul 2019
Sat, 06 Jul 2019


In [7]:
attending = soup.find_all('p', class_='attending')
for attendee in attending:
    print(attendee.text[0])
    
    
attending

1
4
1


[<p class="attending"><span>1</span> Attending</p>,
 <p class="attending"><span>4</span> Attending</p>,
 <p class="attending"><span>1</span> Attending</p>]

In [8]:
attending2 = soup.find_all('div', class_='grey event-lineup')

atten2 = [atten.nextSibling for atten in attending2]
atten3 = []
for number in atten2:
    if number == None:
        atten3.append(0)
    else:
        atten3.append(number.text[0])
        
atten3


['1', 0, '4', '1']

In [9]:
container = soup.find_all('h1', class_="event-title")
venues = None
for place in container:
    place.findChildren('span')
    for venue in place.findChildren('span'):
        venue.findChildren('a')
        for location in venue.findChildren('a'):
            print(location.text)
        



The Music Room
Gallery 992
Midcity Cafe
Ravine


In [10]:
concert = event_name[2].text

In [11]:
concert = concert.replace('\n', '')
concert

'Proper Taste: Feat. João & Elio Stereo at Midcity Cafe'

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [14]:
def scrape_events(events_page_url):
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'lxml')
    
    event_names = [name.text for name in soup.find_all('h1', class_='event-title')]
    
    venues = []
    for place in container:
        place.findChildren('span')
        for venue in place.findChildren('span'):
            venue.findChildren('a')
            for location in venue.findChildren('a'):
                venues.append(location.text)
    
    
    event_date = [date.text[0:-2] for date in soup.find_all('p', class_='eventDate date')]
    attendees = [attendee.nextSibling for attendee in soup.find_all('div', class_='grey event-lineup')]
    attendees_clean = []
    for number in attendees:
        if number == None:
            attendees_clean.append(0)
        else:
            attendees_clean.append(number.text[0])
        
    
    df = pd.DataFrame([event_names, venues, event_date, attendees_clean]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [15]:
scrape_events('https://www.residentadvisor.net/events/us/georgia/week/2019-07-01')

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Foolish Behaviour 4th of July Pre-Party at The...,The Music Room,"Wed, 03 Jul 2019",1
1,Souled Out Party at Gallery 992,Gallery 992,"Thu, 04 Jul 2019",0
2,Proper Taste: Feat. João & Elio Stereo at Midc...,Midcity Cafe,"Fri, 05 Jul 2019",4
3,"Desert Hearts feat. Mikey Lion, Lee Reynolds, ...",Ravine,"Sat, 06 Jul 2019",1


## Write a Function to Retrieve the URL for the Next Page

In [19]:
def next_page(url):
    #Your code here
    response = requests.get(url)
    soup = BeautifulSoup(response.content , 'lxml')
    url_ext = soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']
    next_page_url = "https://www.residentadvisor.net"+url_ext
    return next_page_url

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [20]:
#Your code here
dfs = []
total_rows = 0
cur_url = "https://www.residentadvisor.net/events/us/georgia/week/2016-07-01"
while total_rows <= 1000:
    df = scrape_events(cur_url)
    dfs.append(df)
    total_rows += len(df)
    cur_url = next_page(cur_url)
    time.sleep(.2)
    
df = pd.concat(dfs)
df = df.iloc[:1000]
print(len(df))
df.head(20)

1000


Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,"Crew Love: Soul Clap, Wolf + Lamb, Nick Monaco...",The Music Room,"Sat, 02 Jul 2016",2.0
1,"&Me, Baez, Jeremy Ismael, Tocayo & More at The...",Gallery 992,"Sun, 03 Jul 2016",1.0
2,,Midcity Cafe,,
3,,Ravine,,
0,Wiggle Factor & Atlanta Techno Love present Cl...,The Music Room,"Sat, 09 Jul 2016",6.0


In [21]:
df.head(20)

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,"Crew Love: Soul Clap, Wolf + Lamb, Nick Monaco...",The Music Room,"Sat, 02 Jul 2016",2.0
1,"&Me, Baez, Jeremy Ismael, Tocayo & More at The...",Gallery 992,"Sun, 03 Jul 2016",1.0
2,,Midcity Cafe,,
3,,Ravine,,
0,Wiggle Factor & Atlanta Techno Love present Cl...,The Music Room,"Sat, 09 Jul 2016",6.0
1,Roman Flügel (TIX Sold at Door) at The Alley C...,Gallery 992,"Thu, 14 Jul 2016",7.0
2,,Midcity Cafe,,
3,,Ravine,,
0,Stripped Records Record Release Party with Nor...,The Music Room,"Fri, 15 Jul 2016",2.0
1,Tyrone Francis - Celebrating 50 Years of Life ...,Gallery 992,"Sat, 16 Jul 2016",1.0


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!