# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.


## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser


## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [105]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
#import re

In [32]:
html_page = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(html_page.content, 'html.parser')
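The request above has no timeout and does not check the HTTP status, so a slow or failing response would only show up later as a confusing parsing error. A minimal sketch of a more defensive fetch follows. The helper name `fetch_soup` and its `get` parameter are hypothetical (not part of the lab); `get` exists only so the function can be exercised without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, get=requests.get, timeout=10):
    """Download `url` and return it parsed as BeautifulSoup.

    `fetch_soup` and the `get` parameter are illustrative names,
    not part of the lab. Raises requests.HTTPError on 4xx/5xx responses.
    """
    response = get(url, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()           # fail loudly on HTTP errors
    return BeautifulSoup(response.content, 'html.parser')
```

Calling `fetch_soup('https://www.residentadvisor.net/events')` would then stand in for the two lines in the cell above.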

In [33]:
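# Without parentheses this displays the bound method (its repr includes the page HTML);
# call soup.prettify() to get the indented markup as a string.
soup.prettify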

<bound method Tag.prettify of <!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in New York, United States of America
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, new, york, united, states, america" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=73_zn4f444Ms1nbtnaddvbDUe15CsJN6vhoNK7oQovg1" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=w7DJdRHlwvlSlvivLjU2DnToUsYFU7IYixebCORYtxw1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color=

In [38]:
events = soup.find('div', id='event-listing')
print(len(events))

3


In [41]:
list_items = events.find_all('li')
len(list_items)

140

In [47]:
list_items[1]

<li class=""><article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2019-06-11T00:00" itemprop="startDate">2019-06-11T00:00</time></span><a href="/events/1275001"><img height="76" src="/images/events/flyer/2019/6/us-0611-1275001-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1275001" itemprop="url" title="Event details of Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz">Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz</a> <span>at <a href="/club.aspx?id=138627">Ms. Yoo</a></span></h1><div class="grey event-lineup">Carlos Alkalina, playsuit, Sveta Voice, Pjay, Bytz</div><p class="attending"><span>5</span> Attending</p></div></article></li>

In [80]:
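rows = []
cur_date = None  # most recent date header seen; the events below it share this date
for item in list_items:
    date = item.find('p', class_='eventDate date')
    event = item.find('h1', class_='event-title')
    if event:
        # Title text has the form "<event name> at <venue>";
        # this split assumes ' at ' appears only once in it
        details = event.text.split(' at ')
        artists = details[0].strip()
        venue = details[1].strip()
        attending = item.find('p', class_='attending')
        if attending:
            no_attending = attending.text
        else:
            no_attending = 0  # listing has no attendance element
        rows.append([artists, venue, cur_date, no_attending])
    elif date:
        # Date header row: remember it for the events that follow
        cur_date = date.text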

In [81]:
rows

[['Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz',
  'Ms. Yoo',
  'Tue, 11 Jun 2019 /',
  '5 Attending'],
 ['Cancelled', '14B Rooftop / Lounge', 'Tue, 11 Jun 2019 /', '2 Attending'],
 ['Tempo with Steve Tek (Open to Close)',
  'TBA Brooklyn',
  'Tue, 11 Jun 2019 /',
  '2 Attending'],
 ['Feel Real with DJ Disciple, Ejoe Wilson Friends',
  'Rumpus Room',
  'Tue, 11 Jun 2019 /',
  '1 Attending'],
 ['Small Rave 019 - Xcreenplay (Live)/ Perris/ Fergus Waveforms',
  'Bossa Nova Civic Club',
  'Tue, 11 Jun 2019 /',
  0],
 ['Feel Free: Grand Atrium + Buskko', 'Kinfolk 90', 'Tue, 11 Jun 2019 /', 0],
 ["Nacho Rojas' B2b2b2b2b2b2b Bday Bash [actual Bday & day of Arrival in US]",
  'TBA - New York',
  'Wed, 12 Jun 2019 /',
  '5 Attending'],
 ['Full Flex with DJ Swagger, Klein Zage b2b Joey G ii',
  'Good Room',
  'Wed, 12 Jun 2019 /',
  '8 Attending'],
 ['Party Party Party',
  'Bossa Nova Civic Club',
  'Wed, 12 Jun 2019 /',
  '8 Attending'],
 ['Ūndisclosed - Ryan Crosson, Jeff Veliz, Martin, Ju

In [45]:
date = list_items[0].find('p', class_='eventDate date')
print(date.text)

Tue, 11 Jun 2019 /


In [76]:
event = list_items[1].find('h1', class_='event-title')
print(event.text)

Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz at Ms. Yoo


In [54]:
details = event.text.split(' at ')

In [62]:
details

['Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz', 'Ms. Yoo']
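Splitting the title text on `' at '` gives the wrong pieces whenever the event name or the venue itself contains that substring. As a hedged alternative, the name and venue can be read from the nested tags instead, based on the markup of the list item shown earlier. The helper name `parse_event_title` is hypothetical, and the sketch assumes the venue always sits in the first `<span>` inside the heading, which has only been checked against that one sample.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the list item displayed earlier in this lab.
sample_html = (
    '<li class=""><article class="event-item clearfix"><div class="bbox">'
    '<h1 class="event-title" itemprop="summary">'
    '<a href="/events/1275001" itemprop="url" '
    'title="Event details of Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz">'
    'Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz</a> '
    '<span>at <a href="/club.aspx?id=138627">Ms. Yoo</a></span></h1>'
    '<p class="attending"><span>5</span> Attending</p>'
    '</div></article></li>'
)


def parse_event_title(h1):
    """Return (event_name, venue) from an <h1 class="event-title"> tag.

    Hypothetical helper: assumes the venue is in the first <span>.
    """
    name_link = h1.find('a', itemprop='url')
    # Fall back to the whole heading text if the link is missing
    name = (name_link or h1).get_text(strip=True)
    venue = None
    venue_span = h1.find('span')
    if venue_span is not None:
        venue = venue_span.get_text(' ', strip=True)
        if venue.startswith('at '):
            venue = venue[len('at '):]  # drop the leading "at "
    return name, venue


h1 = BeautifulSoup(sample_html, 'html.parser').find('h1', class_='event-title')
print(parse_event_title(h1))
# ('Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz', 'Ms. Yoo')
```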

In [106]:
def scrape_events(events_page_url):
    #Your code here
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    events = soup.find('div', id='event-listing')
    list_items = events.find_all('li')

    rows = []
    cur_date = None  # most recent date header seen
    for item in list_items:
        date = item.find('p', class_='eventDate date')
        event = item.find('h1', class_='event-title')
        if event:
            # Title text has the form "<event name> at <venue>"
            details = event.text.split(' at ')
            artists = details[0].strip()
            venue = details[1].strip()
            attending = item.find('p', class_='attending')
            # Listings without an attendance element get NaN
            no_attending = attending.text if attending is not None else np.nan
            rows.append([artists, venue, cur_date, no_attending])
        elif date:
            # Date header row: applies to the events that follow it
            cur_date = date.text

    df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date",
                                     "Number_of_Attendees"])
    return df

In [107]:
scrape_events('https://www.residentadvisor.net/events')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz | Ms. Yoo | Tue, 11 Jun 2019 / | 5 Attending |
| 1 | Cancelled | 14B Rooftop / Lounge | Tue, 11 Jun 2019 / | 2 Attending |
| 2 | Tempo with Steve Tek (Open to Close) | TBA Brooklyn | Tue, 11 Jun 2019 / | 2 Attending |
| 3 | Feel Real with DJ Disciple, Ejoe Wilson Friends | Rumpus Room | Tue, 11 Jun 2019 / | 1 Attending |
| 4 | Small Rave 019 - Xcreenplay (Live)/ Perris/ Fe... | Bossa Nova Civic Club | Tue, 11 Jun 2019 / | NaN |
| 5 | Feel Free: Grand Atrium + Buskko | Kinfolk 90 | Tue, 11 Jun 2019 / | NaN |
| 6 | Nacho Rojas' B2b2b2b2b2b2b Bday Bash [actual B... | TBA - New York | Wed, 12 Jun 2019 / | 5 Attending |
| 7 | Full Flex with DJ Swagger, Klein Zage b2b Joey... | Good Room | Wed, 12 Jun 2019 / | 8 Attending |
| 8 | Party Party Party | Bossa Nova Civic Club | Wed, 12 Jun 2019 / | 8 Attending |
| 9 | Ūndisclosed - Ryan Crosson, Jeff Veliz, Martin... | TBA Brooklyn | Wed, 12 Jun 2019 / | 7 Attending |
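The scraped values are still raw strings: attendance reads like "5 Attending" and dates carry a trailing " /". One possible cleaning step, sketched on three rows copied from the output above, is shown here. The helper name `clean_events` is hypothetical, and treating a missing attendance value as 0 is an assumption made for illustration; keeping it as NaN would be equally defensible.

```python
import pandas as pd

# Three rows copied from the scraped output; None marks a missing value.
raw = pd.DataFrame({
    "Event_Name": ["Cancelled", "Feel Free: Grand Atrium + Buskko", "Party Party Party"],
    "Venue": ["14B Rooftop / Lounge", "Kinfolk 90", "Bossa Nova Civic Club"],
    "Event_Date": ["Tue, 11 Jun 2019 /", "Tue, 11 Jun 2019 /", "Wed, 12 Jun 2019 /"],
    "Number_of_Attendees": ["2 Attending", None, "8 Attending"],
})


def clean_events(df):
    """Return a copy with integer attendance and datetime event dates."""
    cleaned = df.copy()
    # Pull the leading number out of strings such as "5 Attending"
    counts = (cleaned["Number_of_Attendees"].astype(str)
              .str.extract(r"(\d+)", expand=False))
    # Assumption: a missing attendance value is counted as 0
    cleaned["Number_of_Attendees"] = pd.to_numeric(counts).fillna(0).astype(int)
    # Keep only the "11 Jun 2019" part, dropping the weekday and trailing " /"
    dates = (cleaned["Event_Date"].astype(str)
             .str.extract(r"(\d{1,2} \w{3} \d{4})", expand=False))
    cleaned["Event_Date"] = pd.to_datetime(dates, format="%d %b %Y")
    return cleaned


cleaned = clean_events(raw)
print(cleaned.dtypes)
```

The month abbreviation is parsed with `%b`, so this sketch assumes an English locale.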


## Write a Function to Retrieve the URL for the Next Page

In [108]:
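def next_page(url):
    #Your code here
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # The "Next" link is identified by its ga-event-action attribute (note the trailing space)
    next_link = soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']
    # The href is relative, so prepend the site root
    next_page_url = 'https://www.residentadvisor.net' + next_link
    return next_page_url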

In [109]:
next_page('https://www.residentadvisor.net/events')

'https://www.residentadvisor.net/events/us/newyork/week/2019-06-18'
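The lookup in `next_page` raises an `AttributeError` when a page has no "Next" link, because `soup.find` returns `None`. A hedged variant that returns `None` in that case and builds the absolute URL with `urljoin` is sketched below. The function name `find_next_url` and the one-line sample markup are hypothetical; only the `ga-event-action` attribute value and the resulting URL come from the lab.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.residentadvisor.net'


def find_next_url(soup, base_url=BASE_URL):
    """Return the absolute URL of the next page, or None if there is no link."""
    link = soup.find('a', attrs={'ga-event-action': 'Next '})
    if link is None or not link.get('href'):
        return None
    # urljoin resolves the relative href against the site root
    return urljoin(base_url, link['href'])


# Hypothetical markup, reduced to the single attribute the lookup relies on
sample = BeautifulSoup(
    '<a ga-event-action="Next " href="/events/us/newyork/week/2019-06-18">Next</a>',
    'html.parser',
)
print(find_next_url(sample))
# https://www.residentadvisor.net/events/us/newyork/week/2019-06-18
```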

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [111]:
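#Your code here
url = 'https://www.residentadvisor.net/events'
df = scrape_events(url)

# Keep following the "Next" link until enough events are collected.
# This run stops at 500 events; raise the threshold to 1000 for the full task.
while len(df) < 500:
    url = next_page(url)
    df2 = scrape_events(url)
    df = pd.concat([df, df2], ignore_index=True)

print(len(df))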

513
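The loop above keeps requesting pages until the row count passes the threshold, so it has no exit if the site runs out of pages. A minimal sketch of a bounded version follows. The name `collect_events` and its parameters are hypothetical, and it assumes the next-page function returns `None` when there is no further page, which the `next_page` function in this lab does not currently do.

```python
import pandas as pd


def collect_events(start_url, scrape, get_next, target=1000, max_pages=50):
    """Scrape successive pages until `target` rows or `max_pages` pages.

    `scrape(url)` must return a DataFrame; `get_next(url)` must return the
    next URL or None. Both are passed in, so any implementations can be used.
    Assumes max_pages >= 1.
    """
    frames = []
    total = 0
    url = start_url
    for _ in range(max_pages):
        page = scrape(url)
        frames.append(page)
        total += len(page)
        if total >= target:
            break
        url = get_next(url)
        if url is None:  # no further pages to follow
            break
    # Concatenate once at the end and renumber the rows
    return pd.concat(frames, ignore_index=True)
```

With the functions from this lab, a call might look like `collect_events(url, scrape_events, next_page, target=1000)`. Collecting the frames in a list and concatenating once also avoids copying the growing DataFrame on every iteration.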


In [113]:
df.head()

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz | Ms. Yoo | Tue, 11 Jun 2019 / | 5 Attending |
| 1 | Cancelled | 14B Rooftop / Lounge | Tue, 11 Jun 2019 / | 2 Attending |
| 2 | Tempo with Steve Tek (Open to Close) | TBA Brooklyn | Tue, 11 Jun 2019 / | 2 Attending |
| 3 | Feel Real with DJ Disciple, Ejoe Wilson Friends | Rumpus Room | Tue, 11 Jun 2019 / | 1 Attending |
| 4 | Small Rave 019 - Xcreenplay (Live)/ Perris/ Fe... | Bossa Nova Civic Club | Tue, 11 Jun 2019 / | NaN |
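The task asks for the data sorted by number of attendees, with ties broken by event date, which the cells above do not yet do. A minimal sketch on a handful of events from the scraped output is shown here. It assumes the attendance column has already been converted to integers and the dates to datetimes, and that "sorted by attendees" means most attended first; both are assumptions rather than requirements stated in the lab.

```python
import pandas as pd

# Sample rows taken from the scraped output, with values already cleaned.
events = pd.DataFrame({
    "Event_Name": [
        "Cancelled",
        "Nacho Rojas' B2b2b2b2b2b2b Bday Bash [actual Bday & day of Arrival in US]",
        "Party Party Party",
        "Delivery. Carlos Alkalina, Playsuit, Pjay, Bytz",
    ],
    "Event_Date": pd.to_datetime(["2019-06-11", "2019-06-12", "2019-06-12", "2019-06-11"]),
    "Number_of_Attendees": [2, 5, 8, 5],
})

# Most attended first; among equal counts, the earlier date comes first
ranked = events.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)

print(ranked[["Event_Name", "Event_Date", "Number_of_Attendees"]])
```

Sorting the raw strings instead would order "8 Attending" after "10 Attending" lexicographically, which is why the numeric conversion comes first.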


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!