# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
#Load the https://www.residentadvisor.net/events page in your browser.
from bs4 import BeautifulSoup
import requests
html_page = requests.get('https://www.residentadvisor.net/events') #Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser') #Pass the page contents to beautiful soup for parsing


In [2]:
import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

In [3]:
response = requests.get("https://www.residentadvisor.net/events/us/newyork")
soup = BeautifulSoup(response.content, 'html.parser')

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [4]:
#Open the inspect element feature in your browser
event_listings = soup.find('div', id='event-listing')
event_listings

<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=7&amp;yr=2019&amp;dy=24"><span>Wed, 24 Jul 2019 /</span></a></p></li><li class=""><article class="event-item clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1281685#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-07-24T00:00" itemprop="startDate">2019-07-24T00:00</time></span><a href="/events/1281685"><img height="76" src="/images/events/flyer/2019/7/us-0723-1281685-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1281685" itemprop="url" title="Event details of Body Music Therapy with Love Letters">Body Music Therapy with Love Letters</a> <span>at <a href="/club.aspx?id=105873

In [5]:
entries = event_listings.findAll('li')
entries

[<li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=7&amp;yr=2019&amp;dy=24"><span>Wed, 24 Jul 2019 /</span></a></p></li>,
 <li class=""><article class="event-item clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1281685#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-07-24T00:00" itemprop="startDate">2019-07-24T00:00</time></span><a href="/events/1281685"><img height="76" src="/images/events/flyer/2019/7/us-0723-1281685-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1281685" itemprop="url" title="Event details of Body Music Therapy with Love Letters">Body Music Therapy with Love Letters</a> <span>at <a href="/club.aspx?id=105873">Nowadays</a></span></h1><div class="grey event-lineup">Love Lett

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [6]:
#Sucessive exploration in function development
rows = []
for entry in entries:
    #Is it a date? If so, set current date.
    date = entry.find('p', class_='eventDate date')
    event = entry.find('h1', class_='event-title')
    if event:
        details = event.text.split(' at ')
        event_name = details[0].strip()
        venue = details[1].strip()
        try:
            n_attendees = int(re.match('(\d*)', entry.find('p', class_='attending').text)[0])
        except:
            n_attendees = np.nan
        rows.append([event_name])
    elif date:
        cur_data = date.text
    else:
        continue
df = pd.DataFrame(rows)
df.head()

Unnamed: 0,0
0,Body Music Therapy with Love Letters
1,"Midnight Magic, Jacques Renault (Elsewhere Roo..."
2,"John Silas, Kfeelz, Raqx, John Barera and Vilaen"
3,Pure Immanence Xxxvi
4,Delivery. with Mira Fahrenheit and Friends


In [9]:
def scrape_events(events_page_url):
    #Your code here
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    entries = event_listings.findAll('li')
    rows = []
    for entry in entries:
        date = entry.find('p', class_="eventDate date")
        event = entry.find("h1", class_="event-title")
        if event:
            details = event.text.split(" at ")
            event_name = details[0].strip()
            venue = details[1].strip()
            try: 
                n_attendees = int(re.match("(\d*)", entry.find('p', class_="attending").text)[0])
            except:
                n_attendees = np.nan
            rows.append([event_name,venue,cur_date,n_attendees])
        elif date:
            cur_date = date.text
        else:
            continue
    df = pd.DataFrame(rows)
    df.head()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df
scrape_events("https://www.residentadvisor.net/events/us/newyork")

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Body Music Therapy with Love Letters,Nowadays,"Wed, 24 Jul 2019 /",6.0
1,"Midnight Magic, Jacques Renault (Elsewhere Roo...",Elsewhere,"Wed, 24 Jul 2019 /",15.0
2,"John Silas, Kfeelz, Raqx, John Barera and Vilaen",Good Room,"Wed, 24 Jul 2019 /",6.0
3,Pure Immanence Xxxvi,Bossa Nova Civic Club,"Wed, 24 Jul 2019 /",6.0
4,Delivery. with Mira Fahrenheit and Friends,Ms. Yoo,"Wed, 24 Jul 2019 /",
5,"Ūndisclosed: Eli Fola, Catherine J, Jason Munoz",TBA Brooklyn,"Wed, 24 Jul 2019 /",
6,Open Decks Session 73,Eris,"Wed, 24 Jul 2019 /",
7,Expansions NYC Midsummer Party,Doux Supper Club,"Wed, 24 Jul 2019 /",
8,"Expansions NYC Mid-Summer Jam with Josh, Jihad...",Doux,"Wed, 24 Jul 2019 /",
9,"Morgana [free entry]: Mr.C, Kate Simko, Ryan C...",Brooklyn Mirage,"Thu, 25 Jul 2019 /",431.0


## Write a Function to Retrieve the URL for the Next Page

In [10]:
soup
next_button = soup.find('a', {"ga-event-action: "Next "})


<!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in New York, United States of America
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, new, york, united, states, america" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=ATv7yC5anBBrxJoYdSr-DqUPyab_mqaaXHG0qxMzlYI1" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=qgpSmyPbylOKeJFqy2yvCrTgAsw9yQYcJtLKS_vPO6s1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color="#000000" href="/images/ra_ico

In [None]:
# def next_page(url):
#     #Your code here
#     return next_page_url

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
#Your code here

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!