# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's inspect element feature to preview the underlying HTML of the page.

In [None]:
#Open the inspect element feature in your browser

In [2]:
from bs4 import BeautifulSoup
import requests

In [127]:
import pandas as pd
import numpy as np

### Investigating

In [3]:
html = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(html.content, 'html.parser')

In [4]:
soup

<!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in Texas, United States of America
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, texas, united, states, america" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=73_zn4f444Ms1nbtnaddvbDUe15CsJN6vhoNK7oQovg1" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=w7DJdRHlwvlSlvivLjU2DnToUsYFU7IYixebCORYtxw1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color="#000000" href="/images/ra_icon.svg" 

In [37]:
event_listing = soup.find('div', class_="fl col4")
event_listing

<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=318&amp;v=day&amp;mn=7&amp;yr=2019&amp;dy=3"><span>Wed, 03 Jul 2019 /</span></a></p></li><li class=""><article class="event-item clearfix " itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2019-07-03T00:00" itemprop="startDate">2019-07-03T00:00</time></span><a href="/events/1282955"><img height="76" src="/images/events/flyer/2019/7/us-0703-1282955-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1282955" itemprop="url" title="Event details of Inner City Techno">Inner City Techno</a> <span>at <a href="/club.aspx?id=88969">Empire Control Room &amp; Garage</a>, <a href="/events.aspx?ai=321">Austin</a></span></h1><div class="grey event-lineup">BombChelle, Phamstar, Mr. &amp; Mrs, Tempr, Digital Dre, SNAXX</div><p class="attending"><span>5</span> Attending</p></

### Finding event date

In [96]:
event_listing.findAll('p', class_="eventDate date")[0]

<span>Wed, 03 Jul 2019 /</span>

In [142]:
dates = []
for date in event_listing.findAll('p', class_="eventDate date"):
    a = date.text.split()
    b = ' '.join(a[:4])
    dates.append(b)
print(dates)

['Wed, 03 Jul 2019', 'Thu, 04 Jul 2019', 'Fri, 05 Jul 2019', 'Sat, 06 Jul 2019', 'Sun, 07 Jul 2019']
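The dates collected above are still plain strings, so they cannot be compared or sorted chronologically yet. As a minimal sketch, assuming every date keeps the `'Wed, 03 Jul 2019'` layout seen in the output, they could be parsed with the standard library. The helper name `parse_event_date` is made up for illustration and is not part of the lab.

```python
from datetime import datetime

# Format assumed from the scraped strings above, e.g. 'Wed, 03 Jul 2019'
DATE_FORMAT = '%a, %d %b %Y'

def parse_event_date(text):
    """Hypothetical helper: turn a scraped date string into a datetime.date."""
    return datetime.strptime(text.strip(), DATE_FORMAT).date()

sample_dates = ['Wed, 03 Jul 2019', 'Thu, 04 Jul 2019', 'Fri, 05 Jul 2019']
parsed_dates = [parse_event_date(d) for d in sample_dates]
print(parsed_dates)
```

If the site ever changes its date layout, `strptime` raises a `ValueError`, which is usually preferable to silently storing a malformed date.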


### Finding event name

In [174]:
event_listing.findAll('h1', class_='event-title')

[<h1 class="event-title" itemprop="summary"><a href="/events/1282955" itemprop="url" title="Event details of Inner City Techno">Inner City Techno</a> <span>at <a href="/club.aspx?id=88969">Empire Control Room &amp; Garage</a>, <a href="/events.aspx?ai=321">Austin</a></span></h1>,
 <h1 class="event-title" itemprop="summary"><a href="/events/1272945" itemprop="url" title="Event details of Land of the Freeks (4th of July Pool Party)">Land of the Freeks (4th of July Pool Party)</a> <span>at <span class="grey" style="display:inline;">TBA - Austin</span>, <a href="/events.aspx?ai=321">Austin</a></span></h1>,
 <h1 class="event-title" itemprop="summary"><a href="/events/1283625" itemprop="url" title="Event details of Steady Rhythm [Friday Edition]">Steady Rhythm [Friday Edition]</a> <span>at <a href="/club.aspx?id=164544">Plush ATX</a>, <a href="/events.aspx?ai=321">Austin</a></span></h1>,
 <h1 class="event-title" itemprop="summary"><a href="/events/1284239" itemprop="url" title="Event details

In [176]:
event_listing.findAll('h1', class_='event-title')[0].find('a')

<a href="/events/1282955" itemprop="url" title="Event details of Inner City Techno">Inner City Techno</a>

In [173]:
titles = []
for title in event_listing.findAll('h1', class_='event-title'):
    a = title.find('a').attrs['title'].split()
    b = ' '.join(a[3:])
    titles.append(b)
print(titles)

['Inner City Techno', 'Land of the Freeks (4th of July Pool Party)', 'Steady Rhythm [Friday Edition]', 'Corona Electric Beach feat. Party Favor', '[CANCELLED] Midnight Rites: *Mntra* - Badbeat, Lance le rok, Oscar M', 'Life Before Monday - Desert Hearts Take Over', 'Private Label presents Ben Bohmer (Live)', 'Roni Size with Lefty, Know Matter, Freshtilldef and Anjlkllr at Empire Control Room']
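Slicing off the first three words works only while the `title` attribute starts with exactly "Event details of". Two slightly more defensive alternatives are sketched below on a single heading copied from the output above, so no request is made. The variable names are illustrative, and this is one possible approach rather than the required solution.

```python
from bs4 import BeautifulSoup

# One heading copied from the output above, so no network request is needed
sample_html = (
    '<h1 class="event-title" itemprop="summary">'
    '<a href="/events/1282955" itemprop="url" '
    'title="Event details of Inner City Techno">Inner City Techno</a> '
    '<span>at <a href="/club.aspx?id=88969">Empire Control Room &amp; Garage</a>, '
    '<a href="/events.aspx?ai=321">Austin</a></span></h1>'
)
heading = BeautifulSoup(sample_html, 'html.parser').find('h1', class_='event-title')
link = heading.find('a')

# Option 1: the link text is already the bare event name
name_from_text = link.get_text(strip=True)

# Option 2: strip the fixed prefix from the title attribute, if it is present
PREFIX = 'Event details of '
title_attr = link['title']
name_from_attr = title_attr[len(PREFIX):] if title_attr.startswith(PREFIX) else title_attr

print(name_from_text, '|', name_from_attr)
```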


### Finding venues

In [182]:
event_listing.findAll('h1', class_='event-title')[0].find('span').text

'at Empire Control Room & Garage, Austin'

In [189]:
venues = []
for venue in event_listing.findAll('h1', class_='event-title'):
    a = venue.find('span').text.split()
    b = ' '.join(a[1:])
    venues.append(b)
print(venues)

['Empire Control Room & Garage, Austin', 'TBA - Austin, Austin', 'Plush ATX, Austin', 'Backyard, Dallas/Fort Worth', 'The Techno Bunker, Dallas/Fort Worth', 'Club Here I Love You, El Paso', 'The Terrace at Stereo Live, Houston', 'Empire Control Room & Garage, Austin']
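Each entry above combines the venue and the city in one string. If separate columns are useful, a sketch like the following splits on the last comma, assuming the city always follows the final `', '`. The helper `split_venue` is hypothetical and not part of the lab.

```python
def split_venue(venue_text):
    """Hypothetical helper: split 'Venue, City' on the last comma."""
    venue, _, city = venue_text.rpartition(', ')
    # With no comma, rpartition leaves the whole string in `city`
    return (venue, city) if venue else (city, '')

sample_venues = [
    'Empire Control Room & Garage, Austin',
    'TBA - Austin, Austin',
    'Backyard, Dallas/Fort Worth',
]
split_venues = [split_venue(v) for v in sample_venues]
print(split_venues)
```

Splitting from the right keeps any comma inside the venue name attached to the venue rather than the city.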


### Finding attendees

In [193]:
event_listing.findAll('p', class_='attending')[0].find('span').text

'5'

In [194]:
attendees = []
for attendee in event_listing.findAll('p', class_='attending'):
    a = attendee.find('span').text
    attendees.append(a)
print(attendees)

['5', '54', '1', '2', '1', '1', '2']


In [199]:
df1 = pd.DataFrame(attendees)
df1

Unnamed: 0,0
0,5
1,54
2,1
3,2
4,1
5,1
6,2
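The attendee counts are scraped as strings, and strings compare character by character (so `'10'` sorts before `'9'`). A minimal sketch of converting them to numbers with pandas follows. The column name is borrowed from the function specification later in the lab, and `errors='coerce'` is a choice made here, not something the lab prescribes.

```python
import pandas as pd

# Attendee counts as scraped above: strings, not numbers
attendees = ['5', '54', '1', '2', '1', '1', '2']

# errors='coerce' turns anything non-numeric into NaN instead of raising
attendee_counts = pd.to_numeric(
    pd.Series(attendees, name='Number_of_Attendees'),
    errors='coerce',
)
print(attendee_counts.sort_values(ascending=False).tolist())
```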


## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [200]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame of names, venues, dates and attendee counts."""
    html = requests.get(events_page_url)
    soup = BeautifulSoup(html.content, 'html.parser')
    event_listing = soup.find('div', class_="fl col4")

    # Event dates: keep the first four tokens, e.g. 'Wed, 03 Jul 2019'
    dates = []
    for date in event_listing.findAll('p', class_='eventDate date'):
        a = date.text.split()
        b = ' '.join(a[:4])
        dates.append(b)

    # Event names: drop the leading 'Event details of' from the title attribute
    titles = []
    for title in event_listing.findAll('h1', class_='event-title'):
        a = title.find('a').attrs['title'].split()
        b = ' '.join(a[3:])
        titles.append(b)

    # Venues: drop the leading 'at' from the span text
    venues = []
    for venue in event_listing.findAll('h1', class_='event-title'):
        a = venue.find('span').text.split()
        b = ' '.join(a[1:])
        venues.append(b)

    # Attendee counts (still strings at this point)
    attendees = []
    for attendee in event_listing.findAll('p', class_='attending'):
        a = attendee.find('span').text
        attendees.append(a)
    
    # One single-column DataFrame per list
    df1 = pd.DataFrame(titles)
    df2 = pd.DataFrame(venues)
    df3 = pd.DataFrame(dates)
    df4 = pd.DataFrame(attendees)
    
    # The lists are collected independently, so columns are joined by position
    # and shorter ones are padded with NaN rather than matched event by event
    df = pd.concat([df1, df2, df3, df4], axis = 1)
        
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [201]:
scrape_events('https://www.residentadvisor.net/events')

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Inner City Techno,"Empire Control Room & Garage, Austin","Wed, 03 Jul 2019",5.0
1,Land of the Freeks (4th of July Pool Party),"TBA - Austin, Austin","Thu, 04 Jul 2019",56.0
2,Steady Rhythm [Friday Edition],"Plush ATX, Austin","Fri, 05 Jul 2019",1.0
3,Corona Electric Beach feat. Party Favor,"Backyard, Dallas/Fort Worth","Sat, 06 Jul 2019",2.0
4,"[CANCELLED] Midnight Rites: *Mntra* - Badbeat,...","The Techno Bunker, Dallas/Fort Worth","Sun, 07 Jul 2019",1.0
5,Life Before Monday - Desert Hearts Take Over,"Club Here I Love You, El Paso",,1.0
6,Private Label presents Ben Bohmer (Live),"The Terrace at Stereo Live, Houston",,2.0
7,"Roni Size with Lefty, Know Matter, Freshtillde...","Empire Control Room & Garage, Austin",,
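In the table above, dates appear only in the first five rows and the last row has no attendee count. The page has one date heading per day rather than per event, and not every event has an attendance paragraph, so the separately built lists have different lengths and are joined by position. One possible fix, sketched below, walks the `<li>` items in order and builds one record per event, carrying the most recent date forward. The sample HTML is a simplified, made-up stand-in for the listing markup shown earlier, and `parse_listing` is a hypothetical helper, not the lab's reference solution.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Simplified stand-in for the listing markup shown earlier (an assumption,
# not the live page): date headings and events are sibling <li> items.
sample_html = """
<ul class="list" id="items">
  <li><p class="eventDate date"><a href="#"><span>Wed, 03 Jul 2019 /</span></a></p></li>
  <li><article class="event-item">
    <h1 class="event-title"><a href="/events/1" title="Event details of First Night">First Night</a>
      <span>at <a href="#">Venue A</a>, <a href="#">Austin</a></span></h1>
    <p class="attending"><span>5</span> Attending</p></article></li>
  <li><article class="event-item">
    <h1 class="event-title"><a href="/events/2" title="Event details of Second Night">Second Night</a>
      <span>at <a href="#">Venue B</a>, <a href="#">Austin</a></span></h1></article></li>
  <li><p class="eventDate date"><a href="#"><span>Thu, 04 Jul 2019 /</span></a></p></li>
  <li><article class="event-item">
    <h1 class="event-title"><a href="/events/3" title="Event details of Third Night">Third Night</a>
      <span>at <a href="#">Venue C</a>, <a href="#">Houston</a></span></h1>
    <p class="attending"><span>12</span> Attending</p></article></li>
</ul>
"""

def parse_listing(listing):
    """Hypothetical helper: build one record per event, carrying the date forward."""
    records = []
    current_date = None
    for item in listing.find_all('li'):
        date_tag = item.find('p', class_='eventDate')
        if date_tag is not None:
            # A date heading applies to every event until the next heading
            current_date = date_tag.get_text(strip=True).rstrip('/').strip()
            continue
        heading = item.find('h1', class_='event-title')
        if heading is None:
            continue  # neither a date heading nor an event
        attending = item.find('p', class_='attending')
        venue_text = ' '.join(heading.find('span').get_text().split())
        if venue_text.startswith('at '):
            venue_text = venue_text[len('at '):]
        records.append({
            'Event_Name': heading.find('a').get_text(strip=True),
            'Venue': venue_text,
            'Event_Date': current_date,
            # Events without an attendance paragraph get a missing value
            'Number_of_Attendees': (
                int(attending.find('span').get_text()) if attending is not None else None
            ),
        })
    return pd.DataFrame(records)

listing = BeautifulSoup(sample_html, 'html.parser').find('ul', id='items')
events_df = parse_listing(listing)
print(events_df)
```

Because every field of a row comes from the same `<li>`, a missing value in one field cannot shift the other columns out of alignment.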


## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url
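The date headings in the listing link to URLs of the form `/events.aspx?ai=318&v=day&mn=7&yr=2019&dy=3`. One possible starting point, assuming the site pages through events by date using the `yr`, `mn` and `dy` query parameters (the lab shows only that single link, so treat this as unverified), is to advance the date in the URL. The helper `next_day_url` below is hypothetical and uses only the standard library.

```python
from datetime import date, timedelta
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def next_day_url(url, days=1):
    """Hypothetical helper: advance the yr/mn/dy query parameters by `days`."""
    parts = urlsplit(url)
    # parse_qs returns lists of values; keep the first value for each key
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    current = date(int(query['yr']), int(query['mn']), int(query['dy']))
    following = current + timedelta(days=days)
    query.update(yr=str(following.year), mn=str(following.month), dy=str(following.day))
    return urlunsplit(parts._replace(query=urlencode(query)))

start_url = 'https://www.residentadvisor.net/events.aspx?ai=318&v=day&mn=7&yr=2019&dy=31'
print(next_day_url(start_url))
```

Doing the arithmetic with `datetime.date` rather than incrementing `dy` directly handles month and year boundaries correctly.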

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
#Your code here
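The sorting step can be sketched on a small made-up DataFrame in the shape returned by `scrape_events`. The lab does not state the sort direction. This sketch assumes most-attended first, with ties broken by the earlier date. Both columns are converted from text first so that the sort is numeric and chronological.

```python
import pandas as pd

# Small made-up sample in the shape returned by scrape_events
events = pd.DataFrame({
    'Event_Name': ['A', 'B', 'C', 'D'],
    'Event_Date': ['Fri, 05 Jul 2019', 'Wed, 03 Jul 2019', 'Thu, 04 Jul 2019', 'Sat, 06 Jul 2019'],
    'Number_of_Attendees': ['2', '54', '2', None],
})

# Convert text columns to real types so sorting is numeric and chronological
events['Number_of_Attendees'] = pd.to_numeric(events['Number_of_Attendees'], errors='coerce')
events['Event_Date'] = pd.to_datetime(events['Event_Date'], format='%a, %d %b %Y')

# Most-attended first; ties broken by the earlier date; missing counts last
ranked = events.sort_values(
    by=['Number_of_Attendees', 'Event_Date'],
    ascending=[False, True],
    na_position='last',
).reset_index(drop=True)
print(ranked)
```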

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!