# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [2]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should take the URL of an events page and return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

In [1]:
from bs4 import BeautifulSoup
import requests

In [2]:
import re

In [3]:
html_page = requests.get('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17')
soup = BeautifulSoup(html_page.content, 'html.parser')

In [4]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en,ja,es">
 <head id="_x1">
  <title>
   RA: Events in New York on Friday, 17 May 2019
  </title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="en,ja,es" http-equiv="content-language"/>
  <meta content="RA: Resident Advisor" name="Description"/>
  <meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, new, york, on, friday, 17, may, 2019" name="Keywords"/>
  <meta content="Resident Advisor" name="Author"/>
  <meta content="Resident Advisor" property="og:site_name"/>
  <meta content="712773712080127" property="fb:app_id"/>
  <link href="/bundles/default-css?v=73_zn4f444Ms1nbtnaddvbDUe15CsJN6vhoNK7oQovg1" rel="stylesheet"/>
  <meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/>
  <link href="/bundles/cat-listings-css?v=w7DJdRHlwvlSlvivLjU2DnToUsYFU7IYixebCORYtxw1" rel="stylesheet"/>
  <link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/>
  <l

In [5]:
regex = re.compile("fl col(.*)")
fl_col = soup.find('div', {"class": regex})

In [56]:
h1 = fl_col.findAll('h1')

In [57]:
# Slice off the fixed 17-character prefix of each link's title attribute
h1s = [tag.find('a').attrs['title'][17:] for tag in fl_col.findAll('h1')]

In [8]:
h1s

['Innervisions New York',
 'Headless Horseman Live / Vatican Shadow / Volvox / Becka Diamond',
 'Friday: PLO Man All Night',
 'ReSolute w Move D & Flabbergast',
 'Material 17: Nico Laa',
 'Full Moon with Sébastien Léger',
 'Pete Rock',
 'Museum of Love (DJ set), L&l&l Record Club Plus Spiritual Mental Physical',
 'Just Blaze, Matt FX and Trillnatured',
 'Rendezvous with Sons of Immigrants, Arvi, CGC',
 'Easy with Will Buck, Jordan James and Wilki',
 'Schimanski presents: Soup NYC vs Ill Behavior',
 'Afterhours - Bushwick A/V: Velasco / Cafuné / Sid Vaga',
 "D'Noir AM feat. Sashi, Mitchell & Cleveland",
 'Rakim Y Ken Y Live Club 2019 (18 to Party)',
 'Call Super + CCL',
 'Egyptian Lover',
 'Eamon & Justin, Soul Summit, Analog Soul, and More',
 'Satori & The Band From Space: Live in Concert',
 'Nycxdesign Urban Imprint Launch Party',
 'Sleepy & Boo, Dpak Manny Digz, Flowmingo - Nublu Classic',
 'Technofeminism',
 'Origins NY: 012',
 'Loveless Records and Friends with The Dance Pit, Monte

In [62]:
venue = soup.findAll('h1', class_='event-title')
venue[2].find('span').text[3:]

'Nowadays'

In [67]:
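venues = []
for title in venue:
    span = title.find('span')
    # find() returns None when the heading has no venue <span>;
    # a sliced string is never None, so test the tag itself
    if span is None:
        venues.append('Venue TBA')
    else:
        # Drop the 3-character prefix that precedes the venue name
        venues.append(span.text[3:])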

In [68]:
venues

['Knockdown Center',
 'BASEMENT',
 'Nowadays',
 'TBA - New York',
 'Hart bar',
 'House Of Yes',
 'Analog Bkny',
 'Good Room',
 'Elsewhere',
 'TBA Brooklyn',
 'Elsewhere',
 'Schimanski',
 'Polygon BK',
 'Polygon BK',
 'Club X-Kandalo (Formerly the Avalon/Newburgh Skate Park)',
 'public records',
 'Le Bain',
 'TBA - Brooklyn',
 'National Sawdust',
 'A/D/O',
 'Nublu',
 'Bossa Nova Civic Club',
 'Escondido',
 'Elsewhere',
 'Black Flamingo',
 'Bedlam Bar And Lounge',
 'H0L0',
 'The Deep End',
 'The Fuze Box, Buffalo/Rochester',
 'Fusion Lounge NY',
 'Starliner',
 'Bean & Barley Harlem',
 'Our Wicked Lady',
 'Bohemian Hall & Beer Garden',
 'Refuge Arts',
 'Friends & Lovers']

In [89]:
import datetime

# The size of each step in days
day_delta = datetime.timedelta(days=1)

start_date = datetime.date.today()
end_date = start_date + day_delta

print(end_date)

type(str(end_date))


2019-05-18


str

In [9]:
def scrape_events(events_page_url):
    #Your code here
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url
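
One possible approach, sketched here under a stated assumption: the listing URL used earlier ends in a `YYYY-MM-DD` date, so the "next page" might simply be the same URL with the date moved forward one day. The site itself has not been checked for this; inspect its own "next" link before relying on it. The `DATE_AT_END` pattern is a hypothetical helper introduced for illustration.

```python
import datetime
import re

# Assumption: listing URLs end in a YYYY-MM-DD date (optionally followed
# by a slash), as in the example URL requested earlier in this lab.
DATE_AT_END = re.compile(r"(\d{4}-\d{2}-\d{2})/?$")


def next_page(url):
    """Return `url` with its trailing date advanced by one day."""
    match = DATE_AT_END.search(url)
    if match is None:
        raise ValueError(f"No trailing YYYY-MM-DD date found in {url!r}")
    current = datetime.date.fromisoformat(match.group(1))
    following = current + datetime.timedelta(days=1)
    # Swap only the matched date; the rest of the URL is left untouched.
    return url[:match.start(1)] + following.isoformat() + url[match.end(1):]
```

Because the arithmetic is done on `datetime.date` objects rather than on the string, month and year boundaries roll over correctly.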

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
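
As a rough sketch of the overall flow, the loop below keeps requesting pages until enough events have been gathered, then orders them. It uses plain lists of dictionaries and stand-in functions (`fake_scrape`, `fake_advance`, and the `example.com` URL are hypothetical) so it runs without network access. It also assumes "sorted by the number of attendees" means most-attended first, with ties broken by the earlier date, and that dates are ISO-formatted strings.

```python
def collect_events(start_url, scrape, advance, target=1000, max_pages=200):
    """Follow pages from `start_url` until `target` events are gathered.

    `scrape(url)` returns a list of event records and `advance(url)` returns
    the next URL; both are passed in so the loop is independent of the site.
    """
    events, url = [], start_url
    for _ in range(max_pages):  # hard cap so empty pages cannot loop forever
        events.extend(scrape(url))
        if len(events) >= target:
            break
        url = advance(url)
    return events[:target]


def sort_events(events):
    """Most-attended first; ties broken by the earlier Event_Date."""
    return sorted(
        events,
        key=lambda e: (-e["Number_of_Attendees"], e["Event_Date"]),
    )


# Stand-ins for scrape_events and next_page, for illustration only.
def fake_scrape(url):
    day = url[-10:]  # the trailing YYYY-MM-DD
    return [
        {"Event_Name": "A", "Venue": "X", "Event_Date": day, "Number_of_Attendees": 5},
        {"Event_Name": "B", "Venue": "Y", "Event_Date": day, "Number_of_Attendees": 9},
    ]


def fake_advance(url):
    return f"{url[:-2]}{int(url[-2:]) + 1:02d}"


sample = collect_events("https://example.com/day/2019-05-17",
                        fake_scrape, fake_advance, target=3)
ranked = sort_events(sample)
for event in ranked:
    print(event["Event_Name"], event["Event_Date"], event["Number_of_Attendees"])
```

With a DataFrame, the same ordering could be expressed as `df.sort_values(by=["Number_of_Attendees", "Event_Date"], ascending=[False, True])`. When scraping the real site, consider pausing between requests (for example with `time.sleep`) to avoid overloading the server.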

In [None]:
#Your code here

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!