# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to apply those skills to a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Libraries used throughout this lab (open https://www.residentadvisor.net/events in your browser separately)
import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

In [2]:
response = requests.get('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17')
soup = BeautifulSoup(response.content, 'html.parser')

In [3]:
soup

<!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in New York on Friday, 17 May 2019
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, new, york, on, friday, 17, may, 2019" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=73_zn4f444Ms1nbtnaddvbDUe15CsJN6vhoNK7oQovg1" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=w7DJdRHlwvlSlvivLjU2DnToUsYFU7IYixebCORYtxw1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color="#000000" href="/images/ra_icon

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.
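Once the inspector shows you a tag and its class, that class name can be used directly from Python. A minimal sketch, using an event-title fragment copied from this lab's own output, showing that `find_all(..., class_=...)` and the CSS-selector form `select('h1.event-title')` find the same tags. `fragment` is just an illustrative variable name, and `select`/`select_one` assume a BeautifulSoup version with CSS-selector support.

```python
from bs4 import BeautifulSoup

# Event-title markup as it appears in this lab's scraped output
html = ('<h1 class="event-title">Innervisions New York '
        '<span>at <a href="/club.aspx?id=69401">Knockdown Center</a></span></h1>')
fragment = BeautifulSoup(html, "html.parser")

# Class names read off the inspector work with find_all(class_=...) ...
by_find = [h1.get_text() for h1 in fragment.find_all("h1", class_="event-title")]
# ... and with the equivalent CSS selector
by_select = [h1.get_text() for h1 in fragment.select("h1.event-title")]

# A selector can also walk down nested tags in one step
venue = fragment.select_one("h1.event-title span a").get_text()
print(by_find == by_select, by_find, venue)
```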

In [4]:
# Use your browser's Inspect Element tool to explore the page's HTML.
# In the notebook, prettify() (note the parentheses) returns the indented HTML as a string;
# without them you only get the bound method. Preview the first 1,000 characters:
print(soup.prettify()[:1000])

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.
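Building each column with its own `findAll` call only lines up if every listing has every field. A sketch of an alternative, assuming each event's title, `<time>` tag, and attendance line share a common container tag: walk the containers and read every field from the same one, so a missing attendance count becomes `NaN` instead of shifting later counts onto the wrong rows. The `<li>` container, the sample markup (including the made-up "Example Night" listing), and the `parse_events` helper are all hypothetical; check the real wrapper tag in the inspector before relying on this.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the tags seen in this lab; the <li> wrapper is an assumption
SAMPLE_HTML = """
<ul>
  <li>
    <time>2019-05-17</time>
    <h1 class="event-title">Pete Rock <span>at <a>Analog Bkny</a></span></h1>
    <p class="attending">12 Attending</p>
  </li>
  <li>
    <time>2019-05-17</time>
    <h1 class="event-title">Example Night <span>at <a>Elsewhere</a></span></h1>
  </li>
</ul>
"""

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

def parse_events(soup, container="li"):
    """Return one row per event container; missing fields become None/NaN (hypothetical helper)."""
    rows = []
    for item in soup.find_all(container):
        title = item.find("h1", class_="event-title")
        if title is None:
            continue  # skip containers that are not event listings
        span = title.find("span")
        time_tag = item.find("time")
        attending = item.find("p", class_="attending")
        rows.append({
            "Event_Name": title.get_text(),
            "Venue": span.get_text()[3:] if span is not None else "TBD",  # drop leading "at "
            "Event_Date": time_tag.get_text()[:10] if time_tag is not None else None,
            "Number_of_Attendees": int(attending.get_text()[:-10]) if attending is not None else None,
        })
    return pd.DataFrame(rows, columns=COLUMNS)

events_df = parse_events(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(events_df)
```

Because each row is assembled from one container, the second event simply gets an empty attendance value while its name, venue, and date stay on the same row.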

In [5]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en,ja,es">
 <head id="_x1">
  <title>
   RA: Events in New York on Friday, 17 May 2019
  </title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="en,ja,es" http-equiv="content-language"/>
  <meta content="RA: Resident Advisor" name="Description"/>
  <meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, new, york, on, friday, 17, may, 2019" name="Keywords"/>
  <meta content="Resident Advisor" name="Author"/>
  <meta content="Resident Advisor" property="og:site_name"/>
  <meta content="712773712080127" property="fb:app_id"/>
  <link href="/bundles/default-css?v=73_zn4f444Ms1nbtnaddvbDUe15CsJN6vhoNK7oQovg1" rel="stylesheet"/>
  <meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/>
  <link href="/bundles/cat-listings-css?v=w7DJdRHlwvlSlvivLjU2DnToUsYFU7IYixebCORYtxw1" rel="stylesheet"/>
  <link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/>
  <l

In [6]:
soup.find('p', class_='attending').text[:-10]

'709'
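Slicing with `[:-10]` depends on the text ending in exactly `' Attending'`. Since `re` is already imported, a slightly sturdier sketch pulls the leading number out with a regular expression. `parse_attending` is a hypothetical helper, and stripping commas is only a precaution in case large counts are ever formatted like `1,234`.

```python
import re

def parse_attending(text):
    """Return the leading count from text like '709 Attending', or None if there is none."""
    # \d[\d,]* requires a digit first, then allows thousands separators (a precaution)
    match = re.match(r"\s*(\d[\d,]*)", text)
    return int(match.group(1).replace(",", "")) if match else None

print(parse_attending("709 Attending"))
```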

In [7]:
attendee = soup.findAll('p', class_='attending')

In [40]:
attendee = soup.findAll('p', class_='attending')
# Each tag's text reads like "706 Attending"; slicing off the last 10 characters (" Attending") leaves the count
attendee_list = [int(tag.text[:-10]) for tag in attendee]
attendee_list

[706,
 255,
 98,
 64,
 25,
 20,
 12,
 12,
 9,
 7,
 3,
 3,
 3,
 2,
 1,
 46,
 29,
 22,
 20,
 8,
 6,
 4,
 3,
 2,
 2,
 2,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1]

In [70]:
soup.find('h1', class_='event-title').find('span')

<span>at <a href="/club.aspx?id=69401">Knockdown Center</a></span>
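The venue name in this span is also a relative link (`/club.aspx?id=69401`). Following links, one of this lab's objectives, starts with turning such `href` values into absolute URLs. A minimal sketch using the standard library's `urllib.parse.urljoin`, with the span above copied verbatim; `BASE_URL` is simply the site root named at the top of the lab, and the resulting URL could then be fetched with `requests.get` and parsed like the events page.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.residentadvisor.net"

# The venue span shown above, copied verbatim
span_html = '<span>at <a href="/club.aspx?id=69401">Knockdown Center</a></span>'
link = BeautifulSoup(span_html, "html.parser").find("a")

# Resolve the relative href against the site root to get a followable URL
venue_url = urljoin(BASE_URL, link["href"])
print(link.get_text(), "->", venue_url)
```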

In [29]:
location = soup.findAll('h1', class_='event-title')
location[3].find('span').text[3:]

'TBA - New York'

In [32]:
location = soup.findAll('h1', class_='event-title')
location_list = []
for title in location:
    span = title.find('span')
    # The span reads "at <venue>", so drop the leading "at ".
    # Slicing a string never yields None; the case to guard against is find() returning None.
    location_list.append(span.text[3:] if span is not None else 'TBD')

In [33]:
location_list

['Knockdown Center',
 'BASEMENT',
 'Nowadays',
 'TBA - New York',
 'Hart bar',
 'House Of Yes',
 'Good Room',
 'Analog Bkny',
 'Elsewhere',
 'TBA Brooklyn',
 'Elsewhere',
 'Schimanski',
 'Polygon BK',
 'Polygon BK',
 'Club X-Kandalo (Formerly the Avalon/Newburgh Skate Park)',
 'public records',
 'Le Bain',
 'TBA - Brooklyn',
 'National Sawdust',
 'A/D/O',
 'Nublu',
 'Bossa Nova Civic Club',
 'Escondido',
 'Elsewhere',
 'Black Flamingo',
 'The Deep End',
 'Bedlam Bar And Lounge',
 'H0L0',
 'The Fuze Box, Buffalo/Rochester',
 'Fusion Lounge NY',
 'Starliner',
 'Bean & Barley Harlem',
 'Our Wicked Lady',
 'Bohemian Hall & Beer Garden',
 'Refuge Arts',
 'Friends & Lovers']

In [6]:
date = soup.findAll('time')

In [7]:
date[0].text[:10]

'2019-05-17'

In [8]:
date_list = []
for i in range(len(date)):
    date_list.append(date[i].text[:10])
date_list

['2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17',
 '2019-05-17']

In [9]:
event = soup.findAll('h1', class_="event-title")

In [10]:
event[0].text

'Innervisions New York at Knockdown Center'

In [11]:
event_list = []
for i in range(len(event)):
    event_list.append(event[i].text)

In [12]:
event_list

['Innervisions New York at Knockdown Center',
 'Headless Horseman Live / Vatican Shadow / Volvox / Becka Diamond at BASEMENT',
 'Friday: PLO Man All Night at Nowadays',
 'ReSolute w Move D & Flabbergast at TBA - New York',
 'Material 17: Nico Laa at Hart bar',
 'Full Moon with Sébastien Léger at House Of Yes',
 'Pete Rock at Analog Bkny',
 'Museum of Love (DJ set), L&l&l Record Club Plus Spiritual Mental Physical at Good Room',
 'Just Blaze, Matt FX and Trillnatured at Elsewhere',
 'Rendezvous with Sons of Immigrants, Arvi, CGC at TBA Brooklyn',
 'Easy with Will Buck, Jordan James and Wilki at Elsewhere',
 'Schimanski presents: Soup NYC vs Ill Behavior at Schimanski',
 'Afterhours - Bushwick A/V: Velasco / Cafuné / Sid Vaga at Polygon BK',
 "D'Noir AM feat. Sashi, Mitchell & Cleveland at Polygon BK",
 'Rakim Y Ken Y Live Club 2019 (18 to Party) at Club X-Kandalo (Formerly the Avalon/Newburgh Skate Park)',
 'Call Super + CCL at public records',
 'Egyptian Lover at Le Bain',
 'Eamon & Ju

In [5]:
def scrape_events(events_page_url):
    """Scrape one RA day page into a DataFrame with one row per event."""
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Event titles read like "Pete Rock at Analog Bkny"
    titles = soup.findAll('h1', class_='event-title')
    event_list = [title.text for title in titles]

    # The venue is the title's <span>, which reads "at <venue>"; drop the leading "at "
    location_list = []
    for title in titles:
        span = title.find('span')
        location_list.append(span.text[3:] if span is not None else 'TBD')

    # The first 10 characters of each <time> tag's text are the date (YYYY-MM-DD)
    date_list = [tag.text[:10] for tag in soup.findAll('time')]

    # "706 Attending" -> 706. Listings without an attendance line make this list shorter,
    # so later counts can land on the wrong rows and the last rows get no count
    attendee_list = [int(tag.text[:-10]) for tag in soup.findAll('p', class_='attending')]

    # Stack the lists as rows, then transpose so each list becomes a column
    df = pd.DataFrame([event_list, location_list, date_list, attendee_list]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [6]:
scrape_events('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17')

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Innervisions New York at Knockdown Center | Knockdown Center | 2019-05-17 | 706 |
| 1 | Headless Horseman Live / Vatican Shadow / Volv... | BASEMENT | 2019-05-17 | 255 |
| 2 | Friday: PLO Man All Night at Nowadays | Nowadays | 2019-05-17 | 101 |
| 3 | ReSolute w Move D & Flabbergast at TBA - New York | TBA - New York | 2019-05-17 | 69 |
| 4 | Material 17: Nico Laa at Hart bar | Hart bar | 2019-05-17 | 26 |
| 5 | Full Moon with Sébastien Léger at House Of Yes | House Of Yes | 2019-05-17 | 21 |
| 6 | Museum of Love (DJ set), L&l&l Record Club Plu... | Good Room | 2019-05-17 | 13 |
| 7 | Pete Rock at Analog Bkny | Analog Bkny | 2019-05-17 | 12 |
| 8 | Just Blaze, Matt FX and Trillnatured at Elsewhere | Elsewhere | 2019-05-17 | 9 |
| 9 | Rendezvous with Sons of Immigrants, Arvi, CGC ... | TBA Brooklyn | 2019-05-17 | 7 |


## Write a Function to Retrieve the URL for the Next Page

In [7]:
from datetime import timedelta, date, datetime


In [8]:
url = 'https://www.residentadvisor.net/events/us/newyork/day/2019-05-17'

In [9]:
x = datetime.strptime(url[-10:], '%Y-%m-%d')
x

datetime.datetime(2019, 5, 17, 0, 0)

In [10]:
import pandas as pd
daterange = pd.date_range('2019-05-17', '2019-06-17')

In [11]:
for single_date in daterange:
    print(single_date.strftime("%Y-%m-%d"))

2019-05-17
2019-05-18
2019-05-19
2019-05-20
2019-05-21
2019-05-22
2019-05-23
2019-05-24
2019-05-25
2019-05-26
2019-05-27
2019-05-28
2019-05-29
2019-05-30
2019-05-31
2019-06-01
2019-06-02
2019-06-03
2019-06-04
2019-06-05
2019-06-06
2019-06-07
2019-06-08
2019-06-09
2019-06-10
2019-06-11
2019-06-12
2019-06-13
2019-06-14
2019-06-15
2019-06-16
2019-06-17


In [12]:
x.strftime('%Y-%m-%d')

'2019-05-17'

In [13]:
x.isoformat()[:10]

'2019-05-17'

In [14]:
url[-10:]

'2019-05-17'

In [15]:
# `datetime` here is the class imported above, so it has no `timedelta` attribute; use `timedelta` directly
delta = timedelta(days=1)

In [16]:
def next_page(url):
    # Day-page URLs end in a YYYY-MM-DD date: parse it, add one day, and swap it back in
    current_date = datetime.strptime(url[-10:], '%Y-%m-%d')
    next_date = current_date + timedelta(days=1)
    next_page_url = url[:-10] + next_date.strftime('%Y-%m-%d')
    return next_page_url

In [17]:
next_page('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17')

'https://www.residentadvisor.net/events/us/newyork/day/2019-05-18'
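Calling `next_page` repeatedly walks forward one day at a time. The same idea can be packaged as a generator that yields a run of day-page URLs. `day_page_urls` is a hypothetical helper that, like `next_page`, assumes the URL ends in a `YYYY-MM-DD` date; using `timedelta` takes care of month boundaries, as the example crossing from May into June shows.

```python
from datetime import datetime, timedelta

def day_page_urls(start_url, num_days):
    """Yield start_url and the following day pages (hypothetical helper).

    Assumes, like next_page, that the URL ends in a YYYY-MM-DD date.
    """
    prefix = start_url[:-10]
    start = datetime.strptime(start_url[-10:], "%Y-%m-%d")
    for offset in range(num_days):
        yield prefix + (start + timedelta(days=offset)).strftime("%Y-%m-%d")

urls = list(day_page_urls("https://www.residentadvisor.net/events/us/newyork/day/2019-05-30", 3))
for u in urls:
    print(u)
```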

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
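The cells in this section collect the data but don't sort it. One way to do the sort is `DataFrame.sort_values` with two keys: attendance descending, ties broken by the earlier event date. Because `scrape_events` builds its frame with `transpose()`, the columns come back with object dtype, so this sketch converts them to numbers and dates first. `sort_events` and the sample rows are hypothetical.

```python
import pandas as pd

def sort_events(df):
    """Sort by attendance (highest first), breaking ties by the earlier event date."""
    out = df.copy()
    # Columns built via transpose() have object dtype, so convert before sorting
    out["Number_of_Attendees"] = pd.to_numeric(out["Number_of_Attendees"], errors="coerce")
    out["Event_Date"] = pd.to_datetime(out["Event_Date"])
    return out.sort_values(
        ["Number_of_Attendees", "Event_Date"],
        ascending=[False, True],
        na_position="last",  # events with no attendance count go to the bottom
    ).reset_index(drop=True)

# Hypothetical rows for illustration
sample = pd.DataFrame({
    "Event_Name": ["Event A", "Event B", "Event C", "Event D"],
    "Venue": ["Venue 1", "Venue 2", "Venue 3", "Venue 4"],
    "Event_Date": ["2019-05-18", "2019-05-17", "2019-05-19", "2019-05-17"],
    "Number_of_Attendees": [12, 12, None, 98],
})
sorted_sample = sort_events(sample)
print(sorted_sample[["Event_Name", "Event_Date", "Number_of_Attendees"]])
```

In the notebook this would be `sort_events(df)` once `df` has been collected.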

In [18]:
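def scraper(url, num_entries):
    df = pd.DataFrame()
    # Keep requesting day pages until more than num_entries rows are collected
    while len(df) <= num_entries:
        print(len(df))
        df1 = scrape_events(url)
        # Checkpoint to CSV. Note this writes df *before* df1 is appended,
        # so each file lags one page behind (and the first file is empty)
        name = f"Event_Scraper_{len(df)}.csv"
        df.to_csv(name)
        print(f"Stored {name}")
        df = pd.concat([df, df1])
        url = next_page(url)
    return df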

In [35]:
df = scraper('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17', 300)

0
Stored Event_Scraper_0.csv
36
Stored Event_Scraper_36.csv
83
Stored Event_Scraper_83.csv
103
Stored Event_Scraper_103.csv
105
Stored Event_Scraper_105.csv
112
Stored Event_Scraper_112.csv
119
Stored Event_Scraper_119.csv
131
Stored Event_Scraper_131.csv
160
Stored Event_Scraper_160.csv
186
Stored Event_Scraper_186.csv
204
Stored Event_Scraper_204.csv
208
Stored Event_Scraper_208.csv
211
Stored Event_Scraper_211.csv
214
Stored Event_Scraper_214.csv
220
Stored Event_Scraper_220.csv
238
Stored Event_Scraper_238.csv
267
Stored Event_Scraper_267.csv
273
Stored Event_Scraper_273.csv
273
Stored Event_Scraper_273.csv
275
Stored Event_Scraper_275.csv
275
Stored Event_Scraper_275.csv
281
Stored Event_Scraper_281.csv
299
Stored Event_Scraper_299.csv
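As written, `scraper` overshoots `num_entries` (it stops only after passing it), keeps each page's original index (hence the `reset_index` step that follows), and sends requests back to back. A sketch of an alternative loop, with hypothetical names: it gathers pages in a list, concatenates once with `ignore_index=True`, trims to `num_entries`, pauses between requests, and caps the number of pages. The page fetcher and URL stepper are passed in (for example `scrape_events` and `next_page`), which also lets it run offline with stand-in pages; CSV checkpoints are left out for brevity.

```python
import time
import pandas as pd

def collect_events(start_url, num_entries, fetch_page, next_url, pause=1.0, max_pages=365):
    """Gather day pages until num_entries rows are collected (hypothetical helper).

    fetch_page(url) must return a DataFrame and next_url(url) the following page's URL,
    e.g. scrape_events and next_page from this lab.
    """
    frames, total, url = [], 0, start_url
    for _ in range(max_pages):  # cap the page count in case many days have no events
        if total >= num_entries:
            break
        page = fetch_page(url)
        frames.append(page)
        total += len(page)
        url = next_url(url)
        time.sleep(pause)  # pause between requests to go easy on the server
    if not frames:
        return pd.DataFrame()
    # Concatenate once, renumber rows 0..n-1, and drop any overshoot past num_entries
    return pd.concat(frames, ignore_index=True).head(num_entries)


# Stand-in pages so the sketch runs offline; against the real site you might call
# collect_events('https://www.residentadvisor.net/events/us/newyork/day/2019-05-17', 300, scrape_events, next_page)
fake_pages = {"day1": pd.DataFrame({"Event_Name": ["a", "b"]}),
              "day2": pd.DataFrame({"Event_Name": ["c", "d", "e"]})}
fake_next = {"day1": "day2", "day2": "day3"}
result = collect_events("day1", 4, fake_pages.get, fake_next.get, pause=0)
print(result["Event_Name"].tolist())
```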


In [36]:
df = df.reset_index()
df

|   | index | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|---|
| 0 | 0 | Innervisions New York at Knockdown Center | Knockdown Center | 2019-05-17 | 706 |
| 1 | 1 | Headless Horseman Live / Vatican Shadow / Volv... | BASEMENT | 2019-05-17 | 255 |
| 2 | 2 | Friday: PLO Man All Night at Nowadays | Nowadays | 2019-05-17 | 101 |
| 3 | 3 | ReSolute w Move D & Flabbergast at TBA - New York | TBA - New York | 2019-05-17 | 70 |
| 4 | 4 | Material 17: Nico Laa at Hart bar | Hart bar | 2019-05-17 | 26 |
| 5 | 5 | Full Moon with Sébastien Léger at House Of Yes | House Of Yes | 2019-05-17 | 21 |
| 6 | 6 | Museum of Love (DJ set), L&l&l Record Club Plu... | Good Room | 2019-05-17 | 13 |
| 7 | 7 | Pete Rock at Analog Bkny | Analog Bkny | 2019-05-17 | 12 |
| 8 | 8 | Just Blaze, Matt FX and Trillnatured at Elsewhere | Elsewhere | 2019-05-17 | 9 |
| 9 | 9 | Rendezvous with Sons of Immigrants, Arvi, CGC ... | TBA Brooklyn | 2019-05-17 | 7 |


In [37]:
df = df.drop('index', axis = 1)

In [44]:
df.groupby('Venue').count().sort_values('Event_Name', ascending = False)

| Venue | Event_Name | Event_Date | Number_of_Attendees |
|---|---|---|---|
| Elsewhere | 34 | 34 | 34 |
| TBA Brooklyn | 15 | 15 | 15 |
| Nowadays | 15 | 15 | 15 |
| public records | 13 | 13 | 13 |
| Good Room | 13 | 13 | 13 |
| Bossa Nova Civic Club | 12 | 12 | 12 |
| TBA - New York | 12 | 12 | 12 |
| Le Bain | 12 | 12 | 12 |
| H0L0 | 11 | 11 | 10 |
| TBA - Brooklyn | 8 | 8 | 7 |
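`groupby('Venue').count()` counts non-null values column by column, which is why H0L0 shows 11 event names but only 10 attendee counts: one of its rows has no attendance value. When the goal is simply events per venue, `value_counts()` on the `Venue` column is a shorter route. A small sketch with hypothetical rows showing the difference:

```python
import pandas as pd

# Hypothetical rows: the second H0L0 event has no attendance count
events = pd.DataFrame({
    "Venue": ["H0L0", "H0L0", "Elsewhere"],
    "Number_of_Attendees": [4, None, 9],
})

per_column = events.groupby("Venue").count()  # non-null values per column, per venue
per_venue = events["Venue"].value_counts()    # rows per venue, regardless of missing values

print(per_column)
print(per_venue)
```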


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!