# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [57]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
def scrape_events(events_page_url):
    resp = requests.get(events_page_url)
    soup = BeautifulSoup(resp.text, 'html.parser')
    all = soup.find_all("article")
    entry = []
    for li in all[1:]:
        if li and li.h1 and li.h1.a:
            name = li.h1.a.text
        else:
            name = ""
        if li.h1 and li.h1.span:
            venue = li.h1.span.text
        else:
            venue = ""
        if li.span and li.span.time:
            date = li.span.time.text
        else:
            date = ""
        if li.p and li.p.span:
            attendee = li.p.span.text
        else:
            attendee = ""
            
        entr = [name, venue, date, attendee]
        entry.append(entr)
    
    df = pd.DataFrame(entry)
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [40]:
df = scrape_events("https://www.residentadvisor.net/events")

In [41]:
df

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,The Spring Up,at Headroom,2019-05-17T00:00,2
1,Drop Dance Party - Silent Disco Edition,at Afrobrazilian Cultural Center of NJ,2019-05-18T00:00,1
2,Summer Rooftop Series,at Pour Abbey's,2019-05-19T00:00,2


## Write a Function to Retrieve the URL for the Next Page

In [None]:
https://www.residentadvisor.net/events/us/newyork/week/2019-05-23

In [64]:
def next_page(url):
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, 'html.parser')
    a = soup.find(attrs={"id": "liNext2"})
    return "https://www.residentadvisor.net/" + a.a["href"]
    

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [51]:
next_page(url)

<a ga-event-action="Next " ga-event-category="event-listings" ga-on="click" href="/events/us/newjersey/week/2019-05-23">Next </a>


'https://www.residentadvisor.net//events/us/newjersey/week/2019-05-23'

In [65]:
url = "https://www.residentadvisor.net/events/us/newyork/week/2019-05-09"
df = pd.DataFrame(["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])

for i in range(50):
    new_df = scrape_events(url)
    if len(new_df) >0: 
        df = pd.concat([df, new_df])
    url = next_page(url)
    #print(url)
    if not url: break
df.head()

  return self._int64index.union(other)


https://www.residentadvisor.net//events/us/newyork/week/2019-05-16


of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  import sys


https://www.residentadvisor.net//events/us/newyork/week/2019-05-23
https://www.residentadvisor.net//events/us/newyork/week/2019-05-30
https://www.residentadvisor.net//events/us/newyork/week/2019-06-06
https://www.residentadvisor.net//events/us/newyork/week/2019-06-13
https://www.residentadvisor.net//events/us/newyork/week/2019-06-20
https://www.residentadvisor.net//events/us/newyork/week/2019-06-27
https://www.residentadvisor.net//events/us/newyork/week/2019-07-04
https://www.residentadvisor.net//events/us/newyork/week/2019-07-11
https://www.residentadvisor.net//events/us/newyork/week/2019-07-18
https://www.residentadvisor.net//events/us/newyork/week/2019-07-25
https://www.residentadvisor.net//events/us/newyork/week/2019-08-01
https://www.residentadvisor.net//events/us/newyork/week/2019-08-08
https://www.residentadvisor.net//events/us/newyork/week/2019-08-15
https://www.residentadvisor.net//events/us/newyork/week/2019-08-22
https://www.residentadvisor.net//events/us/newyork/week/2019-0

KeyError: 'href'

In [66]:
df

Unnamed: 0,0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Event_Name,,,,
1,Venue,,,,
2,Event_Date,,,,
3,Number_of_Attendees,,,,
0,,"Denis Sulta, Day Cart & Wig-Wam Plus The Funky...",at Good Room,2019-05-09T00:00,97
1,,Vitamins: Kiki Kudo ✿ X-Coast ✿ Opheliaxz ✿ Hi...,at Bossa Nova Civic Club,2019-05-09T00:00,10
2,,Xoxa,at Happyfun Hideaway,2019-05-09T00:00,7
3,,Shawn Dub + Monchan + Butter,at public records,2019-05-09T00:00,7
4,,"Divine Phrases 01: DJ Voices, Devoye, DJ Wawa",at Baby's All Right,2019-05-09T00:00,5
5,,"Quest?onmarc, Olive T & Gooddroid",at Le Bain,2019-05-09T00:00,


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!