# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site! In this lab, you'll scrape a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
import re
import time

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
```

```python
# Fetch the New York events listing and parse its HTML
response = requests.get("https://www.residentadvisor.net/events/us/newyork")
soup = BeautifulSoup(response.content, 'html.parser')
```
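Because the lab ends up fetching many pages in a row, it can help to route every request through one helper. The sketch below assumes a one-second pause and a custom `User-Agent` string are acceptable to the site; `polite_get`, `SESSION`, and the header value are hypothetical names chosen for illustration, not part of the lab.

```python
import time

import requests

# Hypothetical shared session; the User-Agent value is an assumption, not a site requirement
SESSION = requests.Session()
SESSION.headers.update({"User-Agent": "concert-scraping-lab (educational)"})

def polite_get(url, session=SESSION, delay=1.0, timeout=10):
    """GET `url` with a timeout, raise on HTTP errors, then pause for `delay` seconds."""
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of silently parsing an error page
    response.raise_for_status()
    time.sleep(delay)  # spread requests out when called in a loop
    return response
```

Usage would look like `soup = BeautifulSoup(polite_get(url).content, 'html.parser')`. Reusing one `Session` keeps the headers in one place and lets `requests` reuse connections between pages.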

```python
# The events live inside <div id="event-listing">
event_listings = soup.find('div', id="event-listing")
```

## Open the Inspect Element Feature

Next, open your browser's Inspect Element feature to view the HTML behind the page.
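Once you spot the relevant tags in the inspector, you can test selectors on a small hand-written snippet before hitting the live site. The markup below is a hypothetical, simplified stand-in built from the selectors this lab uses (`div#event-listing`, `p.eventDate.date`, `h1.event-title`, `p.attending`); the `"8 Attending"` wording is an assumption, and the real page is larger and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the selectors used in this lab
SAMPLE_HTML = """
<div id="event-listing">
  <ul>
    <li><p class="eventDate date">Tue, 25 Jun 2019 /</p></li>
    <li>
      <h1 class="event-title">Cheers Bklyn 25 at TBA Brooklyn</h1>
      <p class="attending">8 Attending</p>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
listing = soup.find("div", id="event-listing")
entries = listing.find_all("li")

# prettify() prints the parsed tree with indentation, like a tiny inspector
print(listing.prettify())

for entry in entries:
    # class_="eventDate date" matches the exact string value of the class attribute
    date = entry.find("p", class_="eventDate date")
    title = entry.find("h1", class_="event-title")
    if date:
        print("date  ->", date.get_text(strip=True))
    elif title:
        print("event ->", title.get_text(strip=True))
```

Note that a multi-class string such as `"eventDate date"` only matches when the classes appear in that exact order; a CSS selector like `entry.select_one("p.eventDate.date")` is order-independent.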

```python
# Each <li> is either a date header or an event
entries = event_listings.find_all('li')
```

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with columns for Event_Name, Venue, Event_Date, and Number_of_Attendees.
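Before wiring everything into `scrape_events`, the two string-parsing steps can be tried in isolation. This is a minimal sketch, assuming titles read `<event name> at <venue>` (as the lab's `' at '` split implies) and that the attending text begins with the count; `parse_title` and `parse_attendees` are hypothetical helper names.

```python
import math
import re

def parse_title(title):
    """Split '<event> at <venue>' on the LAST ' at ', keeping event names that
    contain ' at ' intact; return (title, None) when there is no ' at '."""
    title = title.strip()
    if " at " not in title:
        return title, None
    name, venue = title.rsplit(" at ", 1)
    return name.strip(), venue.strip()

def parse_attendees(text):
    """Return the leading integer in `text`, or NaN when it does not start with digits."""
    match = re.match(r"\d+", text.strip())
    return int(match.group()) if match else math.nan
```

For example, `parse_title("Prefuse 73 at Le Poisson Rouge")` gives `("Prefuse 73", "Le Poisson Rouge")`, and `parse_attendees("")` gives `nan`. Splitting on the last ` at ` differs from `split(' at ')[1]`, which takes only the text between the first and second ` at `; which choice is right depends on how the real titles are written.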

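```python
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    columns = ['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees']
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    event_listings = soup.find('div', id="event-listing")
    if event_listings is None:  # no listing container on this page
        return pd.DataFrame(columns=columns)
    rows = []
    cur_date = None  # most recent date header seen so far
    for entry in event_listings.find_all('li'):
        # Is it a date header? If so, it applies to the events that follow it.
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if event:
            # Titles read "<event name> at <venue>"; split on the last " at "
            title = event.text.strip()
            if ' at ' in title:
                event_name, venue = (part.strip() for part in title.rsplit(' at ', 1))
            else:
                event_name, venue = title, np.nan
            # The attending text starts with the count; missing or non-numeric -> NaN
            attending = entry.find('p', class_="attending")
            match = re.match(r"\d+", attending.text.strip()) if attending else None
            n_attendees = int(match.group()) if match else np.nan
            rows.append([event_name, venue, cur_date, n_attendees])
        elif date:
            # e.g. "Tue, 25 Jun 2019 /" -> "Tue, 25 Jun 2019"
            cur_date = date.text.strip().rstrip(' /')
    return pd.DataFrame(rows, columns=columns)

events_page_url = "https://www.residentadvisor.net/events/us/newyork"
scrape_events(events_page_url)
```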


## Write a Function to Retrieve the URL for the Next Page

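```python
from urllib.parse import urljoin

def next_page(url):
    """Return the absolute URL of the next events page, or None on the last page."""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    next_link = soup.find('a', attrs={'ga-event-action': "Next "})
    if next_link is None or not next_link.get('href'):
        return None  # no "Next" link: this is the last page
    return urljoin("https://www.residentadvisor.net", next_link['href'])
```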
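The pagination logic can be checked without network access by separating link extraction from fetching. This is a minimal sketch, assuming the next-page anchor carries `ga-event-action="Next "` as in the selector above; `next_page_from_soup`, `BASE_URL`, and the sample markup (including the `/next-page-example` path) are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.residentadvisor.net"

def next_page_from_soup(soup, base_url=BASE_URL):
    """Return the absolute next-page URL from an already-parsed page, or None if absent."""
    link = soup.find("a", attrs={"ga-event-action": "Next "})
    if link is None or not link.get("href"):
        return None
    # urljoin resolves relative hrefs and leaves absolute ones unchanged
    return urljoin(base_url, link["href"])

# Hypothetical markup: one page with a "Next" link, one last page without it
with_next = BeautifulSoup(
    '<a ga-event-action="Next " href="/next-page-example">Next</a>', "html.parser")
last_page = BeautifulSoup("<p>No more events</p>", "html.parser")

print(next_page_from_soup(with_next))   # https://www.residentadvisor.net/next-page-example
print(next_page_from_soup(last_page))   # None
```

Returning `None` instead of raising lets a scraping loop stop cleanly on the last page rather than crashing with `AttributeError`.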

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
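The sort can be tried on a few rows before running the full scrape. This is a minimal sketch using rows in the shape the scraper produces (dates still carrying the site's trailing ` /`), assuming the most-attended events should come first; events with no attendee count are pushed to the end.

```python
import numpy as np
import pandas as pd

# A few rows in the shape scrape_events produces (raw date text kept on purpose)
sample = pd.DataFrame(
    [["Funk You", "House Of Yes", "Wed, 26 Jun 2019 /", 7],
     ["Cheers Bklyn 25", "TBA Brooklyn", "Tue, 25 Jun 2019 /", 8],
     ["Forbidden Colors", "public records", "Tue, 25 Jun 2019 /", np.nan],
     ["Pure Immanence Xxxv", "Bossa Nova Civic Club", "Wed, 26 Jun 2019 /", 18],
     ["Body Music Therapy with Nihal Ramchandani", "Nowadays", "Tue, 25 Jun 2019 /", 17]],
    columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"],
)

# Parse the text dates so ties sort chronologically rather than alphabetically
sample["Event_Date"] = pd.to_datetime(sample["Event_Date"].str.rstrip(" /"),
                                      format="%a, %d %b %Y")

# Most attended first; on a tie, the earlier date first; missing counts last
ranked = sample.sort_values(["Number_of_Attendees", "Event_Date"],
                            ascending=[False, True], na_position="last")
print(ranked[["Event_Name", "Event_Date", "Number_of_Attendees"]])
```

Sorting on the raw strings would order ties by weekday name ("Tue" before "Wed"), which only happens to be chronological within a single week; parsing to datetimes avoids that.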

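```python
dfs = []
total_rows = 0
cur_url = "https://www.residentadvisor.net/events/us/newyork"
while cur_url is not None and total_rows < 1000:
    page_df = scrape_events(cur_url)
    dfs.append(page_df)
    total_rows += len(page_df)
    cur_url = next_page(cur_url)  # None once there is no "Next" link
    time.sleep(1)  # pause between requests to avoid hammering the site
df = pd.concat(dfs, ignore_index=True).iloc[:1000]

# Sort by attendance (most first); break ties by the parsed event date
df['Event_Date'] = pd.to_datetime(df['Event_Date'].str.strip().str.rstrip(' /'),
                                  format='%a, %d %b %Y', errors='coerce')
df = df.sort_values(['Number_of_Attendees', 'Event_Date'],
                    ascending=[False, True], na_position='last')
print(len(df))
df.head()
```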
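To cover the lab's "store scraped data" objective, the sorted table can be written to disk and read back. This is a minimal sketch, assuming CSV is an acceptable format; `save_events`, `load_events`, and the file name `ny_events.csv` are hypothetical choices.

```python
import pandas as pd

def save_events(df, path="ny_events.csv"):
    """Write the events table to CSV without the pandas index."""
    df.to_csv(path, index=False)

def load_events(path="ny_events.csv"):
    """Read the table back, re-parsing Event_Date as datetimes."""
    return pd.read_csv(path, parse_dates=["Event_Date"])
```

Usage would be `save_events(df)` followed later by `load_events().head()`. CSV stores dates as plain text, so `parse_dates` is what restores `Event_Date` as a datetime column on reload.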

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!