# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

Start by opening the [events page](https://www.residentadvisor.net/events) of https://www.residentadvisor.net in your browser.

<img src="images/ra.png">

In [1]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature in your web browser to view the underlying HTML of the page.

In [2]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

In [3]:
import re

import requests
from bs4 import BeautifulSoup
import pandas as pd

In [4]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    event_listing = soup.find('div', class_="fl col4")
    titles = [x.text.strip() for x in soup.select('h1.event-title')]
    dates = [x.text for x in event_listing.find_all('time')]
    span_list = event_listing.find_all('span')
    # Venue spans contain "at <venue>"; drop the leading "at "
    venues = [x.text[3:] for x in span_list if "at " in x.text]
    # Take the whole leading number, not just its first digit
    attending = [int(re.match(r'\d+', x.text.strip()).group())
                 for x in soup.find_all('p', class_="attending")]
    # Each list becomes a row, so transpose to get one column per field
    df = pd.DataFrame([titles, venues, dates, attending]).T
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [5]:
df = scrape_events("https://www.residentadvisor.net/events/us/newyork")

In [6]:
df

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Wave Sessions with Mala at The Brown Note | The Brown Note | 2019-05-14T00:00 | 2 |
| 1 | Body Music Therapy with Pure Immanence at Nowa... | Nowadays | 2019-05-14T00:00 | 1 |
| 2 | MASHT NYC Feat. Talking Rings at Jupiter Disco | Jupiter Disco | 2019-05-14T00:00 | 1 |
| 3 | Tempo with Pablo Romero, Lorenzo Slider at TBA... | TBA Brooklyn | 2019-05-14T00:00 | 6 |
| 4 | Cipher at Bossa Nova Civic Club | Bossa Nova Civic Club | 2019-05-14T00:00 | 2 |
| 5 | Delivery. Henry Chow, Sveta Voice, Pjay, Bytz ... | Ms. Yoo | 2019-05-14T00:00 | 1 |
| 6 | Feel Real presents Going Places at Rumpus Room | Rumpus Room | 2019-05-14T00:00 | 1 |
| 7 | Postponed Until Further Notice 51717, Huerco S... | 444 Club | 2019-05-14T00:00 | 6 |
| 8 | Satori and the Band From Space at House Of Yes | House Of Yes | 2019-05-15T00:00 | 7 |
| 9 | Eamon & Justin, Soul Summit, Analog Soul, and ... | TBA - Brooklyn | 2019-05-15T00:00 | 4 |


## Write a Function to Retrieve the URL for the Next Page

In [7]:
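from urllib.parse import urljoin

def next_page(url):
    """Return the absolute URL of the page behind the "Next" link."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # hrefs may be relative, so resolve each one against the current URL
    next_links = [urljoin(url, x.get('href')) for x in soup.find_all('a') if "Next " in x.text]
    # Raises IndexError if the page has no "Next" link
    return next_links[0]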

In [8]:
next_page("https://www.residentadvisor.net/events/us/newyork")

'https://www.residentadvisor.net/events/us/newyork/week/2019-05-21'
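Links in a page's HTML are often written relative to the site root, which is why `next_page` passes each `href` through `urljoin`. The following is a small sketch of that step in isolation. The `relative_href` value is an assumed example of what the "Next" link might contain, not something copied from the site.

```python
from urllib.parse import urljoin

page_url = "https://www.residentadvisor.net/events/us/newyork"
# Assumed example of a root-relative href on the "Next" link
relative_href = "/events/us/newyork/week/2019-05-21"

# urljoin resolves the href against the URL of the page it was found on
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)
# https://www.residentadvisor.net/events/us/newyork/week/2019-05-21
```

If the `href` is already an absolute URL, `urljoin` returns it unchanged, so the same call is safe for both kinds of link.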

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
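The instruction does not say which direction the tie-break should run. As a minimal sketch, assuming the most-attended events should come first and tied events should be listed earliest first, `sort_values` accepts one `ascending` flag per column. The three events below are made-up placeholders, and the ISO-style date strings sort chronologically even as plain text.

```python
import pandas as pd

# Made-up placeholder events; only the column names come from the lab
events = pd.DataFrame({
    "Event_Name": ["A", "B", "C"],
    "Event_Date": ["2019-05-15T00:00", "2019-05-14T00:00", "2019-05-14T00:00"],
    "Number_of_Attendees": [6, 6, 2],
})

# Attendees descending; within a tie, dates ascending (earliest first)
ranked = events.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)

print(ranked)
```

Passing a single `ascending=False` instead would sort both columns in descending order, so tied events would appear latest first.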

In [32]:
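def scrape_lots_of_events(starting_url, num_events_to_scrape=1000):
    """Follow "Next" links from starting_url until enough events are collected."""
    url = starting_url
    df = pd.DataFrame()
    while len(df) < num_events_to_scrape:
        new_df = scrape_events(url)
        url = next_page(url)
        df = pd.concat([df, new_df], ignore_index=True)
    # Most-attended events first; ties are ordered by date, latest first
    df.sort_values(by=['Number_of_Attendees', 'Event_Date'], inplace=True, ascending=False)
    # drop=True discards the old index instead of keeping it as an extra 'index' column
    df.reset_index(drop=True, inplace=True)
    return df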
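The loop above keeps requesting pages until it has enough events, so it relies on every page having a "Next" link and at least one event. One possible way to guard against that is sketched below with the page-fetching functions passed in as arguments. `collect_pages`, `max_pages`, and `delay_seconds` are hypothetical names introduced for illustration, and the sketch assumes `find_next` returns `None` when there is no next page, which differs from `next_page` as written in this lab.

```python
import time

def collect_pages(start_url, scrape_page, find_next,
                  max_items=1000, max_pages=200, delay_seconds=1.0):
    """Collect items page by page until a limit is hit or the pages run out.

    scrape_page(url) returns a list of items for that page.
    find_next(url) returns the next URL, or None when there is none.
    """
    items = []
    url = start_url
    pages_seen = 0
    while url is not None and len(items) < max_items and pages_seen < max_pages:
        items.extend(scrape_page(url))
        pages_seen += 1
        url = find_next(url)
        if url is not None:
            # Pause between requests to avoid hammering the server
            time.sleep(delay_seconds)
    # The last page may overshoot the target, so trim to max_items
    return items[:max_items]
```

The page cap stops the loop even when pages keep returning no events, and trimming the result means the caller never gets more than `max_items` back.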

In [None]:
df = scrape_lots_of_events("https://www.residentadvisor.net/events/us/newyork", num_events_to_scrape=1000)

In [None]:
len(df)

In [None]:
df.head()

In [None]:
df.tail()
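The objectives also mention cleaning and storing the scraped data. Because the DataFrame is built from transposed lists, its columns hold generic Python objects rather than dates and numbers. The following is a minimal sketch of converting them, assuming the date strings keep the `2019-05-14T00:00` form shown in the earlier output. `clean_events` is a hypothetical helper, and the two sample rows are copied from that output.

```python
import pandas as pd

def clean_events(df):
    """Return a copy with Event_Date as datetimes and attendee counts as numbers."""
    cleaned = df.copy()
    cleaned["Event_Date"] = pd.to_datetime(cleaned["Event_Date"])
    cleaned["Number_of_Attendees"] = pd.to_numeric(cleaned["Number_of_Attendees"])
    return cleaned

# Two rows from the first results page, stored as generic objects
sample = pd.DataFrame(
    [
        ["Cipher at Bossa Nova Civic Club", "Bossa Nova Civic Club", "2019-05-14T00:00", 2],
        ["Satori and the Band From Space at House Of Yes", "House Of Yes", "2019-05-15T00:00", 7],
    ],
    columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"],
    dtype=object,
)

cleaned = clean_events(sample)
print(cleaned.dtypes)
```

The cleaned frame can then be stored with `cleaned.to_csv("events.csv", index=False)`, where the file name is only a placeholder.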

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!