# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll scrape https://www.residentadvisor.net. Start by opening the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.

import re
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

# Download the events page and parse its HTML
site = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(site.content, 'html.parser')

# The event listings live inside this container div
container = soup.find('div', class_='strip slide small')
first_date = container.find_all('p')[0].text

# class_='' selects the <li> elements that have no class attribute
each = container.find_all('li', class_='')
date = [x.find_all('p')[0].text for x in each]
name = [x.find('h1').text.strip('\n') for x in each]
venue = [x.find_all('p')[-1].text.strip('\n') for x in each]
attendees = [x.find('span').text for x in each]

# Relative link to the next page of listings
next_site = soup.find('li', class_='clearfix pt1').find_all('a')[1]['href']
next_site

'/events/us/colorado/week/2020-09-10'

## Open the Inspect Element Feature

Next, open your browser's inspect element feature to preview the HTML underlying the page.

In [80]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [2]:
def scrape_events(events_page_url):
    """Return a DataFrame of the events listed on a single events page."""
    site = requests.get(events_page_url)
    soup = BeautifulSoup(site.content, 'html.parser')
    # Each event is an <li> with no class inside the listings container
    each = soup.find('div', class_='strip slide small').find_all('li', class_='')
    date = [x.find_all('p')[0].text for x in each]
    name = [x.find('h1').text.strip('\n') for x in each]
    venue = [x.find_all('p')[-1].text.strip('\n') for x in each]
    attendees = [x.find('span').text for x in each]
    # One row per event, with columns in the order the lab asks for
    df = pd.DataFrame({"Event_Name": name, "Venue": venue,
                       "Event_Date": date, "Number_of_Attendees": attendees})
    return df

len(scrape_events('https://www.residentadvisor.net/events'))

3
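The function above assumes that every list item contains a date, a title, a venue and an attendee count, so a single incomplete listing would raise an exception. As a rough sketch of how the parsing step could tolerate missing pieces, the example below works on a small hard-coded HTML string rather than the live site. The markup, the event and venue names, and the `parse_events` helper are all hypothetical; they only imitate the structure the selectors above rely on, and the real page may differ.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

# Hypothetical markup imitating the structure the lab's selectors rely on;
# the live page's HTML may differ.
SAMPLE_HTML = """
<div class="strip slide small">
  <ul>
    <li>
      <p>2020-09-03</p>
      <h1>Example Event One</h1>
      <p><span>12</span> attending</p>
      <p>Example Club</p>
    </li>
    <li>
      <p>2020-09-04</p>
      <h1>Example Event Two</h1>
      <p>Example Hall</p>
    </li>
  </ul>
</div>
"""


def parse_events(html):
    """Parse event rows from HTML, tolerating missing pieces."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="strip slide small")
    if container is None:
        # No listings container: return an empty, correctly shaped frame
        return pd.DataFrame(columns=COLUMNS)
    rows = []
    for item in container.find_all("li"):
        paragraphs = item.find_all("p")
        title = item.find("h1")
        if not paragraphs or title is None:
            continue  # skip list items that do not look like events
        span = item.find("span")
        # Keep only the digits of the count; a missing span counts as 0
        digits = re.sub(r"\D", "", span.get_text()) if span else ""
        rows.append({
            "Event_Name": title.get_text(strip=True),
            "Venue": paragraphs[-1].get_text(strip=True),
            "Event_Date": paragraphs[0].get_text(strip=True),
            "Number_of_Attendees": int(digits) if digits else 0,
        })
    return pd.DataFrame(rows, columns=COLUMNS)


events = parse_events(SAMPLE_HTML)
print(events)
```

Treating a missing attendee count as 0 is a design choice made for this sketch; recording it as a missing value instead may suit some analyses better.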

## Write a Function to Retrieve the URL for the Next Page

In [6]:
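def next_page(url):
    # Parse the page at `url` itself rather than relying on the global `soup`
    site = requests.get(url)
    soup = BeautifulSoup(site.content, 'html.parser')
    url_ext = soup.find('a', attrs={'ga-event-action': "Next "}).attrs['href']
    next_page_url = "https://www.residentadvisor.net" + url_ext
    return next_page_url

next_page('https://www.residentadvisor.net/events')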


'https://www.residentadvisor.net/events/us/colorado/week/2020-09-10'
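The last page of listings presumably has no "Next" link, in which case the lookup above would fail with an `AttributeError`. One possible way to handle this, sketched below on a hard-coded snippet, is to return `None` when the link is missing and to build the absolute URL with `urllib.parse.urljoin` instead of string concatenation. The sample markup and the `extract_next_url` helper are hypothetical; only the `ga-event-action` attribute and the example path come from the lab code and output above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical navigation markup; only the attribute name/value and the
# href are taken from the lab, the surrounding structure is assumed.
SAMPLE_NAV_HTML = (
    '<ul><li><a ga-event-action="Next " '
    'href="/events/us/colorado/week/2020-09-10">Next</a></li></ul>'
)


def extract_next_url(html, current_url):
    """Return the absolute URL of the next page, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", attrs={"ga-event-action": "Next "})
    if link is None or not link.get("href"):
        return None  # no next-page link: the caller can stop paginating
    # urljoin resolves the relative href against the page it was found on
    return urljoin(current_url, link["href"])


next_url = extract_next_url(SAMPLE_NAV_HTML, "https://www.residentadvisor.net/events")
print(next_url)
print(extract_next_url("<p>No navigation here</p>", "https://www.residentadvisor.net/events"))
```

A scraping loop can then stop cleanly when the helper returns `None` rather than crashing partway through.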

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
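def scrape_1000_events(start_url='https://www.residentadvisor.net/events', n_events=1000):
    frames, total, url = [], 0, start_url
    # Walk page by page until enough events have been collected
    while total < n_events:
        df = scrape_events(url)
        frames.append(df)
        total += len(df)
        url = next_page(url)
        time.sleep(0.2)  # brief pause between requests
    events = pd.concat(frames, ignore_index=True).head(n_events)
    events['Number_of_Attendees'] = pd.to_numeric(events['Number_of_Attendees'], errors='coerce')
    # Temporary sort key; date strings pandas cannot parse become NaT and sort last
    events['_date'] = pd.to_datetime(events['Event_Date'], errors='coerce')
    return events.sort_values(['Number_of_Attendees', '_date'], ascending=[False, True]).drop(columns='_date')

scrape_1000_events()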
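The sorting requirement can be tried out without touching the network. The sketch below uses a small made-up DataFrame with the lab's column names and assumes two things that the lab does not state: that "sorted by the number of attendees" means largest first, and that the date strings are in a form `pd.to_datetime` can parse (the real site's date text may need an explicit `format=` argument). The `sort_events` helper and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up rows; scraped values arrive as text, so the counts are strings here
sample = pd.DataFrame({
    "Event_Name": ["Event A", "Event B", "Event C"],
    "Venue": ["Venue 1", "Venue 2", "Venue 3"],
    "Event_Date": ["2020-09-05", "2020-09-03", "2020-09-04"],
    "Number_of_Attendees": ["12", "40", "12"],
})


def sort_events(df):
    """Sort by attendee count (descending), breaking ties by earlier date."""
    out = df.copy()
    # Convert text to numbers so that "9" does not sort after "40"
    out["Number_of_Attendees"] = (
        pd.to_numeric(out["Number_of_Attendees"], errors="coerce")
        .fillna(0)
        .astype(int)
    )
    # Convert text to real dates so ties are broken chronologically
    out["Event_Date"] = pd.to_datetime(out["Event_Date"], errors="coerce")
    return out.sort_values(
        ["Number_of_Attendees", "Event_Date"], ascending=[False, True]
    ).reset_index(drop=True)


sorted_events = sort_events(sample)
print(sorted_events)
```

In this sample, "Event B" comes first with 40 attendees, and the two events tied at 12 are ordered by date, so "Event C" precedes "Event A".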

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!