# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.


## Open the Inspect Element Feature

Next, open the inspect element feature in your web browser to view the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser
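
Once you can see the markup, each element you inspect maps onto a `find` or `find_all` call. The sketch below runs against a small, hypothetical HTML snippet that is only modeled on the selectors used later in this lab (`ul#items`, `p.eventDate.date`, `h1.event-title`, `div.bbox`, `p.attending`). The live page's markup may differ, so treat it as an illustration rather than a copy of the site.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the selectors used in this lab.
sample_html = """
<ul id="items">
  <li><p class="eventDate date"><span>Fri, 01 May 2020 /</span></p></li>
  <li>
    <div class="bbox">
      <h1 class="event-title">
        <a title="Event details of Guy J">Guy J</a>
        <span>at Spybar</span>
      </h1>
    </div>
    <p class="attending"><span>9</span> Attending</p>
  </li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")
container = soup.find("ul", {"id": "items"})  # the list that holds everything
items = container.find_all("li")              # date headers and event rows

# The first <li> is a date header, the second is an event row.
date_text = items[0].find("p", {"class": "date"}).span.text
title_attr = items[1].find("h1", {"class": "event-title"}).find("a")["title"]
venue_text = items[1].find("div", {"class": "bbox"}).find("span").text
attending_text = items[1].find("p", {"class": "attending"}).find("span").text

print(date_text, title_attr, venue_text, attending_text, sep=" | ")
```

Note that everything comes back as raw text; cleaning and type conversion happen in a later step.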

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [221]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from datetime import datetime


def scrape_events(events_page_url):
    """Return a DataFrame of the events listed on one events page."""
    html_page = requests.get(events_page_url)  # Make a GET request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser')  # Parse the HTML
    container = soup.find('ul', {'id': 'items'})  # List holding date headers and events

    new_events = []
    date = None  # Most recent date header seen; applies to the events that follow it
    for item in container.find_all('li'):
        if item.find('p', {'class': 'eventDate date'}):
            # Date header row: remember the date for the event rows below it
            date_text = item.find('p', {'class': 'date'}).span.text.strip('/').strip()
            date = datetime.strptime(date_text, '%a, %d %b %Y')
            continue
        if item.find('h1', {'class': 'event-title'}):
            title = item.find('h1', {'class': 'event-title'}).find('a')['title'].replace("Event details of", "").strip()
            venue = item.find('div', {'class': 'bbox'}).find('span').text
            attending = int(item.find('p', {'class': 'attending'}).find('span').text)
            # Append only for event rows, so other <li> items are never recorded
            new_events.append({'Date': date, 'Event': title, 'Venue': venue, 'Attending': attending})

    # Naming the columns keeps them in place even when a page lists no events
    return pd.DataFrame(new_events, columns=['Date', 'Event', 'Venue', 'Attending'])
    


url = "https://www.residentadvisor.net/events"
scrape_events(url)


| | Date | Event | Venue | Attending |
|---|---|---|---|---|
| 0 | 2020-05-01 | Guy J | at Spybar | 9 |
| 1 | 2020-05-02 | Attlas: Lavender God Tour | at Spybar | 6 |
| 2 | 2020-05-02 | Elrow Rowsattacks Goes to Chicago | at Radius | 14 |
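
The cleaning steps inside `scrape_events` can be tried on their own, without a network request. This is a minimal sketch: the raw strings are assumed examples of what the page returns (a date followed by a trailing slash, and a link title prefixed with "Event details of"), not values captured from the site.

```python
from datetime import datetime

# Assumed raw values, shaped like the text the scraper pulls from the page.
raw_date = "Fri, 01 May 2020 /"
raw_title = "Event details of Guy J"

# Drop the trailing slash and surrounding whitespace, then parse the date.
cleaned_date = raw_date.strip("/").strip()
event_date = datetime.strptime(cleaned_date, "%a, %d %b %Y")

# Remove the fixed prefix and any leftover whitespace from the title.
title = raw_title.replace("Event details of", "").strip()

print(event_date.date().isoformat(), title)
```

`%a` and `%b` match abbreviated day and month names in the current locale, so this format string assumes English names.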


## Write a Function to Retrieve the URL for the Next Page

In [222]:
def next_page(url):
    html_page = requests.get(url) # Make a get request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser') # Make some soup
    button = soup.find('a', {'ga-event-action': 'Next '})  # "Next" pagination link (note the trailing space)
    next_page_url = 'https://www.residentadvisor.net' + button['href']
    return next_page_url


# Example usage:
# url = "https://www.residentadvisor.net/events"
# next_page(url)

# For reference, the "Next" link in the page source looks like this:
#<a ga-on="click" ga-event-category="event-listings" ga-event-action="Next " href="/events/us/chicago/week/2020-05-07">Next </a>
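
The last page of results may have no "Next" link, in which case indexing the missing tag raises an error. One possible variation, sketched below, returns `None` instead and builds the absolute URL with `urllib.parse.urljoin`. The helper name `extract_next_url` is hypothetical, and it takes the HTML as a string so that it can be tried offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_next_url(page_html, page_url):
    """Return the absolute URL of the "Next" link, or None if there is none."""
    soup = BeautifulSoup(page_html, "html.parser")
    link = soup.find("a", {"ga-event-action": "Next "})  # trailing space matters
    if link is None or not link.get("href"):
        return None
    # urljoin resolves the relative href against the page it came from.
    return urljoin(page_url, link["href"])


base = "https://www.residentadvisor.net/events"
html_with_next = (
    '<a ga-on="click" ga-event-category="event-listings" '
    'ga-event-action="Next " href="/events/us/chicago/week/2020-05-07">Next </a>'
)

print(extract_next_url(html_with_next, base))
print(extract_next_url("<p>No more pages</p>", base))
```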

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
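
The sort with a tie-breaker can be checked on a tiny DataFrame first. The rows below are made-up placeholder data; only the column names `Date` and `Attending` follow the scraper's output.

```python
import pandas as pd

# Made-up rows: B and C tie on attendance, so the earlier date should win.
events = pd.DataFrame({
    "Event": ["A", "B", "C", "D"],
    "Date": pd.to_datetime(["2020-05-02", "2020-05-03", "2020-05-01", "2020-05-02"]),
    "Attending": [6, 14, 14, 9],
})

# Most attendees first; among equal counts, earliest date first.
ranked = events.sort_values(
    by=["Attending", "Date"], ascending=[False, True]
).reset_index(drop=True)

print(ranked)
```

Passing one boolean per column in `ascending` is what lets the two sort keys run in opposite directions.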

In [None]:
# Your code here

url = "https://www.residentadvisor.net/events"
print(url)
df = scrape_events(url)
url = next_page(url)
print(url)

while len(df) < 1000:
    try:
        new_df = scrape_events(url)
        url = next_page(url)
        print(url, len(df))
        df = pd.concat([df, new_df])
    except Exception as err:
        # Stop on the first failure instead of retrying the same URL forever
        print("Stopping:", err)
        break

df = df.reset_index(drop=True)
df = df.sort_values(by=["Attending", "Date"], ascending=[False, True])
df

    

https://www.residentadvisor.net/events
https://www.residentadvisor.net/events/us/chicago/week/2020-05-07
https://www.residentadvisor.net/events/us/chicago/week/2020-05-14 3
https://www.residentadvisor.net/events/us/chicago/week/2020-05-21 5
https://www.residentadvisor.net/events/us/chicago/week/2020-05-28 7
https://www.residentadvisor.net/events/us/chicago/week/2020-06-04 9
https://www.residentadvisor.net/events/us/chicago/week/2020-06-11 11
https://www.residentadvisor.net/events/us/chicago/week/2020-06-18 13
https://www.residentadvisor.net/events/us/chicago/week/2020-06-25 15
https://www.residentadvisor.net/events/us/chicago/week/2020-07-02 15
https://www.residentadvisor.net/events/us/chicago/week/2020-07-09 15
https://www.residentadvisor.net/events/us/chicago/week/2020-07-16 20
https://www.residentadvisor.net/events/us/chicago/week/2020-07-23 20
https://www.residentadvisor.net/events/us/chicago/week/2020-07-30 21
https://www.residentadvisor.net/events/us/chicago/week/2020-08-06 21
ht
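
When few events are listed, a loop that waits for 1000 rows can keep following "Next" links indefinitely. One way to bound it is to cap the number of pages visited. The sketch below is a possible structure, not the lab's reference solution: `collect_events`, `scrape_page`, `get_next_url` and `max_pages` are hypothetical names, and the two callables are passed in so the loop can be exercised against a fake site without any network access.

```python
def collect_events(start_url, scrape_page, get_next_url, target=1000, max_pages=50):
    """Follow "next" links until `target` events are collected, `max_pages`
    pages have been visited, a page fails, or there is no next page."""
    events = []
    url = start_url
    pages_visited = 0
    while url is not None and len(events) < target and pages_visited < max_pages:
        try:
            events.extend(scrape_page(url))
            url = get_next_url(url)
        except Exception as err:
            print(f"Stopping at {url}: {err}")
            break
        pages_visited += 1
    return events[:target]


# A fake three-page site: each entry maps a URL to (events, next URL).
fake_site = {
    "page1": (["a", "b"], "page2"),
    "page2": ([], "page3"),
    "page3": (["c"], None),
}


def fake_scrape(url):
    return fake_site[url][0]


def fake_next(url):
    return fake_site[url][1]


result = collect_events("page1", fake_scrape, fake_next, target=10)
print(result)
```

In the real pipeline, `scrape_page` would wrap `scrape_events` and `get_next_url` would wrap `next_page`.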

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!

Note: because of COVID-19 cancellations, almost no events were listed when this lab was run, so the scraper never came close to collecting 1000 events this way.