# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site. In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [7]:
# Load the https://www.residentadvisor.net/events page in your browser.
test_url = 'https://www.residentadvisor.net/events'

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to view the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

### Write a function to parse the page into soup

In [6]:
def get_soup(url):
    # Download the page and parse its HTML into a BeautifulSoup object.
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return soup

In [9]:
test_soup = get_soup(test_url)
test_soup

<!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1">
<script>
            if (typeof dataLayer === 'undefined') {
                dataLayer = []
            }
        </script>
<script async="" src="https://www.googletagmanager.com/gtag/js?id=AW-940832047"></script>
<script>
            window.dataLayer = window.dataLayer || [];

            function gtag() { dataLayer.push(arguments); }
            gtag('js', new Date());

            gtag('config', 'AW-940832047');
        </script>
<script>
            (function (w, d, s, l, i) {
                w[l] = w[l] || []; w[l].push({
                    'gtm.start':
                        new Date().getTime(), event: 'gtm.js'
                }); var f = d.getElementsByTagName(s)[0],
                    j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : ''; j.async = true; j.src =
                        'https://www.googletagmanager.com/gtm.js?id=' + i + dl; f.parentNode.insertBefore(j, f);
            })(window, document, 'scr

In [23]:
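def get_events(soup):
    # Each event listing is an <article> tag with the class "event-item".
    # find_all is the current name for the older findAll alias.
    events = soup.find_all('article', class_="event-item")
    return events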

In [28]:
test_events = get_events(test_soup)
test_event = test_events[0]
test_event.prettify()

'<article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event">\n <span style="display:none;">\n  <time datetime="2020-09-05T00:00" itemprop="startDate">\n   2020-09-05T00:00\n  </time>\n </span>\n <a href="/events/1396872">\n  <img height="76" src="/images/events/flyer/2020/9/ca-0905-1396872-list.jpg" width="152"/>\n </a>\n <div class="bbox">\n  <h1 class="event-title" itemprop="summary">\n   <a href="/events/1396872" itemprop="url" title="Event details of Electric Island Labour Day 1">\n    Electric Island Labour Day 1\n   </a>\n   <span>\n    at\n    <a href="/club.aspx?id=182456">\n     Ontario Place West Island\n    </a>\n   </span>\n  </h1>\n  <div class="grey event-lineup">\n   T.B.A\n  </div>\n  <p class="attending">\n   <span>\n    7\n   </span>\n   Attending\n  </p>\n </div>\n</article>'
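
The prettified event above carries its own start date in a `<time>` tag whose `datetime` attribute is in ISO 8601 form. As a small sketch, assuming that attribute value has already been pulled out as a string (for instance with `event.find('time')['datetime']`), the standard library can turn it into a date object. The variable names are illustrative and not part of the lab's solution.

```python
from datetime import datetime

# Attribute value copied from the <time datetime="..."> tag shown above.
raw_start = '2020-09-05T00:00'

# fromisoformat parses ISO 8601 strings such as 'YYYY-MM-DDTHH:MM'.
start = datetime.fromisoformat(raw_start)

print(start.date())                    # 2020-09-05
print(start.strftime('%a, %d %b %Y'))  # Sat, 05 Sep 2020
```

Reading the date per event in this way would avoid relying on a single page-level date when a page lists events from several days.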

In [29]:
def get_title(event):
    # The event name is the text of the first link inside the title heading.
    title = event.find('h1', class_="event-title").find('a').text
    return title

In [31]:
test_title = get_title(test_event)
test_title

'Electric Island Labour Day 1'

In [38]:
def get_venue(event):
    # The venue link sits inside the <span> that follows the event name.
    venue = event.find('h1', class_="event-title").find('span').find('a').text
    return venue

In [39]:
test_venue = get_venue(test_event)
test_venue

'Ontario Place West Island'

In [50]:
def get_attendees(event):
    # The attendee count is the number inside the "attending" paragraph.
    attendees = int(event.find('p', class_='attending').find('span').text)
    return attendees

In [52]:
test_attendees = get_attendees(test_event)
test_attendees

7
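
The conversion above works when the count is present and numeric. As a hedged sketch of the error handling this lab's objectives mention, the hypothetical helper `parse_attendees` below falls back to a default when the text is missing or not a number. It assumes, without confirmation from the site, that some listings may omit the count or format it with thousands separators.

```python
def parse_attendees(text, default=0):
    """Convert scraped attendee text such as ' 7 ' or '1,234' to an int.

    Returns `default` when the text is missing or not numeric.
    """
    if text is None:
        return default
    # Drop surrounding whitespace and any thousands separators.
    cleaned = text.strip().replace(',', '')
    try:
        return int(cleaned)
    except ValueError:
        return default


print(parse_attendees('\n    7\n   '))  # 7
print(parse_attendees(None))            # 0
```

A caller could pass the `.text` of the span when it exists and `None` otherwise, so a single malformed listing does not stop the whole scrape.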

In [48]:
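def get_date(soup):
    # Take the first date heading on the page, then strip the '/' separator
    # and any surrounding whitespace.
    date = soup.find('p', class_="eventDate").find('a').find('span').text.strip('/').strip()
    return date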

In [49]:
test_date = get_date(test_soup)
test_date

'Sat, 05 Sep 2020'

## Write a Function to Retrieve the URL for the Next Page

In [21]:
def next_page(soup):
    # The "next" pager link holds a site-relative path, not a full URL.
    next_page_url = soup.find('li', id="liNext2").find('a').attrs['href']
    return next_page_url

In [22]:
test_next_url = next_page(test_soup)
test_next_url

'/events/ca/toronto/week/2020-09-06'
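
The value returned by `next_page` is a site-relative path, so it has to be combined with the site root before it can be requested. Plain string concatenation works as long as the path always starts with `/`. As a slightly more defensive sketch, the standard library's `urllib.parse.urljoin` resolves relative paths and leaves absolute links alone. The variable names here are illustrative.

```python
from urllib.parse import urljoin

# Site root and the relative path returned by next_page above.
base_url = 'https://www.residentadvisor.net'
next_path = '/events/ca/toronto/week/2020-09-06'

# urljoin resolves the relative path against the base URL.
next_url = urljoin(base_url, next_path)
print(next_url)
# https://www.residentadvisor.net/events/ca/toronto/week/2020-09-06
```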

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
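
The sort order described above can be sketched in plain Python on a list of event dictionaries. This is an illustration under two assumptions: that "sorted by the number of attendees" means most attended first, and that dates arrive as strings like `'Sat, 05 Sep 2020'`. Parsing the date matters because sorting the raw strings would order them alphabetically by weekday name. The `sort_key` helper is hypothetical.

```python
from datetime import datetime

# Sample rows in the shape produced by this lab's scraping functions.
events_list = [
    {'Event_Name': 'Electric Island Labour Day 1',
     'Event_Date': 'Sat, 05 Sep 2020', 'Number_of_Attendees': 7},
    {'Event_Name': 'Electric Island Labour Day 2',
     'Event_Date': 'Sun, 06 Sep 2020', 'Number_of_Attendees': 5},
    {'Event_Name': 'Disco & House Music Tribute Party with Crystal...',
     'Event_Date': 'Sun, 06 Sep 2020', 'Number_of_Attendees': 8},
]


def sort_key(event):
    # Negate the count for descending order; parse the date so that
    # ties are broken chronologically rather than alphabetically.
    event_date = datetime.strptime(event['Event_Date'], '%a, %d %b %Y')
    return (-event['Number_of_Attendees'], event_date)


for event in sorted(events_list, key=sort_key):
    print(event['Number_of_Attendees'], event['Event_Date'], event['Event_Name'])
```

With a DataFrame, `df.sort_values` with a list of columns would play the same role, provided `Event_Date` is first converted to datetimes.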

In [58]:
events_list = []
base_url = 'https://www.residentadvisor.net'
extension = '/events'
url = base_url + extension
print(url)
# 3 events keeps this test run short; raise the limit to 1000 for the full scrape.
while len(events_list) < 3:
    soup = get_soup(url)
    events = get_events(soup)
    # Queue up the next page before processing the current one.
    next_extension = next_page(soup)
    url = base_url + next_extension
    if len(events) > 0:
        date = get_date(soup)
        for event in events:
            event_dict = {
                'Event_Name': get_title(event),
                'Venue': get_venue(event),
                'Event_Date': date,
                'Number_of_Attendees': get_attendees(event)
            }
            events_list.append(event_dict)

df = pd.DataFrame(events_list)
df

https://www.residentadvisor.net/events


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Electric Island Labour Day 1 | Ontario Place West Island | Sat, 05 Sep 2020 | 7 |
| 1 | Electric Island Labour Day 2 | Ontario Place West Island | Sun, 06 Sep 2020 | 5 |
| 2 | Disco & House Music Tribute Party with Crystal... | Revival | Sun, 06 Sep 2020 | 8 |
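
The loop above runs until enough events are collected, so a long run of empty pages would keep it going and a missing next link would crash it. As a hedged sketch of one way to guard against that, the function below takes a hypothetical `fetch_page` callable (standing in for the `get_soup`, `get_events`, and `next_page` steps) and stops on a page cap, a missing next link, or a fetch error. Every name in it is an illustrative assumption, and the "site" is a plain dictionary so the example runs without network access.

```python
import time


def crawl(start_url, fetch_page, max_events, max_pages=50, delay=0.0):
    """Collect up to max_events items by following next-page links.

    fetch_page(url) must return (items, next_url); next_url is None
    on the last page.
    """
    collected = []
    url = start_url
    pages_seen = 0
    while url is not None and len(collected) < max_events and pages_seen < max_pages:
        try:
            items, next_url = fetch_page(url)
        except Exception as err:
            # Stop cleanly and keep whatever was gathered so far.
            print(f'Stopping at {url}: {err}')
            break
        collected.extend(items)
        pages_seen += 1
        url = next_url
        time.sleep(delay)  # pause between requests to be polite to the server
    return collected[:max_events]


# Stand-in for real pages: URL -> (items, next URL). Purely illustrative data.
fake_site = {
    '/page1': (['a', 'b'], '/page2'),
    '/page2': ([], '/page3'),
    '/page3': (['c'], None),
}

print(crawl('/page1', fake_site.__getitem__, max_events=1000))  # ['a', 'b', 'c']
```

In a real run, `fetch_page` would wrap the scraping functions from this lab, and a non-zero `delay` would space out the requests.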


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!