# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site. In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by opening the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

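The function should return a Pandas DataFrame with one row per event and columns for the event name, venue, event date, and number of attendees. The solution below names these columns `Event_Name`, `Venue`, `Date`, and `Number_Attending`.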

In [186]:
from bs4 import BeautifulSoup
import requests

In [196]:
import pandas as pd

def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # The listings sit inside this container, one <article> per event
    container = soup.find('div', class_='fl col4')
    events_container = container.find_all('article')

    names = []
    venues = []
    dates = []
    attendees = []

    for event in events_container:
        # The <h1> holds the event name (in a link) and the venue (in a span)
        names.append(event.find('h1').find('a').text)
        venues.append(event.find('h1').find('span').text)

        # Keep only the YYYY-MM-DD part of the datetime attribute
        dates.append(event.find('time').attrs['datetime'][0:10])

        # Events without a <p class="attending"> element count as 0 attendees
        attending = event.find('p', class_='attending')
        if attending is None:
            attendees.append(0)
        else:
            attendees.append(int(attending.find('span').text))

    # Every list gets exactly one entry per event, so the columns line up
    df = pd.DataFrame({'Event_Name': names,
                       'Venue': venues,
                       'Date': dates,
                       'Number_Attending': attendees})
    return df


scrape_events('https://www.residentadvisor.net/events')


| | Event_Name | Venue | Date | Number_Attending |
|---|---|---|---|---|
| 0 | Paradox with Rapha, Davide Del Vecchio, Alison... | at Egg London | 2019-08-13 | 97 |
| 1 | Defected Croatia 2019 | at TBA - London | 2019-08-13 | 40 |
| 2 | Umami Salame Afterhours | at Union Club, Vauxhall | 2019-08-13 | 37 |
| 3 | Sneak Every Tuesday at Xoyo | at XOYO | 2019-08-13 | 31 |
| 4 | Black Milk (Live) | at The Jazz Cafe | 2019-08-14 | 1 |
| 5 | Final Cut: Midweek Party - R&B, Charts, House ... | at Egg London | 2019-08-14 | 6 |
| 6 | Glory To Sound: Sophie | at Somerset House | 2019-08-14 | 4 |
| 7 | Diggers Dozen | at Brilliant Corners | 2019-08-14 | 3 |
| 8 | Mahiki Wednesday | at Mahiki | 2019-08-14 | 0 |
| 9 | Wednesdays // £2.50 Drinks | at Piccadilly Institute | 2019-08-14 | 0 |


In [193]:
len(scrape_events('https://www.residentadvisor.net/events'))

210
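
One way to check the parsing logic without requesting the live site is to run the same selectors against a small static snippet. The sketch below is only an illustration: the HTML string is invented, shaped by the selectors used in `scrape_events`, and the helper name `parse_events` is hypothetical. The real page markup may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration. It only mirrors the
# selectors used in scrape_events; the live page may look different.
SAMPLE_HTML = """
<div class="fl col4">
  <article>
    <h1><a href="/events/1">Sample Night</a> <span>at Sample Venue</span></h1>
    <time datetime="2019-08-13T00:00">Tue</time>
    <p class="attending"><span>12</span> Attending</p>
  </article>
  <article>
    <h1><a href="/events/2">Quiet Night</a> <span>at Other Venue</span></h1>
    <time datetime="2019-08-14T00:00">Wed</time>
  </article>
</div>
"""


def parse_events(html):
    """Return one dict per <article>, using the same fields as scrape_events."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for article in soup.find_all('article'):
        heading = article.find('h1')
        attending = article.find('p', class_='attending')
        rows.append({
            'Event_Name': heading.find('a').text,
            'Venue': heading.find('span').text,
            # Keep only the YYYY-MM-DD part of the datetime attribute
            'Date': article.find('time').attrs['datetime'][:10],
            # A missing attendance element is treated as 0 attendees
            'Number_Attending': (int(attending.find('span').text)
                                 if attending is not None else 0),
        })
    return rows


events = parse_events(SAMPLE_HTML)
for event in events:
    print(event)
```

Keeping the parsing separate from the HTTP request makes it possible to test the selectors repeatedly without sending extra requests to the site.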

## Write a Function to Retrieve the URL for the Next Page

In [188]:
from datetime import datetime
from datetime import timedelta

In [189]:
def next_page(url):
    # Assumes the URL ends in a YYYY-MM-DD date, as the weekly listing URLs do
    date = url[-10:]
    date_format = datetime.strptime(date, '%Y-%m-%d').date()
    # Each listing page covers one week, so step forward 7 days
    next_date = date_format + timedelta(days=7)
    next_date_format = str(next_date)
    next_page_url = url[0:-10] + next_date_format
    return next_page_url

In [190]:
next_page('https://www.residentadvisor.net/events/uk/london/week/2019-08-02')

'https://www.residentadvisor.net/events/uk/london/week/2019-08-09'
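
`next_page` relies on the URL ending in a `YYYY-MM-DD` date, which a URL such as https://www.residentadvisor.net/events does not. A possible variant, sketched here under the hypothetical name `next_week_url`, checks for the date suffix first and raises a clear error when it is missing. It uses only the standard library, and the 7-day step is taken from the function above.

```python
import re
from datetime import datetime, timedelta

# Matches a YYYY-MM-DD date at the very end of the URL
DATE_SUFFIX = re.compile(r'(\d{4}-\d{2}-\d{2})$')


def next_week_url(url, days=7):
    """Return `url` with its trailing date moved forward by `days` days."""
    match = DATE_SUFFIX.search(url)
    if match is None:
        raise ValueError(f'URL does not end in a YYYY-MM-DD date: {url}')
    current = datetime.strptime(match.group(1), '%Y-%m-%d').date()
    following = current + timedelta(days=days)
    # Keep everything before the date and append the new date in ISO format
    return url[:match.start()] + following.isoformat()


# Date arithmetic handles month boundaries: August 30 plus 7 days is September 6
print(next_week_url('https://www.residentadvisor.net/events/uk/london/week/2019-08-30'))
```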

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [217]:
def next_events(number, url):
    """Scrape weekly pages starting at `url` until at least `number` events are collected."""
    current_url = url
    count = 0
    frames = []

    while count < number:
        # Scrape each page once and reuse the result for the running count
        df = scrape_events(current_url)
        frames.append(df)
        count += len(df)
        current_url = next_page(current_url)

    if not frames:
        return pd.DataFrame(columns=['Event_Name', 'Venue', 'Date', 'Number_Attending'])

    # Combine all pages and renumber the rows from 0
    grand_df = pd.concat(frames, ignore_index=True)
    return grand_df


In [218]:
next_events(1000, 'https://www.residentadvisor.net/events/uk/london/week/2019-08-19')

| | Event_Name | Venue | Date | Number_Attending |
|---|---|---|---|---|
| 0 | Desire Magic Mondays | at Union Club, Vauxhall | 2019-08-19 | 3 |
| 1 | The Cause X London DJ School (DJ Lessons) | at The Cause | 2019-08-19 | 1 |
| 2 | Rum & Wings - All Day Party | at Brixton Jamm | 2019-08-19 | 1 |
| 3 | Play London Every Monday at Xoyo | at XOYO | 2019-08-19 | 1 |
| 4 | Clear Soul Forces | at Cargo | 2019-08-19 | 0 |
| 5 | Point.// Sunday Daytime Outdoor | at Grow Tottenham | 2019-08-19 | 0 |
| 6 | Bashment Meets Hiphop & Afrobeats Day Party | at The Hoxton Pony | 2019-08-19 | 0 |
| 7 | Kandy Mondays | at The Roxy | 2019-08-19 | 0 |
| 8 | Paradox presents Tuesday Madness | at Egg London | 2019-08-20 | 89 |
| 9 | Umami Salame presents Saytek (Live) | at Union Club, Vauxhall | 2019-08-20 | 36 |
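
The output above is in scrape order rather than sorted. A minimal sketch of the sorting step the task asks for follows, using a few rows copied from the two outputs shown in this lab. Two choices here are assumptions, since the task does not specify a direction: attendance is sorted from highest to lowest, and ties are broken by the earlier date. The variable names `sample` and `ranked` are illustrative only.

```python
import pandas as pd

# A handful of rows copied from the outputs shown in this lab
sample = pd.DataFrame({
    'Event_Name': ['Desire Magic Mondays',
                   'Play London Every Monday at Xoyo',
                   'Black Milk (Live)',
                   'Paradox presents Tuesday Madness',
                   'Mahiki Wednesday'],
    'Date': ['2019-08-19', '2019-08-19', '2019-08-14',
             '2019-08-20', '2019-08-14'],
    'Number_Attending': [3, 1, 1, 89, 0],
})

# Assumption: most-attended first, then earliest date for ties.
# ISO-formatted date strings sort chronologically, so no conversion is needed.
ranked = sample.sort_values(by=['Number_Attending', 'Date'],
                            ascending=[False, True]).reset_index(drop=True)
print(ranked)
```

The same `sort_values` call could be applied to the DataFrame returned by `next_events`.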


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!