# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [111]:
#Load the https://www.residentadvisor.net/events page in your browser.
from bs4 import BeautifulSoup
import requests
import pandas as pd

html_page = requests.get('https://www.residentadvisor.net/events') #Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser')
#soup.prettify


## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [2]:
#Open the inspect element feature in your browser
listed = soup.find('div', id="event-listing")
dates = [li.find('a').text for li in listed.findAll('li')]

In [3]:
listed.findAll('h1')[0].a['title']


'Event details of Lnsc presents Deeperluv with Cassy (Kwench, AUS)'

In [4]:
listed.findAll('h1')[3].span.text



"at It'll Do, Dallas/Fort Worth"

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.
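
The title and venue strings explored earlier both carry a fixed prefix ("Event details of " in the link title, "at " in the venue span). As a minimal sketch, assuming those prefixes always appear in exactly this form, a small helper can strip them by name rather than by character count. `strip_prefix` is a hypothetical helper written for this illustration; it is not part of BeautifulSoup.

```python
def strip_prefix(text, prefix):
    """Return text without a leading prefix; leave it unchanged if the prefix is absent."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# Strings copied from the two cell outputs shown earlier in this lab
title = strip_prefix(
    "Event details of Lnsc presents Deeperluv with Cassy (Kwench, AUS)",
    "Event details of ",
)
venue = strip_prefix("at It'll Do, Dallas/Fort Worth", "at ")

print(title)  # Lnsc presents Deeperluv with Cassy (Kwench, AUS)
print(venue)  # It'll Do, Dallas/Fort Worth
```

Compared with a hard-coded slice, this leaves a string untouched when the expected prefix is missing instead of silently cutting off real characters.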

In [108]:
def scrape_events(events_page_url):
    """Return a DataFrame of the events listed on one events page."""
    html_page = requests.get(events_page_url)  # Make a GET request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser')
    listed = soup.find('div', id="event-listing")
    titles, venues, dates, attend = [], [], [], []
    current = None  # Date heading that applies to the events that follow it
    for item in listed.find_all('li'):
        date = item.find('p', class_="eventDate date")
        event = item.find('h1', class_="event-title")
        att = item.find('p', class_="attending")
        if date:
            # A date row: remember it for the events listed underneath
            current = date.text[:16]  # e.g. "Fri, 26 Jul 2019"
        elif event:
            titles.append(event.a['title'][17:])  # Drop the "Event details of " prefix
            venues.append(event.span.text[3:])    # Drop the "at " prefix
            dates.append(current)
            # Record the attendee count if the listing shows one, otherwise 0
            attend.append(int(att.span.text) if att else 0)

    return pd.DataFrame({"Title": titles, "Venue": venues,
                         "Date": dates, "Attandies": attend})

## Write a Function to Retrieve the URL for the Next Page

In [109]:
def next_page(url):
    """Return the URL of the next events page, or None if this page lists no events."""
    html_page = requests.get(url)  # Make a GET request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # A "noEvents" block means the page lists no events, so stop here
    if soup.find('div', class_="but noEvents"):
        return None
    nav = soup.find('div', class_="page-items content sub clearfix")
    # The second <li> in the pager is assumed to hold the link to the next page
    href = nav.find_all('li')[1].a['href']
    return "https://www.residentadvisor.net" + href
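
The function above builds the next URL by concatenating the site root with the link's `href`. As a hedged alternative sketch, the standard library's `urllib.parse.urljoin` resolves relative links against a base and leaves absolute links unchanged. `absolute_url` and the example path are hypothetical and serve only as an illustration; the real pager links on the site may look different.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.residentadvisor.net"

def absolute_url(href, base=BASE_URL):
    """Resolve a link taken from the page against the site root."""
    return urljoin(base, href)

# Hypothetical relative link, for illustration only
print(absolute_url("/events/us/texas/week/2019-08-02"))

# An href that is already absolute is returned as-is
print(absolute_url("https://www.residentadvisor.net/events"))
```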

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
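
One way to stop once enough events have been collected is to keep the pagination loop separate from the scraping itself. The sketch below is only an illustration under that assumption: `collect_events`, `scrape_page`, and `find_next` are hypothetical names, and the fake pages stand in for real requests so the loop can be run without touching the website.

```python
def collect_events(first_url, scrape_page, find_next, limit=1000):
    """Follow next-page links until `limit` events are collected or pages run out."""
    events = []
    url = first_url
    while url and len(events) < limit:
        events.extend(scrape_page(url))  # Events found on the current page
        url = find_next(url)             # None ends the loop
    return events[:limit]

# Fake site: each "URL" maps to (events on that page, next URL or None)
fake_pages = {
    "page-1": (["a", "b"], "page-2"),
    "page-2": (["c", "d"], "page-3"),
    "page-3": (["e"], None),
}

events = collect_events(
    "page-1",
    scrape_page=lambda u: fake_pages[u][0],
    find_next=lambda u: fake_pages[u][1],
    limit=3,
)
print(events)  # ['a', 'b', 'c']
```

With real pages, `scrape_page` and `find_next` would be functions that take a URL and return a list of events and the next URL, respectively.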

In [116]:
# Scrape the first page
url = "https://www.residentadvisor.net/events/"
frames = [scrape_events(url)]

# Find the URL of the next page
url = next_page(url)

# Keep scraping while there is a next page; next_page returns None
# when a page lists no events, which ends the loop
while url:
    frames.append(scrape_events(url))
    url = next_page(url)

# Combine the per-page frames into one DataFrame
df_all = pd.concat(frames, ignore_index=True)

df_all = df_all.sort_values(by=['Attandies', 'Date']).reset_index(drop=True)
df_all = df_all.iloc[:1000]
df_all.head()

|   | Title | Venue | Date | Attandies |
|---|-------|-------|------|-----------|
| 0 | Rüfüs Du Sol Solace Tour 2019 | The Moody Theater, Austin | Fri, 26 Jul 2019 | 0 |
| 1 | Cosmophoria Feat. Culttastic and Night Drive | Bauhaus, Houston | Fri, 26 Jul 2019 | 0 |
| 2 | Alteon | Limelight, San Antonio | Fri, 30 Aug 2019 | 0 |
| 3 | MartyParty (of Pantyraid) | The Parish, Austin | Fri, 30 Aug 2019 | 0 |
| 4 | House of Tones presents: Colette and DJ Heather | The Venue ATX, Austin | Sat, 10 Aug 2019 | 0 |
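
The Date column holds strings such as "Fri, 26 Jul 2019", so sorting on it orders tied rows alphabetically by weekday name rather than chronologically (note "Fri, 30 Aug" ahead of "Sat, 10 Aug" above). As a minimal sketch, assuming every date follows that exact format and English day and month names, parsing the strings first gives a chronological tie-break. The rows are copied from the output above; `sort_key` is a hypothetical helper.

```python
from datetime import datetime

# (title, date string, attendees) copied from three rows of the output above
rows = [
    ("Alteon", "Fri, 30 Aug 2019", 0),
    ("House of Tones presents: Colette and DJ Heather", "Sat, 10 Aug 2019", 0),
    ("Rüfüs Du Sol Solace Tour 2019", "Fri, 26 Jul 2019", 0),
]

def sort_key(row):
    """Order by attendee count first, then by the parsed event date."""
    _, date_text, attendees = row
    return attendees, datetime.strptime(date_text, "%a, %d %b %Y")

ordered = sorted(rows, key=sort_key)
for title, date_text, attendees in ordered:
    print(date_text, attendees, title)
```

In pandas, the same idea would be to convert the column with `pd.to_datetime(df_all['Date'], format='%a, %d %b %Y')` before calling `sort_values`.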


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!