# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser
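Each tag you find in the inspector becomes a `find()` call: a tag name plus an `id` or `class_`. The sketch below parses a trimmed, hand-shortened copy of the event markup that this notebook prints for `entries[1]`. The HTML string is an illustrative excerpt, not a full page, and it uses Python's built-in `html.parser` backend so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one event <li> (attributes and lineup shortened for illustration)
html = """
<li><article class="event-item">
  <h1 class="event-title"><a href="/events/1253873">Unter x Discwoman: Mayhem</a>
    <span>at <a href="/club.aspx?id=136811">Sugar Hill Disco</a></span></h1>
  <div class="grey event-lineup">Akua, AZF, Jasmine Infiniti</div>
  <p class="attending"><span>558</span> Attending</p>
</article></li>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", class_="event-title")

print(title.a.text)                                        # event name: first link in the title
print(title.span.a.text)                                   # venue: link inside the "at ..." span
print(soup.find("div", class_="grey event-lineup").text)   # lineup
print(soup.find("p", class_="attending").span.text)        # attendee count, still a string
```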

In [49]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import numpy as np

In [50]:
page = requests.get('https://www.residentadvisor.net/events/us/newyork/day/2019-05-18')
soup = BeautifulSoup(page.content, 'lxml')
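The URL fetched above encodes the country, city, and day of the listings. Here is a minimal sketch of building such URLs from a `datetime.date`. It assumes the `/events/<country>/<city>/day/<YYYY-MM-DD>` layout seen here also holds for other cities; `day_url` and `BASE` are hypothetical names.

```python
from datetime import date

BASE = "https://www.residentadvisor.net/events"

def day_url(country, city, day):
    """Build a single-day events URL, assuming the /<country>/<city>/day/<YYYY-MM-DD> layout."""
    # %m and %d are zero-padded, matching the 2019-05-18 form used in this lab
    return f"{BASE}/{country}/{city}/day/{day:%Y-%m-%d}"

print(day_url("us", "newyork", date(2019, 5, 18)))
```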

In [68]:
events = soup.find("div", id="event-listing")
entries = events.find_all('li')
print(entries[0].text.strip().replace("/", ""))
print(entries[1].find('h1', class_="event-title").text)
print(entries[1].find('div', class_="grey event-lineup").text)
print(entries[1].find('p', class_="attending").text)
print(entries[1].find('a'))
# href is an attribute of the <a> tag, not a tag of its own, so find('href') would return None;
# read it from the tag instead
print(entries[1].find('a')['href'])

entries[1]

Sat, 18 May 2019 
Unter x Discwoman: Mayhem at Sugar Hill Disco
Akua, AZF, Jasmine Infiniti, Juana, Rachel Noon, Russell E.L. Butler, Serena Jara, Umfang, Volvox, VTSS
558 Attending
<a href="/events/1253873#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a>
/events/1253873#tickets


<li class=""><article class="event-item pick clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1253873#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-05-18T00:00" itemprop="startDate">2019-05-18T00:00</time></span><a href="/events/1253873"><img height="76" src="/images/events/flyer/2019/5/us-0518-1253873-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1253873" itemprop="url" title="Event details of Unter x Discwoman: Mayhem">Unter x Discwoman: Mayhem</a> <span>at <a href="/club.aspx?id=136811">Sugar Hill Disco</a></span></h1><div class="grey event-lineup">Akua, AZF, Jasmine Infiniti, Juana, Rachel Noon, Russell E.L. Butler, Serena Jara, Umfang, Volvox, VTSS</div><p class="attending"><span>558</span> Attending</p><div 
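The links inside each entry are relative (for example `/events/1253873#tickets`), so following one to an event's detail page, as the objectives describe, first requires an absolute URL. Below is a sketch using the standard library's `urllib.parse`. The resulting `event_url` is what you would then pass to `requests.get()`; the variable names are illustrative.

```python
from urllib.parse import urljoin, urldefrag

page_url = "https://www.residentadvisor.net/events/us/newyork/day/2019-05-18"
href = "/events/1253873#tickets"   # href of the first link in entries[1]

# Resolve the relative link against the page it came from, then drop the #fragment
event_url = urldefrag(urljoin(page_url, href)).url
print(event_url)  # https://www.residentadvisor.net/events/1253873
```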

In [69]:
q = [entry.text for entry in entries]
q

['Sat, 18 May 2019 /',
 '2019-05-18T00:00Unter x Discwoman: Mayhem at Sugar Hill DiscoAkua, AZF, Jasmine Infiniti, Juana, Rachel Noon, Russell E.L. Butler, Serena Jara, Umfang, Volvox, VTSS558 AttendingRA PickDiscwoman and Unter come together for the crossover event of the season, with a wild lineup of high-intensity techno and hardcore sounds. ',
 '2019-05-18T00:00The Bunker with DJ Nobu, Dr. Rubinstein, Wata Igarashi, Derek Plaslaiko at Market HotelDJ Nobu, Dr Rubinstein, Wata Igarashi, Derek Plaslaiko341 AttendingRA PickDJ Nobu will beam outer-space frequencies straight into your third eye. ',
 "2019-05-18T00:00Wrecked with CEM at BASEMENTWrecked, CEM, Ron Like Hell, Ryan Smith207 AttendingRA PickOne of the city's sweatiest gay parties, headlined by a resident of the now famous Herrensauna in Berlin. ",
 '2019-05-18T00:00KUNÁ Sunset Rooftop with Elfenberg, El Mundo, Lemurian, Ameme at Williamsburg HotelElfenberg, El Mundo, Lemurian, Ameme, Oktave, Mashrik149 Attending',
 "2019-05-18

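Calling `.text` on a whole `<li>` concatenates every string inside it with no separator. That is why the list above contains runs like `BASEMENTWrecked`. BeautifulSoup's `get_text()` accepts a separator and a `strip` flag. The sketch below compares the two approaches on a simplified, hand-written version of one entry; the HTML string is illustrative.

```python
from bs4 import BeautifulSoup

html = ('<li><h1 class="event-title"><a>Wrecked with CEM</a> <span>at <a>BASEMENT</a></span></h1>'
        '<p class="attending"><span>207</span> Attending</p></li>')
entry = BeautifulSoup(html, "html.parser").li

print(entry.text)                          # strings glued together: "...BASEMENT207 Attending"
print(entry.get_text(" | ", strip=True))   # each string stripped and joined with " | "
```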
In [105]:
def scrape_events(events_page_url):
    #Your code here
    page = requests.get(events_page_url)
    soup = BeautifulSoup(page.content, 'lxml')

    events = soup.find("div", id="event-listing")
    entries = events.find_all('li')

    # On a single-day page the first <li> is the date header, e.g. "Sat, 18 May 2019 /"
    event_date = entries[0].text.strip().replace("/", "").strip()

    Event_Name, Event_Date, Event_Lineup, Event_Venue, Number_of_Attendees = [], [], [], [], []
    for entry in entries:
        title = entry.find('h1', class_="event-title")
        lineup = entry.find('div', class_="grey event-lineup")
        venue = title.find('span') if title else None
        attending = entry.find('p', class_="attending")

        # find() returns None when a tag is missing, so fall back to a placeholder
        Event_Name.append(title.text if title else "None_")
        Event_Date.append(event_date)
        Event_Lineup.append(lineup.text if lineup else "None_")
        # The venue sits in a <span> inside the title: <span>at <a>Sugar Hill Disco</a></span>
        Event_Venue.append(venue.text.replace("at ", "", 1).strip() if venue else "None_")
        Number_of_Attendees.append(attending.text if attending else "None_")

    events_dic = {
        "Event_Name": Event_Name,
        "Event_Date": Event_Date,
        "Event_Lineup": Event_Lineup,
        "Number_of_Attendees": Number_of_Attendees,
    }
    # Event_Venue is collected but not added here; the revised scrape_events below adds a Venue column
    df = pd.DataFrame(events_dic)
    # Drop the first row, which came from the date-header <li>
    df = df.iloc[1:]
    return df

In [106]:

scrape_events('https://www.residentadvisor.net/events/us/newyork/day/2019-05-18')




|    | Event_Name | Event_Date | Event_Lineup | Number_of_Attendees |
|---:|---|---|---|---|
| 1 | Unter x Discwoman: Mayhem at Sugar Hill Disco | Sat, 18 May 2019 | Akua, AZF, Jasmine Infiniti, Juana, Rachel Noo... | 558 Attending |
| 2 | The Bunker with DJ Nobu, Dr. Rubinstein, Wata ... | Sat, 18 May 2019 | DJ Nobu, Dr Rubinstein, Wata Igarashi, Derek P... | 341 Attending |
| 3 | Wrecked with CEM at BASEMENT | Sat, 18 May 2019 | Wrecked, CEM, Ron Like Hell, Ryan Smith | 207 Attending |
| 4 | KUNÁ Sunset Rooftop with Elfenberg, El Mundo, ... | Sat, 18 May 2019 | Elfenberg, El Mundo, Lemurian, Ameme, Oktave, ... | 149 Attending |
| 5 | Golden Record x The Selectors with Masomenos (... | Sat, 18 May 2019 | Masomenos, Eddie Fowlkes, Steve O'Sullivan, Ta... | 79 Attending |
| 6 | Solar IV at H0L0 | Sat, 18 May 2019 | Simo Cell, DJ Wawa, Seth Magoon, Fever Dream, ... | 76 Attending |
| 7 | ReSolute w Amorf, SIT, & Vincent Lemieux at TB... | Sat, 18 May 2019 | Amorf, SIT, Vincent Lemieux, O.Bee | 72 Attending |
| 8 | Kindergarten with Vagabundo Club Social, Tom N... | Sat, 18 May 2019 | ZONE ONE, └ Vagabundo Club Social, └└ Tom Nobl... | 71 Attending |
| 9 | Neon Indian (DJ Set at Elsewhere Rooftop), Bal... | Sat, 18 May 2019 | ROOFTOP, └ Neon Indian, └└ Baltra, └└└ Mira Fa... | 58 Attending |
| 10 | Leaving New York to Change the World at TBA - ... | Sat, 18 May 2019 | Material Sound System, Very J, Robert Cary, Ne... | 51 Attending |
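`Number_of_Attendees` is still text such as `'558 Attending'`, so sorting it orders strings character by character rather than by count. Here is a minimal sketch of a cleaning helper that extracts the leading number; `parse_attendees` is a hypothetical name.

```python
import re

def parse_attendees(text):
    """Turn strings like '558 Attending' into 558; return None if there is no leading number."""
    match = re.match(r"\s*(\d+)", text or "")
    return int(match.group(1)) if match else None

labels = ["558 Attending", "79 Attending", "149 Attending"]
print(sorted(labels, reverse=True))                        # string order puts '79 ...' first
print(sorted(labels, key=parse_attendees, reverse=True))   # numeric order: 558, 149, 79
```

Applied to the frame above, this could be used as `df['Number_of_Attendees'].map(parse_attendees)`.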


## Write a Function to Retrieve the URL for the Next Page

In [137]:
from datetime import datetime, timedelta

def next_page(url):
    #Your code here
    # Parse the YYYY-MM-DD date at the end of the URL and add one day. Working with real
    # dates rather than string slicing handles month ends (2019-05-31 -> 2019-06-01),
    # keeps zero-padding, and avoids touching other digits in the URL such as the year.
    base, date_str = url.rsplit('/', 1)
    next_day = datetime.strptime(date_str, '%Y-%m-%d') + timedelta(days=1)
    return f"{base}/{next_day:%Y-%m-%d}"

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [138]:
#Your code here
url1 = 'https://www.residentadvisor.net/events/us/newyork/day/2019-05-18'
# Don't name this `np`: that would shadow the numpy import used later for np.nan
next_url = next_page(url1)
next_url

'https://www.residentadvisor.net/events/us/newyork/day/2019-05-19'

In [139]:
scrape_events(next_url)

|    | Event_Name | Event_Date | Event_Lineup | Number_of_Attendees |
|---:|---|---|---|---|
| 1 | Joseph Capriati Invites: Joseph Capriati, Len ... | Sun, 19 May 2019 | Joseph Capriati, Len Faki, François K | 572 Attending |
| 2 | Mister Sunday: Optimo All Day at Nowadays | Sun, 19 May 2019 | Optimo, JG Wilkes and JD Twitch | 207 Attending |
| 3 | ebb + flow Boat Party with Dave Seaman & Justi... | Sun, 19 May 2019 | Dave Seaman, Justin Marchacos, Gavin Stephenso... | 149 Attending |
| 4 | Zero presents... Dystōpia (Debut) with Be Sven... | Sun, 19 May 2019 | Be Svendsen, Nico Stojan, Magit Cacoon, Davi, ... | 145 Attending |
| 5 | Tiki Disco (Elsewhere Rooftop Opening Weekend)... | Sun, 19 May 2019 | \*Plenty of tickets available at the door for o... | 46 Attending |
| 6 | Weird Science no.11 with Gavin Rayna Russom, A... | Sun, 19 May 2019 | Gavin Rayna Russom, An-i, Amourette | 26 Attending |
| 7 | CAS NYC Celebrates 25 Years of Nas 'Illmatic' ... | Sun, 19 May 2019 | Love Injection, Paul Raffaele, Barbie Bertisch | 13 Attending |
| 8 | Fundrager For 83: Earth Eater Trippjones GIA M... | Sun, 19 May 2019 | Eartheater, Trippjones, GIA, Moma ready, Noble... | 3 Attending |
| 9 | Revival Rooftop feat. Miss Jennifer, Logan at ... | Sun, 19 May 2019 | Miss Jennifer, Logan | 1 Attending |
| 10 | Poprally presents Stranger Vibes: A Night of A... | Sun, 19 May 2019 | Digital artist Sam Rolfes, performance artist ... | 10 Attending |


In [134]:
next_page(next_url)

'https://www.residentadvisor.net/events/us/newyork/day/2019-05-20'
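The heading above asks for about 1000 events, sorted by attendance with ties broken by date. Below is a sketch of that loop. It assumes a `scrape` function that returns a DataFrame with a numeric `Number_of_Attendees` column and dates such as `'Sat, 18 May 2019'` (as the revised `scrape_events` further down produces), plus a `next_url` function like `next_page`. `collect_events` and `sort_events` are hypothetical names, and `max_pages` is an arbitrary safety cap.

```python
import pandas as pd

def collect_events(first_url, scrape, next_url, target=1000, max_pages=365):
    """Scrape consecutive pages until at least `target` events are gathered.

    scrape(url) -> DataFrame of one page's events; next_url(url) -> URL of the following page.
    """
    frames, total, url = [], 0, first_url
    for _ in range(max_pages):          # hard cap in case pages keep coming back short
        page = scrape(url)
        frames.append(page)
        total += len(page)
        if total >= target:
            break
        url = next_url(url)
    # Stack the pages and keep at most `target` rows
    return pd.concat(frames, ignore_index=True).head(target)


def sort_events(events):
    """Sort by attendance (highest first), breaking ties by the earlier event date."""
    # Dates look like 'Sat, 18 May 2019', sometimes with a trailing ' /'; parse them
    # so the tie-break compares real dates rather than strings
    dates = pd.to_datetime(events["Event_Date"].str.rstrip(" /"), format="%a, %d %b %Y")
    return (events.assign(_date=dates)
                  .sort_values(["Number_of_Attendees", "_date"], ascending=[False, True])
                  .drop(columns="_date"))
```

For this lab, something like `sort_events(collect_events(url1, scrape_events, next_page))` would scrape forward one day at a time from 2019-05-18.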

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!

In [202]:
def scrape_events(events_page_url):
    #Your code here
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    event_listings = soup.find('div', id="event-listing")
    entries = event_listings.find_all('li')
    rows = []
    cur_date = None
    for entry in entries:
        # Date-header <li> items set the date for the events listed after them
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if event:
            # Split on the last " at " so event names that contain " at " stay intact
            event_name, _, venue = event.text.rpartition(' at ')
            if not event_name:  # no " at " in the title, so no venue is given
                event_name, venue = venue, None
            attending = entry.find('p', class_="attending")
            match = re.match(r"\d+", attending.text) if attending else None
            n_attendees = int(match.group()) if match else np.nan
            rows.append([event_name.strip(), venue.strip() if venue else None, cur_date, n_attendees])
        elif date:
            # e.g. "Fri, 17 May 2019 /" -> "Fri, 17 May 2019"
            cur_date = date.text.strip().rstrip('/').strip()
    df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
    return df

In [203]:
scrape_events('https://www.residentadvisor.net/events')

|    | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---:|---|---|---|---:|
| 0 | Innervisions New York | Knockdown Center | Fri, 17 May 2019 | 706.0 |
| 1 | Headless Horseman Live / Vatican Shadow / Volv... | BASEMENT | Fri, 17 May 2019 | 255.0 |
| 2 | Friday: PLO Man All Night | Nowadays | Fri, 17 May 2019 | 95.0 |
| 3 | ReSolute w Move D & Flabbergast | TBA - New York | Fri, 17 May 2019 | 63.0 |
| 4 | Material 17: Nico Laa | Hart bar | Fri, 17 May 2019 | 24.0 |
| 5 | Full Moon with Sébastien Léger | House Of Yes | Fri, 17 May 2019 | 19.0 |
| 6 | Pete Rock | Analog Bkny | Fri, 17 May 2019 | 12.0 |
| 7 | Museum of Love (DJ set), L&l&l Record Club Plu... | Good Room | Fri, 17 May 2019 | 11.0 |
| 8 | Just Blaze, Matt FX and Trillnatured | Elsewhere | Fri, 17 May 2019 | NaN |
| 9 | Rendezvous with Sons of Immigrants, Arvi, CGC | TBA Brooklyn | Fri, 17 May 2019 | NaN |


In [222]:
page2 = requests.get('https://washingtondc.craigslist.org/search/apa?availabilityMode=0&sale_date=all+dates')
apt_soup = BeautifulSoup(page2.content, 'lxml')

In [276]:
apts = apt_soup.findAll('li', class_='result-row')

In [277]:
apts

[<li class="result-row" data-pid="6890978800" data-repost-of="4906001946">
 <a class="result-image gallery" data-ids="1:00O0O_cKRizzCDdNt,1:00303_7LK5tEOyWaM,1:00l0l_5BmYFP0UHMi,1:00000_59uNVolto0a,1:00w0w_7feq5srRqyw,1:00C0C_cyT93Wfo3Cq,1:00q0q_hwYYTXMBvxq,1:00K0K_9Fom4cfmfcQ,1:00h0h_j7nuuaqAPw0" href="https://washingtondc.craigslist.org/doc/apa/d/washington-open-5-18-outstanding-1/6890978800.html">
 <span class="result-price">$1995</span>
 </a>
 <p class="result-info">
 <span class="icon icon-star" role="button">
 <span class="screen-reader-text">favorite this post</span>
 </span>
 <time class="result-date" datetime="2019-05-17 11:13" title="Fri 17 May 11:13:46 AM">May 17</time>
 <a class="result-title hdrlnk" data-id="6890978800" href="https://washingtondc.craigslist.org/doc/apa/d/washington-open-5-18-outstanding-1/6890978800.html">Open 5/18! Outstanding 1 Bedroom with Hardwood Floors &amp; in-unit W/D</a>
 <span class="result-meta">
 <span class="result-price">$1995</span>
 <span c

In [249]:
apts[0]['data-pid']

'6890978800'
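`find_all()` returns a list-like `ResultSet`, so attributes have to be read from one tag at a time. Square-bracket access raises `KeyError` when an attribute is absent, while `.get()` returns `None`. This matters for optional attributes such as the `data-repost-of` seen on the first listing, which other rows may lack. Here is a small sketch on a hand-written tag:

```python
from bs4 import BeautifulSoup

row = BeautifulSoup('<li class="result-row" data-pid="6890978800"></li>', "html.parser").li

print(row["data-pid"])            # square brackets: KeyError if the attribute were missing
print(row.get("data-repost-of"))  # .get() returns None for a missing attribute
print(row["class"])               # class is multi-valued, so it comes back as a list
```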

In [320]:
description = apt_soup.find('a', class_="result-title hdrlnk")

In [321]:
description.text

'Open 5/18! Outstanding 1 Bedroom with Hardwood Floors & in-unit W/D'

In [243]:
price = apt_soup.find('span', class_="result-price")

In [244]:
price.text

'$1995'
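Prices come back as strings like `'$1995'`. Below is a sketch of converting them to integers so they can be sorted or averaged. `parse_price` is a hypothetical helper, and the comma handling is an assumption, included in case some listings write thousands with a separator.

```python
def parse_price(text):
    """Convert a price string such as '$1995' to an int; return None if it isn't a plain number."""
    digits = text.replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None

print(parse_price("$1995"))  # 1995
```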

In [344]:
# Build one list per column, then combine them into a DataFrame
listings = {}

# loop over the apartment entries
apts = apt_soup.find_all('li', class_='result-row')

PID = [apt['data-pid'] for apt in apts]
Price = [apt.find('span', class_="result-price").text for apt in apts]

# find() returns None for a missing tag, so check before reading .text
Desc = []
for apt in apts:
    title = apt.find('a', class_="result-title hdrlnk")
    Desc.append(title.text if title else "None_")

# Keep only an "Nbr" bedroom count; listings without one get a placeholder
Bedrooms = []
for apt in apts:
    housing = apt.find('span', class_="housing")
    match = re.search(r"\d+br", housing.text) if housing else None
    Bedrooms.append(match.group() if match else "None_")

listings['PID'] = PID
listings['Description'] = Desc
listings['Bedrooms'] = Bedrooms
listings['Price'] = Price

df = pd.DataFrame(listings)


In [345]:
df

|    | PID | Description | Bedrooms | Price |
|---:|---|---|---|---|
| 0 | 6890978800 | Open 5/18! Outstanding 1 Bedroom with Hardwood... | 1br | $1995 |
| 1 | 6889360718 | Modern Kitchen With Stainless Steel Appliances... | None_ | $1440 |
| 2 | 6890978544 | Close to 66 - 2x2 townhome - Pets welcome | 2br | $1390 |
| 3 | 6890978317 | SUNNY & SPACIOUS 1BR Walking Distance to the M... | None_ | $1695 |
| 4 | 6890978222 | HUGE Loft Style 1BR Next to Metro + MORE! Mins... | 1br | $1999 |
| 5 | 6885778257 | 1 Bed/1 Bath Condo in Adams Morgan | 1br | $2000 |
| 6 | 6888757112 | Management On-site, Only a Half Mile from PG P... | 1br | $1175 |
| 7 | 6887094119 | Large bedroom with a private bathroom for rent | 2br | $1250 |
| 8 | 6890977862 | 1 Bedroom Available for Immediate Move! Rent $... | 1br | $1730 |
| 9 | 6872559743 | Fully Furnished Studio in Fantastic Neighborhood | 1br | $2095 |
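Not every listing's housing text includes a bedroom count. Some show only a square-footage figure. The sketch below extracts bedrooms and square footage separately, assuming Craigslist's housing text uses `Nbr` and `NNNft2` notation. The input `'1br - 750ft2'` is an illustrative string, not scraped data, and `parse_housing` is a hypothetical name.

```python
import re

def parse_housing(text):
    """Split housing text such as '1br - 750ft2' into (bedrooms, square_feet).

    Assumes the Nbr / NNNft2 notation; either part may be missing, giving None.
    """
    beds = re.search(r"(\d+)br", text)
    sqft = re.search(r"(\d+)ft2", text)
    return (int(beds.group(1)) if beds else None,
            int(sqft.group(1)) if sqft else None)

print(parse_housing("1br - 750ft2"))  # (1, 750)
print(parse_housing("595ft2 -"))      # (None, 595)
```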
