# Web Scraping with Beautiful Soup: Resident Advisor

## Task:

Resident Advisor is an events listing website for electronic music.

Go to www.residentadvisor.net/events. This is the URL we'll start from in this lab; for question 1, just use this URL. In the next two questions, you'll add a country and region in the format http://www.residentadvisor.net/country/region/, e.g. us/losangeles/. Be sure to explore the pages both in the browser and in the raw HTML. You'll need both to really understand what's going on.

1. Which venues are hosting events this week?
2. Make a function which returns the events this week given region and country (this will take two arguments)
    - return the event name, link, and list of artists
    - the function returns a list of the form ['event name', 'www.linkaddress.com', ['artist1','artist2','artist3']]
3. Create a function which returns the number of users attending each event
4. Putting data into a dataframe
5. Comparing data across dataframes


### Question 1 - Which venues are hosting events this week?

In [1]:
#Enter starting_date and ending_date for queries
starting_date = '2019-03-22'
ending_date = '2019-03-29'

In [3]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import pprint as pp

In [4]:
# creates a list of 'YYYY-MM-DD' strings from start_date to end_date, inclusive
def find_dates(start_date, end_date):
    return [day.strftime("%Y-%m-%d") for day in pd.date_range(start_date, end_date)]

# builds the RA listings URL for one day, e.g.
# https://www.residentadvisor.net/events/us/newyork/day/2019-03-22
def build_url(date, country, region):
    host = 'https://www.residentadvisor.net/events/'
    return f"{host}{country}/{region}/day/{date}"

In [5]:
import re

# function to find venues with an event in a given country & region for a given date
def find_nightly_venues(date, country, region):
    # fetch the day's listings page and parse it with Beautiful Soup
    r = requests.get(build_url(date, country, region))
    r.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(r.content, 'html.parser')

    # isolate the HTML for each listed event
    events = soup.find_all('article', class_='event-item')

    # container for venues with events on a given night
    nightly_venues = []

    # the venue name follows the word "at" inside one of the event's <span>s
    for event in events:
        for item in event.find_all('span'):
            # split once on the whole word "at", so text such as "Saturday"
            # or a venue like "Bat Bar" is not cut in the middle
            parts = re.split(r'\bat\b', item.get_text(), maxsplit=1)
            if len(parts) == 2:
                nightly_venues.append(parts[1].strip())

    return nightly_venues

def find_venues(start_date, end_date, country, region):
    """Loop over every date from start_date to end_date, call
    find_nightly_venues for each, and return a {date: [venue, ...]} dict."""
    final_dict = {}
    for date in find_dates(start_date, end_date):
        final_dict[date] = find_nightly_venues(date, country, region)
    return final_dict

pp.pprint(find_venues(starting_date, ending_date, 'us', 'newyork'))

{'2019-03-22': ['Avant Gardner',
                'TBA - Brooklyn',
                'Knockdown Center',
                'Analog Bkny',
                'Elsewhere',
                '99 Scott Ave',
                'Nowadays',
                'Good Room',
                'TBA - Brooklyn',
                'TBA - New York',
                'Rose Gold',
                'Schimanski',
                'Eris',
                'Elsewhere',
                'TBA Brooklyn',
                'Hart bar',
                'Polygon BK',
                'House Of Yes',
                'Bossa Nova Civic Club',
                'Jupiter Disco',
                'Le Bain',
                'Nublu',
                'Ceremony',
                'Black Flamingo',
                'Le Poisson Rouge',
                'The Deep End',
                'The Spirit Room, Buffalo/Rochester',
                'Ms. Yoo',
                'TBA - New York',
                'Ignight'],
 '2019-03-23': ['Avant Gardner',
              

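Because `find_venues` returns one entry per event, busy venues appear several times on the same night (Elsewhere twice on 2019-03-22, for example), and placeholders such as 'TBA - Brooklyn' or 'TBA Brooklyn' show up as if they were venues. A minimal sketch for collapsing the dict into one weekly answer follows; it assumes you want to drop every name starting with "TBA", and `summarize_venues` is a hypothetical helper, not part of the lab.

```python
from collections import Counter


def summarize_venues(venues_by_date, skip_tba=True):
    """Count events per venue across the whole week.

    venues_by_date is the {date: [venue, ...]} dict that find_venues returns.
    Names starting with "TBA" (unannounced locations) are dropped when
    skip_tba is True. Returns (venue, number_of_events) pairs, busiest first.
    """
    counts = Counter(
        venue
        for venues in venues_by_date.values()
        for venue in venues
        if not (skip_tba and venue.startswith("TBA"))
    )
    return counts.most_common()


# Toy input in the same shape as the find_venues output above
week = {
    "2019-03-22": ["Elsewhere", "Nowadays", "TBA - Brooklyn", "Elsewhere"],
    "2019-03-23": ["Nowadays", "Elsewhere"],
}
print(summarize_venues(week))  # [('Elsewhere', 3), ('Nowadays', 2)]
```

On real data this would be called as `summarize_venues(find_venues(starting_date, ending_date, 'us', 'newyork'))`; the number of distinct venues is then simply the length of the returned list.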
### Question 2 - Write a function that returns the events this week, given region and country.

In [6]:
# function to find event titles in a given country & region for a given date
def find_nightly_events(date, country, region):
    # fetch the day's listings page and parse it with Beautiful Soup
    r = requests.get(build_url(date, country, region))
    r.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(r.content, 'html.parser')

    # isolate the HTML for each listed event
    events = soup.find_all('article', class_='event-item')

    nightly_events = []

    # each event's <h1> holds its title, e.g. "Foals DJ Set at Halcyon"
    for event in events:
        for item in event.find_all('h1'):
            nightly_events.append(item.get_text())

    return nightly_events

def find_events(start_date, end_date, country, region):
    """Loop over every date from start_date to end_date, call
    find_nightly_events for each, and return a {date: [title, ...]} dict."""
    final_dict = {}
    for date in find_dates(start_date, end_date):
        final_dict[date] = find_nightly_events(date, country, region)
    return final_dict

pp.pprint(find_events(starting_date, ending_date, 'us', 'sanfrancisco'))

{'2019-03-22': ['We Are Monsters: Marie Davidson Live at The Stud',
                'DTE with Erika, Deraout, House of Velocity & Tariq at F8 1192 '
                'Folsom',
                'Huerco S. & DJ Python - Brouhaha x Public Works at Public '
                'Works',
                'Mark Farina 50th Birthday at The Great Northern',
                'Maayan Nidam Hosted by Chvck at Phonobar',
                'A Special Evening with Herodot (Sunrise, Romania) at TBA - '
                'San Francisco',
                'Riva Starr (Dirtybird / Defected / Snatch!) at TBA - San '
                'Francisco',
                'Fleetmac Wood presents Rumours Rave - Santa Cruz at The '
                'Catalyst',
                'PROX.IM.I.TY: Jordan Poling, Brendan Finlayson, Victor Vega '
                'at Underground SF',
                'Foals DJ Set at Halcyon',
                'Carl Cox, Joseph Capriati & Pleasurekraft at The Midway',
                'Supervixen Late Nite Sessi

In [4]:
# you should be able to output something like this
find_events('us','sanfrancisco')[0]

['Housepitality:Della, Homero Espinosa, Jason Peters at F8 1192 Folsom',
 'http://residentadvisor.net/events/1173172',
 ['Della', 'Homero Espinosa', 'Jason Peters']]
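The target output is `[title, link, artists]`, while `find_nightly_events` above returns only titles. A sketch of the two missing pieces follows, under two assumptions to check in the HTML: each event's `<h1>` wraps an `<a>` whose `href` is a relative path such as `/events/1173172`, and the lineup can be read from the title itself. `RA_BASE`, `parse_artists` and `build_event_record` are hypothetical names.

```python
from urllib.parse import urljoin

RA_BASE = "https://www.residentadvisor.net"  # assumed base for relative links


def parse_artists(title):
    """Guess the lineup from an RA-style title such as
    'Housepitality:Della, Homero Espinosa, Jason Peters at F8 1192 Folsom'.

    Heuristic (an assumption, not an RA rule): drop the venue after the last
    ' at ', keep the text after the first ':' if there is one, then split on
    commas.
    """
    name, sep, _venue = title.rpartition(" at ")
    if not sep:  # no ' at ' in the title, so there is no venue to strip
        name = title
    _promoter, colon, lineup = name.partition(":")
    source = lineup if colon else name
    return [artist.strip() for artist in source.split(",") if artist.strip()]


def build_event_record(title, href):
    """Return [title, absolute link, artists], the shape Question 2 asks for."""
    return [title, urljoin(RA_BASE, href), parse_artists(title)]


print(build_event_record(
    "Housepitality:Della, Homero Espinosa, Jason Peters at F8 1192 Folsom",
    "/events/1173172",
))
```

Inside the `<h1>` loop this could become `build_event_record(item.get_text(), item.find('a')['href'])`, provided the anchor really is there. The title heuristic is rough: "Carl Cox, Joseph Capriati & Pleasurekraft" yields 'Joseph Capriati & Pleasurekraft' as one entry, so a dedicated lineup element, if the page has one, is more reliable.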

### Question 3 - Create a function which returns the number of users attending each event this week, given country and region. Then plot a histogram.

In [8]:
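def users_attending(country, region):
    """Return the number of users attending each event this week
    in the given country and region."""
    raise NotImplementedError("users_attending is not implemented yet")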

In [11]:
# you should be able to output something like this
users_attending('us','newyork')[:10]

[8, 5, 4, 3, 49, 18, 10, 3, 2, 11]
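One possible approach is to visit each event link from Question 2 and read the attendance figure from the page text. The sketch below assumes the count appears as text like "49 Attending"; that format is a guess to confirm against the real event-page HTML, and `parse_attending` is a hypothetical helper.

```python
import re

# Assumption: the event page text contains the count as "<number> Attending",
# e.g. "49 Attending" or "1,204 attending". Confirm this in the real HTML.
ATTENDING_RE = re.compile(r"(\d[\d,]*)\s+attending", re.IGNORECASE)


def parse_attending(page_text):
    """Return the first attendee count found in page_text, or None."""
    match = ATTENDING_RE.search(page_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


print(parse_attending("Lineup: Della, Jason Peters\n49 Attending"))  # 49
```

`users_attending` could then call `parse_attending(BeautifulSoup(requests.get(link).content, 'html.parser').get_text())` for each link and skip the `None` results; returning `None` rather than 0 keeps "count not found" distinct from "nobody attending".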

In [None]:
#now use the function to make a histogram
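Once `users_attending` returns a list, the plot itself is a single call such as matplotlib's `plt.hist(counts, bins=10)`. To show what that call does with the numbers, here is a dependency-free sketch of equal-width binning run on the ten sample counts from the expected output above; `bin_counts` and `text_histogram` are hypothetical names.

```python
from collections import Counter


def bin_counts(values, bin_width=10):
    """Group non-negative integer counts into equal-width bins.

    Returns {lower_edge: number_of_values} for every bin from the lowest
    to the highest occupied one, including empty bins in between.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return {}
    counts = Counter((v // bin_width) * bin_width for v in values)
    low, high = min(counts), max(counts)
    # Counter returns 0 for missing keys, so empty bins come out as 0
    return {edge: counts[edge] for edge in range(low, high + 1, bin_width)}


def text_histogram(values, bin_width=10):
    """Draw the bins as a plain-text bar chart, one '#' per event."""
    return "\n".join(
        f"{edge:>3}-{edge + bin_width - 1:<3} {'#' * n}"
        for edge, n in bin_counts(values, bin_width).items()
    )


sample = [8, 5, 4, 3, 49, 18, 10, 3, 2, 11]  # expected output shown above
print(bin_counts(sample))  # {0: 6, 10: 3, 20: 0, 30: 0, 40: 1}
print(text_histogram(sample))
```

The long right tail (one event with 49 attendees against a cluster under 10) is typical of this kind of data, so a narrower bin width or a log-scaled axis may make the plot easier to read.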



### Question 4 - Put the data for all concerts in the US and UK into Pandas dataframes.
Think about which columns to include: concert titles, region, venues, URLs, dates, etc. You'll want one dataframe per country.

Also think about how to deal with inconsistent/NaN values.

In [1]:
import pandas as pd
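A convenient intermediate step is a list of dicts, one per event, since `pd.DataFrame(rows)` turns that straight into a frame. The sketch below assumes the Question 2 record shape `[title, link, artists]` grouped by date, and takes the venue from the text after the last " at " in the title; `flatten_events` is a hypothetical helper.

```python
def flatten_events(events_by_date, country, region):
    """Turn {date: [[title, link, artists], ...]} into one dict per event.

    Records shorter than three items get None for the missing fields, which
    pandas treats as missing (isna() is True). The venue is taken from the
    text after the last ' at ' in the title (an assumption about RA titles).
    """
    rows = []
    for day, records in events_by_date.items():
        for record in records:
            # pad short records so the unpacking below never fails
            title, link, artists = (list(record) + [None, None, None])[:3]
            _name, sep, venue = (title or "").rpartition(" at ")
            rows.append({
                "date": day,
                "country": country,
                "region": region,
                "title": title,
                "venue": venue.strip() if sep else None,
                "link": link,
                "artists": artists or [],
            })
    return rows


example = {"2019-03-23": [["Foals DJ Set at Halcyon", None, ["Foals"]]]}
print(flatten_events(example, "us", "sanfrancisco")[0]["venue"])  # Halcyon
```

A per-country frame could then be built by concatenating the regions, e.g. `pd.concat([pd.DataFrame(flatten_events(...)) for ...])`. Whether to fill gaps (`df['venue'].fillna('TBA')`) or drop them (`df.dropna(subset=['link'])`) is a judgment call that depends on the Question 5 comparisons.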

### Question 5 (Bonus) - Compare the concert scenes of the two countries and find:
1. The difference in the number of concert-hosting venues per country
2. The number of concerts happening THIS SATURDAY in each country
3. Are there any artists playing in both countries?
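All three bonus questions reduce to set operations plus a date filter. The sketch below assumes each country's events are available as a list of row dicts with 'date' (YYYY-MM-DD), 'venue' and 'artists' keys; `compare_scenes` and the sample rows are hypothetical. It checks that the date passed in is a Saturday using `date.weekday()`, where Monday is 0 and Saturday is 5.

```python
from datetime import date


def compare_scenes(us_rows, uk_rows, saturday):
    """Answer the three Question 5 prompts from two lists of event rows."""
    if date.fromisoformat(saturday).weekday() != 5:
        raise ValueError(f"{saturday} is not a Saturday")

    def venues(rows):
        return {r["venue"] for r in rows if r.get("venue")}

    def artists(rows):
        return {a for r in rows for a in (r.get("artists") or [])}

    def on_day(rows):
        return sum(1 for r in rows if r.get("date") == saturday)

    return {
        # signed: positive means the US has more concert-hosting venues
        "venue_difference": len(venues(us_rows)) - len(venues(uk_rows)),
        "saturday_concerts": {"us": on_day(us_rows), "uk": on_day(uk_rows)},
        "shared_artists": sorted(artists(us_rows) & artists(uk_rows)),
    }


us_rows = [
    {"date": "2019-03-23", "venue": "Venue A", "artists": ["Artist 1", "Artist 2"]},
    {"date": "2019-03-22", "venue": "Venue B", "artists": ["Artist 3"]},
]
uk_rows = [{"date": "2019-03-23", "venue": "Venue C", "artists": ["Artist 2"]}]
print(compare_scenes(us_rows, uk_rows, "2019-03-23"))
# {'venue_difference': 1, 'saturday_concerts': {'us': 1, 'uk': 1}, 'shared_artists': ['Artist 2']}
```

Exact string matching on artist names will miss variants such as "DJ Python" vs "Dj Python", so normalizing case and whitespace first may be worthwhile.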