# Web Scraping the Ryman Calendar

In this exercise, you will use BeautifulSoup to build a dataset of upcoming events at the Ryman. The information is available at https://ryman.com/events/; your task is to scrape that page and convert its contents into a pandas DataFrame.



The website splits the events across multiple pages, but start with just the first page. Later in the exercise, you'll apply what you've done for the first page to the other pages.


In [66]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re

#### 1. Start by using either the inspector or the page source. Can you identify a tag that might be helpful for finding the names of all performers? For now, just worry about the headliner and not the opener. (E.g., for Vince Gill featuring Wendy Moten, we only care about Vince Gill.) Use this tag to create a list containing just the name of each headliner.


In [2]:
endpoint = 'https://ryman.com/events/'
response = requests.get(endpoint)
soup = BeautifulSoup(response.text, 'html.parser')

In [3]:
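# Headliner names are the link text inside each event card's <h3>; filter(None, ...) drops empty strings
headliners = list(filter(None, [row.h3.a.text.strip() for row in soup.select('.eventItem')]))
# columns expects a list of labels, not a dict
headliners = pd.DataFrame(headliners, columns=['Headliner'])
headliners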

Unnamed: 0,Headliner
0,Sofia Niño De Rivera
1,A Tribute to Ramblin' Jack Elliott - CANCELED
2,Straight No Chaser
3,Clairo
4,Ryman Sidewalk Sessions with Sam Jones & the W...
5,Nitty Gritty Dirt Band & Friends
6,Kathleen Madigan
7,Dawes
8,Bonnie Raitt
9,Leon Bridges


#### 2. Next, try to find a tag that could be used to find the date and time for each show. Extract these into a list. Challenge: Convert these into two lists, one containing the dates and the other containing the times. (E.g., split Mar 9, 2023 8:00 PM into Mar 9, 2023 and 8:00 PM.)


In [None]:
# Each '.m-date__singleDate' element holds text like 'Mar 9, 2023 8:00 PM'
strings = [row.text.strip() for row in soup.select('.m-date__singleDate')]
# Parse once, then format the date and time parts separately
datetimes = [datetime.strptime(s, '%b %d, %Y %I:%M %p') for s in strings]
dates = [dt.strftime('%b %d, %Y') for dt in datetimes]
times = [dt.strftime('%I:%M %p') for dt in datetimes]
showtimes = pd.DataFrame({'Date': dates, 'Time': times})
showtimes

Unnamed: 0,Date,Time
0,"Nov 01, 2024",08:00 PM
1,"Nov 02, 2024",08:00 PM
2,"Nov 03, 2024",07:00 PM
3,"Nov 04, 2024",07:30 PM
4,"Nov 07, 2024",05:30 PM
5,"Nov 09, 2024",07:00 PM
6,"Nov 10, 2024",07:30 PM
7,"Nov 11, 2024",07:30 PM
8,"Nov 14, 2024",07:30 PM
9,"Nov 17, 2024",07:30 PM


In [45]:
# datelist = []
# for event in events:
#     month = event.find('a', class_='m-date__month').text.strip()
#     day = event.find('a', class_='m-date__day').text.strip()
#     year = event.find('a', class_='m-date__year').text.strip()
#     date = year + month + day
#     datelist.append(date)

# datelist

months = [month.text.strip() for month in soup.select('.m-date__month')]
days = [day.text.strip() for day in soup.select('.m-date__day')]
years = [year.text.strip() for year in soup.select('.m-date__year')]
dates = [x+' '+y+z for x,y,z in zip(months, days, years)]
display(dates)

times = [hour.text.strip() for hour in soup.select('.m-date__hour')]
display(times)

# datetime_df = pd.DataFrame({'Date':dates, 'Time':times})
# datetime_df

['Nov 1, 2024',
 'Nov 2, 2024',
 'Nov 3, 2024',
 'Nov 4, 2024',
 'Nov 7, 2024',
 'Nov 7, 2024',
 'Nov 8, 2024',
 'Nov 9, 2024',
 'Nov 10, 2024',
 'Nov 11, 2024',
 'Nov 11, 2024',
 'Nov 13, 2024']

['8:00 PM',
 '8:00 PM',
 '7:00 PM',
 '7:30 PM',
 '5:30 PM',
 '7:00 PM',
 '7:30 PM',
 '7:30 PM',
 '7:30 PM',
 '7:30 PM']

#### 3. Take the lists you created in parts 1 and 2 and convert them into a pandas DataFrame.


In [5]:
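# Index-based join: this assumes row i of headliners and row i of showtimes describe the same event
shows = pd.merge(left=headliners, right=showtimes, left_index=True, right_index=True)
shows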

Unnamed: 0,Headliner,Date,Time
0,Sofia Niño De Rivera,"Nov 01, 2024",08:00 PM
1,A Tribute to Ramblin' Jack Elliott - CANCELED,"Nov 02, 2024",08:00 PM
2,Straight No Chaser,"Nov 03, 2024",07:00 PM
3,Clairo,"Nov 04, 2024",07:30 PM
4,Ryman Sidewalk Sessions with Sam Jones & the W...,"Nov 07, 2024",05:30 PM
5,Nitty Gritty Dirt Band & Friends,"Nov 09, 2024",07:00 PM
6,Kathleen Madigan,"Nov 10, 2024",07:30 PM
7,Dawes,"Nov 11, 2024",07:30 PM
8,Bonnie Raitt,"Nov 14, 2024",07:30 PM
9,Leon Bridges,"Nov 17, 2024",07:30 PM


#### 4. **Bonus #1:** Add to your DataFrame the opening act for all shows that list an opener.


In [None]:
# strings = [row.text for row in soup.select('.m-date__singleDate')]
# datetimes = [datetime.strptime(row, '%b %d, %Y %I:%M %p') for row in strings]
# times = [row.strftime('%I:%M %p') for row in datetimes]

In [None]:
events = soup.find_all('div', class_='eventItem')
rows = []
for event in events:
    # Split each card's text into fields on runs of newlines/tabs
    rows.append(re.split(r'[\n\t]+', event.text.strip()))

# Assumes each card yields at most five fields, in this order
show_info_df = pd.DataFrame(rows, columns=['Date', 'Location', 'Headliner', 'Opener', 'Notes']).dropna(subset=['Headliner']).drop(columns='Notes')
# Keep the fourth field only when it names an opener ("with ..."); anything else, including missing values, becomes ''
show_info_df['Opener'] = show_info_df['Opener'].apply(lambda x: x if isinstance(x, str) and 'with' in x else '')
show_info_df = show_info_df.loc[:, ['Headliner', 'Opener', 'Location', 'Date']]
show_info_df


Unnamed: 0,Headliner,Opener,Location,Date
1,Sofia Niño De Rivera,,Ryman Auditorium,"Nov 1, 2024 8:00 PM"
2,A Tribute to Ramblin' Jack Elliott - CANCELED,,Ryman Auditorium,"Nov 2, 2024 8:00 PM"
3,Straight No Chaser,,Ryman Auditorium,"Nov 3, 2024 7:00 PM"
4,Clairo,with Alice Phoebe Lou,Opry House,"Nov 4, 2024 7:30 PM"
5,Ryman Sidewalk Sessions with Sam Jones & the W...,,PNC Plaza,"Nov 7, 2024 5:30 PM"
6,Nitty Gritty Dirt Band & Friends,"with Wine, Women and Song: Suzy Bo...",Ryman Auditorium,"Nov 7- 8, 2024"
7,Kathleen Madigan,,Ryman Auditorium,"Nov 9, 2024 7:00 PM"
8,Dawes,with Winnetka Bowling League,Ryman Auditorium,"Nov 10, 2024 7:30 PM"
9,Bonnie Raitt,with James Hunter,Opry House,"Nov 11, 2024 7:30 PM"
10,Leon Bridges,with Hermanos Gutiérrez,Ryman Auditorium,"Nov 11-13, 2024"


#### 5. **Bonus #2:** Now, let's see if we can get the results beyond the first page. For this, you'll need to open the Web Developer Tools of your browser and navigate to the Network tab. Click the "Load More Events" button and you should see a GET request to the www.ryman.com domain.

##### a. Inspect this request and you should see that it goes to a URL like "https://www.ryman.com/events/events_ajax/24?category=0&venue=0&team=0&exclude=&per_page=12&came_from_page=event-list-page". In your Jupyter notebook, send a GET request to this URL and inspect the results.
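
Before sending the request, it can help to take the URL apart and see what it is asking for. The sketch below uses only the URL quoted above; the variable names are illustrative, and the commented-out request at the end assumes network access and the `requests` import from the top of the notebook.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://www.ryman.com/events/events_ajax/24?category=0&venue=0&team=0"
       "&exclude=&per_page=12&came_from_page=event-list-page")

# Split the URL into its path and query-string parameters
parts = urlsplit(url)
params = parse_qs(parts.query, keep_blank_values=True)  # keep 'exclude=' even though it is empty

print(parts.path)          # /events/events_ajax/24
print(params["per_page"])  # ['12']

# Sending the request itself (not run here):
# response = requests.get(url, timeout=10)
# response.raise_for_status()   # stop early on a 4xx/5xx status
# print(response.text[:500])    # peek at the start of the body
```

The two pieces most likely to matter later are the trailing number in the path and the `per_page` parameter.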

##### b. You should find that the results are HTML, but not formatted in a way that can be parsed directly. See if you can clean up the result set so that you can extract the same information as above.
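
The exercise does not show what the raw response looks like, so the following is only one possible cleanup, under a stated assumption: that the body arrives as a JSON-encoded string, with quotes and slashes backslash-escaped. The helper name `clean_fragment` and the sample string are hypothetical; check the actual response before relying on this.

```python
import json

def clean_fragment(raw):
    """Return parseable HTML from a raw response body.

    ASSUMPTION: the body may be a JSON-encoded string. If it is not valid
    JSON, or decodes to something other than a string, the input is
    returned unchanged.
    """
    try:
        decoded = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw
    return decoded if isinstance(decoded, str) else raw

# Made-up sample for illustration, not real output from the site
sample = json.dumps('<div class="eventItem"><h3><a href="/x">Dawes</a></h3></div>')
html = clean_fragment(sample)
print(html)
```

The cleaned string can then be passed to `BeautifulSoup(html, 'html.parser')`, and the same selectors used in parts 1 and 2 can be tried against it.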

##### c. Create a DataFrame that contains data for the next 60 shows.
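
One way to reach 60 shows is to request five pages of 12. The sketch below only builds the URLs. It assumes that the number in the path (24 in the example URL) is an offset that advances by `per_page`; the exercise does not confirm this, so verify it in the Network tab. The names `page_urls`, `BASE`, and `QUERY` are illustrative.

```python
BASE = "https://www.ryman.com/events/events_ajax/{offset}"
QUERY = ("category=0&venue=0&team=0&exclude=&per_page={per_page}"
         "&came_from_page=event-list-page")

def page_urls(n_shows=60, per_page=12, start=0):
    """Build one URL per page of results.

    ASSUMPTION: the path number is an offset that grows by per_page.
    Adjust `start` if the first page's events should be skipped.
    """
    offsets = range(start, start + n_shows, per_page)
    query = QUERY.format(per_page=per_page)
    return [f"{BASE.format(offset=o)}?{query}" for o in offsets]

urls = page_urls()
print(len(urls))
print(urls[2])
```

Each URL can then be fetched, cleaned, and parsed as in parts a and b, and the per-page DataFrames combined with `pd.concat`.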