I chose to create a database of all the events booked within a year at the Xcel Energy Center, an event venue in Saint Paul, Minnesota. The venue is home to the Minnesota Wild and also hosts concerts by large touring acts.

Disclaimer: I initially wrote this as separate functions, but later found a problem with how I was using ThreadPoolExecutor: each function appends to a shared list from whichever thread finishes first, so the order of the output lists is effectively random from run to run. If you are trying to combine lists that rely on matching positions, the result is useless.

As such, I ended up combining everything into one function. I chose to leave this notebook as a record of my progress rather than cleaning it up to show only the final function, so the final function is near the bottom.
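The ordering problem described above can be reproduced without any network access. The sketch below uses made-up delays in place of page downloads (the delays and the function name `slow_identity` are assumptions for illustration only). It suggests that the shuffled order comes from appending to a shared list as each call finishes, while the values handed back by `pool.map` arrive in the same order as the inputs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical delays standing in for page downloads of different speeds.
delays = [0.3, 0.2, 0.1, 0.0]

appended = []  # filled as a side effect, in completion order

def slow_identity(delay):
    time.sleep(delay)
    appended.append(delay)   # lands whenever this call happens to finish
    return delay             # handed back through pool.map

with ThreadPoolExecutor(4) as pool:
    returned = list(pool.map(slow_identity, delays))

print(returned)   # same order as `delays`
print(appended)   # completion order: the shortest delay finishes first
```

In other words, a function that returns its value instead of appending to a global list keeps its results lined up with the input URLs.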

In [73]:
import pandas as pd
import re
from concurrent.futures import ThreadPoolExecutor

In [2]:
# Our jupyter/datascience-notebook Docker container comes with 
# BeautifulSoup4 and requests, both popular libraries!

from bs4 import BeautifulSoup
import requests

In [3]:
START_URL = 'https://www.xcelenergycenter.com/events/all'

In [4]:
response = requests.get(START_URL)
soup = BeautifulSoup(response.text, 'html.parser')

# From the initial webpage, get the "More Info" URL present next to each event.

In [5]:
%%time
##6.9 ms -- no need for distributing workloads 

all_urls = soup.select('#list > div > div.thumb > div > a')

all_urls_list = [link['href'] for link in all_urls]

##this includes both the ticket links and the "more info" links, but if a show is cancelled
##the link to purchase tickets is removed, so the two kinds of link stop alternating
##the "more info" link is always there, so keep only the links on the venue's own site:

substring = 'xcelenergycenter'

more_info_url_list = [url for url in all_urls_list if substring in url]

print(more_info_url_list)

##make it a column
pdmore_info_url_list = pd.DataFrame(more_info_url_list)


['https://www.xcelenergycenter.com/events/detail/mshsl-girls-volleyball-tournament-5', 'https://www.xcelenergycenter.com/events/detail/drake-1', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-washington-2', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-vancouver-3', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-buffalo-1', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-ottawa-2', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-winnipeg-8', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-arizona-4', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-toronto-1', 'https://www.xcelenergycenter.com/events/detail/1013-kdwbs-jingle-ball-presented-by-capital-one', 'https://www.xcelenergycenter.com/events/detail/disney-on-ice-celebrates-100-years-of-magic', 'https://www.xcelenergycenter.com/events/detail/minnesota-wild-vs-montreal-2', 'https://www.xcelenergycenter.com
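The substring test works for the links on this page. A slightly stricter variant compares the host name of each link, so a ticket URL that merely mentions the venue in its query string would not slip through. This is a minimal sketch using only the standard library; the helper name `is_more_info_url` is made up for illustration, and the sample links are copied from outputs in this notebook.

```python
from urllib.parse import urlparse

def is_more_info_url(url, host="www.xcelenergycenter.com"):
    """Return True when the link points at the venue's own site."""
    return urlparse(url).hostname == host

sample_links = [
    "https://www.xcelenergycenter.com/events/detail/mshsl-girls-volleyball-tournament-5",
    "http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974"
    "?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019",
    "https://www.xcelenergycenter.com/events/detail/drake-1",
]

more_info = [url for url in sample_links if is_more_info_url(url)]
print(more_info)
```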

# Now, create a separate function for each element you want to pull from each URL's page.

In [6]:
%%time
##23 µs -- this only times defining the function, so there is no workload to distribute yet

titlez_list = []

def get_titles(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the title from the pages
    titlez = newsoup.select('#content > div.event_detail.one_sidebar_right.has_branding > div.title_section.clearfix > div.event_heading.below_branding > h1')    
    titlez_list.append(titlez[0].text[3:-1])
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.91 µs


### Use the thread pool functions to distribute the workload.

In [7]:
%%time
titlez_list = []

with ThreadPoolExecutor(50) as pool:
       resultstitles = pool.map(get_titles, more_info_url_list)
        
print(titlez_list)

['Minnesota Wild vs. Arizona', 'Disney On Ice', 'Drake - cancelled', "MSHSL Girls' State Volleyball Tournament", 'Minnesota Wild vs. Anaheim', 'Minnesota Wild vs. Dallas', 'Minnesota Wild vs. Winnipeg', 'Minnesota Wild vs. Anaheim', 'Wild About Children', 'Minnesota Wild vs. Chicago', 'Minnesota Wild vs. New York Rangers', 'Minnesota Wild vs. Los Angeles', "101.3 KDWB's Jingle Ball", 'NCHC Frozen Faceoff', 'Minnesota Wild vs. Montreal', 'Minnesota Wild vs. Philadelphia', 'Blake Shelton', 'Minnesota Wild vs. St. Louis', 'Minnesota Wild vs. Washington', 'Minnesota Wild vs. Columbus', 'Minnesota Wild vs. St. Louis', 'Minnesota Wild vs. Dallas', 'Cher', 'Minnesota Wild vs. San Jose', 'Trans-Siberian Orchestra', 'Minnesota Wild vs. Vancouver', 'Minnesota Wild vs. Edmonton', 'Kelly Clarkson', 'P!NK', 'Minnesota Wild vs. Winnipeg', 'Minnesota Wild vs. Detroit', 'Minnesota Wild vs. Colorado', 'Trevor Noah: Loud & Clear Tour', 'Minnesota Wild vs. Boston', 'Minnesota Wild vs. Buffalo', 'Minnesot

In [8]:
%%time
##23 µs -- this only times defining the function, so there is no workload to distribute yet

month_list = []

def get_months(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the month from the pages
    months = newsoup.select('span > span > span.m-date__month')    
    month_list.append(months[0].text)
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 5.96 µs


In [9]:
%%time
month_list = []

with ThreadPoolExecutor(50) as pool:
       resultsmon = pool.map(get_months, more_info_url_list)
        
print(month_list)

['Dec ', 'Dec ', 'Nov ', 'Dec ', 'Nov ', 'Dec ', 'Dec ', 'Nov ', 'Nov ', 'Dec ', 'Dec ', 'Nov ', 'Dec ', 'Nov ', 'Jan ', 'Nov ', 'Dec ', 'Nov ', 'Feb ', 'Jan ', 'Jan ', 'Feb ', 'Feb ', 'Dec ', 'Dec ', 'Feb ', 'Feb ', 'Jan ', 'Apr ', 'Mar ', 'Mar ', 'Feb ', 'Mar ', 'Jan ', 'Mar ', 'Mar ', 'Jan ', 'Dec ', 'Mar ', 'Feb ', 'Mar ', 'Feb ', 'Feb ', 'Apr ', 'Apr ', 'Mar ', 'Mar ', 'July ', 'May ', 'June ', 'June ', 'May ', 'Mar ', 'July ']
CPU times: user 3.26 s, sys: 627 ms, total: 3.88 s
Wall time: 2.7 s


In [10]:
%%time
##23 µs -- this only times defining the function, so there is no workload to distribute yet

weekday_list = []

def get_weekdays(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the weekday from the pages
    weekday = newsoup.select('span > span > span.m-date__weekday')
    
    weekday_list.append(weekday[0].text[:3])
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 5.72 µs


In [11]:
%%time

weekday_list = []

## this is 16 times faster with thread pool
with ThreadPoolExecutor(50) as pool:
       resultsWEEKDAY = pool.map(get_weekdays, more_info_url_list)
        
print(weekday_list)

['Thu', 'Thu', 'Wed', 'Tue', 'Sat', 'Wed', 'Tue', 'Sun', 'Sat', 'Thu', 'Fri', 'Sat', 'Tue', 'Thu', 'Mon', 'Sat', 'Sat', 'Thu', 'Tue', 'Sat', 'Thu', 'Sat', 'Tue', 'Fri', 'Mon', 'Thu', 'Sun', 'Fri', 'Mon', 'Sat', 'Sun', 'Sat', 'Sun', 'Sun', 'Tue', 'Sat', 'Sat', 'Fri', 'Sun', 'Tue', 'Tue', 'Tue', 'Fri', 'Sat', 'Sat', 'Sun', 'Tue', 'Mon', 'Thu', 'Thu', 'Wed', 'Sat', 'Thu', 'Fri']
CPU times: user 3.14 s, sys: 372 ms, total: 3.51 s
Wall time: 2.68 s
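The speedup noted in these cells most likely comes from overlapping the time each request spends waiting on the network. The sketch below imitates that with a hypothetical `fake_download` function that just sleeps for 50 ms (no real requests are made, and the page names are placeholders), then times a plain loop against a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Hypothetical stand-in for requests.get: just waits 50 ms."""
    time.sleep(0.05)
    return url

urls = [f"page-{i}" for i in range(10)]   # placeholder names, not real URLs

start = time.perf_counter()
sequential = [fake_download(u) for u in urls]
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(10) as pool:
    threaded = list(pool.map(fake_download, urls))
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f} s, threaded: {threaded_seconds:.2f} s")
```

The exact ratio depends on the machine and, for real pages, on the server, so the figures quoted in the cells above should be read as measurements from one run.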


In [12]:
%%time
##22 µs -- this only times defining the function, so there is no workload to distribute yet

numday_list = []

def get_numdays(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the day of the month from the pages
    numday = newsoup.select('span > span > span.m-date__day')
    
    numday_list.append(numday[0].text)
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 6.91 µs


In [13]:
%%time

numday_list = []

## this is 17 times faster with the thread pool
with ThreadPoolExecutor(50) as pool:
       resultsNUM = pool.map(get_numdays, more_info_url_list)
        
print(numday_list)

['29', '12', '18', ' 8', '21', ' 9', '15', '11', ' 3', ' 6', '13', '27', '15', '16', ' 7', '25', ' 1', '15', '19', '18', '15', '17', '14', '13', '22', '19', '31', '10', '12', '25', ' 4', '11', ' 3', '24', '21', '22', '30', ' 5', '16', '17', '17', '12', '23', '16', ' 2', '15', '11', '19', '11', ' 2', '17', '17', ' 6', '25']
CPU times: user 3.23 s, sys: 535 ms, total: 3.76 s
Wall time: 2.77 s


In [14]:
%%time
##22 µs -- this only times defining the function, so there is no workload to distribute yet

year_list = []

def get_year(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the year from the pages
    year = newsoup.select('#column_1 > div > aside > div > div > div.details > ul > li > span > span > span.m-date__year')
    
    year_list.append(year[0].text[-4:])
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs


In [15]:
%%time

year_list = []

## this is 17 times faster with the thread pool
with ThreadPoolExecutor(50) as pool:
       resultsNUM = pool.map(get_year, more_info_url_list)
        
print(year_list)

['2018', '2018', '2018', '2018', '2018', '2018', '2018', '2018', '2018', '2018', '2018', '2019', '2019', '2018', '2019', '2018', '2018', '2019', '2018', '2018', '2018', '2018', '2018', '2019', '2019', '2018', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019']
CPU times: user 3.15 s, sys: 497 ms, total: 3.65 s
Wall time: 2.64 s


In [16]:
%%time

buy_tix_url_list = []
##10 µs -- this only times defining the function, so there is no workload to distribute yet

def get_buy_tix_url(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the url to buy tickets from the pages
    buy_tix_url = newsoup.select('#column_1 > div > aside > div > div > div.buttonWrapper > div > a')
    
    if buy_tix_url == []:
        buy_tix_url_list.append("This event is cancelled.")
    else:
        buy_tix_url_list.append(buy_tix_url[0]['href'])
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.91 µs


In [17]:
%%time

buy_tix_url_list = []

## this is literally 16 times faster than without the thread pool
with ThreadPoolExecutor(50) as pool:
       results = pool.map(get_buy_tix_url, more_info_url_list)
        
print(buy_tix_url_list)

['https://www.ticketmaster.com/venueartist/49594/1901045?brand=xcelenergy&camefrom=CFC_XCEL_WEB_MSHSLVOLLEYBALL_NOV2018', 'This event is cancelled.', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-2019', 'http://www.ticketmaster.com/minnesota-wild-tickets/artist/805974?brand=xcelenergy&camefrom=CFC_XCEL_WEB_WILD_2018-201

In [18]:
%%time
##22 µs -- this only times defining the function, so there is no workload to distribute yet

time_list = []

def get_the_time(url): 
    
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    #grabs the starting time from the pages
    time = newsoup.select("span.time.cell")
    
    if time == []:
        time_list.append("NA")
    else:
        string = (time[0].text)
        time_list.append(re.findall(r'\d+', string))
    
    ##We don't actually want to return anything because we will be mapping this function


CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 5.96 µs


In [19]:
%%time

time_list = []

## this is 12.5 times faster than without the thread pool
with ThreadPoolExecutor(50) as pool:
       results = pool.map(get_the_time, more_info_url_list)
        
print(time_list)

[['3', '00'], ['7', '00'], ['6', '30'], ['7', '00'], ['7', '30'], ['5', '00'], ['7', '00'], ['5', '00'], ['7', '00'], ['2', '00'], ['7', '00'], ['7', '00'], ['7', '30'], ['7', '00'], ['7', '00'], ['7', '00'], ['7', '00'], ['7', '00'], ['7', '00'], ['8', '00'], ['7', '00'], ['7', '00'], ['7', '00'], ['12', '30'], ['7', '00'], ['3', '00'], ['7', '00'], ['6', '00'], ['7', '30'], ['7', '00'], ['8', '00'], ['4', '00'], ['7', '00'], ['7', '00'], ['9', '00'], ['7', '00'], ['7', '00'], 'NA', ['7', '00'], ['5', '30'], ['7', '30'], ['6', '00'], ['7', '30'], ['5', '00'], ['7', '30'], ['7', '00'], ['7', '30'], ['7', '00'], ['7', '00'], ['7', '30'], ['7', '30'], ['7', '00'], ['8', '00'], ['8', '00']]
CPU times: user 3.15 s, sys: 672 ms, total: 3.82 s
Wall time: 4.42 s


In [20]:
%%time
##80 µs -- no need for distributing workloads 

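##note: a missing time is stored as the string "NA", so item[0] and item[1]
##pick out the characters "N" and "A" and the result is "N:A"
hour = [item[0] for item in time_list]
minute = [item[1] for item in time_list]   ##named minute so the built-in min is not shadowed
full_time_list = []

for i in range(0, len(hour)):
    full_time_list.append(hour[i] + ":" + minute[i])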

CPU times: user 45 µs, sys: 1e+03 ns, total: 46 µs
Wall time: 48.9 µs
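Because a missing time is stored as the string "NA", indexing it in the cell above yields the characters "N" and "A", which is where the "N:A" value in the later tables comes from. One way to avoid that is to format each time as it is parsed. This is only a sketch: the helper name `format_start_time` is made up, and the assumption that the page text looks like "7:30 PM" is mine, not something the notebook shows.

```python
import re

def format_start_time(text):
    """Return 'H:MM' from text such as '7:30 PM', or 'NA' when no time is found."""
    match = re.search(r'(\d{1,2}):(\d{2})', text or "")
    if match is None:
        return "NA"
    hour, minute = match.groups()
    return f"{hour}:{minute}"

print(format_start_time("7:30 PM"))
print(format_start_time(""))
```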


## Turn the initial "More Info URL" list into a pandas data frame so it can be concatenated later.

In [21]:
more_info_url_pd = pd.DataFrame(more_info_url_list)

more_info_url_pd.columns = ["More Info URL"]

# I have discovered a major flaw in this approach -- each function appends to its list in whatever order the threads finish, so the separate lists do not line up with each other. So now instead I will put all of this in one function in order to combat the randomization.

In [22]:
##First ensure all these lists are empty and/or give an easy way to clear them.

weekday_list = []
numday_list = []
month_list = []
year_list = []
time_list = []
titlez_list = []
buy_tix_url_list = []

In [23]:
weekday_list = []
numday_list = []
month_list = []
year_list = []
time_list = []
titlez_list = []
buy_tix_url_list = []


def grab_every_thang(url): 
  
    ##creates a soup item for each url starting with the starting url (given)
    newresponse = requests.get(url)
    newsoup = BeautifulSoup(newresponse.text, 'html.parser')
    
    
    #grabs the weekday from the pages
    weekday = newsoup.select('span > span > span.m-date__weekday')
    
    weekday_list.append(weekday[0].text[:3])
    
    
    #grabs the day of the month from the pages
    numday = newsoup.select('span > span > span.m-date__day')
    
    numday_list.append(numday[0].text)
    
    
    #grabs the month from the pages    
    months = newsoup.select('span > span > span.m-date__month')    
    month_list.append(months[0].text)
    
    
    #grabs the year from the pages
    year = newsoup.select('#column_1 > div > aside > div > div > div.details > ul > li > span > span > span.m-date__year')
    
    year_list.append(year[0].text[-4:])
    
    
    #grabs the starting time from the pages
    time = newsoup.select("span.time.cell")
    
    if time == []:
        time_list.append("NA")
    else:
        string = (time[0].text)
        time_list.append(re.findall(r'\d+', string))
        
    
    #grabs the title from the pages
    titlez = newsoup.select('#content > div.event_detail.one_sidebar_right.has_branding > div.title_section.clearfix > div.event_heading.below_branding > h1')    
    titlez_list.append(titlez[0].text[3:-1])
    
    
    #grabs the url to buy tickets from the pages
    buy_tix_url = newsoup.select('#column_1 > div > aside > div > div > div.buttonWrapper > div > a')
    
    if buy_tix_url == []:
        buy_tix_url_list.append("This event is cancelled.")
    else:
        buy_tix_url_list.append(buy_tix_url[0]['href'])
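An alternative to appending to seven global lists is to have the function return one record per URL and let `pool.map` collect the records, which it yields in input order. The sketch below is not the function above: `grab_record` and the `FAKE_PAGES` lookup are hypothetical stand-ins so that it runs offline, and a real version would download and parse the page where the comment indicates.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the scraped pages, so the sketch runs offline.
FAKE_PAGES = {
    "url-a": {"Event": "Event A", "Year": "2018"},
    "url-b": {"Event": "Event B", "Year": "2019"},
    "url-c": {"Event": "Event C", "Year": "2019"},
}

def grab_record(url):
    """Return everything for one page as a single dict instead of appending to lists."""
    page = FAKE_PAGES[url]          # the real version would download and parse here
    return {"More Info URL": url, **page}

urls = list(FAKE_PAGES)

with ThreadPoolExecutor(3) as pool:
    records = list(pool.map(grab_record, urls))   # one dict per URL, in input order

for record in records:
    print(record)
```

A list of dicts like `records` can be passed straight to `pd.DataFrame`, and because each record carries its own URL, no sorting is needed to match rows back to the "More Info URL" column.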
    


### Now in theory each call appends to all seven lists together, so every list should end up in the same (still random) order, as long as there aren't too many worker threads. This is not strictly guaranteed, since threads can switch between appends.

In [49]:
weekday_list = []
numday_list = []
month_list = []
year_list = []
time_list = []
titlez_list = []
buy_tix_url_list = []

## this number of worker threads was chosen so the final df isn't out of order
## takes about 10 seconds rather than 1 min 30
with ThreadPoolExecutor(10) as pool:
       resultados = pool.map(grab_every_thang, more_info_url_list)

In [50]:
##Ensure every list was completed

print(len(weekday_list))
print(len(numday_list))
print(len(month_list))
print(len(year_list))
print(len(time_list))
print(len(titlez_list))
print(len(buy_tix_url_list))

54
54
54
54
54
54
54


In [51]:
print(weekday_list)
print(numday_list)
print(month_list)
print(year_list)
print(time_list)
print(titlez_list)
print(buy_tix_url_list)


['Fri', 'Mon', 'Thu', 'Wed', 'Sat', 'Sun', 'Sat', 'Tue', 'Tue', 'Thu', 'Thu', 'Sun', 'Thu', 'Tue', 'Wed', 'Mon', 'Tue', 'Sat', 'Sat', 'Sat', 'Thu', 'Tue', 'Sat', 'Sat', 'Sat', 'Thu', 'Sat', 'Fri', 'Tue', 'Thu', 'Fri', 'Tue', 'Fri', 'Sun', 'Sat', 'Sun', 'Sat', 'Sun', 'Thu', 'Mon', 'Sun', 'Mon', 'Tue', 'Fri', 'Sat', 'Tue', 'Wed', 'Thu', 'Sun', 'Sat', 'Tue', 'Fri', 'Sat', 'Thu']
['23', ' 3', '15', '21', ' 1', '11', '17', '13', '27', ' 8', '13', '16', ' 6', '11', '12', '31', '18', '15', '29', '22', '10', '15', ' 2', '12', '19', '17', ' 9', '25', '12', ' 7', '15', '19', '15', '17', '16', '24', '16', ' 3', '14', '11', '17', '25', '19', '22', '30', ' 2', '17', ' 4', ' 5', '18', '11', '21', ' 6', '25']
['Nov ', 'Dec ', 'Nov ', 'Nov ', 'Dec ', 'Nov ', 'Nov ', 'Nov ', 'Nov ', 'Nov ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Dec ', 'Jan ', 'Jan ', 'Feb ', 'Jan ', 'Jan ', 'Jan ', 'Feb ', 'Jan ', 'Feb ', 'Feb ', 'Feb ', 'Feb ', 'Mar ', 'Feb ', 'Mar ', 'Feb ', 'Feb ',

In [52]:
##fix the time list
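##note: a missing time is stored as the string "NA", so it comes out as "N:A" here
hour = [item[0] for item in time_list]
minute = [item[1] for item in time_list]   ##named minute so the built-in min is not shadowed
full_time_list = []

for i in range(0, len(hour)):
    full_time_list.append(hour[i] + ":" + minute[i])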

In [53]:
##make month a number for later sorting purposes

for i in range(0, len(month_list)):
    if month_list[i] == "Jan ":
        month_list[i] = 1
    elif month_list[i] == "Feb ":
        month_list[i] = 2
    elif month_list[i] == "Mar ":
        month_list[i] = 3
    elif month_list[i] == "Apr ":
        month_list[i] = 4
    elif month_list[i] == "May ":
        month_list[i] = 5
    elif month_list[i] == "June ":
        month_list[i] = 6
    elif month_list[i] == "July ":
        month_list[i] = 7
    elif month_list[i] == "Aug ":
        month_list[i] = 8
    elif month_list[i] == "Sep ":
        month_list[i] = 9
    elif month_list[i] == "Oct ":
        month_list[i] = 10
    elif month_list[i] == "Nov ":
        month_list[i] = 11
    else:
        month_list[i] = 12

print(month_list)

[11, 12, 11, 11, 12, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 3, 2, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7]
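The same conversion can be written as a dictionary lookup. One difference worth noting: in the chain above, the final `else` turns any unrecognized label into 12, whereas a lookup raises a `KeyError`. This is a sketch, not the notebook's code; `MONTH_NUMBERS` and `month_to_number` are names chosen for illustration, and matching on the first three letters is my assumption for handling labels such as "June " and "July ".

```python
MONTH_NUMBERS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def month_to_number(label):
    """Map labels such as 'Nov ' or 'June ' to 1-12 using the first three letters."""
    return MONTH_NUMBERS[label.strip()[:3]]

sample = ['Nov ', 'Dec ', 'Jan ', 'June ', 'July ']
print([month_to_number(m) for m in sample])   # [11, 12, 1, 6, 7]
```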


In [54]:
##make each of them a column in pandas DF form
weekday_list = pd.DataFrame(weekday_list)
numday_list = pd.DataFrame(numday_list)
month_list = pd.DataFrame(month_list)
year_list = pd.DataFrame(year_list)
full_time_list = pd.DataFrame(full_time_list)
titlez_list = pd.DataFrame(titlez_list)
buy_tix_url_list = pd.DataFrame(buy_tix_url_list)


In [55]:
##Create the initial DF

Initial_DF = pd.concat([weekday_list,
                     numday_list,
                     month_list,
                     year_list,
                     full_time_list,
                     titlez_list, 
                     buy_tix_url_list],
                     axis = 1, ignore_index=True)
Initial_DF.columns = ["Week Day",
                   "Day",
                   "Month",
                    "Year",
                   "Time",
                   "Event", 
                   "Buy Tickets URL"]
Initial_DF

,Week Day,Day,Month,Year,Time,Event,Buy Tickets URL
0,Fri,23,11,2018,3:00,Minnesota Wild vs. Winnipeg,http://www.ticketmaster.com/minnesota-wild-tic...
1,Mon,3,12,2018,7:30,101.3 KDWB's Jingle Ball,http://www.ticketmaster.com/event/06005533CC98...
2,Thu,15,11,2018,7:00,Minnesota Wild vs. Vancouver,http://www.ticketmaster.com/minnesota-wild-tic...
3,Wed,21,11,2018,7:00,Minnesota Wild vs. Ottawa,http://www.ticketmaster.com/minnesota-wild-tic...
4,Sat,1,12,2018,6:00,Minnesota Wild vs. Toronto,http://www.ticketmaster.com/minnesota-wild-tic...
5,Sun,11,11,2018,N:A,Drake - cancelled,This event is cancelled.
6,Sat,17,11,2018,5:00,Minnesota Wild vs. Buffalo,http://www.ticketmaster.com/minnesota-wild-tic...
7,Tue,13,11,2018,7:00,Minnesota Wild vs. Washington,http://www.ticketmaster.com/minnesota-wild-tic...
8,Tue,27,11,2018,7:00,Minnesota Wild vs. Arizona,http://www.ticketmaster.com/minnesota-wild-tic...
9,Thu,8,11,2018,9:00,MSHSL Girls' State Volleyball Tournament,https://www.ticketmaster.com/venueartist/49594...


In [56]:
##Sort the DF by Year, then Month, then Day so it will be in a sensible order
##And, more importantly, the order that matches the initial URL list
##(this assumes the URL list itself is in date order)

SortedDF = Initial_DF.sort_values(['Year','Month', 'Day'])

##Reset the index to this new sorted order so we can add the original column
SortedDF = SortedDF.reset_index(drop=True)

SortedDF

,Week Day,Day,Month,Year,Time,Event,Buy Tickets URL
0,Thu,8,11,2018,9:00,MSHSL Girls' State Volleyball Tournament,https://www.ticketmaster.com/venueartist/49594...
1,Sun,11,11,2018,N:A,Drake - cancelled,This event is cancelled.
2,Tue,13,11,2018,7:00,Minnesota Wild vs. Washington,http://www.ticketmaster.com/minnesota-wild-tic...
3,Thu,15,11,2018,7:00,Minnesota Wild vs. Vancouver,http://www.ticketmaster.com/minnesota-wild-tic...
4,Sat,17,11,2018,5:00,Minnesota Wild vs. Buffalo,http://www.ticketmaster.com/minnesota-wild-tic...
5,Wed,21,11,2018,7:00,Minnesota Wild vs. Ottawa,http://www.ticketmaster.com/minnesota-wild-tic...
6,Fri,23,11,2018,3:00,Minnesota Wild vs. Winnipeg,http://www.ticketmaster.com/minnesota-wild-tic...
7,Tue,27,11,2018,7:00,Minnesota Wild vs. Arizona,http://www.ticketmaster.com/minnesota-wild-tic...
8,Sat,1,12,2018,6:00,Minnesota Wild vs. Toronto,http://www.ticketmaster.com/minnesota-wild-tic...
9,Mon,3,12,2018,7:30,101.3 KDWB's Jingle Ball,http://www.ticketmaster.com/event/06005533CC98...


In [71]:
FINAL_DF = pd.concat([SortedDF,
                                 more_info_url_pd],
                                 axis = 1, ignore_index=True)

FINAL_DF.columns = ["Week Day",
                               "Day",
                               "Month",
                                "Year",
                               "Time",
                               "Event", 
                               "Buy Tickets URL",
                               "More Info URL"]
FINAL_DF.index.name = '#'

FINAL_DF

#,Week Day,Day,Month,Year,Time,Event,Buy Tickets URL,More Info URL
0,Thu,8,11,2018,9:00,MSHSL Girls' State Volleyball Tournament,https://www.ticketmaster.com/venueartist/49594...,https://www.xcelenergycenter.com/events/detail...
1,Sun,11,11,2018,N:A,Drake - cancelled,This event is cancelled.,https://www.xcelenergycenter.com/events/detail...
2,Tue,13,11,2018,7:00,Minnesota Wild vs. Washington,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
3,Thu,15,11,2018,7:00,Minnesota Wild vs. Vancouver,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
4,Sat,17,11,2018,5:00,Minnesota Wild vs. Buffalo,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
5,Wed,21,11,2018,7:00,Minnesota Wild vs. Ottawa,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
6,Fri,23,11,2018,3:00,Minnesota Wild vs. Winnipeg,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
7,Tue,27,11,2018,7:00,Minnesota Wild vs. Arizona,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
8,Sat,1,12,2018,6:00,Minnesota Wild vs. Toronto,http://www.ticketmaster.com/minnesota-wild-tic...,https://www.xcelenergycenter.com/events/detail...
9,Mon,3,12,2018,7:30,101.3 KDWB's Jingle Ball,http://www.ticketmaster.com/event/06005533CC98...,https://www.xcelenergycenter.com/events/detail...


# Now to finally convert it to a CSV file

In [70]:
FINAL_DF.to_csv("Xcel Energy Center Upcoming Events.csv")