## Scraping Tapology.com for UFC

I want to scrape tapology.com for the bout information of UFC events. I'm looking to create a few data frames that I will convert into CSV files for further exploration. I'll start by importing the following modules:

In [1]:
%load_ext autoreload
%autoreload 2

import os
import sys

module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

from bs4 import BeautifulSoup
import requests
import pandas as pd

## Table: Events

I have the URL of a search results page that contains all of the events I want to look at. I'll parse the page with Beautiful Soup later on, but first I'm going to use pandas read_html so that I can pull in every table on the page at once.

In [2]:
df_results = pd.read_html('https://www.tapology.com/search?term=ufc&mainSearchFilter=events')
len(df_results)

3

read_html returns a list of all the tables found in a web page, so I need to figure out which one I want. To do this, I'll check the info for each of the dataframes. The web page itself shows 864 results, so I'll choose the dataframe whose length matches that.

In [3]:
for df in df_results:
    print(df.info(),'\n\n\n')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 864 entries, 0 to 863
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Events (864)  864 non-null    object 
 1   Unnamed: 1    0 non-null      float64
 2   Name          727 non-null    object 
 3   Unnamed: 3    0 non-null      float64
 4   Date          864 non-null    object 
 5   Unnamed: 5    0 non-null      float64
 6   Bouts         864 non-null    int64  
dtypes: float64(3), int64(1), object(3)
memory usage: 47.4+ KB
None 



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Event       8 non-null      object 
 1   Unnamed: 1  0 non-null      float64
 2   Start Time  8 non-null      object 
dtypes: float64(1), object(2)
memory usage: 320.0+ bytes
None 



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 
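Reading the info() printout by eye works, but the same choice could be made in code. Here is a minimal sketch, assuming the expected row count is known in advance; `pick_table_by_length` is a hypothetical helper, and the three frames are stand-ins with the same row counts as the tables above rather than real scraped data.

```python
import pandas as pd


def pick_table_by_length(tables, expected_rows):
    """Return the first DataFrame in `tables` with exactly `expected_rows` rows."""
    for table in tables:
        if len(table) == expected_rows:
            return table
    raise ValueError(f"no table with {expected_rows} rows found")


# Stand-in frames with the same row counts as the three tables above.
tables = [
    pd.DataFrame({"Bouts": range(864)}),
    pd.DataFrame({"Event": range(8)}),
    pd.DataFrame({"Event": range(8)}),
]
events = pick_table_by_length(tables, 864)
print(events.shape)
```

Raising an error when nothing matches is a deliberate choice here: if the site changes its result count, the notebook stops instead of silently using the wrong table.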

It looks like the first one is the one I want, so let's make sure it has all the info I want.

In [4]:
df_results[0].head()

Unnamed: 0,Events (864),Unnamed: 1,Name,Unnamed: 3,Date,Unnamed: 5,Bouts
0,Contender Series 2020,,Week 10,,2020.08.25,,0
1,Contender Series 2020,,Week 9,,2020.08.18,,0
2,UFC Fight Night,,,,2020.08.15,,1
3,Contender Series 2020,,Week 8,,2020.08.11,,0
4,Contender Series 2020,,Week 7,,2020.08.04,,0


The only thing it's missing is a link to the event page. It also has a few null columns, so I'll drop those as well.

In [5]:
df_results = df_results[0]
df_results = df_results.dropna(axis=1,how='all')
df_results.head()

Unnamed: 0,Events (864),Name,Date,Bouts
0,Contender Series 2020,Week 10,2020.08.25,0
1,Contender Series 2020,Week 9,2020.08.18,0
2,UFC Fight Night,,2020.08.15,1
3,Contender Series 2020,Week 8,2020.08.11,0
4,Contender Series 2020,Week 7,2020.08.04,0


That's cleaner, but it is still missing the links, so let's add those in. First I'll take the first table, then I'll find all the tr elements and pull the href attribute from each one. Then I'll add that list to df_results under a new 'link' column.

In [6]:
page = requests.get('https://www.tapology.com/search?term=ufc&mainSearchFilter=events').text
soup = BeautifulSoup(page, 'html.parser')
table = soup.find('table')
rows = table.find_all('tr')
rows[0]

<tr>
<th class="lrgB" scope="col">Events (864)</th>
<th class="gutter" scope="col"> </th>
<th class="lrgA" scope="col">Name</th>
<th class="gutter" scope="col"> </th>
<th class="rightC" scope="col">Date</th>
<th class="gutter" scope="col"> </th>
<th class="smlD" scope="col">Bouts</th>
</tr>

The first row is the header, so I'll drop that and then use a list comprehension. First I want to see how I can access the 'href' attribute.

In [7]:
rows = rows[1:]

I'll find out how to get one:

In [8]:
rows[0].find('a').get('href')

'/fightcenter/events/68353-contender-series-2020-week-10'

Then I'll get the rest:

In [9]:
links = [row.find('a').get('href') for row in rows]
links[:5]

['/fightcenter/events/68353-contender-series-2020-week-10',
 '/fightcenter/events/68352-contender-series-2020-week-9',
 '/fightcenter/events/67159-ufc-on-espn',
 '/fightcenter/events/68351-contender-series-2020-week-8',
 '/fightcenter/events/68350-contender-series-2020-week-7']
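The list comprehension above assumes every row contains an anchor tag; if one did not, row.find('a') would return None and the .get call would fail. A small sketch of a more defensive version follows. The HTML is a made-up stand-in for the search results table and `row_link` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the search results table; the second data row has no link.
html = """
<table>
  <tr><th>Events</th></tr>
  <tr><td><a href="/fightcenter/events/68353-contender-series-2020-week-10">Contender Series 2020</a></td></tr>
  <tr><td>No link here</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")[1:]  # skip the header row


def row_link(row):
    """Return the href of the first <a> in a row, or None if there is none."""
    anchor = row.find("a")
    return anchor.get("href") if anchor is not None else None


links = [row_link(row) for row in rows]
print(links)
```

Keeping a None placeholder for link-less rows means the list stays the same length as the table, so it can still be assigned as a column.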

In [10]:
df_results['link'] = links
print(df_results.info())
df_results.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 864 entries, 0 to 863
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Events (864)  864 non-null    object
 1   Name          727 non-null    object
 2   Date          864 non-null    object
 3   Bouts         864 non-null    int64 
 4   link          864 non-null    object
dtypes: int64(1), object(4)
memory usage: 33.9+ KB
None


Unnamed: 0,Events (864),Name,Date,Bouts,link
0,Contender Series 2020,Week 10,2020.08.25,0,/fightcenter/events/68353-contender-series-202...
1,Contender Series 2020,Week 9,2020.08.18,0,/fightcenter/events/68352-contender-series-202...
2,UFC Fight Night,,2020.08.15,1,/fightcenter/events/67159-ufc-on-espn
3,Contender Series 2020,Week 8,2020.08.11,0,/fightcenter/events/68351-contender-series-202...
4,Contender Series 2020,Week 7,2020.08.04,0,/fightcenter/events/68350-contender-series-202...


The next thing I want to do is clear out all of the non-UFC events. I'll cut out The Ultimate Fighter fights as well because most of these, if not all, are exhibition matches. I'll probably need to use regular expressions for this.

In [11]:
import re

ufc = re.compile(r'^UFC')  # matches names that start with 'UFC'
contender = re.compile(r'^Contender')  # matches names that start with 'Contender'

for event in df_results['Events (864)'][:50]:
    if ufc.match(event) or contender.match(event):
        print('UFC')
    else:
        print('-----------', event, '-----------')

UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
----------- AUFC 25 -----------
UFC
UFC
UFC
UFC
UFC
UFC
----------- Davao Urban FC Fight Night 17 -----------
----------- AUFC 24 -----------
UFC
UFC
UFC
UFC
UFC
UFC
UFC
UFC
----------- Davao Urban FC Fight Night 16 -----------
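The printout shows why the ^ anchor matters: 'AUFC 25' contains 'UFC' but does not start with it, so it is rejected. The two patterns could also be folded into one. This is only a sketch; `UFC_EVENT` and `is_ufc_event` are names I made up for illustration, and the sample names are taken from the output above.

```python
import re

# One anchored pattern covering both prefixes that the notebook keeps.
UFC_EVENT = re.compile(r"^(?:UFC|Contender)")


def is_ufc_event(name):
    """True when the event name starts with 'UFC' or 'Contender'."""
    return bool(UFC_EVENT.match(name))


names = [
    "UFC Fight Night",
    "Contender Series 2020",
    "AUFC 25",
    "Davao Urban FC Fight Night 17",
]
print([is_ufc_event(n) for n in names])
```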


Maybe I can use these regular expressions to create a mask that will filter out all non-UFC events.

In [12]:
def is_ufc(event_name):
    # True when the name starts with 'UFC' or 'Contender'
    return bool(ufc.match(event_name) or contender.match(event_name))

In [13]:
mask = df_results['Events (864)'].map(is_ufc)

In [14]:
ufc_only = df_results[mask]
ufc_only.reset_index()  # returns a new frame; the result is not assigned, so ufc_only keeps its original index labels

print(ufc_only.info())
ufc_only.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 534 entries, 0 to 863
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Events (864)  534 non-null    object
 1   Name          517 non-null    object
 2   Date          534 non-null    object
 3   Bouts         534 non-null    int64 
 4   link          534 non-null    object
dtypes: int64(1), object(4)
memory usage: 25.0+ KB
None


Unnamed: 0,Events (864),Name,Date,Bouts,link
0,Contender Series 2020,Week 10,2020.08.25,0,/fightcenter/events/68353-contender-series-202...
1,Contender Series 2020,Week 9,2020.08.18,0,/fightcenter/events/68352-contender-series-202...
2,UFC Fight Night,,2020.08.15,1,/fightcenter/events/67159-ufc-on-espn
3,Contender Series 2020,Week 8,2020.08.11,0,/fightcenter/events/68351-contender-series-202...
4,Contender Series 2020,Week 7,2020.08.04,0,/fightcenter/events/68350-contender-series-202...


Seems like it worked. I'm going to export this as a CSV and save it for later. Before I do that, I'm just going to rename the columns.

In [15]:
ufc_only.columns = ['event', 'name', 'date', 'bouts', 'link']

In [12]:
#ufc_only.to_csv('events_ufc_tapology.csv')

## function testing

Disclaimer: the following function combines modifications used in 02_mef_tapology_scrape, so its output will differ slightly.

In [25]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
search_link = 'https://www.tapology.com/search?term=ufc&mainSearchFilter=events'

html_content = requests.get(search_link, headers=headers).content

src.get_ufc_events(html_content)

Unnamed: 0,event,name,date,bouts,link
20,UFC Fight Night,Woodley vs. Burns,2020-05-30,11,/fightcenter/events/69127-ufc-fight-night
21,UFC Fight Night,Overeem vs. Harris,2020-05-16,11,/fightcenter/events/67412-ufc-on-espn-33
22,UFC Fight Night,Smith vs. Teixeira,2020-05-13,10,/fightcenter/events/69126-ufc-fight-night
23,UFC 249,Ferguson vs. Gaethje,2020-05-09,11,/fightcenter/events/66312-ufc-250
30,UFC on ESPN+ 28,Lee vs. Oliveira,2020-03-14,12,/fightcenter/events/64600-ufc-on-espn-26
...,...,...,...,...,...
859,UFC 5,Return of the Beast,1995-04-07,10,/fightcenter/events/ufc-5-return-of-the-beast
860,UFC 4,Revenge of the Warriors,1994-12-16,10,/fightcenter/events/ufc-4-revenge-of-the-warriors
861,UFC 3,The American Dream,1994-09-09,6,/fightcenter/events/ufc-3-the-american-dream
862,UFC 2,No Way Out,1994-03-11,15,/fightcenter/events/ufc-2-no-way-out


## Table: Bouts
### Scraping a single event

I want to scrape info for each bout in a given UFC event. First I want to go to that event's page and see what it looks like.

In [17]:
ufc_only.head(25)

Unnamed: 0,event,name,date,bouts,link
0,Contender Series 2020,Week 10,2020.08.25,0,/fightcenter/events/68353-contender-series-202...
1,Contender Series 2020,Week 9,2020.08.18,0,/fightcenter/events/68352-contender-series-202...
2,UFC Fight Night,,2020.08.15,1,/fightcenter/events/67159-ufc-on-espn
3,Contender Series 2020,Week 8,2020.08.11,0,/fightcenter/events/68351-contender-series-202...
4,Contender Series 2020,Week 7,2020.08.04,0,/fightcenter/events/68350-contender-series-202...
5,UFC Fight Night,,2020.08.01,1,/fightcenter/events/69945-ufc-fight-night
6,Contender Series 2020,Week 6,2020.07.28,0,/fightcenter/events/68349-contender-series-202...
7,UFC Fight Night,Whittaker vs. Till,2020.07.25,4,/fightcenter/events/69764-ufc-fight-night
8,Contender Series 2020,Week 5,2020.07.21,0,/fightcenter/events/68348-contender-series-202...
9,UFC Fight Night,Figueiredo vs. Benavidez 2,2020.07.18,3,/fightcenter/events/69705-ufc-fight-night


The first 20 rows in the list (index labels 0 through 19) are events that haven't happened yet, so I'm going to make a new dataframe that only has previous events in it.

In [8]:
previous_ufc = ufc_only.loc[20:].reset_index(drop=True)
previous_ufc.head()
previous_ufc.to_csv('previous_ufc.csv')

NameError: name 'ufc_only' is not defined
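Slicing at a hard-coded label only works for the day the notebook was run. An alternative is to compare each event date against a reference date. Below is a rough sketch using only the standard library; it assumes the 'YYYY.MM.DD' format seen in the Date column, the helper names are hypothetical, and the two sample rows are illustrative.

```python
from datetime import date, datetime


def parse_event_date(text):
    """Parse a 'YYYY.MM.DD' string as shown in the Date column."""
    return datetime.strptime(text, "%Y.%m.%d").date()


def past_events(events, today):
    """Keep (name, date_string) pairs dated strictly before `today`."""
    return [(name, d) for name, d in events if parse_event_date(d) < today]


sample = [
    ("Contender Series 2020", "2020.08.25"),
    ("UFC Fight Night", "2020.05.30"),
]
# A fixed reference date keeps the example reproducible;
# date.today() would be the live choice.
print(past_events(sample, today=date(2020, 6, 6)))
```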

#### Scraping an individual event

I'll take the most recent one (the first row) as an example and navigate to its page. I should be able to concatenate the link string onto the website URL.

In [19]:
event_url = 'https://www.tapology.com'+previous_ufc.loc[0]['link']
event_url

'https://www.tapology.com/fightcenter/events/69127-ufc-fight-night'
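String concatenation works here because every link starts with a slash. The standard library's urljoin does the same job and is a bit more forgiving about slashes. A minimal sketch, where `BASE_URL` and `absolute_url` are names chosen for illustration:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.tapology.com"


def absolute_url(path):
    """Resolve a site-relative path such as '/fightcenter/...' against BASE_URL."""
    return urljoin(BASE_URL, path)


print(absolute_url("/fightcenter/events/69127-ufc-fight-night"))
```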

Okay, now that I have the link, I'm going to pull all the tables from that URL and see what's in them.

In [20]:
df_event_tables = pd.read_html(event_url)

In [21]:
for table in df_event_tables:
    print(table.info(), '\n\n\n')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       1 non-null      object
dtypes: object(1)
memory usage: 136.0+ bytes
None 



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       1 non-null      object
dtypes: object(1)
memory usage: 136.0+ bytes
None 



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       1 non-null      object
dtypes: object(1)
memory usage: 136.0+ bytes
None 



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       1 non-null      object
dtypes

This event had 11 fights and none of these tables have 11 rows, so I don't think it pulled them in correctly. Maybe they are tagged as lists, not tables. Let's check them out to see what's in them.

In [22]:
df_event_tables[0].head()

Unnamed: 0,0
0,Main Event 265 5 x 5


In [23]:
df_event_tables[1].head()

Unnamed: 0,0
0,Main Card 3 x 3


The last two are tables that are advertising other parts of the website (list of upcoming events and a list of the best WW fights). The first three are useless as well.

# Beautiful Soup

Instead I'll use Beautiful Soup. First let's find all the lists.

In [24]:
html = requests.get(event_url).text
soup = BeautifulSoup(html, 'html.parser')
all_lists = soup.find_all('ul')
all_lists[0]

<ul>
<li class="field"><div class="input string required identities_password_uid"><input class="string required show_hint" id="identities_password_uid" name="identities_password[uid]" title="Username" type="text"/></div></li>
<li class="field"><div class="input password optional identities_password_password"><input class="password optional show_hint" id="identities_password_password" name="identities_password[password]" title="Password" type="password"/></div></li>
<li class="btn"><input class="btn tapInSlider submit" data-disable-with="Create Password" name="commit" type="submit" value="Create Password"/></li>
<input id="identities_password_remember" name="identities_password[remember]" type="hidden" value="true"/>
</ul>

The first list looks like it contains login info. Really we just want the bout information list and the event information list. The event info is under the clearfix class and the bout info is under the fightCard class. First let's get the bout info because it's the most important.

In [25]:
bout_info = soup.find(class_='fightCard')
# bout_info_rows = bout_info.find_all('li')
# bout_info_rows

<ul class="fightCard">
<li class="fightCard">
<div class="fightCardBout">
<div class="fightCardResultHolder">
<div class="fightCardResult">
<span class="result">
Submission, Arm bar
</span>
<br/>
<span class="time">2:35 Round 1 of 3</span>
</div>
</div>
<div class="fightCardBoutNumber"></div>
<div class="fightCardFighterImage">
<img alt="Zhyrgalbek Chomonov" src="https://images.tapology.com/headshot_images/155988/icon/XpeknjjTEmg.jpg?1518443425"/>
</div>
<div class="fightCardFighterBout left win">
<div class="fightCardFighterName left">
<a href="/fightcenter/fighters/155988-jyrgalbek-chomonov">Zhyrgalbek Chomonov</a>
<span class="resultIcon"><img alt="Win icon green" src="/assets/fightcenter/win_icon_green-3c9a863c21396ff0ae4752a056bd468cddcbddb70960695ed1b962257296de14.png">
</img></span>
</div>
<div class="fightCardRecord">
<span class="fighterFlag">
<img alt="Kg" class="fightCardFlag" src="/assets/flags/KG-2dd75084b20a3bdfeb35b8bb9af402b42d09fae9e10d8fd3cc021f121eaa8732.gif">
</img>

This doesn't look like the same fight card, so I'm going to look at all the fight card lists on this page.

In [26]:
fightCards = soup.find_all(class_='fightCard')
len(fightCards)

18

## User-Agent
The previous code seems to give me a random page from Tapology, so I'm going to try setting a User-Agent header and see if that lets me access the right page.
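Since the same header will be needed for every event page, one option is to set it once on a requests.Session. The sketch below is an assumption about how that could be organized, not how the src module does it; `make_session` and `fetch_html` are hypothetical helpers, and the example only builds the session without sending a request.

```python
import requests


def make_session(user_agent):
    """Create a requests.Session that sends `user_agent` with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """GET `url` and return the body text, raising on 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Placeholder User-Agent string; any browser-like value could be supplied.
session = make_session("Mozilla/5.0 (example)")
print(session.headers["User-Agent"])
```

Note that raise_for_status only catches HTTP error codes. A request that comes back with status 200 but the wrong page would still need a content check, such as counting the bouts.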

In [27]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

response = requests.get(event_url, headers=headers)
response

<Response [200]>

In [28]:
soup = BeautifulSoup(response.text, 'html.parser')
bout_info = soup.find(class_='fightCard')
bout_list = bout_info.get_text().split('\n')
bout_list[:15]

['',
 '',
 '',
 '',
 '',
 '',
 'Decision, Unanimous',
 '',
 '',
 '5 Rounds, 25:00 Total',
 '',
 '',
 '11',
 '',
 '']

I want to get rid of these empty strings:

In [29]:
bout_list = list(filter(lambda item: item != '', bout_list))
bout_list[:5]

['Decision, Unanimous',
 '5 Rounds, 25:00 Total',
 '11',
 'Gilbert Burns',
 'Climbed to 19-3']
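Beautiful Soup can also do the splitting and filtering in one step through the stripped_strings generator. The fragment below is a hand-written stand-in shaped like one result block, not the real page, so treat this as a sketch of the technique. On real markup the items may not line up exactly with the split('\n') approach, since stripped_strings yields one item per text node.

```python
from bs4 import BeautifulSoup

# Small stand-in fragment shaped like one fightCard result block.
html = """
<div class="fightCardResult">
  <span class="result">
    Decision, Unanimous
  </span>
  <br/>
  <span class="time">5 Rounds, 25:00 Total</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
block = soup.find(class_="fightCardResult")

# stripped_strings yields each text node with surrounding whitespace removed,
# skipping nodes that are whitespace only.
items = list(block.stripped_strings)
print(items)
```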

In [30]:
len(bout_list)

110

There were 11 bouts and there are 110 items in the list, so I'm going to assume that there are 10 items per bout. Let's group them into individual bouts.

In [31]:
card = []
for index in range(11):
    card.append(bout_list[index*10:index*10+10])
card[0]

['Decision, Unanimous',
 '5 Rounds, 25:00 Total',
 '11',
 'Gilbert Burns',
 'Climbed to 19-3',
 'Main Event',
 '170',
 '5 x 5',
 'Tyron Woodley',
 'Fell to 19-5']
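The 10-items-per-bout assumption is fragile: a bout with a missing field would shift every later row. A grouping helper that checks divisibility at least fails loudly when the total count is off. This is a sketch with a hypothetical name, `chunk`, and it cannot detect two errors that happen to cancel out.

```python
def chunk(items, size):
    """Split `items` into consecutive lists of length `size`.

    Raises ValueError when the list does not divide evenly, which would mean
    the fixed-width assumption does not hold for this page.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if len(items) % size != 0:
        raise ValueError(f"{len(items)} items cannot be split into groups of {size}")
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunk(list(range(6)), 3))
```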

In [32]:
df_card = pd.DataFrame(card)
df_card

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,"Decision, Unanimous","5 Rounds, 25:00 Total",11,Gilbert Burns,Climbed to 19-3,Main Event,170,5 x 5,Tyron Woodley,Fell to 19-5
1,"Decision, Split","3 Rounds, 15:00 Total",10,Augusto Sakai,Climbed to 15-1,Co-Main Event,265,3 x 5,Blagoy Ivanov,Fell to 18-4
2,"Decision, Unanimous","3 Rounds, 15:00 Total",9,Billy Quarantillo,Climbed to 14-2,Main Card,150,3 x 5,Spike Carlyle,Fell to 9-2
3,"Submission, Rear Naked Choke","3:26 Round 2 of 3, 8:26 Total",8,Roosevelt Roberts,Climbed to 10-1,Main Card,155,3 x 5,Brok Weaver,Fell to 15-5
4,"Submission, Kneebar",2:36 Round 1 of 3,7,Mackenzie Dern,Climbed to 8-1,Main Card,115,3 x 5,Hannah Cifers,Fell to 10-5
5,"Decision, Unanimous","3 Rounds, 15:00 Total",6,Katlyn Chookagian,Climbed to 14-3,Prelim,125,3 x 5,Antonina Shevchenko,Fell to 8-2
6,"Decision, Unanimous","3 Rounds, 15:00 Total",5,Daniel Rodriguez,Climbed to 12-1,Prelim,170,3 x 5,Gabe Green,Fell to 9-3
7,"KO/TKO, Knee to the Body to Ground and Pound",1:51 Round 1 of 3,4,Jamahal Hill,Climbed to 8-0,Prelim,205,3 x 5,Klidson Abreu,Fell to 15-5
8,"Submission, Arm Triangle Choke","3:18 Round 2 of 3, 8:18 Total",3,Brandon Royval,Climbed to 11-4,Prelim,125,3 x 5,Tim Elliott,Fell to 15-11
9,"Submission, One-Arm Guillotine Choke",3:03 Round 1 of 3,2,Casey Kenney,Climbed to 14-2,Prelim,135,3 x 5,Louis Smolka,Fell to 16-7


The only problem is that there aren't any column names and there aren't any links to the bouts. First let's add the links and then add the column names.

In [33]:
bout_links = bout_info.find_all('a')
bout_links = [a.get('href') for a in bout_links]
len(bout_links)

33

So there are 3 links for each match. Probably one for each fighter and one for the match info. But which one is the match info?

In [34]:
bout_links[:6]

['/fightcenter/fighters/31168-gilbert-burns-durinho',
 '/fightcenter/bouts/501343-ufc-fight-night-tyron-the-chosen-one-woodley-vs-gilbert-durinho-burns',
 '/fightcenter/fighters/tyron-woodley-t-wood',
 '/fightcenter/fighters/44468-augusto-sakai',
 '/fightcenter/bouts/501441-ufc-fight-night-blagoy-baga-ivanov-vs-augusto-sakai',
 '/fightcenter/fighters/blagoi-ivanov']

It looks like it's the second link in every group of 3. I'm also reusing the grouping code from before, so I'm going to create a grouping function.

In [35]:
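def group(a, group_size):
    # Split list a into consecutive sublists of group_size items;
    # leftover items that don't fill a whole group are dropped.
    final_list = []
    for index in range(len(a) // group_size):
        final_list.append(a[index*group_size:index*group_size+group_size])
    return final_list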

Split the group by 3:

In [36]:
links_by_match = group(bout_links, 3)
links_by_match[0]

['/fightcenter/fighters/31168-gilbert-burns-durinho',
 '/fightcenter/bouts/501343-ufc-fight-night-tyron-the-chosen-one-woodley-vs-gilbert-durinho-burns',
 '/fightcenter/fighters/tyron-woodley-t-wood']

Take the 2nd element of each group:

In [37]:
match_links_only = [match[1] for match in links_by_match]
match_links_only[:2]

['/fightcenter/bouts/501343-ufc-fight-night-tyron-the-chosen-one-woodley-vs-gilbert-durinho-burns',
 '/fightcenter/bouts/501441-ufc-fight-night-blagoy-baga-ivanov-vs-augusto-sakai']
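Taking the second link of each group depends on the links always arriving in the same order. Since the bout links all share the '/fightcenter/bouts/' path prefix, filtering on that prefix is an alternative that does not rely on position. A quick sketch using the first six hrefs shown earlier; `BOUT_PREFIX` is a name I'm introducing for illustration.

```python
BOUT_PREFIX = "/fightcenter/bouts/"

# The first six hrefs from the bout_links output.
bout_links = [
    "/fightcenter/fighters/31168-gilbert-burns-durinho",
    "/fightcenter/bouts/501343-ufc-fight-night-tyron-the-chosen-one-woodley-vs-gilbert-durinho-burns",
    "/fightcenter/fighters/tyron-woodley-t-wood",
    "/fightcenter/fighters/44468-augusto-sakai",
    "/fightcenter/bouts/501441-ufc-fight-night-blagoy-baga-ivanov-vs-augusto-sakai",
    "/fightcenter/fighters/blagoi-ivanov",
]

# Keep only the hrefs that point at a bout page.
match_links = [href for href in bout_links if href.startswith(BOUT_PREFIX)]
print(match_links)
```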

Now I should be able to tack that onto the end of my match dataframe:

In [38]:
df_card['link'] = match_links_only
df_card

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,link
0,"Decision, Unanimous","5 Rounds, 25:00 Total",11,Gilbert Burns,Climbed to 19-3,Main Event,170,5 x 5,Tyron Woodley,Fell to 19-5,/fightcenter/bouts/501343-ufc-fight-night-tyro...
1,"Decision, Split","3 Rounds, 15:00 Total",10,Augusto Sakai,Climbed to 15-1,Co-Main Event,265,3 x 5,Blagoy Ivanov,Fell to 18-4,/fightcenter/bouts/501441-ufc-fight-night-blag...
2,"Decision, Unanimous","3 Rounds, 15:00 Total",9,Billy Quarantillo,Climbed to 14-2,Main Card,150,3 x 5,Spike Carlyle,Fell to 9-2,/fightcenter/bouts/502990-ufc-fight-night-bill...
3,"Submission, Rear Naked Choke","3:26 Round 2 of 3, 8:26 Total",8,Roosevelt Roberts,Climbed to 10-1,Main Card,155,3 x 5,Brok Weaver,Fell to 15-5,/fightcenter/bouts/502972-ufc-fight-night-roos...
4,"Submission, Kneebar",2:36 Round 1 of 3,7,Mackenzie Dern,Climbed to 8-1,Main Card,115,3 x 5,Hannah Cifers,Fell to 10-5,/fightcenter/bouts/500998-ufc-fight-night-mack...
5,"Decision, Unanimous","3 Rounds, 15:00 Total",6,Katlyn Chookagian,Climbed to 14-3,Prelim,125,3 x 5,Antonina Shevchenko,Fell to 8-2,/fightcenter/bouts/502988-ufc-fight-night-katl...
6,"Decision, Unanimous","3 Rounds, 15:00 Total",5,Daniel Rodriguez,Climbed to 12-1,Prelim,170,3 x 5,Gabe Green,Fell to 9-3,/fightcenter/bouts/503985-ufc-fight-night-dani...
7,"KO/TKO, Knee to the Body to Ground and Pound",1:51 Round 1 of 3,4,Jamahal Hill,Climbed to 8-0,Prelim,205,3 x 5,Klidson Abreu,Fell to 15-5,/fightcenter/bouts/500971-ufc-fight-night-jama...
8,"Submission, Arm Triangle Choke","3:18 Round 2 of 3, 8:18 Total",3,Brandon Royval,Climbed to 11-4,Prelim,125,3 x 5,Tim Elliott,Fell to 15-11,/fightcenter/bouts/502983-ufc-fight-night-tim-...
9,"Submission, One-Arm Guillotine Choke",3:03 Round 1 of 3,2,Casey Kenney,Climbed to 14-2,Prelim,135,3 x 5,Louis Smolka,Fell to 16-7,/fightcenter/bouts/502982-ufc-fight-night-case...


In [39]:
df_card.columns = ['method', 'length', 'order', 'fighter_1', 'record_1','bout_type', 'weight', 'scheduled_rounds', 'fighter_2', 'record_2', 'link']

In [40]:
df_card

Unnamed: 0,method,length,order,fighter_1,record_1,bout_type,weight,scheduled_rounds,fighter_2,record_2,link
0,"Decision, Unanimous","5 Rounds, 25:00 Total",11,Gilbert Burns,Climbed to 19-3,Main Event,170,5 x 5,Tyron Woodley,Fell to 19-5,/fightcenter/bouts/501343-ufc-fight-night-tyro...
1,"Decision, Split","3 Rounds, 15:00 Total",10,Augusto Sakai,Climbed to 15-1,Co-Main Event,265,3 x 5,Blagoy Ivanov,Fell to 18-4,/fightcenter/bouts/501441-ufc-fight-night-blag...
2,"Decision, Unanimous","3 Rounds, 15:00 Total",9,Billy Quarantillo,Climbed to 14-2,Main Card,150,3 x 5,Spike Carlyle,Fell to 9-2,/fightcenter/bouts/502990-ufc-fight-night-bill...
3,"Submission, Rear Naked Choke","3:26 Round 2 of 3, 8:26 Total",8,Roosevelt Roberts,Climbed to 10-1,Main Card,155,3 x 5,Brok Weaver,Fell to 15-5,/fightcenter/bouts/502972-ufc-fight-night-roos...
4,"Submission, Kneebar",2:36 Round 1 of 3,7,Mackenzie Dern,Climbed to 8-1,Main Card,115,3 x 5,Hannah Cifers,Fell to 10-5,/fightcenter/bouts/500998-ufc-fight-night-mack...
5,"Decision, Unanimous","3 Rounds, 15:00 Total",6,Katlyn Chookagian,Climbed to 14-3,Prelim,125,3 x 5,Antonina Shevchenko,Fell to 8-2,/fightcenter/bouts/502988-ufc-fight-night-katl...
6,"Decision, Unanimous","3 Rounds, 15:00 Total",5,Daniel Rodriguez,Climbed to 12-1,Prelim,170,3 x 5,Gabe Green,Fell to 9-3,/fightcenter/bouts/503985-ufc-fight-night-dani...
7,"KO/TKO, Knee to the Body to Ground and Pound",1:51 Round 1 of 3,4,Jamahal Hill,Climbed to 8-0,Prelim,205,3 x 5,Klidson Abreu,Fell to 15-5,/fightcenter/bouts/500971-ufc-fight-night-jama...
8,"Submission, Arm Triangle Choke","3:18 Round 2 of 3, 8:18 Total",3,Brandon Royval,Climbed to 11-4,Prelim,125,3 x 5,Tim Elliott,Fell to 15-11,/fightcenter/bouts/502983-ufc-fight-night-tim-...
9,"Submission, One-Arm Guillotine Choke",3:03 Round 1 of 3,2,Casey Kenney,Climbed to 14-2,Prelim,135,3 x 5,Louis Smolka,Fell to 16-7,/fightcenter/bouts/502982-ufc-fight-night-case...


## function testing

I tried turning this into a function so let's see if it works the same:

In [4]:
import src

previous_ufc = pd.read_csv('previous_ufc.csv')
first_link = previous_ufc.loc[0]['link']

event_url = 'https://www.tapology.com'+first_link

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
response = requests.get(event_url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

function_df = src.create_bouts_table(soup, first_link)

In [6]:
function_df

Unnamed: 0,method,length,order,fighter_1,record_1,bout_type,weight,scheduled_rounds,fighter_2,record_2,link,event_link
0,"Decision, Unanimous","5 Rounds, 25:00 Total",11,Gilbert Burns,Climbed to 19-3,Main Event,170,5 x 5,Tyron Woodley,Fell to 19-5,/fightcenter/bouts/501343-ufc-fight-night-tyro...,/fightcenter/events/69127-ufc-fight-night
1,"Decision, Split","3 Rounds, 15:00 Total",10,Augusto Sakai,Climbed to 15-1,Co-Main Event,265,3 x 5,Blagoy Ivanov,Fell to 18-4,/fightcenter/bouts/501441-ufc-fight-night-blag...,/fightcenter/events/69127-ufc-fight-night
2,"Decision, Unanimous","3 Rounds, 15:00 Total",9,Billy Quarantillo,Climbed to 14-2,Main Card,150,3 x 5,Spike Carlyle,Fell to 9-2,/fightcenter/bouts/502990-ufc-fight-night-bill...,/fightcenter/events/69127-ufc-fight-night
3,"Submission, Rear Naked Choke","3:26 Round 2 of 3, 8:26 Total",8,Roosevelt Roberts,Climbed to 10-1,Main Card,155,3 x 5,Brok Weaver,Fell to 15-5,/fightcenter/bouts/502972-ufc-fight-night-roos...,/fightcenter/events/69127-ufc-fight-night
4,"Submission, Kneebar",2:36 Round 1 of 3,7,Mackenzie Dern,Climbed to 8-1,Main Card,115,3 x 5,Hannah Cifers,Fell to 10-5,/fightcenter/bouts/500998-ufc-fight-night-mack...,/fightcenter/events/69127-ufc-fight-night
5,"Decision, Unanimous","3 Rounds, 15:00 Total",6,Katlyn Chookagian,Climbed to 14-3,Prelim,125,3 x 5,Antonina Shevchenko,Fell to 8-2,/fightcenter/bouts/502988-ufc-fight-night-katl...,/fightcenter/events/69127-ufc-fight-night
6,"Decision, Unanimous","3 Rounds, 15:00 Total",5,Daniel Rodriguez,Climbed to 12-1,Prelim,170,3 x 5,Gabe Green,Fell to 9-3,/fightcenter/bouts/503985-ufc-fight-night-dani...,/fightcenter/events/69127-ufc-fight-night
7,"KO/TKO, Knee to the Body to Ground and Pound",1:51 Round 1 of 3,4,Jamahal Hill,Climbed to 8-0,Prelim,205,3 x 5,Klidson Abreu,Fell to 15-5,/fightcenter/bouts/500971-ufc-fight-night-jama...,/fightcenter/events/69127-ufc-fight-night
8,"Submission, Arm Triangle Choke","3:18 Round 2 of 3, 8:18 Total",3,Brandon Royval,Climbed to 11-4,Prelim,125,3 x 5,Tim Elliott,Fell to 15-11,/fightcenter/bouts/502983-ufc-fight-night-tim-...,/fightcenter/events/69127-ufc-fight-night
9,"Submission, One-Arm Guillotine Choke",3:03 Round 1 of 3,2,Casey Kenney,Climbed to 14-2,Prelim,135,3 x 5,Louis Smolka,Fell to 16-7,/fightcenter/bouts/502982-ufc-fight-night-case...,/fightcenter/events/69127-ufc-fight-night


In [10]:
import numpy as np

np.datetime64('today')

numpy.datetime64('2020-06-06')

In [11]:
pd.to_datetime('today')

Timestamp('2020-06-06 09:34:39.136271')

In [24]:
pd.to_datetime('today') - pd.to_timedelta(1, unit='days')

Timestamp('2020-06-05 09:50:46.799647')

In [23]:
pd.to_timedelta(1, unit='days')

Timedelta('1 days 00:00:00')