# UFC FIGHT PREDICTOR (WebScraping)

Searching through other people's projects, especially on Kaggle, I found few good models for predicting UFC fights.
To make mine different, I am including a few factors that other datasets did not:

* height
* weight
* reach

If I cannot get good enough results with those, I will also add:

* record

_The reason we don't want to use record right now is that pulling each fighter's record at the time of the fight would be a whole other project. If we used record, it would be their record TODAY and not their record when the fight took place._

Even though UFC fights are somewhat unpredictable, because a good hit is a good hit, **my goal
is to make a model more accurate than the models I have already seen.**

# Importing Necessary Libraries

In [1]:
from bs4 import BeautifulSoup # Our webscraper
import pandas as pd # For our Data
import requests # To pull the website
import string # because I am lazy and not listing every single letter

# Building and Running our Scraper

In [2]:
# Website lists fighters A - Z so to go through each letter we have the alphabet listed
letters = list(string.ascii_lowercase) 
each_row = [] 

for letter in letters:
    # website goes by last name
    website = f"http://ufcstats.com/statistics/fighters?char={letter}&page=all"
    response = requests.get(website)
    
    # building the scraper for all table rows or 'tr'
    soup = BeautifulSoup(response.content, 'html.parser')
    rows = soup.find_all('tr', class_='b-statistics__table-row')
    
    # For each row finding all table data 'td' in the row
    for row in rows:
        name_info = row.find_all('td', class_="b-statistics__table-col")
        
        # Only keep rows that have all ten data columns (skips header and spacer rows)
        if len(name_info) >= 10:
            firstname = name_info[0].get_text(strip=True)
            lastname = name_info[1].get_text(strip = True)
            nickname = name_info[2].get_text(strip = True)
            height = name_info[3].get_text(strip = True)
            weight = name_info[4].get_text(strip = True)
            reach = name_info[5].get_text(strip = True)
            stance = name_info[6].get_text(strip = True)
            wins = name_info[7].get_text(strip = True)
            losses = name_info[8].get_text(strip = True)
            draws = name_info[9].get_text(strip = True)
            
            # Add to one large list to reform the row
            complete_row = [firstname, lastname, nickname, height, weight,
                            reach, stance, wins, losses, draws]
            
            # adding all rows to our empty list
            each_row.append(complete_row)

# Making our DataFrame

In [3]:
# Making the dataset and naming every column accordingly
ufc_fighters = pd.DataFrame(each_row)
ufc_fighters = ufc_fighters.rename(columns = {0: 'First', 1: 'Last', 2: 'Nickname', 3: 'HT', 4: 'WT', 5: 'Reach', 6: 'Stance', 
                                              7: 'Win', 8: 'Loss', 9: 'Draw'})
ufc_fighters.head()

| | First | Last | Nickname | HT | WT | Reach | Stance | Win | Loss | Draw |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tom | Aaron | | -- | 155 lbs. | -- | | 5 | 3 | 0 |
| 1 | Danny | Abbadi | The Assassin | 5' 11" | 155 lbs. | -- | Orthodox | 4 | 6 | 0 |
| 2 | Nariman | Abbasov | Bayraktar | 5' 8" | 155 lbs. | 66.0" | Orthodox | 28 | 4 | 0 |
| 3 | David | Abbott | Tank | 6' 0" | 265 lbs. | -- | Switch | 10 | 15 | 0 |
| 4 | Hamdy | Abdelwahab | The Hammer | 6' 2" | 264 lbs. | 72.0" | Southpaw | 5 | 0 | 0 |


In [4]:
ufc_fighters.tail()

| | First | Last | Nickname | HT | WT | Reach | Stance | Win | Loss | Draw |
|---|---|---|---|---|---|---|---|---|---|---|
| 4073 | Dave | Zitanick | | -- | 170 lbs. | -- | | 5 | 7 | 0 |
| 4074 | Alex | Zuniga | | -- | 145 lbs. | -- | | 6 | 3 | 0 |
| 4075 | George | Zuniga | | 5' 9" | 185 lbs. | -- | | 3 | 1 | 0 |
| 4076 | Allan | Zuniga | Tigre | 5' 7" | 155 lbs. | 70.0" | Orthodox | 13 | 1 | 0 |
| 4077 | Virgil | Zwicker | RezDog | 6' 2" | 205 lbs. | 74.0" | | 15 | 6 | 1 |


Thankfully, our dataset seems to have come out exactly how we wanted it to.
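
One way to back up that impression is a quick structural check on the scraped rows. The sketch below is only an assumption about what a healthy row looks like here (ten fields, with whole-number win, loss, and draw counts). `EXPECTED_COLUMNS`, `find_bad_rows`, and the deliberately broken third sample row are hypothetical and exist only for illustration.

```python
# Column names in the order the scraper collects them.
EXPECTED_COLUMNS = ["First", "Last", "Nickname", "HT", "WT",
                    "Reach", "Stance", "Win", "Loss", "Draw"]


def find_bad_rows(rows):
    """Return (index, reason) pairs for rows that do not look like fighter rows."""
    problems = []
    for i, row in enumerate(rows):
        # A row with the wrong number of fields cannot be labelled reliably.
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                (i, f"expected {len(EXPECTED_COLUMNS)} fields, got {len(row)}")
            )
            continue
        record = dict(zip(EXPECTED_COLUMNS, row))
        # Win/Loss/Draw are scraped as text, so check they are digit strings.
        for key in ("Win", "Loss", "Draw"):
            if not record[key].isdigit():
                problems.append((i, f"{key} is not a whole number: {record[key]!r}"))
    return problems


# Two rows copied from the output above, plus one made-up broken row.
sample_rows = [
    ["Tom", "Aaron", "", "--", "155 lbs.", "--", "", "5", "3", "0"],
    ["Danny", "Abbadi", "The Assassin", "5' 11\"", "155 lbs.", "--",
     "Orthodox", "4", "6", "0"],
    ["Broken", "Row", "", "--"],  # hypothetical bad row
]
print(find_bad_rows(sample_rows))
```

In the notebook this could be called as `find_bad_rows(each_row)`; an empty list would mean every row passed these two checks.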

# Exporting

In [5]:
ufc_fighters.to_csv('ufc_fighters.csv', index = False)

# Second Scraper for More Data

First we need to get the link to every completed event, since each event page lists the fights on that card

In [6]:
# Defining the website used and an empty list to hold the URL of each event page
website = "http://www.ufcstats.com/statistics/events/completed?page=all"
websites = []

# Once again requesting the website and scraping all 'a' tags since they contain the URLs
response = requests.get(website)
soup = BeautifulSoup(response.content, 'html.parser')
links = soup.find_all('a', class_ = "b-link b-link_style_black")

# The URL lives in each tag's href attribute, so we pull it out and add it to the list
for link in links:
    url = link['href']
    websites.append(url)

Now that we have the link to every event, we need to pull the fighters in every fight from each event page

In [7]:
# Now that we have the link to every event we scrape each event page for its fights
fights = []

# For every event page we scrape the fighter names
for site in websites:
    response = requests.get(site)
    soup = BeautifulSoup(response.content, 'html.parser')
    fighters = soup.find_all('a', class_ = 'b-link b-link_style_black')
    
    # Extracting who fought in each fight (the site lists the winner first, so odd positions won and even positions lost)
    for fight in fighters:
        text = fight.text.strip()
        fights.append(text)
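
The loop above sends one request per event page back to back, with no pause and no handling for a page that fails to load. A minimal sketch of a gentler pattern follows, assuming a short delay and a few retries are acceptable. `fetch_all`, `delay_seconds`, and `max_retries` are hypothetical names, and `fetch` stands for any callable that returns a page body or raises on failure.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, max_retries=3):
    """Fetch each URL in turn, pausing between requests and retrying failures.

    Returns (pages, failed): a dict of url -> body, and a list of URLs
    for which every attempt raised an exception.
    """
    pages = {}
    failed = []
    for url in urls:
        for _ in range(max_retries):
            try:
                pages[url] = fetch(url)
                break  # success, stop retrying this URL
            except Exception:
                time.sleep(delay_seconds)  # wait before the next attempt
        else:
            # The inner loop never hit `break`, so every attempt failed.
            failed.append(url)
        time.sleep(delay_seconds)  # pause before moving to the next page
    return pages, failed


# With requests, `fetch` could be written like this (not run here):
#
# def fetch(url):
#     response = requests.get(url, timeout=10)
#     response.raise_for_status()
#     return response.content
#
# pages, failed = fetch_all(websites, fetch)
```

Passing the fetching function in as an argument keeps the retry logic separate from the network call, which also makes it easy to try out with a stand-in function.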

In [8]:
# Two fighters in each fight, so dividing by 2 gives the number of fights
len(fights)/2

7245.0

# Making the Dataset

In [9]:
# Making empty lists and n variable
winners = []
losers = []
n = 1

# The data is ordered so that every odd position (1st, 3rd, ...) is the winner and every even position is the loser
for fighters in fights:
    if n%2 == 0:
        losers.append(fighters)
    else:
        winners.append(fighters)
    n += 1        

In [10]:
# If equal it most likely worked
len(winners) == len(losers)

True
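
The odd/even split and the length check can also be written with list slicing. This is an equivalent sketch rather than a replacement for the cells above, and it relies on the same winner-first ordering. `split_winners_losers` and `sample_fights` are hypothetical names; the sample names are taken from this notebook's own output.

```python
def split_winners_losers(names):
    """Split a flat winner, loser, winner, loser, ... list into two lists."""
    # An odd number of names means at least one fight is missing a fighter.
    if len(names) % 2 != 0:
        raise ValueError(f"expected an even number of names, got {len(names)}")
    # Positions 0, 2, 4, ... are winners; positions 1, 3, 5, ... are losers.
    return names[0::2], names[1::2]


sample_fights = ["Tom Aspinall", "Marcin Tybura",
                 "Julija Stoliarenko", "Molly McCann"]
sample_winners, sample_losers = split_winners_losers(sample_fights)
print(sample_winners)
print(sample_losers)
```

Raising on an odd-length list catches the same problem as the equality check, but before the two lists are built.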

In [11]:
# As a last check, the last fighter who won should be "Scott Morris"
# and the last to lose should be "Sean Daugherty"
winners[-1], losers[-1]

('Scott Morris', 'Sean Daugherty')

**SUCCESS**

# Making our Winners and Losers Dataset

In [12]:
# Turn into dictionary
ufc_fights = {
    'Winners': winners,
    'Losers': losers
}

# Convert to pandas because pandas is our friend and easier to use
ufc_fights = pd.DataFrame(ufc_fights)
ufc_fights.head()

| | Winners | Losers |
|---|---|---|
| 0 | Tom Aspinall | Marcin Tybura |
| 1 | Julija Stoliarenko | Molly McCann |
| 2 | Nathaniel Wood | Andre Fili |
| 3 | Paul Craig | Andre Muniz |
| 4 | Fares Ziam | Jai Herbert |


In [13]:
ufc_fights.to_csv('fights.csv')

The next step is to organize our data
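
As one possible starting point, the `HT`, `WT`, and `Reach` columns are scraped as text and would need to become numbers before a model can use them. The sketch below assumes only the formats visible in the sample output earlier (`5' 11"`, `155 lbs.`, `66.0"`, and `--` for a missing value). The function names are hypothetical, and anything that does not match those formats is returned as `None`.

```python
import re


def height_to_inches(text):
    """Convert a height such as 5' 11" to total inches, or None if missing."""
    match = re.fullmatch(r"(\d+)'\s*(\d+)\"", text.strip())
    if match is None:
        return None
    feet, inches = match.groups()
    return int(feet) * 12 + int(inches)


def weight_to_lbs(text):
    """Convert a weight such as '155 lbs.' to an integer, or None if missing."""
    match = re.fullmatch(r"(\d+)\s*lbs\.?", text.strip())
    return int(match.group(1)) if match else None


def reach_to_inches(text):
    """Convert a reach such as 66.0" to a float, or None if missing."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\"", text.strip())
    return float(match.group(1)) if match else None


print(height_to_inches("5' 11\""), weight_to_lbs("155 lbs."), reach_to_inches("66.0\""))
print(height_to_inches("--"), weight_to_lbs("--"), reach_to_inches("--"))
```

With pandas, each function could then be applied to its column, for example `ufc_fighters['HT'].map(height_to_inches)`.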