### Purpose

The goal of this notebook is to merge the wonderful rankings data compiled here: https://www.kaggle.com/martj42/ufc-rankings with my match data.

In [1]:
import pandas as pd
from datetime import timedelta
import numpy as np

# 1. Load the Data

### Load the match data

In [2]:
match_df = pd.read_csv("../data/ufc-master.csv")

# Keep only fights won by Red or Blue (this drops draws and no contests).
# Filtering first and taking a .copy() avoids pandas' SettingWithCopyWarning.
match_df = match_df[match_df['Winner'].isin(['Red', 'Blue'])].copy()

# Encode the label: 0 = Red won, 1 = Blue won
match_df['label'] = match_df['Winner'].map({'Red': 0, 'Blue': 1})

# Let's fix the date
match_df['date'] = pd.to_datetime(match_df['date'])



### Load the rankings data

In [3]:
rankings_df = pd.read_csv("../data/rankings_history.csv")
rankings_df['date'] = pd.to_datetime(rankings_df['date'])

In [4]:
weightclass_list = rankings_df.weightclass.unique()
print(weightclass_list)

['Pound-for-Pound' 'Flyweight' 'Bantamweight' 'Featherweight'
 'Lightweight' 'Welterweight' 'Middleweight' 'Light Heavyweight'
 'Heavyweight' "Women's Bantamweight" "Women's Strawweight"
 "Women's Featherweight" "Women's Flyweight"]


The merged dataframe will contain all of the columns from the match dataframe, plus the following new columns:

* B_Pound-for-Pound_rank
* B_Flyweight_rank
* B_Bantamweight_rank
* B_Featherweight_rank
* B_Lightweight_rank
* B_Welterweight_rank
* B_Middleweight_rank
* B_Light Heavyweight_rank
* B_Heavyweight_rank
* B_Women's Bantamweight_rank
* B_Women's Strawweight_rank
* B_Women's Featherweight_rank
* B_Women's Flyweight_rank
* R_Pound-for-Pound_rank
* R_Flyweight_rank
* R_Bantamweight_rank
* R_Featherweight_rank
* R_Lightweight_rank
* R_Welterweight_rank
* R_Middleweight_rank
* R_Light Heavyweight_rank
* R_Heavyweight_rank
* R_Women's Bantamweight_rank
* R_Women's Strawweight_rank
* R_Women's Featherweight_rank
* R_Women's Flyweight_rank

* R_match_weightclass_rank
* B_match_weightclass_rank

* better_rank

The first batch of columns holds the fighter's rank in each weightclass going into the fight. I chose this over a single 'rank' column matched to the fight's weightclass because a fighter can be ranked in multiple weightclasses at once, and that might give them an advantage that a model should be able to discover. The 'R_' or 'B_' prefix refers to the red or blue fighter.

R_match_weightclass_rank and B_match_weightclass_rank hold each fighter's rank in the weightclass the current match is contested in.

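better_rank will be one of 'Red', 'Blue' or 'neither', denoting the better-ranked fighter (the lower rank number), with 'neither' when neither fighter is ranked in the match's weightclass.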
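Since the per-weightclass column names follow a fixed "<corner>_<weightclass>_rank" pattern, they can be generated instead of typed out. A minimal sketch, where the list literal is copied from the weightclass_list output printed above and rank_columns / extra_columns are just illustrative names:

```python
# Copied from the weightclass_list output above
weightclass_list = ['Pound-for-Pound', 'Flyweight', 'Bantamweight', 'Featherweight',
                    'Lightweight', 'Welterweight', 'Middleweight', 'Light Heavyweight',
                    'Heavyweight', "Women's Bantamweight", "Women's Strawweight",
                    "Women's Featherweight", "Women's Flyweight"]

# One column per corner per weightclass, e.g. "B_Flyweight_rank"
rank_columns = [f"{corner}_{wc}_rank" for corner in ('B', 'R') for wc in weightclass_list]

# The match-specific columns and the comparison column
extra_columns = ['R_match_weightclass_rank', 'B_match_weightclass_rank', 'better_rank']

print(len(rank_columns))  # 26
print(rank_columns[0], '...', rank_columns[-1])
```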


# 2. Now... How do we combine the two dataframes?

We have date information in both dataframes, so I will use that. We will get a list of all dates in the rankings dataframe. For each match, we then take the most recent rankings published before the match date and check whether either fighter's name appears in them.
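As an alternative to scanning date_list row by row, pandas.merge_asof can attach the most recent earlier snapshot to every match in one pass. This is a sketch on hypothetical toy frames (fighter names and ranks are made up), shown for the red corner only; it swaps in merge_asof in place of the approach used below rather than reproducing it:

```python
import pandas as pd

# Hypothetical toy stand-ins for rankings_df and match_df (same column names)
rankings = pd.DataFrame({
    'date': pd.to_datetime(['2013-02-04', '2013-02-04', '2013-02-11']),
    'weightclass': ['Lightweight', 'Lightweight', 'Lightweight'],
    'fighter': ['Fighter A', 'Fighter B', 'Fighter B'],
    'rank': [1, 2, 3],
})
matches = pd.DataFrame({
    'date': pd.to_datetime(['2013-01-01', '2013-02-11', '2013-03-01']),
    'R_fighter': ['Fighter A', 'Fighter A', 'Fighter B'],
    'weight_class': ['Lightweight'] * 3,
})

# Step 1: attach the latest snapshot strictly before each match date
# (allow_exact_matches=False skips a snapshot published on the match day itself)
snapshots = pd.DataFrame({'snapshot': rankings['date'].drop_duplicates().sort_values()})
matches = pd.merge_asof(matches.sort_values('date'), snapshots,
                        left_on='date', right_on='snapshot',
                        direction='backward', allow_exact_matches=False)

# Step 2: look up the red fighter's rank in that snapshot and weightclass;
# the left join leaves NaN where the fighter is unranked or no snapshot exists
red_ranks = rankings.rename(columns={'date': 'snapshot', 'fighter': 'R_fighter',
                                     'weightclass': 'weight_class',
                                     'rank': 'R_match_weightclass_rank'})
matches = matches.merge(red_ranks, on=['snapshot', 'R_fighter', 'weight_class'],
                        how='left')
print(matches[['date', 'R_fighter', 'snapshot', 'R_match_weightclass_rank']])
```

Doing this in two steps is deliberate: passing by=['R_fighter'] to merge_asof would instead find the last snapshot in which that fighter appeared, which could carry a stale rank forward after they dropped out of the rankings.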

First, let's get the list of dates for which we have ranking data.


In [5]:
print(rankings_df.columns)

Index(['date', 'weightclass', 'fighter', 'rank'], dtype='object')


In [6]:
date_list = rankings_df.date.unique()
display(date_list)

array(['2013-02-04T00:00:00.000000000', '2013-02-11T00:00:00.000000000',
       '2013-02-12T00:00:00.000000000', '2013-02-18T00:00:00.000000000',
       '2013-02-25T00:00:00.000000000', '2013-03-18T00:00:00.000000000',
       '2013-04-08T00:00:00.000000000', '2013-04-22T00:00:00.000000000',
       '2013-05-02T00:00:00.000000000', '2013-06-10T00:00:00.000000000',
       '2013-07-08T00:00:00.000000000', '2013-07-29T00:00:00.000000000',
       '2013-08-05T00:00:00.000000000', '2013-08-30T00:00:00.000000000',
       '2013-09-06T00:00:00.000000000', '2013-09-23T00:00:00.000000000',
       '2013-10-11T00:00:00.000000000', '2013-10-21T00:00:00.000000000',
       '2013-10-28T00:00:00.000000000', '2013-11-08T00:00:00.000000000',
       '2013-11-11T00:00:00.000000000', '2013-11-18T00:00:00.000000000',
       '2013-12-09T00:00:00.000000000', '2013-12-16T00:00:00.000000000',
       '2013-12-30T00:00:00.000000000', '2014-01-17T00:00:00.000000000',
       '2014-01-27T00:00:00.000000000', '2014-02-03

In [7]:
print(min(date_list))

2013-02-04T00:00:00.000000000


Our match data goes back a few years earlier than the ranking data, but that isn't a big deal. We just have to write code that doesn't raise an error when it can't find appropriate ranking data.
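One way to handle the missing-data case explicitly is a binary search over the sorted snapshot dates with numpy.searchsorted: an insertion index of 0 means the match predates every snapshot. A sketch with hypothetical dates; previous_snapshot is an illustrative helper, not something defined elsewhere in this notebook:

```python
import numpy as np

# Hypothetical snapshot dates; searchsorted needs them sorted ascending,
# so sort date_list first if its order isn't guaranteed
snapshot_dates = np.array(['2013-02-04', '2013-02-11', '2013-02-18'],
                          dtype='datetime64[ns]')

def previous_snapshot(match_date):
    """Return the latest snapshot strictly before match_date, or None."""
    target = np.datetime64(match_date, 'ns')
    # side='left' returns the index of an exact match itself,
    # so idx - 1 always points at a strictly earlier snapshot
    idx = np.searchsorted(snapshot_dates, target, side='left')
    if idx == 0:
        return None  # the match predates all ranking data
    return snapshot_dates[idx - 1]

print(previous_snapshot('2012-12-01'))  # None
print(previous_snapshot('2013-02-11'))  # the 2013-02-04 snapshot
print(previous_snapshot('2020-06-13'))  # the last snapshot, 2013-02-18
```

This also returns the final snapshot for matches fought after the last rankings date, rather than treating them as unranked.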

Let's try to look smart and see if we can figure this out using a lambda function.

In [8]:
display(rankings_df.head())

Unnamed: 0,date,weightclass,fighter,rank
0,2013-02-04,Pound-for-Pound,Anderson Silva,1
1,2013-02-04,Pound-for-Pound,Jon Jones,2
2,2013-02-04,Pound-for-Pound,Georges St-Pierre,3
3,2013-02-04,Pound-for-Pound,Jose Aldo,4
4,2013-02-04,Pound-for-Pound,Benson Henderson,5


In [9]:
display(match_df.columns)

Index(['R_fighter', 'B_fighter', 'R_odds', 'B_odds', 'R_ev', 'B_ev', 'date',
       'location', 'country', 'Winner', 'title_bout', 'weight_class', 'gender',
       'no_of_rounds', 'B_current_lose_streak', 'B_current_win_streak',
       'B_draw', 'B_avg_SIG_STR_landed', 'B_avg_SIG_STR_pct', 'B_avg_SUB_ATT',
       'B_avg_TD_landed', 'B_avg_TD_pct', 'B_longest_win_streak', 'B_losses',
       'B_total_rounds_fought', 'B_total_title_bouts',
       'B_win_by_Decision_Majority', 'B_win_by_Decision_Split',
       'B_win_by_Decision_Unanimous', 'B_win_by_KO/TKO', 'B_win_by_Submission',
       'B_win_by_TKO_Doctor_Stoppage', 'B_wins', 'B_Stance', 'B_Height_cms',
       'B_Reach_cms', 'B_Weight_lbs', 'R_current_lose_streak',
       'R_current_win_streak', 'R_draw', 'R_avg_SIG_STR_landed',
       'R_avg_SIG_STR_pct', 'R_avg_SUB_ATT', 'R_avg_TD_landed', 'R_avg_TD_pct',
       'R_longest_win_streak', 'R_losses', 'R_total_rounds_fought',
       'R_total_title_bouts', 'R_win_by_Decision_Majority',
  

In [10]:
def return_rank(fighter_name, date, wc):
    """Return the fighter's rank in weightclass `wc` from the most recent
    rankings snapshot published strictly before `date`.

    Returns '' when the fighter is unranked in that snapshot or when the
    match predates the first snapshot.
    """
    # Snapshots published before the match (date_list holds datetime64 values)
    earlier = date_list[date_list < pd.Timestamp(date).to_datetime64()]
    if len(earlier) == 0:
        # No rankings exist yet for this match date
        return ''
    previous_d = earlier.max()

    # Narrow that snapshot down to this weightclass and fighter
    mask = ((rankings_df['date'] == previous_d) &
            (rankings_df['weightclass'] == wc) &
            (rankings_df['fighter'] == fighter_name))
    matches = rankings_df.loc[mask, 'rank']
    if len(matches) > 0:
        return int(matches.iloc[0])
    return ''
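return_rank filters the whole rankings_df once per row, and it is applied once per row for every rank column, which can get slow. One possible speed-up, sketched below on the assumption that the snapshot date has already been chosen, is to build a dictionary keyed by (date, weightclass, fighter) once and then do constant-time lookups. The toy data and the names rank_lookup and lookup_rank are hypothetical:

```python
import pandas as pd

# Hypothetical miniature rankings table with the same columns as rankings_df
rankings = pd.DataFrame({
    'date': pd.to_datetime(['2013-02-04', '2013-02-04', '2013-02-11']),
    'weightclass': ['Flyweight', 'Pound-for-Pound', 'Flyweight'],
    'fighter': ['Fighter A', 'Fighter A', 'Fighter B'],
    'rank': [1, 4, 2],
})

# Build the lookup once: (snapshot date, weightclass, fighter) -> rank
rank_lookup = {
    (d, wc, f): int(r)
    for d, wc, f, r in rankings[['date', 'weightclass', 'fighter', 'rank']].itertuples(index=False)
}

def lookup_rank(snapshot, fighter_name, wc):
    """Return the rank, or '' (the same convention as return_rank) if unranked."""
    return rank_lookup.get((pd.Timestamp(snapshot), wc, fighter_name), '')

print(lookup_rank('2013-02-04', 'Fighter A', 'Pound-for-Pound'))  # 4
print(repr(lookup_rank('2013-02-11', 'Fighter A', 'Flyweight')))  # '' (unranked)
```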


In [11]:
match_df['B_match_weightclass_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         x['weight_class']),axis=1)



In [12]:
match_df['R_match_weightclass_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         x['weight_class']),axis=1)


In [13]:
match_df['R_Women\'s Flyweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Flyweight'),axis=1)



In [14]:
match_df['R_Women\'s Featherweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Featherweight'),axis=1)



In [15]:
match_df['R_Women\'s Strawweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Strawweight'),axis=1)



In [16]:
match_df['R_Women\'s Bantamweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Bantamweight'),axis=1)



In [17]:
match_df['R_Heavyweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Heavyweight'),axis=1)



In [18]:
match_df['R_Light Heavyweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Light Heavyweight'),axis=1)



In [19]:
match_df['R_Middleweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Middleweight'),axis=1)



In [20]:
match_df['R_Welterweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Welterweight'),axis=1)



In [21]:
match_df['R_Lightweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Lightweight'),axis=1)



In [22]:
match_df['R_Featherweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Featherweight'),axis=1)



In [23]:
match_df['R_Bantamweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Bantamweight'),axis=1)



In [24]:
match_df['R_Flyweight_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Flyweight'),axis=1)



In [25]:
match_df['R_Pound-for-Pound_rank'] = match_df.apply(lambda x: return_rank(x['R_fighter'],
                                                                         x['date'],
                                                                         'Pound-for-Pound'),axis=1)



In [26]:
match_df['B_Women\'s Flyweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Flyweight'),axis=1)



In [27]:
match_df['B_Women\'s Featherweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Featherweight'),axis=1)

In [28]:
match_df['B_Women\'s Strawweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Strawweight'),axis=1)



In [29]:
match_df['B_Women\'s Bantamweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Women\'s Bantamweight'),axis=1)



In [30]:
match_df['B_Heavyweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Heavyweight'),axis=1)



In [31]:
match_df['B_Light Heavyweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Light Heavyweight'),axis=1)



In [32]:
match_df['B_Middleweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Middleweight'),axis=1)



In [33]:
match_df['B_Welterweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Welterweight'),axis=1)



In [34]:
match_df['B_Lightweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Lightweight'),axis=1)



In [35]:
match_df['B_Featherweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Featherweight'),axis=1)



In [36]:
match_df['B_Bantamweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Bantamweight'),axis=1)



In [37]:
match_df['B_Flyweight_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Flyweight'),axis=1)



In [38]:
match_df['B_Pound-for-Pound_rank'] = match_df.apply(lambda x: return_rank(x['B_fighter'],
                                                                         x['date'],
                                                                         'Pound-for-Pound'),axis=1)



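The cells from In [13] through In [38] differ only in the corner and the weightclass string, so they could be collapsed into a loop. A minimal sketch: add_rank_columns is a hypothetical helper, and the toy frame and fake_rank stand in for match_df and return_rank:

```python
import pandas as pd

def add_rank_columns(match_df, weightclasses, rank_fn):
    """Add one '<corner>_<weightclass>_rank' column per corner and weightclass.

    rank_fn(fighter_name, date, wc) is expected to behave like return_rank.
    """
    for corner in ('R', 'B'):
        for wc in weightclasses:
            # apply() runs immediately, so the lambda sees the current corner and wc
            match_df[f'{corner}_{wc}_rank'] = match_df.apply(
                lambda x: rank_fn(x[f'{corner}_fighter'], x['date'], wc), axis=1)
    return match_df

# Toy usage with hypothetical data and a stand-in ranking function
def fake_rank(name, date, wc):
    return 3 if (name, wc) == ('Fighter A', 'Flyweight') else ''

toy = pd.DataFrame({'R_fighter': ['Fighter A'], 'B_fighter': ['Fighter B'],
                    'date': pd.to_datetime(['2019-01-01'])})
toy = add_rank_columns(toy, ['Flyweight', 'Bantamweight'], fake_rank)
print(toy.columns.tolist())
```

On the real data this would be called as match_df = add_rank_columns(match_df, weightclass_list, return_rank).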


In [73]:
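def return_better_rank(r_rank, b_rank):
    """Return 'Red' or 'Blue' for the better (lower-numbered) rank, else 'neither'.

    An empty string means the fighter was unranked for this match.
    """
    if r_rank == '' and b_rank == '':
        return 'neither'
    if r_rank == '':
        return 'Blue'
    if b_rank == '':
        return 'Red'
    r_rank = int(r_rank)
    b_rank = int(b_rank)
    if r_rank < b_rank:
        return 'Red'
    if b_rank < r_rank:
        return 'Blue'
    # Equal ranks (e.g. a tie in the rankings): neither fighter is better ranked
    return 'neither'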

In [74]:
match_df['better_rank'] = match_df.apply(lambda x: return_better_rank(x['R_match_weightclass_rank'],
                                                                         x['B_match_weightclass_rank']),axis=1)
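return_better_rank is applied row by row; the same rule can also be written as a vectorized comparison with numpy.select. A sketch, assuming '' (or NaN) marks an unranked fighter; better_rank_vectorized is a hypothetical name, and it returns 'neither' when both ranks are equal:

```python
import numpy as np
import pandas as pd

def better_rank_vectorized(r_rank, b_rank):
    """Vectorized version of the better-rank rule; '' or NaN means unranked."""
    # errors='coerce' turns '' into NaN so the comparisons below are numeric
    r = pd.to_numeric(r_rank, errors='coerce')
    b = pd.to_numeric(b_rank, errors='coerce')
    conditions = [
        r.notna() & (b.isna() | (r < b)),  # only Red ranked, or Red ranked higher
        b.notna() & (r.isna() | (b < r)),  # only Blue ranked, or Blue ranked higher
    ]
    return pd.Series(np.select(conditions, ['Red', 'Blue'], default='neither'),
                     index=r_rank.index)

r = pd.Series([5, '', '', 3, 2], dtype=object)
b = pd.Series(['', 7, '', 1, 2], dtype=object)
print(better_rank_vectorized(r, b).tolist())
# ['Red', 'Blue', 'neither', 'Blue', 'neither']
```

On the notebook's data: match_df['better_rank'] = better_rank_vectorized(match_df['R_match_weightclass_rank'], match_df['B_match_weightclass_rank']).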

In [75]:
display(match_df.head())

Unnamed: 0,R_fighter,B_fighter,R_odds,B_odds,R_ev,B_ev,date,location,country,Winner,...,B_Heavyweight_rank,B_Light Heavyweight_rank,B_Middleweight_rank,B_Welterweight_rank,B_Lightweight_rank,B_Featherweight_rank,B_Bantamweight_rank,B_Flyweight_rank,B_Pound-for-Pound_rank,better_rank
0,Jessica Eye,Cynthia Calvillo,120.0,-130.0,120.0,76.923077,2020-06-13,"Las Vegas, Nevada, USA",USA,Blue,...,,,,,,,,,,Red
1,Karl Roberson,Marvin Vettori,210.0,-230.0,210.0,43.478261,2020-06-13,"Las Vegas, Nevada, USA",USA,Blue,...,,,,,,,,,,neither
2,Charles Rosa,Kevin Aguilar,170.0,-185.0,170.0,54.054054,2020-06-13,"Las Vegas, Nevada, USA",USA,Red,...,,,,,,,,,,neither
3,Andre Fili,Charles Jourdain,-220.0,200.0,45.454545,200.0,2020-06-13,"Las Vegas, Nevada, USA",USA,Red,...,,,,,,,,,,neither
4,Jordan Espinosa,Mark De La Rosa,-167.0,157.0,59.88024,157.0,2020-06-13,"Las Vegas, Nevada, USA",USA,Red,...,,,,,,,,,,neither


In [92]:
test = (match_df.iloc[1384])

In [93]:
display(test[['R_fighter', 'R_match_weightclass_rank', 'B_fighter', 'B_match_weightclass_rank', 'date', 'better_rank']])


R_fighter                          Tecia Torres
R_match_weightclass_rank                      5
B_fighter                          Juliana Lima
B_match_weightclass_rank                       
date                        2017-07-07 00:00:00
better_rank                                 Red
Name: 1399, dtype: object

In [106]:
match_df.to_csv('test.csv')

### Take a quick look at how the better-ranked fighter does:

In [105]:
temp_df = match_df[match_df['better_rank']=='Red'].copy()
red_favorite_count = (len(temp_df))
temp_df = temp_df[temp_df['Winner']=='Red']
red_winner_count = len(temp_df)

red_pct = (red_winner_count / red_favorite_count)

temp_df = match_df[match_df['better_rank']=='Blue'].copy()
blue_favorite_count = (len(temp_df))
temp_df = temp_df[temp_df['Winner']=='Blue']
blue_winner_count = len(temp_df)

blue_pct = (blue_winner_count / blue_favorite_count)
print('When Red has the better rank they win ', "{:.2f}".format(red_pct*100), '% of the time')
print('When Blue has the better rank they win ', "{:.2f}".format(blue_pct*100), '% of the time')


When Red has the better rank they win  61.13 % of the time
When Blue has the better rank they win  55.88 % of the time
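The two percentages above can also come from a single groupby, which reports the number of fights each percentage is based on alongside it. A sketch on hypothetical toy data; better_rank_win_rate is an illustrative name:

```python
import pandas as pd

def better_rank_win_rate(df):
    """Share of fights won by the better-ranked corner, split by corner."""
    ranked = df[df['better_rank'].isin(['Red', 'Blue'])]
    won = ranked['Winner'] == ranked['better_rank']
    # 'mean' is the win rate, 'size' the number of fights behind it
    return won.groupby(ranked['better_rank']).agg(['mean', 'size'])

# Hypothetical toy data; on the real notebook call better_rank_win_rate(match_df)
toy = pd.DataFrame({'better_rank': ['Red', 'Red', 'Blue', 'neither'],
                    'Winner':      ['Red', 'Blue', 'Blue', 'Red']})
print(better_rank_win_rate(toy))
```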


In [108]:
print(blue_favorite_count)

68
