# Purpose
This notebook details how to calculate the 3 fight averages for all of the statistics below.
This contrasts with the career average in that it provides a measure of how the fighter
is currently doing. If a fighter was very successful early in their career but has been
on a slump, the career average may still look good, but their 3-fight average will show
that they have not been doing well recently.

### Result
This notebook will create a dataframe that will have a row for each bout and will include:
 - Features:
     - career average successful significant strikes for each fighter (ca_ASSS)
     - career average significant strike accuracy (ca_ASSA)
     - career average significant strike defense (ca_ASSDe)
     - career average significant strike differential (ca_ASSDi)
     - 3-fight-average successful significant strikes for each fighter (fa3_ASSS)
     - 3-fight-average significant strike accuracy (fa3_ASSA)
     - 3-fight-average significant strike defense (fa3_ASSDe)
     - 3-fight-average significant strike differential (fa3_ASSDi)
 - Target:
     - combined average successful significant strikes for a sing bout (CASSS_bout)
 

In [1]:
%load_ext autoreload
%autoreload 2

import os
import sys
module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

import pandas as pd
from sqlalchemy import create_engine
from src import local
from src import functions

In [2]:
# Credentials
USER = local.user 
PASS = local.password
HOST = local.host
PORT = local.port

#create engine
engine = create_engine(f'postgresql://{USER}:{PASS}@{HOST}:{PORT}/match_finder')

# Get data from postgres database
### Join tables

- get the date from the events table
- use the bouts table to join the dates to the general table
- use the general table to join the bouts with the fighters


#### Accuracy case statement

In [3]:
accuracy_column = """
CASE 
    WHEN (sig_str_attempted > 0) THEN (CAST(sig_str_successful AS FLOAT)/CAST(sig_str_attempted AS FLOAT))
    ELSE 0
END AS accuracy
"""

In [4]:
query = """
SELECT bout_link, fighter_link, sig_str_attempted, sig_str_successful, "Date", round,
"""+accuracy_column+"""
FROM strikes_cleaned
JOIN bouts ON bouts.link = strikes_cleaned.bout_link
JOIN events ON events.link = bouts.event_link
"""

data = pd.read_sql(query, engine)

In [5]:
data = functions.format_data(data, event=False)

In [6]:
data

Unnamed: 0,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id
0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51
6,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.500000,e1147d3d2dabe1ce,11f715fa5e825e51
12,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.406250,e1147d3d2dabe1ce,11f715fa5e825e51
18,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51
24,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51
...,...,...,...,...,...,...,...,...,...
26447,http://www.ufcstats.com/fight-details/cecdc0da...,http://www.ufcstats.com/fighter-details/a5c53b...,0,0,1993-11-12,1,0.000000,a5c53b3ddb31cc7d,cecdc0da584274b9
26448,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/598a58...,27,15,1993-11-12,1,0.555556,598a58db87b890ee,2d2bbc86e941e05c
26449,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/d3711d...,28,12,1993-11-12,1,0.428571,d3711d3784b76255,2d2bbc86e941e05c
26450,http://www.ufcstats.com/fight-details/567a09fd...,http://www.ufcstats.com/fighter-details/279093...,5,3,1993-11-12,1,0.600000,279093302a6f44b3,567a09fd200cfa05


In order to get the striking defense, we need each column to include the fighters opponent. 

In [7]:
data_0 = functions.merge_fighter_instances(data, rounds=True)
data_1 = functions.merge_fighter_instances(data, rounds=True, flip=True)

data = pd.concat((data_0, data_1))

In [8]:
data = data.loc[:, ['bout_link_0', 'fighter_link_0', 'sig_str_attempted_0',
                'sig_str_successful_0', 'Date_0', 'round_0', 'accuracy_0',
                'fighter_id_0', 'bout_id_0', 'round_id', 'inst_id_0', 
                'sig_str_attempted_1', 'sig_str_successful_1', 'accuracy_1']]

In [9]:
data

Unnamed: 0,bout_link_0,fighter_link_0,sig_str_attempted_0,sig_str_successful_0,Date_0,round_0,accuracy_0,fighter_id_0,bout_id_0,round_id,inst_id_0,sig_str_attempted_1,sig_str_successful_1,accuracy_1
0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.500000
1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.500000,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.250000
2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.406250,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714
3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053
4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13102,http://www.ufcstats.com/fight-details/ac7ca2ec...,http://www.ufcstats.com/fighter-details/279093...,17,11,1993-11-12,1,0.647059,279093302a6f44b3,ac7ca2ec38b96c1a,ac7ca2ec38b96c1a1,ac7ca2ec38b96c1a279093302a6f44b3,3,0,0.000000
13103,http://www.ufcstats.com/fight-details/46acd54c...,http://www.ufcstats.com/fighter-details/46c8ec...,8,4,1993-11-12,1,0.500000,46c8ec317aff28ac,46acd54cc0c905fb,46acd54cc0c905fb1,46acd54cc0c905fb46c8ec317aff28ac,1,1,1.000000
13104,http://www.ufcstats.com/fight-details/cecdc0da...,http://www.ufcstats.com/fighter-details/429e7d...,3,0,1993-11-12,1,0.000000,429e7d3725852ce9,cecdc0da584274b9,cecdc0da584274b91,cecdc0da584274b9429e7d3725852ce9,0,0,0.000000
13105,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/598a58...,27,15,1993-11-12,1,0.555556,598a58db87b890ee,2d2bbc86e941e05c,2d2bbc86e941e05c1,2d2bbc86e941e05c598a58db87b890ee,28,12,0.428571


### Calculate significant strike defense by subtracting the opponents accuracy from 1

In [10]:
data['ssd_0'] = 1 - data['accuracy_1']

Clean the columns for continuity

In [11]:
data.columns = ['bout_link', 'fighter_link', 'sig_str_attempted',
       'sig_str_successful', 'Date', 'round', 'accuracy',
       'fighter_id', 'bout_id', 'round_id', 'inst_id',
       'sig_str_attempted_1', 'sig_str_successful_1', 'accuracy_1', 'ssde']

In [12]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26214 entries, 0 to 13106
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   bout_link             26214 non-null  object        
 1   fighter_link          26214 non-null  object        
 2   sig_str_attempted     26214 non-null  int64         
 3   sig_str_successful    26214 non-null  int64         
 4   Date                  26214 non-null  datetime64[ns]
 5   round                 26214 non-null  object        
 6   accuracy              26214 non-null  float64       
 7   fighter_id            26214 non-null  object        
 8   bout_id               26214 non-null  object        
 9   round_id              26214 non-null  object        
 10  inst_id               26214 non-null  object        
 11  sig_str_attempted_1   26214 non-null  int64         
 12  sig_str_successful_1  26214 non-null  int64         
 13  accuracy_1      

### Calculating significant strike differential

In [13]:
data['ssdi'] = data['sig_str_successful'] - data['sig_str_successful_1']

In [14]:
data.reset_index(inplace = True)

In [15]:
data[data['bout_id'] == data['bout_id'][0]]

Unnamed: 0,index,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id,round_id,inst_id,sig_str_attempted_1,sig_str_successful_1,accuracy_1,ssde,ssdi
0,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.5,0.5,-1
1,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.5,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.25,0.75,12
2,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.40625,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,0.714286,7
3,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053,0.578947,5
4,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739,0.478261,5
13107,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,24,12,2020-07-25,1,0.5,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e519ce6d5a03af801b7,30,11,0.366667,0.633333,1
13108,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,12,3,2020-07-25,2,0.25,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e519ce6d5a03af801b7,30,15,0.5,0.5,-12
13109,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,21,6,2020-07-25,3,0.285714,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e519ce6d5a03af801b7,32,13,0.40625,0.59375,-7
13110,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,19,8,2020-07-25,4,0.421053,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e519ce6d5a03af801b7,34,13,0.382353,0.617647,-5
13111,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,23,12,2020-07-25,5,0.521739,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e519ce6d5a03af801b7,31,17,0.548387,0.451613,-5


## Create fighter-bout instance dataframe

A fighter-bout instance represents one fighter in one bout.
 - The same fighter has exactly one fighter-bout instance for every single bout he has been in. 
 - Every bout has exactly two fighter-bout instances, one for each fighter in the bout. 
  
In this case a fighter-bout instance is assigned a unique identifier comprised of the bout_id combined with the fighter_link.

In [16]:
fighter_bout_inst = functions.create_fighter_bout_instance_table(data)

In [17]:
fighter_bout_inst

Unnamed: 0_level_0,bout_id,fighter_id,date,sss_bout
fighter_bout_inst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,11.666667
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,6.333333
0019ec81fd706ade326f94d6cfb1bf25,0019ec81fd706ade,326f94d6cfb1bf25,2019-10-18,8.666667
0019ec81fd706ade85073dbd1be65ed9,0019ec81fd706ade,85073dbd1be65ed9,2019-10-18,18.000000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,3.000000
...,...,...,...,...
ffe629a5232a878bb361180739bed4b0,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,11.000000
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,2.000000
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,19.000000


### Remove debut fights
There isn't any historical data for fighters with debut fights, so for now we will not use them in our analysis.

In [18]:
fighter_bout_inst = functions.remove_debut_bouts(fighter_bout_inst)

In [19]:
fighter_bout_inst

Unnamed: 0_level_0,bout_id,fighter_id,date,sss_bout
fighter_bout_inst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,11.666667
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,6.333333
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,3.000000
0027e179b743c86c91ea901c458e95dd,0027e179b743c86c,91ea901c458e95dd,2015-03-14,7.333333
002921976d27b7dab4ad3a06ee4d660c,002921976d27b7da,b4ad3a06ee4d660c,2014-12-13,17.000000
...,...,...,...,...
ffe629a5232a878bb361180739bed4b0,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,11.000000
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,2.000000
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,19.000000


## Calculate metrics

The metrics I will using in this notebook are:
 - average successful significant strikes for each fighter (ASSS)
 - average significant strike accuracy (ASSA)
 - average significant strike defense (ASSD)

### Career Averages

In [20]:
def calculate_3_fight_average(metric, fighter_id, date, df):
    """
    input: fighter_link - str, a unique fighter identifier
           date - datetime64, cut off date, metric will be calculated using every fight up until this date
           df - dataframe, a fighter-instance table containing the metric
    output: float, the metric for the 3 fights prior to the date
    """
    fighter_history = df[(df['fighter_id']==fighter_id)&
                         (df['Date']<date)].sort_values('Date')

    last_3 = fighter_history['bout_id'].unique()[-3:]
    mask = fighter_history['bout_id'].map(lambda x: True if x in last_3 else False)
    last_3_stats=fighter_history[mask]

    fighter_metric = last_3_stats[metric].mean()
    return fighter_metric

### 3-Fight Averages

In [21]:
fa3_assa = fighter_bout_inst.apply(lambda row: calculate_3_fight_average('accuracy', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assa'] = fa3_assa

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assa'] = fa3_assa


In [22]:
fa3_asss = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('sig_str_successful', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_asss'] = fa3_asss

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_asss'] = fa3_asss


In [23]:
fa3_assde = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssde', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assde'] = fa3_assde

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assde'] = fa3_assde


In [24]:
fa3_assdi = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssdi', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assdi'] = fa3_assdi

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assdi'] = fa3_assdi


### Career Averages

In [25]:
ca_assa = fighter_bout_inst.apply(lambda row: calculate_3_fight_average('accuracy', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assa'] = ca_assa

ca_asss = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('sig_str_successful', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_asss'] = ca_asss

ca_assde = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssde', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assde'] = ca_assde

ca_assdi = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssdi', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assdi'] = ca_assdi

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_assa'] = ca_assa
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_asss'] = ca_asss
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_assde'] = ca_assde
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_

In [26]:
fighter_bout_inst

Unnamed: 0_level_0,bout_id,fighter_id,date,sss_bout,fa3_assa,fa3_asss,fa3_assde,fa3_assdi,ca_assa,ca_asss,ca_assde,ca_assdi
fighter_bout_inst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,11.666667,0.601620,7.750000,0.563305,0.750000,0.601620,7.750000,0.563305,0.750000
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,6.333333,0.281351,9.800000,0.543474,1.200000,0.281351,9.800000,0.543474,1.200000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,3.000000,0.359259,6.666667,0.561818,-14.333333,0.359259,6.666667,0.561818,-14.333333
0027e179b743c86c91ea901c458e95dd,0027e179b743c86c,91ea901c458e95dd,2015-03-14,7.333333,0.716091,12.900000,0.546091,7.700000,0.716091,12.900000,0.546091,7.700000
002921976d27b7dab4ad3a06ee4d660c,002921976d27b7da,b4ad3a06ee4d660c,2014-12-13,17.000000,0.814489,21.444444,0.667550,14.666667,0.814489,21.444444,0.667550,14.666667
...,...,...,...,...,...,...,...,...,...,...,...,...
ffe629a5232a878bb361180739bed4b0,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000,0.406944,8.055556,0.487001,-2.777778,0.406944,8.055556,0.487001,-2.777778
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,11.000000,0.360567,16.266667,0.717127,3.000000,0.360567,16.266667,0.717127,3.000000
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,2.000000,0.399883,9.278689,0.646161,-1.934426,0.399883,9.278689,0.646161,-1.934426
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,19.000000,0.448576,9.500000,0.572856,1.600000,0.448576,9.500000,0.572856,1.600000


## Create the final dataframe

First I will get a list af all bout ids. Then I will create a dataframe with the first row and another dataframe with the second row. Then I will join those dataframes along the column axis.

In [27]:
model_df = functions.merge_fighter_instances(fighter_bout_inst)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  instances_df['inst_id'] = instances_df['bout_id'] + instances_df['fighter_id']


## Creating tsss_bout

tsss_bout: Total Successful Significant Strikes for the Bout. This metric measure the combined total number of significant strikes in a bout.

In [30]:
model_df['casss_bout'] = model_df['sss_bout_0'] + model_df['sss_bout_1']
model_df

Unnamed: 0,bout_id,fighter_id_0,date_0,sss_bout_0,fa3_assa_0,fa3_asss_0,fa3_assde_0,fa3_assdi_0,ca_assa_0,ca_asss_0,...,fa3_asss_1,fa3_assde_1,fa3_assdi_1,ca_assa_1,ca_asss_1,ca_assde_1,ca_assdi_1,inst_id_1,tsss_bout,casss_bout
0,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,6.333333,0.281351,9.800000,0.543474,1.200000,0.281351,9.800000,...,7.750000,0.563305,0.750000,0.601620,7.750000,0.563305,0.750000,000da3152b7b5ab16da99156486ed6c2,18.000000,18.000000
1,0027e179b743c86c,91ea901c458e95dd,2015-03-14,7.333333,0.716091,12.900000,0.546091,7.700000,0.716091,12.900000,...,6.666667,0.561818,-14.333333,0.359259,6.666667,0.561818,-14.333333,0027e179b743c86c3aa794cbe1e3484b,10.333333,10.333333
2,002921976d27b7da,ebc1f40e00e0c481,2014-12-13,2.000000,0.319309,11.954545,0.487594,-2.545455,0.319309,11.954545,...,21.444444,0.667550,14.666667,0.814489,21.444444,0.667550,14.666667,002921976d27b7dab4ad3a06ee4d660c,19.000000,19.000000
3,002c1562708ac307,44470bfd9483c7ad,2014-05-24,22.000000,0.486772,7.666667,0.775809,2.333333,0.486772,7.666667,...,18.111111,0.673899,2.666667,0.271754,18.111111,0.673899,2.666667,002c1562708ac30722a92d7f62195791,60.000000,60.000000
4,002cb1bb411c5f60,d897897060f10a3a,2006-03-04,25.400000,0.408337,20.636364,0.647542,15.090909,0.408337,20.636364,...,11.222222,0.498367,-0.777778,0.460897,11.222222,0.498367,-0.777778,002cb1bb411c5f6022e47b53e4ceb27c,29.600000,29.600000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4033,ffbc12e4f821ec68,7a703c565ccaa18f,2014-02-15,5.333333,0.541667,18.000000,0.811688,15.500000,0.541667,18.000000,...,9.875000,0.617094,3.937500,0.345182,9.875000,0.617094,3.937500,ffbc12e4f821ec683591d0d5d382a381,11.666667,11.666667
4034,ffd3e3d37cba32da,92a9aa9c93192871,2014-10-25,15.666667,0.218406,12.272727,0.634401,3.363636,0.218406,12.272727,...,14.250000,0.543519,-1.250000,0.429282,14.250000,0.543519,-1.250000,ffd3e3d37cba32da7413b80dbb0f8f9f,24.333333,24.333333
4035,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000,0.406944,8.055556,0.487001,-2.777778,0.406944,8.055556,...,6.400000,0.547333,-7.200000,0.476768,6.400000,0.547333,-7.200000,ffe629a5232a878b08ae5cd9aef7ddd3,1.000000,1.000000
4036,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,2.000000,0.399883,9.278689,0.646161,-1.934426,0.399883,9.278689,...,16.266667,0.717127,3.000000,0.360567,16.266667,0.717127,3.000000,ffea776913451b6d22a92d7f62195791,13.000000,13.000000


In [31]:
model_df = model_df.loc[:,['fa3_asss_0', 'fa3_assa_0', 'fa3_assde_0', 'fa3_assdi_0', 'fa3_asss_1', 'fa3_assa_1', 'fa3_assde_1', 'fa3_assdi_1',
                           'ca_asss_0', 'ca_assa_0', 'ca_assde_0', 'ca_assdi_0', 'ca_asss_1', 'ca_assa_1', 'ca_assde_1', 'ca_assdi_1', 'casss_bout']]

In [32]:
model_df

Unnamed: 0,fa3_asss_0,fa3_assa_0,fa3_assde_0,fa3_assdi_0,fa3_asss_1,fa3_assa_1,fa3_assde_1,fa3_assdi_1,ca_asss_0,ca_assa_0,ca_assde_0,ca_assdi_0,ca_asss_1,ca_assa_1,ca_assde_1,ca_assdi_1,casss_bout
0,9.800000,0.281351,0.543474,1.200000,7.750000,0.601620,0.563305,0.750000,9.800000,0.281351,0.543474,1.200000,7.750000,0.601620,0.563305,0.750000,18.000000
1,12.900000,0.716091,0.546091,7.700000,6.666667,0.359259,0.561818,-14.333333,12.900000,0.716091,0.546091,7.700000,6.666667,0.359259,0.561818,-14.333333,10.333333
2,11.954545,0.319309,0.487594,-2.545455,21.444444,0.814489,0.667550,14.666667,11.954545,0.319309,0.487594,-2.545455,21.444444,0.814489,0.667550,14.666667,19.000000
3,7.666667,0.486772,0.775809,2.333333,18.111111,0.271754,0.673899,2.666667,7.666667,0.486772,0.775809,2.333333,18.111111,0.271754,0.673899,2.666667,60.000000
4,20.636364,0.408337,0.647542,15.090909,11.222222,0.460897,0.498367,-0.777778,20.636364,0.408337,0.647542,15.090909,11.222222,0.460897,0.498367,-0.777778,29.600000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4033,18.000000,0.541667,0.811688,15.500000,9.875000,0.345182,0.617094,3.937500,18.000000,0.541667,0.811688,15.500000,9.875000,0.345182,0.617094,3.937500,11.666667
4034,12.272727,0.218406,0.634401,3.363636,14.250000,0.429282,0.543519,-1.250000,12.272727,0.218406,0.634401,3.363636,14.250000,0.429282,0.543519,-1.250000,24.333333
4035,8.055556,0.406944,0.487001,-2.777778,6.400000,0.476768,0.547333,-7.200000,8.055556,0.406944,0.487001,-2.777778,6.400000,0.476768,0.547333,-7.200000,1.000000
4036,9.278689,0.399883,0.646161,-1.934426,16.266667,0.360567,0.717127,3.000000,9.278689,0.399883,0.646161,-1.934426,16.266667,0.360567,0.717127,3.000000,13.000000


In [33]:
model_df.to_csv('../../data/model_5_data.csv')