# Purpose
Previous iterations of this model used one of two target values: the Combined Total Successful Significant Strikes for the Bout (CTSSS-B) or the Combined Average Successful Significant Strikes by Round (CASSS-R). CTSSS-B measures the total number of significant strikes landed by both fighters over the entire fight, while CASSS-R measures the average number landed by both fighters per round. CTSSS-B was replaced because it does not account for main events. If a fight lasts 5 rounds instead of 3, the CTSSS-B will likely be much higher, but the model has no way of knowing that.

Now CASSS-R will be replaced because it does not account for early knockouts. If two fighters punch nonstop from the beginning of the round and one of them gets knocked out after one minute, you might have a CASSS-R score of around 35. What this metric does not capture is that the score would have been much higher had the fight lasted longer. The other change is that we will measure attempted strikes instead of successful ones. A fight between two high-level strikers may involve a lot of defense, which is not necessarily bad, so we want to include all strike attempts.

The new metric will be Combined Average Significant Strike Attempts per Minute (CASSA-M). It accounts for matches that are short but full of action, and it recognizes fighters who throw with high volume regardless of whether their opponent avoids the strikes.
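To make the difference concrete, here is a small sketch comparing a per-round average with a per-minute rate. The helper names and the second bout's numbers are made up for illustration; only the 35-strike, one-minute knockout comes from the example above.

```python
def per_round_average(total_strikes, rounds_started):
    """Average strikes per round, ignoring how long each round lasted."""
    return total_strikes / rounds_started


def per_minute_rate(total_strikes, minutes_fought):
    """Strikes per minute of actual fight time."""
    return total_strikes / minutes_fought


# Bout A: knockout one minute into round 1, 35 combined strikes.
knockout = dict(total_strikes=35, rounds_started=1, minutes_fought=1.0)
# Bout B (hypothetical): three full 5-minute rounds, 105 combined strikes.
decision = dict(total_strikes=105, rounds_started=3, minutes_fought=15.0)

for name, bout in [("knockout", knockout), ("decision", decision)]:
    by_round = per_round_average(bout["total_strikes"], bout["rounds_started"])
    by_minute = per_minute_rate(bout["total_strikes"], bout["minutes_fought"])
    print(f"{name}: {by_round:.1f} per round, {by_minute:.1f} per minute")
```

Both bouts average 35 strikes per round, but the per-minute rate separates the one-minute brawl from the slower three-round decision.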

### Result
This notebook will create a dataframe that will have a row for each bout and will include:
 - Features:
     - career average significant strike attempts per minute for each fighter (ca_assa_m)
     - career average significant strike defense (ca_ASSDe)
     - career average significant strike differential (ca_ASSDi)
     - 3-fight-average significant strike attempts per minute for each fighter (fa3_assa_m)
     - 3-fight-average significant strike defense (fa3_ASSDe)
     - 3-fight-average significant strike differential (fa3_ASSDi)

 - Target:
     - Combined Average Significant Strike Attempts per Minute (CASSA_M)
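
The 3-fight averages listed above are not computed in this part of the notebook. As a rough sketch, and assuming a 3-fight average means the mean over a fighter's three most recent bouts before the current one (the function name, the `fa3_assa_m` column, and the toy data are hypothetical), a shifted rolling mean could look like this:

```python
import pandas as pd


def prior_three_fight_average(bouts, metric):
    """Mean of `metric` over each fighter's up-to-three previous bouts."""
    ordered = bouts.sort_values(["fighter_id", "date"])
    return ordered.groupby("fighter_id")[metric].transform(
        # shift(1) excludes the current bout so the feature only sees the past
        lambda s: s.shift(1).rolling(3, min_periods=1).mean()
    )


toy = pd.DataFrame({
    "fighter_id": ["A", "A", "A", "A"],
    "date": pd.to_datetime(["2019-01-01", "2019-06-01", "2020-01-01", "2020-06-01"]),
    "ssa_m": [2.0, 4.0, 6.0, 8.0],
})
toy["fa3_assa_m"] = prior_three_fight_average(toy, "ssa_m")
print(toy)
```

The first bout has no history, so its value is NaN, which is the same situation debut fights create for the career averages.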
 

In [2]:
%load_ext autoreload
%autoreload 2

import os
import sys
module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

import pandas as pd
from sqlalchemy import create_engine
from src import local
from src import functions

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
# Credentials
USER = local.user 
PASS = local.password
HOST = local.host
PORT = local.port

#create engine
engine = create_engine(f'postgresql://{USER}:{PASS}@{HOST}:{PORT}/match_finder')

# Get data from postgres database
### Join tables

- get the date from the events table
- use the bouts table to join the dates to the general table
- use the general table to join the bouts with the fighters


#### Accuracy case statement

In [4]:
accuracy_column = """
CASE 
    WHEN (sig_str_attempted > 0) THEN (CAST(sig_str_successful AS FLOAT)/CAST(sig_str_attempted AS FLOAT))
    ELSE 0
END AS accuracy
"""
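
For readers who prefer to check the logic outside SQL, the same rule can be sketched in pandas. This is only an illustration of what the CASE statement computes; the `accuracy` helper and the toy frame are not part of the notebook.

```python
import pandas as pd


def accuracy(successful, attempted):
    """Landed / attempted, with 0 where nothing was attempted."""
    attempted = attempted.astype(float)
    # where() turns zero attempts into NaN so the division never hits 0/0
    ratio = successful.astype(float) / attempted.where(attempted > 0)
    return ratio.fillna(0.0)


toy = pd.DataFrame({"sig_str_attempted": [30, 0], "sig_str_successful": [11, 0]})
toy["accuracy"] = accuracy(toy["sig_str_successful"], toy["sig_str_attempted"])
print(toy)
```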

In [5]:
query = """
SELECT bout_link, fighter_link, sig_str_attempted, 
sig_str_successful, "Date", round, "Time", 
"Round" as final_round, "Timeformat",
"""+accuracy_column+"""
FROM strikes_cleaned
JOIN bouts ON bouts.link = strikes_cleaned.bout_link
JOIN events ON events.link = bouts.event_link
"""

data = pd.read_sql(query, engine)

In [6]:
data = functions.format_data(data, event=False)

In [7]:
data

Unnamed: 0,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,Time,final_round,Timeformat,accuracy,fighter_id,bout_id
0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,5:00,5,5 Rnd (5-5-5-5-5),0.366667,e1147d3d2dabe1ce,11f715fa5e825e51
6,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,5:00,5,5 Rnd (5-5-5-5-5),0.500000,e1147d3d2dabe1ce,11f715fa5e825e51
12,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,5:00,5,5 Rnd (5-5-5-5-5),0.406250,e1147d3d2dabe1ce,11f715fa5e825e51
18,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,5:00,5,5 Rnd (5-5-5-5-5),0.382353,e1147d3d2dabe1ce,11f715fa5e825e51
24,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,5:00,5,5 Rnd (5-5-5-5-5),0.548387,e1147d3d2dabe1ce,11f715fa5e825e51
...,...,...,...,...,...,...,...,...,...,...,...,...
26447,http://www.ufcstats.com/fight-details/cecdc0da...,http://www.ufcstats.com/fighter-details/a5c53b...,0,0,1993-11-12,1,2:18,1,No Time Limit,0.000000,a5c53b3ddb31cc7d,cecdc0da584274b9
26448,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/598a58...,27,15,1993-11-12,1,4:20,1,No Time Limit,0.555556,598a58db87b890ee,2d2bbc86e941e05c
26449,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/d3711d...,28,12,1993-11-12,1,4:20,1,No Time Limit,0.428571,d3711d3784b76255,2d2bbc86e941e05c
26450,http://www.ufcstats.com/fight-details/567a09fd...,http://www.ufcstats.com/fighter-details/279093...,5,3,1993-11-12,1,0:26,1,No Time Limit,0.600000,279093302a6f44b3,567a09fd200cfa05


In [8]:
data_original = data

To calculate striking defense, each row needs to include the stats of the fighter's opponent.

In [9]:
data_0 = functions.merge_fighter_instances(data, rounds=True)
data_1 = functions.merge_fighter_instances(data, rounds=True, flip=True)

data = pd.concat((data_0, data_1))
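
The implementation of `functions.merge_fighter_instances` is not shown in this notebook. One way such a pairing could be done, sketched here on a toy frame with hypothetical round ids, is a self-merge on the round key that keeps only rows where the two sides are different fighters:

```python
import pandas as pd

rounds = pd.DataFrame({
    "round_id": ["b1r1", "b1r1", "b1r2", "b1r2"],
    "fighter_id": ["A", "B", "A", "B"],
    "sig_str_attempted": [30, 24, 30, 12],
    "sig_str_successful": [11, 12, 15, 3],
})

# Pair every row with every row from the same round, then drop self-pairs.
paired = rounds.merge(rounds, on="round_id", suffixes=("_0", "_1"))
paired = paired[paired["fighter_id_0"] != paired["fighter_id_1"]].reset_index(drop=True)
print(paired)
```

Each round ends up with two rows, one from each fighter's point of view, which matches the shape produced by concatenating the normal and flipped merges above.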

In [10]:
data = data.loc[:, ['bout_link_0', 'fighter_link_0', 'sig_str_attempted_0',
                    'sig_str_successful_0', 'Date_0', 'round_0', 'accuracy_0',
                    'fighter_id_0', 'bout_id_0', 'round_id', 'inst_id_0', 
                    'sig_str_attempted_1', 'sig_str_successful_1', 'accuracy_1',
                    'Time_0', 'Timeformat_0']]

In [11]:
data

Unnamed: 0,bout_link_0,fighter_link_0,sig_str_attempted_0,sig_str_successful_0,Date_0,round_0,accuracy_0,fighter_id_0,bout_id_0,round_id,inst_id_0,sig_str_attempted_1,sig_str_successful_1,accuracy_1,Time_0,Timeformat_0
0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.500000,5:00,5 Rnd (5-5-5-5-5)
1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.500000,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.250000,5:00,5 Rnd (5-5-5-5-5)
2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.406250,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,5:00,5 Rnd (5-5-5-5-5)
3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053,5:00,5 Rnd (5-5-5-5-5)
4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739,5:00,5 Rnd (5-5-5-5-5)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13102,http://www.ufcstats.com/fight-details/ac7ca2ec...,http://www.ufcstats.com/fighter-details/279093...,17,11,1993-11-12,1,0.647059,279093302a6f44b3,ac7ca2ec38b96c1a,ac7ca2ec38b96c1a1,ac7ca2ec38b96c1a279093302a6f44b3,3,0,0.000000,0:59,No Time Limit
13103,http://www.ufcstats.com/fight-details/46acd54c...,http://www.ufcstats.com/fighter-details/46c8ec...,8,4,1993-11-12,1,0.500000,46c8ec317aff28ac,46acd54cc0c905fb,46acd54cc0c905fb1,46acd54cc0c905fb46c8ec317aff28ac,1,1,1.000000,1:49,No Time Limit
13104,http://www.ufcstats.com/fight-details/cecdc0da...,http://www.ufcstats.com/fighter-details/429e7d...,3,0,1993-11-12,1,0.000000,429e7d3725852ce9,cecdc0da584274b9,cecdc0da584274b91,cecdc0da584274b9429e7d3725852ce9,0,0,0.000000,2:18,No Time Limit
13105,http://www.ufcstats.com/fight-details/2d2bbc86...,http://www.ufcstats.com/fighter-details/598a58...,27,15,1993-11-12,1,0.555556,598a58db87b890ee,2d2bbc86e941e05c,2d2bbc86e941e05c1,2d2bbc86e941e05c598a58db87b890ee,28,12,0.428571,4:20,No Time Limit


### Calculate significant strike defense by subtracting the opponent's accuracy from 1

In [12]:
data['ssd_0'] = 1 - data['accuracy_1']
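
A quick worked example of the defense calculation on made-up opponent accuracies. Note one consequence of the accuracy CASE statement: an opponent who attempted no strikes has an accuracy of 0, so the fighter is credited with a defense of 1.0 for that round.

```python
import pandas as pd

# Opponent accuracy per round; the last value stands for a round with zero attempts.
accuracy_1 = pd.Series([0.5, 0.25, 0.0])
ssd = 1 - accuracy_1
print(ssd.tolist())  # [0.5, 0.75, 1.0]
```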

Rename the columns for consistency.

In [14]:
data.columns

Index(['bout_link_0', 'fighter_link_0', 'sig_str_attempted_0',
       'sig_str_successful_0', 'Date_0', 'round_0', 'accuracy_0',
       'fighter_id_0', 'bout_id_0', 'round_id', 'inst_id_0',
       'sig_str_attempted_1', 'sig_str_successful_1', 'accuracy_1', 'Time_0',
       'Timeformat_0', 'ssd_0'],
      dtype='object')

In [16]:
data.columns = ['bout_link', 'fighter_link', 'sig_str_attempted',
                'sig_str_successful', 'Date', 'round', 'accuracy',
                'fighter_id', 'bout_id', 'round_id', 'inst_id',
                'sig_str_attempted_1', 'sig_str_successful_1', 'accuracy_1', 
                'time', 'timeformat', 'ssde']
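
Assigning a full list to `data.columns` depends on the column order staying exactly the same. As an alternative sketch (the `strip_suffix` helper and the shortened demo frame are illustrative only), the same renaming can be expressed as an explicit old-to-new mapping:

```python
import pandas as pd


def strip_suffix(columns, suffix="_0"):
    """Build an old -> new mapping that drops `suffix` from matching names."""
    return {c: c[: -len(suffix)] for c in columns if c.endswith(suffix)}


demo = pd.DataFrame(columns=["bout_link_0", "accuracy_0", "accuracy_1",
                             "round_id", "Time_0", "Timeformat_0", "ssd_0"])

renames = strip_suffix(demo.columns)
# Names that change beyond losing the suffix are listed explicitly.
renames.update({"Time_0": "time", "Timeformat_0": "timeformat", "ssd_0": "ssde"})
demo = demo.rename(columns=renames)
print(list(demo.columns))
```

With a mapping, a reordered or missing column cannot silently shift every label by one position.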

In [17]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26214 entries, 0 to 13106
Data columns (total 17 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   bout_link             26214 non-null  object        
 1   fighter_link          26214 non-null  object        
 2   sig_str_attempted     26214 non-null  int64         
 3   sig_str_successful    26214 non-null  int64         
 4   Date                  26214 non-null  datetime64[ns]
 5   round                 26214 non-null  object        
 6   accuracy              26214 non-null  float64       
 7   fighter_id            26214 non-null  object        
 8   bout_id               26214 non-null  object        
 9   round_id              26214 non-null  object        
 10  inst_id               26214 non-null  object        
 11  sig_str_attempted_1   26214 non-null  int64         
 12  sig_str_successful_1  26214 non-null  int64         
 13  accuracy_1      

### Calculating significant strike differential

In [18]:
data['ssdi'] = data['sig_str_successful'] - data['sig_str_successful_1']

In [19]:
data.reset_index(inplace = True)

In [20]:
data[data['bout_id'] == data['bout_id'][0]]

Unnamed: 0,index,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id,round_id,inst_id,sig_str_attempted_1,sig_str_successful_1,accuracy_1,time,timeformat,ssde,ssdi
0,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.5,5:00,5 Rnd (5-5-5-5-5),0.5,-1
1,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.5,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.25,5:00,5 Rnd (5-5-5-5-5),0.75,12
2,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.40625,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,5:00,5 Rnd (5-5-5-5-5),0.714286,7
3,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053,5:00,5 Rnd (5-5-5-5-5),0.578947,5
4,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739,5:00,5 Rnd (5-5-5-5-5),0.478261,5
13107,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,24,12,2020-07-25,1,0.5,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e519ce6d5a03af801b7,30,11,0.366667,5:00,5 Rnd (5-5-5-5-5),0.633333,1
13108,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,12,3,2020-07-25,2,0.25,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e519ce6d5a03af801b7,30,15,0.5,5:00,5 Rnd (5-5-5-5-5),0.5,-12
13109,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,21,6,2020-07-25,3,0.285714,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e519ce6d5a03af801b7,32,13,0.40625,5:00,5 Rnd (5-5-5-5-5),0.59375,-7
13110,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,19,8,2020-07-25,4,0.421053,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e519ce6d5a03af801b7,34,13,0.382353,5:00,5 Rnd (5-5-5-5-5),0.617647,-5
13111,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,23,12,2020-07-25,5,0.521739,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e519ce6d5a03af801b7,31,17,0.548387,5:00,5 Rnd (5-5-5-5-5),0.451613,-5


# Calculate ASSA-M
This is the Average Significant Strike Attempts per Minute. This will be used to calculate the CASSA-M referenced at the start of this notebook.

### Create round length column
First we need a length for each round. The current time column only records the time on the clock when the fight was stopped, so it applies only to the final round. We'll group by bout_id and create a dataframe that maps each bout's highest round_id to its time value.

This relies on the assumption that every non-final round in the UFC lasts 5 minutes. Let's check whether that's accurate.

In [21]:
data.timeformat.value_counts()

3 Rnd (5-5-5)           22652
5 Rnd (5-5-5-5-5)        2964
1 Rnd + OT (12-3)         192
3 Rnd + OT (5-5-5-5)      106
No Time Limit              74
1 Rnd + 2OT (15-3-3)       62
2 Rnd (5-5)                50
1 Rnd (20)                 42
1 Rnd (15)                 16
1 Rnd (10)                 12
1 Rnd (12)                  8
1 Rnd + OT (15-3)           6
1 Rnd + OT (30-5)           6
1 Rnd + 2OT (24-3-3)        6
1 Rnd + OT (27-3)           4
1 Rnd (18)                  4
1 Rnd + OT (31-5)           4
1 Rnd + OT (30-3)           4
1 Rnd (30)                  2
Name: timeformat, dtype: int64

There are a lot of different round formats. They likely predate the standardized rules, so let's look at the most recent date each format was used.

In [24]:
data.groupby('timeformat').Date.max()

timeformat
1 Rnd (10)             1996-02-16
1 Rnd (12)             1998-05-15
1 Rnd (15)             1996-09-20
1 Rnd (18)             1995-12-16
1 Rnd (20)             1995-09-08
1 Rnd (30)             1995-04-07
1 Rnd + 2OT (15-3-3)   1999-05-07
1 Rnd + 2OT (24-3-3)   1996-05-17
1 Rnd + OT (12-3)      1999-05-07
1 Rnd + OT (15-3)      1996-09-20
1 Rnd + OT (27-3)      1995-12-16
1 Rnd + OT (30-3)      1995-09-08
1 Rnd + OT (30-5)      1995-09-08
1 Rnd + OT (31-5)      1995-04-07
2 Rnd (5-5)            2001-02-23
3 Rnd (5-5-5)          2020-07-25
3 Rnd + OT (5-5-5-5)   2018-11-30
5 Rnd (5-5-5-5-5)      2020-07-25
No Time Limit          1994-12-16
Name: Date, dtype: datetime64[ns]
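
As a cross-check on the table above, the timeformat strings can also be parsed directly into scheduled round lengths. This is a sketch with hypothetical helper names; it assumes the schedule always appears as dash-separated minutes inside parentheses, as in the values listed earlier.

```python
import re


def round_lengths(timeformat):
    """Scheduled round lengths in minutes, or None if no '(...)' schedule is present."""
    match = re.search(r"\(([\d-]+)\)", timeformat)
    if match is None:
        return None
    return [int(part) for part in match.group(1).split("-")]


def is_five_minute_format(timeformat):
    """True only when every scheduled round is exactly 5 minutes."""
    lengths = round_lengths(timeformat)
    return lengths is not None and all(length == 5 for length in lengths)


for fmt in ["3 Rnd (5-5-5)", "1 Rnd + OT (12-3)", "No Time Limit"]:
    print(fmt, round_lengths(fmt), is_five_minute_format(fmt))
```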

It looks like all rounds were 5 minutes long starting in 2001. Let's see what our data would look like if we dropped all fights before 2001.

In [30]:
cutoff = pd.to_datetime('2001')

data[data['Date']>=cutoff]

Unnamed: 0,index,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id,round_id,inst_id,sig_str_attempted_1,sig_str_successful_1,accuracy_1,time,timeformat,ssde,ssdi
0,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.500000,5:00,5 Rnd (5-5-5-5-5),0.500000,-1
1,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.500000,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.250000,5:00,5 Rnd (5-5-5-5-5),0.750000,12
2,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.406250,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,5:00,5 Rnd (5-5-5-5-5),0.714286,7
3,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053,5:00,5 Rnd (5-5-5-5-5),0.578947,5
4,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739,5:00,5 Rnd (5-5-5-5-5),0.478261,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25848,12741,http://www.ufcstats.com/fight-details/651da45c...,http://www.ufcstats.com/fighter-details/4604ab...,1,0,2001-02-23,1,0.000000,4604ab1de9058474,651da45cc83ce011,651da45cc83ce0111,651da45cc83ce0114604ab1de9058474,23,20,0.869565,3:27,2 Rnd (5-5),0.130435,-20
25849,12742,http://www.ufcstats.com/fight-details/a949b05c...,http://www.ufcstats.com/fighter-details/1652f3...,40,30,2001-02-23,1,0.750000,1652f3213655b935,a949b05c64e43131,a949b05c64e431311,a949b05c64e431311652f3213655b935,1,0,0.000000,5:00,2 Rnd (5-5),1.000000,30
25850,12743,http://www.ufcstats.com/fight-details/a949b05c...,http://www.ufcstats.com/fighter-details/1652f3...,14,9,2001-02-23,2,0.642857,1652f3213655b935,a949b05c64e43131,a949b05c64e431312,a949b05c64e431311652f3213655b935,2,2,1.000000,5:00,2 Rnd (5-5),0.000000,7
25851,12744,http://www.ufcstats.com/fight-details/bfb468c3...,http://www.ufcstats.com/fighter-details/029880...,8,7,2001-02-23,1,0.875000,029880cdbf5ca089,bfb468c3427faa50,bfb468c3427faa501,bfb468c3427faa50029880cdbf5ca089,1,1,1.000000,4:47,2 Rnd (5-5),0.000000,6


We still have most of our fights, so we'll use this as our timeframe from now on.

In [31]:
data = data[data['Date']>=cutoff]

In [35]:
bout_groups = data.groupby('bout_id')
round_id = bout_groups.round_id.max()
round_length = bout_groups.time.max()

final_round_lengths = pd.DataFrame(dict(round_id = round_id, round_length = round_length))

In [38]:
final_round_lengths.set_index('round_id', inplace=True)
final_round_lengths

Unnamed: 0_level_0,round_length
round_id,Unnamed: 1_level_1
000da3152b7b5ab13,5:00
0019ec81fd706ade3,5:00
0027e179b743c86c3,3:12
002921976d27b7da1,4:13
002c1562708ac3071,4:06
...,...
ffd3e3d37cba32da3,5:00
ffe4379d6bd1e82b2,1:43
ffe629a5232a878b1,1:59
ffea776913451b6d1,2:37


In [40]:
new_data = data.join(final_round_lengths, on='round_id', how='outer')
new_data.head(15)

Unnamed: 0,index,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id,round_id,inst_id,sig_str_attempted_1,sig_str_successful_1,accuracy_1,time,timeformat,ssde,ssdi,round_length
0,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.5,5:00,5 Rnd (5-5-5-5-5),0.5,-1,
13107,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,24,12,2020-07-25,1,0.5,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e511,11f715fa5e825e519ce6d5a03af801b7,30,11,0.366667,5:00,5 Rnd (5-5-5-5-5),0.633333,1,
1,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.5,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.25,5:00,5 Rnd (5-5-5-5-5),0.75,12,
13108,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,12,3,2020-07-25,2,0.25,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e512,11f715fa5e825e519ce6d5a03af801b7,30,15,0.5,5:00,5 Rnd (5-5-5-5-5),0.5,-12,
2,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.40625,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,5:00,5 Rnd (5-5-5-5-5),0.714286,7,
13109,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,21,6,2020-07-25,3,0.285714,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e513,11f715fa5e825e519ce6d5a03af801b7,32,13,0.40625,5:00,5 Rnd (5-5-5-5-5),0.59375,-7,
3,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,34,13,2020-07-25,4,0.382353,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e51e1147d3d2dabe1ce,19,8,0.421053,5:00,5 Rnd (5-5-5-5-5),0.578947,5,
13110,3,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,19,8,2020-07-25,4,0.421053,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e514,11f715fa5e825e519ce6d5a03af801b7,34,13,0.382353,5:00,5 Rnd (5-5-5-5-5),0.617647,-5,
4,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,31,17,2020-07-25,5,0.548387,e1147d3d2dabe1ce,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e51e1147d3d2dabe1ce,23,12,0.521739,5:00,5 Rnd (5-5-5-5-5),0.478261,5,5:00
13111,4,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,23,12,2020-07-25,5,0.521739,9ce6d5a03af801b7,11f715fa5e825e51,11f715fa5e825e515,11f715fa5e825e519ce6d5a03af801b7,31,17,0.548387,5:00,5 Rnd (5-5-5-5-5),0.451613,-5,5:00


Now that we have the final rounds filled in, every remaining null value should be '5:00'.

In [43]:
new_data.round_length = new_data.round_length.fillna('5:00')
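
The lookup, join, and fill steps above can also be collapsed into a single pass. The sketch below is an alternative, not what the notebook runs: it assumes `round` is numeric (in this notebook it is stored as an object column, so it would need casting first) and uses a toy frame.

```python
import pandas as pd

toy = pd.DataFrame({
    "bout_id": ["b1", "b1", "b1", "b2"],
    "round": [1, 2, 3, 1],
    "time": ["3:12", "3:12", "3:12", "4:13"],  # stoppage time, repeated on every row
})

# A row is the final round when its round number equals the bout's maximum.
is_final = toy["round"] == toy.groupby("bout_id")["round"].transform("max")
# Keep the stoppage time for final rounds, assume a full 5:00 everywhere else.
toy["round_length"] = toy["time"].where(is_final, "5:00")
print(toy)
```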

In [44]:
new_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25492 entries, 0 to 25852
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   index                 25492 non-null  int64         
 1   bout_link             25492 non-null  object        
 2   fighter_link          25492 non-null  object        
 3   sig_str_attempted     25492 non-null  int64         
 4   sig_str_successful    25492 non-null  int64         
 5   Date                  25492 non-null  datetime64[ns]
 6   round                 25492 non-null  object        
 7   accuracy              25492 non-null  float64       
 8   fighter_id            25492 non-null  object        
 9   bout_id               25492 non-null  object        
 10  round_id              25492 non-null  object        
 11  inst_id               25492 non-null  object        
 12  sig_str_attempted_1   25492 non-null  int64         
 13  sig_str_successf

### Calculate assa-m

Before calculating, we need to convert the round length column into a timedelta object.

In [49]:
new_data.round_length = '00:0' + new_data.round_length

In [51]:
new_data.round_length = pd.to_timedelta(new_data.round_length)

In [52]:
new_data.round_length.describe()

count                     25492
mean     0 days 00:04:28.584340
std      0 days 00:01:10.311186
min             0 days 00:00:05
25%             0 days 00:05:00
50%             0 days 00:05:00
75%             0 days 00:05:00
max             0 days 00:05:00
Name: round_length, dtype: object

In [55]:
new_data.round_length = new_data.round_length.map(lambda x: x.total_seconds()/60)
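
Prefixing `'00:0'` works here because every round length after the 2001 cutoff has a single-digit minute value. A more general conversion, sketched with a hypothetical helper, splits the clock string directly and also handles two-digit minutes:

```python
def mmss_to_minutes(clock):
    """Convert an 'M:SS' or 'MM:SS' clock string to fractional minutes."""
    minutes, seconds = clock.split(":")
    return int(minutes) + int(seconds) / 60


for clock in ["5:00", "4:47", "0:05", "12:30"]:
    print(clock, round(mmss_to_minutes(clock), 6))
```

This could be applied with `Series.map` in place of the string prefix and timedelta conversion, at the cost of skipping the validation that `pd.to_timedelta` performs.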

In [56]:
new_data['ssa_m'] = new_data['sig_str_attempted'] / new_data.round_length
new_data

Unnamed: 0,index,bout_link,fighter_link,sig_str_attempted,sig_str_successful,Date,round,accuracy,fighter_id,bout_id,...,inst_id,sig_str_attempted_1,sig_str_successful_1,accuracy_1,time,timeformat,ssde,ssdi,round_length,ssa_m
0,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,11,2020-07-25,1,0.366667,e1147d3d2dabe1ce,11f715fa5e825e51,...,11f715fa5e825e51e1147d3d2dabe1ce,24,12,0.500000,5:00,5 Rnd (5-5-5-5-5),0.500000,-1,5.000000,6.000000
13107,0,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,24,12,2020-07-25,1,0.500000,9ce6d5a03af801b7,11f715fa5e825e51,...,11f715fa5e825e519ce6d5a03af801b7,30,11,0.366667,5:00,5 Rnd (5-5-5-5-5),0.633333,1,5.000000,4.800000
1,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,30,15,2020-07-25,2,0.500000,e1147d3d2dabe1ce,11f715fa5e825e51,...,11f715fa5e825e51e1147d3d2dabe1ce,12,3,0.250000,5:00,5 Rnd (5-5-5-5-5),0.750000,12,5.000000,6.000000
13108,1,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/9ce6d5...,12,3,2020-07-25,2,0.250000,9ce6d5a03af801b7,11f715fa5e825e51,...,11f715fa5e825e519ce6d5a03af801b7,30,15,0.500000,5:00,5 Rnd (5-5-5-5-5),0.500000,-12,5.000000,2.400000
2,2,http://www.ufcstats.com/fight-details/11f715fa...,http://www.ufcstats.com/fighter-details/e1147d...,32,13,2020-07-25,3,0.406250,e1147d3d2dabe1ce,11f715fa5e825e51,...,11f715fa5e825e51e1147d3d2dabe1ce,21,6,0.285714,5:00,5 Rnd (5-5-5-5-5),0.714286,7,5.000000,6.400000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25850,12743,http://www.ufcstats.com/fight-details/a949b05c...,http://www.ufcstats.com/fighter-details/1652f3...,14,9,2001-02-23,2,0.642857,1652f3213655b935,a949b05c64e43131,...,a949b05c64e431311652f3213655b935,2,2,1.000000,5:00,2 Rnd (5-5),0.000000,7,5.000000,2.800000
12744,12744,http://www.ufcstats.com/fight-details/bfb468c3...,http://www.ufcstats.com/fighter-details/cb6783...,1,1,2001-02-23,1,1.000000,cb6783c39c01d896,bfb468c3427faa50,...,bfb468c3427faa50cb6783c39c01d896,8,7,0.875000,4:47,2 Rnd (5-5),0.125000,-6,5.000000,0.200000
25851,12744,http://www.ufcstats.com/fight-details/bfb468c3...,http://www.ufcstats.com/fighter-details/029880...,8,7,2001-02-23,1,0.875000,029880cdbf5ca089,bfb468c3427faa50,...,bfb468c3427faa50029880cdbf5ca089,1,1,1.000000,4:47,2 Rnd (5-5),0.000000,6,5.000000,1.600000
12745,12745,http://www.ufcstats.com/fight-details/bfb468c3...,http://www.ufcstats.com/fighter-details/cb6783...,6,0,2001-02-23,2,0.000000,cb6783c39c01d896,bfb468c3427faa50,...,bfb468c3427faa50cb6783c39c01d896,8,5,0.625000,4:47,2 Rnd (5-5),0.375000,-5,4.783333,1.254355


In [57]:
new_data.ssa_m.describe()

count    25492.000000
mean         7.804384
std          5.658327
min          0.000000
25%          3.800000
50%          6.800000
75%         10.400000
max         95.000000
Name: ssa_m, dtype: float64

In [91]:
data=new_data

## Create fighter-bout instance dataframe

A fighter-bout instance represents one fighter in one bout.
 - The same fighter has exactly one fighter-bout instance for every single bout he has been in. 
 - Every bout has exactly two fighter-bout instances, one for each fighter in the bout. 
  
In this case a fighter-bout instance is assigned a unique identifier composed of the bout_id combined with the fighter_id.

In [92]:
def create_fighter_bout_instance_table(data):
    """
    input: dataframe, a formatted fighter round instance table, either general stats or strike stats.
    output: new fighter bout instance dataframe 

    A fighter-bout instance represents one fighter in one bout.
     - The same fighter has exactly one fighter-bout instance for every single bout he has been in. 
     - Every bout has exactly two fighter-bout instances, one for each fighter in the bout. 
    In this case a fighter-bout instance is assigned a unique identifier composed of the bout_id combined with the fighter_id.
    """
    
    data['fighter_bout_inst'] = data['bout_id'] + data['fighter_id']
    fighter_bout_inst_group = data.groupby(['fighter_bout_inst'])

    ssa_m = fighter_bout_inst_group.ssa_m.mean()
    date = fighter_bout_inst_group['Date'].max()
    fighter_id = fighter_bout_inst_group['fighter_id'].max()
    bout_id = fighter_bout_inst_group['bout_id'].max()

    fighter_bout_inst = pd.DataFrame(dict(bout_id = bout_id, fighter_id = fighter_id, date = date, ssa_m = ssa_m))
    return fighter_bout_inst
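
One design choice worth noting: the function averages the per-round rates, so a short final round carries the same weight as a full 5-minute round. The sketch below, on made-up numbers, contrasts that with a pooled rate (total attempts divided by total minutes), which is an alternative and not what the function above computes.

```python
import pandas as pd

rounds = pd.DataFrame({
    "sig_str_attempted": [10, 12],
    "round_length": [5.0, 1.0],  # minutes; the second round was stopped after one minute
})

# What the function above does: average the per-round rates.
mean_of_round_rates = (rounds["sig_str_attempted"] / rounds["round_length"]).mean()
# Alternative: weight every minute equally.
pooled_rate = rounds["sig_str_attempted"].sum() / rounds["round_length"].sum()

print(mean_of_round_rates, round(pooled_rate, 3))
```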

In [93]:
fighter_bout_inst = create_fighter_bout_instance_table(data)

In [94]:
fighter_bout_inst

Unnamed: 0_level_0,bout_id,fighter_id,date,ssa_m
fighter_bout_inst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,5.866667
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,2.600000
0019ec81fd706ade326f94d6cfb1bf25,0019ec81fd706ade,326f94d6cfb1bf25,2019-10-18,6.466667
0019ec81fd706ade85073dbd1be65ed9,0019ec81fd706ade,85073dbd1be65ed9,2019-10-18,7.000000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,2.550000
...,...,...,...,...
ffe629a5232a878bb361180739bed4b0,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,14.522293
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,7.643312
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,7.866667


### Remove debut fights
There isn't any historical data for fighters in their debut fights, so for now we will not use those bouts in our analysis.

In [95]:
fighter_bout_inst = functions.remove_debut_bouts(fighter_bout_inst)

In [96]:
fighter_bout_inst

Unnamed: 0_level_0,bout_id,fighter_id,date,ssa_m
fighter_bout_inst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,5.866667
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,2.600000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,2.550000
0027e179b743c86c91ea901c458e95dd,0027e179b743c86c,91ea901c458e95dd,2015-03-14,3.412500
002921976d27b7dab4ad3a06ee4d660c,002921976d27b7da,b4ad3a06ee4d660c,2014-12-13,5.928854
...,...,...,...,...
ffe629a5232a878bb361180739bed4b0,ffe629a5232a878b,b361180739bed4b0,2003-06-06,0.000000
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,14.522293
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,7.643312
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,7.866667


## Calculate metrics

The metrics I will be using in this notebook are:
 - average significant strike attempts per minute for each fighter (ASSA-M)
 - average significant strike defense (ASSDe)
 - average significant strike differential (ASSDi)

### Career Averages

In [97]:
ca_assa_m = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssa_m', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assa_m'] = ca_assa_m

ca_assde = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssde', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assde'] = ca_assde

ca_assdi = fighter_bout_inst.apply(lambda row: functions.calculate_metric_average('ssdi', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['ca_assdi'] = ca_assdi

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_assa_m'] = ca_assa_m
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_assde'] = ca_assde
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['ca_assdi'] = ca_assdi
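
The career averages above are computed row by row with `functions.calculate_metric_average`, whose body is not shown here. Assuming it averages a metric over all of a fighter's bouts strictly before the given date (so a fighter with no earlier bouts gets NaN), the same idea can be sketched in vectorized form with an expanding mean. `career_average_before` and the toy data are hypothetical.

```python
import pandas as pd

def career_average_before(df, metric):
    """Mean of `metric` over each fighter's earlier bouts (NaN for a first bout)."""
    ordered = df.sort_values(['fighter_id', 'date'])
    # expanding().mean() includes the current bout, so shift by one
    # to keep only information available before the bout took place
    return ordered.groupby('fighter_id')[metric].transform(
        lambda s: s.expanding().mean().shift(1)
    )

example = pd.DataFrame({
    'fighter_id': ['a', 'a', 'a', 'b'],
    'date': ['2010-01-01', '2011-01-01', '2012-01-01', '2010-06-01'],
    'ssa_m': [2.0, 4.0, 6.0, 5.0],
})
# Assignment aligns on the index, so the sort inside the helper is harmless
example['ca_assa_m'] = career_average_before(example, 'ssa_m')
```

Shifting by one bout is what keeps the current bout's own value out of its feature, which matters because that value is also part of the target.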


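### Remove the remaining debut fights

Rows where `ca_assa_m` is NaN have no prior data for that fighter, so their bouts are treated as debuts and removed as well.

Load the previous blacklist.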

In [98]:
import pickle

In [99]:
with open('../../src/debut_bouts.pkl', 'rb') as f:  # close the file after loading
    debut_bouts = pickle.load(f)

In [100]:
debut_bouts_2001 = list(fighter_bout_inst[fighter_bout_inst.ca_assa_m.isna()].bout_id.unique())

In [101]:
debut_bouts_2001 = debut_bouts+debut_bouts_2001

Save this blacklist for future notebooks

In [102]:
with open('../../src/debut_bouts_2001.pkl', 'wb') as f:  # close the file after writing
    pickle.dump(debut_bouts_2001, f)

Blacklist the new set of bouts

In [103]:
mask = fighter_bout_inst['bout_id'].map(lambda x: functions.black_list_entry(x, debut_bouts_2001))
fighter_bout_inst = fighter_bout_inst[mask]
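
The blacklist steps above can also be written with built-in pandas operations. The sketch below uses made-up ids and assumes that `functions.black_list_entry` returns `True` for bouts that should be kept (its body is not shown). Ending the filter with `.copy()` produces an independent dataframe, which is the usual way to avoid the "set on a copy of a slice" warning when columns are added to the filtered result later.

```python
import numpy as np
import pandas as pd

inst = pd.DataFrame({
    'bout_id': ['b1', 'b1', 'b2', 'b2', 'b3', 'b3'],
    'fighter_id': ['f1', 'f2', 'f1', 'f3', 'f2', 'f1'],
    'ca_assa_m': [5.0, 4.0, 6.0, np.nan, 4.5, 5.5],
})
previous_blacklist = ['b0']  # stand-in for the list loaded from the pickle file

# Bouts where at least one fighter has no career average yet
new_debuts = inst.loc[inst['ca_assa_m'].isna(), 'bout_id'].unique().tolist()
blacklist = previous_blacklist + new_debuts

# Keep rows whose bout is not blacklisted; .copy() detaches the result
kept = inst[~inst['bout_id'].isin(blacklist)].copy()
```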

In [104]:
fighter_bout_inst

fighter_bout_inst,bout_id,fighter_id,date,ssa_m,ca_assa_m,ca_assde,ca_assdi
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,5.866667,7.340394,0.563305,0.750000
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,2.600000,5.455385,0.543474,1.200000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,2.550000,3.733333,0.561818,-14.333333
0027e179b743c86c91ea901c458e95dd,0027e179b743c86c,91ea901c458e95dd,2015-03-14,3.412500,5.235224,0.546091,7.700000
002921976d27b7dab4ad3a06ee4d660c,002921976d27b7da,b4ad3a06ee4d660c,2014-12-13,5.928854,7.832834,0.667550,14.666667
...,...,...,...,...,...,...,...
ffd3e3d37cba32da92a9aa9c93192871,ffd3e3d37cba32da,92a9aa9c93192871,2014-10-25,10.266667,9.590387,0.634401,3.363636
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,14.522293,10.637601,0.717127,3.000000
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,7.643312,5.866690,0.646161,-1.934426
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,7.866667,5.037644,0.572856,1.600000


### 3-Fight Averages

In [105]:
fa3_assa_m = fighter_bout_inst.apply(lambda row: functions.calculate_3_fight_average('ssa_m', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assa_m'] = fa3_assa_m

fa3_assde = fighter_bout_inst.apply(lambda row: functions.calculate_3_fight_average('ssde', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assde'] = fa3_assde

fa3_assdi = fighter_bout_inst.apply(lambda row: functions.calculate_3_fight_average('ssdi', row['fighter_id'], row['date'], data), axis=1)
fighter_bout_inst['fa3_assdi'] = fa3_assdi

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assa_m'] = fa3_assa_m
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assde'] = fa3_assde
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fighter_bout_inst['fa3_assdi'] = fa3_assdi
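
`functions.calculate_3_fight_average` is likewise not shown. Assuming it averages a metric over a fighter's three most recent bouts before the given date, and falls back to however many earlier bouts exist when there are fewer than three, a vectorized sketch with a rolling window might look like this. `last_3_average_before` and the toy data are hypothetical.

```python
import pandas as pd

def last_3_average_before(df, metric):
    """Mean of `metric` over each fighter's up-to-3 most recent earlier bouts."""
    ordered = df.sort_values(['fighter_id', 'date'])
    # shift(1) drops the current bout; min_periods=1 allows short histories
    return ordered.groupby('fighter_id')[metric].transform(
        lambda s: s.shift(1).rolling(window=3, min_periods=1).mean()
    )

example = pd.DataFrame({
    'fighter_id': ['a'] * 5,
    'date': ['2010-01-01', '2011-01-01', '2012-01-01',
             '2013-01-01', '2014-01-01'],
    'ssa_m': [1.0, 2.0, 3.0, 4.0, 5.0],
})
example['fa3_assa_m'] = last_3_average_before(example, 'ssa_m')
```

With fewer than three earlier bouts, this window coincides with the career average, so the two features only diverge from a fighter's fifth bout onward.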


In [106]:
fighter_bout_inst

fighter_bout_inst,bout_id,fighter_id,date,ssa_m,ca_assa_m,ca_assde,ca_assdi,fa3_assa_m,fa3_assde,fa3_assdi
000da3152b7b5ab16da99156486ed6c2,000da3152b7b5ab1,6da99156486ed6c2,2006-07-08,5.866667,7.340394,0.563305,0.750000,7.340394,0.563305,0.750000
000da3152b7b5ab1d1a1314976c50bef,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,2.600000,5.455385,0.543474,1.200000,5.455385,0.543474,1.200000
0027e179b743c86c3aa794cbe1e3484b,0027e179b743c86c,3aa794cbe1e3484b,2015-03-14,2.550000,3.733333,0.561818,-14.333333,3.733333,0.561818,-14.333333
0027e179b743c86c91ea901c458e95dd,0027e179b743c86c,91ea901c458e95dd,2015-03-14,3.412500,5.235224,0.546091,7.700000,4.478891,0.429391,8.857143
002921976d27b7dab4ad3a06ee4d660c,002921976d27b7da,b4ad3a06ee4d660c,2014-12-13,5.928854,7.832834,0.667550,14.666667,7.966498,0.707744,17.800000
...,...,...,...,...,...,...,...,...,...,...
ffd3e3d37cba32da92a9aa9c93192871,ffd3e3d37cba32da,92a9aa9c93192871,2014-10-25,10.266667,9.590387,0.634401,3.363636,11.684034,0.613337,-1.000000
ffea776913451b6d22a92d7f62195791,ffea776913451b6d,22a92d7f62195791,2015-02-28,14.522293,10.637601,0.717127,3.000000,7.297451,0.781969,3.500000
ffea776913451b6d75e5fec9f72910ef,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,7.643312,5.866690,0.646161,-1.934426,6.533333,0.623848,-4.111111
fffa21388cdd78b75d7bdab5e03e3216,fffa21388cdd78b7,5d7bdab5e03e3216,2013-10-19,7.866667,5.037644,0.572856,1.600000,6.237844,0.599334,0.625000


## Create the final dataframe

First I will get a list of all bout ids. Then I will create one dataframe holding the first fighter's row for each bout and another holding the second fighter's row. Finally, I will join those dataframes along the column axis.

In [107]:
model_df = functions.merge_fighter_instances(fighter_bout_inst)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  instances_df['inst_id'] = instances_df['bout_id'] + instances_df['fighter_id']
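
`functions.merge_fighter_instances` is defined elsewhere. A minimal sketch of the approach described above, assuming exactly two rows per bout, is shown below. The name `merge_fighter_rows` and the toy data are hypothetical, and the order in which the two fighters are assigned to `_0` and `_1` may differ from the author's function.

```python
import pandas as pd

def merge_fighter_rows(instances):
    """Turn two per-fighter rows per bout into one row with _0/_1 suffixed columns."""
    inst = instances.reset_index(drop=True).copy()
    # Number the rows within each bout: 0 for the first fighter, 1 for the second
    inst['slot'] = inst.groupby('bout_id').cumcount()
    first = inst[inst['slot'] == 0].drop(columns='slot')
    second = inst[inst['slot'] == 1].drop(columns='slot')
    # Join side by side on bout_id; shared column names receive the suffixes
    return first.merge(second, on='bout_id', suffixes=('_0', '_1'))

toy = pd.DataFrame({
    'bout_id': ['b1', 'b1', 'b2', 'b2'],
    'fighter_id': ['f1', 'f2', 'f3', 'f4'],
    'ssa_m': [2.0, 3.0, 4.0, 5.0],
})
merged = merge_fighter_rows(toy)
```

Because the merge is an inner join, a bout with only one row would be dropped silently, so it may be worth checking the row count afterwards.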


## Creating cssa_m

cssa_m: Combined Significant Strike Attempts per Minute. This metric measures the combined number of significant strikes attempted per minute by both fighters in a bout. It is the sum of the two fighters' `ssa_m` values.

In [108]:
model_df['cssa_m'] = model_df['ssa_m_0'] + model_df['ssa_m_1']
model_df

Unnamed: 0,bout_id,fighter_id_0,date_0,ssa_m_0,ca_assa_m_0,ca_assde_0,ca_assdi_0,fa3_assa_m_0,fa3_assde_0,fa3_assdi_0,...,date_1,ssa_m_1,ca_assa_m_1,ca_assde_1,ca_assdi_1,fa3_assa_m_1,fa3_assde_1,fa3_assdi_1,inst_id_1,cssa_m
0,000da3152b7b5ab1,d1a1314976c50bef,2006-07-08,2.600000,5.455385,0.543474,1.200000,5.455385,0.543474,1.200000,...,2006-07-08,5.866667,7.340394,0.563305,0.750000,7.340394,0.563305,0.750000,000da3152b7b5ab16da99156486ed6c2,8.466667
1,0027e179b743c86c,91ea901c458e95dd,2015-03-14,3.412500,5.235224,0.546091,7.700000,4.478891,0.429391,8.857143,...,2015-03-14,2.550000,3.733333,0.561818,-14.333333,3.733333,0.561818,-14.333333,0027e179b743c86c3aa794cbe1e3484b,5.962500
2,002921976d27b7da,ebc1f40e00e0c481,2014-12-13,1.185771,8.788065,0.487594,-2.545455,9.102954,0.465318,-3.000000,...,2014-12-13,5.928854,7.832834,0.667550,14.666667,7.966498,0.707744,17.800000,002921976d27b7dab4ad3a06ee4d660c,7.114625
3,002c1562708ac307,44470bfd9483c7ad,2014-05-24,10.731707,3.200000,0.775809,2.333333,3.200000,0.775809,2.333333,...,2014-05-24,16.341463,12.864367,0.673899,2.666667,12.805102,0.642872,-0.714286,002c1562708ac30722a92d7f62195791,27.073171
4,002cb1bb411c5f60,d897897060f10a3a,2006-03-04,9.760000,11.630700,0.647542,15.090909,14.315380,0.788792,17.333333,...,2006-03-04,2.880000,5.500106,0.498367,-0.777778,5.758914,0.650062,4.800000,002cb1bb411c5f6022e47b53e4ceb27c,12.640000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3938,ff872fa3e9ec32a9,b7d524c77c27389b,2008-06-07,3.666667,6.067757,0.627876,4.545455,8.445440,0.501709,4.000000,...,2008-06-07,5.733333,10.791165,0.563326,2.615385,6.511111,0.513123,1.222222,ff872fa3e9ec32a99fe85152f351e737,9.400000
3939,ffbc12e4f821ec68,7a703c565ccaa18f,2014-02-15,3.133333,10.834736,0.811688,15.500000,10.834736,0.811688,15.500000,...,2014-02-15,4.733333,4.844875,0.617094,3.937500,4.955556,0.704265,1.777778,ffbc12e4f821ec683591d0d5d382a381,7.866667
3940,ffd3e3d37cba32da,92a9aa9c93192871,2014-10-25,10.266667,9.590387,0.634401,3.363636,11.684034,0.613337,-1.000000,...,2014-10-25,5.466667,14.107946,0.543519,-1.250000,14.441631,0.560847,0.571429,ffd3e3d37cba32da7413b80dbb0f8f9f,15.733333
3941,ffea776913451b6d,75e5fec9f72910ef,2015-02-28,7.643312,5.866690,0.646161,-1.934426,6.533333,0.623848,-4.111111,...,2015-02-28,14.522293,10.637601,0.717127,3.000000,7.297451,0.781969,3.500000,ffea776913451b6d22a92d7f62195791,22.165605


In [110]:
model_df = model_df.loc[:,['fa3_assa_m_0', 'fa3_assa_m_1', 
                           'fa3_assde_0', 'fa3_assde_1', 
                           'fa3_assdi_0', 'fa3_assdi_1',
                           'ca_assa_m_0', 'ca_assa_m_1',
                           'ca_assde_0', 'ca_assde_1', 
                           'ca_assdi_0', 'ca_assdi_1',
                           'cssa_m']]

In [111]:
model_df

Unnamed: 0,fa3_assa_m_0,fa3_assa_m_1,fa3_assde_0,fa3_assde_1,fa3_assdi_0,fa3_assdi_1,ca_assa_m_0,ca_assa_m_1,ca_assde_0,ca_assde_1,ca_assdi_0,ca_assdi_1,cssa_m
0,5.455385,7.340394,0.543474,0.563305,1.200000,0.750000,5.455385,7.340394,0.543474,0.563305,1.200000,0.750000,8.466667
1,4.478891,3.733333,0.429391,0.561818,8.857143,-14.333333,5.235224,3.733333,0.546091,0.561818,7.700000,-14.333333,5.962500
2,9.102954,7.966498,0.465318,0.707744,-3.000000,17.800000,8.788065,7.832834,0.487594,0.667550,-2.545455,14.666667,7.114625
3,3.200000,12.805102,0.775809,0.642872,2.333333,-0.714286,3.200000,12.864367,0.775809,0.673899,2.333333,2.666667,27.073171
4,14.315380,5.758914,0.788792,0.650062,17.333333,4.800000,11.630700,5.500106,0.647542,0.498367,15.090909,-0.777778,12.640000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3938,8.445440,6.511111,0.501709,0.513123,4.000000,1.222222,6.067757,10.791165,0.627876,0.563326,4.545455,2.615385,9.400000
3939,10.834736,4.955556,0.811688,0.704265,15.500000,1.777778,10.834736,4.844875,0.811688,0.617094,15.500000,3.937500,7.866667
3940,11.684034,14.441631,0.613337,0.560847,-1.000000,0.571429,9.590387,14.107946,0.634401,0.543519,3.363636,-1.250000,15.733333
3941,6.533333,7.297451,0.623848,0.781969,-4.111111,3.500000,5.866690,10.637601,0.646161,0.717127,-1.934426,3.000000,22.165605


In [112]:
model_df.to_csv('../../data/model_6_data.csv')