# Purpose
Analyze the distributions of our features and target as well as any correlations.

Features:
- Career Average Standing Significant Strike Attempts per 15 Minutes
    - This metric includes all significant strikes except those on the ground
- Career Average Takedown Attempts per 15 minutes
    
Target:
- Combined Average Standing Significant Strike Attempts per 15 Minutes
    - This is the sum of the per 15 minute rates of both fighters in the bout
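
As a minimal sketch of how the target relates to the per-fighter rates, the snippet below scales raw attempt counts to a per-15-minute rate and sums the two fighters' rates for one bout. The function names `per_15_minutes` and `combined_rate`, along with the sample counts and fight time, are hypothetical illustrations and are not taken from the data pipeline.

```python
def per_15_minutes(attempts: float, seconds_fought: float) -> float:
    """Scale a raw attempt count to a rate per 15 minutes (900 seconds)."""
    if seconds_fought <= 0:
        raise ValueError("seconds_fought must be positive")
    return attempts * 900.0 / seconds_fought


def combined_rate(attempts_0: float, attempts_1: float, seconds_fought: float) -> float:
    """Sum both fighters' per-15-minute rates for a single bout."""
    return (per_15_minutes(attempts_0, seconds_fought)
            + per_15_minutes(attempts_1, seconds_fought))


# Hypothetical bout: 60 and 45 standing significant strike attempts in 10 minutes.
example_target = combined_rate(60, 45, 600)
print(example_target)
```

Because the rate divides by time fought, very short fights scale up sharply: 30 attempts in 45 seconds works out to 600 per 15 minutes.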

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind

In [3]:
data = pd.read_csv('../../data/modelling_data/model_2_data.csv', index_col=0)
data.head()

Unnamed: 0,date_0,bout_id,fighter_id_0,ca_s_ss_a_p15m_0,ca_td_a_p15m_0,fighter_id_1,ca_s_ss_a_p15m_1,ca_td_a_p15m_1,c_s_ss_a_p15m
0,2006-07-08,000da3152b7b5ab1,d1a1314976c50bef,40.846154,2.587045,6da99156486ed6c2,77.963058,12.964286,108.0
1,2015-03-14,0027e179b743c86c,91ea901c458e95dd,58.128358,3.3,3aa794cbe1e3484b,56.0,0.0,67.375
2,2014-12-13,002921976d27b7da,ebc1f40e00e0c481,110.627907,0.61672,b4ad3a06ee4d660c,80.736725,1.455984,35.573123
3,2014-05-24,002c1562708ac307,44470bfd9483c7ad,43.0,4.0,22a92d7f62195791,189.508547,1.948718,406.097561
4,2006-03-04,002cb1bb411c5f60,d897897060f10a3a,150.972124,0.850746,22e47b53e4ceb27c,48.074099,1.851064,174.0


## Career Average Takedown Attempts per 15 Minutes

In [4]:
data.ca_td_a_p15m_0.describe()

count    3971.000000
mean        4.444201
std         3.884238
min         0.000000
25%         1.500000
50%         3.576923
75%         6.472534
max        34.615385
Name: ca_td_a_p15m_0, dtype: float64

In [5]:
data.ca_td_a_p15m_1.describe()

count    3971.000000
mean        4.372789
std         3.962695
min         0.000000
25%         1.456737
50%         3.597305
75%         6.474805
max        62.068966
Name: ca_td_a_p15m_1, dtype: float64

### Outliers
Let's look at the fighters with the highest values in each column.

#### Fighter 0

In [6]:
data.sort_values('ca_td_a_p15m_0', ascending=False).head()

Unnamed: 0,date_0,bout_id,fighter_id_0,ca_s_ss_a_p15m_0,ca_td_a_p15m_0,fighter_id_1,ca_s_ss_a_p15m_1,ca_td_a_p15m_1,c_s_ss_a_p15m
3460,2014-09-20,de2dad48d95305d7,276c60b14b571dd4,0.0,34.615385,203c957eac95dd87,188.59979,0.472689,103.846154
3514,2002-01-11,e1bc211f43378ee7,73c7cfa551289285,76.767677,31.144781,44260175069b6276,66.787785,0.810559,93.0
2596,2009-04-01,a561360edf66ee6f,d35c3ed553b71fa2,93.103448,31.034483,054defd5420a551f,8.953926,7.06317,213.157895
1869,2008-10-18,757c11f17278b06a,58bc81376286b3d3,43.0,30.0,1208ffc5be6e31ad,77.647059,0.0,77.214953
2896,2008-04-02,b89dea8c3e6bba6f,cc6f9d1e89f3449a,64.186813,27.89011,c9bbf1a0285a8076,32.142857,8.035714,71.287129


Appending a fighter id to the link 'http://www.ufcstats.com/fighter-details/' gives the page with that fighter's name and record.
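
A small helper can build these links. This is only a sketch: the name `fighter_url` is hypothetical, and the id used is the `fighter_id_0` value from the first row of the table above.

```python
FIGHTER_DETAILS_URL = "http://www.ufcstats.com/fighter-details/"


def fighter_url(fighter_id: str) -> str:
    """Return the ufcstats.com details page for a fighter id."""
    return FIGHTER_DETAILS_URL + fighter_id


# fighter_id_0 from the first row of the sorted table above.
print(fighter_url("276c60b14b571dd4"))
```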

Let's look at the first row: fighter 0 in the first row has a career significant strike attempt rate per 15 minutes of about 940 on 2012-05-05. The fighter is Nick Denis, who had only one UFC fight before this, a first-round knockout.

#### Fighter 1

In [12]:
data.sort_values('ca_s_ss_a_p15m_1', ascending=False).head()

Unnamed: 0,date_0,bout_id,fighter_id_0,ca_s_ss_a_p15m_0,fighter_id_1,ca_s_ss_a_p15m_1,c_s_ss_a_p15m
2796,2005-08-06,b2f604f2332cd8af,ea3ef6206c7907d5,66.0,9fe85152f351e737,585.0,368.181818
2086,2008-12-10,83406cd29d3c2d0f,cbf94e4c4af4ff6d,10.344828,89b8d1bf1ff09d1d,521.052632,279.0
832,2003-02-28,334cd8572d9842cb,50cc91ce2982785d,24.63222,2a542ee8a8b83559,507.480583,133.333333
1217,2007-09-22,4a7d39ebc9dce5db,7d21de9c6d7c98b2,70.967591,365fee2da473b177,484.615385,91.621622
3481,2018-11-03,df9aa51f6ccdfe02,f9b200db02b488d9,179.538732,dccb63727f2f5f74,450.0,337.0


Fighter 1 in the first row has a career significant strike attempt rate per 15 minutes of 585 on 2005-08-06. The fighter is Mike Swick, who had only one UFC fight before this, a first-round knockout.

### Correlation

In [14]:
data.corr()

Unnamed: 0,ca_s_ss_a_p15m_0,ca_s_ss_a_p15m_1,c_s_ss_a_p15m
ca_s_ss_a_p15m_0,1.0,0.142098,0.302473
ca_s_ss_a_p15m_1,0.142098,1.0,0.243444
c_s_ss_a_p15m,0.302473,0.243444,1.0


The correlation between the target and fighter 0 (0.30) is higher than the correlation between the target and fighter 1 (0.24). This is unexpected because the ordering of the fighters was random.
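
One way to gauge whether a gap like this is larger than sampling noise is a row-wise bootstrap of the difference between the two correlations. The sketch below is an assumption about how that check could be done, not part of the original analysis; `bootstrap_corr_diff` is a hypothetical helper, and the demo frame is synthetic data rather than the modelling data.

```python
import numpy as np
import pandas as pd


def bootstrap_corr_diff(df, col_a, col_b, target, n_boot=1000, seed=0):
    """Return a 95% percentile interval for corr(col_a, target) - corr(col_b, target)."""
    rng = np.random.default_rng(seed)
    values = df[[col_a, col_b, target]].to_numpy(dtype=float)
    n = len(values)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample whole rows with replacement so each bout stays intact.
        sample = values[rng.integers(0, n, size=n)]
        corr = np.corrcoef(sample, rowvar=False)
        diffs[i] = corr[0, 2] - corr[1, 2]
    return np.percentile(diffs, [2.5, 97.5])


# Synthetic stand-in: two independent features that contribute equally to y.
rng = np.random.default_rng(42)
f0 = rng.normal(size=500)
f1 = rng.normal(size=500)
demo = pd.DataFrame({"f0": f0, "f1": f1, "y": f0 + f1 + rng.normal(size=500)})

low, high = bootstrap_corr_diff(demo, "f0", "f1", "y")
print(low, high)
```

In the notebook, the equivalent call would be `bootstrap_corr_diff(data, 'ca_s_ss_a_p15m_0', 'ca_s_ss_a_p15m_1', 'c_s_ss_a_p15m')`. An interval that comfortably contains zero would be consistent with the gap arising by chance.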

# Conclusion
### Standing Significant Strike Rate
Strike attempts per 15 minutes can be used as a measure of how much striking occurs in a fight. First-round knockouts create some outliers that seem unrealistic because they are often the result of an extremely high rate of striking activity that is unsustainable. Therefore, Career Average Significant Strike Attempts per 15 Minutes cannot measure a fighter's ability to sustain that rate.


### Correlations
The career averages for fighter 0 have a Pearson correlation coefficient of 0.30 with the target, while those for fighter 1 have 0.24. Ideally, we would see almost exactly the same correlation for each fighter, because their order shouldn't matter.

This could be due to random chance, but it may warrant a preprocessing step that randomizes the order in which the fighters are listed.
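
A minimal sketch of such a preprocessing step, assuming every per-fighter column comes as a `_0` / `_1` pair as in the data shown earlier. The helper name `randomize_fighter_order` is hypothetical, and the demo frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def randomize_fighter_order(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Swap every *_0 / *_1 column pair on a random subset of rows (each row with probability 0.5)."""
    rng = np.random.default_rng(seed)
    swap = rng.random(len(df)) < 0.5  # True where the two fighters trade places
    out = df.copy()
    for col_0 in [c for c in df.columns if c.endswith("_0")]:
        col_1 = col_0[:-2] + "_1"
        if col_1 not in df.columns:
            continue  # e.g. date_0 has no date_1 partner
        out.loc[swap, col_0] = df.loc[swap, col_1].to_numpy()
        out.loc[swap, col_1] = df.loc[swap, col_0].to_numpy()
    return out


# Made-up rows using the notebook's column naming.
demo = pd.DataFrame({
    "fighter_id_0": ["a", "b", "c", "d"],
    "ca_td_a_p15m_0": [1.0, 2.0, 3.0, 4.0],
    "fighter_id_1": ["w", "x", "y", "z"],
    "ca_td_a_p15m_1": [10.0, 20.0, 30.0, 40.0],
    "c_s_ss_a_p15m": [100.0, 200.0, 300.0, 400.0],
})
shuffled = randomize_fighter_order(demo, seed=1)
print(shuffled)
```

Ids and stats are swapped together within a row, so each fighter keeps their own features. The target is a sum over both fighters, so it does not change when they trade places.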
