# Data Science 5K Capstone Proposal
To get your capstone approved, you must complete all of the following steps.

## 1) Get your data
You may use any data set(s) you like, so long as they meet these criteria:

* Your data cannot have _anything_ to do with your work at Booz Allen Hamilton.
* Your data must be publicly available for free.
* Your data should be interesting to _you_. You want your capstone to be something you're proud of.
* Your data should be "big enough":
    - It should have at least 1,000 rows.
    - It should have enough columns to be interesting.
    - If you have questions, contact a member of the instructional team.
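
The row-count criterion above can be checked in code once the data are loaded. The following is a minimal sketch, not a required step: the helper name `size_report` and the toy DataFrame are hypothetical, and only the 1,000-row threshold comes from the criteria. The column count is reported but not judged, since "enough columns" has no stated number.

```python
import pandas as pd

MIN_ROWS = 1_000  # threshold taken from the "big enough" criterion


def size_report(df, min_rows=MIN_ROWS):
    """Summarize the shape of `df` and flag whether it meets the row minimum."""
    n_rows, n_cols = df.shape
    return {"rows": n_rows, "columns": n_cols, "enough_rows": n_rows >= min_rows}


# Toy data for illustration only (assumption: any DataFrame works here).
toy = pd.DataFrame({"a": range(1500), "b": range(1500)})
report = size_report(toy)
print(report)
```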

## 2) Import your data
In the space below, import your data. If your data span multiple files, read them all in. If applicable, merge or append them as needed.
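
For data that span multiple files, one possible approach is sketched here. It assumes the files share the same columns and can simply be stacked; the helper `read_csv_parts` and the `bouts_part*.csv` file names are hypothetical and exist only so the sketch runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_parts(pattern):
    """Read every CSV matching `pattern` and stack the rows into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(path, encoding="utf-8") for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


# Demo with two throwaway files (hypothetical names and values).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"age_A": [35.0, 26.0], "result": ["draw", "win_A"]}).to_csv(
        os.path.join(tmp, "bouts_part1.csv"), index=False
    )
    pd.DataFrame({"age_A": [28.0], "result": ["win_B"]}).to_csv(
        os.path.join(tmp, "bouts_part2.csv"), index=False
    )
    combined = read_csv_parts(os.path.join(tmp, "bouts_part*.csv"))

print(combined.shape)
```

`pd.concat` covers the "append" case. When files describe the same rows and share a key column, `pd.merge` is the usual tool instead.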

In [4]:
import pandas as pd
import numpy as np

In [11]:
boxing = pd.read_csv('bouts_out_new.csv', encoding='utf-8')
boxing.head()

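|  | age_A | age_B | height_A | height_B | reach_A | reach_B | stance_A | stance_B | weight_A | weight_B | ... | kos_A | kos_B | result | decision | judge1_A | judge1_B | judge2_A | judge2_B | judge3_A | judge3_B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 35.0 | 27.0 | 179.0 | 175.0 | 178.0 | 179.0 | orthodox | orthodox | 160.0 | 160.0 | ... | 33 | 34.0 | draw | SD | 110.0 | 118.0 | 115.0 | 113.0 | 114.0 | 114.0 |
| 1 | 26.0 | 31.0 | 175.0 | 185.0 | 179.0 | 185.0 | orthodox | orthodox | 164.0 | 164.0 | ... | 34 | 32.0 | win_A | UD | 120.0 | 108.0 | 120.0 | 108.0 | 120.0 | 108.0 |
| 2 | 28.0 | 26.0 | 176.0 | 175.0 | NaN | 179.0 | orthodox | orthodox | 154.0 | 154.0 | ... | 13 | 33.0 | win_B | KO | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 25.0 | 29.0 | 175.0 | 174.0 | 179.0 | 180.0 | orthodox | orthodox | 155.0 | 155.0 | ... | 32 | 19.0 | win_A | KO | 47.0 | 48.0 | 49.0 | 46.0 | 48.0 | 47.0 |
| 4 | 25.0 | 35.0 | 175.0 | 170.0 | 179.0 | 170.0 | orthodox | orthodox | 155.0 | NaN | ... | 32 | 33.0 | win_A | UD | 118.0 | 110.0 | 119.0 | 109.0 | 117.0 | 111.0 |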


## 3) Show me the head of your data

In [12]:
boxing.head(10)

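|  | age_A | age_B | height_A | height_B | reach_A | reach_B | stance_A | stance_B | weight_A | weight_B | ... | kos_A | kos_B | result | decision | judge1_A | judge1_B | judge2_A | judge2_B | judge3_A | judge3_B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 35.0 | 27.0 | 179.0 | 175.0 | 178.0 | 179.0 | orthodox | orthodox | 160.0 | 160.0 | ... | 33 | 34.0 | draw | SD | 110.0 | 118.0 | 115.0 | 113.0 | 114.0 | 114.0 |
| 1 | 26.0 | 31.0 | 175.0 | 185.0 | 179.0 | 185.0 | orthodox | orthodox | 164.0 | 164.0 | ... | 34 | 32.0 | win_A | UD | 120.0 | 108.0 | 120.0 | 108.0 | 120.0 | 108.0 |
| 2 | 28.0 | 26.0 | 176.0 | 175.0 | NaN | 179.0 | orthodox | orthodox | 154.0 | 154.0 | ... | 13 | 33.0 | win_B | KO | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 25.0 | 29.0 | 175.0 | 174.0 | 179.0 | 180.0 | orthodox | orthodox | 155.0 | 155.0 | ... | 32 | 19.0 | win_A | KO | 47.0 | 48.0 | 49.0 | 46.0 | 48.0 | 47.0 |
| 4 | 25.0 | 35.0 | 175.0 | 170.0 | 179.0 | 170.0 | orthodox | orthodox | 155.0 | NaN | ... | 32 | 33.0 | win_A | UD | 118.0 | 110.0 | 119.0 | 109.0 | 117.0 | 111.0 |
| 5 | 24.0 | 31.0 | 175.0 | 175.0 | 179.0 | 178.0 | orthodox | orthodox | NaN | NaN | ... | 31 | 28.0 | win_A | KO | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 23.0 | 31.0 | 175.0 | 175.0 | 179.0 | 188.0 | orthodox | orthodox | 155.0 | 155.0 | ... | 31 | 12.0 | win_A | SD | 115.0 | 113.0 | 117.0 | 111.0 | 113.0 | 115.0 |
| 7 | 23.0 | 31.0 | 175.0 | 177.0 | 179.0 | 175.0 | orthodox | orthodox | 155.0 | NaN | ... | 30 | 18.0 | win_A | TKO | 89.0 | 82.0 | 88.0 | 83.0 | 89.0 | 82.0 |
| 8 | 36.0 | 23.0 | 173.0 | 175.0 | 183.0 | 179.0 | orthodox | orthodox | 152.0 | NaN | ... | 26 | 30.0 | win_A | MD | 116.0 | 112.0 | 114.0 | 114.0 | 117.0 | 111.0 |
| 9 | 27.0 | 22.0 | 177.0 | 175.0 | 183.0 | 179.0 | southpaw | southpaw | NaN | NaN | ... | 14 | 30.0 | win_B | UD | 112.0 | 115.0 | 109.0 | 118.0 | 111.0 | 116.0 |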


## 4) Show me the shape of your data

In [13]:
boxing.info()
print(boxing.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 387427 entries, 0 to 387426
Data columns (total 26 columns):
age_A       352888 non-null float64
age_B       257935 non-null float64
height_A    249246 non-null float64
height_B    134640 non-null float64
reach_A     112342 non-null float64
reach_B     37873 non-null float64
stance_A    231009 non-null object
stance_B    231009 non-null object
weight_A    135573 non-null float64
weight_B    130358 non-null float64
won_A       387427 non-null int64
won_B       387427 non-null int64
lost_A      387427 non-null int64
lost_B      387427 non-null int64
drawn_A     387427 non-null int64
drawn_B     387427 non-null int64
kos_A       387427 non-null int64
kos_B       387348 non-null float64
result      387427 non-null object
decision    387427 non-null object
judge1_A    52248 non-null float64
judge1_B    52060 non-null float64
judge2_A    70371 non-null float64
judge2_B    70032 non-null float64
judge3_A    61231 non-null float64
judge3_B    6
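
The non-null counts in the `info()` output vary widely between columns, so it can help to view them as shares of missing values. The sketch below is one way to do that under stated assumptions: the helper `missing_share` is hypothetical, and the toy frame only borrows two column names from the listing, with made-up missing entries.

```python
import numpy as np
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy frame for illustration only; it is not the boxing data.
toy = pd.DataFrame(
    {
        "reach_A": [178.0, np.nan, 179.0, np.nan],
        "kos_A": [33, 34, 13, 32],
    }
)
shares = missing_share(toy)
print(shares)
```

Calling `missing_share(boxing)` on the real data would rank its columns the same way.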

## 5) Give me a problem statement
Below, write a problem statement. Keep in mind that your task is to tease out relationships in your data and eventually build a predictive model. Your problem statement can be vague, but you should have a goal in mind. Your problem statement should be between one sentence and one paragraph.

### My goal with this dataset is to predict the outcome of a boxing bout. The data include physical attributes of each fighter (reach, weight, height) that could be predictive of the outcome, as well as each fighter's prior record (wins, losses, draws, KOs). Given some of these variables, I hope to use a machine learning algorithm to predict the fight outcome (win, loss, or draw).
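
Before fitting any model for this goal, it may help to know how well a trivial rule performs. The sketch below is an assumption about a useful reference point, not the model proposed here: it computes the accuracy of always predicting the most common outcome. The helper `majority_baseline` and the five-row sample are hypothetical; only the labels `win_A`, `win_B`, and `draw` come from the `result` column shown earlier.

```python
import pandas as pd


def majority_baseline(results):
    """Return the most common outcome and the accuracy of always predicting it."""
    shares = results.value_counts(normalize=True)  # sorted from most to least common
    return shares.index[0], float(shares.iloc[0])


# Made-up sample using the labels seen in the `result` column.
sample = pd.Series(["win_A", "win_A", "win_B", "draw", "win_A"])
label, accuracy = majority_baseline(sample)
print(label, accuracy)
```

Any predictive model built later would need to beat this number on held-out bouts to show that the fighter attributes carry signal.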