# Data Science 5K Capstone Proposal
To get your capstone approved, you must complete all of the following steps.

## 1) Get your data
You may use any data set(s) you like, so long as they meet these criteria:

* Your data cannot have _anything_ to do with your work at Booz Allen Hamilton.
* Your data must be publicly available for free.
* Your data should be interesting to _you_. You want your capstone to be something you're proud of.
* Your data should be "big enough":
    - It should have at least 1,000 rows.
    - It should have enough columns to be interesting.
    - If you have questions, contact a member of the instructional team.

## 2) Import your data
In the space below, import your data. If your data span multiple files, read them all in. If applicable, merge or append them as needed.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
data1 = './data_powerlifting/openpowerlifting.csv'
data2 = './data_powerlifting/meets.csv'

# Both files are comma-separated, so read_csv is the idiomatic reader.
powerlift = pd.read_csv(data1)
meets = pd.read_csv(data2)

In [3]:
powerlift.columns

Index(['MeetID', 'Name', 'Sex', 'Equipment', 'Age', 'Division', 'BodyweightKg',
       'WeightClassKg', 'Squat4Kg', 'BestSquatKg', 'Bench4Kg', 'BestBenchKg',
       'Deadlift4Kg', 'BestDeadliftKg', 'TotalKg', 'Place', 'Wilks'],
      dtype='object')

In [4]:
meets.columns

Index(['MeetID', 'MeetPath', 'Federation', 'Date', 'MeetCountry', 'MeetState',
       'MeetTown', 'MeetName'],
      dtype='object')

In [5]:
# MeetID is the only column the two tables share, so join on it explicitly.
powerlift_meets = pd.merge(powerlift, meets, on='MeetID')
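
The merge above links each lifter row to its meet through the shared `MeetID` column. As a minimal sketch of how such a join could be checked, the example below uses two small made-up frames in place of the real CSVs; the names `lifters_demo` and `meets_demo` and all of their values are illustrative assumptions, not part of the data set.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up for illustration).
lifters_demo = pd.DataFrame({
    "MeetID": [0, 0, 1],
    "Name": ["A", "B", "C"],
    "TotalKg": [138.35, 401.42, 95.25],
})
meets_demo = pd.DataFrame({
    "MeetID": [0, 1],
    "Federation": ["FedX", "FedY"],
})

# Name the join key and ask pandas to raise if a MeetID appears
# more than once in the meets table (many lifters -> one meet).
merged_demo = pd.merge(
    lifters_demo,
    meets_demo,
    on="MeetID",
    how="left",
    validate="many_to_one",
)

print(merged_demo.shape)
```

Note that `how="left"` keeps every lifter row even when no meet matches, whereas the default is an inner join; which one is appropriate depends on whether unmatched rows should survive.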

## 3) Show me the head of your data.

In [6]:
powerlift_meets.head()

| | MeetID | Name | Sex | Equipment | Age | Division | BodyweightKg | WeightClassKg | Squat4Kg | BestSquatKg | ... | TotalKg | Place | Wilks | MeetPath | Federation | Date | MeetCountry | MeetState | MeetTown | MeetName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Angie Belk Terry | F | Wraps | 47.0 | Mst 45-49 | 59.6 | 60.0 | NaN | 47.63 | ... | 138.35 | 1 | 155.05 | 365strong/1601 | 365Strong | 2016-10-29 | USA | NC | Charlotte | 2016 Junior & Senior National Powerlifting Cha... |
| 1 | 0 | Dawn Bogart | F | Single-ply | 42.0 | Mst 40-44 | 58.51 | 60.0 | NaN | 142.88 | ... | 401.42 | 1 | 456.38 | 365strong/1601 | 365Strong | 2016-10-29 | USA | NC | Charlotte | 2016 Junior & Senior National Powerlifting Cha... |
| 2 | 0 | Dawn Bogart | F | Single-ply | 42.0 | Open Senior | 58.51 | 60.0 | NaN | 142.88 | ... | 401.42 | 1 | 456.38 | 365strong/1601 | 365Strong | 2016-10-29 | USA | NC | Charlotte | 2016 Junior & Senior National Powerlifting Cha... |
| 3 | 0 | Dawn Bogart | F | Raw | 42.0 | Open Senior | 58.51 | 60.0 | NaN | NaN | ... | 95.25 | 1 | 108.29 | 365strong/1601 | 365Strong | 2016-10-29 | USA | NC | Charlotte | 2016 Junior & Senior National Powerlifting Cha... |
| 4 | 0 | Destiny Dula | F | Raw | 18.0 | Teen 18-19 | 63.68 | 67.5 | NaN | NaN | ... | 122.47 | 1 | 130.47 | 365strong/1601 | 365Strong | 2016-10-29 | USA | NC | Charlotte | 2016 Junior & Senior National Powerlifting Cha... |


## 4) Show me the shape of your data

In [7]:
powerlift_meets.shape

(386414, 24)

## 5) Show me the proportion of missing observations for each column of your data

In [8]:
powerlift_meets.describe(include='all')

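| | MeetID | Name | Sex | Equipment | Age | Division | BodyweightKg | WeightClassKg | Squat4Kg | BestSquatKg | ... | TotalKg | Place | Wilks | MeetPath | Federation | Date | MeetCountry | MeetState | MeetTown | MeetName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 386414.0 | 386414 | 386414 | 386414 | 147147.0 | 370571 | 384012.0 | 382602.0 | 1243.0 | 298071.0 | ... | 363237.0 | 385322.0 | 362194.0 | 386414 | 386414 | 386414 | 386414 | 314271 | 292414 | 386414 |
| unique | NaN | 136687 | 2 | 5 | NaN | 4246 | NaN | 51.0 | NaN | NaN | ... | NaN | 81.0 | NaN | 8482 | 60 | 2652 | 45 | 80 | 1539 | 5217 |
| top | NaN | Sverre Paulsen | M | Raw | NaN | Open | NaN | 90.0 | NaN | NaN | ... | NaN | 1.0 | NaN | usapl/NS-2016-04 | USAPL | 2016-01-23 | USA | TX | Las Vegas | Åpent stevne |
| freq | NaN | 183 | 299045 | 186317 | NaN | 68618 | NaN | 35981.0 | NaN | NaN | ... | NaN | 194693.0 | NaN | 1197 | 76773 | 4156 | 274855 | 77990 | 6979 | 3844 |
| mean | 5143.015804 | NaN | NaN | NaN | 31.668237 | NaN | 86.934912 | NaN | 107.036404 | 176.569941 | ... | 424.000249 | NaN | 301.080601 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| std | 2552.099838 | NaN | NaN | NaN | 12.900342 | NaN | 23.140843 | NaN | 166.97662 | 69.222785 | ... | 196.355147 | NaN | 116.360396 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| min | 0.0 | NaN | NaN | NaN | 5.0 | NaN | 15.88 | NaN | -440.5 | -477.5 | ... | 11.0 | NaN | 13.73 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 25% | 2979.0 | NaN | NaN | NaN | 22.0 | NaN | 70.3 | NaN | 87.5 | 127.5 | ... | 272.16 | NaN | 237.38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 50% | 5960.0 | NaN | NaN | NaN | 28.0 | NaN | 83.2 | NaN | 145.0 | 174.63 | ... | 424.11 | NaN | 319.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 75% | 7175.0 | NaN | NaN | NaN | 39.0 | NaN | 100.0 | NaN | 212.5 | 217.72 | ... | 565.0 | NaN | 379.29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |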
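
The `count` row reports non-missing values, so the missing share has to be inferred from it: `Age`, for example, has 147,147 non-missing values out of 386,414 rows, which means roughly 62% are missing. The display also elides some columns behind `...`. A more direct way to get the proportion for every column is sketched below on a small made-up frame; `demo` and its values are illustrative assumptions, and the same expression could be applied to `powerlift_meets`.

```python
import numpy as np
import pandas as pd

# Toy frame with known gaps (values are made up for illustration).
demo = pd.DataFrame({
    "Age": [47.0, np.nan, np.nan, 18.0],
    "Squat4Kg": [np.nan, np.nan, np.nan, 100.0],
    "Sex": ["F", "F", "M", "M"],
})

# isna() marks missing cells as True; the column mean of those
# booleans is the proportion of missing observations per column.
missing_share = demo.isna().mean().sort_values(ascending=False)

print(missing_share)
```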


## 6) Give me a problem statement.
Below, write a problem statement. Keep in mind that your task is to tease out relationships in your data and eventually build a predictive model. Your problem statement can be vague, but you should have a goal in mind. Your problem statement should be between one sentence and one paragraph.

### My Problem Statement:
I will likely need to drop some columns, but overall I will explore the relationships between weight classes and the different lifts to determine whether success in one lift translates to success in the meet and a higher placement for the athlete. I will examine the data for male lifters, female lifters, and both combined, looking for trends in what contributes most to success.
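
One possible first look at this question, offered as a rough sketch and not as the final analysis, is to correlate each best lift with `TotalKg` separately for each sex and for everyone combined. The frame `demo` and its numbers are made up for illustration; only the column names come from the data set.

```python
import pandas as pd

# Toy data (made up); TotalKg is the sum of the three best lifts.
demo = pd.DataFrame({
    "Sex": ["F", "F", "F", "M", "M", "M"],
    "BestSquatKg": [50.0, 80.0, 110.0, 120.0, 160.0, 200.0],
    "BestBenchKg": [30.0, 45.0, 50.0, 80.0, 100.0, 130.0],
    "BestDeadliftKg": [70.0, 100.0, 140.0, 150.0, 200.0, 240.0],
})
lifts = ["BestSquatKg", "BestBenchKg", "BestDeadliftKg"]
demo["TotalKg"] = demo[lifts].sum(axis=1)

# Pearson correlation of each lift with the total, per sex.
rows = {}
for sex, grp in demo.groupby("Sex"):
    rows[sex] = grp[lifts].corrwith(grp["TotalKg"])

# Same correlation with both sexes pooled together.
rows["All"] = demo[lifts].corrwith(demo["TotalKg"])

# One row per group, one column per lift.
corr_by_sex = pd.DataFrame(rows).T

print(corr_by_sex)
```

Because the total is built from the best lifts, these correlations will be high by construction; a target such as `Place` or `Wilks` may be more informative once the data are cleaned.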