# Baseball Stats

In this notebook, I'll load a dataset of Major League Baseball stats found on [Kaggle](https://www.kaggle.com/) and use the pandas library to look for interesting patterns. The dataset is located [here](https://www.kaggle.com/kaggle/the-history-of-baseball).

## Loading libraries and data

In [39]:
import pandas as pd
import numpy as np

data = pd.read_csv('baseball/batting.csv').set_index('player_id')

I'll start with something simple: using pandas to create a pivot table that aggregates each player's seasons and totals up their home runs.

In [89]:
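# Sum home runs per player; select the 'hr' column so the result is a Series that can be sorted directly
hrs = data.pivot_table(index=data.index, values='hr', aggfunc='sum')['hr'].sort_values(ascending=False).dropna()
print(hrs.head())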

player_id
bondsba01    762
aaronha01    755
ruthba01     714
rodrial01    687
mayswi01     660
Name: hr, dtype: float64
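
The same per-player aggregation can also be written with `groupby`. The sketch below is only an illustration on a tiny made-up table: the player ids and numbers are hypothetical, and it assumes, as in `batting.csv`, one row per player per season with `player_id` as the index.

```python
import pandas as pd

# Hypothetical per-season rows (not real data); 'player_id' is the index, as in the notebook.
seasons = pd.DataFrame(
    {
        "player_id": ["a01", "a01", "b01", "b01", "c01"],
        "year": [2001, 2002, 2001, 2002, 2001],
        "hr": [30, 25, 10, None, 40],
    }
).set_index("player_id")

# Group rows by the index level and sum home runs; missing seasons are skipped by sum().
career_hr = (
    seasons.groupby(level="player_id")["hr"]
    .sum()
    .sort_values(ascending=False)
)
print(career_hr)
```

Because `groupby(...)["hr"].sum()` returns a Series, it can be sorted without naming a column, which is why it is a common alternative to a one-column pivot table.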


Now let's do something a bit more complicated: finding the players with the highest and lowest ratio of home runs to stolen bases. I'll only include players with more than 300 home runs.

In [100]:
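# Sum stolen bases per player, the same way as home runs
sbs = data.pivot_table(index=data.index, values='sb', aggfunc='sum')['sb'].sort_values(ascending=False).dropna()
# Line up the two career totals side by side, one row per player
hrs_sbs = pd.concat((hrs, sbs), axis=1).sort_values('hr', ascending=False)
hrs_sbs['hr_to_sb_ratio'] = hrs_sbs['hr'] / hrs_sbs['sb']
# Keep players with more than 300 home runs, highest ratio first
print(hrs_sbs[hrs_sbs['hr'] > 300].sort_values('hr_to_sb_ratio', ascending=False))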

            hr   sb  hr_to_sb_ratio
fieldce01  319    2      159.500000
buhneja01  310    6       51.666667
konerpa01  439    9       48.777778
mcgwima01  583   12       48.583333
howarfr01  382    8       47.750000
delgaca01  473   14       33.785714
ortizda01  503   15       33.533333
thomeji01  612   19       32.210526
killeha01  573   19       30.157895
howarry01  357   12       29.750000
stargwi01  475   17       27.941176
piazzmi01  427   17       25.117647
sievero01  318   14       22.714286
giambja01  440   20       22.000000
sexsori01  306   14       21.857143
willite01  521   24       21.708333
mccovwi01  521   26       20.038462
colavro01  374   19       19.684211
fieldpr01  311   18       17.277778
powelbo01  339   20       16.950000
adcocjo01  336   20       16.800000
kinerra01  369   22       16.772727
gonzaju03  434   26       16.692308
teixema01  394   24       16.416667
thomafr04  521   32       16.281250
hortowi01  325   20       16.250000
ramirma02  555   38       14.605263

And our winner, with a whopping 159.5:1 ratio of home runs to stolen bases, is Cecil Fielder. At the opposite end is Bobby Bonds, with a 0.72:1 ratio.
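
One thing to watch with a ratio like this is a player with zero stolen bases, since dividing by zero produces `inf` and would push that player to the top. The sketch below shows one possible way to guard against that and to pull out both extremes directly. The table is made up for illustration; the row labels and totals are hypothetical, and treating zero as missing is an assumption, not something the notebook above does.

```python
import numpy as np
import pandas as pd

# Hypothetical career totals (not taken from the dataset).
totals = pd.DataFrame(
    {"hr": [320, 330, 350, 120], "sb": [4, 440, 0, 50]},
    index=["slugger", "speedster", "never_stole", "below_cutoff"],
)

# Treat zero stolen bases as missing so the division yields NaN instead of inf.
totals["hr_to_sb_ratio"] = totals["hr"] / totals["sb"].replace(0, np.nan)

# Apply the same cutoff as above, then drop players without a defined ratio.
qualified = totals[totals["hr"] > 300].dropna(subset=["hr_to_sb_ratio"])

# idxmax / idxmin return the index labels of the highest and lowest ratios.
highest = qualified["hr_to_sb_ratio"].idxmax()
lowest = qualified["hr_to_sb_ratio"].idxmin()
print(highest, lowest)
```

Using `idxmax` and `idxmin` avoids having to scan the full sorted printout to find the two ends of the list.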

More to come...