# What's the Opposite of a Maddux?

## Data

Stats LLC began tracking pitch counts in 1988, so pitchers from earlier generations cannot be included in the following analysis. I scraped the game logs of the 50 pitchers with the most complete games.

In [11]:
import pandas as pd

data = pd.read_csv('csvs/game_logs.csv')
players = pd.read_csv('csvs/player_names.csv', index_col='player_id')

## Check Data

Check that the data looks right and contains the expected number of players.

In [12]:
# Show the first game log, then count the distinct players in the data
print(data.loc[0])
print(len(data.pivot_table(index='player_id', values=['runs'], aggfunc='sum')))

player_id        abbotji01
team                   CAL
opp                    SEA
inngs                 GS-5
result               L,0-7
pitches                 83
ip                     4.2
runs                     6
year                  1989
month                    4
day                      8
entered       1t start tie
Name: 0, dtype: object
50
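
A lighter-weight way to run the same player-count check is to count distinct IDs directly. The sketch below uses a small made-up table in place of `csvs/game_logs.csv`, and the helper name `count_players` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the game-log table (made-up rows, not the scraped data).
logs = pd.DataFrame({
    'player_id': ['abbotji01', 'abbotji01', 'maddugr01'],
    'pitches': [83, 110, 95],
})


def count_players(df, id_col='player_id'):
    """Return the number of distinct pitchers in a game-log table."""
    return df[id_col].nunique()


n_players = count_players(logs)
print(n_players)
```

On the real data this should agree with the pivot-table count above.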


## Analysis

### Total number of complete games

In [13]:
complete_games = data[data['inngs'].isin(['SHO', 'CG'])]
shutouts = data[data['inngs'] == 'SHO']

### Average Pitches per Complete Game

In [14]:
complete_games.groupby('player_id').aggregate({'pitches': 'mean', 'inngs': 'count'})\
    .sort_values('pitches', ascending=True).rename(columns={'inngs':'Complete Games'}).join(players)

           Complete Games     pitches       player_name
player_id
tewksbo01              29   99.857143     Bob Tewksbury
buehrma01              33  106.606061      Mark Buehrle
maddugr01             107  106.924528       Greg Maddux
radkebr01              37  107.916667        Brad Radke
carpech01              33  108.454545   Chris Carpenter
hallaro01              67  108.582090      Roy Halladay
mulhote01              46  109.395349  Terry Mulholland
leecl02                29  109.551724         Cliff Lee
sabatc.01              38  109.736842     C.C. Sabathia
wellsda01              54  109.981481       David Wells


### Average Pitches per Shutout

In [15]:
avg_p_per_sho = shutouts.groupby('player_id').aggregate({'pitches': 'mean', 'inngs': 'count'})\
    .sort_values('pitches', ascending=True).rename(columns={'inngs':'sho', 'pitches': 'p'}).join(players)

print(avg_p_per_sho)

           sho           p       player_name
player_id                                   
tewksbo01    7   95.000000     Bob Tewksbury
bosioch01    8  100.500000       Chris Bosio
maddugr01   34  102.205882       Greg Maddux
radkebr01   10  105.800000        Brad Radke
moyerja01    9  106.222222       Jamie Moyer
carpech01   15  106.466667   Chris Carpenter
buehrma01   10  106.500000      Mark Buehrle
mulhote01   10  107.000000  Terry Mulholland
hallaro01   20  107.000000      Roy Halladay
colonba01   13  107.230769     Bartolo Colon
nagych01     6  108.500000      Charles Nagy
glavito02   25  108.708333       Tom Glavine
abbotji01    6  109.000000        Jim Abbott
drabedo01   20  109.222222       Doug Drabek
wellsda01   12  109.250000       David Wells
hershor01   14  110.142857    Orel Hershiser
leecl02     12  110.916667         Cliff Lee
morrija02    9  111.222222       Jack Morris
martide01   18  111.388889   Dennis Martinez
sabatc.01   12  111.583333     C.C. Sabathia
candito01 

Bob Tewksbury was the most efficient pitcher in both complete games and shutouts, averaging about 100 pitches per complete game and 95 per shutout. However, he threw only 29 complete games from 1988 to 1998. Greg Maddux pitched for ten more years, throwing 107 complete games from 1988 to 2008 while averaging about 107 pitches per complete game.

Randy Johnson, who threw the second-most complete games (100), averaged 126 pitches per complete game. That's nearly 20 more pitches per game than Maddux!
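
To put that gap in perspective, here is a rough back-of-the-envelope calculation. It uses Maddux's average from the complete-game table and the rounded 126 figure quoted for Johnson, so the totals are approximate; the variable names are only for illustration.

```python
# Averages taken from the text and table above; Johnson's is a rounded figure.
maddux_avg = 106.924528
johnson_avg = 126.0
johnson_complete_games = 100

# Extra pitches Johnson threw per complete game, relative to Maddux's average.
gap_per_game = johnson_avg - maddux_avg

# Approximate extra pitches across all of Johnson's complete games.
extra_pitches = gap_per_game * johnson_complete_games

print(round(gap_per_game, 1))
print(round(extra_pitches))
```

At roughly 19 extra pitches per game, the difference adds up to about 1,900 pitches over Johnson's 100 complete games.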

### Who has thrown the most Madduxes?

In [16]:
under_100 = shutouts[(shutouts['pitches'] < 100) & (shutouts['ip'] >= 9)]

under_100 = under_100.groupby('player_id').aggregate({'pitches': 'mean', 'inngs': 'count'})\
    .sort_values('inngs', ascending=False).rename(columns={'inngs':'sho', 'pitches': 'p'}).join(players)
print(under_100)

           sho          p       player_name
player_id                                  
maddugr01   13  91.692308       Greg Maddux
tewksbo01    6  91.500000     Bob Tewksbury
hallaro01    5  95.000000      Roy Halladay
glavito02    5  92.600000       Tom Glavine
moyerja01    4  93.750000       Jamie Moyer
bosioch01    4  88.000000       Chris Bosio
colonba01    4  95.000000     Bartolo Colon
wellsda01    3  94.333333       David Wells
navarja01    3  93.333333     Jaime Navarro
radkebr01    3  95.000000        Brad Radke
schilcu01    3  95.333333    Curt Schilling
mulhote01    3  97.000000  Terry Mulholland
carpech01    3  97.000000   Chris Carpenter
mussimi01    2  96.000000      Mike Mussina
drabedo01    2  86.500000       Doug Drabek
clemero02    2  90.000000     Roger Clemens
hershor01    2  96.500000    Orel Hershiser
rogerke01    2  95.500000      Kenny Rogers
leecl02      2  96.000000         Cliff Lee
buehrma01    2  91.500000      Mark Buehrle
brownke01    2  88.500000       
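
The `ip >= 9` filter in the cell above relies on the innings-pitched column, which follows baseball notation: the digit after the decimal point counts outs, not tenths, so the sample row's `4.2` means four and two-thirds innings. A minimal sketch of converting that notation to outs follows; the helper names `ip_to_outs` and `went_nine` are hypothetical and not part of the original notebook.

```python
def ip_to_outs(ip):
    """Convert innings pitched in baseball notation (e.g. 4.2) to outs recorded."""
    whole = int(ip)
    # The fractional digit is 0, 1, or 2 outs; round() absorbs float error.
    partial_outs = round((ip - whole) * 10)
    if partial_outs not in (0, 1, 2):
        raise ValueError('not valid innings-pitched notation: %r' % ip)
    return whole * 3 + partial_outs


def went_nine(ip):
    """True if the pitcher recorded at least 27 outs (nine full innings)."""
    return ip_to_outs(ip) >= 27


print(ip_to_outs(4.2))
print(went_nine(9.0))
```

Comparing `ip` against 9 directly gives the same answer for this filter, since no valid value between 8.2 and 9.0 exists; the conversion matters mainly when summing or averaging innings.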

### Who has thrown the most Johnsons?

In [17]:
over_125 = shutouts[(shutouts['pitches'] > 125) & (shutouts['ip'] >= 9)]
over_125 = over_125.groupby('player_id').aggregate({'pitches': 'mean', 'inngs': 'count'})\
    .sort_values('inngs', ascending=False).rename(columns={'inngs':'sho', 'pitches': 'p'}).join(players)

print(over_125)

           sho           p      player_name
player_id                                  
clemero02   20  137.400000    Roger Clemens
johnsra05   20  135.600000    Randy Johnson
coneda01    10  135.700000       David Cone
schilcu01    8  131.875000   Curt Schilling
wittbo01     7  132.285714       Bobby Witt
martira02    7  132.000000   Ramon Martinez
langsma01    6  133.666667    Mark Langston
martipe02    5  131.000000   Pedro Martinez
mussimi01    5  132.000000     Mike Mussina
hurstbr01    5  131.800000      Bruce Hurst
brownke01    4  127.750000      Kevin Brown
mcdowja01    4  133.750000    Jack McDowell
finlech01    4  137.250000     Chuck Finley
belchti01    3  131.666667      Tim Belcher
smoltjo01    3  132.666667      John Smoltz
drabedo01    3  132.666667      Doug Drabek
stewada01    3  130.333333     Dave Stewart
glavito02    3  127.666667      Tom Glavine
hernali01    3  133.333333  Livan Hernandez
swindgr01    3  133.666667    Greg Swindell
martide01    2  133.500000  Denn
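
The two filters used above can be summarized as a single labeling rule. The sketch below assumes its input is already a shutout and reuses the notebook's thresholds (under 100 pitches, over 125 pitches, at least nine innings); the function name `classify_shutout` is hypothetical.

```python
def classify_shutout(pitches, ip):
    """Label a shutout using the thresholds from the two cells above.

    Assumes the game is already known to be a shutout.
    """
    if ip < 9:
        # Shortened games are excluded by the ip >= 9 filter.
        return 'neither'
    if pitches < 100:
        return 'Maddux'
    if pitches > 125:
        return 'Johnson'
    return 'neither'


for pitches, ip in [(91, 9.0), (136, 9.0), (110, 9.0), (95, 7.0)]:
    print(pitches, ip, classify_shutout(pitches, ip))
```

Applied row by row to `shutouts`, a rule like this would reproduce both groupings in one pass, though the separate filters above are easier to read in a notebook.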

## Conclusion

Given the available data, a Maddux is the right name for a complete-game shutout thrown in under 100 pitches. If we were to name a highly inefficient complete-game shutout, it would be a Johnson, after Randy Johnson, who averaged 126 pitches during his shutouts and never threw a Maddux in his entire career!

## Create json

In [24]:
# avg_p_per_sho
# under_100
# over_125
# 'wa' is not a valid file mode in Python 3; 'w' creates or overwrites the file
with open('maddux.json', 'w') as f:
    f.write('{ "avg_p_per_sho":\n')
    f.write(avg_p_per_sho.to_json(orient='records'))
    f.write(",\n")

    f.write('"over_125":\n')
    f.write(over_125[:10].to_json(orient='records'))
    f.write(",\n")

    f.write('"under_100":\n')
    f.write(under_100[:10].to_json(orient='records'))
    f.write("}")
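
As an alternative to writing the braces and commas by hand, the same structure can be assembled as a dictionary and serialized in one step, which guards against malformed output. This is a sketch on a made-up two-row table, not the notebook's actual export; `build_payload` is a hypothetical helper.

```python
import json

import pandas as pd


def build_payload(frames):
    """Map each name to a list of row records, like to_json(orient='records')."""
    return {
        name: json.loads(df.to_json(orient='records'))
        for name, df in frames.items()
    }


# Toy stand-in for one of the result tables (illustrative values only).
toy = pd.DataFrame({
    'sho': [13, 6],
    'p': [91.7, 91.5],
    'player_name': ['Greg Maddux', 'Bob Tewksbury'],
})

payload = build_payload({'under_100': toy[:10]})
text = json.dumps(payload, indent=2)
print(text)
```

In the notebook, the dictionary passed to `build_payload` would hold `avg_p_per_sho`, `over_125[:10]`, and `under_100[:10]`, and `text` would be written to `maddux.json`.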

## Resources

* [https://en.wikipedia.org/wiki/Pitch_count](https://en.wikipedia.org/wiki/Pitch_count)
* [http://baseball-reference.com/](http://baseball-reference.com/)
* [http://www.fangraphs.com/](http://www.fangraphs.com/)