# MLB Analysis

![](https://images-eu.ssl-images-amazon.com/images/I/31G%2B-LmcS7L.png)

**Major League Baseball (MLB) is a professional baseball organization, the oldest of the four major professional sports leagues in the United States and Canada. A total of 30 teams play in the National League (NL) and American League (AL), with 15 teams in each league. The NL and AL were formed as separate legal entities in 1876 and 1901 respectively. After cooperating but remaining legally separate entities beginning in 1903, the leagues merged into a single organization led by the Commissioner of Baseball in 2000. The organization also oversees Minor League Baseball, which comprises 256 teams affiliated with the Major League clubs. With the World Baseball Softball Confederation, MLB manages the international World Baseball Classic tournament.**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
from plotly import tools
init_notebook_mode(connected=True)  
import plotly.figure_factory as ff
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [None]:
pitches = pd.read_csv('../input/pitches.csv')
pitches.head()

In [None]:
games = pd.read_csv('../input/games.csv')
games.head()

In [None]:
atbats = pd.read_csv('../input/atbats.csv')
atbats.head()

In [None]:
player_name = pd.read_csv('../input/player_names.csv')
player_name.head()

In [None]:
games.dtypes

In [None]:
atbats.dtypes

In [None]:
pitches.dtypes

In [None]:
pitches['ab_id'] = pitches['ab_id'].astype(int)

In [None]:
player_name.rename(columns={'id': 'batter_id'}, inplace=True)

In [None]:
new_df = pd.merge(pitches, atbats,  how='left', left_on='ab_id', right_on = 'ab_id')
new_df.head()

In [None]:
new_df1 = pd.merge(new_df, games,  how='left', left_on='g_id', right_on = 'g_id')
new_df1.head()

In [None]:
new_df2 = pd.merge(new_df1, player_name,  how='left', left_on='batter_id', right_on = 'batter_id')
new_df2.head()

In [None]:
# Build "First Last"; str.cat leaves NaN (instead of raising) when a name is missing after the left merge
new_df2['Batters Name'] = new_df2['first_name'].str.cat(new_df2['last_name'], sep=' ')

In [None]:
new_df2.drop(['first_name', 'last_name'], axis=1, inplace=True)

In [None]:
player_name.rename(columns={'batter_id': 'pitcher_id'}, inplace=True)

In [None]:
final_df = pd.merge(new_df2, player_name,  how='left', left_on='pitcher_id', right_on = 'pitcher_id')
final_df.head()

In [None]:
# Build "First Last"; str.cat leaves NaN (instead of raising) when a name is missing after the left merge
final_df['Pitchers Name'] = final_df['first_name'].str.cat(final_df['last_name'], sep=' ')

In [None]:
final_df.drop(['first_name', 'last_name'], axis=1, inplace=True)
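
The merge, rename, and drop steps above attach a batter name and then a pitcher name by renaming `player_name` in place twice. A minimal sketch of the same idea as one reusable helper is shown below. It assumes the lookup table still has its original `id`, `first_name`, and `last_name` columns (that is, before the rename cells run); the helper name `attach_full_name` and the toy frames are hypothetical.

```python
import pandas as pd

def attach_full_name(df, names, id_col, out_col):
    """Left-join a 'First Last' column onto df, matching names['id'] to df[id_col]."""
    full = names["first_name"].str.cat(names["last_name"], sep=" ")
    lookup = pd.DataFrame({id_col: names["id"], out_col: full})
    return df.merge(lookup, how="left", on=id_col)

# Toy stand-ins for the merged pitch table and player_names.csv
toy_pitches = pd.DataFrame({"batter_id": [1, 2], "pitcher_id": [2, 3]})
toy_names = pd.DataFrame({
    "id": [1, 2],
    "first_name": ["Joey", "Max"],
    "last_name": ["Votto", "Scherzer"],
})

named = attach_full_name(toy_pitches, toy_names, "batter_id", "Batters Name")
named = attach_full_name(named, toy_names, "pitcher_id", "Pitchers Name")
print(named)
```

Because the lookup table is never mutated, the helper can be called in any order, and IDs without a match simply produce a missing name.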

# Pitch counts by pitcher

In [None]:
final_df['Pitchers Name'].value_counts()

In [None]:
# Map pitch-type codes to readable names. The capitalisation must match the
# filters used in the scatter plots later on (e.g. 'Two Seam Fastball').
final_df['pitch_type'] = final_df['pitch_type'].map({
    'FF': 'Four Seam Fastball', 'SL': 'Slider', 'FT': 'Two Seam Fastball',
    'CH': 'Changeup', 'SI': 'Sinker', 'CU': 'Curveball', 'FC': 'Cutter',
    'KC': 'Knuckle Curve', 'FS': 'Splitter', 'KN': 'Knuckleball', 'EP': 'Eephus',
    'FO': 'Pitch Out', 'PO': 'Pitch Out', 'SC': 'Screwball', 'UN': 'Unidentified',
    'FA': 'Fastball', 'IN': 'Intentional Ball',
})

In [None]:
final_df['code'] = final_df['code'].map({'B': 'Ball', '*B': 'Ball in dirt', 'S': 'Swinging Strike', 'C': 'Called Strike', 'F': 'Foul', 'T': 'Foul Tip', 'L': 'Foul Bunt', 'I': 'Intentional Ball', 'W': 'Blocked','M': 'Missed Bunt', 'P': 'Pitch Out', 'Q': 'Swinging Pitch Out', 'R': 'Foul Pitch Out', 'X': 'In play out(s)', 'D': 'In play no out', 'E': 'In play runs'})

In [None]:
final_df.head()

In [None]:
grp = final_df.groupby(['Pitchers Name'])[["s_count"]].sum()

In [None]:
grp1 = final_df.groupby(['Batters Name'])[["b_count"]].sum()

In [None]:
grp.head()

In [None]:
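# Note: despite the name, this is not earned run average. It is each pitcher's
# summed strike count as a fraction of all pitches in the dataset.
ERA = grp.s_count / len(final_df)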

In [None]:
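# Note: despite the name, this is not batting average. It is each batter's
# summed ball count as a percentage of all pitches in the dataset.
BA = grp1.b_count / len(final_df) * 100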

# Top performing pitchers

In [None]:
ERA.sort_values(ascending=False)

# Top performing batters

In [None]:
BA.sort_values(ascending=False)
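
The `BA` series above is built from ball counts rather than hits. For comparison, the conventional batting average is hits divided by official at-bats, computed from one row per at-bat. The sketch below works on a toy at-bat table; apart from `'Home Run'`, which appears later in this notebook, the event labels in `HIT_EVENTS` and `NON_AB_EVENTS` are assumptions about how the `event` column is spelled, and `batting_average` is a hypothetical helper.

```python
import pandas as pd

# Assumed event labels; check them against atbats['event'].unique() before use
HIT_EVENTS = {"Single", "Double", "Triple", "Home Run"}
# Plate appearances that do not count as official at-bats
NON_AB_EVENTS = {"Walk", "Intent Walk", "Hit By Pitch", "Sac Fly", "Sac Bunt"}

def batting_average(atbats, batter_col="batter_id", event_col="event"):
    """Hits divided by official at-bats, per batter (one input row per at-bat)."""
    official = atbats[~atbats[event_col].isin(NON_AB_EVENTS)]
    is_hit = official[event_col].isin(HIT_EVENTS)
    return is_hit.groupby(official[batter_col]).mean()

toy_atbats = pd.DataFrame({
    "batter_id": [1, 1, 1, 1, 2, 2, 2],
    "event": ["Single", "Strikeout", "Walk", "Home Run",
              "Groundout", "Double", "Flyout"],
})
ba = batting_average(toy_atbats)
print(ba.sort_values(ascending=False))
```

Earned run average needs earned runs and innings pitched per pitcher, which are not among the columns used in this notebook, so no equivalent sketch is given for it.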

# Max Scherzer

![](https://imagesvc.timeincapp.com/v3/mm/image?url=https%3A%2F%2Fcdn-s3.si.com%2Fstyles%2Fmarquee_large_2x%2Fs3%2Fimages%2Fmax-scherzer-nationals_2.jpg%3Fitok%3Dai-pGumk&w=1000&q=70)

**Maxwell M. Scherzer (born July 27, 1984) is an American professional baseball pitcher for the Washington Nationals of Major League Baseball (MLB). He made his MLB debut as a member of the Arizona Diamondbacks in 2008, and later played for the Detroit Tigers. He has been an important figure in both the Tigers' and Nationals' playoff presence, including Detroit's four consecutive American League Central titles from 2011−2014 and two of Washington's National League East titles. A power pitcher with a low three-quarter-arm delivery, Scherzer has achieved numerous strikeout records, while becoming the tenth pitcher in history to be awarded at least three Cy Young Awards, the sixth to record two no-hitters in one season, the fifth to produce more than one immaculate inning, and the fourth to strike out at least 200 batters in a season seven years in a row.**

In [None]:
Max_Scherzer = final_df[final_df['Pitchers Name'] == 'Max Scherzer']
Max_Scherzer.head()

# Pitch type of Max Scherzer

In [None]:
Max_Scherzer['pitch_type'].value_counts() / len(Max_Scherzer) * 100

# What happened most often when Max Scherzer pitched?

In [None]:
Max_Scherzer['event'].value_counts() / len(Max_Scherzer) * 100

# Relationship between pitch type and resulting event

In [None]:
size = [20, 40, 60, 80, 100, 80, 60, 40, 20, 40]
data = [dict(
  type = 'scatter',
  x = Max_Scherzer['event'],
  y = Max_Scherzer['pitch_type'],
  mode='markers',
    marker=dict(
        size=size,
        sizemode='area',
        sizeref=2.*max(size)/(40.**2),
        sizemin=4
    ),
    transforms = [dict(
        type = 'groupby',
        groups = Max_Scherzer['pitch_type'],
   
  )]
)]

py.iplot({'data': data}, validate=False)
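
The scatter above plots one marker per pitch on two categorical axes, so overlapping points can hide how often each combination occurs. A cross-tabulation gives the same relationship as numbers. This is a minimal sketch on toy data; the helper name `event_share_by_pitch` is hypothetical, and it only assumes the `pitch_type` and `event` columns used in the plot.

```python
import pandas as pd

def event_share_by_pitch(df):
    """Percentage of each event within every pitch type (rows sum to 100)."""
    return pd.crosstab(df["pitch_type"], df["event"], normalize="index") * 100

toy = pd.DataFrame({
    "pitch_type": ["Slider", "Slider", "Changeup", "Slider"],
    "event": ["Strikeout", "Single", "Strikeout", "Strikeout"],
})
share = event_share_by_pitch(toy)
print(share.round(1))
```

Calling `event_share_by_pitch(Max_Scherzer)` would give one row per pitch type, which can be read directly or passed to a heatmap.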

In [None]:
import random
def random_colors(number_of_colors):
    color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])
                 for i in range(number_of_colors)]
    return color

# Where do Max Scherzer's pitches cross the plate?

In [None]:
trace0 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Four Seam Fastball'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Four Seam Fastball'],
    name = 'Four Seam Fastball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(152, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(152, 0, 0, .8)'
        )
    )
)

trace1 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Slider'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Slider'],
    name = 'Slider',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(22, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(22, 0, 0, .8)'
        )
    )
)

trace2 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Changeup'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Changeup'],
    name = 'Changeup',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(224, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(224, 0, 0, .8)'
        )
    )
)

trace3 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Curveball'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Curveball'],
    name = 'Curveball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(22, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(22, 1, 0, .8)'
        )
    )
)

trace4 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Cutter'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Cutter'],
    name = 'Cutter',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(2, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(2, 1, 0, .8)'
        )
    )
)

trace5 = go.Scatter(
    x = Max_Scherzer.px[Max_Scherzer.pitch_type == 'Two Seam Fastball'],
    y = Max_Scherzer.pz[Max_Scherzer.pitch_type == 'Two Seam Fastball'],
    name = 'Two Seam Fastball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(222, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(222, 1, 0, .8)'
        )
    )
)



data = [trace0, trace1,trace2, trace3, trace4, trace5]

layout = dict(title = 'Pitch types of Max Scherzer ',
              plot_bgcolor='rgb(50,205,50)',
              yaxis = dict(zeroline = False),
              xaxis = dict(zeroline = False)
             )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-scatter')
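
The six traces above differ only in pitch type and colour, so they can be generated in a loop. The sketch below builds plain trace dictionaries (the same dict style used for the grouped scatter earlier) from the `px`, `pz`, and `pitch_type` columns. The helper name `location_traces`, the toy frame, and the colour strings are hypothetical.

```python
import pandas as pd

def location_traces(df, pitch_types, colors):
    """Build one scatter-trace dict per pitch type from plate location columns."""
    traces = []
    for pitch, color in zip(pitch_types, colors):
        subset = df[df["pitch_type"] == pitch]
        traces.append(dict(
            type="scatter",
            x=subset["px"].tolist(),
            y=subset["pz"].tolist(),
            name=pitch,
            mode="markers",
            marker=dict(size=10, color=color, line=dict(width=2, color=color)),
        ))
    return traces

toy = pd.DataFrame({
    "pitch_type": ["Slider", "Changeup", "Slider"],
    "px": [0.1, -0.4, 0.3],
    "pz": [2.5, 1.9, 2.2],
})
traces = location_traces(toy, ["Slider", "Cutter"],
                         ["rgba(22, 0, 0, .8)", "rgba(2, 1, 0, .8)"])
print([(t["name"], len(t["x"])) for t in traces])
```

The resulting list can be used in place of `[trace0, ..., trace5]` when building the `fig` dictionary, and the same call works for any pitcher's frame.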

# Does Max Scherzer's pitch speed drop as the innings progress?

In [None]:
ax = sns.lineplot(x="inning", y="start_speed", hue="pitch_type", data=Max_Scherzer)
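
The line plot answers the question visually. To put a number on it, one option is to average `start_speed` per inning for a single pitch type and fit a straight line; the slope is then the change in speed per inning. This is a rough sketch on toy data, not a statistical test, and the helper name `speed_trend` is hypothetical. It needs at least two distinct innings.

```python
import numpy as np
import pandas as pd

def speed_trend(df, pitch_type):
    """Mean start_speed per inning for one pitch type, plus a linear slope."""
    subset = df[df["pitch_type"] == pitch_type]
    subset = subset.dropna(subset=["inning", "start_speed"])
    by_inning = subset.groupby("inning")["start_speed"].mean()
    # Degree-1 fit: polyfit returns [slope, intercept]
    slope, _intercept = np.polyfit(
        by_inning.index.to_numpy(dtype=float), by_inning.to_numpy(dtype=float), 1
    )
    return by_inning, slope

toy = pd.DataFrame({
    "pitch_type": ["Four Seam Fastball"] * 6 + ["Slider"],
    "inning": [1, 1, 2, 2, 3, 3, 1],
    "start_speed": [95.0, 95.0, 94.0, 94.0, 93.0, 93.0, 85.0],
})
by_inning, slope = speed_trend(toy, "Four Seam Fastball")
print(by_inning)
print(f"slope: {slope:.2f} per inning")
```

A negative slope suggests the pitch loses speed in later innings; a slope near zero suggests it holds steady.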

# Justin Verlander

![](https://athletetypes.com/wp-content/uploads/sites/24/2018/04/Justin-Verlander-Astros-040918-1200x886.jpg)

**Justin Brooks Verlander (born February 20, 1983) is an American professional baseball pitcher for the Houston Astros of Major League Baseball (MLB). He previously played for the Detroit Tigers for 13 seasons, with whom he made his major league debut on July 4, 2005. A right-handed batter and thrower, Verlander stands 6 feet 5 inches (1.96 m) tall and weighs 225 pounds (102 kg).**

**From Manakin-Sabot, Virginia, Verlander attended Old Dominion University (ODU) and played college baseball for the Monarchs. He broke the Monarchs' and Colonial Athletic Association's career records for strikeouts. At the 2003 Pan American Games, Verlander helped lead the United States national team to a silver medal. The Tigers selected him in the first round and as the second overall pick of the 2004 first-year player draft. As a former ace in the Tigers' starting rotation, he was a key figure in four consecutive American League (AL) Central division championships from 2011−2014, and in the Astros' first World Series championship in 2017. He is among the career pitching leaders for the Tigers, including ranking second in strikeouts (2,373), seventh in wins (183), and eighth in innings pitched (2511).**

In [None]:
Justin_Verlander = final_df[final_df['Pitchers Name'] == 'Justin Verlander']
Justin_Verlander.head()

# Justin Verlander's pitching

In [None]:
Justin_Verlander['pitch_type'].value_counts() / len(Justin_Verlander) * 100

# What happens most often when Justin Verlander pitches?

In [None]:
Justin_Verlander['event'].value_counts() / len(Justin_Verlander) * 100

# Relationship between pitch type and resulting event

In [None]:
size = [20, 40, 60, 80, 100, 80, 60, 40, 20, 40]
data = [dict(
  type = 'scatter',
  x = Justin_Verlander['event'],
  y = Justin_Verlander['pitch_type'],
  mode='markers',
    marker=dict(
        size=size,
        sizemode='area',
        sizeref=2.*max(size)/(40.**2),
        sizemin=4
    ),
    transforms = [dict(
        type = 'groupby',
        groups = Justin_Verlander['pitch_type'],
   
  )]
)]

py.iplot({'data': data}, validate=False)

# Where do Justin Verlander's pitches mostly cross the plate?

In [None]:
trace0 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Four Seam Fastball'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Four Seam Fastball'],
    name = 'Four Seam Fastball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(152, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(152, 0, 0, .8)'
        )
    )
)

trace1 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Slider'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Slider'],
    name = 'Slider',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(22, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(22, 0, 0, .8)'
        )
    )
)

trace2 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Changeup'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Changeup'],
    name = 'Changeup',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(224, 0, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(224, 0, 0, .8)'
        )
    )
)

trace3 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Curveball'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Curveball'],
    name = 'Curveball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(22, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(22, 1, 0, .8)'
        )
    )
)

trace4 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Cutter'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Cutter'],
    name = 'Cutter',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(2, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(2, 1, 0, .8)'
        )
    )
)

trace5 = go.Scatter(
    x = Justin_Verlander.px[Justin_Verlander.pitch_type == 'Two Seam Fastball'],
    y = Justin_Verlander.pz[Justin_Verlander.pitch_type == 'Two Seam Fastball'],
    name = 'Two Seam Fastball',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(222, 1, 0, .8)',
        line = dict(
            width = 2,
            color = 'rgba(222, 1, 0, .8)'
        )
    )
)



data = [trace0, trace1,trace2, trace3, trace4, trace5]

layout = dict(title = 'Pitch types of Justin Verlander ',
              plot_bgcolor='rgb(50,205,50)',
              yaxis = dict(zeroline = False),
              xaxis = dict(zeroline = False)
             )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-scatter')

# Pitch speed as the innings progress

In [None]:
ax = sns.lineplot(x="inning", y="start_speed", hue="pitch_type", data=Justin_Verlander)

# Max vs Justin spin comparison

In [None]:
data = [
    go.Scatterpolargl(
      r = Justin_Verlander.pz,
      theta = Justin_Verlander.spin_dir,
      mode = "markers",
      name = "Justin Verlander",
      marker = dict(
        color = "rgb(27,158,119)",
        size = 15,
        line = dict(
          color = "white"
        ),
        opacity = 0.7
      )
    ),
    go.Scatterpolargl(
      r = Max_Scherzer.pz,
      theta = Max_Scherzer.spin_dir,
      mode = "markers",
      name = "Max Scherzer",
      marker = dict(
        color = "rgb(217,95,2)",
        size = 20,
        line = dict(
          color = "white"
        ),
        opacity = 0.7
      )
    ),
]

layout = go.Layout(
    title = "Justin Verlander vs Max Scherzer pitch spin",
    font = dict(
      size = 15
    ),
    showlegend = False,
    polar = dict(
      bgcolor = "rgb(223, 223, 223)",
      angularaxis = dict(
        tickwidth = 2,
        linewidth = 3,
        layer = "below traces"
      ),
      radialaxis = dict(
        side = "counterclockwise",
        showline = True,
        linewidth = 2,
        tickwidth = 2,
        gridcolor = "white",
        gridwidth = 2
      )
    ),
    paper_bgcolor = "rgb(223, 223, 223)"
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='polar-webgl')
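
The polar plot compares spin direction by eye. For a single summary number per pitcher, an ordinary mean is misleading for angles, because 350 and 10 average to 180 rather than 0. A circular mean avoids that. The sketch below assumes `spin_dir` is measured in degrees; the helper name `circular_mean_deg` is hypothetical.

```python
import math

def circular_mean_deg(angles):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360

# Angles on either side of 180 average to about 180
example = circular_mean_deg([170, 190])
print(round(example, 3))
```

With the notebook's frames this could be called as `circular_mean_deg(Max_Scherzer.spin_dir.dropna())`, once per pitcher or once per pitch type.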

# Top batters

In [None]:
Joey_Votto = final_df[final_df['Batters Name'] == 'Joey Votto']
Joey_Votto.head()

# Joey Votto

![](https://cdn-images-1.medium.com/max/1600/1*fqPCiV7YKyXORMHz172Fig@2x.jpeg)

**Joseph Daniel Votto (born September 10, 1983) is a Canadian professional baseball first baseman for the Cincinnati Reds of Major League Baseball (MLB). He made his MLB debut with the Reds in 2007.**

**Votto is a six-time MLB All-Star, a seven-time Tip O'Neill Award winner, and two-time Lou Marsh Trophy winner as Canada's athlete of the year. In 2010, he won the National League (NL) MVP Award and the NL Hank Aaron Award. At the end of the 2018 season, among all active players he was first in career on-base percentage (.427), second in OPS (.957) and walks (1,104), and fourth in batting average (.311).**

# Joey Votto striking efficiency

In [None]:
Joey_Votto['code'].value_counts() / len(Joey_Votto) * 100

# Joey Votto batting events 

In [None]:
Joey_Votto['event'].value_counts() / len(Joey_Votto) * 100

# Relationship between striking event and game event

In [None]:
size = [20, 40, 60, 80, 100, 80, 60, 40, 20, 40]
data = [dict(
  type = 'scatter',
  x = Joey_Votto['event'],
  y = Joey_Votto['code'],
  mode='markers',
    marker=dict(
        size=size,
        sizemode='area',
        sizeref=2.*max(size)/(40.**2),
        sizemin=4
    ),
    transforms = [dict(
        type = 'groupby',
        groups = Joey_Votto['code'],
   
  )]
)]

py.iplot({'data': data}, validate=False)

# How well does Joey perform as the innings progress?

In [None]:
ax = sns.lineplot(x="inning", y="p_score", hue="code", data=Joey_Votto)

# Joey Votto's home run zone

In [None]:
trace1 = go.Scatter3d(
    x = Joey_Votto.x[Joey_Votto['event'] == 'Home Run'],
    y = Joey_Votto.y[Joey_Votto['event'] == 'Home Run'],
    z = Joey_Votto.z0[Joey_Votto['event'] == 'Home Run'],
    text = 'Home Run',
    mode = 'markers',
    marker = dict(
        sizemode = 'diameter',
        sizeref = 750, # info on sizeref: https://plot.ly/python/reference/#scatter-marker-sizeref
        color = random_colors(1000),
        )
)
data=[trace1]

layout=go.Layout(width=800, height=800, title = 'Joey Votto Home Run Zone',
              scene = dict(xaxis=dict(title='X axis',
                                      titlefont=dict(color='Orange')),
                            yaxis=dict(title='Y axis',
                                       titlefont=dict(color='rgb(220, 220, 220)')),
                            zaxis=dict(title='Z axis',
                                       titlefont=dict(color='rgb(220, 220, 220)')),
                            bgcolor = 'rgb(50,205,50)'
                           )
             )

fig=go.Figure(data=data, layout=layout)
py.iplot(fig, filename='solar_system_planet_size')
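
One caveat about the plot above: assuming `event` comes from the at-bat table, every pitch in an at-bat carries that at-bat's final result, so filtering on `'Home Run'` keeps all pitches of the at-bat and not only the one that was hit. A possible refinement is to keep just the last pitch of each matching at-bat. The sketch below assumes a per-at-bat pitch counter column named `pitch_num`, which is not shown elsewhere in this notebook; the helper name `last_pitch_of_atbats` is hypothetical.

```python
import pandas as pd

def last_pitch_of_atbats(df, event_name):
    """Keep only the final pitch of each at-bat that ended in event_name."""
    matching = df[df["event"] == event_name]
    # idxmax returns the row label of the highest pitch_num within each at-bat
    last_idx = matching.groupby("ab_id")["pitch_num"].idxmax()
    return matching.loc[last_idx]

toy = pd.DataFrame({
    "ab_id": [10, 10, 10, 11, 11],
    "pitch_num": [1, 2, 3, 1, 2],
    "event": ["Home Run"] * 3 + ["Strikeout"] * 2,
    "px": [0.5, -0.2, 0.1, 0.9, -0.7],
    "pz": [2.0, 3.1, 2.4, 1.5, 2.2],
})
hr = last_pitch_of_atbats(toy, "Home Run")
print(hr[["ab_id", "pitch_num", "px", "pz"]])
```

Plotting `hr` instead of the unfiltered selection gives one point per home run.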

# Paul Goldschmidt

![](https://www.gannett-cdn.com/-mm-/dcae4d8e5b18ea312916e61841962128789aaec8/c=37-0-2962-3900/local/-/media/2017/09/29/Phoenix/Phoenix/636423165083338291-USATSI-10315181.jpg?width=534&height=712&fit=crop)

**Paul Edward Goldschmidt (born September 10, 1987), nicknamed "Goldy", is an American professional baseball first baseman for the St. Louis Cardinals of Major League Baseball (MLB). He made his MLB debut with the Arizona Diamondbacks in 2011. Prior to playing professionally, Goldschmidt played baseball for The Woodlands High School and Texas State Bobcats.**

**Goldschmidt was lightly recruited out of The Woodlands. After playing at Texas State, the Diamondbacks selected him in the eighth round of the 2009 MLB draft. He rose through the minor leagues, reaching the major leagues on August 1, 2011. The Diamondbacks traded him to the Cardinals during the 2018–19 offseason.**

**Goldschmidt is a six-time MLB All-Star. He led the National League in home runs and runs batted in during the 2013 season. He has won the National League (NL) Hank Aaron Award, Gold Glove Award, and Silver Slugger Award. Goldschmidt has also twice finished runner-up for the NL Major League Baseball Most Valuable Player Award, in 2013 and 2015.**

In [None]:
PaulGoldschmidt = final_df[final_df['Batters Name'] == 'Paul Goldschmidt']
PaulGoldschmidt.head()

# Paul Goldschmidt's striking efficiency

In [None]:
PaulGoldschmidt['code'].value_counts() / len(PaulGoldschmidt) * 100

# Paul Goldschmidt's batting events

In [None]:
PaulGoldschmidt['event'].value_counts() / len(PaulGoldschmidt) * 100

# Relationship between striking event and game event of Paul

In [None]:
size = [20, 40, 60, 80, 100, 80, 60, 40, 20, 40]
data = [dict(
  type = 'scatter',
  x = PaulGoldschmidt['event'],
  y = PaulGoldschmidt['code'],
  mode='markers',
    marker=dict(
        size=size,
        sizemode='area',
        sizeref=2.*max(size)/(40.**2),
        sizemin=4
    ),
    transforms = [dict(
        type = 'groupby',
        groups = PaulGoldschmidt['code'],
   
  )]
)]

py.iplot({'data': data}, validate=False)

# How well does Paul perform as the innings progress?

In [None]:
ax = sns.lineplot(x="inning", y="p_score", hue="code", data=PaulGoldschmidt)

# Paul Goldschmidt's home run zone

In [None]:
trace1 = go.Scatter3d(
    x = PaulGoldschmidt.x[PaulGoldschmidt['event'] == 'Home Run'],
    y = PaulGoldschmidt.y[PaulGoldschmidt['event'] == 'Home Run'],
    z = PaulGoldschmidt.z0[PaulGoldschmidt['event'] == 'Home Run'],
    text = 'Home Run',
    mode = 'markers',
    marker = dict(
        sizemode = 'diameter',
        sizeref = 750, # info on sizeref: https://plot.ly/python/reference/#scatter-marker-sizeref
        color = random_colors(1000),
        )
)
data=[trace1]

layout=go.Layout(width=800, height=800, title = 'Paul Goldschmidt Home Run Zone',
              scene = dict(xaxis=dict(title='X axis',
                                      titlefont=dict(color='Orange')),
                            yaxis=dict(title='Y axis',
                                       titlefont=dict(color='rgb(220, 220, 220)')),
                            zaxis=dict(title='Z axis',
                                       titlefont=dict(color='rgb(220, 220, 220)')),
                            bgcolor = 'rgb(50,205,50)'
                           )
             )

fig=go.Figure(data=data, layout=layout)
py.iplot(fig, filename='solar_system_planet_size')
