# Import All Necessary Modules And Setup Project

If you get any errors when importing these, ensure you run the command:
```bash
$ python -m pip install -r requirements.txt
```
to install all necessary modules for this project. This command must be run from inside of this project directory.

It is recommended to use a virtual environment for this project to ensure there are no conflicting package versions on your system.

Activate the virtual environment (if needed), run the pip install command, and then launch Jupyter Lab inside this project to get this project running.
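
If you are unsure whether the install step succeeded in the active environment, a quick check like the following may help before running the import cell. It is only a sketch: `find_missing_modules` is a hypothetical helper written for this note, and the module list is taken from the imports used in this notebook.

```python
from importlib.util import find_spec

# Top-level modules imported by this notebook.
REQUIRED_MODULES = ["pandas", "matplotlib", "seaborn"]


def find_missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment.

    Hypothetical helper for illustration; it is not part of this project.
    """
    return [name for name in names if find_spec(name) is None]


missing = find_missing_modules()
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If anything is reported as missing, re-run the pip install command above from the project directory with the virtual environment activated.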

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Load Raw Data Files With Pandas

If any of these fail, visit the data directory and ensure that you've got the data sources downloaded and unzipped using the correct conventions. See [data/README.md](data/README.md) for guidance on how to set up the `data` directory properly.

If it still fails, ensure that you launched Jupyter Lab/Notebook from the root of this project.
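
To see which file is causing a failure, the expected paths can be checked against the current working directory. This is a rough sketch: `find_missing_files` is a hypothetical helper, and the paths are copied from the loading cell in this notebook.

```python
from pathlib import Path

# Relative paths copied from the loading cell in this notebook.
EXPECTED_FILES = [
    "data/baseballdatabank-2022.2/core/Batting.csv",
    "data/baseballdatabank-2022.2/core/People.csv",
    "data/baseballdatabank-2022.2/core/Teams.csv",
]


def find_missing_files(root, relative_paths=EXPECTED_FILES):
    """Return the relative paths that do not exist under root.

    Hypothetical helper for illustration; it is not part of this project.
    """
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# Jupyter resolves relative paths against the directory it was launched from.
missing = find_missing_files(Path.cwd())
if missing:
    print("Missing data files, relative to", Path.cwd())
    for path in missing:
        print("  ", path)
else:
    print("All expected data files were found.")
```

If every file is reported missing, the working directory is probably not the project root.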

In [None]:
allBatting = pd.read_csv("data/baseballdatabank-2022.2/core/Batting.csv")
allPlayers = pd.read_csv("data/baseballdatabank-2022.2/core/People.csv")
allTeams = pd.read_csv('data/baseballdatabank-2022.2/core/Teams.csv')
allPlayerBattingData = pd.merge(allBatting, allPlayers, on='playerID')

# RBI compared to AB

Find the players who bring in the most runs relative to their number of at bats.

A minimum threshold of 100 career at bats is applied to exclude players who rarely batted.

Plot the RBI ratio in a histogram to see how often most players get an RBI from an at bat.
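
The calculation described above can be sketched on a tiny table first. The rows and player IDs below are made up for illustration, and `MIN_AT_BATS` is a name introduced here for the threshold stated in the text.

```python
import pandas as pd

# Hypothetical season rows; the notebook itself uses the merged batting data.
toy = pd.DataFrame(
    {
        "playerID": ["a01", "a01", "b02", "c03"],
        "AB": [80, 70, 10, 200],
        "RBI": [20, 10, 5, 30],
    }
)

MIN_AT_BATS = 100  # threshold stated in the text above

# Sum each player's seasons into career totals.
career = toy.groupby("playerID")[["AB", "RBI"]].sum()

# Drop players below the threshold, then compute RBI per at bat.
career = career[career["AB"] >= MIN_AT_BATS].copy()
career["RBIAtBatsRatio"] = career["RBI"] / career["AB"]
career = career.sort_values("RBIAtBatsRatio", ascending=False)
print(career)
```

Player `b02` would have the highest ratio (5 RBI in 10 at bats), but is removed by the threshold, which is the reason for filtering before ranking.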

In [None]:
playerRBIToAtBats = allPlayerBattingData.groupby(['playerID']).agg(
    {
        'AB':'sum',
        'RBI':'sum',
        'R':'sum',
        'H':'sum',
        'nameFirst': 'last',
        'nameLast': 'last'
    }
    )
# Keep only players with at least 100 career at bats
playerRBIToAtBats = playerRBIToAtBats[playerRBIToAtBats.AB >= 100]
playerRBIToAtBats['RBIAtBatsRatio'] = playerRBIToAtBats.RBI / playerRBIToAtBats.AB
playerRBIToAtBats.sort_values('RBIAtBatsRatio', inplace=True, ascending=False)
playerRBIToAtBats['playerName'] = playerRBIToAtBats.nameFirst + ' ' + playerRBIToAtBats.nameLast
playerRBIToAtBats = playerRBIToAtBats.reset_index()
display(playerRBIToAtBats.head(10))
playerRBIToAtBats['RBIAtBatsRatio'].mean(), playerRBIToAtBats['RBIAtBatsRatio'].median()

## Create The Histogram Of Player RBI to At Bats Ratio

In [None]:
player_rbi_ab_ratio = sns.histplot(
    data=playerRBIToAtBats, 
    x='RBIAtBatsRatio', binwidth=.005
    )

player_rbi_ab_ratio.set(xlabel='Ratio of RBI to At Bats')

player_rbi_ab_ratio.set_title("Player Distribution of RBI/At Bats Ratio", fontsize=20, pad=20)

player_rbi_ab_ratio.figure.savefig(
    'Player_RBI_To_At_Bats_Ratio.pdf',
    bbox_inches='tight'
    )

In [None]:
# The following .sort_values technically isn't necessary, but remains here in the event someone changes the sorting of the dataframe above
top_player_rbi_ab_ratio = sns.barplot(
    data=playerRBIToAtBats.sort_values('RBIAtBatsRatio', ascending=False).head(10), 
    x='RBIAtBatsRatio', y='playerName'
    )
top_player_rbi_ab_ratio.set(
    ylabel='Player Name', 
    xlabel='RBI To At Bats Ratio'
    )
top_player_rbi_ab_ratio.set_title(
    "Top 10 Players by RBI/AB Ratio", 
    fontsize=24, pad=20, loc='center',
    x=.4
    )

top_player_rbi_ab_ratio.figure.savefig('Top_Player_RBI_To_At_Bats_Ratio.pdf', bbox_inches='tight')

# World Series Results

Show the World Series wins and losses for each team that has made it to the World Series.

The World Series started in 1903, so ignore all data before 1903. Additionally, it wasn't played in 1904 (the Giants refused to play against the American League) or in 1994 (players' strike), so remove those years.
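
As a small illustration of the year filter and the win/loss tally, here is a sketch on made-up rows. The franchise IDs are hypothetical, `SKIPPED_YEARS` is a name introduced here, and the tally uses `groupby(...).size().unstack()` as a compact alternative rather than the outer merge used in the cell below.

```python
import pandas as pd

# Hypothetical rows shaped like the Teams.csv columns used in this notebook.
toy_teams = pd.DataFrame(
    {
        "yearID": [1902, 1903, 1903, 1904, 1905, 1905],
        "franchID": ["AAA", "AAA", "BBB", "AAA", "BBB", "CCC"],
        "LgWin": ["Y", "Y", "Y", "Y", "Y", "Y"],
        "WSWin": ["N", "Y", "N", "N", "Y", "N"],
    }
)

SKIPPED_YEARS = [1904, 1994]  # seasons with no World Series

# Keep 1903 onward and drop the seasons without a World Series.
played = toy_teams[
    (toy_teams["yearID"] >= 1903) & ~toy_teams["yearID"].isin(SKIPPED_YEARS)
]
pennants = played[played["LgWin"] == "Y"]

# Count wins and losses per franchise; missing combinations become 0.
results = (
    pennants.groupby(["franchID", "WSWin"])
    .size()
    .unstack(fill_value=0)
    .rename(columns={"Y": "wins", "N": "losses"})
)
print(results)
```

Because `unstack(fill_value=0)` fills missing combinations directly, this variant needs no separate `fillna` step.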

In [None]:
wsInfo = allTeams[
    ['yearID', 'LgWin', 'WSWin', 'franchID', 'name']
    ]

wsInfo = wsInfo[ 
    (wsInfo.yearID >= 1903) & 
    (wsInfo.yearID != 1904) & 
    (wsInfo.yearID != 1994) 
    ]

wsLosses = wsInfo[ 
    (wsInfo.LgWin == 'Y') & 
    (wsInfo.WSWin == 'N')
    ].copy()

wsWins = wsInfo[ 
    (wsInfo.LgWin == 'Y') & 
    (wsInfo.WSWin == 'Y')
    ].copy()

wsLosses['losses'] = 0
wsWins['wins'] = 0


wsLosses = wsLosses.groupby('franchID').agg(
        {'losses': 'count'}
    )

wsWins = wsWins.groupby('franchID').agg(
        {'wins': 'count'}
    )
teamNames = wsInfo.groupby('franchID').agg(
        {'name':'last'}
    )

wsResults = pd.merge(
    wsWins, wsLosses, on='franchID', how='outer'
    ).merge(teamNames, on='franchID')


# The outer join of the wins/losses tables leaves NaN for teams with no wins or no losses; these can safely be replaced with 0
wsResults['wins'] = wsResults['wins'].fillna(0)
wsResults['losses'] = wsResults['losses'].fillna(0)

wsResults['appearances'] = wsResults.wins + wsResults.losses
wsResults['win_ratio'] = wsResults.wins / wsResults.appearances
wsResults['loss_ratio'] = wsResults.losses / wsResults.appearances

wsResults = wsResults.reindex(columns=['name', 'appearances', 'wins', 'losses', 'win_ratio', 'loss_ratio'])

wsResults.sort_values('appearances', inplace=True, ascending=False)

wsResults

## Make Pie-plot of World Series Results

Show the wins and losses of each team in a pie-plot.

In [None]:
wsResults_figs, wsResults_axs = plt.subplots(nrows=2, figsize=(10, 10))

wsResults_axs[0].pie(data=wsResults, x='wins', shadow=True)
wsResults_axs[0].set_title('Wins')

wsResults_axs[1].pie(data=wsResults, x='losses', shadow=True)
wsResults_axs[1].set_title('Losses')

wsResults_figs.suptitle('World Series Results', size=24, x=.68)

wsResults_figs.legend(labels=wsResults['name'], bbox_to_anchor=(1.0, .82))

wsResults_figs.savefig('World_Series_Results.pdf', bbox_inches='tight')

## Rank the rate of wins/appearances

In [None]:
win_ratio_plt = sns.barplot(data=wsResults.sort_values('win_ratio', ascending=False), x='win_ratio', y='name')
win_ratio_plt.set(ylabel='Team Name', xlabel='World Series Win Ratio')
win_ratio_plt.set_title('Ratio Of Wins At World Series', fontsize=24, pad=20, x=.35)

win_ratio_plt.figure.savefig('World_Series_Win_Ratio.pdf', bbox_inches='tight')

In [None]:
loss_ratio_plt = sns.barplot(data=wsResults.sort_values('loss_ratio', ascending=False), x='loss_ratio', y='name')
loss_ratio_plt.set(ylabel='Team Name', xlabel='World Series Loss Ratio')
loss_ratio_plt.set_title('Ratio Of Losses At World Series', fontsize=24, pad=20, x=.35)

loss_ratio_plt.figure.savefig('World_Series_Loss_Ratio.pdf', bbox_inches='tight')