# Pitchers Used to Throw More than They Do Today:
### or, a Diamond Dispatch from Captain Obvious

## What We're Looking For
The modern professional baseball player comes from a different planet than their old-timey, horse-and-buggy forebears did. Today's game is faster, higher-scoring, and timed to the split second. Most players' skill sets and repertoires are specialized and weaponized to befuddle, speed past, and overpower the suckers wearing the other-colored caps.

This is undisputed.

Also undisputed is that the "professional" baseball player of ~150 years ago was not a full-time, this-is-all-I-do professional. For its first few decades, baseball was a seasonal job played before crowds of maybe a few hundred spectators for the biggest games. 

And owing to the vocation's part-time nature, almost nobody crafted their game with a specialized approach — beyond some penchant, preference, or aptitude for one position or another.

Of all the positions on the diamond, the starting pitcher's role has changed the most.

Relief pitchers weren't really a structured part of the game in its fledgling years. The starting pitcher was the pitcher, and the pitcher usually pitched the entire game.

So what changed?

In [None]:
import pybaseball as pb
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport
import math
import statistics as stats
import ipywidgets

pb.cache.enable()

In [None]:
df_pitching = pd.read_csv("../lahman/core/pitching.csv")

In [None]:
df_pitching.columns

In [None]:
df = df_pitching  # collect each stat into one list per year (one list entry per pitcher row)
year_groups = df.groupby(['yearID']).apply(
    lambda x: [
        list(x['teamID']), 
        list(x['CG']), 
        list(x['IPouts']), 
        list(x['BFP']), 
        list(x['BAOpp']), 
        list(x['BB']), 
        list(x['SO']), 
        list(x['ERA'])
        ]
    ).apply(pd.Series)
year_groups.columns = ['team_ID', 'CG', 'IP_Outs', 'BFP', 'Opp_BA', 'BB', 'SO', 'ERA']
year_groups = year_groups.reset_index()
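
`IPouts` counts outs recorded rather than innings, so the raw numbers read about three times larger than the innings totals most fans know. A minimal sketch of the conversion, assuming three outs per inning; the helper names and the sample value are made up for illustration and are not part of the notebook.

```python
def outs_to_innings(ip_outs):
    """Convert outs recorded into innings pitched (3 outs = 1 inning)."""
    return ip_outs / 3


def outs_to_box_score(ip_outs):
    """Format outs in box-score notation, where '.1' and '.2' mean one or two extra outs."""
    whole, extra = divmod(int(ip_outs), 3)
    return f"{whole}.{extra}"


sample_outs = 823  # hypothetical season total, not taken from the data
innings = outs_to_innings(sample_outs)
box_score = outs_to_box_score(sample_outs)
print(round(innings, 2), box_score)
```

The box-score form is only for display; any arithmetic should stay in outs or in true innings.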

In [None]:
# Summarize every season: max, min, mean, median, and (where enabled) variance per stat.
# Maps the output-key prefix to (year_groups column, whether to compute variance).
stat_columns = {
    'CG': ('CG', True),
    'IP_outs': ('IP_Outs', True),
    'batters_faced': ('BFP', False),
    'Opp_BA': ('Opp_BA', False),
    'BB': ('BB', True),
    'SO': ('SO', True),
    'ERA': ('ERA', False),
}

season_list = []
for counter in range(len(year_groups)):  # one row per season present in the data
    season = {'year': year_groups['yearID'][counter]}
    for prefix, (column, with_variance) in stat_columns.items():
        values = year_groups[column][counter]
        season[prefix + '_max'] = max(values)
        season[prefix + '_min'] = min(values)
        season[prefix + '_mean'] = stats.mean(values)
        season[prefix + '_median'] = stats.median(values)
        if with_variance:
            season[prefix + '_var'] = stats.variance(values)
    season_list.append(season)
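
One thing to watch in a summary like this: the built-in `max` and `min` can return different results depending on where a `NaN` sits in the list, and `stats.variance` raises `StatisticsError` when given fewer than two values. A sketch of one way to guard against both, assuming missing values should simply be skipped; `summarize` is a hypothetical helper, not something the notebook defines.

```python
import math
import statistics as stats


def summarize(values, with_variance=True):
    """Return max, min, mean, median (and optionally variance) of the non-missing values."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        return {}
    summary = {
        'max': max(clean),
        'min': min(clean),
        'mean': stats.mean(clean),
        'median': stats.median(clean),
    }
    # Sample variance is undefined for a single value, so compute it only with two or more.
    if with_variance and len(clean) > 1:
        summary['var'] = stats.variance(clean)
    return summary


example = summarize([3, 1, float('nan'), 2])  # made-up values
single = summarize([5])
```

Whether skipping missing values is the right call depends on why they are missing, so treat this as a starting point.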
    

In [None]:
season_list[0]  
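
To put a number on the opening claim that the starter usually finished the game, one option is the share of starts that ended as complete games in each season. A rough sketch, assuming the pitching table also carries a `GS` (games started) column alongside `CG`; the rows below are invented for illustration and are not real totals.

```python
from collections import defaultdict

# Invented (yearID, GS, CG) rows standing in for pitcher records.
rows = [
    (1880, 60, 57),
    (1880, 20, 18),
    (2015, 33, 1),
    (2015, 31, 0),
]

starts = defaultdict(int)
complete = defaultdict(int)
for year, gs, cg in rows:
    starts[year] += gs
    complete[year] += cg

# Share of starts finished by the starter; seasons with no starts are skipped.
cg_rate = {year: complete[year] / starts[year] for year in starts if starts[year] > 0}
```

Summing before dividing weights each season by its total starts, which avoids letting pitchers with a single start dominate the average.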

In [None]:
df = df_pitching  # collect each stat into one list per team, spanning all of that team's seasons
team_by_year = df.groupby(['teamID']).apply(
    lambda x: [
        list(x['yearID']), 
        list(x['CG']), 
        list(x['IPouts']), 
        list(x['BFP']), 
        list(x['BAOpp']), 
        list(x['BB']), 
        list(x['SO']), 
        list(x['ERA'])
        ]
    ).apply(pd.Series)
team_by_year.columns = ['yearID', 'CG', 'IP_Outs', 'BFP', 'Opp_BA', 'BB', 'SO', 'ERA']
team_by_year = team_by_year.reset_index()

In [None]:
team_by_year.profile_report()  # exploratory profile of the per-team table

## Now let's build a structure of each team's info parsed by year

In [None]:
len(team_by_year.loc[0, 'yearID'])
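
Because `team_by_year` keeps one list entry per pitcher row, the length above counts rows rather than seasons. A minimal sketch of a team-to-season structure built with plain dictionaries, assuming each record carries `teamID`, `yearID`, and a stat such as `CG`; the sample records and the name `seasons_by_team` are hypothetical.

```python
from collections import defaultdict

# Hypothetical pitcher rows: (teamID, yearID, CG). The values are invented.
records = [
    ('BS1', 1871, 22),
    ('BS1', 1871, 1),
    ('BS1', 1872, 41),
    ('CH1', 1871, 25),
]

# seasons_by_team[team][year] collects that team's complete-game counts for the season.
seasons_by_team = defaultdict(lambda: defaultdict(list))
for team_id, year_id, cg in records:
    seasons_by_team[team_id][year_id].append(cg)

rows_for_bs1 = sum(len(cgs) for cgs in seasons_by_team['BS1'].values())
seasons_for_bs1 = len(seasons_by_team['BS1'])
cg_totals = {team: {year: sum(cgs) for year, cgs in seasons.items()}
             for team, seasons in seasons_by_team.items()}
```

Keying on the year keeps each season's pitchers together, so per-season totals or averages are a single pass over the inner dictionary.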

In [None]:
# List of every distinct teamID (149 teams).
all_teams = team_by_year['teamID'].unique().tolist()

In [None]:
# Map each teamID to one empty dict per season it played, keyed by the year as a string.
# team_seasons is a new name; all_teams stays a plain list of teamIDs.
team_seasons = {}
for team_builder_counter in range(len(team_by_year)):
    team_id = team_by_year.loc[team_builder_counter, 'teamID']
    years = team_by_year.loc[team_builder_counter, 'yearID']  # one entry per pitcher row
    team_seasons[team_id] = {str(year): {} for year in sorted(set(years))}
    # Per-season stats such as summed CG can be filled in here later.

In [None]:
all_teams

In [None]:
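# One dict per team holding an empty entry for each season found in team_by_year.
# Assumption: all_teams is in the same order as the rows of team_by_year.
all_teams_dicts = []
for all_teams_counter, team in enumerate(all_teams):
    years = team_by_year.loc[all_teams_counter, 'yearID']
    all_teams_dicts.append({'teamID': team, 'seasons': {year: {} for year in sorted(set(years))}})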

In [None]:
# One empty dict per team, keyed by teamID (a str() call cannot be an assignment target).
teams_list = {str(team): {} for team in all_teams}

teams_list

In [None]:
# Assumption: summarize each (team, season) pair with the same statistics as the
# per-season loop, read straight from df_pitching.
team_stat_columns = {
    'CG': ('CG', True),
    'IP_outs': ('IPouts', True),
    'batters_faced': ('BFP', False),
    'Opp_BA': ('BAOpp', False),
    'BB': ('BB', True),
    'SO': ('SO', True),
    'ERA': ('ERA', False),
}

team_season_list = []
for (team_id, year_id), rows in df_pitching.groupby(['teamID', 'yearID']):
    season = {'team': team_id, 'year': year_id}
    for prefix, (column, with_variance) in team_stat_columns.items():
        values = list(rows[column])
        season[prefix + '_max'] = max(values)
        season[prefix + '_min'] = min(values)
        season[prefix + '_mean'] = stats.mean(values)
        season[prefix + '_median'] = stats.median(values)
        if with_variance and len(values) > 1:  # variance needs at least two pitchers
            season[prefix + '_var'] = stats.variance(values)
    team_season_list.append(season)