# Red Sox Notebook

This notebook explores the data of Red Sox players from 1908 to 2020, including visualizations and analyses of various player statistics.

## Data Loading

In [None]:
import pandas as pd 

# Load the clean data
red_sox_data = pd.read_csv('../data/cleaned/Boston_Red_Sox_Roster_Data_cleaned.csv')

# Display the first few rows of the dataset
print(red_sox_data.head())
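
Before running the analyses, it can help to confirm that the columns used by the later cells are present. The sketch below is illustrative only: `missing_columns` is a hypothetical helper that is not part of the original notebook, and it runs on a tiny made-up DataFrame rather than the real CSV. The column names are the ones referenced elsewhere in this notebook.

```python
import pandas as pd

# Columns referenced by the analysis cells in this notebook
REQUIRED_COLUMNS = ['Name', 'Season', 'Age', 'Born', 'All-Star', 'HOF', 'WAR']


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Made-up sample rows for illustration only; not real roster data
sample = pd.DataFrame({
    'Name': ['Player A', 'Player B'],
    'Season': [1918, 2004],
    'Age': [24, 31],
    'Born': ['US', 'DO'],
})

# Lists whichever required columns the sample lacks
print(missing_columns(sample))
```

An empty list from `missing_columns(red_sox_data)` would suggest the cells below can find every column they need.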

## Data Analysis

### List of Red Sox Hall of Famers

The following code lists all the Red Sox players who have been inducted into the Hall of Fame.

In [None]:
# Filter the DataFrame to get all records where 'HOF' is 'Yes'
hof_players = red_sox_data[red_sox_data['HOF'] == 'Yes']

# Print the distinct names of Hall of Fame players
distinct_hof_players = hof_players['Name'].drop_duplicates().reset_index(drop=True)
print(distinct_hof_players)

### Count and List of Red Sox All Stars

The following code gives the total count and lists all Red Sox players who have appeared in an All-Star Game.

In [None]:
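# Filter the DataFrame to get all records where 'All-Star' is 'Yes'
all_stars = red_sox_data[red_sox_data['All-Star'] == 'Yes']

# Count each player once, even if they were an All-Star in several seasons
print("Total All Stars: ", all_stars['Name'].nunique(), '\n')

# Print the distinct names of All-Star players
distinct_all_stars = all_stars['Name'].drop_duplicates().reset_index(drop=True)
print('Red Sox All Stars:\n', distinct_all_stars)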

### Count of Red Sox Players by Country

The following code provides all distinct countries where Red Sox players have been born, as well as a count of distinct players per country.

In [None]:
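# List the distinct birthplaces that appear in the data
countries = red_sox_data['Born'].unique()
print(countries)

# Count distinct players per birthplace (a player with several seasons is counted once)
countries_count = red_sox_data.groupby('Born')['Name'].nunique().sort_values(ascending=False)
print(countries_count)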
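
The roster data appears to hold one row per player per season (an assumption based on the `Season` column), so counting rows and counting distinct players can give different answers. A minimal sketch on made-up data, with placeholder names and country codes, shows the difference:

```python
import pandas as pd

# Made-up rows: one row per player-season, so 'Player A' appears twice
sample = pd.DataFrame({
    'Name': ['Player A', 'Player A', 'Player B', 'Player C'],
    'Season': [2003, 2004, 2004, 2004],
    'Born': ['US', 'US', 'US', 'DO'],
})

# Counts rows (player-seasons) per country
rows_per_country = sample['Born'].value_counts()

# Counts distinct players per country
players_per_country = sample.groupby('Born')['Name'].nunique()

print(rows_per_country)
print(players_per_country)
```

In this sample, `value_counts()` counts 'Player A' once for each season, while `nunique()` counts that player a single time.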

### Average Age of Red Sox Player by Decade

The following code calculates the average age of Red Sox players in each decade, rounds each average up, and sorts the decades from oldest to youngest average age.

In [None]:
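import math

# Map each season to the start of its decade (e.g. 1918 -> 1910)
decades = (red_sox_data['Season'] // 10) * 10

# Average age per decade, rounded up to the next whole year
avg_age_by_decade = red_sox_data.groupby(decades)['Age'].mean()
avg_age_by_decade_rounded_up = avg_age_by_decade.apply(math.ceil)

# Sort so the decades with the highest average age come first
avg_age_by_decade_rounded_up_sorted = avg_age_by_decade_rounded_up.sort_values(ascending=False)

print(avg_age_by_decade_rounded_up_sorted)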
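
The decade bucketing relies on integer floor division: dividing a season by 10, discarding the remainder, and multiplying by 10 again. A small worked example on made-up seasons and ages (not real roster data) traces the arithmetic:

```python
import math

import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    'Season': [1918, 1919, 2004, 2007],
    'Age': [24, 27, 30, 29],
})

# Floor-divide by 10, then multiply by 10: 1918 -> 1910, 2007 -> 2000
decade = (sample['Season'] // 10) * 10

# Mean age per decade: 1910 -> 25.5, 2000 -> 29.5
avg_age = sample.groupby(decade)['Age'].mean()

# Round each decade's mean up: 1910 -> 26, 2000 -> 30
avg_age_rounded_up = avg_age.apply(math.ceil)
print(avg_age_rounded_up)
```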

### World Series Player Statistics

The following code lists the players on World Series-winning rosters, counts them, and calculates their average age (rounded up), both overall and by season.

In [None]:
import math 

world_series_winners = red_sox_data[red_sox_data['Season'].isin([1903, 1912, 1915, 1916, 1918, 2004, 2007, 2013, 2018])]

print('World Series Winners:\n', world_series_winners[['Name', 'Season', 'Age', 'Born', 'All-Star', 'HOF']])

print('Number of Red Sox players who have won the World Series:\n', world_series_winners['Name'].nunique(), '\n')
print('Average age of Red Sox players who have won the World Series (rounded up):\n', math.ceil(world_series_winners['Age'].mean()), '\n')

# Average the ages within each season first, then round each seasonal average up
avg_age_by_season = world_series_winners.groupby('Season')['Age'].mean().apply(math.ceil)
print('Average age of Red Sox players who have won the World Series, by season (rounded up):\n', avg_age_by_season)
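
When a filtered subset such as `world_series_winners` needs a new or modified column, pandas may emit a `SettingWithCopyWarning`, because it cannot tell whether the assignment is meant to reach the original DataFrame. Taking an explicit `.copy()` after filtering makes the intent unambiguous. The sketch below uses made-up rows, and `Age_next_year` is a hypothetical column added only for illustration:

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C'],
    'Season': [2003, 2004, 2004],
    'Age': [28, 30, 33],
})

title_seasons = [2004]  # one of the seasons listed in the cell above

# .copy() returns an independent DataFrame, so adding a column is unambiguous
winners = sample[sample['Season'].isin(title_seasons)].copy()
winners['Age_next_year'] = winners['Age'] + 1  # hypothetical derived column

print(winners)
```

The original `sample` is left unchanged; only `winners` gains the extra column.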

### Performance Trends

1. <u>Average WAR by Season</u>

In [None]:
# Convert WAR to numeric; values that cannot be parsed become NaN and are skipped by mean()
red_sox_data['WAR'] = pd.to_numeric(red_sox_data['WAR'], errors='coerce')

avg_war = red_sox_data.groupby('Season')['WAR'].mean()
print('Average WAR per Season\n',avg_war)
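
The `errors='coerce'` option matters when the WAR column contains entries that are not numbers. A minimal sketch on made-up values (the `'n/a'` placeholder is an assumption about what such an entry might look like) shows that unparseable entries become `NaN` and are then ignored by `mean()`:

```python
import pandas as pd

# Made-up rows: WAR stored as strings, with one unparseable entry
sample = pd.DataFrame({
    'Season': [2004, 2004, 2004, 2007],
    'WAR': ['2.0', '4.0', 'n/a', '1.5'],
})

# 'n/a' cannot be parsed, so it is coerced to NaN instead of raising an error
sample['WAR'] = pd.to_numeric(sample['WAR'], errors='coerce')

# mean() skips NaN by default, so 2004 averages only 2.0 and 4.0
avg_war = sample.groupby('Season')['WAR'].mean()
print(avg_war)
```

It is worth checking `sample['WAR'].isna().sum()` on the real data to see how many values were coerced, since a large number of silently dropped entries could skew the seasonal averages.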

## Visualizations

In [None]:
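import os

# NOTE: this absolute path is specific to the author's machine; change it to your local checkout
os.chdir('/Users/joeybiotti/Workspace/red_sox_notebook/notebooks')

# Confirm the working directory before running the script
print(os.getcwd())

# Run the visualization script (%run is an IPython magic and works only in a notebook or IPython session)
%run ../scripts/visualization.py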