In [1]:
# Imports & Settings
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
from prophet import Prophet

from helpers.analysis_inputs import weather_analysis_input

pd.set_option("display.max_columns", None)

%load_ext autoreload
%autoreload 2



In [2]:
wai = weather_analysis_input()



## Do Hotter Outside Temperatures Lead to Greater Offensive Productivity in Major League Baseball?

To answer this question, we will look at four main offensive productivity metrics: `Runs Scored`, `Batting Average`, `On Base Percentage`, and `Slugging Percentage`. However, these four statistics may be skewed by the ballpark in which the game is played (e.g., the confines of Great American Ballpark are sure to allow greater offensive productivity than those of Oracle Park).

![alt-text-1](https://ballparkpal.com/Images/cin_diagramL-01.png "Great American Ballpark")![alt-text-2](https://ballparkpal.com/Images/sf_diagramL-01.png "Oracle Park")

To counteract this, we will measure each of these four metrics as a differential against the ballpark norm. For example, if the average batting average at Citi Field is .240 and the combined batting average of the two teams in a single game at Citi Field is .265, the batting average differential for that game is +.025. We will then plot these four differentials against the max temperature on that day at that location. Now, let's see whether hotter temperatures led to more offensive productivity over the last 3 years (2020 - 2022).
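A minimal sketch of the differential calculation described above, using a toy DataFrame. The column names `gs_avg`, `bl_avg`, and `diff_avg` appear in the plots in this notebook, but their definitions here (single-game value, per-park mean, and their difference) are assumptions, as are the `park` column and every value in the table.

```python
import pandas as pd

# Toy game log; all values are made up for illustration.
games = pd.DataFrame({
    "park": ["Citi Field", "Citi Field", "Oracle Park", "Oracle Park"],
    "gs_avg": [0.265, 0.215, 0.250, 0.230],  # combined batting average in one game
})

# Assumed baseline: the mean of the metric across all games at the same park.
games["bl_avg"] = games.groupby("park")["gs_avg"].transform("mean")

# Differential: how far a single game sits above or below its park's norm.
games["diff_avg"] = games["gs_avg"] - games["bl_avg"]

print(games.round(3))
```

The same pattern would apply to runs, OBP, and slugging by swapping the metric column.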

### Assumptions:
- Day games take place during that day's max temperature
- Weather only affects offensive production in outdoor environments (even parks with retractable domes are removed because the data does not make clear whether the dome was open)
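The second assumption amounts to a filter on the game log. The sketch below shows one way that filter might look; the `roof_type` column and its labels are hypothetical, and the temperatures are made up.

```python
import pandas as pd

# Toy game log; `roof_type` is a hypothetical column, not one from `wai`.
games = pd.DataFrame({
    "park": ["Citi Field", "Tropicana Field", "Minute Maid Park", "Oracle Park"],
    "roof_type": ["open", "dome", "retractable", "open"],
    "max_temp_f": [78, 91, 95, 64],
})

# Keep only open-air parks; fixed and retractable domes are both dropped.
outdoor = games[games["roof_type"] == "open"].reset_index(drop=True)

print(outdoor)
```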

In [3]:
fig = px.scatter(wai, 
                 x='max_temp_f', 
                 y='diff_runs', 
                 color='month', 
                 labels={"max_temp_f": "Game Temperature (F)", 
                         "diff_runs": "Delta in Runs Scored VS. Ballpark Average"},
                 hover_data=['v_full_name', 'h_full_name', 'date', 't_score', 'bl_runs'], 
                # marginal_y='histogram', marginal_x='histogram',
                 trendline="ols", 
                 title="Ballpark-Specific Run Differential VS. Game Temp. (F)"
                 )
fig.show()

In [4]:
fig = px.scatter(wai, 
                 x='max_temp_f', 
                 y='diff_avg', 
                 color='month', 
                 labels={"max_temp_f": "Game Temperature (F)", 
                         "diff_avg": "Delta in Batting Avg VS. Ballpark Average"},
                 hover_data=['v_full_name', 'h_full_name', 'date', 'gs_avg', 'bl_avg'], 
                # marginal_y='histogram', marginal_x='histogram',
                 trendline="ols", 
                 title="Ballpark-Specific Batting Avg. Differential VS. Game Temp. (F)"
                 )
fig.show()

In [5]:
fig = px.scatter(wai, 
                 x='max_temp_f', 
                 y='diff_obp', 
                 color='month', 
                 labels={"max_temp_f": "Game Temperature (F)", 
                         "diff_obp": "Delta in OBP VS. Ballpark Average"},
                 hover_data=['v_full_name', 'h_full_name', 'date', 'gs_obp', 'bl_obp'], 
                # marginal_y='histogram', marginal_x='histogram',
                 trendline="ols", 
                 title="Ballpark-Specific OBP Differential VS. Game Temp. (F)"
                 )
fig.show()

In [6]:
fig = px.scatter(wai, 
                 x='max_temp_f', 
                 y='diff_slg', 
                 color='month', 
                 labels={"max_temp_f": "Game Temperature (F)", 
                         "diff_slg": "Delta in Slugging VS. Ballpark Average"},
                 hover_data=['v_full_name', 'h_full_name', 'date', 'gs_slg', 'bl_slg'], 
                # marginal_y='histogram', marginal_x='histogram',
                 trendline="ols", 
                 title="Ballpark-Specific Slugging Differential VS. Game Temp. (F)"
                 )
fig.show()

So, do hotter temperatures equate to more offense? At least over the last 3 years, the answer seems to be a resounding `yes`. For all four major metrics, the differential against the ballpark average rises as game temperature increases.
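One way to put a number on the trend is to fit a least-squares line to a differential and read off its slope. The sketch below uses `numpy.polyfit` on a small synthetic sample; the values are made up for illustration and are not results from `wai`.

```python
import numpy as np

# Synthetic sample (NOT real data): game temperature and run differential.
max_temp_f = np.array([60.0, 70.0, 80.0, 90.0])
diff_runs = np.array([-1.0, -0.5, 0.5, 1.0])

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(max_temp_f, diff_runs, 1)

# Pearson correlation between temperature and the differential.
r = np.corrcoef(max_temp_f, diff_runs)[0, 1]

print(f"slope = {slope:.3f} runs per degree F, r = {r:.3f}")
```

On the real data, the same two calls could be applied to `wai["max_temp_f"]` and each `diff_*` column after dropping missing values. Note that when `color` is set, Plotly Express fits one trendline per color group by default; recent Plotly versions accept `trendline_scope="overall"` to draw a single line across all months.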