<a href="https://colab.research.google.com/github/flyaflya/persuasive/blob/main/demoNotebooks/CoorsFieldWalkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/Coors_Field_panorama_2022.jpg/400px-Coors_Field_panorama_2022.jpg)

According to [Wikipedia](https://en.wikipedia.org/wiki/Coors_Field), Coors Field is a baseball stadium in Denver, Colorado, with a reputation for being home-run friendly:

> At 5,200 feet (1,580 m) above sea level, Coors Field is by far the highest park in the majors. The next-highest, Chase Field in Phoenix, stands at 1,100 feet (340 m). Designers knew that the stadium would give up a lot of home runs, as the lower air density at such a high elevation would result in balls traveling farther than in other parks. To compensate for this, the outfield fences were placed at an unusually far distance from home plate, thus creating the largest outfield in Major League Baseball.[15] In spite of the pushed-back fences, for many years Coors Field not only gave up the most home runs in baseball, but due to the resultant large field area, the most doubles and triples as well.

Wikipedia goes on to say:

> Although the number of home runs hit per season at Coors Field is decreasing, Coors Field still remains the most hitter friendly ballpark in the Major Leagues by a wide margin. From 2012 to 2015, the Colorado Rockies led the league in runs scored in home games, while being last in the league for runs scored in away games. This demonstrates the extreme benefit that Coors Field's low air density provides to hitters.

Nothing more recent than 2015 is mentioned in the Wikipedia article. Since I have access to baseball data from the 2010 and 2021 seasons, let's investigate whether Colorado was, and still is, the most home-run friendly baseball stadium.

## Stage 1: Purpose Identified
Our purpose is to investigate the Colorado stadium and showcase how run-friendly a baseball stadium it really is.

## Stage 2: Get Content

We need some baseball data. Luckily, we can get every game in 2010 (or 2021). Plotting 2,000+ baseball games individually is too much information, so we summarize by averaging runs scored to get just one data point per stadium.
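
The summarizing step described above can be sketched on a toy game log. This is only an illustration: the two stadium codes and all scores are made up, and only the `homeScore` and `visScore` column names are taken from this walkthrough.

```python
import pandas as pd

# Toy game log (made-up values, not Retrosheet data): one row per game
games = pd.DataFrame({
    "home":      ["AAA", "AAA", "BBB", "BBB"],
    "homeScore": [5, 3, 2, 4],
    "visScore":  [4, 6, 1, 3],
})

# Total runs per game, then average them within each home stadium
summary = (games.assign(totalRuns = lambda df: df.homeScore + df.visScore)
                .groupby("home", as_index=False)["totalRuns"]
                .mean()
)
print(summary)
```

Each stadium ends up with exactly one row, which is the single data point per stadium that the plots rely on.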



In [None]:
# !pip install matplotlib --upgrade
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## get 2010 baseball season data - source: https://www.retrosheet.org/gamelogs/index.html
df2010 = pd.read_csv("https://raw.githubusercontent.com/flyaflya/persuasive/main/baseball10.csv")

## get 2021 baseball season data - source: https://www.retrosheet.org/gamelogs/index.html
#df2021 = pd.read_csv("https://raw.githubusercontent.com/flyaflya/persuasive/main/baseball21.csv")


In [None]:
## aggregate data to per-game averages for each stadium, including total home runs
## and total runs (the grouping key is the "Home" team's stadium)

avgDF = (df2010.assign(totalHR = lambda df: df.visHR + df.homeHR)
          .assign(totalRuns = lambda df: df.homeScore + df.visScore)
          .drop(columns = ['date','visiting'])
          .groupby(['home'], as_index=False)
          .mean()
)  
avgDF.head(5)

## Stage 3: Structure - Map the content to the right aesthetic


In [None]:
## here is the code for a scatter plot
## show plot
fig, ax = plt.subplots(figsize = [8,6])
ax.scatter(x = avgDF.home, y = avgDF.totalRuns)

## notice the mapping of stadium to the x-axis ends up crowding the labels
## notice that since 0 has meaning, we should probably include that too

In [None]:
## same scatter plot with the axes swapped: stadium on the y-axis, runs on the x-axis
## show plot
fig, ax = plt.subplots(figsize = [8,6])
ax.scatter(y = avgDF.home, x = avgDF.totalRuns)
ax.set_xlim([0,11])
## notice that the stadium labels now have more room and the x-axis starts at 0
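
The hard-coded upper limit of 11 works for this data, but it would need revisiting for another season or metric. As a hedged alternative, the minimal sketch below pins only the left edge at zero and leaves the right edge to matplotlib's autoscaling. The `toyDF` name and its numbers are illustrative assumptions, not part of the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for avgDF (made-up values, not real averages)
toyDF = pd.DataFrame({"home": ["AAA", "BBB", "COL"],
                      "totalRuns": [8.1, 9.0, 10.5]})

fig, ax = plt.subplots(figsize = [8,6])
ax.scatter(y = toyDF.home, x = toyDF.totalRuns)

# Fix only the lower limit at 0; the upper limit keeps its autoscaled value
ax.set_xlim(left = 0)
```

The trade-off is that the right edge will differ from plot to plot, so a fixed limit may still be preferable when comparing seasons side by side.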

In [None]:
## show plot
fig, ax = plt.subplots(figsize = [8,6])
ax.barh(y = avgDF.home, width = avgDF.totalRuns)

## Stage 4:  Formatting
You could spend your entire life here, so be careful.


The above plot might be good enough for you as an analyst, but it stinks for external consumption. Let's fix that by:

* Highlighting Colorado using color (this requires creating a new column of color data)
* Sorting by totals
* Adding a title that indicates purpose
* Using audience-friendly labels for the axes
* Using a nicer plot style (`plt.style.use("seaborn-v0_8-whitegrid")`)
* Adding a color-coded legend (this is sort of optional, but shown for completeness)

In [None]:
# create color data with list comprehension
avgDF["barColor"] = ["darkorchid" if stadium == "COL" else "lightgrey" for stadium in avgDF.home]
avgDF.head(10)

In [None]:
## highlight Colorado
fig, ax = plt.subplots(figsize = [8,6])
ax.barh(y = avgDF.home, width = avgDF.totalRuns, color = avgDF.barColor)

In [None]:
## sort by runs and add title/labels
## inplace = True overwrites the values in avgDF using new sort
avgDF.sort_values('totalRuns', inplace = True)

fig, ax = plt.subplots(figsize = [8,6])
ax.barh(y = avgDF.home, width = avgDF.totalRuns, color = avgDF.barColor)
ax.set_title("Colorado (COL) is the Most Run-Friendly Ballpark in 2010")
ax.set_xlabel("Average Runs Per Game")
ax.set_ylabel("Three-Letter Stadium Code")

In [None]:
## use a nicer plot style
## (matplotlib 3.6+ renamed "seaborn-whitegrid" to "seaborn-v0_8-whitegrid")
plt.style.use("seaborn-v0_8-whitegrid")

fig, ax = plt.subplots(figsize = [8,6])
ax.barh(y = avgDF.home, width = avgDF.totalRuns, color = avgDF.barColor)
ax.set_title("Colorado (COL) is the Most Run-Friendly Ballpark in 2010")
ax.set_xlabel("Average Runs Per Game")
ax.set_ylabel("Three-Letter Stadium Code")

In [None]:
## add a legend
## (matplotlib 3.6+ renamed "seaborn-whitegrid" to "seaborn-v0_8-whitegrid")
plt.style.use("seaborn-v0_8-whitegrid")

fig, ax = plt.subplots(figsize = [8,6])

for index, row in avgDF.iterrows():
    if row.home == "COL":
        coloradoBar = ax.barh(y = row.home, width = row.totalRuns, color = row.barColor, label = "Colorado")
    else:
        otherBar = ax.barh(y = row.home, width = row.totalRuns, color = row.barColor, label = "Other Stadium")

ax.set_title("Colorado (COL) is the Most Run-Friendly Ballpark in 2010")
ax.set_xlabel("Average Runs Per Game")
ax.set_ylabel("Three-Letter Stadium Code")
ax.legend(handles = [coloradoBar, otherBar], loc = (0.75,0.2), frameon = True)
plt.show()
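
The row-by-row loop above exists only to obtain one labeled handle per color. A possible alternative, sketched here on made-up values, draws all bars in a single call and builds the legend from proxy artists (`matplotlib.patches.Patch`). The `toyDF` and `legendHandles` names are illustrative assumptions, not part of the walkthrough's data.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

# Toy stand-in for avgDF (made-up values), already sorted by totalRuns
toyDF = pd.DataFrame({
    "home": ["AAA", "BBB", "COL"],
    "totalRuns": [8.1, 9.0, 10.5],
})
toyDF["barColor"] = ["darkorchid" if stadium == "COL" else "lightgrey" for stadium in toyDF.home]

fig, ax = plt.subplots(figsize = [8,6])
ax.barh(y = toyDF.home, width = toyDF.totalRuns, color = toyDF.barColor)

# Proxy artists: patches that are never drawn on the axes, used only as legend keys
legendHandles = [Patch(color = "darkorchid", label = "Colorado"),
                 Patch(color = "lightgrey", label = "Other Stadium")]
legend = ax.legend(handles = legendHandles, loc = "lower right", frameon = True)
```

This keeps the plotting call identical to the earlier cells and separates the legend from the data, at the cost of repeating the two color names.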

# Your Turn

Make a copy of this notebook to modify. Work through the code and change the above plot, which was made for "total runs" in 2010, to produce a similar plot for total **home runs** in 2021. Change labels and titles to match what the data says. Does Colorado appear to be the most home-run friendly ballpark in 2021?