# **Introduction to Visualization**
## Connecticut Sports Analytics Symposium 2025

Date: April 11, 2025 

Time: 3:40 - 4:50 

Workshop Leader: Rahul Manna



We will be using `matplotlib` in this workshop to create visualizations.

### **Installation**

In [None]:
! pip install matplotlib

In [None]:
! pip install pandas nba_api numpy==2.2.4 pybaseball

### **Importing Packages**

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pybaseball as pyball

## **Matplotlib Commands**

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/anatomy.webp" width=500></center>

### Basic Plotting Commands


- **Line plot →** `plt.plot(x, y, color, linestyle, marker, label, alpha, linewidth)`  
- **Scatter plot →** `plt.scatter(x, y, marker, s (marker size), c (marker color), alpha)`  
- **Bar plot →** `plt.bar(x, height, width, color, align)`  
- **Histogram →** `plt.hist(data, bins, range, density, color, alpha)`
- **Pie chart →** `plt.pie(sizes, labels, colors, startangle, autopct)`
- **Fill area →** `plt.fill_between(x, y1, y2, color, alpha)`

#### **Table 1: Common Matplotlib Markers and Linestyles**

| Marker | Description | Line Style | Description |
|--------|------------|------------|-------------|
| `.`    | Point      | `-`        | Solid       |
| `o`    | Circle     | `--`       | Dashed      |
| `v`    | Triangle Down | `-.`   | Dash-dot    |
| `^`    | Triangle Up | `:`      | Dotted      |
| `<`    | Triangle Left | None  | No line     |
| `>`    | Triangle Right |  | |
| `s`    | Square     |  | |
| `p`    | Pentagon   |  | |
| `*`    | Star       |  | |
| `+`    | Plus       |  | |
| `x`    | Cross      |  | |
| `D`    | Diamond    |  | |
| `h`    | Hexagon1   |  | |
| `H`    | Hexagon2   |  | |

[More Linestyles](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html) 

[More Markers](https://matplotlib.org/stable/api/markers_api.html)
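
As a rough illustration of Table 1, the sketch below draws four curves, each pairing one marker with one linestyle from the table. The sine curves and the vertical offsets are arbitrary choices made only so the styles are easy to compare.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)

# Each entry pairs a marker with a linestyle taken from Table 1.
styles = [("o", "-"), ("s", "--"), ("^", "-."), ("D", ":")]

fig, ax = plt.subplots(figsize=(7, 4))
for offset, (marker, linestyle) in enumerate(styles):
    # Shift each curve upward so the four styles do not overlap.
    ax.plot(x, np.sin(x) + 2 * offset, marker=marker, linestyle=linestyle,
            label=f"marker={marker!r}, linestyle={linestyle!r}")
ax.legend(loc="upper right", fontsize=8)
```

In a notebook the figure appears when the cell finishes; in a script, call `plt.show()` afterwards.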
### **Customization Commands**

- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Axis limits →** `plt.xlim(left, right)`, `plt.ylim(bottom, top)`
- **Custom tick labels →** `plt.xticks(ticks, labels)`, `plt.yticks(ticks, labels)`
- **Add grid →** `plt.grid(True, linestyle='--', linewidth)`
- **Add legend →** `plt.legend(loc, fontsize, title, frameon, bbox_to_anchor)`
- **Add colorbar →** `plt.colorbar(label, orientation, shrink, aspect, pad)`
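
The customization commands above can be combined on a single plot. The following is a minimal sketch; the sine and cosine curves, the colors, and the tick positions are illustrative choices rather than part of the workshop data.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

plt.figure(figsize=(7, 4))
plt.plot(x, np.sin(x), color="tab:blue", label="sin(x)")
plt.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

plt.xlabel("x (radians)", fontsize=12)
plt.ylabel("Value", fontsize=12)
plt.title("Sine and Cosine", fontsize=14, loc="left")
plt.xlim(0, 2 * np.pi)                                # left, right
plt.ylim(-1.5, 1.5)                                   # bottom, top
plt.xticks([0, np.pi, 2 * np.pi], ["0", "π", "2π"])   # tick positions, tick labels
plt.grid(True, linestyle="--", linewidth=0.5)
plt.legend(loc="upper right", frameon=False)
```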


### **Multiple Plots & Subplots**

- **New figure →** `plt.figure(figsize=(width, height))`
- **Create subplots →** `plt.subplot(rows, cols, index)`
- **Object-oriented subplot approach →** `fig, ax = plt.subplots(nrows, ncols, figsize, sharex, sharey)`
- **Plot using axes →** `ax.plot(x, y)`
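
A small sketch of the object-oriented approach, assuming a 1 x 2 grid: `plt.subplots` returns the figure and an array of axes, and each axes object is then plotted on and labeled separately. The two curves are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One row, two columns; sharey keeps both panels on the same vertical scale.
fig, ax = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

ax[0].plot(x, np.sin(x))
ax[0].set_title("sin(x)")

ax[1].plot(x, np.cos(x), color="tab:orange")
ax[1].set_title("cos(x)")

fig.tight_layout()  # reduce overlap between panels
```

With a single row or column, `ax` is a one-dimensional array, so one index is enough.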

### **Saving & Displaying Plots**

- **Save figure →** `plt.savefig("plot.png", dpi, format)`
- **Display plot →** `plt.show()`
- **Close figure →** `plt.close()`


## Example 1: Standard Normal Distribution

Let's plot the probability density function (pdf) of the standard normal distribution $N(\mu=0,\sigma=1)$, given below.

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}}e^{\frac{-x^2}{2}}$$


<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/stand_norm_distribution.png" width=500></center>





### Creating Function and Variables

In [None]:
def stand_norm(z):
    return 1/np.sqrt(2*np.pi)*np.exp(-z**2/2)

z = np.linspace(-4,4,1000) # range of x values in plot

f = stand_norm(z)
print(f)

### Plotting a Function

- **Line plot →** `plt.plot(x, y, color, linestyle, marker, label, alpha, linewidth)`  
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Add legend →** `plt.legend(loc, fontsize, title, frameon, bbox_to_anchor)`
- **Display plot →** `plt.show()`
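
One possible way to put these commands together for the standard normal pdf is sketched here. The function and grid repeat the definitions from the previous cell so the block runs on its own; the colors, labels, and figure size are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

def stand_norm(z):
    """Standard normal pdf, repeated from the cell above."""
    return 1 / np.sqrt(2 * np.pi) * np.exp(-z**2 / 2)

z = np.linspace(-4, 4, 1000)
f = stand_norm(z)

plt.figure(figsize=(7, 4))
plt.plot(z, f, color="black", linewidth=2, label="N(0, 1)")
plt.xlabel("z", fontsize=12)
plt.ylabel("f(z)", fontsize=12)
plt.title("Standard Normal Distribution", fontsize=14)
plt.legend(loc="upper right")
```

Finish with `plt.show()` when running outside a notebook.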

### Coloring an Area and Saving a Figure

- **Fill area →** `plt.fill_between(x, y1, y2, color, alpha)`
- **Save figure →** `plt.savefig("plot.png", dpi)`



In [None]:
z1 = np.linspace(1,4,1000)
f1 = stand_norm(z1)
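
A hedged sketch of shading and saving: the region between `z = 1` and `z = 4` is filled under the curve, and the figure is written to a PNG file. The output location (the system temporary directory), the file name, and the `dpi` value are assumptions made for this example.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

def stand_norm(z):
    """Standard normal pdf, repeated from the earlier cell."""
    return 1 / np.sqrt(2 * np.pi) * np.exp(-z**2 / 2)

z = np.linspace(-4, 4, 1000)
z1 = np.linspace(1, 4, 1000)   # region to shade: z between 1 and 4

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(z, stand_norm(z), color="black")
# Shade between the curve (y1) and the horizontal axis (y2 = 0).
ax.fill_between(z1, stand_norm(z1), 0, color="tab:blue", alpha=0.4)
ax.set_xlabel("z")
ax.set_ylabel("f(z)")

# The output path is an arbitrary choice for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "standard_normal_shaded.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)  # release the figure once it has been saved
```

Saving before closing matters: once a figure is closed, a later `plt.savefig` call would write a new, empty figure.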



## Example 2: Ball Position Over Home Plate



<center><img src="https://www.andschneider.dev/images/mlb-gameday-1.png"></center>


### Getting Baseball Data

In [None]:
judge_data = pyball.statcast_batter(start_dt='2024-03-28',end_dt='2024-09-29',player_id=592450)
platex = judge_data['plate_x']
platey = judge_data['plate_z']
judge_data.head()

### **Scatter Plot of Hit Locations**

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/aaron_judge_scatter.png" width=500></center>


- **Scatter plot →** `plt.scatter(x, y, marker, s (marker size), c (marker color), alpha)`  
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Display plot →** `plt.show()`
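
The sketch below shows the shape of such a scatter plot. So that it runs without a network connection, it uses randomly generated placeholder points instead of the Statcast columns; with the real data you would pass `platex` and `platey` from the cell above. The axis units (feet) are an assumption about the Statcast `plate_x` and `plate_z` columns.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data only: random points standing in for plate_x / plate_z.
rng = np.random.default_rng(0)
platex = rng.normal(loc=0.0, scale=0.8, size=300)   # horizontal position
platey = rng.normal(loc=2.5, scale=0.8, size=300)   # height above the ground

plt.figure(figsize=(6, 6))
plt.scatter(platex, platey, marker="o", s=20, c="tab:blue", alpha=0.5)
plt.xlabel("Horizontal Position (ft)", fontsize=12)
plt.ylabel("Vertical Position (ft)", fontsize=12)
plt.title("Ball Position Over Home Plate", fontsize=14)
```

A low `alpha` helps here because many points overlap near the middle of the zone.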

### **Histogram of Aaron Judge's Exit Velocity**

- **Histogram →** `plt.hist(data, bins, range, density, color, alpha)`
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Display plot →** `plt.show()`
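
A minimal histogram sketch, again on placeholder values rather than real Statcast data. It assumes that exit velocity is stored in a column such as `launch_speed` and that the column is missing for pitches with no batted ball; missing values are dropped first because `plt.hist` cannot determine a bin range from data containing `NaN`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data only: random values standing in for exit velocities (mph).
rng = np.random.default_rng(1)
exit_velo = rng.normal(loc=95, scale=12, size=400)
exit_velo[::25] = np.nan                      # mimic rows with no batted ball
exit_velo = exit_velo[~np.isnan(exit_velo)]   # drop missing values before plotting

plt.figure(figsize=(7, 4))
counts, bin_edges, _ = plt.hist(exit_velo, bins=20, color="tab:green", alpha=0.7)
plt.xlabel("Exit Velocity (mph)", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.title("Histogram of Exit Velocity (placeholder data)", fontsize=14)
```

With a pandas column, the same cleanup can be done with `.dropna()` before calling `plt.hist`.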

### **Scatter Plot of Hit Locations Colored by Exit Velocity**

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/aaron_judge_scatter_exit_velocity.png" width=500></center>

- **Scatter plot →** `plt.scatter(x, y, marker, s (marker size), c (marker color), alpha)`
- **Add colorbar →** `plt.colorbar(label)`
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Display plot →** `plt.show()`

#### Available Colormaps

| Colormap Name  | Description                  | Type      |
|---------------|------------------------------|----------|
| viridis       | Perceptually uniform, good for general use | Sequential |
| plasma        | High contrast, perceptually uniform | Sequential |
| inferno       | Dark-to-light, good for visibility | Sequential |
| magma         | Dark purple to yellow gradient | Sequential |
| coolwarm      | Diverging colormap from blue to red | Diverging |

[More Colormaps](https://matplotlib.org/stable/users/explain/colors/colormaps.html)
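
To color points by a third variable, pass the values to `c`, pick a colormap with `cmap`, and hand the object returned by `plt.scatter` to `plt.colorbar`. The sketch below uses random placeholder data in place of the Statcast columns, and `viridis` is just one choice from the table above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data only: positions and exit velocities are random values.
rng = np.random.default_rng(2)
platex = rng.normal(loc=0.0, scale=0.8, size=300)
platey = rng.normal(loc=2.5, scale=0.8, size=300)
exit_velo = rng.normal(loc=95, scale=12, size=300)

plt.figure(figsize=(7, 6))
# c takes one value per point; cmap maps those values to colors.
points = plt.scatter(platex, platey, c=exit_velo, cmap="viridis", s=25, alpha=0.8)
cbar = plt.colorbar(points, label="Exit Velocity (mph)")
plt.xlabel("Horizontal Position (ft)", fontsize=12)
plt.ylabel("Vertical Position (ft)", fontsize=12)
plt.title("Ball Position Colored by Exit Velocity", fontsize=14)
```

Swapping `cmap="viridis"` for `"plasma"` or `"coolwarm"` changes only the color mapping, not the data.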

### Scatter Plot of Hit Locations by Zone

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/baseball_zones.png" width=300></center>

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/aaron_judge_scatter_zones.png" width=500></center>

In [None]:
judge_data['in_zone'] = judge_data['zone'].apply(lambda x: f'Zone: {str(int(x))}' if x in range(0,10) else 'Out of Zone')

plt.figure(figsize=(9,7))

for category, group in judge_data.groupby('in_zone'):
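    # One scatter call per group, so each zone gets its own color and legend entry
    plt.scatter(group['plate_x'], group['plate_z'], label=category, alpha=0.6)

plt.xlabel("Horizontal Position", fontsize=12)
plt.ylabel("Vertical Position", fontsize=12)
plt.legend(title="Zone", fontsize=9)
plt.show()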



## Example 3: Array of Plots: Bar Chart of Top NBA Teams Win Percentages

<center><img src="https://raw.githubusercontent.com/ram200010/CSAS_2025_Data_Visualization/refs/heads/main/images/bar_chart_example.png" width=700></center>


- **Bar plot →** `plt.bar(x, height, width, color, align)`  
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Display plot →** `plt.show()`

### Getting Basketball Data

In [None]:
from nba_api.stats.endpoints import LeagueStandings

# Fetch data for each season
data_2023 = LeagueStandings(season='2022-23').get_data_frames()[0]
data_2024 = LeagueStandings(season='2023-24').get_data_frames()[0]

# Compute win percentages
data_2023['WinPct'] = data_2023['WINS'] / (data_2023['WINS'] + data_2023['LOSSES'])
data_2024['WinPct'] = data_2024['WINS'] / (data_2024['WINS'] + data_2024['LOSSES'])

# Get top 5 teams for each season
top_2023 = data_2023.nlargest(5, 'WinPct')[['TeamName','TeamID','WINS','LOSSES','WinPct']]
top_2024 = data_2024.nlargest(5, 'WinPct')[['TeamName','TeamID','WINS','LOSSES','WinPct']]

print('2022-23 Season','\n',top_2023.head(),'\n')
print('2023-24 Season','\n',top_2024.head())

### Single Bar Chart

- **Bar plot →** `plt.bar(x, height, width, color, align)`
- **Axis labels →** `plt.xlabel("X-axis Label", fontsize, color)`, `plt.ylabel("Y-axis Label", fontsize, color)`
- **Title →** `plt.title("Plot Title", fontsize, loc, pad)`
- **Display plot →** `plt.show()`


In [None]:
team_23 = top_2023['TeamName']
pct_23 = top_2023['WinPct']

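# Plot bars: one bar per team, with height equal to the win percentage
plt.bar(team_23, pct_23, color='tab:blue')
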
plt.xlabel("Team", fontsize=12)
plt.ylabel("Win Percentage", fontsize=12)
plt.title("Season 2022-23", fontsize=14)
plt.show()

### Multiple Bar Charts

- **Multiple Plots →** `fig, ax = plt.subplots(nrows, ncols, figsize, sharex, sharey)`
- **Figure title →** `fig.suptitle("Title", fontsize, fontweight)`
- **Bar plot →** `ax[i, j].bar(x, height, width, color, align, alpha)`  
- **Subplot title →** `ax[i, j].set_title("Title", fontsize, loc)`  
- **X-axis label →** `ax[i, j].set_xlabel("Label", fontsize, color)`  
- **Y-axis label →** `ax[i, j].set_ylabel("Label", fontsize, color)`  
- **Axis limits →** `ax[i, j].set_xlim(min, max)`, `ax[i, j].set_ylim(min, max)`  
- **Rotate tick labels →** `ax[i, j].tick_params(axis, rotation, labelsize, length, width, colors)`  

*where `i` and `j` are the row and column indices, respectively, for an `nrows` x `ncols` subplot grid*

*If there is only 1 column or row, one index will suffice*


In [None]:
team_24 = top_2024['TeamName']
pct_24 = top_2024['WinPct']
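
A sketch of a side-by-side layout, assuming a 1 x 2 grid with a shared y-axis. The team names and win percentages are made-up placeholders so the block runs offline; in the workshop you would pass `team_23`, `pct_23`, `team_24`, and `pct_24` instead.

```python
import matplotlib.pyplot as plt

# Placeholder values only; these are not real standings.
seasons = {
    "Season A": (["Team 1", "Team 2", "Team 3", "Team 4", "Team 5"],
                 [0.71, 0.68, 0.65, 0.62, 0.60]),
    "Season B": (["Team 6", "Team 7", "Team 8", "Team 9", "Team 10"],
                 [0.78, 0.70, 0.69, 0.62, 0.61]),
}

fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
fig.suptitle("Top 5 Teams by Win Percentage", fontsize=14, fontweight="bold")

# With a single row, one index is enough to select each panel.
for i, (season, (teams, pct)) in enumerate(seasons.items()):
    ax[i].bar(teams, pct, color="tab:blue", alpha=0.8)
    ax[i].set_title(season, fontsize=12)
    ax[i].set_xlabel("Team")
    ax[i].set_ylim(0, 1)
    ax[i].tick_params(axis="x", rotation=45)

ax[0].set_ylabel("Win Percentage")  # shared y-axis, so label only the left panel
```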

## Using Style Sheets

[List of Style Sheets](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html)

In [None]:
plt.style.available
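
One way to try a style sheet is sketched here. `plt.style.context` applies the style only inside the `with` block, which is convenient for comparing styles; `"ggplot"` is used as an example name from `plt.style.available`.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# The style applies only to figures created inside this block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.set_title("Plot Using the ggplot Style Sheet")
    ax.legend()

# To change the style for the rest of the session instead:
# plt.style.use("ggplot")
```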