# TriKirby Index

The TriKirby Index is a methodology inspired by the Kirby Index that quantifies the command and effectiveness of each UCSD pitcher's pitches ahead of the 2026 season.

## Methodology: TriKirby Index (Percentile-Based Command Metric)

### Overview
The **TriKirby Index** is a pitch-level command metric designed to quantify a pitcher’s ability to consistently repeat release direction and release location. The metric extends the original Kirby Index framework by operating at the **pitch-type level**, normalizing performance **relative to NCAA Division I pitchers**, and producing an interpretable **percentile-based score**.

---

## 1. Pitch-Level Data Construction
All calculations begin from pitch-level TrackMan data. Each pitch includes measurements of:

- Vertical Release Angle (VRA)
- Horizontal Release Angle (HRA)
- Vertical Release Location ($vRel$)
- Horizontal Release Location ($hRel$)

Pitch-level data is kept unaggregated until the pitcher–pitch type step, so pitch-to-pitch variability is preserved.

---

## 2. Pitcher–Pitch Type Aggregation
For each pitcher $p$ and pitch type $t$, release consistency is summarized using the **standard deviation** of each release component:

$$
\sigma_{\text{VRA},p,t}, \quad
\sigma_{\text{HRA},p,t}, \quad
\sigma_{\text{vRel},p,t}, \quad
\sigma_{\text{hRel},p,t}
$$

Lower standard deviation indicates greater command consistency.
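A minimal pandas sketch of this aggregation step, assuming a hypothetical toy pitch table that uses the TrackMan column names appearing later in the notebook (VertRelAngle, HorzRelAngle, RelHeight, RelSide). The data and the MIN_PITCHES threshold are invented for illustration. A pitcher–pitch type group with a single pitch has no sample standard deviation (pandas returns NaN), which is why such groups drop out.

```python
import pandas as pd

# Hypothetical toy pitches; column names follow the TrackMan fields used in this notebook:
# VertRelAngle (VRA), HorzRelAngle (HRA), RelHeight (vRel), RelSide (hRel).
pitches = pd.DataFrame({
    "Pitcher":         ["A", "A", "A", "A", "B", "B", "B"],
    "TaggedPitchType": ["Fastball", "Fastball", "Fastball", "Slider",
                        "Fastball", "Fastball", "Fastball"],
    "VertRelAngle":    [-1.0, -1.2, -0.8, 0.5, -1.0, -2.0, 0.0],
    "HorzRelAngle":    [2.0, 2.1, 1.9, 1.0, 2.0, 3.0, 1.0],
    "RelHeight":       [5.9, 6.0, 6.1, 5.8, 5.5, 5.9, 6.3],
    "RelSide":         [1.9, 2.0, 2.1, 1.7, 1.5, 2.0, 2.5],
})

MIN_PITCHES = 2  # hypothetical threshold; a single pitch has no sample SD (NaN)

spread = (
    pitches.groupby(["Pitcher", "TaggedPitchType"])
    .agg(
        n=("VertRelAngle", "size"),
        sd_vra=("VertRelAngle", "std"),   # sample SD (ddof=1), as in the notebook
        sd_hra=("HorzRelAngle", "std"),
        sd_vrel=("RelHeight", "std"),
        sd_hrel=("RelSide", "std"),
    )
    .reset_index()
)
# Drop groups below the threshold (here: pitcher A's one-pitch slider)
spread = spread[spread["n"] >= MIN_PITCHES].reset_index(drop=True)
print(spread)
```

Pitcher A's fastball has sd_vra = 0.2 versus 1.0 for pitcher B, so A repeats the vertical release angle more consistently.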

---

## 3. Percentile-Based Release Consistency Metrics

For each pitcher $p$ and pitch type $t$, release consistency is quantified by the within-pitch-type **standard deviations** of the four release components defined in Section 2 (VRA, HRA, vRel, hRel). Lower variability indicates greater repeatability and better pitch command.

To ensure comparability across NCAA Division I pitchers **within each pitch type**, each
standard deviation is converted into a percentile-based consistency score:

$$
\text{SD}_{p,t}^{\text{pct}} = 1 - \operatorname{rank}_{\text{pct}}\left(\text{SD}_{p,t}\right),
\qquad
\operatorname{rank}_{\text{pct}}\left(\text{SD}_{p,t}\right) = \frac{\operatorname{rank}\left(\text{SD}_{p,t}\right)}{n_t}
$$

where ranks are assigned in ascending order of SD among the $n_t$ pitchers with a defined SD for pitch type $t$, and tied values receive their average rank.

This transformation ensures that:

- Scores lie in the interval $[0,\, 1 - 1/n_t]$: the smallest SD scores $1 - 1/n_t$ and the largest scores $0$
- Higher values correspond to **better command**
- Each pitch type is normalized independently
- Metrics are robust to scale differences across pitch types

Because the largest SD always maps to 0, a pitch type thrown by only one pitcher scores 0 regardless of how consistent that pitcher is.

The resulting percentile-based release consistency metrics are:

$$
\text{sd\_vra\_pct}, \quad
\text{sd\_hra\_pct}, \quad
\text{sd\_vrel\_pct}, \quad
\text{sd\_hrel\_pct}
$$
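A small check of the transform on hypothetical SD values, using the same pandas call as the notebook (`groupby(...).rank(pct=True)`, inverted with `1 -`). With $n_t = 4$ sliders the smallest SD scores $1 - 1/4 = 0.75$ and the largest scores 0, while a pitch type thrown by a single pitcher scores 0 even though its SD is the smallest in the table.

```python
import pandas as pd

# Hypothetical SDs: four pitchers share one pitch type, and one pitch type has a single pitcher.
spread = pd.DataFrame({
    "Pitcher":         ["A", "B", "C", "D", "A"],
    "TaggedPitchType": ["Slider", "Slider", "Slider", "Slider", "Knuckleball"],
    "sd_vra":          [0.5, 1.0, 1.5, 2.0, 0.3],
})

# Same transform as the notebook: percentile rank within pitch type, inverted so lower SD scores higher.
spread["sd_vra_pct"] = 1 - spread.groupby("TaggedPitchType")["sd_vra"].rank(pct=True)
print(spread)
# Slider scores: 0.75, 0.5, 0.25, 0.0; the lone Knuckleball scores 0.0.
```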

---

## 4. Linear Component Weights

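Each release component contributes differently to overall pitch command.
To reflect this, each percentile-based metric is multiplied by a non-negative weight $\beta_i$, and the weighted metrics are summed.

The weights are normalized such that:

$$
\sum_{i=1}^{4} \beta_i = 1
$$

Because the weights are non-negative and sum to 1, the composite is a weighted average of its inputs and stays on the same 0–1 scale as the percentile metrics.
These weights encode the relative importance of directional consistency
(release angles) versus spatial consistency (release location); Section 5 describes how their values are estimated.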

---
## 5. Feature Importance via Random Forest Regression

Not every release component contributes equally to overall pitch command. To estimate the **relative importance** of each release dimension, the TriKirby Index uses a **Random Forest regression model** rather than the coefficients of a linear regression.

A Random Forest model is chosen because it:
- Captures **nonlinear relationships** between release mechanics and pitch location outcomes  
- Accounts for **interactions** between release variables  
- Avoids restrictive linear assumptions  

---

#### **Random Forest Model**

Let the feature vector for each pitch be:

$$
\mathbf{x} =
\left(
\text{VRA},\;
\text{HRA},\;
\text{vRel},\;
\text{hRel}
\right)
$$

A Random Forest regressor is trained to predict plate location (PlateLocHeight and PlateLocSide) from these release features, with both coordinates handled by a single multi-output model. After training, **feature importances** are extracted from the ensemble.

---

#### **Feature Importance Weights**

The Random Forest produces a set of feature importance weights:

$$
\boldsymbol{\beta}
=
\left(
\beta_{\text{VRA}},
\beta_{\text{HRA}},
\beta_{\text{vRel}},
\beta_{\text{hRel}}
\right)
$$

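Each weight is the feature's impurity-based importance: the reduction in squared prediction error achieved by splits on that feature, averaged across all trees. It measures how much a release component contributes to predicting plate location, which the TriKirby Index treats as that component's relative contribution to pitch command.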

By construction, the feature importances satisfy:

$$
\sum_{i=1}^{4} \beta_i = 1
$$

Thus, the weights are:
- **Already normalized**
- **Data-driven**
- **Nonlinear-aware**

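Because a single multi-output forest predicts both plate coordinates, it yields one importance vector directly; no averaging or collapsing across plate dimensions is needed.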
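A sketch on synthetic data, assuming scikit-learn's `RandomForestRegressor` fit to a two-column target that stands in for PlateLocHeight and PlateLocSide. The data-generating coefficients are invented: plate height is driven mainly by the first feature, plate side by the second, and the fourth feature is pure noise. `feature_importances_` returns one impurity-based vector for the whole multi-output forest, and it sums to 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["VertRelAngle", "HorzRelAngle", "RelHeight", "RelSide"]
rng = np.random.default_rng(0)
n = 2000

# Synthetic, standardized release features (invented data, not TrackMan)
X = rng.normal(size=(n, 4))

# Invented targets: height depends mostly on feature 0 (a little on feature 2),
# side depends on feature 1, and feature 3 carries no signal.
plate_height = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)
plate_side = 2.0 * X[:, 1] + rng.normal(scale=0.3, size=n)
y = np.column_stack([plate_height, plate_side])  # one forest, two outputs

rf = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, max_features="sqrt", random_state=0
)
rf.fit(X, y)

# Impurity-based importances: a single vector for the multi-output forest
betas = dict(zip(FEATURES, rf.feature_importances_))
for name, value in betas.items():
    print(f"{name:>13s}  {value:.3f}")
```

The noise feature receives the smallest weight, which is the behavior the TriKirby weighting relies on; impurity-based importances can still assign small nonzero weight to uninformative features.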

---

#### **Interpretation of Weights**

- Larger $\beta_i$ values indicate greater influence on predicted plate location, and therefore a larger share of the TriKirby score  
- Weights reflect nonlinear and interaction-based effects  
- Directional consistency (release angles) and spatial consistency (release location) are allowed to contribute unequally  

These Random Forest–derived weights are used directly in the final TriKirby Index formulation.

---

## 6. TriKirby Index (Final Composite Score)

The **TriKirby Index** for pitcher $p$ and pitch type $t$ is computed as a weighted linear combination of percentile-based release consistency scores:

$$
\text{TriKirby}_{p,t}
=
\beta_{\text{VRA}} \cdot SD^{\text{pct}}_{\text{VRA},p,t}
+
\beta_{\text{HRA}} \cdot SD^{\text{pct}}_{\text{HRA},p,t}
+
\beta_{\text{vRel}} \cdot SD^{\text{pct}}_{\text{vRel},p,t}
+
\beta_{\text{hRel}} \cdot SD^{\text{pct}}_{\text{hRel},p,t}
$$

where:
- $SD^{\text{pct}} \in [0,1]$ denotes percentile-based consistency scores  
- Higher values correspond to better pitch command  
- Feature importance weights $\boldsymbol{\beta}$ are learned from the Random Forest model  
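A worked check of the formula, using the Random Forest weights printed later in this notebook (rounded to six decimals) and the four percentile scores reported for Cazares, Julian's Fastball. The helper name `trikirby` is illustrative, not part of the notebook.

```python
# Random Forest beta weights as printed in this notebook (rounded to 6 decimals)
BETAS = {
    "sd_vra_pct": 0.347710,   # VertRelAngle
    "sd_hra_pct": 0.317594,   # HorzRelAngle
    "sd_vrel_pct": 0.152846,  # RelHeight
    "sd_hrel_pct": 0.181851,  # RelSide
}

def trikirby(pcts: dict) -> float:
    """Weighted linear combination of the four percentile consistency scores."""
    return sum(BETAS[k] * pcts[k] for k in BETAS)

# Percentile scores for Cazares, Julian / Fastball from the arsenal table
cazares_fb = {"sd_vra_pct": 0.84, "sd_hra_pct": 0.79, "sd_vrel_pct": 0.21, "sd_hrel_pct": 0.58}
score = round(trikirby(cazares_fb), 3)
print(score)  # 0.681
```

The unrounded sum is about 0.6805, which rounds to the 0.681 reported in the TriKirby column.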


---

## 7. Implementation Notation

The notebook stores the four percentile scores as sd_vra_pct, sd_hra_pct, sd_vrel_pct, and sd_hrel_pct, so the same formula is written with $\beta_1 = \beta_{\text{VRA}}$, $\beta_2 = \beta_{\text{HRA}}$, $\beta_3 = \beta_{\text{vRel}}$, and $\beta_4 = \beta_{\text{hRel}}$:

$$
\text{TriKirby}_{p,t}
= \beta_1 \cdot \text{sd\_vra\_pct}_{p,t}
+ \beta_2 \cdot \text{sd\_hra\_pct}_{p,t}
+ \beta_3 \cdot \text{sd\_vrel\_pct}_{p,t}
+ \beta_4 \cdot \text{sd\_hrel\_pct}_{p,t}
$$

---

## 8. Interpretation

- **Higher TriKirby values indicate better pitch command**
- Higher values reflect more repeatable release direction (angles) and release point (location)
- Fully comparable **within pitch type** across NCAA Division I pitchers
- Designed for pitch-type-level evaluation rather than comparisons across pitch types
- A score of 0.5 is the midpoint of the percentile scale: values above 0.5 indicate above-average command on that pitch relative to the comparison pool, and values below 0.5 indicate below-average command. Because the largest SD in each pitch type maps to 0, the average score within a pitch type sits slightly below 0.5, noticeably so for pitch types thrown by only a few pitchers.
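The exact average follows from the ranking. With $n_t$ distinct SDs, the ranks are $1, \dots, n_t$, so the mean score is $1 - \frac{n_t + 1}{2 n_t} = \frac{n_t - 1}{2 n_t}$; ties ranked with pandas' default average method leave this mean unchanged. Since the $\beta$ weights sum to 1, the mean TriKirby score within a pitch type (assuming no missing values) equals the same quantity. A short numerical check of that identity, with the helper `mean_pct_score` introduced only for illustration:

```python
import pandas as pd

def mean_pct_score(n: int) -> float:
    """Mean of 1 - rank(pct=True) over n distinct SD values."""
    sds = pd.Series(range(1, n + 1), dtype=float)  # any n distinct values give the same ranks
    return float((1 - sds.rank(pct=True)).mean())

for n in (1, 2, 4, 19, 1000):
    # Empirical mean vs. the closed form (n - 1) / (2n)
    print(n, round(mean_pct_score(n), 4), round((n - 1) / (2 * n), 4))
```

For example, a pitch type thrown by 19 pitchers averages about 0.474, and one thrown by 2 pitchers averages 0.25.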


## 1. Data Preparation & Filtering

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor

DATA_DIR = Path("Trackman CSVs")   

In [2]:
import pandas as pd
import numpy as np

# -------------------------------
# Load the two datasets
# -------------------------------
ncaa_df = pd.read_csv("all_games.csv")
ucsd_df = pd.read_csv("ucsd_fall_pitching_data.csv")  # adjust to match your local filename

# -------------------------------
# Backward-compatible aliases
# (so the rest of the notebook works)
# -------------------------------
all_df = ncaa_df.copy()      # NCAA-wide (used for baselines / percentiles / betas)
ucsd_all = ucsd_df.copy()    # UCSD-only (used for your tables / rankings)

print("Loaded NCAA rows:", len(all_df), "| UCSD rows:", len(ucsd_all))
display(all_df.head(3))
display(ucsd_all.head(3))



Loaded NCAA rows: 1501313 | UCSD rows: 7268


|   | PitchNo | Date | Time | PAofInning | PitchofPA | Pitcher | PitcherId | PitcherThrows | PitcherTeam | Batter | ... | ThrowTrajectoryZc1 | ThrowTrajectoryZc2 | PitchReleaseConfidence | PitchLocationConfidence | PitchMovementConfidence | HitLaunchConfidence | HitLandingConfidence | CatcherThrowCatchConfidence | CatcherThrowReleaseConfidence | CatcherThrowLocationConfidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025-02-14 | 18:04:47.71 | 1 | 1 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... |  |  | High | High | High |  |  |  |  |  |
| 1 | 2 | 2025-02-14 | 18:05:00.16 | 1 | 2 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... |  |  | High | High | High |  |  |  |  |  |
| 2 | 3 | 2025-02-14 | 18:05:14.48 | 1 | 3 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... | -78.35097 | 12.46061 | High | High | High |  |  | Medium | Medium | Low |


|   | PitchNo | Date | Time | PAofInning | PitchofPA | Pitcher | PitcherId | PitcherThrows | PitcherTeam | Batter | ... | ThrowTrajectoryZc1 | ThrowTrajectoryZc2 | PitchReleaseConfidence | PitchLocationConfidence | PitchMovementConfidence | HitLaunchConfidence | HitLandingConfidence | CatcherThrowCatchConfidence | CatcherThrowReleaseConfidence | CatcherThrowLocationConfidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025-11-01 | 11:35:12.81 | 1.0 | 1.0 | Gregson, Niccolas | 1000039282 | Right | CSD_TRI | Crossland, Michael | ... |  |  | High | High | High |  |  |  |  |  |
| 1 | 2 | 2025-11-01 | 11:35:34.61 | 1.0 | 2.0 | Gregson, Niccolas | 1000039282 | Right | CSD_TRI | Crossland, Michael | ... |  |  | High | High | High |  |  |  |  |  |
| 2 | 3 | 2025-11-01 | 11:36:01.08 | 1.0 | 3.0 | Gregson, Niccolas | 1000039282 | Right | CSD_TRI | Crossland, Michael | ... |  |  | High | High | High |  |  |  |  |  |


## 2. Pitch-Type–Specific Release Variability Metrics

In [3]:
df = all_df.copy()

def pick_col(candidates):
    for c in candidates:
        if c in df.columns:
            return c
    return None

PITCHER_COL = pick_col(["Pitcher", "PitcherName", "PitcherNameFull"])
TEAM_COL    = pick_col(["PitcherTeam", "Team", "Pitcher Team", "PitcherTeamAbbrev"])
PITCHTYPE_COL = pick_col(["TaggedPitchType", "PitchType", "AutoPitchType"])

HRA_COL = pick_col(["HorzRelAngle"])
VRA_COL = pick_col(["VertRelAngle"])

# Horizontal + vertical release POINTS (usually X and Z)
RELX_COL = pick_col(["RelSide"])
RELZ_COL = pick_col(["RelHeight"])

print("Pitcher:", PITCHER_COL)
print("Team:", TEAM_COL)
print("PitchType:", PITCHTYPE_COL)
print("HRA:", HRA_COL)
print("VRA:", VRA_COL)
print("RelX:", RELX_COL)
print("RelZ:", RELZ_COL)

req = [PITCHER_COL, TEAM_COL, PITCHTYPE_COL, HRA_COL, VRA_COL, RELX_COL, RELZ_COL]
missing = [r for r in req if r is None]
if missing:
    raise ValueError("Missing required columns. Fix candidates list above. Missing: " + str(missing))

# Drop rows missing any required field, then strip whitespace from the key string columns
df = df.dropna(subset=req).copy()
df[PITCHTYPE_COL] = df[PITCHTYPE_COL].astype(str).str.strip()
df[PITCHER_COL]   = df[PITCHER_COL].astype(str).str.strip()
df[TEAM_COL]      = df[TEAM_COL].astype(str).str.strip()

df.head()

Pitcher: Pitcher
Team: PitcherTeam
PitchType: TaggedPitchType
HRA: HorzRelAngle
VRA: VertRelAngle
RelX: RelSide
RelZ: RelHeight


|   | PitchNo | Date | Time | PAofInning | PitchofPA | Pitcher | PitcherId | PitcherThrows | PitcherTeam | Batter | ... | ThrowTrajectoryZc1 | ThrowTrajectoryZc2 | PitchReleaseConfidence | PitchLocationConfidence | PitchMovementConfidence | HitLaunchConfidence | HitLandingConfidence | CatcherThrowCatchConfidence | CatcherThrowReleaseConfidence | CatcherThrowLocationConfidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025-02-14 | 18:04:47.71 | 1 | 1 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... |  |  | High | High | High |  |  |  |  |  |
| 1 | 2 | 2025-02-14 | 18:05:00.16 | 1 | 2 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... |  |  | High | High | High |  |  |  |  |  |
| 2 | 3 | 2025-02-14 | 18:05:14.48 | 1 | 3 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Isom-McCall, Charlie | ... | -78.35097 | 12.46061 | High | High | High |  |  | Medium | Medium | Low |
| 3 | 4 | 2025-02-14 | 18:05:39.46 | 2 | 1 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Edwards, Kameron | ... |  |  | High | High | High |  |  |  |  |  |
| 4 | 5 | 2025-02-14 | 18:05:53.55 | 2 | 2 | Stuprich, Brennan | 1000099000.0 | Right | SOU_LIO | Edwards, Kameron | ... |  |  | High | High | High |  |  |  |  |  |


## UC San Diego Pitching

In [4]:
UCSD_CODE = "CSD_TRI"   # UCSD's team code in the PitcherTeam column

ucsd_pitchers = sorted(df.loc[df[TEAM_COL] == UCSD_CODE, PITCHER_COL].unique())
print("UCSD pitchers found:", len(ucsd_pitchers))
ucsd_pitchers[:30]

UCSD pitchers found: 19


['Cazares, Julian',
 'Custer, Julian',
 'Dalquist, Matthew',
 'Davidson, Garrett',
 'Ernisse, Zach',
 'Gregson, Niccolas',
 'Hasegawa, Sam',
 'Huy, Nathan',
 'King, Devon',
 'Marchetti, Landon',
 'Murdock, Steele',
 'Nickerson, Trevor',
 'Patterson, Garrett',
 'Pelzman, Harry',
 'Remmers, Ethan',
 'Ries, Nathan',
 'Seid, Spencer',
 'Villar, Jake',
 'Weber, Chapman']

In [5]:
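# === UCSD-only filter (applied BEFORE building spread) ===
# NOTE: this overwrites df, so every downstream SD, percentile rank, and beta weight
# is computed on UCSD pitches only rather than against the full NCAA D1 pool.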
TEAM_COL = "PitcherTeam"  # change if your df uses a different team column
UCSD_TEAM_NAMES = ["CSD_TRI", "UCSD", "SAN_DIEGO", "UC San Diego"]

print("Unique teams (sample):", df[TEAM_COL].dropna().astype(str).unique()[:20])

df = df[df[TEAM_COL].isin(UCSD_TEAM_NAMES)].copy()
print("Rows after UCSD filter:", len(df))

Unique teams (sample): ['SOU_LIO' 'LIN_UNI' 'OKL_COW' 'CLE_TIG' 'COL_CHA' 'VCU_RAM' 'ARK_RAZ'
 'WAS_COU' 'BAY_BEA' 'YSU_PEN' 'LON_DIR' 'NOR_CAT' 'GEO_BUL' 'QUI_BOB'
 'DIX_STE' 'NCB' 'CAL_BEA' 'NEV_WOL' 'ECU_PIR' 'GEO_PAT']
Rows after UCSD filter: 4459
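Because the filter above overwrites df, the percentile ranks later in the notebook compare UCSD pitchers only with each other. A minimal sketch of the order the methodology describes (rank against the full NCAA D1 pool first, then keep UCSD rows) is below. It assumes a per-pitcher, per-pitch-type SD table built from the unfiltered NCAA data with PitcherTeam kept as a grouping key; the helper name `ncaa_benchmarked` and the toy rows are hypothetical.

```python
import pandas as pd

UCSD_CODE = "CSD_TRI"
SD_COLS = ["sd_vra", "sd_hra", "sd_vrel", "sd_hrel"]

def ncaa_benchmarked(spread_all, team_col="PitcherTeam", type_col="TaggedPitchType"):
    """Hypothetical helper: rank every SD against the full NCAA pool, then keep UCSD rows."""
    out = spread_all.copy()
    for col in SD_COLS:
        # Lower SD -> higher score, ranked within each pitch type across all teams
        out[f"{col}_pct"] = 1 - out.groupby(type_col)[col].rank(pct=True)
    # Filter only after ranking, so UCSD pitchers are compared with every D1 pitcher
    return out[out[team_col] == UCSD_CODE].reset_index(drop=True)

# Toy pool: one UCSD pitcher and three pitchers from other teams, all throwing fastballs
sds = [0.5, 1.0, 1.5, 2.0]
toy = pd.DataFrame({
    "Pitcher": ["U1", "X1", "X2", "X3"],
    "PitcherTeam": [UCSD_CODE, "SOU_LIO", "OKL_COW", "ARK_RAZ"],
    "TaggedPitchType": ["Fastball"] * 4,
    **{col: sds for col in SD_COLS},
})

ucsd = ncaa_benchmarked(toy)
print(ucsd[["Pitcher", "sd_vra_pct"]])

# Ranking after filtering to UCSD leaves a pool of one, which always scores 0
toy_ucsd_only = toy[toy["PitcherTeam"] == UCSD_CODE].copy()
toy_ucsd_only["sd_vra_pct"] = 1 - toy_ucsd_only.groupby("TaggedPitchType")["sd_vra"].rank(pct=True)
print(toy_ucsd_only[["Pitcher", "sd_vra_pct"]])
```

In the toy pool, U1 has the smallest SD of four fastballs and scores 0.75; ranked only among UCSD rows, the same pitcher forms a pool of one and scores 0.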


## Pitcher–Pitch Type Spread Table

In [6]:
MIN_PITCHES_PER_TYPE = 0  # 0 keeps every group; raise it to drop noisy small samples

g = df.groupby([PITCHER_COL, PITCHTYPE_COL])

spread = g.agg(
    n_pitches=(PITCHTYPE_COL, "size"),
    sd_hra=(HRA_COL, "std"),
    sd_vra=(VRA_COL, "std"),
    sd_relx=(RELX_COL, "std"),
    sd_relz=(RELZ_COL, "std"),
).reset_index()

spread = spread[spread["n_pitches"] >= MIN_PITCHES_PER_TYPE].dropna()
spread.head()

|   | Pitcher | TaggedPitchType | n_pitches | sd_hra | sd_vra | sd_relx | sd_relz |
|---|---|---|---|---|---|---|---|
| 0 | Cazares, Julian | ChangeUp | 3 | 0.723429 | 0.262306 | 0.175095 | 0.146535 |
| 2 | Cazares, Julian | Fastball | 38 | 0.721784 | 0.87741 | 0.146281 | 0.153933 |
| 3 | Cazares, Julian | FourSeamFastBall | 3 | 2.076283 | 0.754681 | 0.086328 | 0.045497 |
| 4 | Cazares, Julian | Slider | 29 | 1.095653 | 1.129368 | 0.143436 | 0.118094 |
| 5 | Cazares, Julian | TwoSeamFastBall | 2 | 0.351325 | 0.062335 | 0.053669 | 0.023349 |


## Percentile-Based Command Scores

In [7]:
# Percentile-based command (higher = better); spread is rebuilt and re-ranked in later cells
for col in ["sd_hra", "sd_vra", "sd_relx", "sd_relz"]:
    spread[f"{col}_pct"] = 1 - spread.groupby(PITCHTYPE_COL)[col].rank(pct=True)

## Regression Model for Beta Weights

In [8]:
# Select Features + Target (Plate Location)

# Features (predictors)
FEATURES = [
    "VertRelAngle",   # VRA
    "HorzRelAngle",   # HRA
    "RelHeight",      # vRel
    "RelSide"         # hRel
]

# Targets (plate location)
TARGETS = [
    "PlateLocHeight",  # Z location
    "PlateLocSide"     # X location
]

model_df = df.dropna(subset=FEATURES + TARGETS)

X = model_df[FEATURES]
y = model_df[TARGETS]


In [9]:
# Test/Train Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [10]:
# Linear Regression (Kirby-style)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

y_pred = lin_reg.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))


MSE: 0.5178465982081192
R²: 0.41230259350869847


In [11]:
# Extract Beta Weights

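beta_df = pd.DataFrame(
    lin_reg.coef_,   # shape (n_targets, n_features); rows follow the TARGETS order
    columns=FEATURES,
    index=TARGETS    # ["PlateLocHeight", "PlateLocSide"]
)

beta_df

|   | VertRelAngle | HorzRelAngle | RelHeight | RelSide |
|---|---|---|---|---|
| PlateLocHeight | 0.365545 | -0.046137 | 0.602566 | -0.023966 |
| PlateLocSide | -0.046994 | 0.607446 | 0.542632 | 0.727966 |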
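In scikit-learn, `LinearRegression.coef_` for a multi-target fit has shape `(n_targets, n_features)`, with rows in the same order as the columns of `y`. A toy fit with invented, noise-free coefficients makes the orientation visible: row 0 belongs to the first target. Since TARGETS lists PlateLocHeight first, the first row of beta_df refers to plate height.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

TARGETS = ["PlateLocHeight", "PlateLocSide"]
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))

# Invented, noise-free targets: height depends only on feature 0, side only on feature 1.
y = np.column_stack([3.0 * X[:, 0], -2.0 * X[:, 1]])

coef = LinearRegression().fit(X, y).coef_   # shape (n_targets, n_features)
for target, row in zip(TARGETS, coef):
    print(target, np.round(row, 6))
# Row 0 (PlateLocHeight) is about [3, 0]; row 1 (PlateLocSide) is about [0, -2].
```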


In [12]:
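# Linear-regression betas: mean |coefficient| across both plate targets, normalized to sum to 1.
# These are superseded by the Random Forest betas computed further down.
beta_weights = beta_df.abs().mean(axis=0)
beta_weights = beta_weights / beta_weights.sum()  # normalize

beta_weights

| Feature | Beta weight |
|---|---|
| VertRelAngle | 0.139218 |
| HorzRelAngle | 0.220563 |
| RelHeight | 0.386467 |
| RelSide | 0.253752 |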
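The linear weights can be reproduced by hand from the printed coefficient table: take absolute values, average each feature's two coefficients (one per plate target), and divide by the total. Because the average runs over rows, the row labels do not affect the result. The coefficients below are copied from the printed output, so the recomputed weights agree to about six decimals.

```python
import pandas as pd

# Coefficients copied from the printed beta_df table (rows in printed order).
coefs = pd.DataFrame(
    [[0.365545, -0.046137, 0.602566, -0.023966],
     [-0.046994, 0.607446, 0.542632, 0.727966]],
    columns=["VertRelAngle", "HorzRelAngle", "RelHeight", "RelSide"],
)

weights = coefs.abs().mean(axis=0)   # average |coef| over the two plate targets
weights = weights / weights.sum()    # normalize so the four weights sum to 1
print(weights.round(6))
```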

## Build Pitch Command Spread (Raw SDs)

In [13]:
import numpy as np

# --- 1) Identify the correct release point columns in YOUR CSV ---
# Trackman commonly uses RelHeight (vertical release) and RelSide (horizontal release)
CAND_VREL = ["RelHeight", "ReleaseHeight", "release_pos_z", "RelZ", "vRel"]
CAND_HREL = ["RelSide", "ReleaseSide", "release_pos_x", "RelX", "hRel"]

vrel_col = next((c for c in CAND_VREL if c in df.columns), None)
hrel_col = next((c for c in CAND_HREL if c in df.columns), None)

print("Using vRel column:", vrel_col)
print("Using hRel column:", hrel_col)

if vrel_col is None or hrel_col is None:
    raise ValueError(
        f"Could not find release point columns. "
        f"Columns in df include: {list(df.columns)[:40]} ..."
    )

# --- 2) Compute SDs per (Pitcher, PitchType) ---
PITCHER_COL = "Pitcher"
PITCHTYPE_COL = "TaggedPitchType"

spread = (
    df.groupby([PITCHER_COL, PITCHTYPE_COL])
      .agg(
          sd_vra=("VertRelAngle", lambda s: s.std(ddof=1)),
          sd_hra=("HorzRelAngle", lambda s: s.std(ddof=1)),
          sd_vrel=(vrel_col, lambda s: s.std(ddof=1)),
          sd_hrel=(hrel_col, lambda s: s.std(ddof=1)),
          n=("VertRelAngle", "size")
      )
      .reset_index()
)

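# Minimum pitches per (pitcher, pitch type); 0 keeps every group.
# Raise it to drop tiny, noisy samples (single-pitch groups already yield NaN SDs).
MIN_PITCHES_PER_TYPE = 0
spread = spread[spread["n"] >= MIN_PITCHES_PER_TYPE].copy()

# --- 3) Z-score each SD metric within pitch type (among the pitchers in df; not used by TriKirby below) ---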
def z_by_pitchtype(series):
    mu = series.mean()
    sd = series.std(ddof=0)
    return (series - mu) / sd if sd != 0 else np.nan

spread["z_sd_vra"]  = spread.groupby(PITCHTYPE_COL)["sd_vra"].transform(z_by_pitchtype)
spread["z_sd_hra"]  = spread.groupby(PITCHTYPE_COL)["sd_hra"].transform(z_by_pitchtype)
spread["z_sd_vrel"] = spread.groupby(PITCHTYPE_COL)["sd_vrel"].transform(z_by_pitchtype)
spread["z_sd_hrel"] = spread.groupby(PITCHTYPE_COL)["sd_hrel"].transform(z_by_pitchtype)

spread.head()

Using vRel column: RelHeight
Using hRel column: RelSide


|   | Pitcher | TaggedPitchType | sd_vra | sd_hra | sd_vrel | sd_hrel | n | z_sd_vra | z_sd_hra | z_sd_vrel | z_sd_hrel |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cazares, Julian | ChangeUp | 0.262306 | 0.723429 | 0.146535 | 0.175095 | 3 | -2.078778 | -0.441253 | 1.374577 | -0.278438 |
| 1 | Cazares, Julian | Cutter |  |  |  |  | 1 |  |  |  |  |
| 2 | Cazares, Julian | Fastball | 0.87741 | 0.721784 | 0.153933 | 0.146281 | 38 | -1.117055 | -0.743044 | 0.675611 | -0.536474 |
| 3 | Cazares, Julian | FourSeamFastBall | 0.754681 | 2.076283 | 0.045497 | 0.086328 | 3 | -1.030274 | 2.022747 | -0.737004 | -0.608437 |
| 4 | Cazares, Julian | Slider | 1.129368 | 1.095653 | 0.118094 | 0.143436 | 29 | -0.401259 | -0.111984 | 0.227484 | -0.39148 |


## Release Consistency Percentiles (Within Pitch Type)

In [14]:
# ============================================================
# Convert release SDs to percentiles (lower SD = better)
# Ranked per pitch type among the pitchers present in spread
# Creates: sd_vra_pct, sd_hra_pct, sd_vrel_pct, sd_hrel_pct
# ============================================================

PCT_COLS = {
    "sd_vra":  "sd_vra_pct",
    "sd_hra":  "sd_hra_pct",
    "sd_vrel": "sd_vrel_pct",
    "sd_hrel": "sd_hrel_pct",
}

missing_raw = [c for c in PCT_COLS.keys() if c not in spread.columns]
if missing_raw:
    raise ValueError(f"Missing raw SD columns needed for percentiles: {missing_raw}")

for raw_col, pct_col in PCT_COLS.items():
    # rank within each pitch type; invert so lower SD = higher percentile (better)
    spread[pct_col] = 1 - spread.groupby(PITCHTYPE_COL)[raw_col].rank(pct=True)

# Round in place to hundredths; the TriKirby cell below uses these rounded values
for c in PCT_COLS.values():
    spread[c] = spread[c].round(2)

display(spread[list(PCT_COLS.values())].describe())

|   | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|
| count | 84.0 | 84.0 | 84.0 | 84.0 |
| mean | 0.434524 | 0.434524 | 0.434524 | 0.434524 |
| std | 0.296507 | 0.296507 | 0.296507 | 0.296507 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.1675 | 0.1675 | 0.1675 | 0.1675 |
| 50% | 0.44 | 0.44 | 0.44 | 0.44 |
| 75% | 0.6725 | 0.6725 | 0.6725 | 0.6725 |
| max | 0.95 | 0.95 | 0.95 | 0.95 |


## Random Forest -> Feature Importance Betas

In [15]:
# ================================
# Random Forest: Feature Importances (Beta Weights)
# (nonlinear; already normalized to sum to 1)
# ================================

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import pandas as pd

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

rf_regressor = RandomForestRegressor(
    n_estimators=300,
    random_state=42,
    n_jobs=-1,
    min_samples_leaf=5,
    max_features="sqrt"   # consider 2 of the 4 features per split; "auto" is no longer accepted in recent scikit-learn
)

rf_regressor.fit(X_train, y_train)

rf_importance = pd.Series(
    rf_regressor.feature_importances_,
    index=FEATURES
).sort_values(ascending=False)

# already sums to 1, but keep for safety
beta_weights = rf_importance / rf_importance.sum()

print("RF importances sum:", rf_importance.sum())
display(rf_importance)

print("Beta weights sum:", beta_weights.sum())
display(beta_weights)

RF importances sum: 1.0000000000000002


| Feature | Importance |
|---|---|
| VertRelAngle | 0.347710 |
| HorzRelAngle | 0.317594 |
| RelSide | 0.181851 |
| RelHeight | 0.152846 |

Beta weights sum: 1.0


| Feature | Beta weight |
|---|---|
| VertRelAngle | 0.347710 |
| HorzRelAngle | 0.317594 |
| RelSide | 0.181851 |
| RelHeight | 0.152846 |

## TriKirby Equation - Composite Pitch Command Metric

In [19]:
# TriKirby Equation (percentile-based, higher = better)

betas = {
    "sd_vra_pct": beta_weights["VertRelAngle"],
    "sd_hra_pct": beta_weights["HorzRelAngle"],
    "sd_vrel_pct": beta_weights["RelHeight"],
    "sd_hrel_pct": beta_weights["RelSide"],
}

spread["TriKirby"] = (
    betas["sd_vra_pct"] * spread["sd_vra_pct"] +
    betas["sd_hra_pct"] * spread["sd_hra_pct"] +
    betas["sd_vrel_pct"] * spread["sd_vrel_pct"] +
    betas["sd_hrel_pct"] * spread["sd_hrel_pct"]
).round(3)

display(
    spread[
        ["Pitcher", "TaggedPitchType", "TriKirby"]
    ].head()
)

|   | Pitcher | TaggedPitchType | TriKirby |
|---|---|---|---|
| 0 | Cazares, Julian | ChangeUp | 0.649 |
| 1 | Cazares, Julian | Cutter |  |
| 2 | Cazares, Julian | Fastball | 0.681 |
| 3 | Cazares, Julian | FourSeamFastBall | 0.482 |
| 4 | Cazares, Julian | Slider | 0.513 |


## UCSD TriKirby Index Score Across All Pitch Types (0.5 = Midpoint of the Percentile Scale)

In [20]:
# ================================
# UCSD TriKirby by Pitch Type
# Percentile-based (0–1), higher = better
# ================================

ucsd_spread = spread.copy()

pitch_types = sorted(ucsd_spread[PITCHTYPE_COL].dropna().unique())
print("UCSD pitch types found:", pitch_types)

for pt in pitch_types:
    df_pt = (
        ucsd_spread[ucsd_spread[PITCHTYPE_COL] == pt]
        .sort_values("TriKirby", ascending=False)
        .copy()
    )

    display_cols = [
        PITCHER_COL,
        PITCHTYPE_COL,
        "n",
        "TriKirby"
    ]
    display_cols = [c for c in display_cols if c in df_pt.columns]

    print(f"\n=== {pt} (UCSD only) — TriKirby Command Percentile ===")
    display(df_pt[display_cols].reset_index(drop=True))

UCSD pitch types found: ['ChangeUp', 'Curveball', 'Cutter', 'Fastball', 'FourSeamFastBall', 'Knuckleball', 'Sinker', 'Slider', 'Splitter', 'Sweeper', 'TwoSeamFastBall']

=== ChangeUp (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Marchetti, Landon | ChangeUp | 10 | 0.658 |
| 1 | Dalquist, Matthew | ChangeUp | 43 | 0.655 |
| 2 | Cazares, Julian | ChangeUp | 3 | 0.649 |
| 3 | King, Devon | ChangeUp | 4 | 0.639 |
| 4 | Gregson, Niccolas | ChangeUp | 6 | 0.557 |
| 5 | Villar, Jake | ChangeUp | 101 | 0.464 |
| 6 | Davidson, Garrett | ChangeUp | 126 | 0.441 |
| 7 | Remmers, Ethan | ChangeUp | 29 | 0.411 |
| 8 | Pelzman, Harry | ChangeUp | 3 | 0.405 |
| 9 | Nickerson, Trevor | ChangeUp | 7 | 0.403 |



=== Curveball (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | King, Devon | Curveball | 9 | 0.654 |
| 1 | Dalquist, Matthew | Curveball | 75 | 0.632 |
| 2 | Gregson, Niccolas | Curveball | 58 | 0.622 |
| 3 | Remmers, Ethan | Curveball | 3 | 0.592 |
| 4 | Marchetti, Landon | Curveball | 14 | 0.448 |
| 5 | Hasegawa, Sam | Curveball | 4 | 0.362 |
| 6 | Davidson, Garrett | Curveball | 13 | 0.324 |
| 7 | Ries, Nathan | Curveball | 55 | 0.222 |
| 8 | Villar, Jake | Curveball | 16 | 0.145 |
| 9 | Murdock, Steele | Curveball | 1 |  |



=== Cutter (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Nickerson, Trevor | Cutter | 2 | 0.855 |
| 1 | Dalquist, Matthew | Cutter | 9 | 0.693 |
| 2 | Murdock, Steele | Cutter | 7 | 0.489 |
| 3 | Hasegawa, Sam | Cutter | 44 | 0.378 |
| 4 | Gregson, Niccolas | Cutter | 6 | 0.376 |
| 5 | Davidson, Garrett | Cutter | 5 | 0.367 |
| 6 | Ernisse, Zach | Cutter | 3 | 0.32 |
| 7 | Seid, Spencer | Cutter | 34 | 0.278 |
| 8 | King, Devon | Cutter | 94 | 0.243 |
| 9 | Cazares, Julian | Cutter | 1 |  |



=== Fastball (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Patterson, Garrett | Fastball | 9 | 0.874 |
| 1 | Custer, Julian | Fastball | 5 | 0.82 |
| 2 | Cazares, Julian | Fastball | 38 | 0.681 |
| 3 | Dalquist, Matthew | Fastball | 298 | 0.678 |
| 4 | Pelzman, Harry | Fastball | 82 | 0.528 |
| 5 | Seid, Spencer | Fastball | 342 | 0.527 |
| 6 | Gregson, Niccolas | Fastball | 251 | 0.518 |
| 7 | Ries, Nathan | Fastball | 309 | 0.5 |
| 8 | Weber, Chapman | Fastball | 199 | 0.477 |
| 9 | Villar, Jake | Fastball | 159 | 0.45 |



=== FourSeamFastBall (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Hasegawa, Sam | FourSeamFastBall | 11 | 0.536 |
| 1 | Cazares, Julian | FourSeamFastBall | 3 | 0.482 |
| 2 | Murdock, Steele | FourSeamFastBall | 31 | 0.473 |
| 3 | Villar, Jake | FourSeamFastBall | 6 | 0.46 |
| 4 | King, Devon | FourSeamFastBall | 5 | 0.386 |
| 5 | Davidson, Garrett | FourSeamFastBall | 4 | 0.164 |



=== Knuckleball (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Dalquist, Matthew | Knuckleball | 7 | 0.0 |



=== Sinker (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Villar, Jake | Sinker | 32 | 0.388 |
| 1 | Murdock, Steele | Sinker | 7 | 0.329 |
| 2 | Remmers, Ethan | Sinker | 30 | 0.283 |
| 3 | Custer, Julian | Sinker | 1 |  |
| 4 | Hasegawa, Sam | Sinker | 1 |  |
| 5 | Ries, Nathan | Sinker | 1 |  |



=== Slider (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Dalquist, Matthew | Slider | 128 | 0.731 |
| 1 | Gregson, Niccolas | Slider | 31 | 0.633 |
| 2 | Huy, Nathan | Slider | 16 | 0.627 |
| 3 | Weber, Chapman | Slider | 25 | 0.622 |
| 4 | Ries, Nathan | Slider | 65 | 0.62 |
| 5 | Pelzman, Harry | Slider | 13 | 0.607 |
| 6 | Patterson, Garrett | Slider | 10 | 0.534 |
| 7 | Murdock, Steele | Slider | 117 | 0.527 |
| 8 | Cazares, Julian | Slider | 29 | 0.513 |
| 9 | Ernisse, Zach | Slider | 17 | 0.487 |



=== Splitter (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Seid, Spencer | Splitter | 18 | 0.0 |



=== Sweeper (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Villar, Jake | Sweeper | 24 | 0.333 |
| 1 | Seid, Spencer | Sweeper | 5 | 0.167 |



=== TwoSeamFastBall (UCSD only) — TriKirby Command Percentile ===


|   | Pitcher | TaggedPitchType | n | TriKirby |
|---|---|---|---|---|
| 0 | Cazares, Julian | TwoSeamFastBall | 2 | 0.424 |
| 1 | King, Devon | TwoSeamFastBall | 2 | 0.076 |


## TriKirby Index Score for Each Pitch Per UCSD Pitcher 

In [18]:
# ============================
# UCSD Pitcher Arsenal Tables
# (one DataFrame per pitcher)
# ============================

import pandas as pd

# --- 1) Identify the pitcher + pitchtype columns robustly ---
def pick_col(df, candidates):
    for c in candidates:
        if c in df.columns:
            return c
    return None

PITCHER_COL = pick_col(spread, ["Pitcher", "PitcherName", "PitcherNameFull"])
PITCHTYPE_COL = pick_col(spread, ["TaggedPitchType", "PitchType", "AutoPitchType"])
TEAM_COL = pick_col(spread, ["PitcherTeam", "Team", "Pitcher Team", "PitcherTeamAbbrev"])

# --- 2) UCSD filter (use TEAM_COL if available; otherwise fall back to a pitcher-name list) ---
if TEAM_COL is not None:
    UCSD_TEAM_NAMES = {"CSD_TRI", "UCSD", "SAN_DIEGO", "UC San Diego", "UCSD_BASEBALL", "UCSD Baseball"}
    ucsd_spread = spread[spread[TEAM_COL].astype(str).str.strip().isin(UCSD_TEAM_NAMES)].copy()
else:
    # As built above, spread is grouped by pitcher and pitch type only, so it has no team column
    # and this branch runs. It is still UCSD-only because df was filtered to UCSD earlier.
    # With an NCAA-wide spread, filter by name instead:
    # ucsd_spread = spread[spread[PITCHER_COL].isin(ucsd_pitchers)].copy()
    ucsd_spread = spread.copy()
    print("Warning: TEAM_COL not found. Using entire spread. If you have ucsd_pitchers list, filter here.")

# --- 3) Columns to show in each pitcher table ---
required = [PITCHER_COL, PITCHTYPE_COL, "n", "TriKirby"]
missing_req = [c for c in required if c not in ucsd_spread.columns or c is None]
if missing_req:
    raise ValueError(f"Missing required columns for pitcher tables: {missing_req}")

component_cols = [c for c in ["sd_vra_pct", "sd_hra_pct", "sd_vrel_pct", "sd_hrel_pct"] if c in ucsd_spread.columns]
show_cols = [PITCHTYPE_COL, "n", "TriKirby"] + component_cols

# --- 4) Build one DataFrame per pitcher (sorted by best TriKirby) ---
pitcher_tables = {}

for pitcher, df_p in ucsd_spread.groupby(PITCHER_COL):
    df_p = df_p.copy()

    # Sort: best TriKirby first, then more pitches
    df_p = df_p.sort_values(["TriKirby", "n"], ascending=[False, False])

    # Keep only columns we want (and make clean index)
    table = df_p[show_cols].reset_index(drop=True)

    # Optional: round for display
    if "TriKirby" in table.columns:
        table["TriKirby"] = table["TriKirby"].round(3)
    for c in component_cols:
        table[c] = table[c].round(2)

    pitcher_tables[pitcher] = table

# --- 5) Display each pitcher arsenal table ---
print(f"Built {len(pitcher_tables)} pitcher tables.")
for pitcher in sorted(pitcher_tables.keys()):
    print(f"\n=== {pitcher} — Arsenal (UCSD) ===")
    display(pitcher_tables[pitcher])

# --- 6) Optional: one combined long-form table (easy to export)
ucsd_by_pitcher_long = (
    ucsd_spread[[PITCHER_COL] + show_cols]
    .sort_values([PITCHER_COL, "TriKirby", "n"], ascending=[True, False, False])
    .reset_index(drop=True)
)

# Uncomment if you want to see it:
# display(ucsd_by_pitcher_long.head(50))

Built 19 pitcher tables.

=== Cazares, Julian — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Fastball | 38 | 0.681 | 0.84 | 0.79 | 0.21 | 0.58 |
| 1 | ChangeUp | 3 | 0.649 | 0.92 | 0.69 | 0.08 | 0.54 |
| 2 | Slider | 29 | 0.513 | 0.58 | 0.47 | 0.37 | 0.58 |
| 3 | FourSeamFastBall | 3 | 0.482 | 0.83 | 0.0 | 0.67 | 0.5 |
| 4 | TwoSeamFastBall | 2 | 0.424 | 0.5 | 0.5 | 0.0 | 0.5 |
| 5 | Cutter | 1 |  |  |  |  |  |



=== Custer, Julian — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Fastball | 5 | 0.82 | 0.63 | 0.89 | 0.95 | 0.95 |
| 1 | Slider | 3 | 0.281 | 0.16 | 0.16 | 0.95 | 0.16 |
| 2 | Sinker | 1 |  |  |  |  |  |



=== Dalquist, Matthew — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 128 | 0.731 | 0.79 | 0.84 | 0.68 | 0.47 |
| 1 | Cutter | 9 | 0.693 | 0.78 | 0.67 | 0.44 | 0.78 |
| 2 | Fastball | 298 | 0.678 | 0.79 | 0.63 | 0.58 | 0.63 |
| 3 | ChangeUp | 43 | 0.655 | 0.69 | 0.62 | 0.69 | 0.62 |
| 4 | Curveball | 75 | 0.632 | 0.56 | 0.67 | 0.67 | 0.67 |
| 5 | Knuckleball | 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |



=== Davidson, Garrett — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | ChangeUp | 126 | 0.441 | 0.62 | 0.31 | 0.38 | 0.38 |
| 1 | Cutter | 5 | 0.367 | 0.56 | 0.0 | 0.33 | 0.67 |
| 2 | Slider | 29 | 0.356 | 0.74 | 0.11 | 0.11 | 0.26 |
| 3 | Curveball | 13 | 0.324 | 0.22 | 0.0 | 0.56 | 0.89 |
| 4 | Fastball | 77 | 0.323 | 0.53 | 0.21 | 0.16 | 0.26 |
| 5 | FourSeamFastBall | 4 | 0.164 | 0.17 | 0.33 | 0.0 | 0.0 |



=== Ernisse, Zach — Arsenal (UCSD) ===


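|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 17 | 0.487 | 0.0 | 0.95 | 0.21 | 0.84 |
| 1 | Cutter | 3 | 0.32 | 0.11 | 0.44 | 0.67 | 0.22 |
| 2 | Fastball | 60 | 0.319 | 0.11 | 0.32 | 0.11 | 0.89 |
| 3 | ChangeUp | 1 |  |  |  |  |  |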



=== Gregson, Niccolas — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 31 | 0.633 | 0.42 | 0.74 | 0.84 | 0.68 |
| 1 | Curveball | 58 | 0.622 | 0.44 | 0.78 | 0.78 | 0.56 |
| 2 | ChangeUp | 6 | 0.557 | 0.08 | 0.77 | 0.85 | 0.85 |
| 3 | Fastball | 251 | 0.518 | 0.21 | 0.74 | 0.74 | 0.53 |
| 4 | Cutter | 6 | 0.376 | 0.0 | 0.89 | 0.22 | 0.33 |



=== Hasegawa, Sam — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | FourSeamFastBall | 11 | 0.536 | 0.67 | 0.17 | 0.83 | 0.67 |
| 1 | Fastball | 271 | 0.414 | 0.16 | 0.68 | 0.68 | 0.21 |
| 2 | Slider | 45 | 0.393 | 0.26 | 0.53 | 0.63 | 0.21 |
| 3 | Cutter | 44 | 0.378 | 0.33 | 0.56 | 0.56 | 0.0 |
| 4 | Curveball | 4 | 0.362 | 0.33 | 0.44 | 0.44 | 0.22 |
| 5 | Sinker | 1 |  |  |  |  |  |



=== Huy, Nathan — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 16 | 0.627 | 0.95 | 0.37 | 0.79 | 0.32 |
| 1 | Fastball | 22 | 0.009 | 0.0 | 0.0 | 0.0 | 0.05 |



=== King, Devon — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Curveball | 9 | 0.654 | 0.78 | 0.33 | 0.89 | 0.78 |
| 1 | ChangeUp | 4 | 0.639 | 0.54 | 0.92 | 0.77 | 0.23 |
| 2 | Slider | 62 | 0.415 | 0.47 | 0.58 | 0.0 | 0.37 |
| 3 | FourSeamFastBall | 5 | 0.386 | 0.0 | 0.5 | 0.5 | 0.83 |
| 4 | Fastball | 69 | 0.318 | 0.05 | 0.58 | 0.32 | 0.37 |
| 5 | Cutter | 94 | 0.243 | 0.44 | 0.22 | 0.0 | 0.11 |
| 6 | TwoSeamFastBall | 2 | 0.076 | 0.0 | 0.0 | 0.5 | 0.0 |



=== Marchetti, Landon — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | ChangeUp | 10 | 0.658 | 0.23 | 0.85 | 0.92 | 0.92 |
| 1 | Curveball | 14 | 0.448 | 0.67 | 0.56 | 0.11 | 0.11 |
| 2 | Fastball | 150 | 0.428 | 0.32 | 0.26 | 0.53 | 0.84 |
| 3 | Slider | 22 | 0.323 | 0.11 | 0.32 | 0.26 | 0.79 |



=== Murdock, Steele — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 117 | 0.527 | 0.63 | 0.68 | 0.47 | 0.11 |
| 1 | Cutter | 7 | 0.489 | 0.67 | 0.11 | 0.78 | 0.56 |
| 2 | FourSeamFastBall | 31 | 0.473 | 0.5 | 0.67 | 0.17 | 0.33 |
| 3 | Fastball | 137 | 0.433 | 0.47 | 0.53 | 0.47 | 0.16 |
| 4 | Sinker | 7 | 0.329 | 0.0 | 0.33 | 0.67 | 0.67 |
| 5 | ChangeUp | 51 | 0.305 | 0.46 | 0.15 | 0.46 | 0.15 |
| 6 | Curveball | 1 |  |  |  |  |  |



=== Nickerson, Trevor — Arsenal (UCSD) ===


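|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Cutter | 2 | 0.855 | 0.89 | 0.78 | 0.89 | 0.89 |
| 1 | Slider | 4 | 0.428 | 0.84 | 0.0 | 0.89 | 0.0 |
| 2 | ChangeUp | 7 | 0.403 | 0.31 | 0.23 | 0.54 | 0.77 |
| 3 | Fastball | 30 | 0.281 | 0.37 | 0.05 | 0.89 | 0.0 |
| 4 | Curveball | 1 |  |  |  |  |  |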



=== Patterson, Garrett — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Fastball | 9 | 0.874 | 0.89 | 0.95 | 0.84 | 0.74 |
| 1 | Slider | 10 | 0.534 | 0.89 | 0.26 | 0.05 | 0.74 |



=== Pelzman, Harry — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 13 | 0.607 | 0.68 | 0.42 | 0.42 | 0.95 |
| 1 | Fastball | 82 | 0.528 | 0.74 | 0.16 | 0.63 | 0.68 |
| 2 | ChangeUp | 3 | 0.405 | 0.85 | 0.0 | 0.62 | 0.08 |
| 3 | Curveball | 1 |  |  |  |  |  |



=== Remmers, Ethan — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Curveball | 3 | 0.592 | 0.89 | 0.89 | 0.0 | 0.0 |
| 1 | Fastball | 67 | 0.425 | 0.95 | 0.11 | 0.26 | 0.11 |
| 2 | ChangeUp | 29 | 0.411 | 0.77 | 0.38 | 0.15 | 0.0 |
| 3 | Sinker | 30 | 0.283 | 0.67 | 0.0 | 0.33 | 0.0 |
| 4 | Slider | 20 | 0.067 | 0.05 | 0.05 | 0.16 | 0.05 |



=== Ries, Nathan — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 65 | 0.62 | 0.53 | 0.79 | 0.58 | 0.53 |
| 1 | Fastball | 309 | 0.5 | 0.68 | 0.47 | 0.37 | 0.32 |
| 2 | Curveball | 55 | 0.222 | 0.11 | 0.22 | 0.22 | 0.44 |
| 3 | ChangeUp | 35 | 0.129 | 0.0 | 0.08 | 0.31 | 0.31 |
| 4 | Sinker | 1 |  |  |  |  |  |



=== Seid, Spencer — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Fastball | 342 | 0.527 | 0.58 | 0.37 | 0.42 | 0.79 |
| 1 | Slider | 25 | 0.406 | 0.37 | 0.21 | 0.32 | 0.89 |
| 2 | ChangeUp | 35 | 0.282 | 0.15 | 0.46 | 0.0 | 0.46 |
| 3 | Cutter | 34 | 0.278 | 0.22 | 0.33 | 0.11 | 0.44 |
| 4 | Sweeper | 5 | 0.167 | 0.0 | 0.0 | 0.5 | 0.5 |
| 5 | Splitter | 18 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |



=== Villar, Jake — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | ChangeUp | 101 | 0.464 | 0.38 | 0.54 | 0.23 | 0.69 |
| 1 | FourSeamFastBall | 6 | 0.46 | 0.33 | 0.83 | 0.33 | 0.17 |
| 2 | Fastball | 159 | 0.45 | 0.26 | 0.84 | 0.05 | 0.47 |
| 3 | Slider | 123 | 0.43 | 0.21 | 0.63 | 0.53 | 0.42 |
| 4 | Sinker | 32 | 0.388 | 0.33 | 0.67 | 0.0 | 0.33 |
| 5 | Sweeper | 24 | 0.333 | 0.5 | 0.5 | 0.0 | 0.0 |
| 6 | Curveball | 16 | 0.145 | 0.0 | 0.11 | 0.33 | 0.33 |



=== Weber, Chapman — Arsenal (UCSD) ===


|   | TaggedPitchType | n | TriKirby | sd_vra_pct | sd_hra_pct | sd_vrel_pct | sd_hrel_pct |
|---|---|---|---|---|---|---|---|
| 0 | Slider | 25 | 0.622 | 0.32 | 0.89 | 0.74 | 0.63 |
| 1 | Fastball | 199 | 0.477 | 0.42 | 0.42 | 0.79 | 0.42 |
| 2 | ChangeUp | 1 |  |  |  |  |  |
