# TriKirby Index

### The TriKirby Index is a methodology inspired by the Kirby Index that quantifies the command and effectiveness of each UCSD Pitcher's pitches ahead of the 2026 season.

## TriKirby Index: Components and Methodology

The **TriKirby Index** is a pitch-level statistic designed to quantify **pitch command** for **UC San Diego (UCSD) pitchers**, evaluated **relative to all NCAA Division I pitching**.

In this framework, **command** is defined as a pitcher’s ability to **consistently reproduce both release direction and release location** across pitches. Lower variability indicates tighter mechanical repeatability and stronger command.

---

## 1. Raw Variability Components (Pitch-Level)

For each pitcher $p$ and pitch type $t$, we compute the **standard deviation** of four release-related variables.

### Release Angles
- **Horizontal Release Angle (HRA)**  
$$
\sigma_{\text{HRA},p,t}
$$

- **Vertical Release Angle (VRA)**  
$$
\sigma_{\text{VRA},p,t}
$$

### Release Points
- **Horizontal Release Location (RelSide)**  
$$
\sigma_{\text{hRel},p,t}
$$

- **Vertical Release Location (RelHeight)**  
$$
\sigma_{\text{vRel},p,t}
$$

Lower standard deviation values indicate **tighter clustering**, greater mechanical repeatability, and stronger pitch command.
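
As a minimal sketch of this step, the spread for one release variable can be computed per (pitcher, pitch type) group with the standard library alone. The pitcher names and release heights below are hypothetical and chosen only to show a tight versus a loose release.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical (pitcher, pitch type, release height in feet) observations.
pitches = [
    ("Pitcher A", "Fastball", 5.90), ("Pitcher A", "Fastball", 6.02),
    ("Pitcher A", "Fastball", 5.95), ("Pitcher A", "Fastball", 6.10),
    ("Pitcher A", "Fastball", 5.98),
    ("Pitcher B", "Fastball", 5.70), ("Pitcher B", "Fastball", 6.20),
    ("Pitcher B", "Fastball", 5.85), ("Pitcher B", "Fastball", 6.30),
    ("Pitcher B", "Fastball", 5.95),
]

# Collect the values belonging to each (pitcher, pitch type) pair.
groups = defaultdict(list)
for pitcher, pitch_type, rel_height in pitches:
    groups[(pitcher, pitch_type)].append(rel_height)

# Sample standard deviation (n - 1 denominator) per group.
sd_vrel = {key: stdev(values) for key, values in groups.items()}

for key, sd in sd_vrel.items():
    print(key, round(sd, 4))
```

Pitcher A's smaller value reflects the tighter cluster of release heights; the same calculation applies to the other three release variables.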

---

## 2. NCAA-Wide Normalization (Z-Scores)

To make variability comparable across pitchers, each component is normalized against the **NCAA Division I population**, separately for each pitch type $t$.

For each component  
$k \in \{\text{HRA}, \text{VRA}, \text{hRel}, \text{vRel}\}$:

$$
z_{k,p,t}
=
\frac{
\sigma_{k,p,t} - \mu_{k,t}^{\text{NCAA}}
}{
\sigma_{k,t}^{\text{NCAA}}
}
$$

Where:
- $\mu_{k,t}^{\text{NCAA}}$ is the mean of $\sigma_{k,p,t}$ across all NCAA Division I pitchers throwing pitch type $t$  
- $\sigma_{k,t}^{\text{NCAA}}$ is the standard deviation of $\sigma_{k,p,t}$ across the same pitchers  

**Negative z-scores indicate better-than-average command** relative to NCAA Division I pitchers.

These normalized values appear in the final table as:
- `z_sd_hra`
- `z_sd_vra`
- `z_sd_hrel`
- `z_sd_vrel`
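
The normalization can be illustrated with a small pure-Python sketch. The three per-pitcher SD values are hypothetical and stand in for all pitchers throwing one pitch type; using the population SD for the denominator is an assumption of this sketch.

```python
from statistics import mean, pstdev

# Hypothetical per-pitcher SDs of vertical release angle for one pitch type.
sd_vra = {"Pitcher A": 0.66, "Pitcher B": 0.92, "Pitcher C": 1.18}

def z_scores(values_by_pitcher):
    """Z-score each pitcher's variability against the group in the dict."""
    values = list(values_by_pitcher.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption of this sketch
    if sigma == 0:
        # No spread in the group, so no z-score is defined.
        return {name: float("nan") for name in values_by_pitcher}
    return {name: (v - mu) / sigma for name, v in values_by_pitcher.items()}

z = z_scores(sd_vra)
for name, value in z.items():
    print(name, round(value, 3))
```

Pitcher A, with the smallest spread, receives a negative z-score, which is the better-than-average direction in this framework.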

---

## 3. Empirically Derived Linear Weights ($\beta$ Coefficients)

Following the original **Kirby Index methodology**, each normalized component is assigned a **linear weight** representing its relative importance.

The weights  
$$
\beta_1, \beta_2, \beta_3, \beta_4
$$
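are learned from **models trained on NCAA-wide pitch data**, in which the four release variables are used to predict **plate location** (`PlateLocHeight` and `PlateLocSide`). The notebook fits a linear regression and a random forest; the score cells take their weights from the random forest's feature importances.

Each weight captures how strongly a specific release component influences pitch location, and therefore pitch command.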

---

## 4. Pitch-Level TriKirby Index (Final Score)

The **TriKirby Index** for pitcher $p$ and pitch type $t$ is computed as a weighted linear combination of NCAA-normalized release variability metrics:

$$
\text{TriKirby}_{p,t}
=
\beta_1 \cdot z_{\mathrm{VRA},p,t}
+
\beta_2 \cdot z_{\mathrm{HRA},p,t}
+
\beta_3 \cdot z_{\mathrm{vRel},p,t}
+
\beta_4 \cdot z_{\mathrm{hRel},p,t}
$$

Where:

- $z_{\mathrm{VRA}}$: Z-score of **vertical release angle** variability  
- $z_{\mathrm{HRA}}$: Z-score of **horizontal release angle** variability  
- $z_{\mathrm{vRel}}$: Z-score of **vertical release location** variability  
- $z_{\mathrm{hRel}}$: Z-score of **horizontal release location** variability  
- $\beta_1, \beta_2, \beta_3, \beta_4$: empirically derived linear weights  
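
As a worked check of the weighted sum, the sketch below plugs in the random forest importances and the z-scores of the top row of the sorted NCAA table, both as printed in this notebook's outputs. The function name `trikirby` is illustrative and not part of the notebook.

```python
# Weights as printed by the notebook's random forest importance output.
betas = {
    "z_sd_vra": 0.2797536,
    "z_sd_hra": 0.2966334,
    "z_sd_vrel": 0.1992793,
    "z_sd_hrel": 0.2243336,
}

# z-scores for one pitcher/pitch-type row from the notebook's sorted output.
row = {
    "z_sd_vra": -1.996537,
    "z_sd_hra": -2.217397,
    "z_sd_vrel": -0.338917,
    "z_sd_hrel": -0.630128,
}

def trikirby(z_row, weights):
    """Weighted linear combination of the four normalized components."""
    return sum(weights[name] * z_row[name] for name in weights)

score = trikirby(row, betas)   # lower = better command
score_pos = -score             # sign-flipped, higher = better command
print(round(score, 4), round(score_pos, 4))
```

The result agrees with the `TriKirby_pitch` value reported for that row (about -1.4252) up to rounding of the printed inputs.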

---

### Interpretation

- **Lower $\text{TriKirby}$ values = better command**
- Reflects tighter release direction and release point consistency
- Comparable across all NCAA Division I pitchers throwing the same pitch type, because each component is z-scored within pitch type

This value appears in the table as:

- **`TriKirby_pitch`**

---

## 5. Presentation-Friendly Version (Sign-Flipped)

For visualization and coach-facing interpretation, a sign-flipped version of the score is included:

$$
\text{TriKirby\_pos}_{p,t} = -\,\text{TriKirby}_{p,t}
$$

So that:

- **Higher values = better command**
- Contains identical information with inverted sign for clarity

This value appears in the table as:

- **`TriKirby_pitch_pos`**

---

## 6. 0–1 Normalized TriKirby Score (Player-Facing)

To match the clean, intuitive presentation of the original Kirby Index, the score is normalized to a **0–1 scale**, computed **separately for each pitch type**.

---

#### Step 1 — Define pitch-type extrema

For each pitch type $t$, using all NCAA Division I pitchers:

$$
m_t = \min_{p} \left( \text{TriKirby\_pos}_{p,t} \right),
\qquad
M_t = \max_{p} \left( \text{TriKirby\_pos}_{p,t} \right)
$$

---

#### Step 2 — Min–max normalization

$$
\text{TriKirby}_{0\text{–}1,p,t}
=
\frac{\text{TriKirby\_pos}_{p,t} - m_t}{M_t - m_t}
$$

This produces a bounded, interpretable score:

- $1.00$ → best command in the NCAA sample for that pitch type  
- $0.00$ → worst command in the NCAA sample for that pitch type  

Normalizing **by pitch type** ensures fair comparison across pitches with different inherent variability (e.g., fastballs vs. breaking balls).

This value appears in the table as:

- **`TriKirby_0_1_byPitch`**
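
The by-pitch-type scaling in Step 1 and Step 2 can be sketched in plain Python as follows. The scores are hypothetical, and returning `None` for a pitch type with no spread is a choice made for this sketch.

```python
from collections import defaultdict

# Hypothetical sign-flipped scores (higher = better) for a tiny sample.
scores = [
    ("Pitcher A", "Fastball", 1.2),
    ("Pitcher B", "Fastball", -0.4),
    ("Pitcher C", "Fastball", 0.4),
    ("Pitcher A", "Slider", 0.5),
    ("Pitcher B", "Slider", -1.5),
]

def min_max_by_pitch_type(rows):
    """Scale each score to 0-1 using the min and max of its own pitch type."""
    by_type = defaultdict(list)
    for _, pitch_type, value in rows:
        by_type[pitch_type].append(value)

    scaled = {}
    for pitcher, pitch_type, value in rows:
        lo, hi = min(by_type[pitch_type]), max(by_type[pitch_type])
        # With hi == lo there is no range to scale against.
        scaled[(pitcher, pitch_type)] = (value - lo) / (hi - lo) if hi > lo else None
    return scaled

scaled = min_max_by_pitch_type(scores)
for key, value in scaled.items():
    print(key, value)
```

Each pitch type gets its own 0 and 1 anchors, so the best slider and the best fastball both score 1 even though their raw scores differ.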

---

## 7. NCAA Division I Pitch-Type Average (Benchmark)

For context, the NCAA Division I average command score for each pitch type is computed on the same 0–1 scale:

$$
\overline{\text{TriKirby}}_{0\text{–}1,t}
=
\frac{1}{N_t}
\sum_{p=1}^{N_t}
\text{TriKirby}_{0\text{–}1,p,t}
$$

This allows UCSD pitchers to directly compare their command to the NCAA Division I baseline for each pitch type.

This value appears in the table as:

- **`NCAA_D1_Avg_0_1_byPitch`**
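
Because min-max scaling is anchored by the most extreme scores, the benchmark need not sit near 0.5. The sketch below uses hypothetical scores with one very poor outlier to show how a single extreme value can push the average well above the midpoint, which may help when reading benchmark values such as 0.84 in the tables later in this notebook.

```python
from statistics import mean

# Hypothetical sign-flipped scores for one pitch type: most pitchers cluster
# near zero and one has a very poor (very negative) score.
raw_scores = [0.6, 0.3, 0.1, 0.0, -0.2, -0.4, -5.0]

lo, hi = min(raw_scores), max(raw_scores)
scaled_scores = [(s - lo) / (hi - lo) for s in raw_scores]

# Benchmark: the plain average of the 0-1 scores for this pitch type.
benchmark = mean(scaled_scores)
print(round(benchmark, 3))
```

Here the outlier defines the 0 anchor, so every other pitcher lands above 0.8 and the average follows.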

---

## 8. Final Table Columns

| Column | Description |
|------|-------------|
| `Pitcher` | Pitcher name |
| `TaggedPitchType` | Pitch type |
| `n` | Number of pitches thrown |
| `sd_hra` | SD of horizontal release angle |
| `sd_vra` | SD of vertical release angle |
| `sd_hrel` | SD of horizontal release location |
| `sd_vrel` | SD of vertical release location |
| `z_sd_*` | NCAA-normalized variability metrics |
| `TriKirby_pitch` | Raw TriKirby score (lower = better) |
| `TriKirby_pitch_pos` | Sign-flipped score (higher = better) |
| `TriKirby_0_1_byPitch` | 0–1 normalized command score, scaled within pitch type |
| `NCAA_D1_Avg_0_1_byPitch` | NCAA Division I average for that pitch type |

The TriKirby Index scores for UC San Diego Tritons pitchers were then exported to a master CSV file, which lists every pitcher's score for each pitch type alongside the NCAA Division I average for that pitch type.
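
The export cell itself is not shown in this notebook. A minimal sketch of that step, assuming the column names from the table above and using two rows copied from the UCSD ChangeUp output, could look like the following. It writes to an in-memory buffer, so no file name is assumed.

```python
import csv
import io

# Two rows copied from the UCSD ChangeUp table in this notebook's output.
rows = [
    {"Pitcher": "Dalquist, Matthew", "TaggedPitchType": "ChangeUp",
     "TriKirby_0_1_byPitch": 0.89, "NCAA_D1_Avg_0_1_byPitch": 0.84},
    {"Pitcher": "Davidson, Garrett", "TaggedPitchType": "ChangeUp",
     "TriKirby_0_1_byPitch": 0.86, "NCAA_D1_Avg_0_1_byPitch": 0.84},
]

def to_csv_text(records):
    """Serialize a list of dicts to CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

csv_text = to_csv_text(rows)
print(csv_text)
```

Names containing a comma are quoted automatically by the `csv` module, which matters for the "Last, First" format used in the Trackman data.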

## Summary

The **TriKirby Index** is a statistically grounded metric that quantifies **pitch command for all UCSD pitchers**, benchmarked against **all NCAA Division I pitching**. It extends the original Kirby Index by incorporating both **release direction** and **release location**, combined using empirically derived linear weights.

# Import Libraries + Dataset Upload

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor

DATA_DIR = Path("Trackman CSVs")   

In [2]:
csv_paths = sorted(DATA_DIR.glob("*.csv"))
print("CSV files found:", len(csv_paths))

dfs = []
for p in csv_paths:
    try:
        d = pd.read_csv(p)
        d["source_file"] = p.name
        dfs.append(d)
    except Exception as e:
        print(f"Skipped {p.name}: {e}")

all_df = pd.concat(dfs, ignore_index=True)
print("Merged rows:", len(all_df))
print("Merged cols:", len(all_df.columns))
all_df.head()

CSV files found: 974
Merged rows: 297652
Merged cols: 171


Unnamed: 0,PitchNo,Date,Time,PAofInning,PitchofPA,Pitcher,PitcherId,PitcherThrows,PitcherTeam,Batter,...,PitchMovementConfidence,HitLaunchConfidence,HitLandingConfidence,CatcherThrowCatchConfidence,CatcherThrowReleaseConfidence,CatcherThrowLocationConfidence,source_file,Runner1st,Runner2nd,Runner3rd
0,1,2025-02-14,18:04:47.71,1,1,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,,,,20250214-AlumniField-1.csv,,,
1,2,2025-02-14,18:05:00.16,1,2,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,,,,20250214-AlumniField-1.csv,,,
2,3,2025-02-14,18:05:14.48,1,3,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,Medium,Medium,Low,20250214-AlumniField-1.csv,,,
3,4,2025-02-14,18:05:39.46,2,1,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Edwards, Kameron",...,High,,,,,,20250214-AlumniField-1.csv,,,
4,5,2025-02-14,18:05:53.55,2,2,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Edwards, Kameron",...,High,,,,,,20250214-AlumniField-1.csv,,,


# Extract The Columns Needed to Build the TriKirby Index

In [3]:
df = all_df.copy()

def pick_col(candidates):
    for c in candidates:
        if c in df.columns:
            return c
    return None

PITCHER_COL = pick_col(["Pitcher", "PitcherName", "PitcherNameFull"])
TEAM_COL    = pick_col(["PitcherTeam", "Team", "Pitcher Team", "PitcherTeamAbbrev"])
PITCHTYPE_COL = pick_col(["TaggedPitchType", "PitchType", "AutoPitchType"])

HRA_COL = pick_col(["HorzRelAngle"])
VRA_COL = pick_col(["VertRelAngle"])

# Horizontal + vertical release POINTS (usually X and Z)
RELX_COL = pick_col(["RelSide"])
RELZ_COL = pick_col(["RelHeight"])

print("Pitcher:", PITCHER_COL)
print("Team:", TEAM_COL)
print("PitchType:", PITCHTYPE_COL)
print("HRA:", HRA_COL)
print("VRA:", VRA_COL)
print("RelX:", RELX_COL)
print("RelZ:", RELZ_COL)

req = [PITCHER_COL, TEAM_COL, PITCHTYPE_COL, HRA_COL, VRA_COL, RELX_COL, RELZ_COL]
missing = [r for r in req if r is None]
if missing:
    raise ValueError("Missing required columns. Fix candidates list above. Missing: " + str(missing))

# Keep only what we need + clean
df = df.dropna(subset=[PITCHER_COL, TEAM_COL, PITCHTYPE_COL, HRA_COL, VRA_COL, RELX_COL, RELZ_COL]).copy()
df[PITCHTYPE_COL] = df[PITCHTYPE_COL].astype(str).str.strip()
df[PITCHER_COL]   = df[PITCHER_COL].astype(str).str.strip()
df[TEAM_COL]      = df[TEAM_COL].astype(str).str.strip()

df.head()

Pitcher: Pitcher
Team: PitcherTeam
PitchType: TaggedPitchType
HRA: HorzRelAngle
VRA: VertRelAngle
RelX: RelSide
RelZ: RelHeight


Unnamed: 0,PitchNo,Date,Time,PAofInning,PitchofPA,Pitcher,PitcherId,PitcherThrows,PitcherTeam,Batter,...,PitchMovementConfidence,HitLaunchConfidence,HitLandingConfidence,CatcherThrowCatchConfidence,CatcherThrowReleaseConfidence,CatcherThrowLocationConfidence,source_file,Runner1st,Runner2nd,Runner3rd
0,1,2025-02-14,18:04:47.71,1,1,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,,,,20250214-AlumniField-1.csv,,,
1,2,2025-02-14,18:05:00.16,1,2,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,,,,20250214-AlumniField-1.csv,,,
2,3,2025-02-14,18:05:14.48,1,3,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Isom-McCall, Charlie",...,High,,,Medium,Medium,Low,20250214-AlumniField-1.csv,,,
3,4,2025-02-14,18:05:39.46,2,1,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Edwards, Kameron",...,High,,,,,,20250214-AlumniField-1.csv,,,
4,5,2025-02-14,18:05:53.55,2,2,"Stuprich, Brennan",1000099000.0,Right,SOU_LIO,"Edwards, Kameron",...,High,,,,,,20250214-AlumniField-1.csv,,,


# UC San Diego Pitching

In [4]:
# See the most common team codes (so you can confirm UCSD’s exact label)
df[TEAM_COL].value_counts().head(30)

PitcherTeam
CSD_TRI    5871
CAL_FUL    5647
SAN_GAU    5472
CAL_MUS    5457
CSU_BAK    2358
CAL_MAT    2287
LON_DIR    2184
LOY_LIO    2002
ARI_SUN    1976
CAM_CAM    1947
SAN_TOR    1937
CAL_ANT    1883
STM_GAE    1824
SAC_HOR    1761
FLO_PAN    1736
STA_CAR    1727
CAL_BEA    1715
SAN_BRO    1710
WCC        1677
VIR_TEC    1625
TEX_RAI    1623
SOU_GAM    1618
ELO_PHO    1617
CIN_BEA    1610
NOR_TAR    1568
LIB_FLA    1567
NEV_WOL    1546
JAC_GAM    1542
PEP_WAV    1509
ORE_BEA    1507
Name: count, dtype: int64

In [5]:
UCSD_CODE = "CSD_TRI"   # <-- based on what you saw in your output

ucsd_pitchers = sorted(df.loc[df[TEAM_COL] == UCSD_CODE, PITCHER_COL].unique())
print("UCSD pitchers found:", len(ucsd_pitchers))
ucsd_pitchers[:30]


UCSD pitchers found: 20


['Cazares, Julian',
 'Custer, Julian',
 'Dalquist, Matthew',
 'Davidson, Garrett',
 'Ernisse, Zach',
 'Gregson, Niccolas',
 'Hasegawa, Sam',
 'Huy, Nathan',
 'King, Devon',
 'Marchetti, Landon',
 'Murdock, Steele',
 'Nickerson, Trevor',
 'Patterson, Garrett',
 'Pelzman, Harry',
 'Rector, Trevor',
 'Remmers, Ethan',
 'Ries, Nathan',
 'Seid, Spencer',
 'Villar, Jake',
 'Weber, Chapman']

# NCAA-Wide Spread Table

In [6]:
MIN_PITCHES_PER_TYPE = 25  # adjust if you want (prevents noisy tiny samples)

g = df.groupby([PITCHER_COL, PITCHTYPE_COL])

spread = g.agg(
    n_pitches=(PITCHTYPE_COL, "size"),
    sd_hra=(HRA_COL, "std"),
    sd_vra=(VRA_COL, "std"),
    sd_relx=(RELX_COL, "std"),
    sd_relz=(RELZ_COL, "std"),
).reset_index()

spread = spread[spread["n_pitches"] >= MIN_PITCHES_PER_TYPE].dropna()
spread.head()

Unnamed: 0,Pitcher,TaggedPitchType,n_pitches,sd_hra,sd_vra,sd_relx,sd_relz
3,"Abbadessa, Jude",Fastball,30,0.872953,0.657802,0.156656,0.175217
11,"Abell, Mark",Fastball,45,0.844191,1.183978,0.098417,0.087015
12,"Abell, Mark",Slider,25,1.091851,1.095487,0.082446,0.086948
14,"Abernathy, Jackson",Fastball,25,1.026161,0.876033,0.128931,0.072267
20,"Abler, Andrew",Fastball,33,0.876774,0.921468,0.176832,0.126333


# NCAA Normalized Z-Scores Per Pitch Type

In [7]:
def z_by_pitchtype(series):
    s = series.astype(float)
    return (s - s.mean()) / s.std(ddof=1)

for col in ["sd_hra", "sd_vra", "sd_relx", "sd_relz"]:
    spread[f"z_{col}"] = spread.groupby(PITCHTYPE_COL)[col].transform(z_by_pitchtype)

spread[[PITCHER_COL, PITCHTYPE_COL, "n_pitches", "sd_hra", "sd_vra", "sd_relx", "sd_relz",
        "z_sd_hra", "z_sd_vra", "z_sd_relx", "z_sd_relz"]].head()

Unnamed: 0,Pitcher,TaggedPitchType,n_pitches,sd_hra,sd_vra,sd_relx,sd_relz,z_sd_hra,z_sd_vra,z_sd_relx,z_sd_relz
3,"Abbadessa, Jude",Fastball,30,0.872953,0.657802,0.156656,0.175217,-0.20539,-1.80299,-0.250714,0.376734
11,"Abell, Mark",Fastball,45,0.844191,1.183978,0.098417,0.087015,-0.33836,1.120145,-0.59425,-0.298866
12,"Abell, Mark",Slider,25,1.091851,1.095487,0.082446,0.086948,0.187913,0.187639,-0.636583,-0.354143
14,"Abernathy, Jackson",Fastball,25,1.026161,0.876033,0.128931,0.072267,0.502905,-0.590624,-0.414257,-0.411836
20,"Abler, Andrew",Fastball,33,0.876774,0.921468,0.176832,0.126333,-0.187725,-0.338215,-0.1317,0.002297


# Regression Model for Beta Weights

In [8]:
# Select Features + Target (Plate Location)

# Features (predictors)
FEATURES = [
    "VertRelAngle",   # VRA
    "HorzRelAngle",   # HRA
    "RelHeight",      # vRel
    "RelSide"         # hRel
]

# Targets (plate location)
TARGETS = [
    "PlateLocHeight",  # Z location
    "PlateLocSide"     # X location
]

model_df = df.dropna(subset=FEATURES + TARGETS)

X = model_df[FEATURES]
y = model_df[TARGETS]


In [9]:
# Test/Train Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [10]:
# Linear Regression (Kirby-style)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

y_pred = lin_reg.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))


MSE: 0.6461756986898715
R²: 0.37601933497068646


In [11]:
# Extract Beta Weights

# Rows of coef_ follow the order of TARGETS (PlateLocHeight, then PlateLocSide)
beta_df = pd.DataFrame(
    lin_reg.coef_,
    columns=FEATURES,
    index=TARGETS
)

beta_df

Unnamed: 0,VertRelAngle,HorzRelAngle,RelHeight,RelSide
PlateLocHeight,0.387337,-0.032768,0.695292,-0.038802
PlateLocSide,-0.057397,0.515832,-0.064188,0.693014


In [12]:
beta_weights = beta_df.abs().mean(axis=0)
beta_weights = beta_weights / beta_weights.sum()  # normalize

beta_weights

VertRelAngle    0.178994
HorzRelAngle    0.220798
RelHeight       0.305671
RelSide         0.294537
dtype: float64
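
The averaging and normalizing step can be checked by hand. The sketch below repeats it in plain Python using the coefficient values printed above, with each pair holding a feature's coefficients for the two plate-location targets; only the variable names are new.

```python
# Linear-regression coefficients as printed by the notebook, one pair per
# feature (one value for each of the two plate-location targets).
coefs = {
    "VertRelAngle": (0.387337, -0.057397),
    "HorzRelAngle": (-0.032768, 0.515832),
    "RelHeight": (0.695292, -0.064188),
    "RelSide": (-0.038802, 0.693014),
}

# Mean absolute coefficient per feature across the two targets.
mean_abs = {name: sum(abs(c) for c in pair) / len(pair)
            for name, pair in coefs.items()}

# Normalize so the four weights sum to 1.
total = sum(mean_abs.values())
weights = {name: value / total for name, value in mean_abs.items()}

for name, value in weights.items():
    print(name, round(value, 6))
```

The resulting weights agree with the `beta_weights` output above to the printed precision.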

# Random Forest (Feature Importance Check)

In [13]:
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

rf_importance = pd.Series(
    rf.feature_importances_,
    index=FEATURES
).sort_values(ascending=False)

rf_importance

HorzRelAngle    0.296633
VertRelAngle    0.279754
RelSide         0.224334
RelHeight       0.199279
dtype: float64

# Adding Linear Weights

In [14]:
# ==============================
# Adding Linear Weights (TriKirby)
# ==============================

# 1. Inspect available z-score columns (sanity check)
print("Available columns:")
print([c for c in spread.columns if "z_" in c])

# EXPECTED columns (adjust if names differ):
# z_sd_vra   -> Vertical Release Angle SD (z-score)
# z_sd_hra   -> Horizontal Release Angle SD (z-score)
# z_sd_vrel  -> Vertical Release Point SD (z-score)
# z_sd_hrel  -> Horizontal Release Point SD (z-score)

# 2. Define beta weights from your RF feature importances
# (already computed earlier as rf_importance)

betas = {
    "z_sd_hra": rf_importance["HorzRelAngle"],
    "z_sd_vra": rf_importance["VertRelAngle"],
    "z_sd_hrel": rf_importance["RelSide"],
    "z_sd_vrel": rf_importance["RelHeight"]
}

betas

Available columns:
['z_sd_hra', 'z_sd_vra', 'z_sd_relx', 'z_sd_relz']


{'z_sd_hra': 0.29663344570934724,
 'z_sd_vra': 0.2797536367753751,
 'z_sd_hrel': 0.2243335882844589,
 'z_sd_vrel': 0.19927932923081876}

In [15]:
import numpy as np

# --- 1) Identify the correct release point columns in YOUR CSV ---
# Trackman commonly uses RelHeight (vertical release) and RelSide (horizontal release)
CAND_VREL = ["RelHeight", "ReleaseHeight", "release_pos_z", "RelZ", "vRel"]
CAND_HREL = ["RelSide", "ReleaseSide", "release_pos_x", "RelX", "hRel"]

vrel_col = next((c for c in CAND_VREL if c in df.columns), None)
hrel_col = next((c for c in CAND_HREL if c in df.columns), None)

print("Using vRel column:", vrel_col)
print("Using hRel column:", hrel_col)

if vrel_col is None or hrel_col is None:
    raise ValueError(
        f"Could not find release point columns. "
        f"Columns in df include: {list(df.columns)[:40]} ..."
    )

# --- 2) Compute SDs per (Pitcher, PitchType) ---
PITCHER_COL = "Pitcher"
PITCHTYPE_COL = "TaggedPitchType"

spread = (
    df.groupby([PITCHER_COL, PITCHTYPE_COL])
      .agg(
          sd_vra=("VertRelAngle", lambda s: s.std(ddof=1)),
          sd_hra=("HorzRelAngle", lambda s: s.std(ddof=1)),
          sd_vrel=(vrel_col, lambda s: s.std(ddof=1)),
          sd_hrel=(hrel_col, lambda s: s.std(ddof=1)),
          n=("VertRelAngle", "size")
      )
      .reset_index()
)

# (Optional but recommended) drop tiny sample sizes to reduce noise
MIN_PITCHES_PER_TYPE = 25
spread = spread[spread["n"] >= MIN_PITCHES_PER_TYPE].copy()

# --- 3) Z-score each SD metric within pitch type (NCAA-wide benchmark) ---
def z_by_pitchtype(series):
    mu = series.mean()
    sd = series.std(ddof=0)
    return (series - mu) / sd if sd != 0 else np.nan

spread["z_sd_vra"]  = spread.groupby(PITCHTYPE_COL)["sd_vra"].transform(z_by_pitchtype)
spread["z_sd_hra"]  = spread.groupby(PITCHTYPE_COL)["sd_hra"].transform(z_by_pitchtype)
spread["z_sd_vrel"] = spread.groupby(PITCHTYPE_COL)["sd_vrel"].transform(z_by_pitchtype)
spread["z_sd_hrel"] = spread.groupby(PITCHTYPE_COL)["sd_hrel"].transform(z_by_pitchtype)

spread.head()

Using vRel column: RelHeight
Using hRel column: RelSide


Unnamed: 0,Pitcher,TaggedPitchType,sd_vra,sd_hra,sd_vrel,sd_hrel,n,z_sd_vra,z_sd_hra,z_sd_vrel,z_sd_hrel
3,"Abbadessa, Jude",Fastball,0.657802,0.872953,0.175217,0.156656,30,-1.803487,-0.205446,0.376838,-0.250783
11,"Abell, Mark",Fastball,1.183978,0.844191,0.087015,0.098417,45,1.120453,-0.338453,-0.298949,-0.594414
12,"Abell, Mark",Slider,1.095487,1.091851,0.086948,0.082446,25,0.187755,0.188029,-0.354361,-0.636975
14,"Abernathy, Jackson",Fastball,0.876033,1.026161,0.072267,0.128931,25,-0.590787,0.503043,-0.411949,-0.414371
20,"Abler, Andrew",Fastball,0.921468,0.876774,0.126333,0.176832,33,-0.338308,-0.187777,0.002298,-0.131737
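
The z-scores in this table differ from the earlier z-score table in the fourth decimal place (for example -1.803487 versus -1.80299 for the same fastball row). This cell divides by the population SD (`ddof=0`), while the earlier cell used the sample SD (`ddof=1`). The sketch below, on hypothetical values, shows that the two conventions differ only by a constant factor of $\sqrt{n/(n-1)}$.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical per-pitcher SDs for one pitch type.
values = [0.66, 0.92, 1.18, 0.88, 1.03]

mu = mean(values)
z_sample = [(v - mu) / stdev(values) for v in values]   # ddof=1 convention
z_pop = [(v - mu) / pstdev(values) for v in values]     # ddof=0 convention

# The population-SD z-scores are the sample-SD z-scores times sqrt(n / (n - 1)).
n = len(values)
ratio = sqrt(n / (n - 1))
for s, p in zip(z_sample, z_pop):
    print(round(s, 4), round(p, 4), round(s * ratio, 4))
```

With thousands of pitchers per pitch type the factor is very close to 1, so rankings are unchanged and only the trailing decimals move.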


In [16]:
# Weights: random forest feature importances (use beta_weights instead for the linear-regression weights)
betas = {
    "z_sd_vra":  rf_importance["VertRelAngle"],
    "z_sd_hra":  rf_importance["HorzRelAngle"],
    "z_sd_vrel": rf_importance["RelHeight"],
    "z_sd_hrel": rf_importance["RelSide"],
}

# Weighted sum (lower = better command / tighter release)
spread["TriKirby_pitch"] = (
    betas["z_sd_vra"]  * spread["z_sd_vra"]  +
    betas["z_sd_hra"]  * spread["z_sd_hra"]  +
    betas["z_sd_vrel"] * spread["z_sd_vrel"] +
    betas["z_sd_hrel"] * spread["z_sd_hrel"]
)

# Presentation-friendly (higher = better)
spread["TriKirby_pitch_pos"] = -spread["TriKirby_pitch"]

spread.sort_values("TriKirby_pitch").head(10)

Unnamed: 0,Pitcher,TaggedPitchType,sd_vra,sd_hra,sd_vrel,sd_hrel,n,z_sd_vra,z_sd_hra,z_sd_vrel,z_sd_hrel,TriKirby_pitch,TriKirby_pitch_pos
2581,"Cohen, Daniel",Fastball,0.623062,0.437876,0.081799,0.092364,25,-1.996537,-2.217397,-0.338917,-0.630128,-1.425191,1.425191
8668,"Macchiarola, Danny",Cutter,0.492381,0.700035,0.085795,0.087612,34,-2.679833,-1.106085,-0.290954,-1.058103,-1.373144,1.373144
7882,"Kuromoto, Matthew",Curveball,0.991308,0.566267,0.064338,0.105688,29,-1.084703,-2.077509,-0.787369,-0.77805,-1.251157,1.251157
1312,"Bouchard, Brady",Sinker,0.668204,0.633537,0.04533,0.12123,25,-1.181103,-1.894733,-1.265203,-0.462031,-1.248237,1.248237
3189,"Darden, Ethan",TwoSeamFastBall,0.673058,0.583988,0.069583,0.161503,26,-1.380292,-2.216053,-0.659289,-0.076178,-1.191969,1.191969
8860,"Marks, Caleb",Curveball,0.954597,0.683336,0.044784,0.111746,28,-1.228493,-1.528318,-1.091406,-0.715473,-1.175025,1.175025
4375,"Finley, Leighton",FourSeamFastBall,0.589924,0.779877,0.054985,0.096043,32,-2.485197,-0.489411,-0.801383,-0.668491,-1.150083,1.150083
3657,"Douglas, Ryan",Slider,0.677893,0.702816,0.055525,0.10063,25,-1.644542,-1.481278,-0.590006,-0.540882,-1.138377,1.138377
9113,"Mattox, Seth",Sinker,0.603165,0.788082,0.060191,0.067876,28,-1.601767,-0.878707,-0.963231,-1.028875,-1.131517,1.131517
13539,"Stagliano, Dominic",Slider,0.823835,0.58615,0.053041,0.084023,27,-1.004185,-1.981878,-0.608638,-0.628645,-1.131131,1.131131


# TriKirby Index: UCSD Spread Per Pitch Type

In [17]:
ucsd_spread = spread[spread[PITCHER_COL].isin(ucsd_pitchers)].copy()
print("UCSD rows in spread:", len(ucsd_spread))
sorted(ucsd_spread[PITCHTYPE_COL].unique())

UCSD rows in spread: 48


['ChangeUp',
 'Curveball',
 'Cutter',
 'Fastball',
 'FourSeamFastBall',
 'Sinker',
 'Slider',
 'Sweeper']

In [18]:
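# Global 0-1 scaling across all pitch types, kept for reference only;
# the reported column is TriKirby_0_1_byPitch, scaled within pitch type
spread["TriKirby_0_1"] = (
    (spread["TriKirby_pitch_pos"] - spread["TriKirby_pitch_pos"].min()) /
    (spread["TriKirby_pitch_pos"].max() - spread["TriKirby_pitch_pos"].min())
)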

In [27]:
import numpy as np
import pandas as pd

# ============================================
# 0–1 normalization BY PITCH TYPE (rounded)
# Higher = better command (player-facing)
# ============================================

# --- Choose base score (higher = better) ---
if "TriKirby_pitch_pos" in spread.columns:
    base_col = "TriKirby_pitch_pos"
elif "TriKirby_pitch" in spread.columns:
    base_col = "TriKirby_pitch"
else:
    raise ValueError(
        "Need TriKirby_pitch_pos or TriKirby_pitch in `spread` before running this cell."
    )

# --- Clean up if re-running ---
# Use a loop variable other than `df` so the pitch-level DataFrame is not overwritten
for frame_name in ["spread", "ucsd_spread"]:
    frame = globals().get(frame_name)
    if frame is not None:
        drop_cols = [
            c for c in [
                "_score_for_norm_pt",
                "pt_min", "pt_max",
                "TriKirby_0_1_byPitch",
                "NCAA_D1_Avg_0_1_byPitch"
            ] if c in frame.columns
        ]
        if drop_cols:
            globals()[frame_name] = frame.drop(columns=drop_cols)

# --- Flip sign if needed so higher = better ---
if base_col == "TriKirby_pitch":
    spread["_score_for_norm_pt"] = -spread["TriKirby_pitch"]
    ucsd_spread["_score_for_norm_pt"] = -ucsd_spread["TriKirby_pitch"]
else:
    spread["_score_for_norm_pt"] = spread["TriKirby_pitch_pos"]
    ucsd_spread["_score_for_norm_pt"] = ucsd_spread["TriKirby_pitch_pos"]

# --- Min–max normalize WITHIN pitch type ---
spread["pt_min"] = spread.groupby(PITCHTYPE_COL)["_score_for_norm_pt"].transform("min")
spread["pt_max"] = spread.groupby(PITCHTYPE_COL)["_score_for_norm_pt"].transform("max")

den = spread["pt_max"] - spread["pt_min"]
spread["TriKirby_0_1_byPitch"] = np.where(
    den > 0,
    (spread["_score_for_norm_pt"] - spread["pt_min"]) / den,
    np.nan
)

# --- Apply NCAA scaling to UCSD players ---
ucsd_spread = ucsd_spread.merge(
    spread[[PITCHTYPE_COL, "pt_min", "pt_max"]].drop_duplicates(),
    on=PITCHTYPE_COL,
    how="left"
)

den_ucsd = ucsd_spread["pt_max"] - ucsd_spread["pt_min"]
ucsd_spread["TriKirby_0_1_byPitch"] = np.where(
    den_ucsd > 0,
    (ucsd_spread["_score_for_norm_pt"] - ucsd_spread["pt_min"]) / den_ucsd,
    np.nan
)

# --- NCAA D1 average per pitch type (0–1 scale) ---
ncaa_pitch_avg = (
    spread
    .groupby(PITCHTYPE_COL, as_index=False)["TriKirby_0_1_byPitch"]
    .mean()
    .rename(columns={"TriKirby_0_1_byPitch": "NCAA_D1_Avg_0_1_byPitch"})
)

ucsd_spread = ucsd_spread.merge(
    ncaa_pitch_avg,
    on=PITCHTYPE_COL,
    how="left"
)

# --- ROUND player-facing values to hundredths ---
ROUND_COLS = ["TriKirby_0_1_byPitch", "NCAA_D1_Avg_0_1_byPitch"]
for c in ROUND_COLS:
    if c in spread.columns:
        spread[c] = spread[c].round(2)
    if c in ucsd_spread.columns:
        ucsd_spread[c] = ucsd_spread[c].round(2)

print("✅ TriKirby normalized to 0–1 by pitch type and rounded to hundredths")


✅ TriKirby normalized to 0–1 by pitch type and rounded to hundredths


In [36]:
from IPython.display import display
import pandas as pd
import numpy as np

# -------------------------------
# Settings
# -------------------------------
SCORE_COL = "TriKirby_0_1_byPitch"
AVG_COL   = "NCAA_D1_Avg_0_1_byPitch"
COUNT_COL_CANDIDATES = ["n_pitches", "n", "count", "N"]

# pick pitch-count column if it exists
COUNT_COL = next((c for c in COUNT_COL_CANDIDATES if c in ucsd_spread.columns), None)

# -------------------------------
# 1) Detailed tables: one per pitch type (sorted best -> worst)
# -------------------------------
print("=== UCSD Pitch-Type Tables (best → worst command) ===")
for pt in sorted(ucsd_spread[PITCHTYPE_COL].dropna().unique()):
    df_pt = ucsd_spread[ucsd_spread[PITCHTYPE_COL].astype(str).str.lower() == str(pt).lower()].copy()

    # keep only the columns that exist
    cols = [PITCHER_COL, PITCHTYPE_COL]
    if COUNT_COL is not None: cols += [COUNT_COL]
    cols += [SCORE_COL, AVG_COL]

    # (optional) keep diagnostics if you want them visible
    diag_cols = [
        "TriKirby_pitch", "TriKirby_pitch_pos",
        "sd_hra","sd_vra","sd_hrel","sd_vrel",
        "z_sd_hra","z_sd_vra","z_sd_hrel","z_sd_vrel"
    ]
    cols += [c for c in diag_cols if c in df_pt.columns]

    cols = [c for c in cols if c in df_pt.columns]

    df_pt = df_pt.sort_values(SCORE_COL, ascending=False)  # higher = better

    print(f"\n--- {pt} ---")
    display(df_pt[cols].reset_index(drop=True))

# -------------------------------
# 2) Player-friendly table: only scores + NCAA avg (easy to read)
#    One table per pitch type (best -> worst)
# -------------------------------
print("\n=== Player-Friendly Tables (ONLY score + NCAA avg) ===")
for pt in sorted(ucsd_spread[PITCHTYPE_COL].dropna().unique()):
    df_pt = ucsd_spread[ucsd_spread[PITCHTYPE_COL].astype(str).str.lower() == str(pt).lower()].copy()

    cols_simple = [PITCHER_COL, PITCHTYPE_COL]
    if COUNT_COL is not None: cols_simple += [COUNT_COL]
    cols_simple += [SCORE_COL, AVG_COL]
    cols_simple = [c for c in cols_simple if c in df_pt.columns]

    df_pt = df_pt.sort_values(SCORE_COL, ascending=False)

    # rename for readability
    df_simple = df_pt[cols_simple].rename(columns={
        SCORE_COL: "TriKirby Index Score",
        AVG_COL: "NCAA D1 Average",
        COUNT_COL: "Pitches" if COUNT_COL is not None else COUNT_COL
    }).reset_index(drop=True)

    print(f"\n--- {pt} (Player-Friendly) ---")
    display(df_simple)

=== UCSD Pitch-Type Tables (best → worst command) ===

--- ChangeUp ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Dalquist, Matthew",ChangeUp,94,0.89,0.84,-0.333932,0.333932,0.864645,0.956062,0.142175,0.103596,-0.389044,-0.426965,-0.281876,-0.179894
1,"Davidson, Garrett",ChangeUp,155,0.86,0.84,-0.185074,0.185074,0.96088,0.9485,0.181891,0.111599,-0.030938,-0.464396,-0.096029,-0.12263
2,"Villar, Jake",ChangeUp,116,0.86,0.84,-0.135505,0.135505,0.876966,1.102005,0.118472,0.108936,-0.343197,0.295438,-0.392794,-0.141681
3,"Murdock, Steele",ChangeUp,69,0.8,0.84,0.208046,-0.208046,0.996279,1.077641,0.360613,0.102905,0.100788,0.174838,0.74029,-0.184839
4,"Seid, Spencer",ChangeUp,68,0.78,0.84,0.346894,-0.346894,0.913161,1.309879,0.163261,0.184375,-0.208507,1.324391,-0.183204,0.398132
5,"Ries, Nathan",ChangeUp,33,0.68,0.84,1.005943,-1.005943,1.161489,1.647281,0.187865,0.108564,0.71556,2.994491,-0.068072,-0.144346



--- Curveball ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Dalquist, Matthew",Curveball,123,0.81,0.57,-0.704088,0.704088,0.796939,1.022684,0.126453,0.110676,-0.995384,-0.961808,-0.563571,-0.066874
1,"Gregson, Niccolas",Curveball,55,0.62,0.57,-0.147311,0.147311,0.86475,1.417225,0.165573,0.091134,-0.677268,0.583539,-0.159499,-0.370723
2,"Ries, Nathan",Curveball,47,0.34,0.57,0.669579,-0.669579,1.069294,1.825453,0.178514,0.10887,0.282285,2.182494,-0.025829,-0.094957



--- Cutter ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Hasegawa, Sam",Cutter,87,0.63,0.62,-0.046965,0.046965,0.731625,1.090688,0.216512,0.060794,-0.907029,0.590663,1.03731,-0.882448
1,"Seid, Spencer",Cutter,62,0.62,0.62,0.008349,-0.008349,0.883291,1.020317,0.117072,0.112139,0.048651,0.205994,-0.579193,0.33231
2,"King, Devon",Cutter,121,0.48,0.62,0.507345,-0.507345,0.940734,1.107507,0.194821,0.106786,0.410616,0.6826,0.684695,0.205655



--- Fastball ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Dalquist, Matthew",Fastball,422,0.87,0.84,-0.251302,0.251302,0.829073,0.870258,0.262365,0.100111,-0.40836,-0.622877,0.37294,-0.198612
1,"Seid, Spencer",Fastball,494,0.85,0.84,-0.109468,0.109468,0.880645,1.018761,0.131826,0.108638,-0.169877,0.20235,-0.39729,-0.133278
2,"Gregson, Niccolas",Fastball,215,0.85,0.84,-0.092595,0.092595,0.774337,1.122305,0.159905,0.085431,-0.661481,0.777738,-0.231614,-0.311088
3,"Pelzman, Harry",Fastball,101,0.85,0.84,-0.068727,0.068727,0.973987,0.969858,0.148982,0.086379,0.261774,-0.069404,-0.296059,-0.303823
4,"Marchetti, Landon",Fastball,112,0.84,0.84,0.008016,-0.008016,0.890595,1.103306,0.108839,0.11049,-0.123861,0.672163,-0.53292,-0.119085
5,"King, Devon",Fastball,86,0.84,0.84,1.9e-05,-1.9e-05,0.741643,1.141869,0.197807,0.122682,-0.812667,0.886454,-0.007979,-0.025672
6,"Ries, Nathan",Fastball,264,0.84,0.84,-0.050367,0.050367,0.903905,0.970331,0.186615,0.128261,-0.062314,-0.066774,-0.074013,0.01707
7,"Weber, Chapman",Fastball,241,0.84,0.84,0.012183,-0.012183,0.885827,1.088168,0.165135,0.084114,-0.145913,0.588038,-0.200754,-0.321175
8,"Hasegawa, Sam",Fastball,360,0.83,0.84,0.03561,-0.03561,0.808797,1.132852,0.218967,0.076497,-0.502123,0.836345,0.116874,-0.379534
9,"Ernisse, Zach",Fastball,71,0.83,0.84,0.062059,-0.062059,0.734747,1.26326,0.126732,0.10753,-0.84456,1.561022,-0.427341,-0.141768



--- FourSeamFastBall ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Dalquist, Matthew",FourSeamFastBall,25,0.93,0.81,-0.728242,0.728242,0.70379,0.861074,0.116827,0.060494,-0.902657,-0.7838,-0.458504,-0.694276
1,"Murdock, Steele",FourSeamFastBall,31,0.91,0.81,-0.629685,0.629685,0.771384,0.83568,0.116487,0.069528,-0.53554,-0.943141,-0.461934,-0.518628
2,"Seid, Spencer",FourSeamFastBall,48,0.87,0.81,-0.375724,0.375724,0.793222,0.965126,0.110375,0.070923,-0.416934,-0.130896,-0.523686,-0.49151
3,"Gregson, Niccolas",FourSeamFastBall,37,0.84,0.81,-0.176367,0.176367,0.575295,1.290184,0.142517,0.04692,-1.600535,1.908762,-0.198939,-0.958199
4,"Weber, Chapman",FourSeamFastBall,38,0.75,0.81,0.346788,-0.346788,0.977176,1.165843,0.162968,0.059208,0.582158,1.128549,0.007681,-0.719282



--- Sinker ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Villar, Jake",Sinker,47,0.71,0.67,-0.140898,0.140898,0.660449,0.969095,0.186896,0.132731,-1.717805,0.76504,0.23561,0.510752
1,"Remmers, Ethan",Sinker,25,0.49,0.67,0.678673,-0.678673,0.79791,0.70619,0.688351,0.091257,-0.8141,-0.935408,5.563168,-0.331993



--- Slider ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Dalquist, Matthew",Slider,227,0.89,0.82,-0.434601,0.434601,0.755876,0.977729,0.269615,0.100936,-1.253601,-0.328937,0.352121,-0.249461
1,"Ries, Nathan",Slider,64,0.86,0.82,-0.201698,0.201698,0.893721,1.141969,0.155047,0.09536,-0.662125,0.391708,-0.253313,-0.291275
2,"Weber, Chapman",Slider,30,0.82,0.82,0.060467,-0.060467,0.79921,1.43191,0.145387,0.120797,-1.067658,1.663894,-0.304364,-0.100517
3,"Seid, Spencer",Slider,55,0.81,0.82,0.067336,-0.067336,1.022567,1.150174,0.128936,0.179619,-0.109258,0.427708,-0.3913,0.340602
4,"Murdock, Steele",Slider,194,0.8,0.82,0.134811,-0.134811,1.030566,1.046989,0.393601,0.092759,-0.074936,-0.025043,1.007332,-0.310783
5,"Hasegawa, Sam",Slider,74,0.79,0.82,0.256531,-0.256531,1.048252,1.297261,0.216355,0.094183,0.00095,1.073089,0.070669,-0.300104
6,"Villar, Jake",Slider,142,0.78,0.82,0.293618,-0.293618,0.991485,1.420529,0.158146,0.112274,-0.242628,1.613958,-0.236936,-0.164433
7,"King, Devon",Slider,71,0.77,0.82,0.34805,-0.34805,1.055804,1.202855,0.241538,0.206555,0.033357,0.658858,0.20375,0.5426
8,"Davidson, Garrett",Slider,32,0.76,0.82,0.438272,-0.438272,1.189681,1.148081,0.306202,0.146596,0.60781,0.418525,0.545467,0.092957
9,"Cazares, Julian",Slider,42,0.65,0.82,1.162013,-1.162013,1.544702,0.939791,0.761661,0.138293,2.131166,-0.495401,2.952354,0.030689



--- Sweeper ---


Unnamed: 0,Pitcher,TaggedPitchType,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos,sd_hra,sd_vra,sd_hrel,sd_vrel,z_sd_hra,z_sd_vra,z_sd_hrel,z_sd_vrel
0,"Villar, Jake",Sweeper,48,0.88,0.64,-0.543081,0.543081,0.870896,1.025608,0.136077,0.081095,-1.324842,0.116704,-0.284717,-0.596473
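
As a quick sanity check on the detailed tables above, the sketch below re-parses the FourSeamFastBall rows (copied verbatim from the output, with only a subset of columns) and confirms that `TriKirby_pitch_pos` is the sign-flipped `TriKirby_pitch`, so that a higher value reads as better command. It also computes each pitcher's gap to the NCAA D1 average on the 0–1 scale. The names `csv_text`, `rows`, `sign_ok`, and `vs_NCAA` are illustrative and are not part of the original notebook.

```python
import io

import pandas as pd

# Rows copied from the FourSeamFastBall table above (subset of columns).
csv_text = """Pitcher,n,TriKirby_0_1_byPitch,NCAA_D1_Avg_0_1_byPitch,TriKirby_pitch,TriKirby_pitch_pos
"Dalquist, Matthew",25,0.93,0.81,-0.728242,0.728242
"Murdock, Steele",31,0.91,0.81,-0.629685,0.629685
"Seid, Spencer",48,0.87,0.81,-0.375724,0.375724
"Gregson, Niccolas",37,0.84,0.81,-0.176367,0.176367
"Weber, Chapman",38,0.75,0.81,0.346788,-0.346788
"""
rows = pd.read_csv(io.StringIO(csv_text))

# The "pos" column should be the raw index with its sign flipped.
sign_ok = bool(
    ((rows["TriKirby_pitch"] + rows["TriKirby_pitch_pos"]).abs() < 1e-9).all()
)

# Gap between each pitcher's 0-1 score and the NCAA D1 average (hypothetical column).
rows["vs_NCAA"] = (
    rows["TriKirby_0_1_byPitch"] - rows["NCAA_D1_Avg_0_1_byPitch"]
).round(2)

print("Sign convention holds:", sign_ok)
print(rows[["Pitcher", "n", "vs_NCAA"]])
```

Several of these rows rest on a few dozen pitches (`n` between 25 and 48), so the gaps are best read alongside the pitch counts.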



=== Player-Friendly Tables (ONLY score + NCAA avg) ===

--- ChangeUp (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Dalquist, Matthew",ChangeUp,94,0.89,0.84
1,"Davidson, Garrett",ChangeUp,155,0.86,0.84
2,"Villar, Jake",ChangeUp,116,0.86,0.84
3,"Murdock, Steele",ChangeUp,69,0.8,0.84
4,"Seid, Spencer",ChangeUp,68,0.78,0.84
5,"Ries, Nathan",ChangeUp,33,0.68,0.84



--- Curveball (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Dalquist, Matthew",Curveball,123,0.81,0.57
1,"Gregson, Niccolas",Curveball,55,0.62,0.57
2,"Ries, Nathan",Curveball,47,0.34,0.57



--- Cutter (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Hasegawa, Sam",Cutter,87,0.63,0.62
1,"Seid, Spencer",Cutter,62,0.62,0.62
2,"King, Devon",Cutter,121,0.48,0.62



--- Fastball (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Dalquist, Matthew",Fastball,422,0.87,0.84
1,"Seid, Spencer",Fastball,494,0.85,0.84
2,"Gregson, Niccolas",Fastball,215,0.85,0.84
3,"Pelzman, Harry",Fastball,101,0.85,0.84
4,"Marchetti, Landon",Fastball,112,0.84,0.84
5,"King, Devon",Fastball,86,0.84,0.84
6,"Ries, Nathan",Fastball,264,0.84,0.84
7,"Weber, Chapman",Fastball,241,0.84,0.84
8,"Hasegawa, Sam",Fastball,360,0.83,0.84
9,"Ernisse, Zach",Fastball,71,0.83,0.84



--- FourSeamFastBall (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Dalquist, Matthew",FourSeamFastBall,25,0.93,0.81
1,"Murdock, Steele",FourSeamFastBall,31,0.91,0.81
2,"Seid, Spencer",FourSeamFastBall,48,0.87,0.81
3,"Gregson, Niccolas",FourSeamFastBall,37,0.84,0.81
4,"Weber, Chapman",FourSeamFastBall,38,0.75,0.81



--- Sinker (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Villar, Jake",Sinker,47,0.71,0.67
1,"Remmers, Ethan",Sinker,25,0.49,0.67



--- Slider (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Dalquist, Matthew",Slider,227,0.89,0.82
1,"Ries, Nathan",Slider,64,0.86,0.82
2,"Weber, Chapman",Slider,30,0.82,0.82
3,"Seid, Spencer",Slider,55,0.81,0.82
4,"Murdock, Steele",Slider,194,0.8,0.82
5,"Hasegawa, Sam",Slider,74,0.79,0.82
6,"Villar, Jake",Slider,142,0.78,0.82
7,"King, Devon",Slider,71,0.77,0.82
8,"Davidson, Garrett",Slider,32,0.76,0.82
9,"Cazares, Julian",Slider,42,0.65,0.82



--- Sweeper (Player-Friendly) ---


Unnamed: 0,Pitcher,TaggedPitchType,Pitches,TriKirby Index Score,NCAA D1 Average
0,"Villar, Jake",Sweeper,48,0.88,0.64


In [39]:
import os

# directory to save CSVs
OUT_DIR = "TriKirby_Player_CSVs"
os.makedirs(OUT_DIR, exist_ok=True)

print(f"Saving player-facing CSVs to: {OUT_DIR}/")

for pt in sorted(ucsd_spread[PITCHTYPE_COL].dropna().unique()):
    df_pt = ucsd_spread[
        ucsd_spread[PITCHTYPE_COL].astype(str).str.lower() == str(pt).lower()
    ].copy()

    if df_pt.empty:
        continue

    # columns to export (player-friendly)
    export_cols = [
        PITCHER_COL,
        PITCHTYPE_COL,
        COUNT_COL,  # dropped by the filter below if the column is absent
        "TriKirby_0_1_byPitch",
        "NCAA_D1_Avg_0_1_byPitch"
    ]
    export_cols = [c for c in export_cols if c in df_pt.columns]

    df_out = (
        df_pt[export_cols]
        .sort_values("TriKirby_0_1_byPitch", ascending=False)  # best → worst
        .rename(columns={
            "TriKirby_0_1_byPitch": "TriKirby Index (0–1)",
            "NCAA_D1_Avg_0_1_byPitch": "NCAA D1 Average (0–1)",
            COUNT_COL: "Pitches"
        })
        .reset_index(drop=True)
    )

    # round for presentation
    for c in ["TriKirby Index (0–1)", "NCAA D1 Average (0–1)"]:
        if c in df_out.columns:
            df_out[c] = df_out[c].round(2)

    # filename
    fname = f"UCSD_TriKirby_{str(pt).replace(' ', '')}.csv"  # str() guards against non-string pitch labels
    path = os.path.join(OUT_DIR, fname)

    df_out.to_csv(path, index=False)
    print(f"✔ Saved {fname}")

Saving player-facing CSVs to: TriKirby_Player_CSVs/
✔ Saved UCSD_TriKirby_ChangeUp.csv
✔ Saved UCSD_TriKirby_Curveball.csv
✔ Saved UCSD_TriKirby_Cutter.csv
✔ Saved UCSD_TriKirby_Fastball.csv
✔ Saved UCSD_TriKirby_FourSeamFastBall.csv
✔ Saved UCSD_TriKirby_Sinker.csv
✔ Saved UCSD_TriKirby_Slider.csv
✔ Saved UCSD_TriKirby_Sweeper.csv
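
To confirm the export, the saved files can be read back and combined into a single table. The sketch below assumes the directory name and the renamed column headers from the cell above. The helper `load_player_csvs` and the `Above NCAA Avg` column are hypothetical additions, not part of the original notebook.

```python
import glob
import os

import pandas as pd


def load_player_csvs(out_dir="TriKirby_Player_CSVs"):
    """Read every UCSD_TriKirby_*.csv in out_dir into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(out_dir, "UCSD_TriKirby_*.csv")))
    if not paths:
        # Nothing exported yet: return an empty frame instead of raising.
        return pd.DataFrame()

    combined = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

    # Column names match the rename step in the export cell above.
    combined["Above NCAA Avg"] = (
        combined["TriKirby Index (0–1)"] > combined["NCAA D1 Average (0–1)"]
    )
    return combined


combined = load_player_csvs()
print(combined.shape)
```

Because both columns are rounded to two decimals before export, a pitcher whose score rounds to the same value as the NCAA average is reported as not above it.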
