
# NFL Props — Weekly Data & Model Explainer

This notebook documents the weekly **props pipeline** end‑to‑end:

1. **Load** inputs (props feed, weekly model params, merged edges).
2. **Normalize** markets and join keys (shared mapping via `common_markets`).
3. Explain the **model families**:
   - Normal O/U (yards, attempts, receptions)
   - Poisson O/U (counts like pass TDs, interceptions)
   - Binary Anytime TD
4. **Diagnostics**: coverage by market, unmatched reasons, and sample rows.
5. **Examples**: per‑player probability for selected markets.

> Run this from the repo root so the relative paths resolve.


## 0) Parameters

In [1]:

SEASON = 2025
WEEK   = 2

PATH_PROPS  = "data/props/latest_all_props.csv"
PATH_PARAMS = f"data/props/params_week{WEEK}.csv"
PATH_EDGES  = f"data/props/props_with_model_week{WEEK}.csv"

print("Using:")
print("  props :", PATH_PROPS)
print("  params:", PATH_PARAMS)
print("  edges :", PATH_EDGES)


Using:
  props : data/props/latest_all_props.csv
  params: data/props/params_week2.csv
  edges : data/props/props_with_model_week2.csv


## 1) Imports & helpers

In [2]:

import math, re, json, itertools
from pathlib import Path
import numpy as np
import pandas as pd

# Try to use repo's shared normalizer; fall back to minimal identity mapping.
try:
    from common_markets import standardize_input as _std_input  # repo helper
    HAVE_STD = True
except Exception as e:
    HAVE_STD = False
    def _std_input(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        if 'market_std' not in df.columns and 'market' in df.columns:
            df['market_std'] = df['market'].astype(str).str.lower().str.strip()
        elif 'market_std' not in df.columns:
            df['market_std'] = ''
        if 'name' not in df.columns and 'over_under' in df.columns:
            df['name'] = df['over_under']
        if 'point' not in df.columns and 'line' in df.columns:
            df['point'] = pd.to_numeric(df['line'], errors='coerce')
        if 'name_std' not in df.columns and 'player' in df.columns:
            s = df['player'].astype(str).str.lower().str.replace(r"[^a-z0-9\s]","",regex=True)
            df['name_std'] = s.str.replace(r"\s+"," ",regex=True).str.strip()
        if 'player_key' not in df.columns and 'name_std' in df.columns:
            df['player_key'] = df['name_std'].str.replace(" ","-")
        return df

def point_key(v):
    try:
        f = float(v)
        return str(int(f)) if math.isfinite(f) and float(f).is_integer() else str(f)
    except Exception:
        return "" if pd.isna(v) else str(v)

def american_to_prob(price) -> float:
    try:
        p = float(price)
    except Exception:
        return np.nan
    if p > 0:
        return 100.0 / (p + 100.0)
    elif p < 0:
        return (-p) / ((-p) + 100.0)
    return np.nan

def normal_cdf(x, mu, sigma):
    if sigma is None or not np.isfinite(sigma) or sigma <= 0:
        return np.nan
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

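def poisson_cdf(k, lam):
    if lam is None or lam < 0 or not np.isfinite(lam):
        return np.nan
    k = int(math.floor(float(k)))
    if k < 0:
        return 0.0  # no probability mass below zero
    if lam == 0:
        return 1.0  # degenerate at zero; also avoids math.log(0)
    s = 0.0
    for i in range(0, k + 1):
        # pmf in log space: exp(i*log(lam) - lam - log(i!))
        s += math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
    return min(max(s, 0.0), 1.0)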

def modeled_market_kind(m):
    m = str(m or "").lower()
    if m in {"recv_yds","receptions","rush_yds","rush_attempts","pass_yds","pass_attempts","pass_completions"}:
        return "normal_ou"
    if m in {"pass_tds","interceptions"}:
        return "poisson_ou"
    if m in {"anytime_td"}:
        return "binary_anytime"
    return "other"


## 2) Load inputs

In [3]:

props  = pd.read_csv(PATH_PROPS)
params = pd.read_csv(PATH_PARAMS)
edges  = pd.read_csv(PATH_EDGES) if Path(PATH_EDGES).exists() else None

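# Normalize and create point_key in each.
# Rebind the names: `df[:] = ...` cannot add the columns that _std_input creates.
def _prep(df: pd.DataFrame) -> pd.DataFrame:
    df = _std_input(df)
    if 'point' not in df.columns:
        df['point'] = pd.NA
    df['point_key'] = df['point'].map(point_key)
    return df

props, params = _prep(props), _prep(params)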

print("props shape :", props.shape)
print("params shape:", params.shape)
print("edges shape :", None if edges is None else edges.shape)


props shape : (14731, 14)
params shape: (1041, 15)
edges shape : (14731, 23)



## 3) Normalization & join keys

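We standardize each frame with `standardize_input(df)` from `common_markets` (or, when that module cannot be imported, the minimal fallback defined in section 1, which only lowercases and trims `market`) to ensure: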

- `market_std` — canonical market key (e.g., `player_reception_yds` → `recv_yds`)
- `name` — the bet side label (Over/Under/Yes/No)
- `point` — numeric threshold for O/U markets
- `name_std`, `player_key` — normalized player identifiers

For O/U joins, we also derive a canonical **`point_key`** from `point` (e.g., `65.0` → `"65"`).
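
A minimal sketch of why a string key helps when the two sides store the line differently. The join columns (`player_key`, `market_std`, `point_key`) and both toy frames are assumptions for illustration; the key set the pipeline actually uses is not shown in this notebook. `point_key` repeats the helper from section 1 so the snippet runs on its own.

```python
import math
import pandas as pd

def point_key(v):
    # Same rule as the section 1 helper: whole numbers drop the ".0".
    try:
        f = float(v)
        return str(int(f)) if math.isfinite(f) and f.is_integer() else str(f)
    except Exception:
        return "" if pd.isna(v) else str(v)

# Hypothetical rows: the props side holds the line as text, the params side as an int.
props_demo = pd.DataFrame({
    "player_key": ["player-one", "player-one"],
    "market_std": ["recv_yds", "recv_yds"],
    "name": ["Over", "Under"],
    "point": ["65.0", "65.0"],
})
params_demo = pd.DataFrame({
    "player_key": ["player-one"],
    "market_std": ["recv_yds"],
    "point": [65],
    "mu": [70.0],
    "sigma": [25.0],
})

for frame in (props_demo, params_demo):
    frame["point_key"] = frame["point"].map(point_key)

# Assumed join columns, for illustration only.
keys = ["player_key", "market_std", "point_key"]
joined = props_demo.merge(params_demo.drop(columns="point"), on=keys, how="left")
print(joined[["name", "point_key", "mu", "sigma"]])
```

Both the Over and the Under row pick up the same `mu` and `sigma`, because `"65.0"` and `65` collapse to the same key `"65"`.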


## 4) Coverage diagnostics

In [4]:

e = edges.copy() if edges is not None else None

if e is None:
    print("No merged edges file present yet. Run make_props_edges.py to create it.")
else:
    e['market_std'] = e['market_std'].astype(str).str.lower()
    modeled = e[['model_prob','mkt_prob']].notna().all(axis=1)
    coverage = (e.assign(modeled=modeled)
                  .groupby('market_std', as_index=False)
                  .agg(total=('market_std','size'),
                       modeled=('modeled','sum'))
                  .assign(modeled_pct=lambda d: (d['modeled']/d['total']).round(3))
                  .sort_values(['modeled_pct','total'], ascending=[True, False]))
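    try:
        import caas_jupyter_tools as cj
    except ModuleNotFoundError:
        # Hosted-notebook helper is unavailable locally; fall back to plain printing.
        class cj:
            @staticmethod
            def display_dataframe_to_user(title, df):
                print(f"\n{title}")
                print(df.to_string(index=False))
    cj.display_dataframe_to_user("Modeled coverage by market", coverage)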

    un = e.loc[~modeled].copy()
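    def _why(r):
        # Only flag the parameters that the row's model family actually needs.
        kind = modeled_market_kind(r.get('market_std'))
        if pd.isna(r.get('mu')) and pd.isna(r.get('lam')):
            return "no_params"
        if kind == "normal_ou" and (pd.isna(r.get('mu')) or pd.isna(r.get('sigma'))):
            return "missing_mu_sigma"
        if kind in {"poisson_ou", "binary_anytime"} and pd.isna(r.get('lam')):
            return "missing_lam"
        return "unknown"
    un['why'] = un.apply(_why, axis=1)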
    reason = (un.groupby(['market_std','why'])
                .size().reset_index(name='count')
                .sort_values(['count'], ascending=[False]))
    cj.display_dataframe_to_user("Unmodeled reasons by market", reason)

    # Example rows from up to three markets with low coverage (modeled_pct < 0.2)
    low = coverage.loc[coverage['modeled_pct'] < 0.2, 'market_std'].head(3).tolist()
    sample = e[e['market_std'].isin(low)][['player','market_std','name','point','price','mu','sigma','lam','model_prob','mkt_prob']].head(30)
    cj.display_dataframe_to_user("Sample rows from low-coverage markets", sample)


ModuleNotFoundError: No module named 'caas_jupyter_tools'


## 5) Model families

### 5.1 Normal O/U
For markets like `recv_yds`, `rush_yds`, `receptions`, `pass_yds`, we use a Normal distribution:
\[ X \sim \mathcal{N}(\mu, \sigma^2) \]
- **Over** at threshold *t*: \( P(X > t) = 1 - \Phi((t-\mu)/\sigma) \)
- **Under** at threshold *t*: \( P(X < t) = \Phi((t-\mu)/\sigma) \)
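
The two formulas above can be sketched with the standard library alone. `normal_ou_prob` is a hypothetical helper name, and the parameters are illustrative rather than taken from the weekly params file.

```python
import math

def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma), written with the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_ou_prob(side, t, mu, sigma):
    """Return P(Over) or P(Under) at threshold t (hypothetical helper)."""
    p_under = normal_cdf(t, mu, sigma)
    return 1.0 - p_under if side.lower() == "over" else p_under

# Illustrative parameters only.
p_over = normal_ou_prob("Over", 65.5, mu=70.0, sigma=25.0)
p_under = normal_ou_prob("Under", 65.5, mu=70.0, sigma=25.0)
print(round(p_over, 4), round(p_under, 4))
```

Because the Normal is continuous, landing exactly on the line has probability zero, so Over and Under sum to 1 here even at a whole-number threshold.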

### 5.2 Poisson O/U
For discrete counts like `pass_tds`, `interceptions`:
\[ X \sim \mathrm{Poisson}(\lambda) \]
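- **Over** *t*: \( P(X > t) = 1 - F(\lfloor t \rfloor) \), where \( F(k) = P(X \le k) \)
- **Under** *t*: \( P(X < t) = F(\lceil t \rceil - 1) \), which equals \( F(\lfloor t \rfloor) \) for the usual half-point lines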
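
A short sketch of the Poisson case, with a hypothetical helper `poisson_ou_prob` and an illustrative rate. For half-point lines, \( F(\lfloor t \rfloor) \) gives the Under probability directly. At a whole-number line the count can land exactly on *t*, so the sketch uses \( F(\lceil t \rceil - 1) \) for Under and leaves \( P(X = t) \) as a push. Whether the pipeline treats whole-number lines this way is an assumption.

```python
import math

def poisson_cdf(k, lam):
    # F(k) = P(X <= k) for X ~ Poisson(lam); zero below the support.
    if k < 0:
        return 0.0
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(int(k) + 1))

def poisson_ou_prob(side, t, lam):
    """P(Over) = P(X > t), P(Under) = P(X < t) (hypothetical helper)."""
    if side.lower() == "over":
        return 1.0 - poisson_cdf(math.floor(t), lam)
    return poisson_cdf(math.ceil(t) - 1, lam)

lam = 1.6  # illustrative rate, not from the params file

# Half-point line: Over and Under cover every outcome.
half_over = poisson_ou_prob("Over", 1.5, lam)
half_under = poisson_ou_prob("Under", 1.5, lam)

# Whole-number line: the remainder is the probability of landing exactly on it.
int_over = poisson_ou_prob("Over", 2, lam)
int_under = poisson_ou_prob("Under", 2, lam)
push = 1.0 - int_over - int_under

print(round(half_over + half_under, 6), round(push, 4))
```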

### 5.3 Binary Anytime TD
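We model the chance a player scores **at least one** TD as:
\[ P(\text{TD} \ge 1) = 1 - e^{-\lambda} \]
where \( \lambda \) is the player's expected TD count (the `lam` column), i.e. one minus the Poisson probability of zero TDs.
(Some feeds may present this as "Over 0.5" instead of "Yes".)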
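
As a quick check that a "Yes" price and an "Over 0.5" price describe the same event under this model, the sketch below compares the closed form with the Poisson Over probability at 0.5. The function names and the rate are illustrative assumptions.

```python
import math

def anytime_td_prob(lam):
    # P(at least one TD) when the TD count is Poisson(lam).
    return 1.0 - math.exp(-lam)

def poisson_over(t, lam):
    # P(X > t) = 1 - F(floor(t)) for X ~ Poisson(lam).
    k = math.floor(t)
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf

lam = 0.4  # illustrative rate
p_yes = anytime_td_prob(lam)
p_over_half = poisson_over(0.5, lam)
print(abs(p_yes - p_over_half) < 1e-12)
```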


## 6) Example — per‑player probability

In [5]:

# Pick an example from edges that is modeled
if edges is not None:
    e = edges.copy()
    ex = e[e[['model_prob','mkt_prob']].notna().all(axis=1)].head(1)
    if len(ex):
        row = ex.iloc[0].to_dict()
        print("Example row:", {k: row.get(k) for k in ['player','market_std','name','point','mu','sigma','lam','price','model_prob','mkt_prob']})
    else:
        print("No modeled rows found yet — check coverage diagnostics above.")
else:
    print("No edges yet.")


Example row: {'player': 'Jahmyr Gibbs', 'market_std': 'anytime_td', 'name': 'Yes', 'point': nan, 'mu': 0.25, 'sigma': nan, 'lam': 0.25, 'price': -180, 'model_prob': 0.2211992169285951, 'mkt_prob': 0.6428571428571429}
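
The two probabilities in the example row can be reproduced by hand from `lam` and `price`. The snippet re-implements the American-odds conversion from section 1 so it runs on its own. The `gap` line is an assumed, illustrative comparison and not necessarily how the edges file defines an edge. The market figure is the raw implied probability, with no vig removed.

```python
import math

def american_to_prob(price):
    # Raw implied probability from American odds (vig not removed).
    p = float(price)
    if p > 0:
        return 100.0 / (p + 100.0)
    if p < 0:
        return (-p) / ((-p) + 100.0)
    return float("nan")

lam, price = 0.25, -180  # values from the example row above

model_prob = 1.0 - math.exp(-lam)   # Anytime TD: 1 - e^{-lam}
mkt_prob = american_to_prob(price)  # 180 / 280
gap = model_prob - mkt_prob         # assumed comparison, for illustration only

print(f"model_prob={model_prob:.4f} mkt_prob={mkt_prob:.4f} gap={gap:+.4f}")
# model_prob=0.2212 mkt_prob=0.6429 gap=-0.4217
```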
