NFL Content Intelligence Platform

A v2 TypeScript platform for planning, drafting, reviewing, and publishing NFL content.

Overview

This repository now centers on a real application stack:

  • TypeScript application under src/
  • Hono + HTMX dashboard for the editorial workstation
  • SQLite repository for pipeline state, artifacts, stage runs, and usage tracking
  • Multi-provider LLM gateway with pluggable providers and stage-aware model routing
  • Service integrations for Substack, Twitter/X, image generation, and NFL data ingestion

The original v1 markdown-first system has been removed (available in git history for reference).

Quick Start

Prerequisites

  • Node.js 22+
  • npm
  • One LLM option:
    • Copilot via GITHUB_TOKEN or gh auth login
    • LM Studio via local OpenAI-compatible server
    • Mock via MOCK_LLM=1

First run

npm install
npm run v2:init
npm run v2:serve

Open http://localhost:3456.

Environments

The project uses three isolated environments. Each has its own data directory, database, and configuration to prevent test artifacts from polluting production and to ensure auth can't be accidentally disabled on the live site.

| Environment | Data directory | Auth | Port | Launch |
| --- | --- | --- | --- | --- |
| Development | ~/.nfl-lab-dev | off | 3457 | .\dev.ps1 |
| Test | auto-created tmpdir | off | | npm run v2:test |
| Production | ~/.nfl-lab-prod | local (required) | 3456 | .\prod.ps1 or NSSM service |

Development

.\dev.ps1              # source-mode via tsx — no build needed
.\dev.ps1 -Built       # builds first, then runs from dist/
.\dev.ps1 -WithMcp     # opens MCP debug windows alongside the dashboard
.\dev.ps1 -Port 3500   # override the default port
  • Uses ~/.nfl-lab-dev with its own config/.env (auth off, port 3457)
  • Source mode (npm run v2:serve) is the default — no rebuild required
  • -WithMcp opens separate PowerShell windows for npm run mcp:server and npm run v2:mcp; leave it off for normal work

Testing

npm run v2:test                          # full test suite (1600+ tests)
npm run v2:test -- tests/dashboard/      # run a specific directory
npm run v2:test -- -t "search filter"    # run tests matching a pattern
  • Tests automatically create a temporary data directory and force NODE_ENV=test
  • Each test gets a fresh SQLite database — no shared state, no cleanup needed
  • The production database has a safety check that refuses to open if the path looks like a test directory
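The isolation described above can be sketched as a small helper. This is a hypothetical illustration, not the project's actual test setup; the real fixture lives in the test suite and may differ.

```typescript
// Sketch: per-test isolated data directory (hypothetical helper).
import { mkdtempSync, existsSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function createTestDataDir(): string {
  // Fresh, uniquely named directory per call — no shared state between tests.
  const dir = mkdtempSync(join(tmpdir(), "nfl-lab-test-"));
  process.env.NFL_DATA_DIR = dir;
  process.env.NODE_ENV = "test";
  return dir;
}

const dir = createTestDataDir();
console.log(existsSync(dir)); // the directory exists and is unique per call
rmSync(dir, { recursive: true, force: true });
```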

Production

Option A — Manual start:

.\prod.ps1             # builds and starts from dist/
.\prod.ps1 -NoBuild    # skip build, start from existing dist/

Option B — Windows service (recommended):

The production server runs as an NSSM service named NFL-Lab-Prod:

nssm start NFL-Lab-Prod    # start the service
nssm stop NFL-Lab-Prod     # stop the service
nssm restart NFL-Lab-Prod  # restart after config changes

Service configuration:

| Setting | Value |
| --- | --- |
| Path | C:\Program Files\nodejs\node.exe |
| Arguments | dist\cli.js serve |
| Startup directory | C:\github\nfl-eval |
| Environment | NFL_DATA_DIR=~/.nfl-lab-prod, NODE_ENV=production, NFL_PORT=3456 |
| Logs | ~/.nfl-lab-prod/logs/service-stdout.log (rotate at 1 MB) |

Production requires:

  • DASHBOARD_AUTH_MODE=local with credentials set (preflight check will block startup otherwise)
  • A separate data directory from dev/test
  • Credentials stored only in ~/.nfl-lab-prod/config/.env (not in the repo)

Backups

.\backup-db.ps1                   # back up the production database
.\restore-test.ps1                # verify the latest backup is valid
  • Uses SQLite's .backup API for safe online backups (not file copy)
  • Backups are stored in ~/.nfl-lab-prod/backups/ with 30-day retention
  • Schedule daily backups via Windows Task Scheduler
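The 30-day retention policy amounts to pruning any backup file older than the cutoff. A minimal sketch of that decision, with hypothetical names (the real logic lives in backup-db.ps1 and may differ):

```typescript
// Sketch: select backups past the 30-day retention window (hypothetical helper).
const RETENTION_DAYS = 30;

function backupsToPrune(
  backups: { name: string; mtimeMs: number }[],
  nowMs: number = Date.now(),
): string[] {
  const cutoff = nowMs - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  // Anything last modified before the cutoff is eligible for deletion.
  return backups.filter((b) => b.mtimeMs < cutoff).map((b) => b.name);
}
```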

Common environment variables

Each environment stores its .env in {dataDir}/config/.env. The app loads .env from the data directory first, then falls back to the repo root.
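The data-dir-first lookup could be sketched as follows. The helper name and exact candidate paths are hypothetical; the actual loader in src/config/ may differ.

```typescript
// Sketch: resolve the .env path, preferring the data directory over the repo
// root (hypothetical helper).
function resolveEnvPath(
  dataDir: string,
  repoRoot: string,
  exists: (p: string) => boolean,
): string | null {
  const candidates = [
    `${dataDir}/config/.env`, // data directory takes precedence
    `${repoRoot}/.env`,       // repo root is the fallback
  ];
  return candidates.find(exists) ?? null;
}
```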

Core runtime:

  • NFL_DATA_DIR — data directory (~/.nfl-lab-dev, ~/.nfl-lab-prod, etc.)
  • NFL_PORT — dashboard port (default 3456)
  • NFL_LEAGUE — league code (default nfl)
  • NFL_CONTEXT_PRESET — article context preset: rich (default) or balanced

LLM selection:

  • MOCK_LLM=1 — force the mock provider for tests and local UI work
  • LLM_PROVIDER=lmstudio — prefer LM Studio
  • LMSTUDIO_URL — LM Studio base URL, default http://localhost:1234/v1
  • LMSTUDIO_MODEL — optional LM Studio default model override
  • GITHUB_TOKEN — GitHub Models / Copilot auth when using the Copilot provider
  • COPILOT_CLI_MODE — none (default, text-only) or article-tools (guarded web search + repo MCP for Copilot CLI only)
  • COPILOT_CLI_WEB_SEARCH — set to 0 to disable Copilot CLI web search access (default enabled)
  • COPILOT_CLI_MCP_CONFIG — override the repo-scoped Copilot MCP config file (default .copilot/mcp-config.json)
  • COPILOT_CLI_SESSION_REUSE — opt into the guarded Copilot session-reuse experiment; currently traces the request and falls back to one-shot mode

Optional service integrations:

  • GEMINI_API_KEY — image generation
  • SUBSTACK_TOKEN
  • SUBSTACK_PUBLICATION_URL
  • TWITTER_API_KEY

Dashboard auth:

  • DASHBOARD_AUTH_MODE — off (default) or local
  • DASHBOARD_AUTH_USERNAME — required when DASHBOARD_AUTH_MODE=local
  • DASHBOARD_AUTH_PASSWORD — required when DASHBOARD_AUTH_MODE=local
  • DASHBOARD_SESSION_COOKIE — optional cookie name override
  • DASHBOARD_SESSION_TTL_HOURS — session lifetime, default 24
  • DASHBOARD_SECURE_COOKIES — set to true to add the Secure flag to cookies (enable when using TLS)

TLS (optional — enables HTTPS):

  • NFL_TLS_CERT — path to TLS certificate PEM file
  • NFL_TLS_KEY — path to TLS private key PEM file
  • NFL_HTTP_PORT — port for the HTTP→HTTPS redirect server (default 80)

When both NFL_TLS_CERT and NFL_TLS_KEY are set, the server starts in HTTPS mode and launches a secondary HTTP listener that 301-redirects all traffic to HTTPS. Enable DASHBOARD_SECURE_COOKIES=true when using TLS.
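The redirect target built by that secondary listener can be sketched as a pure function. This is a hypothetical helper, not the server's actual implementation:

```typescript
// Sketch: build the 301 redirect target for the HTTP -> HTTPS listener
// (hypothetical helper).
function httpsRedirectTarget(hostHeader: string, url: string, httpsPort: number): string {
  const host = hostHeader.split(":")[0];                // drop any incoming port
  const portSuffix = httpsPort === 443 ? "" : `:${httpsPort}`; // omit the default HTTPS port
  return `https://${host}${portSuffix}${url}`;
}
```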

Dashboard auth

The dashboard stays open by default so existing tests and solo local development continue to work. Production mode enforces auth — the server refuses to start with DASHBOARD_AUTH_MODE=off when NODE_ENV=production.

Auth switches the dashboard to a simple local login flow backed by SQLite sessions and an httpOnly cookie. Protected dashboard pages, HTMX endpoints, JSON APIs, SSE, and unpublished image routes require a valid session. Static assets, /login, and published image URLs remain public.
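The public/protected split described above amounts to a route classification step before the session check. A minimal sketch, with hypothetical path prefixes (the real middleware in src/dashboard/ may use different routes):

```typescript
// Sketch: classify which paths bypass the session check (hypothetical prefixes).
function isPublicPath(path: string): boolean {
  if (path === "/login") return true;                     // login page stays open
  if (path.startsWith("/assets/")) return true;           // static assets
  if (path.startsWith("/images/published/")) return true; // published image URLs
  return false; // everything else requires a valid session
}
```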

Security headers

All responses include:

  • X-Frame-Options: DENY — prevents clickjacking
  • X-Content-Type-Options: nosniff — prevents MIME-type sniffing

Architecture

Dashboard (Hono + HTMX)
  -> Pipeline Engine (state machine)
    -> Agent Runner
      -> LLM Gateway
        -> Providers (Copilot, LM Studio, Mock, others)

SQLite Repository
  -> articles
  -> artifacts
  -> stage transitions / stage runs
  -> editor reviews / publisher pass
  -> usage events and cost tracking

Services
  -> Substack
  -> Twitter/X
  -> Image generation
  -> Data (nflverse and related sources)

Layer summary

  • Dashboard: server-rendered editorial UI in src/dashboard/
  • Pipeline Engine: stage transition rules and validation in src/pipeline/
  • Agent Runner: loads charters, skills, and memory for article work
  • Agent Runner tool loop: app-managed JSON tool calling for in-app agents, with bounded retries and trace capture
  • LLM Gateway: resolves models and routes requests across providers
  • Repository: SQLite-backed persistence in src/db/
  • Services: outbound publishing, media generation, and data adapters in src/services/
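The bounded-retry tool loop mentioned in the layer summary can be sketched as follows. The types and retry bound here are hypothetical; the actual AgentRunner in src/agents/ may differ.

```typescript
// Sketch: app-managed JSON tool calling with bounded retries (hypothetical shape).
type ToolCall = { tool: string; args: unknown };

function runToolLoop(
  step: (attempt: number) => ToolCall | { error: string },
  maxRetries = 3,
): ToolCall {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = step(attempt);
    if (!("error" in result)) return result; // valid JSON tool call — done
    // Invalid output: retry up to the bound; a trace entry would be captured here.
  }
  throw new Error(`tool loop exceeded ${maxRetries} retries`);
}
```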

Configuration

LLM providers

The v2 dashboard can run with different providers:

  • Copilot — default when GitHub auth is available
  • LM Studio — local OpenAI-compatible endpoint
  • Mock — deterministic test/development provider

The broader gateway also includes additional provider implementations under src/llm/providers/.

Model routing

Stage-aware routing lives in:

  • ~/.nfl-lab/config/models.json

ModelPolicy uses that file to resolve stage defaults, depth-aware panel sizing, token budgets, and tier precedence before requests hit the gateway.
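The stage-default-then-tier-precedence resolution could be sketched like this. The config shape and field names are hypothetical; the actual models.json schema and ModelPolicy implementation may differ.

```typescript
// Sketch: stage-aware model resolution with tier precedence (hypothetical schema).
type ModelsConfig = {
  stageDefaults: Record<string, string>; // stage -> preferred model
  tiers: string[];                       // precedence order, first available wins
  tierModels: Record<string, string>;    // tier -> model
};

function resolveModel(stage: string, config: ModelsConfig, available: Set<string>): string {
  const preferred = config.stageDefaults[stage];
  if (preferred && available.has(preferred)) return preferred; // stage default wins
  for (const tier of config.tiers) {
    const model = config.tierModels[tier];
    if (model && available.has(model)) return model; // fall back by tier precedence
  }
  throw new Error(`no available model for stage ${stage}`);
}
```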

Agent charters and skills

Agent knowledge is loaded from the data directory at runtime. On a fresh install, npm run v2:init seeds default charters, skills, and bootstrap memory.

Skills can also advertise tool groups in frontmatter. The runtime resolves those through a shared registry used by both the in-app AgentRunner and the MCP surfaces, so dashboard/pipeline agents and MCP clients see the same tool catalog.

  • Charters: ~/.nfl-lab/agents/charters/{league}/ — agent identity and boundaries
  • Skills: ~/.nfl-lab/agents/skills/ — workflow instructions and output formats
  • Memory: ~/.nfl-lab/agents/memory.db — persistent learnings, decisions, and domain knowledge (storage remains, but live prompt injection is currently disabled)

See docs/knowledge-system.md for the full knowledge architecture, bootstrap process, and multi-league extensibility guide.

The repo also includes a proof-of-concept structured knowledge slice for issue #85:

  • Glossary seeds: src/config/defaults/glossaries/
  • Team identity sheets: content/data/team-sheets/

Dashboard Pages

The editorial dashboard includes:

  • Home (/) — ready-to-publish queue, pipeline summary, recent ideas, recent publishing activity, filters
  • Article Detail (/articles/:id) — action panel, artifacts, revision history, usage, metadata, publishing checks, and trace access
  • New Idea (/ideas/new) — prompt-driven article creation with team selection and optional auto-advance
  • Config (/config) — active provider, model routing, runtime paths, prompt inventory, env var status, and refresh-all maintenance
  • Trace Pages (/articles/:id/traces, /traces/:id) — dedicated LLM trace timelines and per-trace inspection

Legacy Agents, Memory, and Runs dashboard pages were removed from the primary UI surface. Memory storage still exists for bootstrap/refresh workflows, but the dashboard now represents it as a deprecated backend capability rather than an active browser surface.

Pipeline Stages

Every article moves through the same eight-stage pipeline:

  1. Idea Generation — create the article concept and metadata
  2. Discussion Prompt — generate the central framing question
  3. Panel Composition — choose the right experts/agents for the topic
  4. Panel Discussion — gather the core analysis and disagreements
  5. Article Drafting — turn analysis into a draft
  6. Editor Pass — review for accuracy, issues, and revision needs
  7. Publisher Pass — finalize metadata and prep for publication
  8. Published — live on the target publication
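The eight-stage progression can be sketched as a strictly forward, one-step state machine. The stage slugs below are hypothetical labels for the stages above; the real engine in src/pipeline/ also validates artifacts and reviews at each transition.

```typescript
// Sketch: forward-only, one-step stage transitions (hypothetical stage slugs).
const STAGES: readonly string[] = [
  "idea", "discussion_prompt", "panel_composition", "panel_discussion",
  "drafting", "editor_pass", "publisher_pass", "published",
];

function canAdvance(from: string, to: string): boolean {
  const i = STAGES.indexOf(from);
  const j = STAGES.indexOf(to);
  return i >= 0 && j === i + 1; // only the immediately next stage is legal
}
```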

Repository Layout

src/
  agents/       Agent runner + memory integration
  cli/          CLI helpers and export logic
  config/       App config + seeded defaults
  dashboard/    Hono server, HTMX handlers, views, SSE
  db/           SQLite repository and artifact store
  llm/          Gateway, model policy, providers
  migration/    v1 -> v2 migration support
  pipeline/     Engine, scheduler, actions, audit
  services/     Substack, Twitter, image, markdown, data
.squad/         Squad team config, agent charters, decisions, skills
mcp/            Legacy/local MCP entrypoints and smoke tests
tests/          Unit, integration, and e2e coverage
dev.ps1         Development server launcher (source-mode, isolated data dir)
prod.ps1        Production server launcher (builds first, enforces auth)
backup-db.ps1   SQLite online backup with 30-day retention
restore-test.ps1  Backup verification and integrity check
ralph-watch.ps1   Local Ralph outer loop (PowerShell)

Services and MCP Tools

Runtime and publishing integrations live in src/services/.

Key areas:

  • Substack publishing workflows
  • Twitter/X promotion support
  • Image generation and render helpers
  • Data services for nflverse-backed analysis
  • MCP tooling via npm run v2:mcp and legacy tooling under mcp/

The current tool stack uses one shared catalog:

  • src/agents/local-tools.ts — in-process executor and tool filtering for in-app agent runs
  • src/tools/pipeline-tools.ts — pipeline tool definitions shared with the v2 MCP server
  • mcp/local-tool-registry.mjs — extension-backed local tool registry shared by the legacy/local MCP server

For the legacy extension-oriented MCP tooling, see .github/extensions/README.md. For the main dashboard runtime, prefer the v2 CLI commands over the archived v1 dashboard scripts in package.json.

Development

Commands

npm test
npm run v2:test
npm run v2:build
npm run v2:dev
npm run v2:status

Retrospective digest workflow

Use the manual retrospective digest when you want to mine recent article retrospectives into bounded, human-reviewable follow-up work.

npx tsx src/cli.ts retrospective-digest --limit 25
npx tsx src/cli.ts retro-digest --json --limit 10
  • Reads structured data from article_retrospectives and article_retrospective_findings
  • Produces a bounded digest with:
    • issue-ready process-improvement candidates
    • learning-update candidates
    • grouped supporting evidence by role + finding type
  • Stays read-only in v1: review the output first, then manually turn approved items into GitHub issues or decision/knowledge updates
  • Use --json when you want the same bounded report shape for downstream tooling or copy/paste review
  • Use --limit N to control how many recent retrospectives are scanned

Recommended operator loop:

  1. Run the digest on demand after a meaningful batch of recent article completions.
  2. Review the promoted candidate sections before the grouped evidence section.
  3. Manually promote approved process changes into issues and reusable learnings into team docs/decision inbox entries.

Test layout

  • tests/llm/ — provider, routing, and model policy tests
  • tests/pipeline/ — state machine and transition behavior
  • tests/dashboard/ — rendering and endpoint coverage
  • tests/e2e/ — end-to-end article workflow behavior

Run the full suite with:

npx vitest run

Squad — AI Team Coordination

This project uses Squad to coordinate a team of AI agents that manage the issue backlog, write code, and move the project board.

Team Roster

| Agent | Role | Scope |
| --- | --- | --- |
| Lead | Triage & architecture | Coordination, cross-functional work, design decisions |
| Code | Core developer | TypeScript, Hono, vitest, code reviews, refactoring |
| Data | Data engineer | nflverse queries, Python, NFL analytics, statistical analysis |
| Publisher | Content distribution | Substack publishing, Twitter/X, Markdown→HTML |
| Research | Documentation & analysis | Tech research, knowledge management, reports |
| DevOps | Infrastructure | GitHub Actions, CI/CD, MCP tools, .github/extensions/ |
| UX | Dashboard & frontend | HTMX views, SSE, CSS, user experience |
| Ralph | Work monitor | Issue queue scanning, project board automation, heartbeat |
| Scribe | Session logger | Decisions, meeting notes, cross-agent context |

Agent charters live in .squad/agents/*/charter.md. Routing rules are in .squad/routing.md.

Note: These Squad agents handle project coordination. The 47 article pipeline agents in src/config/defaults/charters/nfl/ are a separate system loaded by the pipeline engine for content production.

GitHub Issues + Project Board

Issues are the task system. Create an issue with the squad label and the team handles the rest.

Project board: github.com/users/JDL440/projects/1

| Status | Meaning |
| --- | --- |
| Todo | New work ready to start |
| In Progress | Agent actively working on it |
| Pending User | Needs human decision or input |
| Blocked | Cannot proceed (blocker in comments) |
| For Review | PR created, ready for review |
| Done | Completed and merged |

Labels for routing:

| Label | Routes to |
| --- | --- |
| squad | General — Lead triages |
| squad:code | Code agent |
| squad:data | Data agent |
| squad:publisher | Publisher agent |
| squad:research | Research agent |
| squad:devops | DevOps agent |
| squad:ux | UX agent |
| squad:ralph | Ralph (work monitor) |
| squad:lead | Lead agent |
| squad:scribe | Scribe agent |

Talking With Your Squad

Every agent writes comments on the issue thread — analysis, questions, progress updates. Each comment starts with a TLDR so you can skim.

Workflow:

  1. Read the TLDR
  2. Reply with instructions or guidance in the issue comments
  3. Set status back to "Todo" if more work is needed
  4. The agent picks it up on the next Ralph cycle

The Ralph Loop

Ralph watches the issue queue and spawns agents for actionable work. Two modes:

Local (PowerShell):

.\ralph-watch.ps1
  • Runs every 5 minutes with a system-wide mutex guard (single instance)
  • Pulls latest code before each round
  • Spawns copilot --agent squad with a parallelism-maximizing prompt
  • Structured logging to ~/.squad/ralph-watch.log
  • Heartbeat file at ~/.squad/ralph-heartbeat.json

GitHub Actions:

The squad-heartbeat.yml workflow runs on a cron schedule (*/30 * * * *) to scan for untriaged issues, auto-route to agents, and reconcile the pipeline.

Creating Tasks

Create an issue with:

  • A descriptive title
  • The squad label (or a specific squad:* label for direct routing)
  • Optionally: priority:p0 / priority:p1 / priority:p2
  • Optionally: type:bug / type:feature / type:chore / type:docs

That's it. Ralph picks it up, assigns the right agent, moves the board, and reports back.

Squad File Layout

.squad/
  team.md              Team roster (parsed by workflows for label routing)
  routing.md           Keyword → agent routing rules
  decisions.md         Append-only decision ledger
  ceremonies.md        Team rituals and cadences
  agents/
    lead/charter.md    Agent identity, scope, and behavior rules
    code/charter.md
    data/charter.md
    devops/charter.md
    publisher/charter.md
    research/charter.md
    ux/charter.md
    ralph/charter.md
    scribe/charter.md
  skills/
    github-project-board/SKILL.md   Project board status workflow + IDs
  casting/             Agent creation/retirement registry
  identity/            Team identity state
  log/                 Append-only activity log
  orchestration-log/   Cross-agent coordination log

History

The original v1 implementation has been removed from the working tree (available in git history for reference). All active development targets the v2 TypeScript application in src/.
