AgentProbe — AI Chatbot Vulnerability Scanner

Live: https://agent-probe.vercel.app

Scans AI-powered chatbots embedded on websites for prompt injection vulnerabilities — the way a real attacker would, through the browser.

Enter a URL, and a browser agent navigates to the site, detects the chat widget, and runs 20 priority-sampled attacks from a pool of 45 research-backed payloads across 6 categories. Claude judges each response and produces an A–F vulnerability report.

Architecture

User → Vercel (Next.js frontend) → Railway (FastAPI backend) → Browserbase/Stagehand (browser) → Target website
                                                              → Anthropic Claude (LLM judge)
                                                              → Google Gemini (browser automation)

Backend (`backend/`)

FastAPI — WebSocket scan orchestration, report generation
Stagehand SDK + Gemini 2.5 Flash — browser automation via Browserbase (find widgets, type messages, read responses)
Claude Sonnet 4.6 — judges each chatbot response as VULNERABLE / PARTIAL / RESISTANT
45 research-backed attacks across 6 categories with priority rankings (20 sampled per scan)

Frontend (`frontend/`)

Next.js + Tailwind — scan input, real-time progress feed, vulnerability report
WebSocket — streams scan events live (attack details, verdicts, timing)
Rich report — expandable findings with payload, response, verdict, and reference links

Attack Categories

6 categories derived from the OWASP Top 10 for LLM Applications, focused on vulnerabilities testable through black-box prompt injection:

Category	OWASP Mapping	What it tests
System Prompt Extraction	LLM01: Prompt Injection	Can the chatbot be tricked into revealing its hidden instructions or system prompt?
Goal Hijacking	LLM01: Prompt Injection	Can the chatbot be redirected to perform unintended tasks or ignore its original purpose?
Data Leakage	LLM06: Sensitive Information Disclosure	Does the chatbot expose internal data, credentials, RAG sources, or architecture details?
Guardrail Bypass	LLM01 + LLM07: Insecure Plugin Design	Can safety filters and topic restrictions be circumvented via encoding, roleplay, or emotional manipulation?
Insecure Output Handling	LLM02: Insecure Output Handling	Could the chatbot's output be exploited for XSS, markdown injection, or phishing?
Indirect Prompt Injection	LLM01: Prompt Injection	Is the chatbot susceptible to hidden instructions embedded in content it retrieves or processes?

Scoring & Grading

Each category is weighted by real-world threat severity:

Category	Weight
System Prompt Extraction	0.25
Data Leakage	0.20
Indirect Prompt Injection	0.20
Goal Hijacking	0.15
Insecure Output Handling	0.10
Guardrail Bypass	0.10

Grades: A (0–0.1), B (0.1–0.3), C (0.3–0.5), D (0.5–0.7), F (0.7–1.0)

Local Development

Prerequisites

Python 3.11+
Node.js 18+
Anthropic API key
Google API key (for Gemini / Stagehand)
Browserbase API key + project ID

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install fastapi "uvicorn[standard]" playwright anthropic websockets python-dotenv httpx stagehand

# Create .env with your keys
cp .env.example .env
# Edit .env with your API keys

uvicorn main:app --port 8000 --reload

Frontend

cd frontend
npm install
npm run dev

Visit http://localhost:3000

Deployment

Both services auto-deploy from main branch on push to GitHub.

Backend (Railway)

URL: https://cornell-ai-hack-production.up.railway.app
Auto-deploys from main on push
Dockerfile: backend/Dockerfile
Environment variables:
- ANTHROPIC_API_KEY — Claude judge
- GOOGLE_API_KEY — Stagehand (Gemini 2.5 Flash)
- BROWSERBASE_API_KEY
- BROWSERBASE_PROJECT_ID
- ALLOWED_ORIGINS = https://agent-probe.vercel.app

Frontend (Vercel)

URL: https://agent-probe.vercel.app
Auto-deploys from main on push
Root directory: frontend
Environment variables:
- NEXT_PUBLIC_WS_URL = wss://cornell-ai-hack-production.up.railway.app/ws/scan

Deploy workflow

# Make changes, commit, push — both services auto-deploy
git add -A && git commit -m "your changes" && git push origin main

References

Attack payloads sourced from:

Academic Research

Industry & Standards

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
.claude		.claude
backend		backend
docs		docs
frontend		frontend
research		research
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
railway.toml		railway.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AgentProbe — AI Chatbot Vulnerability Scanner

Architecture

Backend (`backend/`)

Frontend (`frontend/`)

Attack Categories

Scoring & Grading

Local Development

Prerequisites

Backend

Frontend

Deployment

Backend (Railway)

Frontend (Vercel)

Deploy workflow

References

Built at Cornell AI Hackathon 2026

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Languages

Folders and files

Latest commit

History

Repository files navigation

AgentProbe — AI Chatbot Vulnerability Scanner

Architecture

Backend (backend/)

Frontend (frontend/)

Attack Categories

Scoring & Grading

Local Development

Prerequisites

Backend

Frontend

Deployment

Backend (Railway)

Frontend (Vercel)

Deploy workflow

References

Built at Cornell AI Hackathon 2026

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Languages

Backend (`backend/`)

Frontend (`frontend/`)

Packages