A local application with UI for benchmarking multiple Small Language Models (SLMs) running via Microsoft Foundry Local.
Read the full story: How we built FLPerformance - Learn about the architecture decisions, challenges faced, and how to get real-world LLM performance metrics on your local hardware.
Windows users: If you have Node.js installed, just run .\START_APP.ps1 to start everything! Opens 2 terminals + browser automatically.
- Complete Benchmark System: Full end-to-end benchmarking with accurate metrics
- Enhanced Visualizations: Performance cards, comparison charts, and radar graphs
- Real-time Progress: Polling-based status updates every 2 seconds during runs
- Results Export: JSON and CSV export functionality
- Hardware Detection: Comprehensive system information capture
- Storage System: JSON-based storage with optional SQLite support
FLPerformance (Foundry Local Performance) enables you to:
- Manage Foundry Local service using the official JavaScript SDK
- Load and benchmark multiple models simultaneously
- Run standardized benchmark tests across models
- Display clear performance statistics with tables and charts
- Export results for analysis
Required: Install Microsoft Foundry Local first

```bash
# Windows
winget install Microsoft.FoundryLocal

# macOS
brew tap microsoft/foundrylocal
brew install foundrylocal

# Or download from: https://aka.ms/foundry-local-installer
```

Verify installation:

```bash
foundry --version
```

Step 1: Navigate to project directory

```bash
cd C:\Users\YourUsername\path\to\FLPerformance
```

Step 2: Install Node.js (if not already installed)
```bash
# Windows - Install Node.js LTS
winget install --id OpenJS.NodeJS.LTS --accept-package-agreements --accept-source-agreements

# After installation, RESTART YOUR TERMINAL for PATH updates
```

macOS:

```bash
brew install node
```

Or download from: https://nodejs.org/
Step 3: Run installation script

```bash
# Windows
.\scripts\install.ps1

# macOS/Linux
chmod +x scripts/install.sh && ./scripts/install.sh
```

Note: Installation uses the --no-optional flag to skip the SQLite database (requires build tools). Results are saved as JSON files instead. This works perfectly for all features!
Step 4: Start the application

```bash
# Easy Mode - Opens 2 terminals + browser automatically (Windows)
.\START_APP.ps1

# Manual Mode - Starts both servers
npm run dev
```

Once the server starts, open your browser:
You'll see:
- Models tab - Add and load AI models
- Benchmarks tab - Run performance tests
- Results tab - View comparison charts
- Click Models → Initialize Foundry Local (one-time setup)
- Click Add Model → Select phi-3-mini-4k-instruct
- Click Load Model (downloads ~2GB, takes 2-5 minutes)
- Go to Benchmarks → Select your model → Run Benchmark
- View results in Results tab
- Microsoft Foundry Local
  - Download from: https://aka.ms/foundry-local-installer
  - Verify installation: `foundry --version`
  - Note: Foundry Local CLI must be in your PATH
- Node.js & NPM
  - Node.js v18 or higher
  - NPM v9 or higher
  - Download from: https://nodejs.org/
  - Verify: `node --version` and `npm --version`
- System Requirements
  - Windows 10/11, macOS, or Linux
  - Minimum 16GB RAM (32GB+ recommended for multiple models)
  - GPU with CUDA support (optional but recommended)
  - Adequate disk space for model storage (varies by model, typically 5-50GB per model)
If the automated script doesn't work:
```bash
npm install --no-optional

# Install frontend dependencies
cd src/client
npm install
cd ../..

# Create results directory
mkdir results
```

Want SQLite database support? Install Visual Studio Build Tools first:

```bash
# Windows only - needed for better-sqlite3
winget install Microsoft.VisualStudio.2022.BuildTools --silent --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools"

# Then install with optional dependencies
npm install

# Create results directory
mkdir results
```

```bash
# Development mode (with hot reload)
npm run dev
```

Access the application at: http://localhost:3000
The application will be available at:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:3001
- Open the UI at http://localhost:3000
- Navigate to the Models tab
- Click "Initialize Foundry Local" to start the service
- Click "Add Model"
- Select a model from the available Foundry Local catalog (e.g., phi-3-mini-4k-instruct)
- Click "Load Model" to download (if needed) and load the model into memory
Note: Foundry Local uses a single service instance that can load multiple models simultaneously. Models are differentiated by their model ID when making inference requests.
- Navigate to the Benchmarks tab
- Select the "default" benchmark suite
- Choose one or more models to benchmark
- Configure settings (iterations, concurrency, etc.)
- Click "Run Benchmark"
- Watch live progress as tests execute
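The live progress display is driven by the 2-second status polling mentioned above. A minimal sketch of such a loop — the `getStatus` callback and the `{ status, progress }` shape are illustrative assumptions, not the app's actual API:

```javascript
// Poll a status callback until the run finishes or attempts run out.
// `getStatus` is any async function returning { status, progress };
// in the real app this would wrap a fetch to the backend's status endpoint.
async function pollStatus(getStatus, { intervalMs = 2000, maxAttempts = 300 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await getStatus();
    if (state.status === 'completed' || state.status === 'failed') return state;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Benchmark run did not finish in time');
}

// Demo with a stubbed status source that completes on the third call.
let calls = 0;
const fakeStatus = async () =>
  ++calls < 3 ? { status: 'running', progress: calls * 40 } : { status: 'completed', progress: 100 };

pollStatus(fakeStatus, { intervalMs: 10 }).then((final) => {
  console.log(final.status); // "completed"
});
```

Polling (rather than websockets) keeps the backend stateless between requests, at the cost of up to one interval of display lag.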
- Navigate to the Results tab
- View comparison tables and charts
- Filter by run, model, or benchmark type
- Export results as JSON or CSV
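CSV export from JSON results amounts to flattening result objects into rows. A sketch — the field names (`model`, `tps`, `ttftMs`) are illustrative, not the app's actual schema:

```javascript
// Convert an array of flat result objects to CSV text.
// Values containing commas, quotes, or newlines are quoted per RFC 4180.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (value) => {
    const s = String(value ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = rows.map((row) => headers.map((h) => escape(row[h])).join(','));
  return [headers.join(','), ...lines].join('\n');
}

// Example with made-up benchmark rows:
const csv = toCsv([
  { model: 'phi-3-mini-4k-instruct', tps: 42.1, ttftMs: 180 },
  { model: 'another-model', tps: 35.7, ttftMs: 220 },
]);
console.log(csv.split('\n')[0]); // "model,tps,ttftMs"
```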
```
FLPerformance/
├── src/
│   ├── server/               # Backend API
│   │   ├── index.js          # Express server entry point
│   │   ├── orchestrator.js   # Foundry Local service orchestration
│   │   ├── benchmark.js      # Benchmark engine
│   │   ├── storage.js        # Results storage (JSON + SQLite)
│   │   └── logger.js         # Structured logging
│   └── client/               # Frontend UI (React/Vue)
│       ├── public/
│       └── src/
│           ├── components/   # UI components
│           ├── pages/        # Page views
│           └── utils/        # Client utilities
├── benchmarks/
│   └── suites/
│       └── default.json      # Default benchmark suite definition
├── docs/
│   ├── ARCHITECTURE.md       # System architecture
│   ├── API.md                # REST API reference
│   ├── SETUP.md              # Setup documentation
│   └── BENCHMARK_GUIDE.md    # Troubleshooting guide
├── scripts/
│   └── helpers/              # Utility scripts
├── results/
│   └── example/              # Example benchmark results
├── package.json
└── README.md
```
- Unified service management using foundry-local-sdk
- Add/remove models from Foundry Local catalog
- Load multiple models simultaneously in a single service
- Monitor model health and status in real-time
- Automatic model download and caching
- Throughput (TPS): Tokens generated per second (overall)
- Latency: Time to first token (TTFT), time per output token (TPOT), and end-to-end completion time
- Generation Speed (GenTPS): Token generation rate after first token (1000/TPOT)
- Stability: Error rate and timeout tracking
- Resource Usage: CPU, RAM, and GPU utilization (platform-dependent)
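These metrics follow directly from per-request timestamps. A sketch of the arithmetic, assuming timestamps in milliseconds (the `computeMetrics` helper and its field names are illustrative, not the app's actual code):

```javascript
// Derive TTFT, TPOT, GenTPS, and overall TPS from one request's timing data.
function computeMetrics({ startMs, firstTokenMs, endMs, tokensGenerated }) {
  const ttftMs = firstTokenMs - startMs;              // time to first token
  const totalMs = endMs - startMs;                    // end-to-end latency
  const tpotMs = tokensGenerated > 1
    ? (endMs - firstTokenMs) / (tokensGenerated - 1)  // ms per output token after the first
    : 0;
  return {
    ttftMs,
    totalMs,
    tpotMs,
    genTps: tpotMs > 0 ? 1000 / tpotMs : 0,           // generation rate after first token
    overallTps: (tokensGenerated / totalMs) * 1000,   // tokens per second, end to end
  };
}

const m = computeMetrics({ startMs: 0, firstTokenMs: 200, endMs: 1200, tokensGenerated: 101 });
console.log(m.ttftMs, m.tpotMs, m.genTps); // 200 10 100
```

Note that overall TPS is always lower than GenTPS because it amortizes the first-token wait over the whole request.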
- Side-by-side model comparison tables
- Interactive charts for TPS, latency distributions (p50/p95/p99), error rates
- "Best model for..." recommendations based on metrics
- Export results as JSON or CSV
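The p50/p95/p99 latency figures in the charts can be produced with a simple nearest-rank percentile over the per-request samples; a sketch (the helper is illustrative, not the app's actual implementation):

```javascript
// Nearest-rank percentile over a list of latency samples (ms).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 95, 210, 180, 150, 99, 310, 140, 160, 130];
console.log(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99));
// 140 310 310
```

With only a handful of iterations, p95 and p99 collapse onto the worst sample (as above), which is why higher iteration counts give more meaningful tail latencies.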
Default settings can be modified in the Settings tab:
- Default iterations per benchmark
- Concurrency level
- Request timeout values
- Results storage path
- Streaming mode (if supported)
FLPerformance uses the official foundry-local-sdk JavaScript package to manage the Foundry Local service:
- Single Service Instance: One Foundry Local service handles all models
- Multiple Loaded Models: Models are loaded on-demand and run simultaneously
- OpenAI-Compatible API: Standard OpenAI client for inference requests
- Model Differentiation: Models are identified by their model ID in API calls
See Architecture Documentation for details.
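Since one service instance hosts every loaded model, inference requests are routed purely by the model ID in the OpenAI-compatible payload. A sketch of building such a request body — the endpoint path follows the standard OpenAI chat-completions convention, port 8080 is Foundry Local's default, and the helper itself is hypothetical:

```javascript
// Build an OpenAI-compatible chat completion request for a given model ID.
// The same endpoint serves every loaded model; only `model` differs.
function buildChatRequest(modelId, prompt, { maxTokens = 256, stream = false } = {}) {
  return {
    url: 'http://localhost:8080/v1/chat/completions',
    body: {
      model: modelId, // selects which loaded model handles the request
      messages: [{ role: 'user', content: prompt }],
      max_tokens: maxTokens,
      stream, // streaming is what makes time-to-first-token measurable
    },
  };
}

const req = buildChatRequest('phi-3-mini-4k-instruct', 'Say hello.');
console.log(req.body.model); // "phi-3-mini-4k-instruct"
```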
- Ensure Foundry Local is installed: `foundry --version`
- Verify Foundry Local CLI is in your PATH
- Check that port 8080 is available (default Foundry Local port)
- View logs in the Models tab for specific error messages
- Verify sufficient disk space for model download
- Check network connectivity for first-time downloads
- Ensure adequate RAM for model size
- Try manually loading with the Foundry Local CLI: `foundry model run <model-name>`
- Increase timeout values in Settings
- Reduce concurrency level
- Check system resource availability (RAM, GPU memory)
- Use the Test button in the Models tab to verify inference works
- Successful test ensures model will work in benchmarks
- Test validates both model loading and inference response
- Quick way to catch configuration issues early
- Run the appropriate installation script (install.ps1 or install.sh) for detailed diagnostics
- Check Quick Start Guide for common installation issues
- Verify Node.js version: `node --version` (must be v18+)
For more detailed information, see:
- Quick Start Guide - Comprehensive getting started guide
- Quick Reference - Commands and code patterns cheat sheet
- Architecture Documentation - System design and SDK integration
- API Reference - REST API endpoint documentation
- Setup Guide - Detailed installation and configuration
- Benchmark Guide - Troubleshooting and testing guide
- Testing Checklist - Comprehensive test cases
For issues or questions:
- Check the documentation in /docs
- Review logs in the UI under each service
- Examine results in the /results directory
MIT License