FitBuddy is an AI-powered personal trainer that provides real-time video and audio feedback on exercise posture and form. The agent can see users through their camera and offer helpful, encouraging guidance during workouts.
- Real-time Video Analysis: See user posture and provide immediate feedback
- Voice Interaction: Natural conversation with an American accent and a friendly tone
- Exercise Guidance: Offer helpful feedback on posture and form
- Multimedia Sessions: Video streaming with bidirectional audio
- Session Management: Configurable session limits and timeouts
- Framework: Built with vision-agents for multimedia AI agents
- Web Server: FastAPI for HTTP API endpoints
- Video Streaming: Stream.io for edge video transport
- AI Processing: OpenAI Realtime for language understanding
- Audio Processing: Deepgram for speech-to-text and text-to-speech
- Language: Python 3.12+ with dependency management via uv
- Python 3.12 or higher
- uv (Python package manager)
- API keys for required services (see Configuration section)
- Clone the repository:
    git clone <repository-url>
    cd fitbuddy
- Install dependencies:
    uv sync
- Activate the virtual environment (optional, since uv run uses the project environment automatically):
    source .venv/bin/activate
- Copy the environment template:
    cp .env.example .env
- Edit the .env file with your API keys:
    OPENAI_API_KEY=your_openai_api_key
    STREAM_API_KEY=your_stream_api_key_here
    STREAM_API_SECRET=your_stream_api_secret_here
    DEEPGRAM_API_KEY=your_deepgram_api_key_here
- Optional configuration (default values shown):
    MAX_CONCURRENT_SESSIONS=5
    MAX_SESSIONS_PER_CALL=1
    MAX_SESSION_DURATION_SECONDS=3600
    AGENT_IDLE_TIMEOUT=120
The instructions for the AI agent are in the personal_trainer.md file.
Start FitBuddy in development mode:
    uv run main.py run
To run the HTTP API server instead:
    uv run main.py serve
The server will start and be ready to handle incoming calls and sessions.
- Session Start: When a user joins, the AI personal trainer greets them
- Exercise Monitoring: The agent watches the user's posture through video
- Real-time Feedback: Provides voice guidance on form and technique
- Interactive Dialogue: Users can ask questions and receive responses
- Session End: Automatically disconnects after timeout or when session completes
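The session-end rule in the flow above can be sketched in a few lines. This is only an illustration of the idea (disconnect when the session is idle too long or exceeds its maximum duration); `SessionClock` and its methods are hypothetical names and are not part of vision-agents or of main.py.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SessionClock:
    """Hypothetical helper: tracks session start and last user activity."""

    max_duration_s: float = 3600.0  # mirrors MAX_SESSION_DURATION_SECONDS
    idle_timeout_s: float = 120.0   # mirrors AGENT_IDLE_TIMEOUT
    started_at: float = field(default_factory=time.monotonic)
    last_activity_at: float = field(default_factory=time.monotonic)

    def touch(self, now=None):
        """Record user activity (speech, a question, and so on)."""
        self.last_activity_at = time.monotonic() if now is None else now

    def should_disconnect(self, now=None):
        """Return True once either limit has been reached."""
        now = time.monotonic() if now is None else now
        too_long = now - self.started_at >= self.max_duration_s
        idle = now - self.last_activity_at >= self.idle_timeout_s
        return too_long or idle
```

Passing `now` explicitly keeps the check easy to test; in a running agent the defaults based on `time.monotonic()` would be used instead.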
The AI personal trainer:
- Speaks with a friendly, encouraging tone
- Uses an American accent
- Focuses on posture and exercise form
- Provides constructive feedback
- Maintains engaging conversation
- MAX_CONCURRENT_SESSIONS: Maximum total concurrent agent sessions (default: 5)
- MAX_SESSIONS_PER_CALL: Agents per individual call (default: 1)
- MAX_SESSION_DURATION_SECONDS: Maximum session length in seconds (default: 3600)
- AGENT_IDLE_TIMEOUT: Disconnect after inactivity in seconds (default: 120)
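As a rough sketch of how these four settings might be read with their documented defaults, the snippet below parses them from the environment using only the standard library. The `SessionLimits` class and `load_session_limits` function are hypothetical names, and the "must be a positive integer" check is an assumption rather than documented FitBuddy behavior.

```python
import os
from dataclasses import dataclass

# Defaults copied from the configuration list above.
DEFAULTS = {
    "MAX_CONCURRENT_SESSIONS": 5,
    "MAX_SESSIONS_PER_CALL": 1,
    "MAX_SESSION_DURATION_SECONDS": 3600,
    "AGENT_IDLE_TIMEOUT": 120,
}


@dataclass(frozen=True)
class SessionLimits:
    max_concurrent_sessions: int
    max_sessions_per_call: int
    max_session_duration_seconds: int
    agent_idle_timeout: int


def load_session_limits(env=None):
    """Build SessionLimits from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    values = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, "").strip()
        if not raw:
            # Unset or empty: fall back to the documented default.
            values[name.lower()] = default
            continue
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
        if value <= 0:  # assumption: non-positive limits are rejected
            raise ValueError(f"{name} must be positive, got {value}")
        values[name.lower()] = value
    return SessionLimits(**values)
```

Calling `load_session_limits()` with no arguments reads the process environment, so values loaded from .env (for example via python-dotenv) would be picked up.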
The system tracks:
- User interruptions during agent speech
- Turn completion and speaking duration
- Session metrics and logs
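To make the tracked quantities concrete, here is a minimal sketch of a per-session counter for interruptions and turn durations. It does not reflect how FitBuddy or vision-agents actually collects these metrics; `SessionMetrics` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMetrics:
    """Hypothetical per-session counters for interruptions and turns."""

    interruptions: int = 0
    turn_durations_s: list[float] = field(default_factory=list)

    def record_interruption(self):
        """Count one user interruption during agent speech."""
        self.interruptions += 1

    def record_turn(self, duration_s):
        """Store the speaking duration of one completed turn."""
        if duration_s < 0:
            raise ValueError("duration_s must be non-negative")
        self.turn_durations_s.append(float(duration_s))

    def summary(self):
        """Return aggregate numbers suitable for logging."""
        turns = len(self.turn_durations_s)
        total = sum(self.turn_durations_s)
        return {
            "turns": turns,
            "interruptions": self.interruptions,
            "total_speaking_s": total,
            "mean_turn_s": total / turns if turns else 0.0,
        }
```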
fitbuddy/
├── main.py # Main application server
├── personal_trainer.md # Agent instructions
├── .env.example # Environment template
├── pyproject.toml # Project dependencies
└── README.md # This file
Key packages:
- vision-agents[deepgram,getstream,openai] - Core agent framework
- fastapi - Web server
- python-dotenv - Environment management
- deepgram - Audio processing
- openai - AI language model
- API Key Errors: Ensure all required API keys are properly set in .env
- Connection Issues: Check network connectivity and API service status
- Audio/Video Problems: Verify camera and microphone permissions
- Session Timeouts: Adjust timeout values in configuration if needed
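For the API key case, a quick pre-flight check along these lines may help narrow things down. It is a sketch, not part of FitBuddy: `missing_api_keys` is a hypothetical helper, and treating values that start with `your_` as unset is a heuristic based on the placeholder values in .env.example.

```python
import os

# The four keys listed in the .env setup step.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "DEEPGRAM_API_KEY",
)


def missing_api_keys(env=None):
    """Return the required keys that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # Heuristic: the template placeholders all start with "your_".
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    problems = missing_api_keys()
    if problems:
        print("Missing or placeholder API keys:", ", ".join(problems))
    else:
        print("All required API keys are set.")
```

Note that this reads the process environment only; if the keys live in .env, they need to be loaded first (for example with python-dotenv's `load_dotenv()`).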