Marvin is a realtime voice-controlled assistant designed to interact with a map interface. It allows users to perform actions such as zooming to locations, adding markers, finding routes, and toggling map layers using voice commands. This project integrates a web-based frontend with a Python backend to provide a seamless user experience.
- Voice-Controlled Map Interaction: Control map functions using voice commands.
- Realtime Transcription: Transcribes voice commands in realtime.
- Location Search and Zoom: Zooms into specified locations on the map.
- Marker Placement: Adds markers to the map based on voice commands.
- Route Finding: Calculates and displays routes between two locations.
- Layer Control: Toggles satellite and highway layers on the map.
- Current Location: Finds and marks the user's current location.
- Frontend:
- HTML
- CSS
- JavaScript
- OpenLayers
- Axios
- Font Awesome
- Backend:
- Python 3.9
- FastAPI
- Transformers (Hugging Face)
- Torch
- Services:
- Nominatim (OpenStreetMap) for geocoding
- Openrouteservice for route calculation
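As a sketch of how a geocoding lookup against Nominatim might look (the endpoint and JSON response format are from the public Nominatim search API; the function names and User-Agent string here are illustrative, not the project's actual code):

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(place: str) -> str:
    """Build a Nominatim search URL asking for JSON output."""
    params = urllib.parse.urlencode({"q": place, "format": "json", "limit": 1})
    return f"{NOMINATIM_URL}?{params}"


def geocode(place: str):
    """Return (lat, lon) for a place name, or None if nothing matched."""
    req = urllib.request.Request(
        build_geocode_url(place),
        # Nominatim's usage policy requires an identifying User-Agent.
        headers={"User-Agent": "marvin-demo"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        results = json.load(resp)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])
```

The frontend can then hand the returned coordinates to OpenLayers to pan and zoom the view.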
Models Used:
- ✔️ Whisper (openai/whisper-small.en): Used for realtime speech transcription.
- ✔️ AST Speech Commands (MIT/ast-finetuned-speech-commands-v2): Used for wake word detection and audio classification.
Approach:
- ➜ Speech Transcription: Audio input is streamed to the Whisper model, which converts speech to text in realtime.
- ➜ Wake Word Detection: The audio is also classified with the AST model to detect the wake word ("Marvin"); detection fires when the wake-word score exceeds a defined threshold.
- ➜ Command Parsing: The transcribed text is then parsed using custom regex logic to determine user commands (e.g., zoom, add marker, route).
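The wake-word check and command parsing described above can be sketched as follows. This is an illustrative sketch, not the repository's actual code: the 0.7 threshold and the regex patterns are assumptions (modelled on the example commands later in this README), and the prediction format matches what a Hugging Face audio-classification pipeline returns.

```python
import re

WAKE_WORD = "marvin"
WAKE_THRESHOLD = 0.7  # assumed value; the project's threshold may differ


def detect_wake_word(predictions, threshold=WAKE_THRESHOLD):
    """Return True if the classifier output contains the wake word.

    `predictions` is a list of {"label": str, "score": float} dicts,
    the format produced by a transformers audio-classification pipeline.
    """
    return any(
        p["label"].lower() == WAKE_WORD and p["score"] >= threshold
        for p in predictions
    )


# Hypothetical patterns; the repository's actual regex logic may differ.
COMMAND_PATTERNS = {
    "zoom": re.compile(r"zoom (?:in(?:to)? )?(?P<place>.+)", re.IGNORECASE),
    "route": re.compile(r"navigate me to (?P<place>.+)", re.IGNORECASE),
    "satellite": re.compile(r"show satellite view", re.IGNORECASE),
}


def parse_command(text):
    """Map transcribed text to a (command, argument) pair, or None."""
    for name, pattern in COMMAND_PATTERNS.items():
        match = pattern.search(text)
        if match:
            place = match.groupdict().get("place")
            return name, place.strip(" .!?") if place else None
    return None
```

For example, `parse_command("Marvin, zoom into Australia")` yields `("zoom", "Australia")`, which the frontend can turn into a geocode-then-zoom action.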
Before you begin, ensure you have the following installed:
- Python 3.9+
- Docker (optional, for containerized deployment)
- pip (Python package installer)
Clone the repository:
git clone https://github.com/Mallikarjunreddy3015/hackathon.git
cd hackathon
Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate   # On Linux/macOS
venv\Scripts\activate      # On Windows
Install the required Python packages:
pip install --no-cache-dir -r requirements.txt
Run the FastAPI backend:
uvicorn server:app --host 127.0.0.1 --port 8080 --reload
This command starts the server locally. The --reload flag enables automatic reloading upon code changes.
Open the interface in your web browser:
Navigate to http://127.0.0.1:8080 to access the Realtime Assistant Interface.
Using the Interface:
- Ensure your microphone is enabled and accessible to the browser.
- Wait for the "Waiting for wake word..." status.
- Say the wake word ("Marvin") followed by your command.
- Examples:
- "Marvin, zoom into Australia"
- "Marvin, navigate me to Mumbai"
- "Marvin, show satellite view"
Build the Docker image:
docker build -t rokvin/spacecon-realtime .
Run the Docker container:
docker run -d -p 8080:8080 rokvin/spacecon-realtime
This command runs the container in detached mode, mapping port 8080 on the host to port 8080 in the container.
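If you need to build the image yourself, a minimal Dockerfile might look like the sketch below. This is an assumption based on the setup steps in this README (server.py entrypoint, requirements.txt, ffmpeg dependency); the repository's actual Dockerfile may differ.

```dockerfile
FROM python:3.9-slim

# ffmpeg is required for audio decoding (see system dependencies below)
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 8080
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]
```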
Access the application:
Open your web browser and go to http://localhost:8080.
Install system dependencies:
Ensure ffmpeg is installed in your deployment environment:
sudo apt-get update
sudo apt-get install ffmpeg
Transfer the project files:
Copy all project files to your deployment server.
Set up the environment:
Create a virtual environment and install the dependencies as described in the Installation section.
Run the application:
uvicorn server:app --host 0.0.0.0 --port 8080
This makes the application accessible from any IP address. Consider using a process manager like systemd or supervisor to ensure the application restarts automatically if it crashes.
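For example, a minimal systemd unit could look like the following sketch. The paths here are hypothetical; adjust WorkingDirectory and the virtualenv path to wherever you placed the project on your server.

```ini
# /etc/systemd/system/marvin.service
[Unit]
Description=Marvin realtime voice assistant
After=network.target

[Service]
WorkingDirectory=/opt/marvin
ExecStart=/opt/marvin/venv/bin/uvicorn server:app --host 0.0.0.0 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now marvin`.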
- Docker Hub: Official Docker Image
- Live Demo (Docker image deployed on Google Cloud): Marvin Demo
- YouTube Video: Marvin Demo Video
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
Thank you.
