Built by Belgrave Warriors.
Public Voice turns a simple query into a visual pulse-check of what people are saying online: which topics dominate, and how those topics feel.
Serious insights, fun interface.
- Scrapes TikTok videos related to your query
- Extracts audio with ffmpeg
- Transcribes audio with whisper.cpp
- Cleans and processes text
- Runs sentiment analysis + topic modeling
- Serves topic coverage and sentiment via FastAPI
- Renders interactive topic bars in the React frontend
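The "cleans and processes text" step can be sketched as follows. This is an illustrative pass only (lowercase, strip URLs and punctuation, collapse whitespace); the function name and exact rules are assumptions, not the actual logic in `backend/clean_data.py`:

```python
import re

def clean_transcript(text: str) -> str:
    """Illustrative transcript cleaning: lowercase, drop URLs and
    punctuation/emoji, collapse whitespace. Hypothetical helper, not
    the repo's clean_data.py implementation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)    # strip URLs
    text = re.sub(r"[^a-z0-9\s']", " ", text)   # strip punctuation/emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

clean_transcript("Check THIS out!! https://tiktok.com/x")  # "check this out"
```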
- Frontend: React + Vite + Recharts
- Backend API: FastAPI
- NLP/ML: Transformers + BERTopic
- Speech: whisper.cpp
- Media processing: ffmpeg / ffmpeg-python
```
bath_hack_26/
├── src/                # Frontend app
├── backend/
│   ├── centre_back.py  # FastAPI server
│   ├── model.py        # Main NLP + topic pipeline
│   ├── scraper.py      # TikTok scrape + audio + transcription
│   └── clean_data.py   # Text cleaning helpers
├── whisper.cpp/        # Local whisper.cpp source/build
├── pyproject.toml      # Python dependencies
├── package.json        # Frontend dependencies
└── README.md
```
On Ubuntu/Debian:

```sh
sudo apt update
sudo apt install -y python3 python3-venv python3-pip ffmpeg cmake build-essential pkg-config
```

From repo root, install frontend dependencies:

```sh
npm install
```

From repo root, set up the Python environment:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .
pip install fastapi uvicorn requests ffmpeg-python python-dotenv
```

Create `.env` in repo root and add:
```
API_KEY=your_rapidapi_key
API_HOST=free-tiktok-api-scraper-mobile-version.p.rapidapi.com
OPENROUTER_API_KEY=your_openrouter_key
```

From repo root, build whisper.cpp:
```sh
cmake -S whisper.cpp -B whisper.cpp/build
cmake --build whisper.cpp/build --config Release -- -j$(nproc)
```

Optional model download:

```sh
cd whisper.cpp
./models/download-ggml-model.sh tiny.en
cd ..
```

From repo root, start the backend:
```sh
source .venv/bin/activate
cd backend
python -m uvicorn centre_back:app --reload
```

Backend URL: http://localhost:8000
In a second terminal, from repo root:

```sh
npm run dev
```

Frontend URL: http://localhost:5173
This is the core flow inside Public Voice.
- The user enters a query in the frontend search bar.
- Frontend calls `GET /api/search?keyword=...`.
- The FastAPI route in `backend/centre_back.py` calls `main(keyword)` in `backend/model.py`.
- `backend/scraper.py`:
  - Finds TikTok videos for the keyword
  - Downloads audio with ffmpeg
  - Transcribes with whisper.cpp
- `backend/model.py`:
  - Cleans transcripts
  - Runs sentiment model (positive/negative)
  - Runs BERTopic for topic grouping
  - Computes topic coverage percentages (`values`)
  - Computes sentiment scores per topic (`sentiments`)
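The `values`/`sentiments` computation can be sketched as follows. This is a stdlib-only illustration; the helper name and the `(topic, sentiment)` input shape are assumptions, and the real computation over BERTopic output lives in `backend/model.py`:

```python
from collections import defaultdict

def summarize(docs):
    """docs: list of (topic_label, sentiment_score) pairs, one per transcript.
    Returns topic labels, coverage fractions, and mean sentiment per topic.
    Illustrative sketch only, not the repo's actual model.py logic."""
    counts = defaultdict(int)
    sent_sum = defaultdict(float)
    for topic, score in docs:
        counts[topic] += 1
        sent_sum[topic] += score
    total = sum(counts.values())
    labels = sorted(counts, key=counts.get, reverse=True)  # biggest topics first
    return {
        "labels": labels,
        "values": [counts[t] / total for t in labels],        # coverage share
        "sentiments": [sent_sum[t] / counts[t] for t in labels],  # mean sentiment
    }
```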
- Backend returns JSON:

```json
{
  "labels": ["Topic A", "Topic B", "Topic C"],
  "values": [0.42, 0.33, 0.25],
  "sentiments": [0.81, 0.47, 0.22]
}
```

- Frontend graph logic (`src/components/GraphPanel.jsx`):
  - Bar height comes from `values` (coverage as percent of total inputs)
  - Bar color is scaled from `sentiments`:
    - lower sentiment -> redder
    - higher sentiment -> greener
So in one glance:
- Height tells you how much that topic is being discussed
- Color tells you how positive/negative that topic is
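The red-to-green color scale can be sketched in Python terms. The linear ramp below is an assumption for illustration; the actual bar styling lives in the React component:

```python
def sentiment_to_rgb(s: float) -> tuple:
    """Map a sentiment score in [0, 1] to an (r, g, b) color:
    0.0 -> pure red, 1.0 -> pure green. Illustrative only; not the
    exact scale used in GraphPanel.jsx."""
    s = max(0.0, min(1.0, s))  # clamp out-of-range scores
    return (round(255 * (1 - s)), round(255 * s), 0)

sentiment_to_rgb(0.0)  # (255, 0, 0) -- very negative topic
sentiment_to_rgb(1.0)  # (0, 255, 0) -- very positive topic
```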
Troubleshooting:

- `API_KEY` not set: ensure `.env` exists and contains `API_KEY`.
- whisper executable not found: verify `whisper.cpp/build/bin/whisper-cli` exists.
- `ValueError: k must be less than or equal to the number of training points`: the input set is too small for BERTopic/HDBSCAN; rerun with more scraped samples.
- `ffmpeg` failures: verify system ffmpeg is installed and available on PATH.
Made with intent by Belgrave Warriors.
Product: Public Voice.