Need Guidance for AI Interview Coach Project (Voice Analysis & Transcription Issues) #198196
Replies: 3 comments 1 reply
Hard to give specific advice without knowing more about what's actually failing. What stack are you using for STT: Whisper, AssemblyAI, the Web Speech API, or something else? And what are you using for voice analysis? When you say "stability issues," do you mean crashes, memory leaks, inconsistent results, or something else? The more concrete you can be about the actual errors or unexpected behavior you're seeing, the easier it is for people here to point you in the right direction.
Hi @chhavi-1234, That sounds like an interesting project! AI interview coaches combine several challenging areas, so it's normal to encounter issues during development. Here are a few suggestions based on the problems you mentioned:
Speech-to-Text Accuracy
Grammar and Content Quality
Voice Analysis
Project Stability
It would also help if you shared your tech stack (Python, React, Node.js, Streamlit, etc.) and the specific errors you're seeing. That will make it easier for others to provide targeted advice. Good luck with your project!
Hi @chhavi-1234, I've built a couple of voice analysis pipelines in Python and ran into pretty much the same two problems you're describing, so hopefully this saves you some time.

The inconsistent transcription is almost certainly coming from [...]

```python
from faster_whisper import WhisperModel

# load this once when the app starts, not per request
stt_model = WhisperModel("small", device="cpu", compute_type="int8")

# vad_filter=True skips silent stretches before decoding
segments, info = stt_model.transcribe("answer.wav", vad_filter=True)
# segment texts carry leading whitespace, so strip before joining
transcript = " ".join(s.text.strip() for s in segments)
```

One more thing on transcription: check what format your audio actually arrives in. Browser recordings usually come in as WebM/Opus at 48 kHz, and feeding that in directly is a classic source of flaky results. Just normalize everything to 16 kHz mono WAV first.

For the slow processing, two things to check. First, make sure you're not creating your spaCy / SentenceTransformer / STT models inside the request handler. That's a super common mistake with Flask, and it means every single request pays several seconds of model loading. Load them once as module-level globals at startup. Second, don't run the whole pipeline inside the HTTP request. Save the upload, return a job id right away, do the transcription and analysis in a background worker (RQ or Celery, or honestly even a ThreadPoolExecutor is fine for a student project), and let the frontend poll something like [...]

On the analysis side, praat-parselmouth is faster and more stable than librosa's pyin for pitch and speaking rate. And I'd drop TextBlob for grammar; it's pretty weak. language-tool-python gives consistent results, and if you score on a fixed rubric (words per minute, filler word count, pause ratio, similarity to a reference answer) your numbers become reproducible instead of jumping around. Honestly, most of your "inconsistent analysis" will fix itself once the transcript is stable, since everything downstream depends on it. Good luck with the project!
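To make the fixed-rubric idea a bit more concrete, here is a minimal sketch of the delivery part of the score (words per minute, filler word count, pause ratio). It assumes you already have per-segment start/end times and text, which faster-whisper segments expose as `start`, `end` and `text`. The `Span` class, the `score_answer` function, the filler list, and the choice to treat gaps between segments as pauses are all my own assumptions for illustration, not part of any library.

```python
import re
from dataclasses import dataclass

# Hypothetical filler list; tune it for your own rubric.
# Note that "like" will also match legitimate uses of the word.
FILLERS = ("um", "uh", "like", "you know", "basically", "actually")


@dataclass
class Span:
    """One transcribed segment: start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def score_answer(spans, fillers=FILLERS):
    """Return deterministic delivery metrics for one spoken answer."""
    text = " ".join(s.text.strip() for s in spans).lower()
    words = re.findall(r"[a-z']+", text)

    # Total duration runs from the first segment's start to the last one's end.
    total = spans[-1].end - spans[0].start if spans else 0.0
    # Time actually covered by speech segments (assumption: gaps are pauses).
    spoken = sum(s.end - s.start for s in spans)

    # Count whole-word / whole-phrase filler matches only.
    filler_count = sum(
        len(re.findall(rf"\b{re.escape(f)}\b", text)) for f in fillers
    )

    return {
        "words_per_minute": round(len(words) / total * 60, 1) if total else 0.0,
        "filler_count": filler_count,
        "pause_ratio": round((total - spoken) / total, 3) if total else 0.0,
    }
```

Because there is no model call or randomness in here, the same transcript always produces the same numbers. If you feed it faster-whisper output, keep in mind that `transcribe` returns a lazy generator, so turn it into a list first if you want both the joined transcript and the per-segment timings.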
🏷️ Discussion Type
Question
💬 Feature/Topic Area
Study Projects & Resources
Hi everyone, I am currently working on an AI Interview Coach with Voice Analysis project and I am facing several technical challenges that are affecting the overall functionality of the application.
Speech-to-Text Accuracy Problems
Grammar and Content Quality
Voice Analysis Issues
Project Stability
If anyone has worked on similar AI, speech recognition, or voice analysis projects, I would be grateful for any guidance, resources, or recommendations.