Need Guidance for AI Interview Coach Project (Voice Analysis & Transcription Issues) #198196

chhavi-1234 · 2026-06-06T09:00:21Z


🏷️ Discussion Type

Question

💬 Feature/Topic Area

Study Projects & Resources

Hi everyone,

I am currently working on an AI Interview Coach with Voice Analysis project, and I am facing several technical challenges that affect the overall functionality of the application:
Speech-to-Text Accuracy Problems
Grammar and Content Quality
Voice Analysis Issues
Project Stability
If anyone has worked on similar AI, speech recognition, or voice analysis projects, I would be grateful for any guidance, resources, or recommendations.

Crackle2K · 2026-06-06T14:37:31Z


Hard to give specific advice without knowing more about what's actually failing. What stack are you using for speech-to-text (Whisper, AssemblyAI, the Web Speech API, something else), and what are you using for voice analysis? And when you say "stability issues," do you mean crashes, memory leaks, inconsistent results, or something else? The more concrete you can be about the actual errors or unexpected behavior you're seeing, the easier it is for people here to point you in the right direction.


chhavi-1234 Jun 6, 2026
Author

My project is built using Python and Flask. For speech-to-text conversion, I am using the SpeechRecognition library with Google's Web Speech API. For audio processing and conversion, I use pydub, FFmpeg, and librosa. For NLP and text analysis, I use TextBlob, spaCy, and Sentence Transformers. SQLite is used as the database.
I am not facing memory leaks. The main problems are inconsistent transcription and analysis results, and long processing times before the results are displayed.

tanvishinde017 · 2026-06-06T15:44:16Z


Hi @chhavi-1234,

That sounds like an interesting project! AI interview coaches combine several challenging areas, so it's normal to encounter issues during development.

Here are a few suggestions based on the problems you mentioned:

Speech-to-Text Accuracy

Try models such as OpenAI Whisper, Whisper.cpp, or Azure Speech Services.
Improve microphone quality and reduce background noise.
Consider language-specific models if your users have different accents.

Grammar and Content Quality

Use an LLM or grammar-checking service to evaluate responses.
Define clear scoring criteria (clarity, relevance, confidence, structure, etc.).
Save transcripts so you can compare outputs and improve prompts over time.
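A minimal sketch of the "save transcripts so you can compare outputs" idea, using Python's built-in sqlite3 module since the project already stores data in SQLite. The table name `transcripts`, its columns, and the helper names are illustrative assumptions, not an existing schema; `engine` is just a free-text label for whichever STT setup produced the text.

```python
import sqlite3
import time


def open_store(path: str = "transcripts.db") -> sqlite3.Connection:
    """Open (or create) the database and a hypothetical `transcripts` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               audio_file TEXT NOT NULL,
               engine TEXT NOT NULL,
               transcript TEXT NOT NULL,
               created_at REAL NOT NULL
           )"""
    )
    return conn


def save_transcript(conn: sqlite3.Connection, audio_file: str, engine: str, transcript: str) -> None:
    """Store one transcription run; `with conn` commits on success, rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO transcripts (audio_file, engine, transcript, created_at) "
            "VALUES (?, ?, ?, ?)",
            (audio_file, engine, transcript, time.time()),
        )


def compare_runs(conn: sqlite3.Connection, audio_file: str) -> dict[str, list[str]]:
    """Group every stored transcript of one recording by engine, oldest first."""
    rows = conn.execute(
        "SELECT engine, transcript FROM transcripts WHERE audio_file = ? ORDER BY id",
        (audio_file,),
    ).fetchall()
    runs: dict[str, list[str]] = {}
    for engine, text in rows:
        runs.setdefault(engine, []).append(text)
    return runs


def inconsistent_engines(conn: sqlite3.Connection, audio_file: str) -> list[str]:
    """Engines that produced more than one distinct transcript for the same audio."""
    return sorted(e for e, texts in compare_runs(conn, audio_file).items() if len(set(texts)) > 1)
```

Re-running the same audio file through each engine a few times and checking `inconsistent_engines` gives a concrete measure of how unstable a given setup really is.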

Voice Analysis

Libraries like librosa, pyAudioAnalysis, or Praat (usable from Python via praat-parselmouth) can extract features such as pitch, speaking rate, and energy.
Avoid relying solely on emotion detection since it can be inconsistent across speakers and environments.
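To make "energy" concrete, here is a stdlib-only sketch of per-frame RMS energy, roughly what `librosa.feature.rms` computes (librosa also pads the edges by default). It assumes 16-bit mono PCM WAV input; the 25 ms frame, 10 ms hop and 0.01 silence threshold are common but arbitrary choices, and the function names are hypothetical.

```python
import array
import math
import sys
import wave


def frame_rms(path: str, frame_ms: float = 25.0, hop_ms: float = 10.0) -> list[float]:
    """Per-frame RMS energy of a 16-bit mono PCM WAV file, scaled to 0..1."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM WAV")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    frame = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    energies = []
    # Slide a window of `frame` samples forward by `hop` samples (no edge padding).
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        mean_square = sum(s * s for s in chunk) / frame
        energies.append(math.sqrt(mean_square) / 32768.0)
    return energies


def energy_summary(energies: list[float], silence_threshold: float = 0.01) -> dict[str, float]:
    """Mean/peak energy and the fraction of frames quieter than the (assumed) threshold."""
    if not energies:
        return {"mean": 0.0, "peak": 0.0, "silent_fraction": 0.0}
    silent = sum(1 for e in energies if e < silence_threshold)
    return {
        "mean": sum(energies) / len(energies),
        "peak": max(energies),
        "silent_fraction": silent / len(energies),
    }
```

Numbers like these are speaker- and microphone-dependent, so they work best compared against the same user's earlier recordings rather than a global threshold.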

Project Stability

Separate your application into modules (audio capture, transcription, analysis, UI).
Add logging and error handling around API calls.
Test each component independently before integrating everything.
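One way to combine these three suggestions is to keep each stage as a plain function and route external calls through a small retry-and-log wrapper. This is a sketch under assumptions: `call_with_retry`, `run_pipeline`, the retry counts and the default exception types are hypothetical; for `recognize_google` you would probably add the SpeechRecognition library's `sr.RequestError` to `retry_on`.

```python
import logging
import time
from typing import Any, Callable, Sequence

log = logging.getLogger("interview_coach")


def call_with_retry(
    fn: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    **kwargs: Any,
) -> Any:
    """Call fn(*args, **kwargs), retrying `retry_on` errors with exponential backoff."""
    name = getattr(fn, "__name__", repr(fn))
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on as exc:
            log.warning("%s failed (attempt %d/%d): %s", name, attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up and let the caller decide what to show the user
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


def run_pipeline(data: Any, stages: Sequence[tuple[str, Callable[[Any], Any]]]) -> Any:
    """Run independent stages in order, each receiving the previous stage's output."""
    for name, stage in stages:
        start = time.perf_counter()
        try:
            data = stage(data)
        except Exception:
            log.exception("stage %r failed", name)  # logs the full traceback
            raise
        log.info("stage %r finished in %.2fs", name, time.perf_counter() - start)
    return data
```

Because each stage is just a function, it can be unit-tested with fake inputs before being wired into Flask, and the per-stage timing logs show where the slow part actually is.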

It would also help if you shared your tech stack (Python, React, Node.js, Streamlit, etc.) and the specific errors you're seeing. That will make it easier for others to provide targeted advice.

Good luck with your project!


DanjalZockt · 2026-06-10T19:37:14Z


Hi @chhavi-1234,

I've built a couple of voice analysis pipelines in Python and ran into pretty much the same two problems you're describing, so hopefully this saves you some time.

The inconsistent transcription is almost certainly coming from recognize_google(). The SpeechRecognition library uses an unofficial demo endpoint from Google, so it's rate limited, has no SLA, and you genuinely get different quality depending on the day. I'd swap it for Whisper running locally via faster-whisper. The small model runs fine on CPU and gives you the same result for the same audio every time:

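```python
from faster_whisper import WhisperModel

# load this once when the app starts, not per request
stt_model = WhisperModel("small", device="cpu", compute_type="int8")

# temperature=0 disables the temperature fallback, so reruns on the same audio match
segments, info = stt_model.transcribe("answer.wav", vad_filter=True, temperature=0)
# segments is a lazy generator: the actual transcription runs while it is iterated
transcript = " ".join(s.text.strip() for s in segments)
```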

The vad_filter=True part also takes care of silence and segmentation for you.

One more thing on transcription: check what format your audio actually arrives in. Browser recordings usually come in as WebM/Opus at 48 kHz, and feeding that in directly is a classic source of flaky results. Just normalize everything to 16 kHz mono WAV first:

```shell
ffmpeg -i in.webm -ar 16000 -ac 1 out.wav
```
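Before transcribing, it can help to check whether a file is already in the target format and only shell out to ffmpeg when it is not. A minimal sketch using the stdlib `wave` module and the same ffmpeg flags as the command above; the helper names are hypothetical, the `-y` flag (overwrite output) is an addition, and ffmpeg must be installed and on PATH.

```python
import shutil
import subprocess
import wave

TARGET_RATE = 16000  # 16 kHz mono, as in the ffmpeg command above


def needs_conversion(path: str) -> bool:
    """True unless `path` is already a 16 kHz, mono, 16-bit PCM WAV file."""
    try:
        with wave.open(path, "rb") as wf:
            return not (
                wf.getframerate() == TARGET_RATE
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2
            )
    except (wave.Error, EOFError):
        return True  # WebM/Opus, MP3, etc. are not readable by the wave module


def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Same flags as the shell command above, plus -y to overwrite `dst`."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(TARGET_RATE), "-ac", "1", dst]


def normalize_audio(src: str, dst: str) -> str:
    """Return a path to a 16 kHz mono WAV, converting with ffmpeg only if needed."""
    if not needs_conversion(src):
        return src
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return dst
```

Since the project already uses pydub, `AudioSegment.from_file(src).set_frame_rate(16000).set_channels(1).export(dst, format="wav")` is an alternative way to do the same conversion (pydub also calls ffmpeg under the hood for non-WAV input).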

For the slow processing, there are two things to check. First, make sure you're not creating your spaCy / SentenceTransformer / STT models inside the request handler. That's a super common mistake with Flask, and it means every single request pays several seconds of model loading. Load them once as module-level globals at startup.
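A sketch of the load-once pattern: a small registry that loads each model on first use and keeps it for the lifetime of the process, plus a `warm_up()` you can call at startup so the first request doesn't pay the cost either. The registry, the helper names, and the model names (`en_core_web_sm`, `all-MiniLM-L6-v2`, Whisper `small`) are assumptions; substitute whatever the project already loads.

```python
import threading
from typing import Any, Callable


def _load_spacy() -> Any:
    import spacy  # imported inside the loader so importing this module stays cheap

    return spacy.load("en_core_web_sm")


def _load_embedder() -> Any:
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("all-MiniLM-L6-v2")


def _load_stt() -> Any:
    from faster_whisper import WhisperModel

    return WhisperModel("small", device="cpu", compute_type="int8")


# Hypothetical registry: model name -> zero-argument loader.
LOADERS: dict[str, Callable[[], Any]] = {
    "spacy": _load_spacy,
    "embedder": _load_embedder,
    "stt": _load_stt,
}

_models: dict[str, Any] = {}
_models_lock = threading.Lock()


def get_model(name: str) -> Any:
    """Return the shared instance of `name`, loading it only on the first call."""
    with _models_lock:  # stops concurrent first requests from loading a model twice
        if name not in _models:
            _models[name] = LOADERS[name]()
        return _models[name]


def warm_up() -> None:
    """Call once at startup (e.g. right after creating the Flask app) to load everything."""
    for name in LOADERS:
        get_model(name)
```

Request handlers then call `get_model("stt")` and friends instead of constructing models themselves.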

Second, don't run the whole pipeline inside the HTTP request. Save the upload, return a job id right away, do the transcription and analysis in a background worker (RQ or Celery, or honestly even a ThreadPoolExecutor is fine for a student project), and let the frontend poll something like /result/<job_id>. Nice side effect: you can show the transcript as soon as it's done and let the analysis come in after, which feels much faster to the user.
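A sketch of the job-id pattern with a ThreadPoolExecutor and an in-memory job table. It assumes a single server process (the dict is lost on restart and is not shared between multiple workers, which is where RQ or Celery come in); `submit_job`, `get_job`, and the status strings are hypothetical. The transcript is stored as soon as it exists, so a poller can display it before the analysis finishes.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional

executor = ThreadPoolExecutor(max_workers=2)
_jobs: dict[str, dict[str, Any]] = {}
_jobs_lock = threading.Lock()


def _update(job_id: str, **fields: Any) -> None:
    with _jobs_lock:
        _jobs[job_id].update(fields)


def submit_job(
    audio_path: str,
    transcribe: Callable[[str], str],
    analyze: Callable[[str], dict],
) -> str:
    """Register a job, start it in the background, and return its id immediately."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"status": "queued", "transcript": None, "analysis": None, "error": None}

    def work() -> None:
        try:
            _update(job_id, status="transcribing")
            transcript = transcribe(audio_path)
            # Publish the transcript before analysis starts so the UI can show it early.
            _update(job_id, status="analyzing", transcript=transcript)
            _update(job_id, status="done", analysis=analyze(transcript))
        except Exception as exc:  # record the failure instead of losing it in the pool
            _update(job_id, status="failed", error=str(exc))

    executor.submit(work)
    return job_id


def get_job(job_id: str) -> Optional[dict[str, Any]]:
    """Snapshot of a job's state, or None for an unknown id."""
    with _jobs_lock:
        job = _jobs.get(job_id)
        return dict(job) if job is not None else None
```

In Flask, the upload route would save the file and return `{"job_id": submit_job(path, transcribe, analyze)}`, and a `/result/<job_id>` route would return `get_job(job_id)` (or a 404 when it is None); `transcribe` and `analyze` stand in for the project's existing functions.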

On the analysis side, praat-parselmouth is faster and more stable than librosa's pyin for pitch tracking, and its intensity contours are a common starting point for speaking-rate estimates. I'd also drop TextBlob for grammar; it's pretty weak. language-tool-python gives consistent results, and if you score on a fixed rubric (words per minute, filler word count, pause ratio, similarity to a reference answer), your numbers become reproducible instead of jumping around.
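A sketch of the fixed-rubric idea for the four metrics listed: words per minute, filler count, pause ratio from speech segment timestamps (faster-whisper segments expose `.start` and `.end` in seconds, so `[(s.start, s.end) for s in segments]` would feed it), and cosine similarity between answer and reference embeddings (for example vectors from `SentenceTransformer.encode`, which the project already uses). The filler list and rounding are assumptions, and Whisper transcripts can omit disfluencies like "um", so filler counts may undercount.

```python
import math
import re
from typing import Iterable, Sequence

# Assumed filler list; ambiguous words such as "like" are deliberately left out.
FILLERS = {"um", "uh", "erm", "hmm"}
_WORD = re.compile(r"[A-Za-z']+")


def words_per_minute(transcript: str, duration_s: float) -> float:
    words = _WORD.findall(transcript)
    return len(words) * 60.0 / duration_s if duration_s > 0 else 0.0


def filler_count(transcript: str) -> int:
    return sum(1 for w in _WORD.findall(transcript.lower()) if w in FILLERS)


def pause_ratio(segments: Iterable[tuple[float, float]], duration_s: float) -> float:
    """Share of the recording not covered by (start, end) speech segments, in seconds."""
    if duration_s <= 0:
        return 0.0
    spoken = sum(max(0.0, end - start) for start, end in segments)
    return max(0.0, 1.0 - spoken / duration_s)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_answer(
    transcript: str,
    duration_s: float,
    segments: Iterable[tuple[float, float]],
    answer_vec: Sequence[float],
    reference_vec: Sequence[float],
) -> dict[str, float]:
    """Fixed rubric: identical inputs always produce identical scores."""
    return {
        "wpm": round(words_per_minute(transcript, duration_s), 1),
        "fillers": filler_count(transcript),
        "pause_ratio": round(pause_ratio(segments, duration_s), 3),
        "similarity": round(cosine_similarity(answer_vec, reference_vec), 3),
    }
```

Every score is a pure function of the transcript, timestamps and embeddings, so once the transcript is stable the rubric output is stable too.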

Honestly, most of your "inconsistent analysis" will fix itself once the transcript is stable, since everything downstream depends on it. Good luck with the project!

