Deepgram nova-3-medical model mis-transcribes medical terminology & drug names — how to improve accuracy? #1466
Replies: 4 comments 1 reply
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com. Being verified through this process will allow our team to help you in a much more streamlined fashion.
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
Any update from Deepgram on this thread? We are experiencing similar issues with Deepgram for medical-grade transcription. For example, a blood pressure reading spoken as "one hundred over eighty" is returned as "100 over 80". This is far from what a medical transcription should be doing. I am sure we are doing something wrong. Kindly advise.
-
Hi everyone,
We are evaluating Deepgram’s nova-3-medical ASR model, but we are not getting the required accuracy for medical terminology. Many drug names and clinical terms are mis-transcribed, which is not acceptable for our use case.
Issue Description
Even with recommended settings, the transcription output fails on multiple important medical terms.
We tested using this publicly available dataset:
🔗 https://observer.med.upenn.edu/dataset/explore
Below is a comparison between the reference transcript and our Deepgram output:
(Example: anonymized_deepgram_enhanced_diff_1_1.html)
anonymized_deepgram_enhanced_diff 1 1.html
Several drug names and medical terms are incorrectly transcribed.
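One rough way to quantify this kind of miss is to check which terms from a watch list appear in the reference transcript but not in the ASR output. The sketch below is only an illustration: the function name `missed_terms`, the term list, and the two sample sentences are all made up for the example and are not taken from the dataset or from actual Deepgram output.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, and pad with spaces
    so that whole-word matching can be done with a substring check."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return f" {' '.join(cleaned.split())} "


def missed_terms(reference: str, hypothesis: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in the reference but not in the hypothesis."""
    ref = _normalize(reference)
    hyp = _normalize(hypothesis)
    missed = []
    for term in terms:
        needle = _normalize(term)
        if needle in ref and needle not in hyp:
            missed.append(term)
    return missed


# Hypothetical strings, for illustration only.
reference = "Patient takes metoprolol and lisinopril daily."
hypothesis = "Patient takes metro pro lol and lisinopril daily."
watch_list = ["metoprolol", "lisinopril", "warfarin"]

print(missed_terms(reference, hypothesis, watch_list))
```

Terms that never occur in the reference are skipped, so the result only counts terms the model had a chance to get right.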
Current Request Configuration
request = audio_data
model = "nova-3-medical"
language = "en-US"
smart_format = True
diarize = True
utterances = True
We also tried alternate formatting and diarization settings, but accuracy did not improve significantly.
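For reference, here is a minimal sketch of how the configuration above could map onto a request to Deepgram's pre-recorded `/v1/listen` REST endpoint, using only the Python standard library. The helper names (`build_listen_url`, `make_request`) and the `audio/wav` content type are assumptions for illustration; our actual code may go through the SDK instead, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(options: dict) -> str:
    """Serialize options into a query string; booleans become 'true'/'false'."""
    pairs = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        pairs.append((key, value))
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(pairs)}"


def make_request(url: str, audio_bytes: bytes, api_key: str) -> Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "audio/wav",  # assumption: the test files are WAV
    }
    return Request(url, data=audio_bytes, headers=headers, method="POST")


# Same settings as listed in the post.
options = {
    "model": "nova-3-medical",
    "language": "en-US",
    "smart_format": True,
    "diarize": True,
    "utterances": True,
}
url = build_listen_url(options)
print(url)
```

Passing the result of `make_request(...)` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without an API key or network access.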
What We Tried
Smart formatting on/off
Diarization on/off
Verified audio sampling rate
Tested premium/enhanced settings
Multiple files from the dataset
Still no major improvement in medical-term accuracy.
Looking for Suggestions
Does anyone have advice or experience improving transcription accuracy for medical terminology/drug names using Deepgram?
Specifically, looking for:
Boosting options
Any additional nova-3-medical parameters
Preprocessing tips for clinical audio
Alternative Deepgram models for medical ASR
Known limitations of nova-3-medical
Any recommendations or insights would be greatly appreciated!