Transcription has a lot of spelling errors and wrong word segments (although sometimes phonetically correct) #1817
Comments
Please elaborate. You took the 0.3.0 model?
What are the exact original and converted audio specs? Conversion could add artifacts that mess up recognition.
@lissyx I didn't use the 0.3.0 model because it gave that "no softmax layer" error. I used reuben's 0.2.0-ctc-decode release, which I got from another GitHub issue; the link for the model is https://github.com/reuben/DeepSpeech/releases/tag/v0.2.0-prod-ctcdecode. For all the other files (lm, trie) I have given links to what I used.
The original audio that I downloaded from YouTube with the youtube-dl package is: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz. The converted audio (with ffmpeg) is: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz.
I also want to know: if conversion can mess up recognition, where and how do I get the audio specs DeepSpeech will work with? I mean, how do I convert, given that most of the audio I have will be at a different sample rate (usually 44.1 kHz rather than 16 kHz), and even if I train, I will train with the converted samples only? And if converted samples create problems, how do I resolve this?
One more thing: I was trying to train, but I couldn't find the checkpoint for reuben's release, so how do I train, and what do I put in checkpoint_dir to train from the pre-release model? Do I have to use the 0.2.0 checkpoint, or can the output_graph files be used to train on top of? And how do I use the output_graph for training, if that is possible, because the README only says to give the checkpoint directory?
As you seem to have a mix-and-match model, what might be easier, instead of tracking down the problem, is to just wait until Monday, when we are planning to do the 0.4.0 release, and then use that.
So you have noisy, music-background PCM 16-bit stereo 44.1 kHz audio converted to PCM 16-bit mono 16 kHz? The devil lies in the details; it might also come from how you perform the ffmpeg conversion. Check on Discourse, there are some good examples.
@lissyx I converted exactly the way specified in one of the Discourse posts, with the parameters -acodec pcm_s16le -ac 1 -ar 16000. I get it if a noisy sample gives errors, but I have checked at least 15 samples with interviews of Tim Cook and Elon Musk, clear non-noisy data, and there are still spelling errors, like "here" becoming "hear" and "evolution" becoming "evil lution". Above I only gave two samples, one with a noisy background and one without. How do I improve this result?
I also want to know, when I train on the pre-trained model, which files I need to put in checkpoint_dir, and which version of the checkpoint (as suggested by kdavis, I can wait until Monday for the latest checkpoint, since you will release 0.4.0). Does it need to contain the lm and trie, or do I need to give them as separate arguments, because the README shows that I only need to give checkpoint_dir? And as I am new to training, am I thinking correctly that I can't train on top of the output_graph file, but only on the checkpoint files?
Can you please start by explaining exactly how you run things? And use proper formatting? Your first post is barely readable; it's painful to distinguish between your statements, your questions, and your console output. Can you verify with the basic tools, like ...?
@lissyx Sorry for not formatting it well. Please have a look at the details below: I used the latest alpha release of DeepSpeech, 0.4.0-alpha.3.
To convert the audio I was using:
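(The original command was not preserved in this thread; the following is a reconstruction, assuming the ffmpeg parameters quoted earlier, -acodec pcm_s16le -ac 1 -ar 16000, with placeholder filenames.)

```sh
# Reconstructed conversion step: 44.1 kHz stereo WAV -> 16 kHz mono 16-bit PCM WAV.
# Filenames are placeholders.
ffmpeg -i downloaded_44100_stereo.wav -acodec pcm_s16le -ac 1 -ar 16000 converted_16000_mono.wav
```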
Now my questions:
I still have to use evaluate.py; I'll use it and update my comment. Hopefully this is better formatting. Thanks in advance.
Thanks, that's much more readable, though the transcriptions could have been in a separate gist :)
Interesting. I know we did a demo of live YouTube video transcription, taking audio output from the system directly, and getting pretty good results with the streaming, likely much better. This was using the streaming API, but with some other VAD. Also, I don't know what ...
AAC being lossy, we're likely getting some artifacts. @raghavk92 Could you give it a try without passing ...?
This is the exact output of the youtube-dl package:
According to this output, it seems that it is downloading an m4a file with the extractor I link below and then converting it to WAV with the ffmpeg post-processor, for which I have also provided the link. So I don't think an AAC stereo file is being fetched; an m4a file is, and I think it's being converted with ffmpeg. Do you want me to convert the m4a to WAV myself with ffmpeg? Or am I misunderstanding something you said? Please explain if I am wrong and suggest what to do. I think the extractor being used is from this link: https://github.com/rg3/youtube-dl/blob/4bede0d8f5b6fc8d8e46ee240f808935e03eafa2/youtube_dl/extractor/youtube.py and the post-processor for audio extraction is ffmpeg, from this link: https://github.com/rg3/youtube-dl/blob/4bede0d8f5b6fc8d8e46ee240f808935e03eafa2/youtube_dl/postprocessor/ffmpeg.py
@raghavk92 Well, that's exactly what I'm saying. youtube-dl fetches the raw audio as m4a, and it's AAC:
And then when you ask for ...
And in the current description of the issue, you are using that resulting WAV to convert again. You should try to do the ffmpeg conversion from the m4a.
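(As an illustration of that suggestion, a direct conversion from the downloaded m4a could look like the sketch below; filenames are placeholders, and the sample-format flags are the same ones used earlier in the thread.)

```sh
# Convert the AAC audio inside the m4a straight to 16 kHz mono 16-bit PCM,
# skipping the intermediate 44.1 kHz stereo WAV produced by youtube-dl.
ffmpeg -i downloaded_audio.m4a -acodec pcm_s16le -ac 1 -ar 16000 converted_16000_mono.wav
```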
@lissyx I tried converting directly from the m4a file to a 16 kHz WAV file with ffmpeg. I am attaching the output in a file, but the output transcription shows no change compared with the previous file. Please suggest if I should try something different.
Can you please be more exhaustive when you say "no change"? How did you run the transcription?
@lissyx I ran this command and got the transcription:
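(The exact command was not preserved here; a typical invocation of the 0.4.x native client looks roughly like the sketch below, with placeholder paths for the model, LM/trie and audio files.)

```sh
# Placeholder paths; model/alphabet/lm/trie come from the release being tested.
deepspeech --model models/output_graph.pbmm \
           --alphabet models/alphabet.txt \
           --lm models/lm.binary \
           --trie models/trie \
           --audio converted_16000_mono.wav
```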
I am using the general way of running a transcription in a virtual environment, deepspeech 0.4.0-alpha.3. I am not sure if this is what you asked for; if I gave the wrong information, please tell me and I'll give whatever I can. By "no change" I mean I compared both transcriptions (the earlier one from my earlier post and the one attached now). I read them side by side rather than comparing programmatically, and there seems to be no change in the transcription from converting directly from m4a to a 16 kHz WAV file.
Honestly, I don't know. Maybe there's some weird inaudible noise in the original recording / from the original upload that breaks us? Have you tested the ffmpeg-based VAD tool?
Are you referring to this: https://github.com/mozilla/DeepSpeech/tree/master/examples/ffmpeg_vad_streaming? I haven't tried that link, but I have tried the VAD transcriber (output in my first post). I'll try ffmpeg_vad_streaming, but do you want me to pass an audio file, or do you want me to use an RTMP stream?
Help yourself and read the documentation, as well as the code.
Even extracting the first 10 seconds of audio converted directly from the AAC does not help.
@raghavk92 A good point mentioned by @reuben that I forgot: the feature computation changed, so the model from the link you have, used with the 0.4.0-alpha.3 binaries, will produce broken output. So if you could include the full output, including versions, we would all save some time ...
@raghavk92 OK, we have now released a new 0.4.1; could you re-test on your side? Early testing here shows it improves things.
@lissyx Hi, I have 3 questions regarding a few problems I am facing:
So some transcriptions got better, some got worse.
The accuracy was 85% after this. But I tried to automate this with the sox package for Ubuntu:
(I also tried different levels of aggressiveness like 0.3, 0.05, 0.1, etc., but there was not much change in transcription.) The transcription became bad; I think sox damaged the voice audio while doing the noise reduction. Do you know a better way to do noise reduction and get a better transcription? And if I need to transcribe a file which has background music, is there any other way (like, would training help, and how many samples would I need)? Thanks
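(The sox commands were not preserved in the thread; the sketch below shows one common sox noise-reduction recipe consistent with the "aggressiveness" levels mentioned above. The filenames, the 0.5 s noise window and the 0.21 amount are assumptions.)

```sh
# Build a noise profile from an assumed noise-only stretch at the start of the file,
# then apply noise reduction; the trailing number is the reduction amount
# (the 0.05/0.1/0.3 values mentioned above are alternatives).
sox noisy_input.wav -n trim 0 0.5 noiseprof noise.prof
sox noisy_input.wav cleaned_output.wav noisered noise.prof 0.21
```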
@raghavk92 All those questions would need to be on Discourse; GitHub issues are really only dedicated to bugs / features in the codebase.
@lissyx I think there is a bug. I'll post the rest of the questions on Discourse. The files that I tested are listed below, and the old and new transcription files are attached. You will be able to see that most of the previously correct parts are transcribed wrongly with the new version. Only very small things improved, like "chinaman" becoming "china man" or "tradeetions" correctly becoming "trade tensions"; other things have gone wrong. Attachments below. I'll give examples:
I don't know if this is a bug, but some problem has been introduced in the new version.
There's way too much noise in your message for me to be able to understand anything. At one point you say it's okay, then you say it's not.
@lissyx So I am saying that some places where 0.4.0 was correct are now transcribed wrongly, and some places that were wrong have become correct (this happens less often than correct words becoming wrong). I hope the format of the examples makes it clear where 0.4.0 was correct and 0.4.1 is now wrong. So I don't know what the exact problem is. Shouldn't the older transcription become better with the new version, without damaging the parts that were already transcribed correctly?
If you are referring to my earlier messages (before the ones I wrote today) where I said there is not much change between 0.4.0 and 0.4.1: at that time I didn't change the models folder, so that was wrong feedback.
I'm sorry, but your bug report is really too messy right now; you refer to a lot of different trials, and I have absolutely no idea of your system setup / training status at this point. Forget about ...
@lissyx I thought all the other details I mentioned about the system are in the earlier posts; I thought I had already given them. But I'll mention them again; I didn't know I had to write the information with every post. Details below:
Everything in the setup is as you told me to do. Thanks.
The thing is, you tested a lot of different combinations; it's hard to know exactly what your current status is if you don't describe it.
Here again, it's complicated to track "first comment today"; either spell it out or link to it. We are not all in the same timezone, so your "today" might be different from mine.
So only ...
At some point, I'm starting to wonder if this video is just a bad case; maybe the audio contains inaudible noise that breaks our current models?
@lissyx Sorry for not linking my earlier post: the files are in that post, and the examples (which are excerpts) are also in that post. Yes, I used only the native client, the deepspeech command from the command line. I don't know whether this is just a bad example, because some of the parts which were correct with 0.4.0 became wrong with 0.4.1; I was telling you so that you know about the problem and it can be corrected if you see a bug. Also, because I am not sure which model to do my training (transfer learning) on, I wanted to let you know about the problems with the current version. I am thinking of using 0.4.0, but which model version do you suggest, and why? And if I train from scratch on the Common Voice data, which version is better to train on: the 0.4.0 files or the 0.4.1 files? We were thinking of using 0.4.0 for training because it had better transcription; is that a correct metric?
Don't use 0.4.0 for anything; we uploaded the wrong checkpoint and model, from a completely unrelated training job. Either use 0.3.0 or 0.4.1.
@reuben Thanks for that info, but I just wanted to give one example:
So it's just that the wrong model is giving more correct and better results. Should we wait for the next release (would that be happening soon), or train with this? Because if we train and again get similar mistakes with 0.4.1, a training job from scratch will cost us on AWS and we won't get a good model either. Sorry for asking again, but thanks.
0.4.0 comes from 0.3.0 but was then trained further on Italian data. If it's working better for you, then try using 0.3.0. I recommend using 0.4.1 for any experiments, as it has a lot of small improvements that add up to a higher-quality model.
Closing due to inactivity.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
You can obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Hi,
I was trying to transcribe two different audio samples.
One has a bit of background music. I actually extracted the audio from an Apple ad where Jonathan Ive speaks with a really clear voice but with background music. I converted it to 16,000 samples per second as required by DeepSpeech, and I found a lot of spelling errors.
Mistakes like "evolution" being spelt "evil lution". And it's an Apple Watch ad. So how do I correct this? I tried to use the latest lm and trie files, but the transcription is still bad.
I'll list what I used, but please tell me what I should use.
I used the latest alpha release of DeepSpeech, 0.4.0-alpha.3, as the stable release was giving really bad results. I used the output_graph from reuben's release, because the 0.3.0 models were giving very bad results (just gibberish, nothing of value in the transcription), and this fix was provided in GitHub issue #1156.
Output graph from reuben's release: https://github.com/reuben/DeepSpeech/releases/tag/v0.2.0-prod-ctcdecode (reuben/DeepSpeech, a TensorFlow implementation of Baidu's DeepSpeech architecture).
The lm and trie I used are from https://github.com/mozilla/DeepSpeech/tree/master/data/lm
The alphabet.txt I used is from the 0.3.0 models release in the GitHub README; it may be from this link, but I am not sure right now: https://github.com/mozilla/DeepSpeech/tree/master/data
So this is the transcription that I get for the Apple ad: https://www.youtube.com/watch?v=6EiI5_-7liQ
transcription is : e e e in i an an an enemple agh seres for is more than an evil lution erepresents a fundamental redesin anryengineering of apple watchretaining the riginal i comicg design veloped ury find the for olsimanaging to make it fine be new display is now oven birty percen larger and is seemlessly integrated into the product the interface as been read deigned fron you tiplay providing more information with rich a detail the heard wore hand the software combine to define a very new and truly intergrated singular design novigating with the digital crown olready one of the most intricat makhalisms wit ever created has been intirely igreengineeredwith hapti feeback dilivering a presise ecannical field as idrol in addition to an obtea hasanco the is a new applepizine ilectrical hars and se to the lousutitake in electra cardia graham or easy ge to share with your doctor a momnentesichievement for a were of a divice placing a finger on the tigital crownd i eeplose cerkid with a lectrods on the bank providing dater the easy g busesanaliz your harid whole understanding hea health is a sential to ou well bei aditional features in in harmsmans in courag es ti live and overall healther or tantive life the excela romiter girescove an alfliter allow you to recall youtypes of workelse measure runs withincreased presision and tra your all day activity with great accuracy in hart selilar connectiv ity in tabu something prulyliberating the obility distaklinected with just your wach fon case music streaming and even a mergency essistence ol immediately evolable from your restch eries for is a device so powerful so postnal so liperating i con change the way ou liveach day
And the link for the other file is: https://www.youtube.com/watch?v=GnGI76__sSA
The transcription with the VAD transcriber is:
DEBUG:root:Processing chunk 00
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.720s for 5.880s audio file.
DEBUG:root:Transcript: stevies to um saye o me and heused to saye is a lut
DEBUG:root:Processing chunk 01
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.292s for 1.470s audio file.
DEBUG:root:Transcript: jonny
DEBUG:root:Processing chunk 02
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.337s for 1.620s audio file.
DEBUG:root:Transcript: is it that the idea
DEBUG:root:Processing chunk 03
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.282s for 1.530s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 04
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.772s for 3.750s audio file.
DEBUG:root:Transcript: and sometimes they wore
DEBUG:root:Processing chunk 05
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.639s for 3.180s audio file.
DEBUG:root:Transcript: really do pe
DEBUG:root:Processing chunk 06
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.918s for 4.410s audio file.
DEBUG:root:Transcript: sometimes they would tru to dreadful
DEBUG:root:Processing chunk 07
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.632s for 3.090s audio file.
DEBUG:root:Transcript: sometimes they of the air from the room
DEBUG:root:Processing chunk 08
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.638s for 3.000s audio file.
DEBUG:root:Transcript: an me liftis poth completely silent
DEBUG:root:Processing chunk 09
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.845s for 4.200s audio file.
DEBUG:root:Transcript: od crazy magninificen ideas
DEBUG:root:Processing chunk 10
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.403s for 2.010s audio file.
DEBUG:root:Transcript: whire simple ones
DEBUG:root:Processing chunk 11
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.371s for 1.890s audio file.
DEBUG:root:Transcript: hin this sufflety
DEBUG:root:Processing chunk 12
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.288s for 1.470s audio file.
DEBUG:root:Transcript: tee tal
DEBUG:root:Processing chunk 13
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.352s for 1.740s audio file.
DEBUG:root:Transcript: eatto e profound
DEBUG:root:Processing chunk 14
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.366s for 1.860s audio file.
DEBUG:root:Transcript: just i speve
DEBUG:root:Processing chunk 15
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.382s for 1.950s audio file.
DEBUG:root:Transcript: loved ydeas
DEBUG:root:Processing chunk 16
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.160s audio file.
DEBUG:root:Transcript: an loved maan stuff
DEBUG:root:Processing chunk 17
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.513s for 2.550s audio file.
DEBUG:root:Transcript: he treated the process
DEBUG:root:Processing chunk 18
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.094s for 5.370s audio file.
DEBUG:root:Transcript: treativeity with the rare and a wonderful reverence
DEBUG:root:Processing chunk 19
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.871s for 4.260s audio file.
DEBUG:root:Transcript: is the i think he better than any one understood
DEBUG:root:Processing chunk 20
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.017s for 5.010s audio file.
DEBUG:root:Transcript: wile ideas oltemately can be so powerful
DEBUG:root:Processing chunk 21
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.598s for 2.970s audio file.
DEBUG:root:Transcript: egin as fratile
DEBUG:root:Processing chunk 22
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.383s for 1.920s audio file.
DEBUG:root:Transcript: e fomd thoughts
DEBUG:root:Processing chunk 23
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.123s for 5.490s audio file.
DEBUG:root:Transcript: so esily mistd so easily compromise so isily josquift
DEBUG:root:Processing chunk 24
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.909s for 4.230s audio file.
DEBUG:root:Transcript: on love the way that he listened so intendly
DEBUG:root:Processing chunk 25
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.432s for 2.190s audio file.
DEBUG:root:Transcript: loved his perseption
DEBUG:root:Processing chunk 26
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.582s for 2.910s audio file.
DEBUG:root:Transcript: is remarkable sensitive ity
DEBUG:root:Processing chunk 27
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.700s audio file.
DEBUG:root:Transcript: nd his surgecly preciseieinion
DEBUG:root:Processing chunk 28
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.350s for 1.920s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 29
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.551s for 2.700s audio file.
DEBUG:root:Transcript: i really believe there was a beuty
DEBUG:root:Processing chunk 30
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.869s for 4.410s audio file.
DEBUG:root:Transcript: e sehela how meen his insih was
DEBUG:root:Processing chunk 31
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.456s for 2.280s audio file.
DEBUG:root:Transcript: sometimes et could spey
DEBUG:root:Processing chunk 32
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.585s for 3.030s audio file.
DEBUG:root:Transcript: as um suremany you know
DEBUG:root:Processing chunk 33
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.022s for 4.920s audio file.
DEBUG:root:Transcript: steve didn’t comfined his sensif excellent to make him products
DEBUG:root:Processing chunk 34
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.610s audio file.
DEBUG:root:Transcript: you a wo we travel together
DEBUG:root:Processing chunk 35
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.356s for 1.770s audio file.
DEBUG:root:Transcript: wold check hin
DEBUG:root:Processing chunk 36
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.387s for 1.920s audio file.
DEBUG:root:Transcript: t gop to my room
DEBUG:root:Processing chunk 37
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.868s for 4.260s audio file.
DEBUG:root:Transcript: nat leave my bags thery needly but te door
DEBUG:root:Processing chunk 38
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.239s for 6.390s audio file.
DEBUG:root:Transcript: with numat
DEBUG:root:Processing chunk 39
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.814s for 4.080s audio file.
DEBUG:root:Transcript: gon si on the bed
DEBUG:root:Processing chunk 40
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.061s for 5.220s audio file.
DEBUG:root:Transcript: on si on the bed next to the fhun
DEBUG:root:Processing chunk 41
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.283s for 1.470s audio file.
DEBUG:root:Transcript: wat
DEBUG:root:Processing chunk 42
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.130s audio file.
DEBUG:root:Transcript: n evetible fone cal
DEBUG:root:Processing chunk 43
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.631s for 12.990s audio file.
DEBUG:root:Transcript: ony this hoodself soctless go
DEBUG:root:Processing chunk 44
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.308s for 1.560s audio file.
DEBUG:root:Transcript: used to joe
DEBUG:root:Processing chunk 45
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.631s for 3.150s audio file.
DEBUG:root:Transcript: lunitics a takean over the assinem
DEBUG:root:Processing chunk 46
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.576s for 2.760s audio file.
DEBUG:root:Transcript: swe shard gedioxsignment
DEBUG:root:Processing chunk 47
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.090s for 5.070s audio file.
DEBUG:root:Transcript: spending months and months working on a part of a product
DEBUG:root:Processing chunk 48
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.493s for 2.310s audio file.
DEBUG:root:Transcript: nobody with ever see
DEBUG:root:Processing chunk 49
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: owith the rese
DEBUG:root:Processing chunk 50
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.872s for 4.020s audio file.
DEBUG:root:Transcript: did it because we because we really believed that it was right
DEBUG:root:Processing chunk 51
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.276s for 1.410s audio file.
DEBUG:root:Transcript: cause we cared
DEBUG:root:Processing chunk 52
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.542s for 2.520s audio file.
DEBUG:root:Transcript: elieved that there was a grammidty
DEBUG:root:Processing chunk 53
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.751s for 3.570s audio file.
DEBUG:root:Transcript: umast ascensive civic responsibility
DEBUG:root:Processing chunk 54
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.452s for 2.280s audio file.
DEBUG:root:Transcript: so care wavbyyongs
DEBUG:root:Processing chunk 55
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.619s for 2.940s audio file.
DEBUG:root:Transcript: and e sot of functional imperative
DEBUG:root:Processing chunk 56
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.108s for 0.630s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 57
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.340s for 1.800s audio file.
DEBUG:root:Transcript: wok
DEBUG:root:Processing chunk 58
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.488s for 2.340s audio file.
DEBUG:root:Transcript: hoopfully appeared in evi table
DEBUG:root:Processing chunk 59
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.309s for 1.560s audio file.
DEBUG:root:Transcript: hid simple
DEBUG:root:Processing chunk 60
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.225s for 1.140s audio file.
DEBUG:root:Transcript: teasy
DEBUG:root:Processing chunk 61
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.301s for 1.500s audio file.
DEBUG:root:Transcript: really cost
DEBUG:root:Processing chunk 62
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.323s for 1.650s audio file.
DEBUG:root:Transcript: cost te soledin i
DEBUG:root:Processing chunk 63
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.460s for 2.190s audio file.
DEBUG:root:Transcript: you know i cost him most
DEBUG:root:Processing chunk 64
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.312s for 1.500s audio file.
DEBUG:root:Transcript: cared the most
DEBUG:root:Processing chunk 65
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.956s for 4.620s audio file.
DEBUG:root:Transcript: he wo in the most deeply he constantly questioned
DEBUG:root:Processing chunk 66
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: this good enough
DEBUG:root:Processing chunk 67
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.245s for 1.230s audio file.
DEBUG:root:Transcript: this right
DEBUG:root:Processing chunk 68
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.530s for 2.610s audio file.
DEBUG:root:Transcript: dispite all his successis
DEBUG:root:Processing chunk 69
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.404s for 2.040s audio file.
DEBUG:root:Transcript: his achievements
DEBUG:root:Processing chunk 70
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.089s for 5.220s audio file.
DEBUG:root:Transcript: never presued he never assumed thet we would get there in the end
DEBUG:root:Processing chunk 71
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.397s for 2.010s audio file.
DEBUG:root:Transcript: nideas didn’t come
DEBUG:root:Processing chunk 72
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.529s for 2.640s audio file.
DEBUG:root:Transcript: the proace it types faled
DEBUG:root:Processing chunk 73
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.778s for 3.840s audio file.
DEBUG:root:Transcript: it was with great intent with faith
DEBUG:root:Processing chunk 74
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.477s for 2.400s audio file.
DEBUG:root:Transcript: he decided to believe
DEBUG:root:Processing chunk 75
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.298s for 1.530s audio file.
DEBUG:root:Transcript: then shally
DEBUG:root:Processing chunk 76
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.317s for 1.530s audio file.
DEBUG:root:Transcript: a something greaght
DEBUG:root:Processing chunk 77
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.539s for 2.730s audio file.
DEBUG:root:Transcript: joy of getting man
DEBUG:root:Processing chunk 78
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.640s audio file.
DEBUG:root:Transcript: i loved is infhusiasm
DEBUG:root:Processing chunk 79
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.484s for 2.430s audio file.
DEBUG:root:Transcript: simple thelight
DEBUG:root:Processing chunk 80
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.474s for 2.370s audio file.
DEBUG:root:Transcript: ma i mixed with serilief
DEBUG:root:Processing chunk 81
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.423s for 2.130s audio file.
DEBUG:root:Transcript: the year we got there
DEBUG:root:Processing chunk 82
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: we got there in the end
DEBUG:root:Processing chunk 83
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.233s for 1.140s audio file.
DEBUG:root:Transcript: ahe was good
DEBUG:root:Processing chunk 84
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.448s for 2.250s audio file.
DEBUG:root:Transcript: conceise smile conye
DEBUG:root:Processing chunk 85
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.010s for 4.710s audio file.
DEBUG:root:Transcript: selebration of making something grat for everybody
DEBUG:root:Processing chunk 86
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.662s for 3.270s audio file.
DEBUG:root:Transcript: enjoying the defeat of sinisism
DEBUG:root:Processing chunk 87
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.439s for 6.600s audio file.
DEBUG:root:Transcript: rjection of reason the rejection of being told a hundred times in condo that
DEBUG:root:Processing chunk 88
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.733s for 3.570s audio file.
DEBUG:root:Transcript: so hes i think was in victory for beauty
DEBUG:root:Processing chunk 89
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.307s for 1.560s audio file.
DEBUG:root:Transcript: pperity
DEBUG:root:Processing chunk 90
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.605s for 2.970s audio file.
DEBUG:root:Transcript: he would say for givein at dham
DEBUG:root:Processing chunk 91
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.840s for 4.140s audio file.
DEBUG:root:Transcript: he was my closeess and we must loa friend
DEBUG:root:Processing chunk 92
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.090s for 9.300s audio file.
DEBUG:root:Transcript: together fornerly fitteen years and he still laughed to the way i sad ali minum
DEBUG:root:Processing chunk 93
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.487s for 2.340s audio file.
DEBUG:root:Transcript: past tothe weeks
DEBUG:root:Processing chunk 94
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.968s for 4.410s audio file.
DEBUG:root:Transcript: wh we ill bing struggling to find ways to save tood by
DEBUG:root:Processing chunk 95
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.342s for 1.620s audio file.
DEBUG:root:Transcript: t smooning
DEBUG:root:Processing chunk 96
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.380s for 1.920s audio file.
DEBUG:root:Transcript: smply once who weren
DEBUG:root:Processing chunk 97
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.372s for 1.860s audio file.
DEBUG:root:Transcript: ank you staye
DEBUG:root:Processing chunk 98
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.628s for 3.000s audio file.
DEBUG:root:Transcript: f youl remarkable vision
DEBUG:root:Processing chunk 99
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.332s for 1.620s audio file.
DEBUG:root:Transcript: ichis inited
DEBUG:root:Processing chunk 100
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: nspired
DEBUG:root:Processing chunk 101
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.550s audio file.
DEBUG:root:Transcript: this extraordinary groups of people
DEBUG:root:Processing chunk 102
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.525s for 2.580s audio file.
DEBUG:root:Transcript: for the oll the weav hof men from you
DEBUG:root:Processing chunk 103
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.781s for 3.660s audio file.
DEBUG:root:Transcript: nfor all thet we will continue to learn from each other
DEBUG:root:Processing chunk 104
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.200s for 1.050s audio file.
DEBUG:root:Transcript: st
DEBUG:root:Processing chunk 105
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.926s for 9.900s audio file.
DEBUG:root:Transcript: ee
The results are sometimes phonetically correct, but the transcription is full of spelling errors, as shown above.
So how should I improve this transcription? Should I use different models, and where do I get them from? How can I improve this without training, because I don't have annotated samples?
And if it needs training, how much training does it need at minimum, and how do I train it in the most minimal way possible to get a good transcription? How many samples would I need to annotate and train on, at minimum, to get a good transcription, if training is needed?
I asked on Discourse but didn't get any response.
Thanks in advance
Raghav