Speech after a particular interval #68

Open
IrfanAli17899 opened this issue Dec 19, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@IrfanAli17899

IrfanAli17899 commented Dec 19, 2023

Hi, first of all, thank you so much for this awesome package; it's working great for me. The only thing I'm wondering is whether I can get the speech every 5 seconds instead of only after a pause. Right now, if the user speaks continuously, the package doesn't return any results until they stop, and this shows up as latency in the UI. Please help, and thanks in advance.

@HayatoYagi
Collaborator

Thanks for your proposal! Your idea is nice, but it has a problem: the audio can be split in the middle of a word.
I'm not sure it should be implemented.
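The mid-word concern is easier to see in code. Below is a minimal sketch, not part of the package, of the naive fixed-interval chunking that was requested. It assumes the audio is a mono Float32Array at a known sample rate. Every chunk boundary falls at a fixed sample offset no matter what is being said, which is why a word can be cut in two. The optional `overlapSeconds` parameter is hypothetical: it only softens the problem by repeating a short stretch of audio at each boundary.

```javascript
// Naive fixed-interval chunker: boundaries depend only on time, not on speech,
// so a boundary can land in the middle of a word.
function fixedIntervalChunks(samples, sampleRate, intervalSeconds, overlapSeconds = 0) {
  const size = Math.round(intervalSeconds * sampleRate); // samples per chunk
  const overlap = Math.round(overlapSeconds * sampleRate); // samples repeated at each boundary
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("expected interval > 0 and 0 <= overlap < interval");
  }
  const step = size - overlap;
  const chunks = [];
  for (let start = 0; start < samples.length; start += step) {
    // subarray() returns a view, so no audio is copied here
    chunks.push(samples.subarray(start, Math.min(start + size, samples.length)));
    if (start + size >= samples.length) break; // this chunk reached the end of the audio
  }
  return chunks;
}
```

For example, 12 s of 16 kHz audio with `intervalSeconds = 5` produces chunks of 5 s, 5 s and 2 s. Each boundary sits exactly at 5 s and 10 s, regardless of the speech content.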

When I wanted to make each speech segment shorter, I tried reducing redemptionFrames and increasing negativeSpeechThreshold.
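The sketch below is a simplified model of how these two options affect segmentation. It is not the package's actual frame processor: it ignores `minSpeechFrames` and pre-speech padding. In this model, a segment starts when a frame's speech probability reaches `positiveSpeechThreshold`. It ends once `redemptionFrames` frames have fallen below `negativeSpeechThreshold` since the last confident speech frame. The function name `segmentFrames` and the toy probabilities are assumptions made for illustration.

```javascript
// Simplified model of threshold + redemption segmentation (a sketch, not the
// package's FrameProcessor; it ignores minSpeechFrames and pre-speech padding).
// probs: per-frame speech probabilities; returns inclusive [startFrame, endFrame] pairs.
function segmentFrames(probs, { positiveSpeechThreshold, negativeSpeechThreshold, redemptionFrames }) {
  const segments = [];
  let speaking = false;
  let start = 0;
  let redemptionCounter = 0; // "silent" frames seen since the last confident speech frame
  probs.forEach((p, i) => {
    if (p >= positiveSpeechThreshold) {
      redemptionCounter = 0; // confident speech cancels a pending segment end
      if (!speaking) {
        speaking = true;
        start = i;
      }
    } else if (speaking && p < negativeSpeechThreshold) {
      redemptionCounter += 1;
      if (redemptionCounter >= redemptionFrames) {
        segments.push([start, i]); // the segment end includes the redemption frames
        speaking = false;
        redemptionCounter = 0;
      }
    }
    // probabilities between the two thresholds neither start nor end a segment
  });
  if (speaking) segments.push([start, probs.length - 1]); // audio ended mid-speech
  return segments;
}
```

Take the toy probabilities `[0.9, 0.9, 0.2, 0.9, 0.9, 0.2, 0.2, 0.2, 0.9, 0.1]` with thresholds 0.5 and 0.35. Setting `redemptionFrames: 3` yields `[[0, 7], [8, 9]]`, while `redemptionFrames: 1` yields `[[0, 2], [3, 5], [8, 9]]`. Raising `negativeSpeechThreshold` works in the same direction. On `[0.9, 0.4, 0.4, 0.9]` with `redemptionFrames: 2`, moving it from 0.35 to 0.45 turns `[[0, 3]]` into `[[0, 2], [3, 3]]`.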

@ricky0123
Owner

Hi @IrfanAli17899, I'm glad this package has been useful for you. So in other words, throughout a continuous speech period, you would like to have a callback that runs on a regular 5 second interval and takes as an argument the current raw audio of the speech segment? Can I ask what kind of UI updates you are referring to? I would like to understand the use case better.

@IrfanAli17899
Author

IrfanAli17899 commented Dec 19, 2023

Hi @ricky0123, yes, you are right. I feed the audio I get from your package to GPT for transcription and translation, and then I show those results on the frontend; I'm trying to build a real-time translator. The problem is that the library doesn't provide audio segments until the user stops speaking. I want audio segments at a regular interval so that, if the user speaks continuously without stopping, I can still show transcription and translation results. Does that make sense? Let me know if you need more explanation of the use case. Thanks so much.

@ricky0123
Owner

Hi @IrfanAli17899, thanks for the clarification. Have you considered streaming audio from the browser to your server and doing all of the audio processing there, instead of using this package?

Potentially what we could do is provide a method on the vad object that allows you to get the current audio segment. That would allow you to experiment by creating a timer that queries the current audio and sends it to your server.

@IrfanAli17899
Author

Yeah, I tried the browser MediaRecorder API to stream audio to the server. However, only the first chunk is playable, because it contains all the necessary headers; the rest of the chunks are not. So I had to merge all the chunks on the backend and then crop the last 5 seconds for transcription. It was a very lengthy, hectic solution, which is why I tried your package.

Yes, it would be very helpful if there were a prop that takes a callback function and another prop for the interval, so that it could provide me with chunks. I think each chunk would need to be playable on its own to be useful to me, though. Let me know what you think. Thanks, @ricky0123!
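Raw PCM chunks can be made playable independently. Unlike the MediaRecorder chunks after the first, each chunk here gets wrapped in its own WAV header. Below is a minimal sketch of that idea. It assumes mono Float32Array samples in [-1, 1] at 16 kHz; 16 kHz is an assumed default, so pass the real rate if it differs. `encodeWavChunk` is a hypothetical helper, not a package API. It writes a standard 44-byte RIFF/WAVE header followed by 16-bit little-endian PCM.

```javascript
// Wrap one chunk of mono Float32 samples in its own 44-byte WAV header so the
// chunk is playable on its own (16-bit little-endian PCM).
function encodeWavChunk(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size: rest of the file
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true); // fmt sub-chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

Each returned ArrayBuffer can be uploaded on its own, for instance as `new Blob([buffer], { type: "audio/wav" })`, with no need to merge it with earlier chunks on the server.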

@ricky0123
Owner

Hi @IrfanAli17899, what I'm saying is that we probably won't add a callback that runs on an interval, but I would be open to adding a method that lets you get the raw audio at any given time, so you could do something like:

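```javascript
const myIntervalLength = 5000 // milliseconds, e.g. the 5-second interval discussed above
const myvad = await vad.MicVAD.new({ /* ... */ })
const mytimer = setInterval(() => {
    // getCurrentAudio() is the method proposed in this thread, not yet implemented
    const audio = myvad.getCurrentAudio()
    // send audio to server, etc.
}, myIntervalLength)
```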

This would be easy to implement and more general. I'm not sure if the method you're describing of sending audio to your server on an interval will work, but this would at least allow you to try it out.
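The following self-contained sketch shows roughly what the proposed `getCurrentAudio()` could return. `SpeechSegmentBuffer` is hypothetical and is not the package's code. Frames are appended while speech is in progress, and `getCurrentAudio()` concatenates them. `takeNewAudio()` returns only the samples added since the previous call, so a timer can send incremental chunks instead of resending the whole segment. The way frames reach it (`pushFrame`) and the moment `reset()` is called are both assumptions.

```javascript
// Hypothetical stand-in for the proposed getCurrentAudio() method: it holds the
// frames of the speech segment currently in progress.
class SpeechSegmentBuffer {
  constructor() {
    this.frames = []; // Float32Array frames of the current segment
    this.sentSamples = 0; // how many samples takeNewAudio() has already returned
  }

  // Assumed to be called for every processed frame while speech is ongoing.
  pushFrame(frame) {
    this.frames.push(frame);
  }

  // Assumed to be called when the segment ends (e.g. alongside onSpeechEnd).
  reset() {
    this.frames = [];
    this.sentSamples = 0;
  }

  // All audio of the current segment so far, as one contiguous Float32Array.
  getCurrentAudio() {
    const total = this.frames.reduce((n, f) => n + f.length, 0);
    const out = new Float32Array(total);
    let offset = 0;
    for (const f of this.frames) {
      out.set(f, offset);
      offset += f.length;
    }
    return out;
  }

  // Only the samples added since the previous call, for incremental uploads.
  takeNewAudio() {
    const audio = this.getCurrentAudio();
    const fresh = audio.slice(this.sentSamples);
    this.sentSamples = audio.length;
    return fresh;
  }
}
```

A timer could then call `setInterval(() => { const chunk = buffer.takeNewAudio(); if (chunk.length > 0) sendToServer(chunk) }, 5000)`, where `sendToServer` is a placeholder for your own upload code.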

@IrfanAli17899
Author

IrfanAli17899 commented Dec 20, 2023 via email

@IrfanAli17899
Author

Hi @ricky0123, pardon me, but will that audio be processed by the VAD model?

ricky0123 added the enhancement (New feature or request) label on Dec 22, 2023
@ashwin-maurya

ashwin-maurya commented Apr 14, 2024


@ricky0123 Hello Ricky, thanks for the great package. Has the feature described above been implemented in the package? I'm looking to do the same thing: stream audio to the server in real time!

@IrfanAli17899
Author

@ricky0123 Hi, is it done?

Development

No branches or pull requests

4 participants