Using a microphone #80

Closed
yakovw opened this issue Jun 19, 2023 · 5 comments
Labels
enhancement New feature or request examples

Comments


yakovw commented Jun 19, 2023

Is there a way to use the microphone with this library, rather than only transcribing an existing recording?
I ask because the original library, whisper.cpp, has this.

Owner

sandrohanea commented Jun 19, 2023

@adamnova added something like this to the demo in: #9

However, I didn't want to add the NAudio dependency to the full demo. Since then, each example has been moved into its own project, so depending on NAudio there is no longer a problem.

I think it makes sense to move it into a standalone example.

Also, for the best mic support, continuous recognition is a must: #25
Otherwise, the transcript can be poor near the points where segments are merged.

I would also add a mic example for Blazor, as that would be pretty cool.
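
To illustrate the segment-boundary problem mentioned above, here is a minimal sketch of one common workaround: cutting the audio into windows that overlap, so that words falling on a boundary appear whole in at least one window. This is only an assumption about how one might approach it, not a description of how continuous recognition in #25 is implemented. The helper name `SplitWithOverlap` and the 10 s / 2 s sizes are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split samples into fixed-size windows where each
// window re-reads the last `overlap` samples of the previous one.
static List<float[]> SplitWithOverlap(float[] samples, int windowSize, int overlap)
{
    if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
    if (overlap < 0 || overlap >= windowSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    var windows = new List<float[]>();
    int step = windowSize - overlap; // how far each window advances
    for (int start = 0; start < samples.Length; start += step)
    {
        int length = Math.Min(windowSize, samples.Length - start);
        windows.Add(samples.AsSpan(start, length).ToArray());
        if (start + length >= samples.Length) break; // this window reached the end
    }
    return windows;
}

const int sampleRate = 16000;                  // 16 kHz mono, as in the mic examples
float[] audio = new float[25 * sampleRate];    // 25 s of silence as a stand-in for captured audio
var chunks = SplitWithOverlap(audio, 10 * sampleRate, 2 * sampleRate);
Console.WriteLine($"{chunks.Count} windows");
```

Each window could then be passed to the processor in turn. Note that overlapping windows produce duplicated text in the overlapped region, so the results would still need to be de-duplicated, which this sketch does not attempt.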

@adamnova
Contributor

My demo was basically a proof of concept; it is not very usable in practice. Without continuous recognition, all you get is somewhat repetitive lines of text.

sandrohanea added the enhancement (New feature or request) label on Jul 1, 2023

jbienz commented Oct 23, 2023

It appears there is now continuous recognition here:

https://github.com/sandrohanea/whisper.net/tree/main/examples/ContinuousRecognition

It appears that's an example rather than part of core, though. Is there a chance of getting a microphone sample now?


danroot commented Nov 7, 2023

I was able to get real-time transcription from the mic working on my M1 Mac using the code below, which uses OpenTK.OpenAL. It is stitched together from various Stack Overflow posts and could be improved, but it may help others looking to do something similar. I ended up having to download the CoreML model manually, unzip it, and put it in the current folder. Ideally, IMO, Whisper.net would "just work" and download this model when running on Apple silicon, similar to how it handles the base .bin model.

The other "gotcha" I ran into was that I needed to use a float[] buffer and the ALFormat.MonoFloat32Ext capture format.

    using System.Text;
    using OpenTK.Audio.OpenAL;
    using Whisper.net;
    using Whisper.net.Ggml;

    var modelName = "ggml-base.bin";
    // TODO: also https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base-encoder.mlmodelc.zip
    if (!File.Exists(modelName))
    {
        // Download the base ggml model on first run.
        Console.WriteLine("Downloading whisper model...");
        using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Base);
        using var fileWriter = File.OpenWrite(modelName);
        await modelStream.CopyToAsync(fileWriter);
    }
    using var whisperFactory = WhisperFactory.FromPath(modelName);

    using var processor = whisperFactory.CreateBuilder()
        .WithLanguage("en")
        .Build();

    // Capture 16 kHz mono float samples; the capture buffer holds up to 10 seconds.
    int bufferLength = 10 * 16000; // 10 sec
    var mic = ALC.CaptureOpenDevice(null, 16000, ALFormat.MonoFloat32Ext, bufferLength);
    Console.WriteLine("Using:");
    Console.WriteLine(ALC.GetString(new ALDevice(mic.Handle), AlcGetString.DeviceSpecifier));
    var currentInput = new StringBuilder();
    ALC.CaptureStart(mic);
    var buffer = new float[bufferLength];

    // Poll the mic once per second, for 100 iterations.
    for (int i = 0; i < 100; ++i)
    {
        await Task.Delay(1000);
        int samplesAvailable = ALC.GetAvailableSamples(mic);

        if (samplesAvailable > 0)
        {
            // Read whatever was captured since the last poll and transcribe it.
            ALC.CaptureSamples(mic, buffer, samplesAvailable);
            await foreach (var resultData in processor.ProcessAsync(buffer[..samplesAvailable]))
            {
                Console.WriteLine("RAW:" + resultData.Text);
            }
        }
    }
    ALC.CaptureStop(mic);
    ALC.CaptureCloseDevice(mic);
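
As a side note on the float[] "gotcha" above: if a capture API only delivers signed 16-bit PCM (for example, an OpenAL device opened with ALFormat.Mono16, which is an assumption here and not something tested in this thread), the samples can be scaled to floats before being handed to the processor. A minimal sketch, with the hypothetical helper name `Pcm16ToFloat`:

```csharp
using System;

// Hypothetical helper: convert signed 16-bit PCM samples to floats in [-1, 1).
static float[] Pcm16ToFloat(ReadOnlySpan<short> pcm)
{
    var samples = new float[pcm.Length];
    for (int i = 0; i < pcm.Length; i++)
    {
        // short.MinValue (-32768) maps to -1.0; short.MaxValue maps to just under 1.0.
        samples[i] = pcm[i] / 32768f;
    }
    return samples;
}

short[] captured = { 0, 16384, -32768, 32767 }; // stand-in for samples read from a 16-bit capture
float[] floats = Pcm16ToFloat(captured);
Console.WriteLine(string.Join(", ", floats));
```

The resulting array has the same shape as the `buffer` used in the code above, so it could be passed to `processor.ProcessAsync` in the same way.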

@sandrohanea
Owner

I will close all issues related to streaming processing and link them to #25.
