## Building a Demo

Now that we've fine-tuned our model, we can build a demo to show
off its ASR capabilities! We'll use the 🤗 Transformers
`pipeline`, which handles the entire ASR workflow,
from pre-processing the audio inputs to decoding the
model predictions.

Running the example below launches a Gradio demo in which we
record speech through the computer's microphone and pass it to
our fine-tuned Whisper model, which transcribes it to text:

In [None]:
"""
from transformers import pipeline
import gradio as gr

#/content/drive/MyDrive/testing_model

#pipe = pipeline(model="jdowling/whisper-small-hi")  # change to "your-username/the-name-you-picked"
pipe = pipeline(model="mkbackup/testing_model")

def transcribe(audio):
    text = pipe(audio)["text"]
    return text

iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
    title="Whisper Small Swedish",
    description="Realtime demo for Swedish speech recognition using a fine-tuned Whisper small model.",
)

iface.launch()
"""


In [None]:

# Skip this cell if you are running the notebook from the start

!pip install git+https://github.com/huggingface/transformers
!pip install gradio
!pip install torch

from huggingface_hub import notebook_login

notebook_login()

# https://huggingface.co/settings/tokens

In [None]:
# The old method: transcribe directly from an uploaded audio file

from transformers import pipeline
import gradio as gr

#/content/drive/MyDrive/testing_model

#pipe = pipeline(model="jdowling/whisper-small-hi")  # change to "your-username/the-name-you-picked"
pipe = pipeline(model="mkbackup/final_model")

def transcribe(audio_path):
    text = pipe(audio_path)["text"]
    return text

iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath", label="Upload Audio File"),
    outputs="text",
    title="Whisper Small Bengali",
    description="Real-time demo for Bengali speech recognition using a fine-tuned Whisper small model.",
)

iface.launch()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://cdc680772eca3a52fd.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
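
One case the `transcribe` function above does not cover is a submission with no audio attached. The sketch below assumes Gradio passes `None` to the function in that situation and wraps the ASR call with a guard. `make_safe_transcriber` and `fake_asr` are hypothetical names used only for illustration; in the notebook you would pass the real `pipe` object instead of the stand-in.

```python
def make_safe_transcriber(asr):
    """Wrap an ASR callable so that empty submissions return a message instead of raising."""
    def transcribe(audio_path):
        # Assumption: the UI hands us None (or an empty path) when nothing was recorded or uploaded.
        if not audio_path:
            return "No audio received. Record or upload a clip and try again."
        result = asr(audio_path)
        # The pipeline returns a dict with a "text" key; strip stray whitespace around it.
        return result["text"].strip()
    return transcribe


# Stand-in for the real pipeline so the sketch runs without downloading a model.
def fake_asr(path):
    return {"text": f" transcript of {path} "}


transcribe = make_safe_transcriber(fake_asr)
print(transcribe(None))
print(transcribe("clip.wav"))
```

To use it in the demo, something like `fn=make_safe_transcriber(pipe)` could replace `fn=transcribe` in `gr.Interface`.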




In [None]:
# INSTALL THIS FOR YOUTUBE TRANSCRIPTION

!pip install pytube

Collecting pytube
  Downloading pytube-15.0.0-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 1.6 MB/s eta 0:00:00
Installing collected packages: pytube
Successfully installed pytube-15.0.0


In [44]:
# Test YouTube link transcription
# https://www.youtube.com/watch?v=0xKQcfaXzKA #

import os

import gradio as gr
from pydub import AudioSegment  # relies on ffmpeg to decode and encode audio
from pytube import YouTube
from transformers import pipeline

# CHANGE TO YOUR MODEL
pipe = pipeline(model="mkbackup/final_model")

def downsample_audio(input_file, output_file, target_sample_rate=16000, output_format="mp3"):
    """Resample input_file to target_sample_rate (in Hz) and write it to output_file."""
    audio = AudioSegment.from_file(input_file)
    audio = audio.set_frame_rate(target_sample_rate)
    audio.export(output_file, format=output_format)


def download_and_downsample(youtube_url, target_sample_rate=16000):
    """Download a video's audio track, resample it, and return the transcription."""
    yt = YouTube(youtube_url)
    video = yt.streams.filter(only_audio=True).first()

    # Save in the current directory
    destination = "."
    out_file = video.download(output_path=destination)

    # Rename the download to "output.mp3". Only the name changes: the stream
    # keeps its original container format, which ffmpeg normally detects on read.
    new_file = os.path.join(destination, "output.mp3")
    os.replace(out_file, new_file)

    # Downsample the audio and save it to a new file
    downsampled_file = os.path.join(destination, "output_16kHz.mp3")
    downsample_audio(new_file, downsampled_file, target_sample_rate)

    # Transcribe the downsampled audio and return the text
    text = pipe(downsampled_file)["text"]
    return text


# Gradio interface for downloading, downsampling, and transcribing
iface = gr.Interface(
    fn=download_and_downsample,
    inputs=gr.Textbox(type="text", label="YouTube Video Link"),
    outputs=gr.Text(type="text", label="Transcribed Audio"),
    title="YouTube Audio Transcription",
    description="Download the audio from a YouTube video, downsample it to 16 kHz, and transcribe it.",
)

iface.launch(debug=True)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://49c76d88e2c07c07cd.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7872 <> https://49c76d88e2c07c07cd.gradio.live
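
The cell above writes every download to the same `output.mp3` and `output_16kHz.mp3` files in the current directory, so two requests arriving at once could overwrite each other's audio. A minimal sketch of one way around this, assuming the download and transcription steps can be passed in as callables, is to give each request its own temporary directory. `run_in_scratch_dir`, `fake_fetch`, and `fake_transcribe` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import os
import tempfile


def run_in_scratch_dir(fetch_audio, transcribe, filename="audio.mp3"):
    """Fetch audio into a throwaway directory, transcribe it, then clean up."""
    with tempfile.TemporaryDirectory() as workdir:
        audio_path = os.path.join(workdir, filename)
        fetch_audio(audio_path)        # hypothetical: download and downsample to this path
        return transcribe(audio_path)  # hypothetical: pipe(audio_path)["text"]
    # The directory and everything in it is deleted when the with-block exits.


# Stand-ins so the sketch runs without network access or a model.
def fake_fetch(path):
    with open(path, "wb") as f:
        f.write(b"\x00" * 320)


def fake_transcribe(path):
    return f"{os.path.getsize(path)} bytes read"


print(run_in_scratch_dir(fake_fetch, fake_transcribe))
```

The trade-off is that nothing is left on disk for inspection afterwards, which may or may not suit a debugging session in Colab.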


