<a href="https://colab.research.google.com/github/jabriomar873/PCD-project-team/blob/main/speech_recognition_system_building_deploying_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deploying a Speech Recognition System Using the Whisper Model & Gradio

*This project aims to build and deploy a real-time speech recognition system using the Whisper model from OpenAI and the Gradio framework for easy web deployment. The system allows users to record audio using their microphone, process it through the Whisper model to convert speech into text, and display the transcribed text on the screen instantly.*

**Key Features:**

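Advanced Speech Recognition: Uses a Whisper-family model for transcription. The code below loads the `distil-whisper/distil-small.en` checkpoint, which is English-only; other languages are reached through a separate translation step.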

Flexible Input: Transcribes audio either through the user's microphone or by uploading an audio file.

Interactive Interface: Gradio-powered UI for an easy and intuitive user experience.

Scalable and Customizable: Easy to adapt and extend for different needs.


**Deployment:**

Deployable on Hugging Face Spaces.

Gradio simplifies deployment, while Whisper ensures accurate real-time transcription.

## 1. Setting up the Working Environment

In [15]:
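# Install the required packages: Hugging Face's transformers and datasets,
# plus soundfile, librosa, and gradio.
# Later cells also import mysql.connector, deep_translator, gtts, plotly, and pandas.

!pip install transformers
!pip install -U datasets
!pip install soundfile
!pip install librosa
!pip install gradio
!pip install mysql-connector-python deep-translator gTTS plotly pandas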


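Before moving on, it can help to confirm that every module imported later in the notebook is actually importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the module list is taken from the imports that appear in the cells below.

```python
import importlib.util

# Import names used by later cells. Note that import names can differ from
# PyPI names: the `mysql` import comes from the mysql-connector-python package.
REQUIRED_MODULES = [
    "transformers", "gradio", "mysql", "plotly", "pandas", "deep_translator", "gtts",
]

def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing or "none")
```

Any name reported as missing can be installed with `pip` before running the remaining cells.
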
## 2. Deploy Application Demo with Gradio

Gradio is an open-source Python package that enables rapid development of demos or web applications for machine learning models, APIs, or any Python function. It allows you to quickly share your demo via a link using its built-in sharing features.

First, we will import the Gradio package and create an instance of the Blocks class. Blocks allows the creation of complex web applications by defining a layout with interactive components (e.g., buttons, sliders, text boxes) arranged in blocks.

In [1]:
import gradio as gr
demo = gr.Blocks()

### 2.1 MySQL Database

Next, we’ll connect to our MySQL database using the `mysql.connector` module (installed by the `mysql-connector-python` package) to enable database operations such as querying and updating data.

In [9]:
import mysql
import mysql.connector

class connect :
    def __init__(self):
        self.db = mysql.connector.connect(
            host="127.0.0.1",
            user="root",
            password="root",
            database="database"
        )
        self.cursor = self.db.cursor()
    def close(self):
        self.cursor.close()
        self.db.close()
    def get(self):
        return (self.db, self.cursor)
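
The notebook does not show the database schema, but the queries in later cells imply two tables, `user` and `admin`. The sketch below reconstructs a plausible layout from those queries and exercises it with Python's built-in `sqlite3` as a stand-in for MySQL, so it runs without a server. The column types, the `id` column, and the uniqueness constraints are assumptions. Note also that `sqlite3` uses `?` placeholders where `mysql.connector` uses `%s`.

```python
import sqlite3

# Assumed schema, inferred from the queries used in the notebook.
SCHEMA = """
CREATE TABLE user (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL,
    is_admin INTEGER NOT NULL DEFAULT 0,
    origin   TEXT
);
CREATE TABLE admin (
    username      TEXT NOT NULL UNIQUE,
    password      TEXT NOT NULL,
    is_admin      INTEGER NOT NULL DEFAULT 1,
    is_main_admin INTEGER NOT NULL DEFAULT 0
);
"""

def make_demo_db():
    """Create an in-memory database with the assumed tables and one main admin."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO user (username, password, is_admin, origin) VALUES (?, ?, 1, ?)",
        ("admin", "changeme", "local"),
    )
    conn.execute(
        "INSERT INTO admin (username, password, is_admin, is_main_admin) VALUES (?, ?, 1, 1)",
        ("admin", "changeme"),
    )
    conn.commit()
    return conn

conn = make_demo_db()
row = conn.execute(
    "SELECT password, is_admin FROM user WHERE username = ?", ("admin",)
).fetchone()
print(row)  # ('changeme', 1)
```

The `UNIQUE` constraint on `username` is what would make a duplicate insert raise an integrity error, which is the case the admin `create_user` function later catches.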


Here we define the variables that will be used throughout the deployment process.

In [10]:
db, cursor = connect().get()

session = {"authenticated": False, "username": None}

languages = {
    "french": "fr",
    "german": "de",
    "spanish": "es",
    "italian": "it",
    "chinese": "zh-cn",
    "japanese": "ja",
    "russian": "ru",
    "arabic": "ar"
}


### 2.2 Authentication

The following functions handle the core authentication processes of the application: user login, registration, and logout. They take the user's input, check credentials against the database, store new accounts, and update the session state. As written, passwords are stored and compared in plain text, and `session` is a single module-level dictionary shared by every visitor, so this setup suits a demo rather than a multi-user production deployment.

In [11]:
def login(username, password):
    cursor.execute("SELECT password, is_admin FROM user WHERE username = %s", (username,))
    result = cursor.fetchone()
    if result and result[0] == password:
        session["authenticated"] = True
        session["username"] = username
        is_admin = result[1] == 1  # Check if is_admin column is 1
        return (
            gr.update(visible=False), gr.update(visible=False),
            gr.update(visible=not is_admin), gr.update(visible=is_admin),
            f"✅ Welcome {'Admin' if is_admin else username}!"
        )
    return (gr.update(), gr.update(), gr.update(), gr.update(), "❌ Invalid username or password.")

def register(new_user, new_pass, new_origin):
    cursor.execute("SELECT * FROM user WHERE username = %s", (new_user,))
    if cursor.fetchone():
        return "⚠️ Username already exists!"
    cursor.execute("INSERT INTO user (username, password, is_admin, origin) VALUES (%s, %s, 0, %s)", (new_user, new_pass, new_origin))
    db.commit()
    return "✅ Account created successfully!"

def logout():
    session["authenticated"] = False
    session["username"] = None
    return (
        gr.update(visible=True),
        gr.update(visible=False),
        gr.update(visible=False),
        gr.update(visible=False),
        "You have been logged out."
    )
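
The `login` and `register` functions above store and compare the password exactly as typed. One common alternative is to store a salted hash instead. The sketch below uses PBKDF2 from Python's standard library. The helper names `hash_password` and `verify_password`, the storage format, and the iteration count are illustrative assumptions, not part of the original project.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' for storage in the password column."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, hash_hex = stored.split("$", 1)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), hash_hex)

stored = hash_password("s3cret")
print(verify_password("s3cret", stored))  # True
print(verify_password("wrong", stored))   # False
```

With helpers like these, `register` would store `hash_password(new_pass)` and `login` would call `verify_password(password, result[0])` in place of `result[0] == password`. The stored string is 97 characters long, so the password column would need to be at least that wide.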


### 2.3 Plot model comparison

The following code plots the model comparison, using Plotly to draw the chart and pandas to hold the data points. The same cell also loads the speech recognition pipeline (`asr`) that is used later for transcription.

In [12]:
import plotly.graph_objects as go
import pandas as pd
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="distil-whisper/distil-small.en")
# Plotly chart for model comparison
def create_plotly_chart(selected_models, selected_metrics):
    data = {
        "Dataset": ["LibriSpeech"] * 5,
        "Model": [
            "Residual CNN + BiRNN",
            "Residual CNN + BiLSTM",
            "Residual CNN + BiGRU",
            "Transformer",
            "Whisper"
        ],
        "Accuracy (%)": [67.37, 88.79, 93.19, 98.32, 99.11],
        "WER (%)": [32.63, 11.21, 6.81, 1.68, 0.79],
        "CER (%)": [33.25, 13.27, 8.88, 3.45, 1.35],
        "Validation Loss (%)": [5.77, 3.54, 3.46, 1.92, 0.37]
    }
    df = pd.DataFrame(data)
    df_filtered = df[df["Model"].isin(selected_models)]

    fig = go.Figure()
    colors = ["#636EFA", "#EF553B", "#00CC96", "#AB63FA"]

    for i, metric in enumerate(selected_metrics):
        fig.add_trace(go.Bar(
            x=df_filtered["Model"],
            y=df_filtered[metric],
            name=metric,
            marker_color=colors[i % len(colors)]
        ))

    fig.update_layout(
        barmode='group',
        title="Model Comparison",
        xaxis_title="Model",
        yaxis_title="Percentage",
        legend_title="Metric",
        template="plotly_white"
    )
    return fig





Device set to use cpu
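
The chart reports word error rate (WER) and character error rate (CER). As a reminder of what those numbers measure, WER is the word-level edit distance between the reference and the hypothesis divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The following is a minimal pure-Python sketch of that definition. The function names are hypothetical, and the notebook does not show how its reported figures were computed. CER is the same calculation over characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

In the table above, the first four rows satisfy Accuracy = 100 − WER, which suggests that accuracy there is word accuracy; that reading is an inference from the numbers, not something the notebook states.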


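### 2.4 User management and audio translation

The following cell handles the operations available to an admin: creating, updating, viewing, and deleting users. An admin can also promote other users to admin or demote them. To make sure the admins cannot all be removed, a main admin called "admin" (flagged by `is_main_admin` in the code) cannot be demoted or deleted. The cell also defines `transcribe`, which transcribes the audio and, when a target language is chosen, translates the text and synthesizes the translated speech.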

In [13]:
from deep_translator import GoogleTranslator
from gtts import gTTS

def verify_text(text):
    return text.isalnum()
def read_user(username):
    cursor.execute("SELECT * FROM user where username = %s",(username, ))
    result =cursor.fetchall() 
    if not result :
        return "❌ Failed to find username ."
    return "\n".join([f"ID: {r[0]}, Username: {r[1]}, Admin: {r[3]}" for r in result])

def create_user(username, password,origin):
    if not username or not password or not origin:
        return "⚠️ Username, password, and origin cannot be empty."
    if not verify_text(password) or not verify_text(username) or not verify_text(origin):
        return "⚠️ Username, password, and origin can only contain letters and digits."
    try:
        cursor.execute("INSERT INTO user (username, password, is_admin, origin) VALUES (%s, %s, 0, %s)", (username, password, origin))
        db.commit()
        return "✅ User created successfully."
    except mysql.connector.errors.IntegrityError:
        return "⚠️ Username already exists."

def update_user_password(username, new_password):
    if not verify_text(new_password):
        return "⚠️ password can only contain letters and digits ."

    cursor.execute("UPDATE user SET password = %s WHERE username = %s", (new_password, username))
    db.commit()
    return "✅ Password updated successfully."

def add_admin(username):
    # Fetch the user's info from the user table
    cursor.execute("SELECT username, password, is_admin FROM user WHERE username = %s", (username,))
    result = cursor.fetchone()

    if not result:
        return "❌ User not found."

    username, password, is_admin = result
    if is_admin:
        return "⚠️ Failed to grant admin rights. "
    cursor.execute("""
        INSERT INTO admin (username, password, is_admin, is_main_admin)
        VALUES (%s, %s, %s, %s)
    """, (username, password, 1, 0))

    # Optionally update the user table to reflect admin status
    cursor.execute("UPDATE user SET is_admin = 1 WHERE username = %s", (username,))
    
    db.commit()
    return "✅ Admin rights granted and user added to admin table successfully."

def remove_admin(username):
    cursor.execute("SELECT is_main_admin FROM admin WHERE username = %s", (username,))
    result =cursor.fetchone() 
    if not result:
        return "❌ Failed to remove admin rights ."
    is_main_admin = result[0]
    if is_main_admin:
        return "❌ Failed to remove admin rights ."

    cursor.execute("DELETE FROM admin WHERE username = %s", (username,))
    
    cursor.execute("UPDATE user SET is_admin = 0 WHERE username = %s", (username,))

    db.commit()
    return "✅ Admin rights removed successfully."

def delete_user(username):
    cursor.execute("SELECT is_admin FROM user WHERE username = %s", (username,))
    row = cursor.fetchone()
    if not row:
        return "❌ Failed to find user ."
    is_admin=row[0]

    if is_admin :
        cursor.execute("SELECT is_main_admin FROM admin WHERE username = %s", (username, ))
        admin_row = cursor.fetchone()
        # admin_row is None if is_admin is set but the admin table has no matching row
        if admin_row and admin_row[0]:
            return "❌ can't delete this admin ."
        else:
            cursor.execute("DELETE FROM admin WHERE username = %s", (username,))
    cursor.execute("DELETE FROM user WHERE username = %s", (username,))
    db.commit()
    return "✅ User deleted successfully."

def transcribe(audio, target_language="en"):
    if not audio:
        return "No audio input detected.", "", None

    result = asr(audio)
    original_text = result["text"]
    translated_text = ""
    translated_audio_path = None

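    # The dropdown passes None when nothing is selected; skip translation in that case
    if target_language and target_language != "en":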
        try:
            target_code = languages.get(target_language.lower())
            if target_code is None:
                raise ValueError(f"Language not supported: {target_language}")
            translated_text = GoogleTranslator(source='auto', target=target_code).translate(original_text)
            tts = gTTS(translated_text, lang=target_code)
            translated_audio_path = "translated_audio.mp3"
            tts.save(translated_audio_path)
        except Exception as e:
            translated_text = f"Translation error: {str(e)}"

    return original_text, translated_text, translated_audio_path
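
Two details of `transcribe` are worth a closer look: the dropdown may pass `None` when nothing is selected, and every request writes to the same `translated_audio.mp3` file, so simultaneous users could overwrite each other's audio. The sketch below shows one way to handle both using only the standard library. The helper names `resolve_language` and `unique_audio_path` are hypothetical, and the `languages` mapping here is a shortened copy of the one defined earlier.

```python
import os
import tempfile

languages = {"french": "fr", "german": "de", "spanish": "es"}  # shortened copy

def resolve_language(name, table=languages):
    """Map a dropdown value to a language code; None means no translation requested."""
    if not name or name.lower() in ("en", "english"):
        return None
    code = table.get(name.lower())
    if code is None:
        raise ValueError(f"Language not supported: {name}")
    return code

def unique_audio_path(suffix=".mp3"):
    """Reserve a fresh temporary file path so each request gets its own output file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the TTS library writes the file itself
    return path

print(resolve_language("French"))  # fr
print(resolve_language(None))      # None
```

Inside `transcribe`, the fixed file name could then be replaced with `translated_audio_path = unique_audio_path()` before calling `tts.save(...)`. Cleaning up old temporary files is left out of this sketch.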




### 2.5 Main

In [14]:
def main():
    with gr.Blocks() as app:
        login_block = gr.Row(visible=True)
        register_block = gr.Row(visible=False)
        user_block = gr.Column(visible=False)
        admin_block = gr.Column(visible=False)

        with login_block:
            gr.Column(scale=1)
            with gr.Column(scale=2):
                gr.Markdown("### 🔐 Login to Speech Recognition App")
                login_user = gr.Text(label="Username")
                login_pass = gr.Text(label="Password", type="password")
                login_status = gr.Textbox(label="Status", interactive=False)
                login_btn = gr.Button("Login")
                go_to_register = gr.Button("Create Account")
            gr.Column(scale=1)

        with register_block:
            gr.Column(scale=1)
            with gr.Column(scale=2):
                gr.Markdown("### 📝 Create a New Account")
                reg_user = gr.Text(label="Username")
                reg_pass = gr.Text(label="Password", type="password")
                reg_origin = gr.Text(label="origin")
                reg_status = gr.Textbox(label="Status", interactive=False)
                register_btn = gr.Button("Register")
                back_to_login = gr.Button("Back to Login")
            gr.Column(scale=1)
      
        with user_block:
            gr.Markdown("### 🎤 Speech Recognition System")
            with gr.Tabs():
                with gr.TabItem("Transcribe Microphone"):
                    with gr.Row():
                        with gr.Column():
                            mic_input = gr.Audio(sources="microphone", type="filepath")
                            language_dropdown = gr.Dropdown(label="Target Language", choices=list(languages.keys()))
                            mic_transcribe_btn = gr.Button("Transcribe")
                        with gr.Column():
                            original_text = gr.Textbox(label="Original Transcription", lines=4)
                            translated_text = gr.Textbox(label="Translated Transcription", lines=4)
                            translated_audio = gr.Audio(label="Translated Audio", visible=True)
                    mic_transcribe_btn.click(fn=transcribe, inputs=[mic_input, language_dropdown],
                                                outputs=[original_text, translated_text, translated_audio])

                with gr.TabItem("Transcribe Audio File"):
                    with gr.Row():
                        with gr.Column():
                            file_input = gr.Audio(sources="upload", type="filepath")
                            file_language_dropdown = gr.Dropdown(label="Target Language", choices=list(languages.keys()))
                            file_transcribe_btn = gr.Button("Transcribe")
                        with gr.Column():
                            file_original_text = gr.Textbox(label="Original Transcription", lines=4)
                            file_translated_text = gr.Textbox(label="Translated Transcription", lines=4)
                            file_translated_audio = gr.Audio(label="Translated Audio", visible=True)
                    file_transcribe_btn.click(fn=transcribe, inputs=[file_input, file_language_dropdown],
                                                outputs=[file_original_text, file_translated_text, file_translated_audio])
            # Add Log Out Button
            logout_btn = gr.Button("Log Out")
            logout_btn.click(fn=logout, outputs=[login_block, register_block, user_block, admin_block, login_status])

        with admin_block:
            gr.Markdown("### 🎤 Speech Recognition System")
            with gr.Tabs():
                with gr.TabItem("Transcribe Microphone"):
                    with gr.Row():
                        with gr.Column():
                            mic_input = gr.Audio(sources="microphone", type="filepath")
                            language_dropdown = gr.Dropdown(label="Target Language", choices=list(languages.keys()))
                            mic_transcribe_btn = gr.Button("Transcribe")
                        with gr.Column():
                            original_text = gr.Textbox(label="Original Transcription", lines=4)
                            translated_text = gr.Textbox(label="Translated Transcription", lines=4)
                            translated_audio = gr.Audio(label="Translated Audio", visible=True)
                    mic_transcribe_btn.click(fn=transcribe, inputs=[mic_input, language_dropdown],
                                                outputs=[original_text, translated_text, translated_audio])

                with gr.TabItem("Transcribe Audio File"):
                    with gr.Row():
                        with gr.Column():
                            file_input = gr.Audio(sources="upload", type="filepath")
                            file_language_dropdown = gr.Dropdown(label="Target Language", choices=list(languages.keys()))
                            file_transcribe_btn = gr.Button("Transcribe")
                        with gr.Column():
                            file_original_text = gr.Textbox(label="Original Transcription", lines=4)
                            file_translated_text = gr.Textbox(label="Translated Transcription", lines=4)
                            file_translated_audio = gr.Audio(label="Translated Audio", visible=True)
                    file_transcribe_btn.click(fn=transcribe, inputs=[file_input, file_language_dropdown],
                                                outputs=[file_original_text, file_translated_text, file_translated_audio])
                with gr.TabItem("Model Comparison"):
                    gr.Markdown("## 📊 Compare ASR Models")
                    with gr.Row():
                        model_selector = gr.CheckboxGroup(label="Select models",
                            choices=["Residual CNN + BiRNN", "Residual CNN + BiLSTM", "Residual CNN + BiGRU", "Transformer", "Whisper"],
                            value=["Whisper", "Transformer"])
                        metric_selector = gr.CheckboxGroup(label="Select metrics",
                            choices=["Accuracy (%)", "WER (%)", "CER (%)", "Validation Loss (%)"],
                            value=["Accuracy (%)", "WER (%)"])
                    compare_btn = gr.Button("Compare Models")
                    plot_output = gr.Plot()
                    compare_btn.click(fn=create_plotly_chart, inputs=[model_selector, metric_selector], outputs=plot_output)
                with gr.TabItem("Manage Users"):
                    gr.Markdown("### 🛡️ Admin Panel - Manage Users")
                    with gr.Row():
                        create_username = gr.Text(label="New Username")
                        create_password = gr.Text(label="New Password")
                        create_origin = gr.Text(label="New Origin")
                        create_user_btn = gr.Button("➕ Create User")
                        # create_status = gr.Textbox(label="Status", interactive=False)
                    with gr.Row():
                        update_username = gr.Text(label="Username to Update")
                        update_new_password = gr.Text(label="New Password")
                        update_user_btn = gr.Button("✏️ Update Password")
                        # update_status = gr.Textbox(label="Status", interactive=False)
                    with gr.Row():
                        admin_username = gr.Text(label="Username to Promote/Demote")
                        with gr.Column():
                            promote_admin_btn = gr.Button("➕ Promote to Admin")
                            demote_admin_btn = gr.Button("➖ Demote from Admin")
                        # admin_action_status = gr.Textbox(label="Status", interactive=False)
                    with gr.Row():
                        delete_username = gr.Text(label="Username to Delete")
                        delete_user_btn = gr.Button("🗑️ Delete User")
                        # delete_status = gr.Textbox(label="Status", interactive=False)
                    with gr.Row():
                        view_user = gr.Text(label="Username")
                        view_user_btn = gr.Button("🔍 View user")

                    OUTOUT = gr.Textbox(label="Query", lines=4, interactive=False, elem_classes="big-textbox-output")

            # Click Events
            login_btn.click(fn=login, inputs=[login_user, login_pass],
                            outputs=[login_block, register_block, user_block, admin_block, login_status])
            go_to_register.click(fn=lambda: (gr.update(visible=False), gr.update(visible=True)),
                                outputs=[login_block, register_block])
            # register() takes three arguments, so bind it once with all three inputs
            register_btn.click(fn=register, inputs=[reg_user, reg_pass, reg_origin], outputs=[reg_status])
            back_to_login.click(fn=lambda: (gr.update(visible=True), gr.update(visible=False)),
                                outputs=[login_block, register_block])
            view_user_btn.click(fn=read_user, inputs=[view_user], outputs=[OUTOUT])
            create_user_btn.click(fn=create_user, inputs=[create_username, create_password, create_origin], outputs=[OUTOUT])
            update_user_btn.click(fn=update_user_password, inputs=[update_username, update_new_password], outputs=[OUTOUT])
            delete_user_btn.click(fn=delete_user, inputs=[delete_username], outputs=[OUTOUT])
            promote_admin_btn.click(fn=add_admin, inputs=[admin_username], outputs=[OUTOUT])
            demote_admin_btn.click(fn=remove_admin, inputs=[admin_username], outputs=[OUTOUT])
            
            # Add Log Out Button
            logout_btn = gr.Button("Log Out")
            logout_btn.click(fn=logout, outputs=[login_block, register_block, user_block, admin_block, login_status])

    app.launch()

# Start the app
main()





* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


The cell above builds the Gradio web application. After logging in, users get two main features: transcribing audio recorded from the microphone and transcribing uploaded audio files, each on its own tab. Admins additionally get tabs for model comparison and user management. As written, `app.launch()` serves the app only on the local URL shown in the output; passing `share=True` to `launch()` creates a public link.
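
The launch options can also be driven by the environment. The call in `main()` is a plain `app.launch()`; the following is a minimal sketch of building the `share` and `server_port` arguments of `launch()` from environment variables. The variable names `APP_PORT` and `APP_SHARE` and the helper `launch_kwargs_from_env` are assumptions made for illustration, and the default port matches the local URL shown in the output above.

```python
import os

def launch_kwargs_from_env(env=None):
    """Build keyword arguments for Gradio's launch() from environment variables."""
    env = os.environ if env is None else env
    return {
        # Port to listen on; falls back to the port seen in the notebook output
        "server_port": int(env.get("APP_PORT", "7860")),
        # Whether to request a public share link
        "share": env.get("APP_SHARE", "0").lower() in ("1", "true", "yes"),
    }

print(launch_kwargs_from_env({"APP_PORT": "8080", "APP_SHARE": "true"}))
# {'server_port': 8080, 'share': True}
```

Inside `main()`, the last line would then become `app.launch(**launch_kwargs_from_env())`.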